Some of you might have heard about Google’s Android team proposal to introduce wakelocks (aka suspend-blockers) to the Linux kernel. While there was a real issue being solved in the kernel side, the benefits on the user-space side were dubious at best, and after a huge discussion, they finally didn’t get in.
During this discussions the dynamic and static power management were described and discussed at length, and there was a bit of talk about Maemo(MeeGo) vs Android approaches to power management, but there was so much more than that.
Some people have problems with the battery on Android devices, for some people it’s just fine, some people have problems with the Maemo, other don’t, so in general; your mileage might vary. But given the extremely different approaches, it’s easy to see in which cases you would have better luck with Maemo, and in which with Android–Although I do think it’s obvious which approach is superior, but I am biased.
An interesting achievement was shared by Thiago Maciera, who managed to get ‘5 days and a couple of minutes‘ out of the Nokia N9 while traveling, and actually using it–and let’s remember this is a 1450 mAh battery. Some Android users might have trouble believing this, but hopefully this blog post would explain what’s going on.
So lets go ahead and explore the two approaches.
Dynamic Power Management
Perhaps the simplest way to imagine dynamic power management, is the gears of a manual transmission car. You go up and down depending on how much power does the system actually needs.
In some hardware, such as OMAP chips, it’s possible to have quite a lot of fine control on the power states of quite a lot of devices, plus different operating power points on the different cores. For example, it might be possible to have some devices, such as the display on, and active, some other devices partially off, like speaker, and other completely off, like USB. And based on the status of the whole system, whole blocks can be powered off, other with low voltage levels, etc.
Linux has a framework to deal properly with this kind of hardware, the runtime power management, that originally came from the embedded world, and a lot from OMAP development, but is now available to everyone.
The idea is very simple; devices should sleep as much as possible. This means that if you have a sound device that needs chunks of 100ms, and the system is not doing anything else but playing sound, then most of the devices go to sleep, even the CPU, and the CPU is only waken up when it needs to write data for the audio device. Even better is to configure the sound device for chunks of 1 second, so the system can sleep even more.
Obviously, some co-operation between kernel and user-space is needed. Say, if you have an instant messenger program that needs to do work every minute, and a mail program that is configured to check mail every 10 minutes, you would want them to do work at the same time when they align at every 10 minutes. This is sometimes called IP heartbeat; the system wakes up for a beat, and then immediately goes back to sleep. And there are kernel facilities as well, such as range timers.
All this is possible thanks to very small latencies required for devices to go to sleep and wakeup, and have intermediary modes (e.g. on, inactive, retention, off), so, for example a device might be able to go to inactive mode in 1ms, retention in 2ms, and off in 5ms (totally invented numbers). Again, the more sleep, the better. Obviously, this is impossible on x86 chips, which have huge latencies–at least right now, and it’s something Intel is probably trying to improve effusively. All these latencies are known by the runtime pm framework in the kernel, and based on that it and the usage, it figures out what is the lowest power state possible without breaking things.
Note I’m not a power management expert, but you cant watch a colleague of mine explain the OMAP 3 power-managment on this video:
And there’s plenty of
Update: That was the wrong link, here are the resources.
Static Power Management
Static power management has two modes: on and off. That’s it.
OK, that’s not exactly the case in general, but it is in the Android context; the system is either suspended, or active, and it’s easy to know in which mode you are; if the screen is on, it’s active, and if it’s off; it’ is suspended (ideally).
There’s really not much more than that. The complexity comes from the problem of when to suspend; you might have turned off the display, but there might be a system service that still needs to do work, so this service grabs a suspend blocker which, as the name suggests, prevents the system from suspending (until the lock is released). This introduces a problem; a rouge program might grab a ‘suspend blocker’ and never release it, which means your phone will never suspend, and thus the battery would drain rather quickly. So, some security mechanisms are introduced to grant permissions selectively to use suspend blockers.
And moreover, Android developers found race conditions in the suspend sequences in certain situations that were rather nasty (e.g. the system tries to suspend at the same time the user clicks a button, and the system never wakes up again), and these were acknowledged as real issues that would happen on all systems (including PC’s and servers, albeit rarely, because they don’t suspend so often), and got fixed (or at least they tried).
First of all, it’s important to know that if you have dynamic pm working perfectly, you reach exactly the same voltage usage than static pm, so in ideal cases they both behave exactly the same.
The problem is that it’s really hard for dynamic pm to reach that ideal case, in reality systems are able to sleep for certain periods of time, after which they are woken up, often times unnecessarily, and as I already explained; that’s not good. So the goal of a dynamic pm system is to increase those periods of time as much as possible, thus maximizing battery life. But there’s a point of diminished returns (read this or this for expert explanations), so, if the system manages to sleep 1s in average, there’s really not much more to gain if it sleeps 2s, or even 10s. These goals were quite difficult to achieve in the past (not these, I invented those numbers), but not so much any more thanks to several mechanisms that have been introduced and implemented through the years. So it’s fair to say that the sweet spot of dynamic pm has been achieved.
This means that today a system that has been fine-tuned for dynamic pm can reach reach a decent battery life compared to one that uses static pm in most circumstances. But for some use-cases, say, you leave your phone on your desk and you don’t use it at all, static pm would allow it to stay alive for weeks, or even months, while dynamic pm can’t possibly achieve that any time soon. Hopefully you would agree, that nobody cares about those use-cases were you are not actually using the device.
And of course, you only need one application behaving badly and waking up the system constantly, and the battery life is screwed. So in essence, dynamic pm is hard to achieve.
Android developers argued that was one of the main reasons to go for static pm; it’s easier to achieve, specially if you want to support a lot of third party applications (Android Market) without compromising battery life. While this makes sense, I wasn’t convinced by this argument; you still can have one application that is behaving badly (grabbing suspend blockers the whole time), and while permissions should help, the application might still request the permission, and the user grant it (who reads incomprehensible warnings before clicking ‘Yes’ anyway?).
So, both can get screwed by bad apps (although it’s likely that it’s harder in the static pm case, albeit not that much).
But actually, you can have both static and dynamic power management, and in fact, Android does. But that doesn’t mean Android automatically wins, as I explained, the system needs to be fine-tuned for dynamic pm, and that has never been a focus of Android (there’s no API’s or frameworks for that, etc.). So, for example, a Nokia N9 phone might be able to sleep 1s in average, while an Android phone 100ms (when not suspended). This means when you are actually using the device (the screen is on), chances are, a system fine-tuned for dynamic pm (Nokia N9) would consume less battery, than an Android device.
That is the main difference between the two. tl;dr: dynamic pm is better for active usage.
So, if Android developers want to improve the battery usage while on active usage (which I assume is what the users want), they need to fine-tune the system for dynamic pm, and thus sleep as much as possible, hopefully reaching the sweet spot. But wait a second… If Android is using dynamic pm anyway, and they tune the system to the point of diminishing returns; there is not need for static pm. Right? Well, that’s my thinking, but I didn’t manage to make Android developers realize that in the discussion.
Plus, there’s a bunch of other reasons while static pm is not palatable for non-Android systems (aka. typical Linux systems), but I won’t go into those details.
Nokia’s bet was on dynamic, Google’s bet was on static, and in the end we agreed to disagree, but I think it’s clear from the outcome in real-world situations who was right–N9 users experiencing more than one day of normal usage, and even more than two. Sadly, only Nokia N9 users would manage to experience dynamic pm in full glory (at the moment).
But not all is lost, and this in my opinion is the most important aspect. Dynamic pm lives on the Linux kernel mainline through the runtime power management API. This is not a Nokia invention that will die with the Nokia N9; it’s a collaborative effort where Nokia, TI, and other companies worked together, and now not only benefits OMAP, but other embedded chips, and even non-embedded ones. Being upstream means it’s good, and it has been blessed by many parties, and has gone through many iterations, and finally it probably doesn’t look like the original OMAP SRF code at all. Sooner or later your phone (if you don’t have an N9) will benefit from this effort, and it might even be an Android phone, your netbook (if not already benefiting in some way), and even your PC.
Android’s suspend blockers are not upstream, and it’s quite unlikely that they will ever be, or that you would see them used in any system other than Android, and there’s good reasons for that. Matthew Garrett did an excellent job of summarizing what went wrong with the story of suspend blockers on his presentation ‘Android/Linux kernel: Lessons learned’, but unfortunately the Linux foundation seems to be doing a poor job of providing those videos (I cannot watch them any more, even though I registered
, and they haven’t been helpful through email conversations).
Update: I managed to get the video from the Linux Foundation and pushed it to YouTube:
Here is part of the discussion on LKML, if you want to read it for some strange reason. WARNING; it’s huge.