gst-av 0.4; better performance for flac, vorbis and mp3 (part 2)

This is a continuation of my previous post. Based on the feedback I decided to do two things; investigate the strange FLAC high CPU usage with FFmpeg, and get more accurate measurements.

GStreamer sucks

It turns out that GStreamer flac parser uses four times more CPU than FFmpeg’s decoder. Thanks to perf, I was able to quickly figure out the biggest offenders: GStreamer’s horrible bitstream reader (GST_BIT_READER_READ_BITS) was by far the worst.

53.03% libgstbase-0.10.so.0.26.0
24.78% libavcodec.so.52.72.2
17.35% libgstxiph.so
1.52% libc-2.12.1.so

This is on my laptop just running the parser (filesrc ! flacparse ! fakesink), in total it was taking 2.67s.

After reading the code and trying different things, I decided to go for something similar to what FFmpeg is doing, and I also borrowed pieces of the architecture-specific optimizations, now it even looks ok:

72.68% libavcodec.so.52.72.2
14.20% libgstxiph.so
4.00% libc-2.12.1.so

And it takes 0.81s.

But how much would this affect battery life on the N900?

Smart battery script

I tried different ideas, and after refreshing myself on statistics I wrote this script in Ruby that runs all the tests, gathers the battery capacity in a separate thread, and finally generates a report per test. Much easier than before.

Since I’m already working on FLAC, I decided to also apply some patches that split the decoder from the parser, and optimizations from Måns Rullgård (good thing I grabbed them because he seems to have left the project and deleted his repos).

Battery life graph

Battery life


Battery drain graph

Battery drain

So, yeah, much better now ;)

But how credible are these results? Well, judge by yourself, listed below are the raw measurements, the samples are the differences in capacity (mAh) measured each 10 minutes, from which the drain and battery life are calculated.

== baseline ==
samples: 3, 3, 3, 3, 4, 5
drain: 21.00±1.87mA
life: 65.39±4.77h
== av flac ==
samples: 9, 8, 8, 8, 7, 8, 7
drain: 47.14±1.45mA
life: 28.19±0.87h
== flac ==
samples: 11, 11, 11, 11, 11, 11
drain: 66.00±0.00mA
life: 20.00±0.00h
== av mp3 ==
samples: 11, 11, 11, 11, 11, 10
drain: 65.00±0.91mA
life: 20.33±0.30h
== nokiamp3 ==
samples: 12, 12, 12, 12, 12, 12
drain: 72.00±0.00mA
life: 18.33±0.00h
== av vorbis ==
samples: 10, 11, 11, 10, 11, 11
drain: 64.00±1.15mA
life: 20.67±0.38h
== vorbis ==
samples: 19, 18, 18, 19, 18, 19
drain: 111.00±1.22mA
life: 11.90±0.13h

If you are interested in the code: gst-av, gst-maemo-xiph. Enjoy ;)

gst-av 0.3; better performance for vorbis and mp3

So, I’ve been working on gst-av, a GStreamer plug-in to use FFmpeg codecs (only audio for now), in order to get it in good shape for ogg support. First, I had to fix oggdemux and flacparse to be compatible with tagreadbin, it seems I managed to do it (with the help of a patch from Sreerenj Balachandran), so now the custom tracker extractors are not needed any more.

Then, with a bit of work I managed to get not only vorbis, but flac, and mp3 working.

That was good, but was it really worth it? Tuomas Kulve did a nice comparison of gst-av vs the default vorbisdec, and I wanted to do something similar, however, running a series of tests each taking 20 hours to complete wasn’t so appealing.

So I asked in #meego and #maemo IRC channels for a simple way to measure battery drain reliably, and automatically. It seems powertop can do that on some platforms, but Maemo’s powertop is a very different beast. Fortunately, the folks at #maemo seem to have been busy trying to get all possible information from the battery, and they pointed me to a very nice powerscript. However, I got some tips to get even better results (from ShadowJK, DocScrutinizer, and SpeedEvil), and the result is this maemo-battery script (needs i2c-tools, and root permissions), which essentially prints the current charge of the battery each 10 minutes.

With this I was ready, but just to be clear how to properly measure battery draw; make sure you are in offline mode, plug your headphones (otherwise pulse-audio would run extra algorithms), and immediately blank the screen.

Here are the results (units in hours of battery life):

These results show that vorbis with FFmpeg is massively better than libvorbis, so my work wasn’t in vain :). But it’s also interesting that FFmpeg’s mp3 decoder is slightly better than Nokia’s proprietary one. Also, FFmpeg still needs some work to complete with libflac. My guess is that these decoders can’t be optimized much further; now the bottlenecks would have to be pulseaudio and gstreamer.

This is the raw data (in mA); I ran my script for one hour for each test, and some I ran multiple times just to verify; the results seem to vary ±1 mA.

current -- mp3: 63, vorbis: 110, flac: 62
gst-av -- mp3: 61, vorbis: 62, flac: 69

gst-ffmpeg

Why not use gst-ffmpeg? You might ask. Initially that’s what I tried, but it doesn’t support vorbis, nor flac, which seems to fit GStreamer’s tradition of getting away from FFmpeg as much as possible. Then when I read the code it was clear to me that it was overly complicated; I’m familiar with FFmpeg’s API (it’s unbelievably simple), so I decided to play around, and see if I could get something working; I did, and the result was incredibly simple, and oh so sweet :) As a comparison, gst-ffmpeg is 16357 lines of code, gst-av is 563 (sure, gst-av does much less; just what is needed). Another reason that goes hand-in-hand with this, is the ability to tweak it; my goal is to get the absolutely best performance, and for that I want to be able to understand what the code is doing. And finally, gst-ffmpeg is using deprecated API.

What about performance?

The difference is not that big: ~1.6h of battery life, but it’s something.

current: 63, gst-av: 61, gst-ffmpeg: 66

What now?

Now we need to package FFmpeg; probably just include the codecs we need, and then ogg support might include these instead. Any volunteers?

Update

It turns out the issue was flacparse which is total crap: it’s using 4 times more CPU time than FFmpeg’s decoder just for parsing. After fixing it now it takes only 20%. I’m trying to get new measurements in a more automated and precise way now. I’ve pushed the code to my repo already.