gst-av 0.6 released; more reliable

gst-av is a GStreamer plug-in that provides support for libav (a fork of FFmpeg). It is similar to gst-ffmpeg, but without the GStreamer politics, which means all libav plugins are supported, even if there are native GStreamer alternatives: VP8, MP3, Ogg, Vorbis, AAC, etc.

This release takes care of a few corner cases, and adds support for more versions of FFmpeg.

Here are the goods:
http://code.google.com/p/gst-av/downloads/list

And here’s the short-log:

Felipe Contreras (19):
      adec: flush buffer on EOS
      adec: improve timestamp reset
      adec: avoid deprecated av_get_bits_per_sample_fmt()
      adec: avoid FF_API_GET_BITS_PER_SAMPLE_FMT
      vdec: properly initialize input buffer
      parse: add more H.264 parsing checks
      parse: fix H.264 parsing for bitstream format
      get_bits: add show_bits function
      build: set runpath for libav
      vdec: fix potential leaks
      vdec: use libav pts stuff
      vdec: get delayed pictures on eos
      build: trivial improvements
      parse: trivial fix
      h264enc: fix static function
      vdec: add support for old reordered_opaque
      adec: add support for old sample_fmt
      adec: add support for really old bps()
      adec: add support for all MPEG-1 audio

Mark Nauwelaerts (1):
      parse: be less picky regarding some reserved value

Scrobbler for Maemo, now both on N900, and N9

Version 2.0 finally moved to Fremantle stable, so everybody can start using it 🙂

If you are not familiar with it, this package sees what music you are listening to on Maemo devices and scrobbles it to your favorite service: last.fm, libre.fm, or both.

I already explained the features in an earlier blog entry, along with an explanation of how to make use of the “love” feature.

But now I have also managed to port this to Harmattan, and it works perfectly on my Nokia N9. Interestingly enough, the new UI has a “favorite” feature directly integrated. It took me some time, as it’s not publicly documented, but I finally managed to hook into it, so everything works seamlessly 🙂

I was rather impressed by how easy it was to port. I was able to leave all the GLib bits intact; even libsoup is still supported, and libconic, so I only had to make changes for the new qmafw. Thanks to the Qt guys for using GLib’s mainloop by default; it certainly made things easier for me 🙂

Update

You can find a Debian package here: maemo-scrobbler 2.0-2.

Then, you would need to create a file ~/.config/scrobbler like this:

[lastfm]
username=foo
password=bar
            
[librefm]
username=foo
password=bar

That’s it 🙂

gst-av 0.5 released; now with video encoding and decoding support

gst-av is a GStreamer plug-in that provides support for libav (a fork of FFmpeg). It is similar to gst-ffmpeg, but without the GStreamer politics, which means all libav plugins are supported, even if there are native GStreamer alternatives: VP8, MP3, Ogg, Vorbis, AAC, etc.

In addition, it is much simpler (2654 vs 16575 LOC), has better performance, has a few extra features (such as lower latency), and doesn’t use deprecated APIs. In a previous post I measured exactly how much of an improvement over gst-ffmpeg there is; it’s not much, but it’s something.

IOW; it’s possible that gst-av is the only GStreamer codec plug-in you would ever need 🙂

My ARM development notes

These are my notes on getting a useful cross-compilation setup working, even with autotools and GStreamer.

toolchain

The convention is to have ‘arm-linux-gcc‘ and so on, so that you can compile with ‘make CROSS_COMPILE=arm-linux-‘; the kernel and many other projects assume this is the default.

First, you need ‘~/bin‘ to be in your PATH, so make sure you set it in ‘~/.bash_profile‘ (export PATH="$HOME/bin:$PATH") or whatever your favorite shell uses.

I use CodeSourcery (GNU/Linux 2009q3), you can fetch it from here.

cd ~/bin
toolchain=/opt/arm-2009q3
# create a short arm-linux-* alias for every tool in the toolchain
for x in "$toolchain"/bin/arm-none-linux-gnueabi-*
do
ln -s "$x" "arm-linux-${x#$toolchain/bin/arm-none-linux-gnueabi-}"
done

QEMU

This is needed for sb2 in order to kind of emulate an ARM system.

git clone git://git.savannah.nongnu.org/qemu.git
cd qemu
git checkout -b stable v0.12.5
./configure --prefix=/opt/qemu --target-list=arm-linux-user
make install

sbox2

This is needed to avoid most of the pain caused by autotools (thank you GNU… not!).

git clone git://gitorious.org/scratchbox2/scratchbox2.git
cd scratchbox2
git checkout -b stable 2.1
./autogen.sh --prefix=/opt/sb2
make install

Add sb2 to the PATH:
export PATH=/opt/sb2/bin:$PATH

sb2 target

Now it’s time to configure a target.

cd /opt/arm-2009q3/arm-none-linux-gnueabi/libc/
sb2-init -c /opt/qemu/bin/qemu-arm armv7 /opt/arm-2009q3/bin/arm-none-linux-gnueabi-gcc

You can check that it works with:
sb2 gcc --version

GStreamer

We are going to install everything into ‘/opt/arm/gst‘, so:

export PKG_CONFIG_PATH=/opt/arm/gst/lib/pkgconfig

You can skip the steps here and go directly to deployment if you download and extract this tarball on your target.

zlib

This is needed by GLib’s gio (which cannot be configured out).

wget -c http://zlib.net/zlib-1.2.5.tar.gz
tar -xf zlib-1.2.5.tar.gz
cd zlib-1.2.5
sb2 ./configure --prefix=/opt/arm/gst
sb2 make install

glib

GLib has bugs (623473, 630910) detecting zlib (thank you Mattias… not!). So either apply my patches, or do the C_INCLUDE_PATH/LDFLAGS hacks below:

export C_INCLUDE_PATH='/opt/arm/gst/include' LDFLAGS='-L/opt/arm/gst/lib'

git clone git://git.gnome.org/glib
cd glib
git checkout -b stable 2.24.1
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-static --with-html-dir=/tmp/dump
sb2 make install

gstreamer

git clone git://anongit.freedesktop.org/gstreamer/gstreamer
cd gstreamer
git checkout -b stable RELEASE-0.10.29
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --disable-loadsave --with-html-dir=/tmp/dump
sb2 make install

liboil

Needed by many GStreamer components.

git clone git://anongit.freedesktop.org/liboil
cd liboil
git checkout -b stable liboil-0.3.17
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-static --with-html-dir=/tmp/dump
sb2 make install

gst-plugins-base

git clone git://anongit.freedesktop.org/gstreamer/gst-plugins-base
cd gst-plugins-base
git checkout -b stable RELEASE-0.10.29
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --with-html-dir=/tmp/dump
sb2 make install

gst-plugins-good

git clone git://anongit.freedesktop.org/gstreamer/gst-plugins-good
cd gst-plugins-good
git checkout -b stable RELEASE-0.10.23
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --with-html-dir=/tmp/dump
sb2 make install

Deployment

So now we have everything installed in ‘/opt/arm/gst‘, but how do we run it on the target? Just copy the exact same files onto the target, in the exact same location, and then:

export PATH=/opt/arm/gst/bin:$PATH

That’s it, you can run gst-launch, gst-inspect, and so on.

Development

Ok, it should be clear from the previous steps how to do development, but in case it isn’t, here’s how:

gst-dsp

Each time you want to cross-compile, you need to tell pkg-config where to find the packages:

export PKG_CONFIG_PATH=/opt/arm/gst/lib/pkgconfig

git clone git://github.com/felipec/gst-dsp.git
cd gst-dsp
git checkout -b stable v0.8.0
make

Note that gst-dsp doesn’t use autotools, so sb2 is not needed.

Now, once you have the plugin (libgstdsp.so), copy it to ‘/opt/arm/gst/lib/gstreamer-0.10‘ on the target.

And finally, you can run real gst-launch pipelines:
gst-launch playbin2 uri=file://$PWD/file.avi

Note: If you are missing some elements, play around with flags (flags=65 for native video-only)

Do some more development, type make, copy, repeat 🙂

Enjoy 😉

GStreamer, embedded, and low latency are a bad combination

This has been a known fact inside Nokia (MeeGo) for quite a long time due to various performance issues we’ve had to work around, but for some reason it wasn’t acknowledged as an issue when it was brought up on the mailing list.

So, in order to prove beyond reasonable doubt that there is indeed an issue, I wrote this test. It is very minimal; there’s essentially nothing of a typical GStreamer pipeline, just an element and an app that pushes buffers to it. But then, optionally, a queue (a typical element in a GStreamer pipeline) is added in the middle, which is a thread boundary, and then the fun begins:

Graph for x86
Graph for arm

The buffer size legend corresponds to an exponent (5 => 2 ^ 5 = 32), and the CPU time is as reported by the system (getrusage), in ms. You can see that on ARM systems not only is more CPU time wasted, but adding a queue makes things worse at a faster rate.

Note that this test is doing nothing, just pushing buffers around; all the CPU time is wasted on GStreamer operations. In a real scenario the situation is much worse, because there isn’t just one thread but several, and many elements are involved, so the wasted CPU time I measured has to be multiplied many times.

Now, this has been profiled before, and everything points to pthread_mutex_lock, which is only a problem when there’s contention. In GStreamer, contention happens more often when buffers are small; the futex syscall is then issued, and that is very bad on ARM, although it probably depends on which specific system you are using.

Fortunately for me, I don’t need good latency, so I can just push one-second buffers and forget about GStreamer performance issues. If you are experiencing the same and can afford high latency, just increase the buffer sizes; if not, then you are screwed :p

Hopefully this answers Wim’s question of what a “small buffer” means, how it’s not good, and when it’s a problem.

Update

Ok, so the discussion about this continued on the mailing list, and it was pointed out that the scale is logarithmic, so the exponential result was expected. While that is true, the logarithmic scale matches what people experience; how else would you plot the range from 10ms to 1s? Certainly not linearly.

But there’s a valid point: the results should not be surprising. We can take the logarithmic scale out of the equation by dividing the total CPU time by the number of buffers actually pushed, as Robert Swain did in the comments. That should give a constant number: the CPU time it takes to do one push. The results indeed converge to a constant number:

queue: 0.078, direct: 0.011

This means that in a realistic use case of pushing one buffer every 10 ms through a queue, each push costs 0.078 ms of CPU time, so the CPU usage on this particular processor (800 MHz) is 0.78%.
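As a rough sketch of that arithmetic (the function name is only for illustration, and the figures are assumed to be in milliseconds of CPU time per push, as in the test output):

```python
def cpu_usage_percent(ms_per_push, push_interval_ms):
    """CPU share consumed by pushing one buffer every push_interval_ms."""
    return ms_per_push / push_interval_ms * 100.0

# Per-push costs taken from the converged results above.
for name, cost in (("queue", 0.078), ("direct", 0.011)):
    print(f"{name}: {cpu_usage_percent(cost, 10):.2f}%")
```

Halving the interval between pushes doubles the percentage, which is why small buffers hurt.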

Also, there’s this related old bug that recently got some attention and a new patch from Wim, so I gave it a try (I had to compile GStreamer myself so the results are not comparable with the previous runs).

Before:
queue: 0.074, direct: 0.011

After:
queue: 0.065, direct: 0.007

So the improvement for the queue case is around 12%, while the direct case is 31%.

Not bad at all, but the conclusion is still the same. If you use GStreamer, try to avoid as many elements as possible, especially queues, and try to use the biggest buffer size you can afford, which means that having both good performance and low latency is tricky.

Update 2

Stefan Kost suggested using ‘queue’ instead of ‘queue2’, and I got a pandaboard, so here are the results with OMAP4.

pandaboard (2 core, 1GHz):
queue: 0.017, direct: 0.004

N900:
queue: 0.087, direct: 0.021

i386 (2 core, 1.83GHz):
queue: 0.0059, direct: 0.0015

So, either futex got better on Cortex A9, or OMAP4 is so powerful it can’t be considered embedded :p
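One way to compare the platforms is to look at how much more a push through a queue costs than a direct push. This is only a sketch of that comparison using the numbers listed above (taken to be ms of CPU time per push); the helper name is made up for the example.

```python
# (queue, direct) CPU time per push, copied from the results above
results = {
    "pandaboard": (0.017, 0.004),
    "N900": (0.087, 0.021),
    "i386": (0.0059, 0.0015),
}

def queue_overhead(queue_ms, direct_ms):
    """How many times more expensive a queued push is than a direct one."""
    return queue_ms / direct_ms

for name, (q, d) in results.items():
    print(f"{name}: queue costs {queue_overhead(q, d):.1f}x a direct push")
```

With these figures the ratio comes out at roughly four on all three machines, so the absolute cost changes a lot between platforms while the relative cost of the queue barely moves.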

gst-av 0.4; better performance for flac, vorbis and mp3 (part 2)

This is a continuation of my previous post. Based on the feedback, I decided to do two things: investigate the strangely high CPU usage of FLAC with FFmpeg, and get more accurate measurements.

GStreamer sucks

It turns out that GStreamer’s FLAC parser uses four times more CPU than FFmpeg’s decoder. Thanks to perf, I was able to quickly figure out the biggest offenders: GStreamer’s horrible bitstream reader (GST_BIT_READER_READ_BITS) was by far the worst.

53.03% libgstbase-0.10.so.0.26.0
24.78% libavcodec.so.52.72.2
17.35% libgstxiph.so
1.52% libc-2.12.1.so

This is on my laptop, just running the parser (filesrc ! flacparse ! fakesink); in total it was taking 2.67s.

After reading the code and trying different things, I decided to go for something similar to what FFmpeg is doing, and I also borrowed pieces of the architecture-specific optimizations, now it even looks ok:

72.68% libavcodec.so.52.72.2
14.20% libgstxiph.so
4.00% libc-2.12.1.so

And it takes 0.81s.

But how much would this affect battery life on the N900?

Smart battery script

I tried different ideas, and after refreshing myself on statistics I wrote this script in Ruby that runs all the tests, gathers the battery capacity in a separate thread, and finally generates a report per test. Much easier than before.

Since I’m already working on FLAC, I decided to also apply some patches that split the decoder from the parser, and optimizations from Måns Rullgård (good thing I grabbed them because he seems to have left the project and deleted his repos).

Battery life graph

Battery drain graph

So, yeah, much better now 😉

But how credible are these results? Well, judge for yourself; the raw measurements are listed below. The samples are the differences in capacity (mAh) measured every 10 minutes, from which the drain and battery life are calculated.

== baseline ==
samples: 3, 3, 3, 3, 4, 5
drain: 21.00±1.87mA
life: 65.39±4.77h
== av flac ==
samples: 9, 8, 8, 8, 7, 8, 7
drain: 47.14±1.45mA
life: 28.19±0.87h
== flac ==
samples: 11, 11, 11, 11, 11, 11
drain: 66.00±0.00mA
life: 20.00±0.00h
== av mp3 ==
samples: 11, 11, 11, 11, 11, 10
drain: 65.00±0.91mA
life: 20.33±0.30h
== nokiamp3 ==
samples: 12, 12, 12, 12, 12, 12
drain: 72.00±0.00mA
life: 18.33±0.00h
== av vorbis ==
samples: 10, 11, 11, 10, 11, 11
drain: 64.00±1.15mA
life: 20.67±0.38h
== vorbis ==
samples: 19, 18, 18, 19, 18, 19
drain: 111.00±1.22mA
life: 11.90±0.13h
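The report figures can be reproduced from the samples with a few lines. This is a sketch in Python rather than the Ruby script itself, and it rests on two assumptions inferred from the numbers rather than from the script: a battery capacity of 1320 mAh (66 mA corresponds to 20.00 h above), and an error term equal to the population standard deviation divided by the square root of the number of samples.

```python
from math import sqrt
from statistics import mean, pstdev

CAPACITY_MAH = 1320   # assumption: inferred from 66 mA <-> 20.00 h
SAMPLES_PER_HOUR = 6  # one sample every 10 minutes

def summarize(samples):
    """Return ((drain, err), (life, err)) for per-10-minute mAh samples."""
    drains = [s * SAMPLES_PER_HOUR for s in samples]  # mA
    lives = [CAPACITY_MAH / d for d in drains]        # hours

    def stat(values):
        # assumption: error = population stddev / sqrt(n)
        return mean(values), pstdev(values) / sqrt(len(values))

    return stat(drains), stat(lives)

(drain, drain_err), (life, life_err) = summarize([3, 3, 3, 3, 4, 5])
print(f"drain: {drain:.2f}±{drain_err:.2f}mA")
print(f"life: {life:.2f}±{life_err:.2f}h")
```

With the baseline samples this gives 21.00±1.87mA and 65.39±4.77h, matching the listing. Note that the life is the mean of the per-sample lives, not the capacity divided by the mean drain, which is why 1320 / 21 does not give 65.39.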

If you are interested in the code: gst-av, gst-maemo-xiph. Enjoy 😉

gst-av 0.3; better performance for vorbis and mp3

So, I’ve been working on gst-av, a GStreamer plug-in to use FFmpeg codecs (only audio for now), in order to get it in good shape for ogg support. First, I had to fix oggdemux and flacparse to be compatible with tagreadbin, it seems I managed to do it (with the help of a patch from Sreerenj Balachandran), so now the custom tracker extractors are not needed any more.

Then, with a bit of work I managed to get not only vorbis, but flac, and mp3 working.

That was good, but was it really worth it? Tuomas Kulve did a nice comparison of gst-av vs the default vorbisdec, and I wanted to do something similar, however, running a series of tests each taking 20 hours to complete wasn’t so appealing.

So I asked in the #meego and #maemo IRC channels for a simple way to measure battery drain reliably and automatically. It seems powertop can do that on some platforms, but Maemo’s powertop is a very different beast. Fortunately, the folks at #maemo seem to have been busy trying to get all possible information out of the battery, and they pointed me to a very nice powerscript. I also got some tips for getting even better results (from ShadowJK, DocScrutinizer, and SpeedEvil), and the result is this maemo-battery script (needs i2c-tools and root permissions), which essentially prints the current charge of the battery every 10 minutes.

With this I was ready, but just to be clear about how to properly measure battery draw: make sure you are in offline mode, plug in your headphones (otherwise PulseAudio would run extra algorithms), and immediately blank the screen.

Here are the results (units in hours of battery life):

These results show that vorbis with FFmpeg is massively better than libvorbis, so my work wasn’t in vain :). But it’s also interesting that FFmpeg’s mp3 decoder is slightly better than Nokia’s proprietary one. Also, FFmpeg still needs some work to compete with libflac. My guess is that these decoders can’t be optimized much further; the bottlenecks now would have to be PulseAudio and GStreamer.

This is the raw data (in mA); I ran my script for one hour for each test, and some I ran multiple times just to verify; the results seem to vary ±1 mA.

current -- mp3: 63, vorbis: 110, flac: 62
gst-av -- mp3: 61, vorbis: 62, flac: 69

gst-ffmpeg

Why not use gst-ffmpeg, you might ask? Initially that’s what I tried, but it supports neither vorbis nor flac, which seems to fit GStreamer’s tradition of getting away from FFmpeg as much as possible. Then, when I read the code, it was clear to me that it was overly complicated. I’m familiar with FFmpeg’s API (it’s unbelievably simple), so I decided to play around and see if I could get something working; I did, and the result was incredibly simple, and oh so sweet 🙂 As a comparison, gst-ffmpeg is 16357 lines of code, while gst-av is 563 (sure, gst-av does much less; just what is needed). Another reason, which goes hand in hand with this, is the ability to tweak it: my goal is to get the absolute best performance, and for that I want to be able to understand what the code is doing. And finally, gst-ffmpeg uses deprecated APIs.

What about performance?

The difference is not that big: ~1.6h of battery life, but it’s something.

current: 63, gst-av: 61, gst-ffmpeg: 66
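For reference, this is roughly how a drain figure in mA translates into hours of battery life. It is a sketch that assumes the N900’s nominal 1320 mAh battery capacity, and the function name is only illustrative.

```python
CAPACITY_MAH = 1320  # assumption: nominal N900 battery capacity

def battery_life_hours(drain_ma, capacity_mah=CAPACITY_MAH):
    """Hours a full battery lasts at a constant drain."""
    return capacity_mah / drain_ma

# mp3 drain figures from the line above: gst-av 61 mA, gst-ffmpeg 66 mA
gain = battery_life_hours(61) - battery_life_hours(66)
print(f"gst-av lasts about {gain:.1f}h longer than gst-ffmpeg")
```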

What now?

Now we need to package FFmpeg; probably just include the codecs we need, and then ogg support might include these instead. Any volunteers?

Update

It turns out the issue was flacparse which is total crap: it’s using 4 times more CPU time than FFmpeg’s decoder just for parsing. After fixing it now it takes only 20%. I’m trying to get new measurements in a more automated and precise way now. I’ve pushed the code to my repo already.