gst-av 0.5 released; now with video encoding and decoding support

gst-av is a GStreamer plug-in that provides support for libav (formerly FFmpeg). It is similar to gst-ffmpeg, but without GStreamer politics, which means all libav plugins are supported, even if there are native GStreamer alternatives: VP8, MP3, Ogg, Vorbis, AAC, etc.

In addition, it is much simpler (2654 vs 16575 LOC), has better performance, has a few extra features (such as lower latency), and doesn’t use deprecated APIs. In a previous post I measured exactly how much it improves on gst-ffmpeg; it’s not much, but it’s some.

IOW; it’s possible that gst-av is the only GStreamer codec plug-in you would ever need 🙂

Why Linux is the most important software project in history

Here’s another post that for some people is obvious, but there are others (e.g. high-level managers) who might not necessarily see the importance of Linux. In fact, I have been surprised by how many open source developers don’t seem to be familiar with how Linux works (they think it’s just something that works?). The fact of the matter is that Linux is light years ahead of any other software project, open or closed; I’ll try to explain why.

BTW, by “Linux” I mean the kernel, which is the name of the project, not the ecosystem.

Everywhere

First of all, Linux runs everywhere; desktops, smartphones, routers, web servers, supercomputers, TVs, refrigerators, tablets, even on the stock market (London, NY, Johannesburg, etc.).

Not only does it run everywhere, but in many areas it’s the undisputed #1. In smartphones, Android already has the most market share, and is grabbing more and more. In supercomputers, Linux has 92% of the TOP500. On servers, it’s 64%.

There are a lot of benefits to having a single kernel that runs on all kinds of hardware; I’ll mention two examples.

One is that the improvements in power consumption that the embedded people have been pushing for benefit not only laptops and desktops, but even servers. Also, features like dynamic power management that work really well on embedded influence desktop and server hardware.

Another one is the VFS scalability patches. Nick Piggin found some issues on 64-CPU machines that required some reorganization of the VFS. The problem is that it’s tricky to test such patches; however, since Linux also has a real-time community (they have their own patches, but those will eventually get merged), issues with these patches could be found more easily.

Here’s a nice interview with Jim Zemlin from the Linux Foundation that explains where Linux comes from, where it is, and how it might very well become the building block of all devices in the future.

Collaboration

There are many open projects, but not many where everyone is involved and working together. The Linux Foundation issues a yearly report on how the kernel is being developed, and in it you can see competing companies working together, such as Red Hat (12.4%), Novell (7.0%), IBM (6.9%), Intel (5.8%), Oracle (2.3%), Renesas (1.4%), SGI (1.3%), Fujitsu (1.2%), Nokia (1.0%), HP (1.0%), Google (0.8%), AMD (0.8%), etc.

This is true synergy, not the management bullshit kind; nobody alone (Microsoft) can compete with what everyone together can produce (Linux).

Communication

The traffic of the main mailing list (LKML) is astronomical: 250 messages per day. But that’s only one mailing list; there are around 200 subsystem lists. Many of these lists have a lot of traffic as well. One I’m subscribed to is linux-media, which has around 30 messages per day.

And there’s a reason why so many people can follow so much traffic without it becoming a total mess. All kernel mailing lists follow common guidelines: you don’t have to be subscribed (!), don’t do reply-to munging, encourage cross-posting, cc the right people, trim unnecessary context, and don’t top post.

Also very important is to send patches through the mailing list. You don’t even have to think about it; just type ‘git send-email’ with a proper --cc-cmd, and Linux’s get_maintainer.pl script will find the right maintainers and contributors to cc, and the proper mailing lists to send the patch to. Again, no need to be subscribed. More on that next.

Patches

I have explained before why sending patches through the mailing list is superior to bugzilla. It can be summarized as: you don’t need to be subscribed, you don’t need to log in anywhere, you don’t need to search for the right component, etc.

Take myself as an example. I am paid by Nokia to work on GStreamer stuff, yet, even though I have a bugzilla account and everything, it’s easier for me to submit patches to Linux and get them merged (mostly in my free time); it’s just one command. It’s not only easier for me to submit patches, but also to review them; just click reply. I’m not even going to mention closed source, which is horrible in this area.

It is also rewarding that the response to patches is usually immediate. However, sometimes there are so many comments that patch series go through 3, 5, 10, even 30 revisions before they are accepted. This is great for quality, and not only for the project; the developers involved also learn a lot.

Using a mailing list also means that it’s easy to switch from reviewing a patch to a new discussion, based on the patch, thus helping communication.

Speed

It is not surprising that such a fine-tuned process results in stable releases every three months like clockwork, each introducing major features, at a rate of 5 to 6 patches per hour.

I can’t really explain how many things are going on in Linux, but Jonathan Corbet does in his Kernel Report.

Conclusion

So, Linux is an incredibly massive endeavor, easy and fun to work with, an unstoppable behemoth in the software industry, and IMO companies trying to stand in its way are going to realize their mistake in a painful way.

MeeGo scales, because Linux scales

To me, and a lot of people, it’s obvious why MeeGo scales to a wide variety of devices, but apparently that’s not clear to other people, so I’ll try to explain why that’s the case.

First, let’s divide the operating system:

  1. Kernel
  2. Drivers
  3. Adaptation
  4. System Frameworks
  5. Application Framework
  6. Applications

“Linux” can mean many things. In the case of Android, Linux means mostly the Kernel (which is heavily modified), and in some cases the Drivers (although sometimes they have to be written from scratch), but all the layers above are specific to Android.

On Maemo, MeeGo, Moblin, and LiMo, “Linux” means an upstream Kernel (no drastic changes) and upstream Drivers (which means they can be shared with other upstream players as they are), but it also means the “Linux ecosystem”: D-Bus, X.org, GStreamer, GTK+/Qt/EFL, etc. This means they take advantage of already existing System and Application Frameworks, and all they have to do is build the Applications, which is not an easy task, but certainly easier than having to do all the previous ones.

Now, the problem when creating MeeGo, is that for reasons I won’t (can’t?) explain here, Maemo and Moblin were forced to switch from GTK+ to Qt. This might have been the right move in the long term, but it means rewriting two very big layers of the operating system, in fact, the two layers that differentiate the various mobile platforms for the most part. And this of course means letting go of a lot of talent that helped build both Maemo and Moblin.

For better or worse, the decision was made, and all we could do was ride along with it. Maturing MeeGo essentially means maturing these two new layers, which are being written not entirely from scratch (as Qt was already there), but pretty much (as you have to add new features to it, and build on top).

Now, did MeeGo fail? Well, I don’t know when this UI can be considered mature enough, but sooner or later, it will be (I do think it will be soon). The timeframe depends also on your definition of “mature”, but regardless of that, it will happen. After that, MeeGo will be ready to ship on all kinds of devices. All the hardware platform vendors have to do is write the drivers and the adaptation, which they already do anyway for other software platforms.

Needless to say, the UI is irrelevant to the hardware platform.

So, here’s the proof that the lower layers are more than ready:

Just a few months after MeeGo IVI was announced, these guys were able to write a very impressive application thanks to QML, and ignore the official UI.

The OMAP4 guys went for the full MeeGo UI. No problems.

Even though Freescale is probably not that committed to MeeGo, it’s easier to create a demo using it (Qt; Nomovok) than with other platforms. It’s even hardware accelerated.

Renesas also chose the Nomovok demo to show their hardware capabilities.

MeeGo 1.1 running on HTC’s HD2

One guy, yes, one guy, decides to run MeeGo on his HTC, and succeeds. Of course, he uses the work already done by Ubuntu for the HD2, but since MeeGo is close to upstream, the same kernel can be used. Sure, it’s slow (no hardware acceleration), and there are many things missing, but for a short amount of time spent by hobbyists, that’s pretty great already.

This one is not so impressive, but it also shows the work of one guy porting MeeGo to the Nexus S.

And running on the Archos 9. Not a very impressive UI, but the point is that it runs on this hardware.

Conclusion

So, as you can see, MeeGo is already supported on many hardware platforms, and not because the relevant companies made a deal with Nokia or Intel; they don’t have to. The only thing they have to do is support Linux; Linux is what allows them to run MeeGo, and Linux is what allows MeeGo to run on any hardware platform.

This is impossible with WP7 for numerous reasons; it’s closed source, it’s proprietary, it’s Microsoft, etc. It’s not so impossible to do the same with Android, but it’s more difficult than with MeeGo because they don’t share anything with a typical Linux ecosystem; they are on a far away island on their own.

Nokia; from a burning platform, to a sinking platform

I’ve been thinking a lot about Nokia’s decision to use WP7, as I’m sure many people have, but I wanted to wait for the dust to settle before blogging. So here’s what I think: it doesn’t make any sense from any point of view.

Technically, there is nothing that can compare to the Linux kernel, which works on everything: supercomputers, mobile phones, TVs, routers, web servers, desktops, refrigerators, etc. Not only does it work, but it works well, much better than everything else. As an example, the work that has been done to scale Linux’s VFS to many processors (64) does benefit embedded, because some operations are more granular. Or the work on power management led by embedded helps web servers, where decreasing power consumption is also very much wanted. This creates an environment of synergy never seen before, where even competitors work together. Linux won the kernel race, and its use will only increase; the ones that try to fight against it will only fail miserably.

My ARM development notes

These are my notes to get useful cross-compilation, even with autotools, and GStreamer stuff.

toolchain

The convention is to have ‘arm-linux-gcc‘ and so on, so that you can compile with ‘make CROSS_COMPILE=arm-linux-‘; the kernel and many other projects assume this is the default.

First, you need ‘~/bin‘ to be in your PATH, so make sure you have it in ‘~/.bash_profile‘ (export PATH="$HOME/bin:$PATH") or whatever your favorite shell uses.

I use CodeSourcery (GNU/Linux 2009q3), you can fetch it from here.

cd ~/bin
toolchain=/opt/arm-2009q3
for x in $toolchain/bin/arm-none-linux-gnueabi-*
do
ln -s $x arm-linux-${x#$toolchain/bin/arm-none-linux-gnueabi-}
done
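
The loop relies on the shell’s ${x#prefix} expansion, which strips a leading prefix from a variable. Here is a small sketch of what link name it produces for one tool, without touching the filesystem (the gcc path is only an illustrative input; any binary in the toolchain’s bin directory behaves the same way):

```shell
toolchain=/opt/arm-2009q3
x=$toolchain/bin/arm-none-linux-gnueabi-gcc
# ${x#pattern} removes the shortest match of pattern from the start of $x
link_name="arm-linux-${x#$toolchain/bin/arm-none-linux-gnueabi-}"
echo "$link_name"
```

This prints arm-linux-gcc, which is the name that ‘make CROSS_COMPILE=arm-linux-‘ expects to find.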

QEMU

This is needed for sb2 in order to kind of emulate an ARM system.

git clone git://git.savannah.nongnu.org/qemu.git
cd qemu
git checkout -b stable v0.12.5
./configure --prefix=/opt/qemu --target-list=arm-linux-user
make install

sbox2

This is needed to avoid most of the pain caused by autotools (thank you GNU… not!).

git clone git://gitorious.org/scratchbox2/scratchbox2.git
cd scratchbox2
git checkout -b stable 2.1
./autogen.sh --prefix=/opt/sb2
make install

Add sb2 to the PATH:
export PATH=/opt/sb2/bin:$PATH

sb2 target

Now it’s time to configure a target.

cd /opt/arm-2009q3/arm-none-linux-gnueabi/libc/
sb2-init -c /opt/qemu/bin/qemu-arm armv7 /opt/arm-2009q3/bin/arm-none-linux-gnueabi-gcc

You can check that it works with:
sb2 gcc --version

GStreamer

We are going to install everything into ‘/opt/arm/gst‘, so:

export PKG_CONFIG_PATH=/opt/arm/gst/lib/pkgconfig

You can skip the steps here and go directly to deployment if you download and extract this tarball on your target.

zlib

This is needed by GLib’s gio (which cannot be configured out).

wget -c http://zlib.net/zlib-1.2.5.tar.gz
tar -xf zlib-1.2.5.tar.gz
cd zlib-1.2.5
sb2 ./configure --prefix=/opt/arm/gst
sb2 make install

glib

GLib has bugs (623473, 630910) detecting zlib (thank you Mattias… not!). So either apply my patches, or do the C_INCLUDE_PATH/LDFLAGS hacks below:

export C_INCLUDE_PATH='/opt/arm/gst/include' LDFLAGS='-L/opt/arm/gst/lib'

git clone git://git.gnome.org/glib
cd glib
git checkout -b stable 2.24.1
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-static --with-html-dir=/tmp/dump
sb2 make install

gstreamer

git clone git://anongit.freedesktop.org/gstreamer/gstreamer
cd gstreamer
git checkout -b stable RELEASE-0.10.29
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --disable-loadsave --with-html-dir=/tmp/dump
sb2 make install

liboil

Needed by many GStreamer components.

git clone git://anongit.freedesktop.org/liboil
cd liboil
git checkout -b stable liboil-0.3.17
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-static --with-html-dir=/tmp/dump
sb2 make install

gst-plugins-base

git clone git://anongit.freedesktop.org/gstreamer/gst-plugins-base
cd gst-plugins-base
git checkout -b stable RELEASE-0.10.29
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --with-html-dir=/tmp/dump
sb2 make install

gst-plugins-good

git clone git://anongit.freedesktop.org/gstreamer/gst-plugins-good
cd gst-plugins-good
git checkout -b stable RELEASE-0.10.23
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --with-html-dir=/tmp/dump
sb2 make install

Deployment

So now we have everything installed in ‘/opt/arm/gst‘, but how do we run it on the target? Just copy the exact same files to the exact same location on the target, and then:

export PATH=/opt/arm/gst/bin:$PATH

That’s it, you can run gst-launch, gst-inspect, and so on.

Development

Ok, it should be clear how to do development from the previous steps, but in case it isn’t, here’s how:

gst-dsp

Each time you want to cross-compile, you need to tell pkg-config where to find the packages:

export PKG_CONFIG_PATH=/opt/arm/gst/lib/pkgconfig

git clone git://github.com/felipec/gst-dsp.git
cd gst-dsp
git checkout -b stable v0.8.0
make

Note that gst-dsp doesn’t use autotools, so sb2 is not needed.

Now, once you have the plugin (libgstdsp.so), copy it to ‘/opt/arm/gst/lib/gstreamer-0.10‘ on the target.

And finally, you can run real gst-launch pipelines:
gst-launch playbin2 uri=file://$PWD/file.avi

Note: If you are missing some elements, play around with flags (flags=65 for native video-only)

Do some more development, type make, copy, repeat 🙂

Enjoy 😉

GStreamer, embedded, and low latency are a bad combination

This has been a known fact inside Nokia (MeeGo) for quite a long time due to various performance issues we’ve had to work around, but for some reason it wasn’t acknowledged as an issue when it was brought up on the mailing list.

So, in order to prove beyond reasonable doubt that there is indeed an issue, I wrote this test. It is very minimal; there’s essentially nothing of a typical GStreamer pipeline, just an element and an app that pushes buffers to it, that’s it. But then, optionally, a queue (a typical element in a GStreamer pipeline) is added in the middle, which is a thread boundary, and then the fun begins:

Graph for x86
Graph for arm

The buffer size legend shows exponents (5 => 2 ^ 5 = 32), and the CPU time is the one returned by the system (getrusage) in ms. You can see that on ARM systems not only is more CPU time wasted, but adding a queue makes things worse at a faster rate.
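
To read the legend, each value n on the axis stands for a buffer size of 2^n. A minimal sketch of that mapping (the function name and the sample values are only for illustration, and the unit is whatever the test itself uses):

```shell
# Map each legend value n to the buffer size 2^n it stands for.
buffer_sizes() {
    for n in "$@"; do
        echo "$n => $((1 << n))"
    done
}
buffer_sizes 5 8 10
```

So every step to the right on the graph doubles the buffer size, and halves the number of buffers needed to push the same amount of data.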

Note that this test is doing nothing, just pushing buffers around; all the CPU is wasted doing GStreamer operations. In a real scenario the situation is much worse, because there isn’t only one thread but multiple threads, and many elements involved, so the wasted CPU time I measured has to be multiplied many times.

Now, this has been profiled before, and everything points to pthread_mutex_lock, which is only a problem when there’s contention. Contention happens more often in GStreamer when buffers are small; then the futex syscall is issued, which is very bad on ARM, although it probably depends on which specific system you are using.

Fortunately for me, I don’t need good latency, so I can just push one-second buffers and forget about GStreamer performance issues. If you are experiencing the same and can afford high latency, just increase the buffer sizes; if not, then you are screwed :p

Hopefully this answers Wim’s question of what a “small buffer” means, how it’s not good, and when it’s a problem.

Update

Ok, so the discussion about this continued on the mailing list, and it was pointed out that the scale is logarithmic, so the exponential result was expected. While that is true, the logarithmic scale matches what people experience; how else would you plot the range from 10ms to 1s? Certainly not linearly.

But there’s a valid point; the results should not be surprising. We can take the logarithmic scale out of the equation by dividing the total CPU time by the number of buffers actually pushed, as Robert Swain did in the comments; that should give a constant number, which is the CPU time it takes to do a push. The results indeed converge to a constant number:

queue: 0.078, direct: 0.011

This means that in a realistic use case of pushing one buffer every 10ms through a queue, the CPU usage on this particular processor (800MHz) is 0.78%.
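
The arithmetic is just the cost of one push divided by the interval between pushes. A small sketch, assuming the per-push figures above are in ms like the getrusage numbers they were derived from (the function name is made up for the example):

```shell
# CPU usage (%) = cost of one push (ms) / interval between pushes (ms) * 100
cpu_usage() {
    awk -v cost="$1" -v interval="$2" 'BEGIN { printf "%.2f\n", cost / interval * 100 }'
}
cpu_usage 0.078 10   # through a queue
cpu_usage 0.011 10   # direct
```

This prints 0.78 and 0.11. Halving the interval (for lower latency) doubles both numbers, which is the whole point of the trade-off.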

Also, there’s this related old bug that recently got some attention and a new patch from Wim, so I gave it a try (I had to compile GStreamer myself so the results are not comparable with the previous runs).

Before:
queue: 0.074, direct: 0.011

After:
queue: 0.065, direct: 0.007

So the improvement for the queue case is around 12%, while the direct case is 31%.
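
For reference, this is how the relative improvement can be recomputed from the before/after figures as printed (the function name is made up for the example). Since the printed figures are rounded to three decimals, the result for the direct case is quite sensitive to rounding: the rounded values give roughly 36%, so the 31% above was presumably computed from the unrounded measurements.

```shell
# Relative improvement (%) = (before - after) / before * 100
improvement() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}
improvement 0.074 0.065   # queue
improvement 0.011 0.007   # direct
```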

Not bad at all, but the conclusion is still the same. If you use GStreamer, try to avoid as many elements as possible, especially queues, and try to have the biggest buffer size you can afford, which means that having good performance and low latency is tricky.

Update 2

Stefan Kost suggested using ‘queue’ instead of ‘queue2’, and I got a pandaboard, so here are the results with OMAP4.

pandaboard (2 core, 1GHz):
queue: 0.017, direct: 0.004

N900:
queue: 0.087, direct: 0.021

i386 (2 core, 1.83GHz):
queue: 0.0059, direct: 0.0015

So, either futex got better on Cortex A9, or OMAP4 is so powerful it can’t be considered embedded :p

gst-av 0.4; better performance for flac, vorbis and mp3 (part 2)

This is a continuation of my previous post. Based on the feedback I decided to do two things: investigate the strange FLAC high CPU usage with FFmpeg, and get more accurate measurements.

GStreamer sucks

It turns out that GStreamer’s FLAC parser uses four times more CPU than FFmpeg’s decoder. Thanks to perf, I was able to quickly figure out the biggest offenders: GStreamer’s horrible bitstream reader (GST_BIT_READER_READ_BITS) was by far the worst.

53.03% libgstbase-0.10.so.0.26.0
24.78% libavcodec.so.52.72.2
17.35% libgstxiph.so
1.52% libc-2.12.1.so

This is on my laptop just running the parser (filesrc ! flacparse ! fakesink); in total it was taking 2.67s.

After reading the code and trying different things, I decided to go for something similar to what FFmpeg is doing, and I also borrowed pieces of the architecture-specific optimizations. Now it even looks ok:

72.68% libavcodec.so.52.72.2
14.20% libgstxiph.so
4.00% libc-2.12.1.so

And it takes 0.81s.

But how much would this affect battery life on the N900?

Smart battery script

I tried different ideas, and after refreshing myself on statistics I wrote this script in Ruby that runs all the tests, gathers the battery capacity in a separate thread, and finally generates a report per test. Much easier than before.

Since I’m already working on FLAC, I decided to also apply some patches that split the decoder from the parser, and optimizations from Måns Rullgård (good thing I grabbed them because he seems to have left the project and deleted his repos).

Battery life graph

Battery drain graph

So, yeah, much better now 😉

But how credible are these results? Well, judge for yourself; listed below are the raw measurements. The samples are the differences in capacity (mAh) measured every 10 minutes, from which the drain and battery life are calculated.

== baseline ==
samples: 3, 3, 3, 3, 4, 5
drain: 21.00±1.87mA
life: 65.39±4.77h
== av flac ==
samples: 9, 8, 8, 8, 7, 8, 7
drain: 47.14±1.45mA
life: 28.19±0.87h
== flac ==
samples: 11, 11, 11, 11, 11, 11
drain: 66.00±0.00mA
life: 20.00±0.00h
== av mp3 ==
samples: 11, 11, 11, 11, 11, 10
drain: 65.00±0.91mA
life: 20.33±0.30h
== nokiamp3 ==
samples: 12, 12, 12, 12, 12, 12
drain: 72.00±0.00mA
life: 18.33±0.00h
== av vorbis ==
samples: 10, 11, 11, 10, 11, 11
drain: 64.00±1.15mA
life: 20.67±0.38h
== vorbis ==
samples: 19, 18, 18, 19, 18, 19
drain: 111.00±1.22mA
life: 11.90±0.13h
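
If you want to check the numbers, here is a sketch of how they can be reproduced. This is not the Ruby script itself; the formulas are worked backwards from the report above and are assumptions: each sample is multiplied by 6 to get mA, the life is averaged per sample, the error is the population standard deviation divided by sqrt(n), and the battery capacity is taken as 1320 mAh (the value that makes 66mA come out as 20.00h).

```shell
# Usage: battery_stats CAPACITY_MAH SAMPLE...
# Each sample is the capacity drop (mAh) over one 10-minute interval.
battery_stats() {
    capacity=$1; shift
    echo "$@" | awk -v capacity="$capacity" '
    {
        n = NF
        for (i = 1; i <= n; i++) {
            drain[i] = $i * 6              # mAh per 10 minutes -> mA
            life[i] = capacity / drain[i]  # hours the battery lasts at that drain
            dsum += drain[i]; lsum += life[i]
        }
        dmean = dsum / n; lmean = lsum / n
        for (i = 1; i <= n; i++) {
            dvar += (drain[i] - dmean) ^ 2
            lvar += (life[i] - lmean) ^ 2
        }
        # error = population standard deviation / sqrt(n)
        printf "drain: %.2f±%.2fmA\n", dmean, sqrt(dvar / n) / sqrt(n)
        printf "life: %.2f±%.2fh\n", lmean, sqrt(lvar / n) / sqrt(n)
    }'
}
battery_stats 1320 3 3 3 3 4 5
```

With the baseline samples this gives the same drain and life lines as the report (21.00±1.87mA and 65.39±4.77h).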

If you are interested in the code: gst-av, gst-maemo-xiph. Enjoy 😉