My ARM development notes

These are my notes to get useful cross-compilation, even with autotools, and GStreamer stuff.

toolchain

The convention is to have ‘arm-linux-gcc‘ and so on, so that you can compile with ‘make CROSS_COMPILE=arm-linux-‘, the kernel and many other projects assume this is the default.

First, you would need ‘~/bin‘ to be on your path, so make sure you have it on ‘~/.bash_profile‘ (export PATH="$HOME/bin:$PATH") or whatever your favorite shell uses.

I use CodeSourcery (GNU/Linux 2009q3), you can fetch it from here.

cd ~/bin
toolchain=/opt/arm-2009q3
for x in $toolchain/bin/arm-none-linux-gnueabi-*
do
ln -s $x arm-linux-${x#$toolchain/bin/arm-none-linux-gnueabi-}
done

QEMU

This is needed for sb2 in order to kind of emulate an ARM system.

git clone git://git.savannah.nongnu.org/qemu.git
cd qemu
git checkout -b stable v0.12.5
./configure --prefix=/opt/qemu --target-list=arm-linux-user
make install

sbox2

This is needed to avoid most of the pain caused by autotools (thank you GNU… not!).

git clone git://gitorious.org/scratchbox2/scratchbox2.git
cd scratchbox2
git checkout -b stable 2.1
./autogen.sh --prefix=/opt/sb2
make install

Add sb2 to the PATH:
export PATH=/opt/sb2/bin:$PATH

sb2 target

Now it’s time to configure a target.

cd /opt/arm-2009q3/arm-none-linux-gnueabi/libc/
sb2-init -c /opt/qemu/bin/qemu-arm armv7 /opt/arm-2009q3/bin/arm-none-linux-gnueabi-gcc

You can check that it works with:
sb2 gcc --version

GStreamer

We are going to install everything into ‘/opt/arm/gst‘, so:

export PKG_CONFIG_PATH=/opt/arm/gst/lib/pkgconfig

You can skip the steps here and go directly to deployment if you download and extract this tarball on your target.

zlib

This is needed by GLib’s gio (which cannot be configured out).

wget -c http://zlib.net/zlib-1.2.5.tar.gz
tar -xf zlib-1.2.5.tar.gz
cd zlib-1.2.5
sb2 ./configure --prefix=/opt/arm/gst
sb2 make install

glib

GLib has bugs (623473, 630910) detecting zlib (thank you Mattias… not!). So either apply my patches, or do the C_INCLUDE_PATH/LDFLAGS hacks below:

export C_INCLUDE_PATH='/opt/arm/gst/include' LDFLAGS='-L/opt/arm/gst/lib'

git clone git://git.gnome.org/glib
cd glib
git checkout -b stable 2.24.1
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-static --with-html-dir=/tmp/dump
sb2 make install

gstreamer

git clone git://anongit.freedesktop.org/gstreamer/gstreamer
cd gstreamer
git checkout -b stable RELEASE-0.10.29
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --disable-loadsave --with-html-dir=/tmp/dump
sb2 make install

liboil

Needed by many GStreamer components.

git clone git://anongit.freedesktop.org/liboil
cd liboil
git checkout -b stable liboil-0.3.17
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-static --with-html-dir=/tmp/dump
sb2 make install

gst-plugins-base

git clone git://anongit.freedesktop.org/gstreamer/gst-plugins-base
cd gst-plugins-base
git checkout -b stable RELEASE-0.10.29
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --with-html-dir=/tmp/dump
sb2 make install

gst-plugins-good

git clone git://anongit.freedesktop.org/gstreamer/gst-plugins-good
cd gst-plugins-good
git checkout -b stable RELEASE-0.10.23
./autogen.sh --noconfigure
sb2 ./configure --prefix=/opt/arm/gst --disable-nls --disable-static --with-html-dir=/tmp/dump
sb2 make install

Deployment

So now we have everything installed in ‘/opt/arm/gst‘, but how to run on the target? Just copy the exact same files into the target on the exact same location, and then:

export PATH=/opt/arm/gst/bin:$PATH

That’s it, you can run gst-launch, gst-inspect, and so on.

Development

Ok, it should be clear how to do development from the previous steps, but in case it wasn’t clear, here’s how to:

gst-dsp

Each time you want to cross-compile, you need to tell pkg-config where to find the packages:

export PKG_CONFIG_PATH=/opt/arm/gst/lib/pkgconfig

git clone git://github.com/felipec/gst-dsp.git
cd gst-dsp
git checkout -b stable v0.8.0
make

Note that gst-dsp doesn’t use autotools, so sb2 is not needed.

Now, once you have the plugin (libgstdsp.so), copy to ‘/opt/arm/gst/lib/gstreamer-0.10‘ on the target.

And finally, you can run real gst-launch pipelines:
gst-launch playbin2 uri=file://$PWD/file.avi

Note: If you are missing some elements, play around with flags (flags=65 for native video-only)

Do some more development, type make, copy, repeat ๐Ÿ™‚

Enjoy ๐Ÿ˜‰

GStreamer, embedded, and low latency are a bad combination

This has been a known fact inside Nokia (MeeGo) for quite a long time due to various performance issues we’ve had to workaround, but for some reason it wasn’t acknowledged as an issue when it was brought up in the mailing list.

So, in order to proof beyond reasonable doubt that there is indeed an issue, I wrote this test. It is very minimal, there’s essentially nothing of a typical GStreamer pipeline, just an element and an app that pushes buffers to it, that’s it. But then, optionally, a queue (typical element in a GStreamer pipeline) is added in the middle, which is a thread-boundary, and then the fun begins:

Graph for x86
Graph for arm

The buffer size legends corresponds to exponentiation (5 => 2 ^ 5 = 32), and the CPU time is returned by the system (getrusage) in ms. You can see that in ARM systems not only more CPU time is wasted, but adding a queue makes things worst at a faster rate.

Note that this test is doing nothing, just pushing buffers around, all the CPU is wasted doing GStreamer operations. In a real scenario the situation is much worst because there isn’t only one, but multiple threads, and many elements involved, so this wasted CPU time I measured has to be multiplied many times.

Now, this has been profiled before, and everything points out to pthread_mutex_lock which is only a problem when there’s contention, which happens more often in GStreamer when buffers are small, then the futex syscall is issued, is very bad in ARM, although it probably depends on which specific system you are using.

Fortunately for me, I don’t need good latency, so I can just push one second buffers and forget about GStreamer performance issues, if you are experiencing the same, and can afford high latency, just increase the buffer sizes, if not, then you are screwed :p

Hopefully this answers Wim’s question of what a “small buffer” means, how it’s not good, and when it’s a problem.

Update

Ok, so the discussion about this continued in the mailing list, and it was pointed out that that the scale is logarithmic, so the exponential result was expected. While that is true, the logarithmic scale matches what people experience; how else would you plot the range from 10ms to 1s? Certainly not linearly.

But there’s a valid point; the results should not be surprising. We can take the logarithmic scale out of the equation by dividing the total CPU time by the number of buffers actually pushed, as Robert Swain did in the comments, that should give a constant number, which is the CPU time it took to do a push. The results indeed converge to a constant number:

queue: 0.078, direct: 0.011

This means that in a realistic use case of pushing one buffer each 10ms through a queue, the CPU usage on this particular processor (800mhz) is 0.78%.

Also, there’s this related old bug that recently got some attention and a new patch from Wim, so I gave it a try (I had to compile GStreamer myself so the results are not comparable with the previous runs).

Before:
queue: 0.074, direct: 0.011

After:
queue: 0.065, direct: 0.007

So the improvement for the queue case is around 12%, while the direct case is 31%.

Not bad at all, but the conclusion is still the same. If you use GStreamer, try to avoid as many elements as possible, specially queues, and try to have the biggest buffer size you can afford, which means that having good performance and low latency is tricky.

Update 2

Stefan Kost suggested to use ‘queue’ instead of ‘queue2’, and I got a pandaboard, so here are the results with OMAP4.

pandaboard (2 core, 1GHz):
queue: 0.017, direct: 0.004

N900:
queue: 0.087, direct: 0.021

i386 (2 core, 1.83GHz):
queue: 0.0059, direct: 0.0015

So, either futex got better on Cortex A9, or OMAP4 is so powerful it can’t be considered embedded :p

GStreamer development in embedded with sbox2

I’ve heard many people having problems cross-compiling GStreamer on ARM systems so I decided to write a step-by-step guide describing how I do it.

This guide is meant for scratchbox 2. You can read this old post where I explain how to install it. I’m using CodeSourcery 2009q3 but any compiler would do.

Continue reading

New project: gst-dsp, with beagleboard demo image

It took me a lot more than I expected, but I finally managed to get the beagleboard booting and happy with the latest linux kernel (2.6.32-rc3), DSS2, and dsp-bridge driver.

And then I could run gst-dsp: a native GStreamer plug-in to access Texas Instruments’ DSP algorithms for OMAP3 platforms. Which marks the time for making a public release.

Here’s the video ๐Ÿ™‚

The video playback is running on the beagleboard with gst-dsp and gst-omapfb (no X) with TI public DSP binaries; it’s a WVGA (800×480) MPEG-4 video. It’s not running as smoothly as I wanted; it seems the public binaries are a bit buggy, and there’s some problem with the dsp-bridge driver writing directly to the framebuffer, but at least it somewhat works ๐Ÿ™‚

The video recording is done with an N900, official system (which uses gst-dsp ;)), and the resulting video is ; 848×480 MPEG-4.

You can find the demo image for the beagleboard, along with instructions, here.

The kernel code is on gitorious; the important tag is v2.6.32-felipec1.

And gst-dsp is hosted on Google Code, although the git repository is actually on github.

The code wasn’t written from scratch, TI’s projects: tiopenmax and libbridge, helped a lot.

And of course many other people made the first release possible (see the shortlog at the end).

Enjoy ๐Ÿ˜‰

Andriy Shevchenko (1):
      base: fix a crash on send codec data

Felipe Contreras (180):
      Initial commit
      Register dsp node
      Add README
      Fix and update copyrights
      Add ALLOCATE_HEAP and ALLOCATE_SN to dsp_bridge
      Add handy dsp_send_message
      dummy: use dsp_send_message
      Rename gstdsp.* to plugin.*
      Makefile: cleanup
      dummy: trivial clanups
      Add log utility
      Use log utility
      dmm_buffer: size_t improvements
      dmm_buffer: always unmap when freeing
      dmm_buffer: use getpagesize()
      dmm_buffer: alignment improvements
      dmm_buffer: add user_data field
      Add MPEG-4 video decoder
      README: update
      mp4vdec: trivial cleanup
      mp4vdec: send signal to output_loop
      mp4vdec: flush output buffers too
      mp4vdec: reset output port
      mp4vdec: extra check for null buffer
      mp4vdec: use atomic operations for status
      mp4vdec: use more atomic operations for status
      mp4vdec: send stop signal before
      mp4vdec: re-use comm buffers
      dmm_buffer: reorganize a bit
      dmm_buffer: add dmm_buffer_reserve
      dmm_buffer: allow to re-reserve memory
      dmm_buffer: allow re-mapping
      mp4vdec: trivial cleanup
      dmm_buffer: unmap before unreserving
      mp4vdec: re-use mappings for output buffers
      mp4vdec: convert flush condition to semaphore
      Remove cond.h
      Rename mp4vdec to vdec
      vdec: trivial cleanup
      vdec: trivial reorganization
      vdec: prepare for multiple algos
      vdec: move create_node to dsp_start
      vdec: start dsp node after getting the caps
      vdec: initial support for H.264
      vdec: add Juha to authors list
      README: update
      vdec: cleanup
      vdec: make dsp_thread static
      vdec: reorganize a bit
      New base class
      Add new video encoder
      base: handle more commands
      base: reorganize got_message a bit
      venc: improve jpeg args
      venc: send jpeg dynamic params
      base: cleanup setup_output_buffers
      base: remove unused buffer_count
      base: reorganize a bit
      base: add use_pad_alloc option
      base: free mapped buffers on dsp_stop()
      base: be more verbose on get_slot()
      README: update
      Makefile: check for missing symbols
      New utility gstdsp_register()
      base: detect dsp errors
      base: properly handle dsp errors
      base: post error in the bus
      base: extra check for status in outout_loop()
      base: free events array
      base: reinitialize state on NULL->READY
      base: use circular buffer for timestamps
      base: increase ts_array
      base: increase mapping cache
      dummy: reorganize map_buffer
      dummy: input buffers don't need alignment
      dummy: cleanup
      dummy: don't map buffers
      venc: increase framesize limit for jpeg
      base: add gstdsp_post_error()
      venc: allocate a buffer when framesize is unaligned
      base: decrease wait for events timeout
      base: more error messages
      base: re-initialize on READY->PAUSED
      base: don't panic on wrong status
      base: destroy node at the right time
      base: catch playback completed message
      base: possible memleak fixes
      vdec: send codec data for MPEG-4
      base: make map cache optional
      plugin: set more proper ranks
      vdec: add framerate workaround
      vdec: remove gstdsp_send_buffer()
      base: add create_node() vmethod
      base: add parsing facilities
      Add h263 parser
      parse: update framesize only when unset
      Random cleanups
      base: add support for stream params
      venc: add H.263
      venc: use h263 by default
      Reorganize encoders
      base: send codec data for all the codecs
      base: keep trying if parse func fails
      base: trivial cleanup
      Trivial cleanups
      log: don't display info level
      log: decrease log level for buffer allocs
      log: add pr_test
      base: rename array to cache
      base: rename 'buffer' to 'comm'
      base: event cleanup
      base: reorganize a bit
      base: assume output buffer is always there
      base: remove out_buffer, use port buffer
      base: store input buffer
      base: trivial cleanup
      base: flush ports on stop
      base: plug some possible leaks
      base: make map_buffer() more conservative
      base: trigger semaphore after buffer modifications
      base: re-use input buffer
      base: add port index field
      Add async queue
      base: allow multiple buffers
      base: allow child elements to configure the ports
      vdec: increase the number of buffers to 2
      venc: trivial fixes
      log: add missing include
      base: re-enable queues properly
      venc: decrease input buffer size
      base: wait for eos
      base: possible fix
      Initial MPEG-4 video encoder support
      gstdspvenc.h: preemptively add H.264 to the list
      base: add send_codec_data() helper
      vdec: use send_codec_data()
      vdec: extra checks
      Add skip hack
      Revert "venc: forcing mpeg4 I frame each i_frame_interval"
      venc: reorganize stream/dynamic params
      base: trivial cleanup
      base: properly set param virt addr
      Add param argument to buffer callbacks
      Add buffer argument to buffer callbacks
      base: add buffer recv_cb
      base: add check for end addr alignment
      vdec: fix extra unref for codec-data
      base: trivial cleanups
      Rename dmm_buffer_flush() to dmm_buffer_clean()
      base: fix memory read
      dmm_buffer: clean instead of flush
      dmm_buffer: add cache 'flush' function back
      Use more proper cache functions
      base: handle bad node termination
      base: make EOS alignment an option
      jpegenc: enable eos align
      venc: improve integer framerate calculation
      venc: fix bitrate calculation
      venc: cleanup bitrate calculation
      venc: remove jpeg from bitrate calculation
      venc: tweak bitrate calculation
      venc: trivial cleanups
      venc: add 'quality' field
      venc: calculate smaller buffer sizes
      Fix some static analysis warnings
      log: avoid pr_info when gst debugging is off
      base: remove use_map_cache
      Trivial cleanups
      Cleanup type registrations
      base: improve some compiler hints
      dmm-buffer: check cache flush size
      base: properly free node resources
      Create custom dsp_node_t
      dsp-bridge: store node heap ourselves
      dsp-bridge: store node msgbuf ourselves
      dsp-bridge: cleanup node_free
      base: copy buffers when appropriate
      base: remove unnecessary cache flushing
      venc: set rate-control to variable
      base: post critical error mesages to the bus

Hoseok Chang (1):
      venc: tune mp4v parms for better performance

Juha Alanen (5):
      vdec: set profile based on the frame size
      vdec: improve H.263 args
      vdec: initial support for WMV9
      venc: set profile correctly for H.263 and MPEG4
      venc: disable single scan output for JPEG encoder

Marco Ballesio (8):
      vdec: fix srcpad setup
      venc: rename mp4venc_stream_params
      venc: add mp4venc_out_stream_params
      base: use proper buffer length
      venc: forcing mpeg4 I frame each i_frame_interval
      venc: reordered mp4venc_args initialization
      venc: added bitrate computation formula
      venc: propagate keyframes properly

Mark Nauwelaerts (2):
      base: safer buffer allocation and freeing
      base: fix element ref leak

Miguel Verdu (2):
      venc: tune MPEG-4 parameters
      venc: tune MPEG-4 parameters for quality

Renรฉ Stadler (3):
      base: fix thread leak
      base: advance timestamp pointer for empty output buffers
      base: don't use DSP flushing

Tim-Philipp Mรผller (1):
      base: unref unused output buffer when skipping output

OMAP3 public DSP binaries now work

It took some time but finally tiopenmax 0.3.5 was released. It’s essentially 0.3 plus DSP binaries that actually work.

I verified with gst-openmax (git omap branch) and they work just fine ๐Ÿ™‚ Thanks Daniel Dรญaz!

So people with OMAP3 hardware (beagleboard) can already try D1 MPEG-4 decoding using less than 15% of CPU. If you missed the demo, here it is:

Update: the latest information is actually here.

All the information was available in my previous post. One minor update is that I’ve made a tag (v2.6.28-tidspbridge) to the linux-omap tree on my github repo to make it extra easy for people to compile a stable kernel with the dsp-bridge driver. There’s many DSP fixes available and some performance improvements which are not in this tag, but I’ll make sure they are once 2.6.29-omap1 is tagged.

GStreamer hello world

Continuing my previous GStreamer introduction this new tutorial will guide you on your first GStreamer application written in C.

For anxious people the code is here.

The Basics

We first start creating a playbin, which is a GStreamer element:

GstElement *pipeline = gst_element_factory_make("playbin", "player");

Then we specify the URI we want to play, i.e. file://tmp/foobar.mp3

g_object_set(G_OBJECT(pipeline), "uri", uri, NULL);

Now, we would want to play this:

gst_element_set_state(GST_ELEMENT(pipeline), GST_STATE_PLAYING);

But nothing would happen if we don’t have a GLib mainloop:

GMainLoop *loop = g_main_loop_new(NULL, FALSE);
g_main_loop_run(loop);

And of course, we need to initialize GStreamer:

gst_init(&argc, &argv);

or, if you don’t have argc/argv available:

gst_init(NULL, NULL);

That’s it, at the end you should stop the pipeline and free it:

gst_element_set_state(GST_ELEMENT(pipeline), GST_STATE_NULL);
gst_object_unref(GST_OBJECT(pipeline));

Compilation

Nothing complicated here, pkg-config will tell us all we need to know:

gcc `pkg-config --cflags --libs gstreamer-0.10` hello.c -o gst_hello

The Bus

Great! we have something that plays, but it won’t even exit properly, nor show any errors.

In order to get messages, errors, and other important information while the mainloop is running we need a bus watch from this pipeline:

GstBus *bus = gst_pipeline_get_bus(GST_PIPELINE(pipeline));
gst_bus_add_watch(bus, bus_call, NULL);
gst_object_unref(bus);
static gboolean
bus_call(GstBus *bus,
	 GstMessage *msg,
	 gpointer user_data)
{
	switch (GST_MESSAGE_TYPE(msg)) {
	case GST_MESSAGE_EOS: break;
	case GST_MESSAGE_ERROR: break;
	default: break;
	}

	return true;
}

When we get the EOS message, we would like to exit the application, right?

g_main_loop_quit(loop);

And if we get an ERROR message, we would like to display it, and exit:

gchar *debug;
GError *err;

gst_message_parse_error(msg, &err, &debug);
g_free(debug);

g_error("%s", err->message);
g_error_free(err);

g_main_loop_quit(loop);

It seems we have everything!

Enjoy

Now you can listen to your music or watch videos with this. And remember, it receives a URI, so “test.avi” won’t work, you need the whole thing:

./gst_hello "file://$PWD/test.avi"

Or even better!

./gst_hello http://www.xiph.org/vorbis/listen/compilation-ogg-q4.ogg