gst-av 0.4; better performance for flac, vorbis and mp3 (part 2)

This is a continuation of my previous post. Based on the feedback I decided to do two things; investigate the strange FLAC high CPU usage with FFmpeg, and get more accurate measurements.

GStreamer sucks

It turns out that GStreamer flac parser uses four times more CPU than FFmpeg’s decoder. Thanks to perf, I was able to quickly figure out the biggest offenders: GStreamer’s horrible bitstream reader (GST_BIT_READER_READ_BITS) was by far the worst.

53.03% libgstbase-0.10.so.0.26.0
24.78% libavcodec.so.52.72.2
17.35% libgstxiph.so
1.52% libc-2.12.1.so

This is on my laptop just running the parser (filesrc ! flacparse ! fakesink), in total it was taking 2.67s.

After reading the code and trying different things, I decided to go for something similar to what FFmpeg is doing, and I also borrowed pieces of the architecture-specific optimizations, now it even looks ok:

72.68% libavcodec.so.52.72.2
14.20% libgstxiph.so
4.00% libc-2.12.1.so

And it takes 0.81s.

But how much would this affect battery life on the N900?

Smart battery script

I tried different ideas, and after refreshing myself on statistics I wrote this script in Ruby that runs all the tests, gathers the battery capacity in a separate thread, and finally generates a report per test. Much easier than before.

Since I’m already working on FLAC, I decided to also apply some patches that split the decoder from the parser, and optimizations from Måns Rullgård (good thing I grabbed them because he seems to have left the project and deleted his repos).

Battery life graph

Battery life


Battery drain graph

Battery drain

So, yeah, much better now ;)

But how credible are these results? Well, judge by yourself, listed below are the raw measurements, the samples are the differences in capacity (mAh) measured each 10 minutes, from which the drain and battery life are calculated.

== baseline ==
samples: 3, 3, 3, 3, 4, 5
drain: 21.00±1.87mA
life: 65.39±4.77h
== av flac ==
samples: 9, 8, 8, 8, 7, 8, 7
drain: 47.14±1.45mA
life: 28.19±0.87h
== flac ==
samples: 11, 11, 11, 11, 11, 11
drain: 66.00±0.00mA
life: 20.00±0.00h
== av mp3 ==
samples: 11, 11, 11, 11, 11, 10
drain: 65.00±0.91mA
life: 20.33±0.30h
== nokiamp3 ==
samples: 12, 12, 12, 12, 12, 12
drain: 72.00±0.00mA
life: 18.33±0.00h
== av vorbis ==
samples: 10, 11, 11, 10, 11, 11
drain: 64.00±1.15mA
life: 20.67±0.38h
== vorbis ==
samples: 19, 18, 18, 19, 18, 19
drain: 111.00±1.22mA
life: 11.90±0.13h

If you are interested in the code: gst-av, gst-maemo-xiph. Enjoy ;)

About these ads

24 thoughts on “gst-av 0.4; better performance for flac, vorbis and mp3 (part 2)

  1. So is there any good reason why the bitstream reader was so bad to begin with? Is it just a lack of anyone taking the time to optimise it or did you actually have to strip functionality?

  2. @Alex No, there’s no good reason. Nobody saw anything wrong in it (shocking), and apparently nobody profiled it.

    @MaxxCorp I doubt they would accept patches that resemble FFmpeg in any way.

    Anyway, Sebastian Dröge seems to have pushed bitstream reader functions that might not be that horrible (still nowhere near FFmpeg performance), and I sent some patch to bugzilla (bug 631200), which is always painful to me.

  3. @Alex: the reason is that the API was originally just added for completeness’ sake as convenience API, it was not actually used anywhere, certainly not in performance critical sections of code. Hence no one particularly cared about optimisations. Make it work first, then make it fast – I don’t find that particularly “shocking”.

    The flacparse element mentioned here was clearly labelled as “bad” plugin with a rank of 0 and not used in a normal flac playback scenario, for good reasons. I believe it attempts to parse the bitstream more fully and that’s probably one of the reasons it’s slower than the ffmpeg parser; some might argue this is more correct, but it’s probably not really required. There’s no reason not to implement the other approach and see how it goes.

    I don’t know why it’s necessary to make such a big fuss about nothing. Sure, it’s not perfect, but it can all be fixed up quite easily. A little more civility would be nice.

  4. @Tim
    I asked a valid question and the only person making a fuss is yourself. This blog post makes NO mention that the function was just a quick API compatibility mash up, THAT is why I asked as it seemed odd that core GStreamer functionality would be so inefficient.

    I take it from your post that it was never intended to be core functionality, fair enough, but how was I supposed to know that without asking? I am not the one who posted “GStreamer sucks”, I just responded in the context of the blog post.

    So perhaps the question should be, is there already an alternative function to do the same thing? If not, then how you possibly say its not core functionality for playing back a FLAC format file and how is it unreasonable to be upset that its inefficient?

  5. @Alex: sorry, only the first paragraph was meant for you, I didn’t format that very well. It’s of course a perfectly valid question, I was just trying to add some more context :)

  6. @Tim The bit reader API was released in January, on GStreamer 0.10.22, I guess it was intended to be used, otherwise there’s no point in providing it. By now third-parties might be relying on such functions, so you can’t change the API. And then new functions were added, like gst_bit_reader_get_bits_uint32_unchecked() but that API is still not ideal, you should see the FFmpeg one (int val = get_bits(r, x)), but who cares, GStreamer doesn’t have a review process, not even for API changes. You can do what slomo did; just commit whatever you want.

    And there’s a problem with writing something first, and worrying about performance later; the API is probably wrong already, and people are relying on it.

    Also, I’m not making a big fuzz, I’m just stating my opinion. I could have listed some reasons, like the fact that on Fremantle we found with G.729 that GStreamer didn’t handle 10ms well at all, and then I found that even for MP3 and Vorbis a good chunk of the time was spent pushing buffers around, so I had to write gst-av in order to provide 1s buffers to pulsesink, and now it turns I had to rewrite the parser, what’s left in the pipeline? There’s some people in Nokia thinking pulsink needs to be rewriting, so for my use-case what exactly is GStreamer providing?

    I can probably write a pulseaudio sink in a weekend, provide it to libavdevice, and write a MAFW renderer that uses FFmpeg all the way. It certainly would be easier than trying to fix GStreamer, whose community BTW is not even willing to accept patches in the mailing list which is the only way I’m willing to contribute fully.

    So yeah, GStreamer sucks.

    @Alex There’s no replacement for flacparse. If somebody writes a FLAC decoder he either writes all the metadata parsing, seeking, and all the stuff that is supposed to do a parser (like flacdec), or he uses flacparse. So really I had no option.

  7. @FelipeC
    Nothing you are saying really comes as too much of a surprise.

    I have been trying to learn GStreamer for about a month now and found it absolutely horrible to learn. There are very few examples of how to setup pipelines and a serious lack of documentation explaining in plain english things like when you need to use a queue, when you might need to do colourspace conversion, etc.

    The best examples I found of mixing multiple video streams into a single stream use the videotestsrc which I thought was excellent (I am trying to setup my own CCTV streaming) and you would THINK then that I could simply replace that source with v4l2src and all would work properly, but it doesn’t. GStreamer gives out cryptic error messages that are pretty meaningless unless you understand the inner workings.

    This is what really frustrates me about Linux in general. It has some real kick ass components that are either let down by a lack of optimisation but mostly by a lack of documentation that can be understood by anyone without a degree in programming. I am by no means a novice but have a real hard time with things like GStreamer and Pulseaudio, that seem unnecessarily convoluted or just simply do not work correctly when the documentation says it should.

    Back on topic though, if the bit reader was only added in January then how did it work before that? How does this relate to Maemo 5 which has had a package to support FLAC pretty much since the N900 launched? This all makes it sound like using GStreamer for MAFW was a stupid idea, so there must be a lot more to it than this because I doubt Nokia are THAT stupid (although its hard to tell, some of the decision they make).

    @Tim

    If the API was only added for completeness then how on earth did you handle FLAC files before?

  8. @Alex I agree with you, GStreamer is supposed to make things easier for the higher levels, but many times you end up having to even read the code to understand how you are supposed to do things. In many situations I’ve found FFmpeg easier to use, but of course it’s scope is limited.

    On the desktop, and currently on the N900, flacdec is used (filesrc ! fladec), that performs relatively ok, but I wanted the best performances, so I wanted to use FFmpeg’s FLAC decoder (filesrc ! flacparse ! avdec). This also has the advantage that is more proper, album art works properly, seeking works properly on the N900, and it doesn’t require special tracker custom metadata extractors.

  9. The flacparse element mentioned here was clearly labelled as “bad” plugin with a rank of 0 and not used in a normal flac playback scenario, for good reasons. I believe it attempts to parse the bitstream more fully and that’s probably one of the reasons it’s slower than the ffmpeg parser

    No, it’s not slower than ffmpeg’s parser. It is four times slower than ffmpeg’s _decoder_. I find that fact very amusing, bad or no bad.

  10. @René: It was four times slower than ffmpeg’s decoder mostly because it a) essentially was a flac decoder and b) the bitreader was inefficient. It should be 3 times faster as before now after making the bitreader use macros instead of using a function call to get a single bit. More performance improvements to come later.

    @Felipe: FWIW GstBitReader now has functions (well, macros) like int val = get_bits(r, x). The reason for the original get_bits(r, x, &val) was to have a return value to check if you actually have enough data to read x bits. There are now unchecking versions that simply return the value and don’t check anything.
    Apart from that there is a review process for everything non-trivial (unless it’s in gst-plugins-bad), including this change. What’s wrong with the API now in your opinion?

  11. @Sebastian What’s the point of having versions for 8, 16, and 32 if at the end you are returning an int? FFmpeg only has get_bits1 (1 bit), get_bits (16 bit), and get_bits_long (32 bit), that should be more than enough.

  12. @Felipe: The version for 8 bits “returns” (well, an out parameter) a guint8, the version for 16 bit a guint16, the version for 32 bit a guint32 and the version for 64 bit a guint64. What’s the problem with this?For the unchecked versions this is probably a bit useless (but they’re only macros anyway) except for the 32 bit and 64 bit version but for the checked versions with the out parameter this is very convenient as it allows you to store the result in whatever data type you want.

  13. @Sebastian: I don’t see the point of the checked version anyway, but I thought we were talking about the new API (unchecked), which always returns guint (even for 64 bits; probably a bug). Of course I would have pointed that out had the patches arrived in my inbox.

  14. @Felipe: It’s returning guint##bits . You could subscribe to the gst-cvs list to get all commits in your inbox ;) The point of the checked versions is that they check if you have enough data available to read what you want.

  15. Will this eventually have any benefit for N800? Yes that’s an “eight” in front.

  16. @Sebastian Oh, I didn’t see the ##bits… Well, I still think that’s not needed, just use an int, anyway, the compiler will promote all those to ints, and even if you assign to uint8_t the conversion will happen automatically.

    @Joseph Yeah, it should benefit N800 too, although the performance can’t be that great because it doesn’t have NEON.

  17. @Felipe: Well, for the functions with the out-parameter it is definitely needed. Consider code like this:

    uint8_t x;
    gst_bit_reader_get_bits_uint32 (reader, &t, 2);
    => the compiler will complain that x is not a 32 bit integer.

    But for the unchecked versions that really return the value you’re right of course. They have the 8/16 bit variants only for consistency reasons and as they’re only macros anyway they cause no bloat either.

  18. @Sebastian Again, I’m talking about the only ones I think matter, the “unchecked” ones. Whether or not they constitute bloat depends on your definition, or the compiler, but what I do know is that it makes the API ugly, and the code using such API.

    Compare:
    v = gst_bit_reader_get_bits_uint8_unchecked(r, 5)
    v = gst_bit_reader_get_bits_uint16_unchecked(r, 12)

    Vs:
    v = get_bits(r, 5)
    v = get_bits(r, 12)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s