How msn-pecan fixed a 6-year-old bug, how Pidgin didn’t, and stole the fix

The bug we are talking about is the infamous switchboard timeout error, which was very elusive: it happened randomly, and very often for some users, under unknown conditions. Essentially, you send a message, and after one minute you receive a notification telling you the message never arrived; you then need to resend the message and hope it arrives this time.

Let’s see how the two projects approached this bug.

Pidgin

There have probably been many bug reports regarding this issue, but it’s very difficult to find old historic bugs in Pidgin’s new and old trackers. The modern version is reported in Pidgin’s tracker as #3330. There you can see people saying it happens a lot and that the priority should be increased, and many tickets were marked as duplicates. Developers, however, stayed in denial mode: they said it didn’t happen to them, and then turned to the usual strategy: asking for irrelevant information such as a valgrind log, and asking reporters to try again in case it had been magically fixed.

Then they tried a simplistic workaround: re-sending the message on failure. This doesn’t work in most cases, and even when it does, the message arrives more than 1 minute late. As usual, no developer did much else about the bug.

In the meantime, Adium had many reports (2475, 2395, 6316, 6708, 6952, 7288, 9978, 11045, 11398, 11478) of the same bug. At this point something was very clear: it happens more often on OS X.

The interesting point is when Rasmus Hummelmose, an Adium user, logged on to IRC to rant about the problem. He received the same response on both #adium and #pidgin: it’s a server problem, or it’s your slow connection, there’s nothing we can do. That didn’t convince him (it wasn’t true) and he effusively tried to explain that the issue was real and was affecting many users. He didn’t achieve anything more than upsetting the developers.

This is not the way to solve an important bug.

msn-pecan

The msn-pecan team, on the other hand, thought: hey, there’s a bug, and this guy can reproduce it, let’s fix it. I invited him to #msn-pecan. Rasmus was a bit reluctant: why lose time with msn-pecan developers? Surely Pidgin developers must be capable enough to do the job. He changed his mind when I explained that the core parts of the msn code in libpurple (Pidgin) were either developed or refined by me anyway, and therefore Pidgin devs probably didn’t have the expertise required to identify this problem.

With that we started an endeavour to fix the problem over the weekend. I started by providing some infrastructure changes in order to visualize what was actually happening, Devid Antonio Filoni created Adium builds, and Rasmus tested and provided feedback. We made some conjectures, discarded them with further testing, and fixed some bugs along the way, until we found a reliable way to reproduce the problem: send a message, wait for 15 minutes of inactivity, and send another message.

After this it was clear that something bogus was happening with the network connection, but since we cannot fix all the elements involved, we implemented a simple fix: close the connection after 1 minute. That worked perfectly. Rasmus was happy, and we were too :)
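To make the idea concrete, here is a minimal sketch of that kind of fix. It is not msn-pecan's actual code: the class, its fields, and the fake clock are all made up for illustration, and it assumes that "after 1 minute" means one minute without activity. The connection is dropped once it has been idle too long, so the next message goes out over a fresh connection instead of a stale one.

```python
import time


class IdleSwitchboard:
    """Hypothetical stand-in for a switchboard session (illustration only)."""

    def __init__(self, idle_timeout=60.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds; 60 mirrors the "1 minute" above
        self.clock = clock                # injectable so the behaviour can be tested
        self.is_open = False
        self.opens = 0                    # how many fresh connections were made
        self.last_activity = None
        self.sent = []

    def _open(self):
        # A real client would establish a new network connection here.
        self.is_open = True
        self.opens += 1

    def close_if_idle(self):
        """Close the connection once it has been idle for idle_timeout seconds."""
        if self.is_open and self.clock() - self.last_activity >= self.idle_timeout:
            self.is_open = False
            return True
        return False

    def send(self, message):
        # Drop a connection that sat idle too long, then reconnect if needed.
        self.close_if_idle()
        if not self.is_open:
            self._open()
        self.sent.append(message)
        self.last_activity = self.clock()


class FakeClock:
    """Manually advanced clock, so the example does not have to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    sb = IdleSwitchboard(idle_timeout=60.0, clock=clock)
    sb.send("first message")
    clock.advance(15 * 60)       # the 15 minutes of inactivity from the repro
    sb.send("second message")    # goes out on a new connection
    print("connections opened:", sb.opens)
```

A real client would more likely run the idle check from a timer in its main loop rather than only when sending; checking at send time just keeps the sketch short.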

That’s how you do it.

The stealing

Logically, after our success, Rasmus decided to rub the fix in the faces of Pidgin and Adium developers; after all, they were the ones that said it was not a bug. But they were not impressed.

However, Daniel Ljungborg (aka Dimmuxx) was interested in the fix, and in good faith I pointed him to the commit message that explains the issue in detail.

Then I found out that Ka-Hing Cheung, a Pidgin developer, implemented the fix as I described, but thought it was OK to not thank anybody, explain where the fix came from, or mention the msn-pecan project, or any external source at all. We (Rasmus, Devid and I) spent a weekend of our free time working hard to identify, fix, and verify the issue, and if you read the commit message you would think they came up with the solution:

Author: khc@pidgin.im
Timeout switchboard connections at 60 seconds, should Fixes #3330 for most people.

That is plagiarism, pure and simple, and unfortunately, it’s not the first time.

msn-pecan 0.1.0-rc1 ready for testing; on the way to the first serious release

The next msn-pecan release started as 0.0.20 but there are so many changes that
it’s going to be 0.1.0. It is way more stable than 0.0.19 but we still would
like to do more extensive testing, so we are rolling a release candidate in
order to fix critical bugs that might be lingering. Hopefully it will be the
only release candidate before the actual release.

The aim is for 0.1.0 to be our “first serious release”. That doesn’t mean
the previous releases were bad; it just means that we were never truly
confident about the code being delivered until now.

Compared to 0.0.19:

  • Timeout issues fixed (switchboard error)
  • Better offline messages receiving support
  • Offline message sending support
  • Reorganization of P2P code (fewer crashes)
  • Several crash fixes
  • Adium improvements
  • Performance improvements
  • Massive code reorganization

Special thanks to Devid Antonio Filoni and Andrea Piccinelli, who have been
very active fixing issues and making sure msn-pecan is rock-solid. Also to Rasmus
Hummelmose, who was essential in fixing the timeout issues; it wouldn’t have
been possible without his testing. Thanks as well to the Pidgin developers (we
picked some patches), and many other contributors.

This is the list of issues fixed so far:

  • 37: Pidgin leaves handle on files after transfers
  • 82: Implement sending of offline messages
  • 117: Received offline messages are being cut
  • 138: Translation is not whole integrated from Launchpad
  • 144: Unable to chat after message timed out
  • 155: Pidgin crashes after connecting (using NTLM Authorization Proxy Server)
  • 156: msn-pecan crash in msg_ack() at cvr/slplink.c:321
  • 157: msn-pecan crash in msn_switchboard_can_send() at switchboard.c:779
  • 158: msn-pecan crash in msn_switchboard_free() at switchboard.c:262
  • 159: Pidgin crash when connecting to MSN
  • 161: 0.0.19 ubuntu package
  • 163: Translations not working on win32
  • 164: msn-pecan crash in pecan_contact_get_personal_message() at ab/pecan_contact.c:616
  • 170: Crash upon sign in
  • 171: crash when disabling account
  • 174: Windows 7 RC and Pecan
  • 177: Offline messages of blocked contacts should not be displayed
  • 181: Too many timeout messages
  • 183: msn-pecan should use audio:// links with pidgin 2.6.0
  • 184: already showed OIM message show again using another client
  • 185: Add support for receiving winks
  • 133: pidgin crashed with SIGSEGV in msn_message_destroy()
  • 154: Pidgin Randomly Crashes
  • 166: proxy authorization support missing
  • 153: User Adding Problems

The diffstat is huge:

44 files changed, 3423 insertions(+), 3116 deletions(-)

For the source tarball, win32 installer and maemo package check the usual location:
http://code.google.com/p/msn-pecan/downloads/list

And the Adium build is here:
http://code.google.com/p/msn-pecan/wiki/AdiumBuilds

So, start the testing! And please report back any issues :)

Here is the current list of pending issues for 0.1.0 final:
http://code.google.com/p/msn-pecan/issues/list?q=label%3Amilestone-0.1.0

Finally here’s the shortlog:

     6  Andrea Piccinelli
     1  Chris Stafford
     1  David Geary
    29  Devid Antonio Filoni
     1  Devid Filoni
     4  Elliott Sales de Andrade
   214  Felipe Contreras
     2  Mike Ruprecht