The bug we are talking about is the infamous switchboard timeout error, which was very elusive: it happened randomly, and very often for some users, under unknown conditions. Essentially, you send a message, and after one minute you receive a notification telling you the message never arrived; you then need to resend the message and hope it arrives this time.
Let’s see how the two projects approached this bug.
There have probably been many bug reports regarding this issue, but it’s very difficult to find old historic bugs in Pidgin’s new and old trackers. The modern version is reported in Pidgin’s tracker as #3330. There you can see people saying it happens a lot and that the priority should be increased, and many tickets were marked as duplicates. The developers, however, stayed in denial mode: they said it didn’t happen to them, and then turned to the usual strategy of asking for irrelevant information such as a valgrind log, and asking people to try again in case it had been magically fixed.
Then they tried a simplistic workaround: re-send the message on failure. This doesn’t work in most cases, and even when it does, the message arrives more than 1 minute late. As usual, no developer did much else about the bug.
The interesting point is when Rasmus Hummelmose, an Adium user, logged on to IRC to rant about the problem. He received the same response on both #adium and #pidgin: it’s a server problem, or it’s your slow connection, there’s nothing we can do. That didn’t convince him (it wasn’t true), and he effusively tried to explain that the issue was real and was affecting many users. He didn’t achieve anything more than upsetting the developers.
This is not the way to solve an important bug.
The msn-pecan team, on the other hand, thought: hey, there’s a bug, and this guy can reproduce it, let’s fix it. I invited him to #msn-pecan. Rasmus was a bit reluctant: why lose time with msn-pecan developers? Surely Pidgin developers must be capable enough to do the job. He changed his mind when I explained that the core parts of libpurple’s (Pidgin) msn were either developed or refined by me anyway, and that the Pidgin devs therefore probably didn’t have the expertise required to identify this problem.
With that, we started an endeavour to fix the problem over the weekend. I started by providing some infrastructure changes in order to visualize what was actually happening, Devid Antonio Filoni created Adium builds, and Rasmus tested and provided feedback. We made some conjectures, discarded them with further testing, and fixed some bugs along the way, until we found a reliable way to reproduce the problem: send a message, wait for 15 minutes of inactivity, and send another message.
After this it was clear that something bogus was happening with the network connection, but since we cannot fix all the elements involved, we implemented a simple fix: close the connection after 1 minute. That worked perfectly. Rasmus was happy, and we were too.
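To make the idea concrete, here is a minimal sketch of that kind of fix. It is not msn-pecan's actual code: the `Switchboard` class, its method names, and the `FakeClock` helper are all hypothetical, and it assumes the 1 minute is counted from the last activity on the connection. The point is only that a connection left idle for too long is closed, so the next message goes out on a fresh one instead of a stale one.

```python
import time

IDLE_TIMEOUT = 60.0  # seconds; the "1 minute" from the post


class Switchboard:
    """Hypothetical stand-in for a switchboard connection (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock          # injectable so the behaviour can be simulated
        self.is_open = False
        self.opens = 0              # how many times a connection was opened
        self.last_activity = None
        self.sent = []

    def open(self):
        self.is_open = True
        self.opens += 1
        self.last_activity = self.clock()

    def close(self):
        self.is_open = False

    def poll(self):
        """Close the connection once it has been idle for IDLE_TIMEOUT seconds."""
        if self.is_open and self.clock() - self.last_activity >= IDLE_TIMEOUT:
            self.close()

    def send(self, message):
        """Send a message, opening a fresh connection if the old one was closed."""
        self.poll()
        if not self.is_open:
            self.open()
        self.sent.append(message)
        self.last_activity = self.clock()


class FakeClock:
    """Manually advanced clock, used to replay the reproduction steps."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


# Replay the reproduction recipe: send, wait 15 minutes, send again.
clock = FakeClock()
sb = Switchboard(clock)
sb.send("first message")
clock.advance(15 * 60)
sb.send("second message")  # goes out on a new connection, not the stale one
```

In a real client the `poll` step would be driven by a timer in the event loop rather than called from `send`, but the effect on the second message is the same.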
That’s how you do it.
Logically, after our success, Rasmus decided to rub the fix in the faces of the Pidgin and Adium developers; after all, they were the ones who said it was not a bug. But they were not impressed.
However, Daniel Ljungborg (aka Dimmuxx) was interested in the fix, and in good faith I pointed him to the commit message that explains the issue in detail.
Then I found out that Ka-Hing Cheung, a Pidgin developer, implemented the fix as I described, but thought it was OK not to thank anybody, explain where the fix came from, or mention the msn-pecan project or any external source at all. We (Rasmus, Devid and I) spent a weekend of our free time working hard to identify, fix, and verify the issue, and if you read the commit message you would think they came up with the solution:
Timeout switchboard connections at 60 seconds, should Fixes #3330 for most people.
That is plagiarism, pure and simple, and unfortunately, it’s not the first time.