Last night (CEST) we upgraded the Jabber server to ejabberd 2.1.5. This time the Debian package became available very quickly. :-)
Tag: Downtimes
Upgrade to ejabberd 2.1.4
At about noon CEST today I upgraded the server to ejabberd 2.1.4, which is a bugfix release. The Debian package is finally available. Sorry for the brief service downtime of less than five minutes. ;-)
Once again the server was hit. At around 8pm yesterday evening the services were unreachable for about 30 minutes, the transports for about 60 minutes.
There will surely be another incident, but this time everything will be logged. Based on what the logs tell us, we will decide how to react. If it has something to do with Nimbuzz, we will block Nimbuzz completely.
Again we suffered a DoS attack. The Jabber services and website were down from ~10:45pm to 00:30am.
We definitely have to do something about this. We are already digging into it and are also receiving feedback from other Jabber server admins. Abusive Nimbuzz accounts may be the reason and, if this turns out to be true, we are also thinking about firewalling Nimbuzz.
What do you think about blocking Nimbuzz? Do you have friends from Nimbuzz in your roster? Tell us your opinion.
…and it seems to work. Sorry for the 30 minute downtime because of that.
For reasons still unknown, the server that hosts the Jabber services crashed four times in less than two hours. From one second to the next, the ejabberd processes took every resource they could get, and even more: 8 GB of RAM and 8 GB of swap, everything gone, plus a lot of CPU load. The machine was so loaded that “top” refreshed only every 5 minutes, and in the end only a hardware reset got the machine rebooted.
For the tech geeks:
top – 19:56:21 up 31 min, 1 user, load average: 22.86, 13.11, 8.71
Tasks: 240 total, 3 running, 231 sleeping, 0 stopped, 6 zombie
Cpu(s): 1.4%us, 5.8%sy, 0.0%ni, 12.4%id, 80.3%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8190900k total, 8138972k used, 51928k free, 796k buffers
Swap: 8393848k total, 7276916k used, 1116932k free, 42404k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3239 ejabberd 20 0 15.7g 6.0g 460 S 23 76.7 3:06.65 beam.smp
We are looking into this issue. It may be a severe bug in ejabberd, or it may be a DoS attack. We don’t know yet.
The server was offline this morning because of a kernel and MySQL upgrade. It would have gone faster if the server had rebooted cleanly after “shutdown -r now”, which it didn’t. So we had to send someone there to reset the machine manually.
We also upgraded Spectrum to support JID escaping. If this works in our tests (there seem to be some problems with clients that don’t support the unofficial % character, which is used in place of @), I will write more about it here.
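For the tech geeks, here is a rough sketch of the two ways a legacy contact address can be packed into a transport JID: the unofficial % replacement and the escaping table from XEP-0106. This is not Spectrum's actual code. The function names and the transport hostname `msn.jabber.example` are made up for illustration, and the XEP-0106 rule for literal backslashes is left out to keep it short.

```python
# Illustrative only: helper names and the transport host are hypothetical.

# Escape sequences defined by XEP-0106 (JID Escaping).
# The special handling of a literal backslash is omitted in this sketch.
XEP0106_ESCAPES = {
    " ": "\\20",
    '"': "\\22",
    "&": "\\26",
    "'": "\\27",
    "/": "\\2f",
    ":": "\\3a",
    "<": "\\3c",
    ">": "\\3e",
    "@": "\\40",
}


def percent_jid(legacy_id: str, transport: str) -> str:
    """Unofficial convention: replace '@' in the legacy ID with '%'."""
    return legacy_id.replace("@", "%") + "@" + transport


def escaped_jid(legacy_id: str, transport: str) -> str:
    """XEP-0106 style: replace each disallowed character with its escape."""
    node = "".join(XEP0106_ESCAPES.get(ch, ch) for ch in legacy_id)
    return node + "@" + transport


if __name__ == "__main__":
    print(percent_jid("alice@example.com", "msn.jabber.example"))
    print(escaped_jid("alice@example.com", "msn.jabber.example"))
```

A client that only knows one of the two forms will show or address the same buddy under a different JID, which is probably the kind of problem we are seeing in our tests.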
Unfortunately there was a major problem with the database for all accounts of the jabber.hot-chilli.net domain (not accounts from other domains, like jabber.hot-chilli.eu).
Finally we decided to restore a backup from 4th/5th of May 2010 (day of the server move) and had to take the Jabber server down for about 2 hours.
Only the contact lists and contact groups are affected. This means that, if you are affected, you have to redo every buddy addition and deletion you made since then.
We really apologize for the trouble caused, especially because the backup is one week old.
The question remains why this morning's backup of our current database contained only 20 rows of data, missing 150000 (!) other rows. We will take a close look at the backup process.
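One simple safeguard would be a sanity check that compares the number of rows in a fresh backup with the number in the live database and raises an alarm when the backup is far too small. The sketch below is only an idea, not what our backup scripts do; the function name and the 90% threshold are assumptions, and getting the two row counts out of MySQL and the dump is left out.

```python
def backup_looks_complete(backup_rows: int, live_rows: int,
                          min_ratio: float = 0.9) -> bool:
    """Return True if the backup holds at least min_ratio of the live rows.

    min_ratio is an assumed threshold; pick whatever fits the table.
    """
    if live_rows == 0:
        # Nothing to back up, so only an empty backup is plausible.
        return backup_rows == 0
    return backup_rows / live_rows >= min_ratio


if __name__ == "__main__":
    # The numbers from this incident: 20 rows saved, 150000 missing.
    if not backup_looks_complete(20, 150020):
        print("Backup is suspiciously small, do not rotate older backups!")
```

With a check like this, the broken backup would have been flagged right after it was written instead of at restore time.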