As of this writing, LiveJournal has returned to normal operation, and we can now share some details about what happened and its consequences.
On 15 May at 03:22 pm PST, the LiveJournal status monitoring system alerted us to an error in the 7th user cluster (cluster name: Bratwurst¹). Analysis showed that several database structures were corrupted and needed to be repaired, followed by data recovery from a mirroring server. The initial repair estimate was 5-6 hours, so to avoid data loss in the event of a mirroring server malfunction, we decided to switch off the whole cluster.
While the 7th cluster was being restored (a resource-intensive process in itself, since large volumes of data must be sent over a local network that is already busy with the processes that keep LiveJournal running), our monitoring system reported an analogous error on the 9th cluster (cluster name: ChickenTikka²). This required the same procedures as for the 7th cluster, but because our network was already operating at full capacity due to the work on the 7th cluster, the 5-6 hour estimate increased significantly.
All users located in these clusters lost all access to LiveJournal, irrespective of which pages they tried to visit. Because some information about users and their posts is collected from clusters in real time, and both the 7th and 9th clusters were temporarily turned off, users from other clusters also experienced intermittent problems with LiveJournal. For example, if an entry from one of the affected clusters was due to appear in a friends feed, the feed would fail to load and return a 500 or 503 error. The same happened if your inbox contained a message from someone on the 7th or 9th cluster. If a user had a friend from an affected cluster in one or more of their friends groups, the post creation page was not accessible. As a result of this incident, LiveJournal was working with varying degrees of stability for different users.
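The knock-on effect described above can be illustrated with a minimal sketch. This is not LiveJournal's actual code: the user names, the user-to-cluster mapping, and the functions `fetch_entries` and `render_friends_feed` are all hypothetical. It only shows how a page that gathers data from several clusters in real time can fail as a whole when a single cluster is offline.

```python
class ClusterUnavailable(Exception):
    """Raised when a user cluster cannot be reached."""


# Hypothetical mapping of users to clusters (illustrative only).
USER_CLUSTER = {"alice": 3, "bob": 7, "carol": 9, "dave": 3}

# The two clusters that were switched off during the incident.
OFFLINE_CLUSTERS = {7, 9}


def fetch_entries(user):
    """Fetch a user's recent entries from the cluster that stores them."""
    cluster = USER_CLUSTER[user]
    if cluster in OFFLINE_CLUSTERS:
        raise ClusterUnavailable(f"cluster {cluster} is offline")
    return [f"{user}: latest entry"]


def render_friends_feed(friends):
    """Return (http_status, entries) for a friends feed.

    Assumption: the feed has no fallback, so one unreachable cluster
    makes the whole page fail with a 503.
    """
    entries = []
    try:
        for friend in friends:
            entries.extend(fetch_entries(friend))
    except ClusterUnavailable:
        return 503, []
    return 200, entries
```

In this sketch, a feed that includes only users on healthy clusters loads normally, while a feed that includes even one user from the 7th or 9th cluster returns an error, although the reader's own journal is stored elsewhere.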
As of 09:12 pm PST, 35% of the 7th cluster has been restored, and 1.5% of the 9th cluster.
10:51 pm PST: UC7 — 50%, UC9 — 5%.
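The two progress readings for the 7th cluster (35% at 09:12 pm and 50% at 10:51 pm) allow a rough estimate of the remaining time. The sketch below assumes a constant restore rate, which is a simplification and not something the monitoring system is known to do; the function name `minutes_to_completion` is hypothetical.

```python
def minutes_to_completion(t1_min, pct1, t2_min, pct2):
    """Minutes remaining after the second reading, assuming a constant rate.

    Times are given as minutes after midnight; progress as percentages.
    """
    rate = (pct2 - pct1) / (t2_min - t1_min)  # percent restored per minute
    return (100.0 - pct2) / rate


# Readings for the 7th cluster, as minutes after midnight PST.
t1 = 21 * 60 + 12  # 09:12 pm, 35% restored
t2 = 22 * 60 + 51  # 10:51 pm, 50% restored

remaining = minutes_to_completion(t1, 35.0, t2, 50.0)
print(f"Estimated time remaining: {remaining / 60:.1f} hours")
```

Under this assumption the 7th cluster gains 15 percentage points in 99 minutes, so the remaining 50% would take about 330 minutes (5.5 hours), putting completion at around 04:21 am PST. A linear estimate like this is only a rough guide, since the restore rate depends on how much network capacity is available.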
To optimise the network load, restoration of the 9th cluster was postponed until the 7th cluster was fully restored. The 7th cluster was fully restored at 03:41 am PST, and work on the 9th cluster then resumed. At 08:09 am PST the system informed us of problems with the restoration of the 9th cluster that required intervention. As a result, the 9th cluster was not restored until 05:42 pm PST, and LiveJournal functionality returned to normal at 06:40 pm on 16 May.
Some users from the 9th cluster may still have difficulty seeing posts created in the last couple of days. We are working on restoring them from the archives. If any of your entries have gone missing, we would appreciate it very much if you would report this to the LiveJournal Support staff (http://www.livejournal.com/support/submit.bml), as it will help us complete the process significantly faster.
Note that scheduled entries and notifications were disabled during the downtime. These services are now functioning again, but scheduled entries were posted with a delay and the notification queue has grown quite large, so it will be another few hours before the system catches up.
We have yet to identify the reason for the almost simultaneous malfunction of two clusters, and it will take some time to investigate. We will inform you of the results of our investigation once it is completed.
All users of paid services will be compensated with a one-week extension due to the downtime.
We sincerely apologise for any inconvenience this has caused.
¹ Bratwurst — a veal and/or pork sausage prepared with a mixture of herbs.
² ChickenTikka — an Afghan national dish, popular worldwide, made from chicken breast marinated in yoghurt with spices.