Outage – Explosion at The Planet

Dear Members,

Introduction
We would like to offer you our sincere apologies for the interruption in service many of you experienced last weekend. During the outage our forum remained online and we attempted to keep all of our members up-to-date via our Service Status announcements.

What happened?
On Saturday 31 May at approximately 11pm GMT there was an explosion in The Planet Data Center in Houston, Texas. Electrical gear shorted, creating an explosion and fire that knocked down three walls. Thankfully there were no human casualties.

On the instructions of the Fire Department, The Planet then turned off all power to the Data Center, resulting in 9,000 servers being knocked offline.

How did the outage at The Planet affect StatCounter?
This affected StatCounter in a number of ways:

  • Some of our database servers went down
  • Our DNS servers were temporarily offline
  • Some of our incoming mail servers went down
  • Our blog was unavailable
  • Some of our web servers went down
  • Some of our partitions were knocked offline

What did this mean for StatCounter members?
Different members were affected in different ways.

Members with projects on the following partitions were most seriously affected:
c1 (PN 0), c7 (PN 6), c8 (PN 7), c14 (PN 13), c16 (PN 15), c17 (PN 16)
These members lost between 24 and 30 hours of stats over Sunday GMT and part of Monday morning.

Members with projects on the following partitions were unable to log into their accounts for a number of hours following the outage but stats continued to be recorded:
c4 (PN 3), c5 (PN 4), c6 (PN 5), c12 (PN 11)

New members and people who had just created new projects with StatCounter in the hours immediately prior to the outage temporarily “lost” their accounts/projects. This is because these projects were created after our last back-up was taken, so restoring that back-up did not “bring up” their projects. In this case, our advice is to generate a new project and re-install the StatCounter code on your site.

All other members lost about 3-5 hours of stats from approximately 11pm GMT on Saturday night. In addition, members experienced difficulties reaching the StatCounter site and logging into their accounts.
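
For readers who want to check their own projects, the partition lists above can be expressed as a small lookup. This is an illustrative sketch only, not StatCounter's actual code: the function and variable names are hypothetical, and it does not cover the separate case of accounts or projects created just before the outage.

```python
# Illustrative sketch only. Partition labels come from the lists in this post;
# every name below is hypothetical and not part of any StatCounter system.
STATS_LOST = {"c1", "c7", "c8", "c14", "c16", "c17"}
LOGIN_BLOCKED = {"c4", "c5", "c6", "c12"}


def partition_number(label: str) -> int:
    """Return the PN for a label such as 'c7' (the lists pair cN with PN N-1)."""
    return int(label[1:]) - 1


def impact(label: str) -> str:
    """Summarise the impact described in this post for a given partition."""
    if label in STATS_LOST:
        return "lost between 24 and 30 hours of stats"
    if label in LOGIN_BLOCKED:
        return "login unavailable for a number of hours; stats still recorded"
    # All other partitions fall into the general case described above.
    return "lost about 3-5 hours of stats"
```

For example, `impact("c14")` reports the 24 to 30 hour loss, while a partition that appears in neither list falls through to the general 3-5 hour case.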

As servers at The Planet come back online, we are continuing to recover as many stats as possible to minimise the loss of information.

Why doesn’t StatCounter have its own Data Center?
By outsourcing our server technology, we can keep costs down, minimise downtime and devote more resources to our members.

Why was StatCounter using The Planet?
StatCounter is powered by over 130 servers. They are located in a number of Data Centers in the United States and Ireland and are spread among several hosting providers, although our main hosting partner is The Planet.

We chose The Planet as our main hosting partner because they are the largest dedicated hosting service in the world and because of the apparent reliability of the service they provide.

We believed The Planet to be one of the most reliable and redundant data center providers in the world, particularly as they host servers in multiple centers in Houston and Dallas.

From The Planet website:
With multiple state-of-the art data centers located in Dallas and Houston, Texas, The Planet provides On Demand IT Infrastructure backed by complete redundancy in power, HVAC, fire suppression, network connectivity, and security. So if any of our data centers experiences a disruption for any reason, your eggs (or servers) are never in one basket.

The Planet have let us down, and in turn, we have let you down. For this we are truly sorry.

What did The Planet do wrong?
Accidents are a fact of life. However, we believe that, had The Planet operated in the professional manner we expected from an organisation of its standing, the disruption experienced could have been substantially lessened.

For example, The Planet have hosted our DNS for a number of years. However, it was only this weekend that we discovered that, although our DNS servers are on different subnets within The Planet, the servers are actually all in the same location. We will be submitting a complaint to The Planet in this regard. We fully expected that The Planet would have implemented a geographical spread in our DNS servers – this was not something that we thought we would have to request or confirm – particularly since we have servers spread through all The Planet data centers. We have now secured the services of a new geographically spread, redundant DNS provider.
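
The distinction drawn here, different subnets but one physical location, can be illustrated with a short check. This is a minimal sketch under stated assumptions: the server inventory, site names and function names are hypothetical, the addresses are reserved documentation ranges, and the site of each server has to be supplied by hand because it cannot be derived from an IP address alone.

```python
import ipaddress

# Hypothetical inventory: (nameserver IP, physical site). The addresses are
# reserved documentation ranges, not real StatCounter or The Planet servers.
SAME_SITE = [("192.0.2.10", "site-a"), ("198.51.100.10", "site-a")]
SPREAD = [("192.0.2.10", "site-a"), ("203.0.113.10", "site-b")]


def distinct_subnets(servers, prefix=24):
    """Return the set of networks (at the given prefix length) in use."""
    return {
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip, _site in servers
    }


def distinct_sites(servers):
    """Return the set of physical sites hosting the nameservers."""
    return {site for _ip, site in servers}


def is_geographically_redundant(servers):
    """Subnet diversity is not enough: require at least two physical sites."""
    return len(distinct_sites(servers)) >= 2
```

With the `SAME_SITE` inventory, `distinct_subnets` reports two networks yet `is_geographically_redundant` returns `False`, which is the situation described above. The `SPREAD` inventory passes because its servers sit in two sites.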

We also feel that the extent of the damage could have been acknowledged and communicated by The Planet in a more timely fashion. While we decided to implement our back-up plans early on, others waited many hours in the hope that The Planet would come back online, only to find that restoration of service was continually delayed.

In addition, we found that our efforts to communicate with The Planet were largely ignored or dismissed with a “template” response. This was particularly galling as we received a presentation glass globe (see below) and a letter from The Planet CEO on FRIDAY thanking us for being one of their largest customers… then Saturday… THIS!

Thank you For making a world of Difference – The Planet ???

Why couldn’t The Planet get the Data Center back more quickly?
We don’t know. Hundreds of angry customers have been asking this question.

What action did StatCounter take when this accident happened?
We immediately began work to restore full service as soon as possible.

  • Initially, and in the absence of any official information from The Planet, we worked to establish exactly what was causing the problems.
  • We started a thread in our Service Status forum to advise our members of the situation – this thread has been updated continuously.
  • We added a notice to our homepage to advise members that service was limited.
  • We procured the services of a properly redundant and geographically spread DNS service and re-routed all our servers immediately.
  • We prioritised the restoration of all our affected partitions from our latest back-up taken in the hours before the outage in order to resume tracking stats.
  • We configured new servers.
  • We redirected web servers which were temporarily down due to the outage.
  • We responded to as many tickets as possible to try to explain the situation to our members.
  • We migrated our affected mail servers to a new data center.

How will StatCounter prevent this happening in the future?
The bitter irony of this recent episode is that we have been working on our new beta system since September last year. We decided to develop this new StatCounter system for a number of reasons, one of the major motivations being to improve the architecture of our system so as to insulate it against major outages such as the one just experienced. Considering that we have never before experienced an outage of this magnitude, we are bitterly disappointed that our new system was not up and running before this episode.

Once “normal” service is restored, work will continue on the beta project as planned. The sooner we launch the beta, the sooner we can minimise our vulnerability to this kind of outage.

I’m not happy – how do I complain?
We completely understand why you feel aggrieved. Should you wish to submit a complaint to us please do so by logging into your StatCounter account and clicking the “support” link in the top menu bar. Within this area you will be able to submit a ticket to us – we will endeavour to respond to you as soon as possible.

How do StatCounter feel about what happened?
We are so desperately sorry that any of our customers had to experience this outage. We also feel so humbled by the numerous messages of support that we have received. At a time when we feel so terrible about the interruption in service suffered by some of our members, we have been just bowled over to receive so many messages of encouragement. While we always knew that we had a great bunch of members, your support and patience throughout this episode have been nothing short of incredible and have helped to maintain team morale in a very difficult situation. We are so grateful.

Conclusion
We hope this blog post has gone some way towards summarising the main issues relating to the recent outage. Work continues to restore full service. If you have any comments or queries, please do post them below.

219 Comments

  1. Go ahead and continue your fantastic job. I’m sure the StatCounter team is doing his best to continue their project, congratulations !!!

    Conclusion : Incidents may happen … no one is responsible … 😉

  2. You guys are fantastic! I love everything about you and your service! Having said that, I have two small pieces of humble criticism…

    1. Bad things will always happen (PayPal, The Planet). It’s how you handle it that counts. You don’t have to chuck those companies under the bus! Sure, it’s not your dirty laundry – it’s theirs – but you’re airing it! Rise above. Show some class.

    2. In this day of leet speak and texted Twitter blogs, good writing is rare. This blog is well written, just one very small grammar tip to point out… subject-verb agreements. “The Planet” is a singular entity and requires a plural verb, as in “The Planet has,” not “The Planet have.”

    Collective nouns can go either way depending on how they’re used in the sentence. Consider these examples:

    “The band My Chemical Romance *is* coming in concert.”

    “The members of My Chemical Romance *are* coming in concert.”

  3. It’s only when such things happen that we beneficiaries of the services your dedication has provided appreciate just how much effort has been put in.

    Thanks for everything guys.

  4. I’ve been using Statcounter for some time and believe you provide a tremendous service. Accidents unfortunately happen and you handled the situation like the professionals you are.
    Thank you for the detailed explanation.
    Job well done guys – keep up the good work.
    Rob

  5. We had the same issues with the planet for several sites. Our complaints are largely ignored as the planets customer service has gone WAY downhill over the past few years.

    As a former customer, I highly recommend looking into another hosting company (Rackspace and Midphase are 2 companies that I am using).

    Great work on keeping us updated though!

  6. Stop being so European, and chill. 🙂 You guys have zero to apologize for, and all us who benefit from your service are grateful.
    Cheers,
    Annette
    (a Brit who has lived in the States for 27 years, and who learned to calm down a bit in California, but who admittedly still has her European Moments…!)

    StatCounter Team Response:

    LOL! Thanks Annette. 🙂

  7. Hey guys you are the best. So my attitude is when you have the best why mess with the rest. Keep up the great work.

  8. Another very satisfied StatCounter customer echoing the sentiments expressed by many here – a message of support to a wonderful service. All the best and thank you for the way in which you handled this very difficult situation.

  9. Having been in the IT industry for over 25 years I have to say that Statcounter is a first class operation. You were prepared to the max, you responded brilliantly and you communicated the problem to your users without hesitation. You went above and beyond the call of duty. Great job!!!

    Sincerely

    Michael Relfe
    www.KinesiologyAffiliates.com

  10. you really need some rest.
    the red lettered comment above my projects spells depsite in stead of despite.

    StatCounter Team Response:
    Apologies Liesbeth! You’re right – that’s another typo that we will blame on sleep deprivation – thank you for reporting!

  11. I’m sorry to hear about this trouble and I hope it’s resolved completely soon.

    However I still am unable to login on your site with my account that has been in place for years. I did get in on occasion yesterday but not at all today.

    StatCounter Team Response:
    Hi Kemi,

    Sorry to hear that you continue to have trouble. Are you using Internet Explorer as your browser?

    We have identified that some members are getting a “page not found” error when trying to log into their StatCounter accounts using MS Internet Explorer.

    The solution to this problem is to clear your cache – after this you should again be able to log in.

    To clear your cache go to Tools>Internet Options>General Tab> Delete Files. These instructions may not be exact for every version of MSIE but they will all be similar.

    Hope this helps!

  12. StatCounter is an excelent service and I am impressed by the detailed and compreensive answer you provided. Outstanding. I wish I could get such a transparent response from payed services I know off. Keep up the good job.

  13. Thanks for the detailed information………you guys are #1……….keep up the great work.

  14. All cool dudes!

    Excellent service you guys provide anyway. This is the first time it went down since I started using it, for over a year and a half, so all good! 🙂

  15. Your response was both thorough and quick. I appreciate your dedication to your users. This is good customer service!

  16. Hey StatCounter Guys:

    Since I set my project page as one of my quick boot-up buttons, I did not understand why my project did not show for a couple of days. But after seeing your notice I knew.

    Next time I wonder if you can show your big breaking news on own project pages also.

    Thank you for the hard work again, and please keep it up!

    ;D

    StatCounter Team Response:

    Thanks for the suggestion Kyoko – it’s been noted!

  17. Hi,

    Take it easy.

    Life is like that.

    Best wishes for statcounter and the team.

    🙂

  18. You guys have handled the situation very well. When I was not able to view my statcounter for hours, I immediately understood, it may be due to some technical glitch. But I was not aware about the issue created from the Planet side. Anyways Cheers to the entire Statcounter team for working day and night to sort out this problem. All Good Wishes 🙂

  19. A thoroughly understandable explanation – I don’t think it would be at all fair to blame Statcounter for the outage, especially given that you already had recognised the risks and were hoping to improve redundancy in the new version. Sometimes, these things just happen, and all you can do is communicate with your customers, which you did.

    Having said that, just one minor comment. I bookmark my project so only visit the main stats page through this. This meant that I was never sure whether the error connecting was at my end or yours, and it was only by chance that I thought to visit the Statcounter front page through the address bar, where I found full details. Might it not be an idea to set up an automatic redirect on occasions of mass outage, so that all bookmarks are sent to a central explanation page (e.g. this blog). On the other hand, perhaps because of the nature of the problem with the DNS this would not have been possible anyway. But it is just a thought for next time…

  20. At least you got it sorted quicker than it would take to get a reply from an email to PayPal 😉

    btw, typo: “On Saturday 31 May at approximately 11pm GMT there was an explosion in The Planet Date* Center…”

    StatCounter Team Response:
    Thanks Dan – all fixed. We’ll blame that on the sleep deprivation…!

Comments are closed.
