- by David "Q-Tips" Quiram
Here are some non-technical, common-sense things to avoid doing during a full-blown disaster recovery, with or without an established plan: ten things not to do that aren't always covered in plans, or even in discussions of disaster recovery.
- Panic. No, really, don't panic. Things will work out or they won't. Panicking will just make the client nervous, make your staff nervous, and leave you unfocused.
- In the words of Colin Powell: "Get mad, and then get over it." Don't get angry, or if you do, vent and get over it. Emotions will narrow your outlook and lead you to bad choices.
- Don't deviate from the plan. Stick to the plan. If you don't have a plan, make one up before you start doing anything. Know the next step you're taking before you finish the one you're on.
- Do not wing it; this leads to the bad place. Document the steps you have taken and keep notes on what you plan to do next. You WILL be interrupted, your train of thought will be broken, and you WILL lose that one thought that takes you to the next step. Notes give you the tool to refocus on the process you were working on.
- Do not work all night. Having you or your staff exhausted will only hinder the process. Take breaks, and space out staffing. Don’t get wrapped up in the drama of the moment. Remember, you are the calm one steering the company through the storm.
- Don't go quiet. Communicate with the stakeholders involved; they want to know what is going on. Control the communication flow so you can control the interruptions to you and your staff. Communicate with your staff as well. This will calm nerves and focus the process.
- Don't consume beverages with high amounts of caffeine and sugar, the so-called energy drinks. As much as the iconic IT staffer is associated with Red Bull, Monster, and Mountain Dew, these will just hinder the process. They will burn you out quicker and break your focus.
- Don't consume alcohol. This should be a no-brainer.
- Don't do it all yourself. Delegate to your staff. If you are stuck, reach out to others for assistance. There are support contracts in place for exactly this; use them. You have associates in the IT field you can tap as a sounding board; use them!
- Don't forget to follow up after the recovery is done. The recovery process will have exposed shortfalls in the plan, if there was one. It is essential to hold a debrief of the event to understand what happened, how it can be prevented, and how to make the process better. You just went through all that pain; learn from it.
- by David "Gingerbread" Quiram
On Monday there was an issue with the Google mail system that affected 40,000 users (though some non-Google sources report up to 500,000). Those users could not access their email and contacts during the outage. At first it was suspected the data was lost, but Google later reported that was not the case: the data was inaccessible due to a storage software update issue. Data access was restored within 24 hours.
I recommended in a blog post back in September using Gmail as a tool to make your business more resilient to a localized disaster (i.e. loss of building, vandalism, theft) and to keep your email accessible during the recovery. I still recommend it. In fact, in light of how the issue was handled by Google, I recommend it even more.
Google addressed the issue quickly and fixed it. The storage software update was halted as soon as the forums lit up with questions about why users had "lost" their emails and contacts. Think about it: the forums got the news of the issue first. That information was then communicated through the company to the engineers, who identified the problem and stopped it from affecting other storage sites. It isn't as if these people all work in the same room; Google is huge, with well over 20,000 employees. The restore portion of the event is what took the 24 hours.
That being said, I am sure there are those who are not convinced and see this outage as just another in the line of outages from 2009, two years ago. Yes, there were several outages in 2009 and one this past March. Things happen. No computer system is fail-safe. Nothing is; things fail. It is having a DR plan and protocols to follow during outages and disasters that makes up for it.
Let's look at the pros and cons from this most recent outage:
- You are relying on someone else to run your email accounts.
(Well, you would be relying on someone anyway; someone has to manage it, be it on-site staff or a managed service company. Having Google do it is free, and they will maintain their technology and provide the staff for it.)
- Google is not immune to outages; frankly, no one is. Even an uptime of 99.99999% still allows for some outages, so there will be downtime.
- The responsibility for communication is on Google's side. You can contact them about issues, but there is no recourse if they do not communicate back.
- Google has shown that they are committed to making the Gmail system work and providing the service. They will get it fixed, and they have the right people to do it. This particular issue will not happen again. Ever.
- Google responded to the issue quickly and stopped it from affecting other storage systems. Less than 1% of the total Gmail users were affected.
- Google still has the redundancy of hot sites and physical backups, services which are out of most businesses' price range. And this service is free!
- Comparatively, the amount of downtime that businesses experience with their own internal email is much longer than the 24 hours experienced by Gmail users.
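As an aside, the uptime percentages quoted above translate directly into a budget of allowed downtime per year. A minimal sketch of that arithmetic (the function name and the example figures are my own illustration, not from Google):

```python
# Illustrative only: convert an availability percentage into the downtime
# it still permits over a year. "downtime_seconds_per_year" is my own name.
def downtime_seconds_per_year(uptime_pct: float) -> float:
    SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~31,557,600 seconds
    return SECONDS_PER_YEAR * (1 - uptime_pct / 100)

# Even "seven nines" (99.99999%) leaves a few seconds of downtime per year:
print(round(downtime_seconds_per_year(99.99999), 2))  # ~3.16 seconds

# A more typical "three nines" (99.9%) allows roughly 8.8 hours per year:
print(round(downtime_seconds_per_year(99.9) / 3600, 1))  # ~8.8 hours
```

The point stands either way: no realistic availability figure means zero downtime, so plan for outages rather than assume they won't happen.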
Overall, using Gmail is not a larger risk after the latest outage. The same risks inherent in using cloud resources remain as before. Google has shown a capability and drive to recover from outages that just cannot be matched by smaller businesses, and it is free. If you'd like to hear more about specific Disaster Recovery plans, please contact Trigon at your earliest convenience! If you'd like to hear more about Google Apps, contact our friends at Mosaic.