Click to Start – Eventually disaster befalls everyone

Once upon a time a wise old computer salesman told me there were only two types of computer owners – those whose hard drives have failed, and those whose hard drives are yet to fail. This sage also told me that as a computer retailer, he expected around 1 in 10 of the hard drives he sold to fail and come back under warranty.

I received this advice 15 years ago, and I’m happy to concede that advancing technologies and improvements in manufacturing processes have probably improved the failure statistics. But the fact remains: if you are a business which relies on technological hardware to carry out day-to-day activities, one day you will have a failure. How that failure affects your business is a function of your preparedness for it.

Research conducted by security and risk specialists at Ernst and Young found that companies have a 50-50 chance every year of their key computer systems failing for more than two hours.  1
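To see how quickly those odds stack up, here is a back-of-envelope sketch in Python. It rests on an assumption of mine, not on anything in the Ernst and Young research: that each year is an independent coin toss with the same 50% chance. The function name `chance_of_outage` is made up for illustration.

```python
def chance_of_outage(years: int, annual_probability: float = 0.5) -> float:
    """Chance of at least one major outage over a number of years.

    Assumption (not from the survey): every year is independent and
    carries the same probability of an outage.
    """
    if years < 0:
        raise ValueError("years must not be negative")
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    # The chance of escaping every single year is (1 - p) ** years,
    # so the chance of being caught at least once is the complement.
    return 1.0 - (1.0 - annual_probability) ** years


if __name__ == "__main__":
    for years in (1, 2, 3, 5):
        print(f"{years} year(s): {chance_of_outage(years):.1%}")
```

Under that assumption, the chance of getting through three years without a single long outage is only one in eight.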

Despite global governments’ attempts to panic the population following the tragedy of September 11 in New York, in the practical world of business risk, human error in technology development is a greater worry than terrorism. A 2003 survey of disaster recovery preparedness by technology company Veritas found ‘technological failure ranks highest in the list of perceived threats’, with hardware failure, software failure and viruses at the top of the list, followed by fire, hackers and accidental employee errors.  2

Disaster can be caused by any number of factors, and is not limited to aeroplanes plummeting into tall office buildings. But the results are the same: failure of technological systems, and a flow-on effect to the businesses and users which rely on those systems for everyday life. One of the positive outcomes of the terrorist attacks on New York has been a far greater recognition by corporations of the importance of disaster and business continuity planning. Consultancy firm Booz Allen Hamilton surveyed 72 chief executives from firms with revenues of more than $US1 billion just after the September 11 attacks and, not surprisingly, found 90% were reviewing their disaster planning documents.  3

All this disaster planning is great, but not much help if it is not taken seriously. In a survey of 459 chief information officers, IT directors and business executives, Ernst and Young found 53% of large companies have business continuity plans designed to ensure their company could recover from, and continue to operate after, a disaster. What’s not such a help is that 21% have never tested their plan, and thus have no idea whether it would work. A triumph of documentation over practical implementation.  4

Our everyday lives are completely reliant on technology. That technology is created by human beings, who make continual mistakes, on behalf of companies which don’t test the human work and which, despite knowing that there is an even chance of the technology failing, merely write a report about what they think they might do if failure were to occur.

I’ll give you a practical example. We’ve just had a run of dramas in our office, over a couple of months. For a while we’d been experiencing problems with our main corporate server. We’d had it for several years, and it was running the older Windows NT operating system. Once or twice a week it had a habit of crashing in the middle of the night. No one could work out why, but my best guess is it was related to a virus infection we caught some time ago, despite running anti-virus software.

Then one morning it crashed, and kept crashing, however many times we rebooted. The problem this time was deep inside the mail server part of Windows NT. Our computer support company couldn’t find a solution which didn’t involve completely wiping everything off the server and starting all over again. Given the age of the machine and of its hardware and software, we took a deep breath and ordered a new server. It’s a gorgeous HP box, with the latest Windows Server 2003 software and all the gadgets. The changeover took three days, during which we were without email a great deal of the time. Cost: $17,000.

Eventually everything settled down. Then last week we moved office. We planned everything down to the last 30 minutes with our internet provider, agreeing a specific time for the move: we would grab all our servers from the old office and race down to the new office, they would cut over the internet connection, and we’d boot up the boxes. Everyone presumed it would work fine.

Yeah right. I plugged in all the workstations. One went BANG and a puff of smoke floated up. Scratch one computer.

We plugged our internet router into the wall – no internet. Called the internet company – they thought we might need a new router. So they couriered one over a few hours later. No dice. So someone came to our office. Couldn’t fix it. Our staff were standing around – as we’re an online company, having no internet access means they can’t work. So we sent them home early.

By the end of the day we still were not online. I was standing outside the Victorian Arts Centre still on the phone to the computer company at 7.25pm, trying to get into a 7.30pm show. Nobody had a clear idea why the internet access wouldn’t work, given that nothing about our setup had changed except our address.

Cut to next morning. Still no access. We switched over to our backup internet connection – a standard Telstra ADSL connection. So now the staff could work online on the web, but still no email because that relied totally on the industrial strength internet connection working.

We wound up with three technicians onsite, and enough combined brainpower to put the space shuttle into orbit. By 3.45pm, 15 minutes before an Arts Hub bulletin publication deadline, they got the internet connection working. The solution? A new cable. A cable which, they swear, to all intents and purposes was pretty much the same as all the other cables they had tried.

Ask yourself these questions:

1. How reliant on technology is my business?

2. How would my business be affected if:

a) One computer died

b) The server died

c) The internet connection died

3. How long could my business survive if one or more of the above occurred?

4. What’s my backup plan, what will we do? Do my staff know what to do?

Will you all run around like headless chooks? Or do you have a documented plan of action, available to all staff, ready for implementation? Does the plan include the contact details for all of your technical support companies, including out of hours numbers (computers normally don’t break down during business hours)? Does it include all the vital information like administrator passwords, user names etc?
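If it helps, those questions can be turned into a checklist you run against your own plan. What follows is only a minimal sketch in Python, and everything in it is my own invention for illustration: the names `SupportContact`, `RecoveryPlan` and `find_gaps`, and the idea of recording where the passwords are kept rather than the passwords themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SupportContact:
    """One technical support company (hypothetical structure)."""
    company: str
    business_hours_phone: str = ""
    after_hours_phone: str = ""


@dataclass
class RecoveryPlan:
    """The bare essentials of a disaster plan, as listed in the text above."""
    contacts: list[SupportContact] = field(default_factory=list)
    # Assumption: the plan notes WHERE administrator passwords and
    # user names are stored, not the secrets themselves.
    credentials_location: str = ""
    shared_with_staff: bool = False


def find_gaps(plan: RecoveryPlan) -> list[str]:
    """Return a plain-English list of what the plan is missing."""
    gaps = []
    if not plan.contacts:
        gaps.append("no technical support contacts listed")
    for contact in plan.contacts:
        if not contact.business_hours_phone.strip():
            gaps.append(f"{contact.company}: no business-hours number")
        if not contact.after_hours_phone.strip():
            gaps.append(f"{contact.company}: no out-of-hours number")
    if not plan.credentials_location.strip():
        gaps.append("administrator passwords and user names not recorded")
    if not plan.shared_with_staff:
        gaps.append("plan is not available to all staff")
    return gaps


if __name__ == "__main__":
    plan = RecoveryPlan(
        contacts=[SupportContact("Internet provider", "03 5550 0100")],
        credentials_location="sealed envelope in the office safe",
    )
    for gap in find_gaps(plan):
        print("MISSING:", gap)
```

An empty list back from `find_gaps` only means the document is complete. It says nothing about whether the plan works, which is why you still need to test it.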

Simulating disaster is easy. Try turning off your file server and internet connection. How long could you last? Two hours? An afternoon? A day? Two days? Eventually disaster will always strike. And as the boy scouts say, you must ‘be prepared’.

1.  Link

2.  From “Veritas Disaster Recovery Research 2003”, an independent market research report by Dynamic Markets commissioned by VERITAS Software Corporation, which surveyed IT managers with responsibility for their company’s disaster recovery plan in large companies across the United States, 10 European countries, the Middle East and South Africa. Link

3.  “How Corporate Security is Reshaping the Post-9/11 CEO Agenda”, Booz Allen Hamilton, 2002. Link

4.  From an Ernst and Young ‘Fast Facts’ entitled “Continuity and Availability Planning are Critical to Mitigating Systemic Risk”. Link