How to Avoid IT Failure

July 20, 2016

The job of IT is to keep the technology working so that employees can do their jobs and the company can make money. If the IT infrastructure breaks down, employees can’t work, and the company can’t make money. Some failures also carry a direct cost, such as data recovery.

So what are some situations that are potentially disruptive to IT, and how can you anticipate them? What can you do to mitigate those risks, and is it worth it?

Here are a few real-world scenarios businesses can find themselves in, along with bad, better, and best solutions. (Notice how preparedness and redundancy are always the best solutions.)

Power Outage

If your office loses power, it’s going to be hard—if not impossible—for employees to be productive.

Bad option: Buy a diesel generator. The noise and the fumes aren’t so bad, as long as you can still do your work, right? By the way, IT said something about surge protectors, power spikes and power drops, so you’re researching those by candlelight (so you don’t waste electricity that the servers need).

Better option: You have a clean power generator that hooks into an uninterruptible power supply (UPS). If the generator drops or surges, the UPS absorbs it. The UPS itself delivers clean, stable power for up to a couple of hours, even if the generator fails. But those UPS batteries were expensive…

Best option: Send employees home for the day. Let them know that if they have power and internet at home, they are welcome to keep working over their remote desktop connections. Or if they want to head to a coffee shop to work, that’s fine too. Your streaming IT guy told you that the remote desktop connection is natively encrypted, so you’re not worried about employees working from an insecure connection.

Storage Failure

Traditional HDDs have moving parts, which means eventually they will fail. Rather than pretend that your data is safe on a disk, it’s smarter to plan for what to do when the drive fails.

Bad option: Replace the drive. Install the OS from a CD. Load all the applications. Configure the system to reconnect to the user profile, or restore user data from backups.

Better option: Replace the drive with a solid state drive (SSD) so it’s less likely to fail again. Re-image the drive with a pre-made system image, which contains all the default applications and configuration settings. Connect the user to the roaming profile.

Best option: You don’t have a storage failure because you’re using a zero client. The OS loads from the network, and the user connects to a virtually hosted remote desktop service. If the unit goes bad, swap it out, and the user is up and running in minutes.

Internet Failure

Someone forgot to call digline, and they shattered the fiber-optic cable that supplies data to your building.

Bad option: Wait for the ISP to get repairs done. Employees twiddle their thumbs because they can’t access the files on the cloud server.

Better option: You have in-house data storage, so employees have access to their resources. They can work, but connecting to clients, using VoIP, and any task involving research or communication will be challenging.

Best option: You call up one of your three internet service providers because the network is a little spotty today. Two of them report that their lines are just fine, but the third says that a line was cut. You ask them to suspend the line until they can get it repaired, and they say OK.
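The idea behind the best option, keeping several independent links and routing around the one that fails, can be sketched in a few lines. This is only an illustration under stated assumptions: the link names, the addresses (taken from the reserved documentation ranges), and the `simulated_probe` function are all hypothetical, and a real setup would normally leave this decision to the router or firewall.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical links: name -> local address on that provider's connection.
# The addresses come from reserved documentation ranges, not a real network.
LINKS: Dict[str, str] = {
    "isp_a": "192.0.2.10",
    "isp_b": "198.51.100.10",
    "isp_c": "203.0.113.10",
}


def healthy_links(links: Dict[str, str], probe: Callable[[str], bool]) -> List[str]:
    """Return the names of links whose probe succeeds, in configured order."""
    return [name for name, address in links.items() if probe(address)]


def pick_link(links: Dict[str, str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy link, or None if every link is down."""
    up = healthy_links(links, probe)
    return up[0] if up else None


def simulated_probe(address: str) -> bool:
    """Stand-in health check: pretend the line behind isp_a was cut."""
    return address != "192.0.2.10"


print(healthy_links(LINKS, simulated_probe))
print(pick_link(LINKS, simulated_probe))
```

In practice the probe would be a real health check, such as a ping or a TCP connection sent out through each link. The point of the sketch is only that with three links, losing one still leaves two to choose from.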

Data Center Failure

What all can go wrong at a data center? Plenty. They might have a security breach. A hard drive might fail. A server might fail. A hurricane might hit and flood the facility.

Bad option: The data center is in your office, and you’re standing in two feet of water. Fortunately, the power company turned off the power. Unfortunately, you’re staring at thousands of dollars’ worth of soggy hard drives.

Better option: Your office is online, but the outsourced data center on the east coast was hit by a hurricane. You have knowledgeable employees who can do their work with local resources, but you’re feeling exposed because that data isn’t being backed up. Plus, employees have to improvise because their tools and data aren’t available.

Best option: You get a call from the contact at your managed services provider. She tells you not to worry about the hurricane on the east coast. It did hit one of the data centers, but your data is OK because they have it mirrored to a second site. You shouldn’t see any disruption, but if you do, please let them know right away.


You Got Hacked

Maybe it was a corporate credit card that was part of a security breach at a major retailer. Maybe it was just a virus in an email. Cybercrime is escalating quickly, because it’s a low-risk and high-reward tactic. And you just became its latest victim.

Bad option: You’re staring at a message on your screen that tells you your hard drive has been encrypted. It’s asking you to call a phone number and pay $300 in Bitcoin in exchange for the passcode. You’re dialing your IT department to ask them how to buy Bitcoin.

Better option: You’re staring at a message on your screen that says the contents of your hard drive have been encrypted. The phone rings, and your IT department tells you that it’s OK, the last backup was only three hours ago. Give them 20 minutes, and they’ll have almost all your stuff back.

Best option: Your managed service provider calls you and tells you that the email filters have been catching a lot more ransomware attacks lately. To combat this, they’ve stepped up the filters and double-checked the endpoint monitoring in your infrastructure to watch for unusual traffic patterns. Plus they’re sending an evidence package to the authorities. They advise you to put out a memo to employees to be on the lookout for dodgy emails.
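The better option above only works because the last backup was recent. One way to keep an eye on that is to compare the age of the newest backup against the most work you are willing to lose. The sketch below assumes a four-hour limit and hard-coded timestamps purely for illustration; a real check would read the time of the last backup from your backup system.

```python
from datetime import datetime, timedelta, timezone

# Assumed limit: the most work the business is willing to lose.
MAX_BACKUP_AGE = timedelta(hours=4)


def backup_age(last_backup: datetime, now: datetime) -> timedelta:
    """Return how old the most recent backup is."""
    return now - last_backup


def backup_is_fresh(
    last_backup: datetime,
    now: datetime,
    max_age: timedelta = MAX_BACKUP_AGE,
) -> bool:
    """Return True if the most recent backup is within the allowed age."""
    return backup_age(last_backup, now) <= max_age


# Illustrative timestamps: a backup taken three hours before "now".
now = datetime(2016, 7, 20, 12, 0, tzinfo=timezone.utc)
last_backup = now - timedelta(hours=3)

print(backup_age(last_backup, now))
print(backup_is_fresh(last_backup, now))
```

If a check like this starts failing, you find out that backups have stalled before an attack, not after.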

These are just a few ideas about how to deal with unfortunate situations. They are outliers, and not something you’d expect to happen every day. But they can happen, and when they do, the result can be catastrophic. When you plan for them, look not just at how often a failure occurs, but also at its potential cost. To mitigate that cost, look for elegant solutions that cover multiple possible failures. You’ll be glad you did.
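Weighing frequency against cost can be made concrete by multiplying the two. The sketch below ranks a few failures by expected yearly cost. Every number in it is invented for illustration, so substitute your own estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    events_per_year: float  # assumed frequency, illustrative only
    cost_per_event: float   # assumed cost in dollars, illustrative only

    @property
    def expected_annual_cost(self) -> float:
        """Frequency times cost: the average yearly cost of this risk."""
        return self.events_per_year * self.cost_per_event


def rank_risks(risks: List[Risk]) -> List[Risk]:
    """Sort risks from highest to lowest expected annual cost."""
    return sorted(risks, key=lambda r: r.expected_annual_cost, reverse=True)


# Made-up figures: two common, cheap failures and one rare, expensive one.
risks = [
    Risk("power outage", 2.0, 5_000),
    Risk("drive failure", 4.0, 1_000),
    Risk("flooded data center", 0.02, 600_000),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: ${risk.expected_annual_cost:,.0f} per year")
```

With these made-up figures, the flood that might happen once in fifty years still tops the list, which is why rare failures deserve a place in the plan.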

flickr photo by Drey Roque shared under a Creative Commons (BY) license
flickr photo by Pete shared under a Creative Commons (BY) license
flickr photo by Quinn Dombrowski shared under a Creative Commons (BY) license

Written By

  • David Koffer

    David is a computer whisperer from a little town in Idaho. Although he’s been convincing computers to do his bidding since childhood, he’s studiously tried to avoid the label “nerd” by doing cool things like karate and baking bread. He has a Master’s degree in English and Technical Writing from Idaho State University, where he geeked out on linguistics and persuaded his advisors to let him write his thesis on video games.
