As I was reading the Sunday newspaper, I was drawn to the front-page article titled "VA radiation errors laid to offline computer", which can be found here: http://www.philly.com/philly/news/homepage/51107782.html
Here's the two-minute drill on this article in "CliffsNotes" format (a key component of my high school learning years):
Veterans who were being treated at the Philadelphia VA Medical Center were not receiving the proper amount of radiation for their ailments. Some were getting overdoses of radiation and some were simply not getting enough. Once this became evident, investigators were brought in to diagnose the issue. Could it have been an advanced calculation problem or failed logic in one of their key applications? Was it possible that human error occurred during the formulation process? No, it wasn't any of these issues. A PC that runs important radiation dosage calculations was UNPLUGGED from the network for a year!
This story was of particular interest to me as an IT professional for obvious reasons. It really is the worst-case scenario in my mind. A simple computer issue could have (and may have) affected lives, yet it could have been easily caught and resolved with the proper procedures and tools. Here are a few of the main lessons that I think can be learned here:
1. Methodology - The inventor of the OSI model is smiling somewhere. Network troubleshooting should start at the Physical layer (e.g. cabling) and work its way up the stack. Once someone had noticed the problem, this would have been caught within 30 seconds of troubleshooting.
2. Process - Even after the PC was plugged back in, errors were still made because there was no formal procedure to review the radiation dosage calculations. This is a training and process issue, not a technology issue.
3. Monitoring - This problem could easily have been identified and solved by a "managed services" provider or even a simple monitoring tool.
4. Best Practices - I am assuming that the PC was located in a common area and ran a workstation OS (not a server OS). It should have been running a server OS and been locked securely in a server room rack.
5. Brand - The university has suffered a setback and its reputation has been compromised due to such a simple technology issue. It will take time to re-establish that trust and rebuild its name. Some very basic and inexpensive precautions could have prevented this costly mistake.
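To make the monitoring lesson concrete, here is a minimal sketch of the kind of "simple monitoring tool" I have in mind. The article doesn't say how the dosage PC communicated, so this assumes the machine accepts TCP connections on some known port; the host name, port, and polling interval below are hypothetical. Every few minutes it tries to connect and prints an alert for anything unreachable. A failed check like this is exactly the cue to start troubleshooting at the Physical layer, as in lesson 1.

```python
from __future__ import annotations

import socket
import time
from typing import Callable, Iterable


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks, and DNS failures.
        return False


def find_unreachable(
    targets: Iterable[tuple[str, int]],
    check: Callable[[str, int], bool] = is_reachable,
) -> list[tuple[str, int]]:
    """Return the (host, port) targets that fail the reachability check."""
    return [(host, port) for host, port in targets if not check(host, port)]


def monitor(
    targets: list[tuple[str, int]],
    interval_seconds: float = 300.0,
    rounds: int | None = None,
    check: Callable[[str, int], bool] = is_reachable,
) -> None:
    """Poll every target each interval and print an alert for each unreachable one.

    `rounds=None` polls forever; a number limits how many polling rounds run.
    """
    completed = 0
    while rounds is None or completed < rounds:
        for host, port in find_unreachable(targets, check):
            print(f"ALERT: {host}:{port} is unreachable")
        completed += 1
        # Only sleep if another round is coming.
        if rounds is None or completed < rounds:
            time.sleep(interval_seconds)
```

For example, `monitor([("dosimetry-pc.example.internal", 445)])` would print an alert every five minutes for as long as that (hypothetical) machine stayed off the network. A real deployment would send an email or page instead of printing, which is essentially what a managed services provider does for you.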