The Microsoft Troubleshooting Methodology – Determine Root Cause

I simply cannot count the times I’ve heard someone vaguely describe some computer problem and then immediately follow it up with their resolution. That resolution is almost always one of these gems:

  • Reboot the computer
  • Reboot the server
  • Reboot the {insert your favorite IT component here}
  • Defragment the hard drive (even solid state drives)
  • Run ipconfig /release and ipconfig /renew (even with static IP addresses)
  • Run Check Disk (chkdsk.exe) on the hard drive
  • Delete the temp folder
  • Delete the browser cache
  • Reboot the computer again
  • Uninstall and reinstall the application
  • Reinstall the operating system
  • Rename the computer to I D 1 0 t
  • Reboot the computer, third time’s a charm
  • Just wait, the problem goes away on its own

Some of those steps are valid resolutions to known problems. You should understand how each step (except perhaps the ID10t step) affects a computer and what problems it is likely to actually resolve. But these can be used for right or wrong reasons. Taking any of these approaches without fully understanding the problem and identifying the root cause is a tragically wrong approach to troubleshooting.

Consider a visit to the doctor. You give your doctor a vague symptom like, “My head hurts.” If his immediate response, with no further examination, is, “Take some Vicodin!” you know he’s a quack. He might be right – sometimes the symptom is resolved. But he might do more harm than good by masking a far worse underlying condition, perhaps a brain tumor. Or the resolution might make matters worse, for example if the pain was actually a bad reaction to a similar drug you’d already taken.

This article is about taking the time during the troubleshooting process to identify the root cause of problems in Windows. It isn’t about jumping straight to a fix or getting things up immediately.

Quick Reminder of MikeDan’s Quick and Dirty Troubleshooting Methodology

In a previous article I explained how I follow a simple and straightforward troubleshooting methodology. That article shows the way this methodology works and how it functions as a troubleshooting framework for almost any computer problem. Here’s a reminder of my approach.

Figure 1. MikeDan’s Quick and Dirty Troubleshooting Methodology.

Identifying the symptom is the most critical step, and is the focus of this article.

What’s The Apparent Symptom?

The common initiators for the entire troubleshooting process are either undesired behavior or an error message.

The error message is certainly easier to focus on. It is tangible, usually specific, and I can persuade anyone to read it to me verbatim or grab a screenshot. Take a look at Figure 2 for a rather clear error message.

Figure 2. A Windows IP address conflict error.

But don’t think for a moment that an error message is always informative or even leads to a better understanding of the problem. Consider Figure 3 as an antithesis of all useful error messages.

Figure 3. I hope you never see this error message.

Error messages are often the first step. Complaints and problem descriptions are also pretty common. Neither usually identifies the root cause immediately. You must consider what other symptoms might be occurring to be able to focus on identifying the cause.

Find the Edge and Go Beyond

When you find an error or identify a symptom, ask whether there are other symptoms. For example, when I encounter the error shown in Figure 2, I immediately ask myself or the user these questions:

  • Are other computers also receiving a similar error?
  • Is the affected computer still communicating on the network? If so, are all communications working, or only some?
  • Has there been a recent change to the computer or network, and is there any possible correlation between the symptom and that change?

And finally…

  • Is the error message telling me the truth?

It’s that last question that I really love. Some error messages or symptoms have many possible causes. And the knee-jerk response to a symptom in those cases just wastes time and fixes nothing.

To illustrate the point, the error message in Figure 2 turned out to have nothing to do with an IP address conflict. It was actually the result of duplicate media access control (MAC) addresses. Because Windows doesn’t have a Duplicate MAC Address message or a MAC conflict alert, it displayed the closest thing. Windows certainly had good intentions, but stopping at that symptom would have sent me in the wrong direction. Going beyond the error or first symptom is critical to gathering enough data to find the root cause of a problem.
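
One way to go beyond an IP conflict message is to compare IP-to-MAC pairings collected from the network and see which kind of overlap actually exists. The Python sketch below is a minimal illustration of that idea, not a description of how Windows detects conflicts or of the steps taken in the case above. The function names and the sample addresses are hypothetical, and the sketch assumes you have already gathered (IP, MAC) pairs from somewhere such as ARP caches or DHCP leases. Keep in mind that one MAC address answering for several IP addresses can be perfectly legitimate (a router or a multi-homed host, for example), so treat the output as a lead to investigate rather than a verdict.

```python
from collections import defaultdict


def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators so that
    '00-1A-2B-3C-4D-5E' and '00:1a:2b:3c:4d:5e' compare as equal."""
    return mac.strip().lower().replace("-", ":")


def find_conflicts(observations):
    """Group hypothetical (ip, mac) observations both ways.

    Returns a tuple of two dicts:
      shared_macs   - MAC addresses seen with more than one IP address
      contested_ips - IP addresses seen with more than one MAC address
    """
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)

    for ip, mac in observations:
        mac = normalize_mac(mac)
        ips_by_mac[mac].add(ip)
        macs_by_ip[ip].add(mac)

    shared_macs = {
        mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1
    }
    contested_ips = {
        ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1
    }
    return shared_macs, contested_ips


if __name__ == "__main__":
    # Made-up sample data: two hosts report the same MAC address.
    sample = [
        ("192.168.1.10", "00-1A-2B-3C-4D-5E"),
        ("192.168.1.20", "00:1a:2b:3c:4d:5e"),
        ("192.168.1.30", "00:1a:2b:3c:4d:5f"),
    ]
    shared, contested = find_conflicts(sample)
    print("MACs seen with several IPs:", shared)
    print("IPs seen with several MACs:", contested)
```

If the first dictionary has entries and the second is empty, the data points toward a shared MAC address rather than a contested IP address, which is a different problem from the one the error message names.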

Conclusion

To really nail down the symptom I start with the apparent stuff: exact error messages, specific symptoms, etc. As quickly as possible I try to gather more data to determine if other symptoms exist and whether the symptom is a false lead. I do all of this before I settle on any single cause. Gathering enough data and asking a few key questions up front often make the difference between chasing false causes and determining the true problem.

Stay tuned for future articles on root cause analysis and creating, testing, implementing, and verifying a resolution.

Mike Danseglio - CISSP / CEH
Interface Technical Training – Technical Director and Instructor
