
Hard Drive failure and lessons learned (the long version)


Target
06-12-2001, 01:34 PM
Just a quick note about a recent hard drive failure I am currently experiencing, and some of the things that I observed that may be of help to you (at least they surprised me).

The drive is a 20GB Maxtor that I purchased probably about a year ago or less. It's been a great drive up to this point, and to be honest, I have not been kind to it :) Lots of file transfers, high hours of operation, never letting it spin down when not in use, a non-climate-controlled environment, etc.

Anyway, recently I noticed my system acting flaky at times. The first indication was when attempting to install applications, only to have them not function 100% correctly. The second indication was when the system failed to boot up properly one day, claiming that no system disk could be found.

Entered the BIOS, and sure enough, the drive with the active partition on it wasn't listed. Not knowing at this point if it was a problem with the BIOS losing its settings, or the hard disk, I attempted to redetect the drive and it failed. Powered the box off, and decided to try again. After all, I wasn't ready to admit that my hard disk of less than a year was already dead, and I was pretty bummed about the thought of losing the data on it that I had not backed up yet. It took a few tries, but I was finally able to get the BIOS to recognize the drive. This should have been clue #1, since it shouldn't have taken so many attempts. But still, I couldn't be 100% sure that it wasn't the motherboard and its onboard IDE controller, or the cable, or something else.

Anyway, I was able to get the drive to boot up again, so was somewhat relieved that I might yet be able to recover the things I had not backed up in a while.

I tried all the various troubleshooting tricks we all sometimes use (that I could think of anyway). Looked for conflicting software, made sure the source application install CDs were fine, sorted out possible registry issues, confirmed all patches were installed and all drivers were updated and installed correctly, and so on.

Having no luck with any of that, it was time to move on to the hardware. Memory checked out ok, CPU was fine, cables were in order, motherboard was functioning properly, hard disks seemed to be in order as did the other devices attached to the machine.

I was stumped......

For the heck of it, I ran scandisk (for like the 20th time) and then decided to run a surface scan as well. I had been avoiding that because the system has two drives. Not wanting to simply assume it was only the drive with the OS on it causing the issues, that meant doing a surface scan on both drives. At 50GB worth of drive space, you can imagine how long that takes and why I had been avoiding it....especially when all the previous scandisk runs had reported no issues.

Ok, on to more stumpifying behavior.

Surface scan didn't find any issues either, but the anomalies with the OS and applications continued. There was no real pattern to when they happened, and no obvious cause, which still had me leaning towards hardware as the root cause. At least software issues are somewhat repeatable: if you perform the same tasks in the same way, you are generally able to recreate the problem with some accuracy.

Anyway, I decided it was time to run scandisk from the Command Prompt only, thinking that maybe Windows was ignoring an error on the hard disks that it shouldn't be. Scandisk did its thing, and found no errors <grrrr>. My level of frustration was increasing..... Then a surface scan was performed as well. Again, same result....no issues were found.

So, here I was...... Outside of that one time when the BIOS was unable to recognize the drive, everything was checking out. OS seemed to be ok, application sources checked out, hardware seemed to be in order...I was nearing the end of my patience and ability to get this one figured out, yet the problems with flaky operation persisted.

Even S.M.A.R.T. hard disk monitoring was active and had not given me any indication that the disk was a cause for concern.

I have no idea why at this point I decided to do so, but I dug out the box for the Maxtor drive. Started looking through the one-sheet instruction manual it had, and found another Maxtor diskette I had forgotten to take out of the box originally. On that sheet was a small section about a utility included on the diskette which would test your hard disk and could help determine if a problem existed.

Not really believing at this point that it was in fact a hard disk problem, and leaning more towards a failing motherboard controller, I wasn't optimistic that this utility would be able to tell me anything more than I already "didn't" know......

However, I popped in the diskette and booted the machine anyway, thinking what do I have to lose but more time. Wouldn't you know it, upon bootup the motherboard didn't recognize the drive again.

Perhaps I was on to something after all...though I still couldn't be sure if it was the motherboard controller, the drive, the cable, or what (even though I had checked them all at least once before).
Decided one more time to ensure that the cable was connected properly to the drive and the motherboard, and in doing so, I happened to notice that one hard disk was noticeably hotter than the other. When I say noticeably, I mean one was warm, and the other almost burned me when I touched it!

Great, one more issue to throw into the mix..... This of course led me to yet another possible cause, a power supply that was not providing the proper voltage to the drive (i.e., too much voltage).

Let it cool off, and tried again to get it to boot up. The BIOS still had trouble recognizing the drive, and at this point I figured it was hosed for good. Still didn't know what exactly the issue was given the things I had already attempted, but I was stuck with a system that wouldn't recognize the drive at boot.

Decided to take a break from the darn thing, and got some food and a diversion from the machine. When I returned, figuring it couldn't hurt, I tried one last time to get the machine to boot again, recognize the drive, and allow me to run the Maxdiag utility on the diskette (still believing it wouldn't shed any new light on the situation). Wouldn't you know it, it worked! The BIOS recognized the drive, and I was on my way.

Booted up with the Maxtor diskette and ran the Maxdiag utility. Found the drive just fine, and I asked it to do the "quick scan". Immediately upon launching the quick test, it produced an error. Said it could not perform the test, gave me an error code, and told me to contact Maxtor for a replacement.

Needless to say, I was surprised!!

All of the other tests I had performed, all of the scandisk(s) that were run, and not one of them indicated a problem with the hard drive! Yet, here was this little utility that I had never considered using, telling me my hard disk was bad. I reset the machine and booted again with the diskette and re-ran the test. Same failure and same result........"contact Maxtor for a replacement" which is exactly what I did.

The Maxtor service rep was extremely helpful and understanding, and has a replacement unit on the way as I write this.....other than model and serial number, no other questions asked to speak of.

So, what did I learn from all this??

That my troubleshooting skillz weren't as finely honed as I thought they were. That utilities I commonly rely on to alert me to problems aren't as good or thorough as I believed them to be. That the cause of an issue (especially a hardware one) can mask itself as many different things for a while. And that it was silly of me to doubt a small utility on a diskette ;)

So, there you have it. If you stuck through reading this post to the end, thanks for your time and sharing my pain. Perhaps if you find yourself with similar issues in the future, you will use the information here to get you to the resolution quicker than I arrived at it.

PS: I haven't shut the box down since getting it to work and determining that the drive needed replacement. I'm confident that I have gotten pretty much everything I was afraid of losing backed up to another drive.....but it was a scary and frustrating ride for a while.

~Target

Kuasimodem
06-12-2001, 08:09 PM
A friend of mine had a similar problem with his Maxtor drive, and found it the same way you did.

If you want to save your data off the drive, try this...
Place the drive between two ziplock bags filled with ice water, to keep it cool enough to get the data off without corruption. It worked for my buddy, so it should work for you.

Curt

RobRich
06-13-2001, 02:00 AM
Excellent advice about the vendor utility disk. :)

I gave up on Scandisk for drive diagnostics years ago. Scandisk has literally ok'd drives that I knew had bad sectors, even when using a surface scan! Instead, I try to keep the latest utilities for each popular brand on hand for error/problem detection. I have also found that many manufacturers will not provide an RMA until you provide the specific error code for the drive as reported by its respective scan utility.

Robert Richmond

elroy
06-17-2001, 08:25 PM
Get SpinRite 5.0 from www.grc.com
It is slow, but boy is it good.

Shagnasty
06-17-2001, 08:55 PM
Just a note...Pretty much all of the major hard drive outfits have a utility which will test, repair (if possible), and give error codes for their HDs. I deal primarily with Western Digital, and the diagnostics they provide are great. They also will find problems with the drives that you won't find otherwise, like BIOS and S.M.A.R.T. problems.
I think HDs are just like humans...They're all gonna die...sooner or later...

daveleau
06-20-2001, 08:33 AM
Glad you got it all sorted out, Target. I got my hands on several of the 7200rpm 20GB Maxtors and had 3 of the 6-7 go bad very quickly. It kind of steered me away from Maxtor. Maxtor RMA'ed all the drives readily, so I traded two of the Maxtors for two 15GB WD drives (I had to give ppl some incentive to trade) and have been happy ever since. Mine did not act like yours; mine just started clicking and died.

Good luck
Dave