IDE harddisk errors: DriveReady SeekComplete Error status=0x51 DriveStatusError error=0x04
Ever seen error messages like these in the kernel log file?
kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
I have (though only a few times per month), and here are my conclusions.
Before I start: if you have seen such errors too (or other errors) and your harddrive died afterwards,
I'd be more than happy if you dropped me an email with the errors. Thanks! :-)
Also see: Linux Harddisk Monitoring with SmartMonTools (smartctl)
Since everyone on every mailing list or forum says something different about these harddisk error messages,
I started my search in the kernel source:
captain:/usr/src/kernel-source-2.6.8# grep -R DriveStatusError *
drivers/ide/legacy/hd.c: if (hd_error & ABRT_ERR) printk("DriveStatusError ");
drivers/ide/ide.c: if (err & ABRT_ERR) printk("DriveStatusError ");
drivers/ide/ide-disk.c: if (err & ABRT_ERR) printk("DriveStatusError ");
drivers/ide/Kconfig: hda: set_multmode: error=0x04 { DriveStatusError }
"DriveStatusError" appears in 4 files in the kernel.
Looking into drivers/ide/ide.c reveals that those strings are printed by the function
ide_dump_status:
/*
 * Error reporting, in human readable form (luxurious, but a memory hog).
 */
u8 ide_dump_status (ide_drive_t *drive, const char *msg, u8 stat)
{
    [...]
    local_irq_set(flags);
    printk(KERN_WARNING "%s: %s: status=0x%02x", drive->name, msg, stat);
#if FANCY_STATUS_DUMPS
    printk(" { ");
    if (stat & BUSY_STAT) {
        printk("Busy ");
    } else {
        if (stat & READY_STAT) printk("DriveReady ");
        if (stat & WRERR_STAT) printk("DeviceFault ");
        if (stat & SEEK_STAT) printk("SeekComplete ");
        if (stat & DRQ_STAT) printk("DataRequest ");
        if (stat & ECC_STAT) printk("CorrectedError ");
        if (stat & INDEX_STAT) printk("Index ");
        if (stat & ERR_STAT) printk("Error ");
    }
    printk("}");
#endif /* FANCY_STATUS_DUMPS */
The code above prints the IDE status register bits. In our case, DriveReady
just means the drive is ready. Nothing to worry about! SeekComplete means the seek operation
requested by the previous IDE command has completed. Still nothing to worry about!
But Error means something out of the ordinary happened. No reason to panic yet, but
start worrying a bit.
    printk("\n");
    if ((stat & (BUSY_STAT|ERR_STAT)) == ERR_STAT) {
        err = hwif->INB(IDE_ERROR_REG);
        printk("%s: %s: error=0x%02x", drive->name, msg, err);
#if FANCY_STATUS_DUMPS
        if (drive->media == ide_disk) {
            printk(" { ");
            if (err & ABRT_ERR) printk("DriveStatusError ");
            if (err & ICRC_ERR) printk("Bad%s ", (err & ABRT_ERR) ? "CRC" : "Sector");
            if (err & ECC_ERR) printk("UncorrectableError ");
            if (err & ID_ERR) printk("SectorIdNotFound ");
            if (err & TRK0_ERR) printk("TrackZeroNotFound ");
            if (err & MARK_ERR) printk("AddrMarkNotFound ");
            printk("}");
            if ((err & (BBD_ERR | ABRT_ERR)) == BBD_ERR || (err & (ECC_ERR|ID_ERR|MARK_ERR))) {
                if ((drive->id->command_set_2 & 0x0400) &&
                    [...]
                    printk(", LBAsect=%llu, high=%d, low=%d",
                        (long long) sectors,
                        high, low);
                } else {
                    u8 cur = hwif->INB(IDE_SELECT_REG);
                    if (cur & 0x40) { /* using LBA? */
                        printk(", LBAsect=%ld", (unsigned long)
                            [...]
                    } else {
                        printk(", CHS=%d/%d/%d",
                            [...]
                if (HWGROUP(drive) && HWGROUP(drive)->rq)
                    printk(", sector=%llu", (unsigned long long)HWGROUP(drive)->rq->sector);
            }
        }
The section above prints the contents of the error register. The only halfway "normal" error
is DriveStatusError, which is printed when the ABRT_ERR bit is set. All other errors,
like bad sector, uncorrectable error, sector id not found, track zero not found
and address mark not found, mean that something bad is going on, and you should
immediately back up your data (if you have no recent backup) and replace the harddrive.
For our DriveStatusError, the kernel file include/linux/hdreg.h
sheds some more light on it (and on the other errors):
/* Bits of HD_STATUS */
#define ERR_STAT 0x01
#define INDEX_STAT 0x02
#define ECC_STAT 0x04 /* Corrected error */
#define DRQ_STAT 0x08
#define SEEK_STAT 0x10
#define SRV_STAT 0x10
#define WRERR_STAT 0x20
#define READY_STAT 0x40
#define BUSY_STAT 0x80
/* Bits for HD_ERROR */
#define MARK_ERR 0x01 /* Bad address mark */
#define TRK0_ERR 0x02 /* couldn't find track 0 */
#define ABRT_ERR 0x04 /* Command aborted */
#define MCR_ERR 0x08 /* media change request */
#define ID_ERR 0x10 /* ID field not found */
#define MC_ERR 0x20 /* media changed */
#define ECC_ERR 0x40 /* Uncorrectable ECC error */
#define BBD_ERR 0x80 /* pre-EIDE meaning: block marked bad */
#define ICRC_ERR 0x80 /* new meaning: CRC error during transfer */
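Bit 0x80 is listed twice above, once as BBD_ERR and once as ICRC_ERR. The ide_dump_status code quoted earlier tells the two meanings apart by looking at ABRT_ERR: with the abort bit set it prints BadCRC, without it BadSector. A minimal sketch of that check, assuming only the bit values from the excerpt; the function name describe_bit7 is made up for this example and is not part of the kernel:

```c
#include <stddef.h>

/* Values copied from the include/linux/hdreg.h excerpt above. */
#define ABRT_ERR 0x04 /* Command aborted */
#define BBD_ERR  0x80 /* pre-EIDE meaning: block marked bad */
#define ICRC_ERR 0x80 /* new meaning: CRC error during transfer */

/*
 * describe_bit7() is a hypothetical helper, not a kernel function.
 * It mirrors the "Bad%s" line of ide_dump_status: bit 0x80 together
 * with ABRT_ERR is reported as a CRC error, bit 0x80 alone as a bad
 * sector. Returns NULL when bit 0x80 is not set at all.
 */
const char *describe_bit7(unsigned char err)
{
    if (!(err & ICRC_ERR))
        return NULL;
    return (err & ABRT_ERR) ? "BadCRC" : "BadSector";
}
```

With this, describe_bit7(0x84) yields "BadCRC" and describe_bit7(0x80) yields "BadSector". For an error byte of 0x04 it returns NULL, because bit 0x80 is not set.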
First the status byte: 0x51 = 01010001b
0 READY_STAT 0 SEEK_STAT 0 0 0 ERR_STAT
As said above, the drive is ready, seek is complete, but there is an error.
The error byte: 0x04 = 00000100b
0 0 0 0 0 ABRT_ERR 0 0
Here, in the IDE error register, we only have ABRT_ERR = "Command aborted".
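The same decoding by hand can be sketched in a few lines of C. This is only an illustration under my own assumptions: the bit values are copied from the hdreg.h excerpt, while decode_bits, decode_status and decode_error are hypothetical helper names that do not exist in the kernel.

```c
#include <string.h>

/* Name table entry: one bit mask and the hdreg.h name for it. */
struct bit_name {
    unsigned char mask;
    const char *name;
};

/* Bits of HD_STATUS, highest bit first (0x10 is also SRV_STAT). */
static const struct bit_name status_bits[] = {
    { 0x80, "BUSY_STAT" },  { 0x40, "READY_STAT" },
    { 0x20, "WRERR_STAT" }, { 0x10, "SEEK_STAT" },
    { 0x08, "DRQ_STAT" },   { 0x04, "ECC_STAT" },
    { 0x02, "INDEX_STAT" }, { 0x01, "ERR_STAT" },
};

/* Bits for HD_ERROR, highest bit first (0x80 has two meanings). */
static const struct bit_name error_bits[] = {
    { 0x80, "BBD_ERR/ICRC_ERR" }, { 0x40, "ECC_ERR" },
    { 0x20, "MC_ERR" },           { 0x10, "ID_ERR" },
    { 0x08, "MCR_ERR" },          { 0x04, "ABRT_ERR" },
    { 0x02, "TRK0_ERR" },         { 0x01, "MARK_ERR" },
};

/* Hypothetical helper: write the names of all set bits into buf,
 * separated by single spaces. Output is truncated if buf is too small. */
void decode_bits(unsigned char value, const struct bit_name *table,
                 size_t count, char *buf, size_t buflen)
{
    if (buflen == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (!(value & table[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", buflen - strlen(buf) - 1);
        strncat(buf, table[i].name, buflen - strlen(buf) - 1);
    }
}

void decode_status(unsigned char stat, char *buf, size_t buflen)
{
    decode_bits(stat, status_bits,
                sizeof status_bits / sizeof status_bits[0], buf, buflen);
}

void decode_error(unsigned char err, char *buf, size_t buflen)
{
    decode_bits(err, error_bits,
                sizeof error_bits / sizeof error_bits[0], buf, buflen);
}
```

Called with a buffer of, say, 128 bytes, decode_status(0x51, ...) fills it with "READY_STAT SEEK_STAT ERR_STAT" and decode_error(0x04, ...) with "ABRT_ERR", matching the breakdown above.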
CONCLUSIONS:
kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
This says that the drive reported an error and that the command was aborted
("Command aborted"). As far as I have found, such errors once in a while
mean nothing serious. As long as there are no "Uncorrectable ECC error"s or other
grave errors like checksum (CRC) errors, bad address marks etc., it should be nothing to worry about.
Such "aborted commands" occur e.g. when a sector is requested that is not
present on the harddisk, or with buggy drivers (i.e. the driver sent a command that the
drive did not understand).
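My rule of thumb can be written down as a small check. This is just a sketch of the reasoning in this article, not a diagnosis tool: the bit values come from hdreg.h, and the function names is_plain_abort and has_serious_error_bits are invented for this example.

```c
/* Values copied from include/linux/hdreg.h as quoted in this article. */
#define ERR_STAT  0x01
#define BUSY_STAT 0x80

#define MARK_ERR 0x01 /* Bad address mark */
#define TRK0_ERR 0x02 /* couldn't find track 0 */
#define ABRT_ERR 0x04 /* Command aborted */
#define ID_ERR   0x10 /* ID field not found */
#define ECC_ERR  0x40 /* Uncorrectable ECC error */
#define ICRC_ERR 0x80 /* CRC error during transfer / block marked bad */

/* Hypothetical helper: non-zero if any of the error bits that this
 * article treats as grave is set. */
int has_serious_error_bits(unsigned char err)
{
    return (err & (MARK_ERR | TRK0_ERR | ID_ERR | ECC_ERR | ICRC_ERR)) != 0;
}

/* Hypothetical helper: non-zero if the drive reported an error and the
 * only error bit is "Command aborted". The status test is the same one
 * ide_dump_status uses before it reads the error register. */
int is_plain_abort(unsigned char stat, unsigned char err)
{
    if ((stat & (BUSY_STAT | ERR_STAT)) != ERR_STAT)
        return 0;
    return err == ABRT_ERR;
}
```

For the messages above, is_plain_abort(0x51, 0x04) is true and has_serious_error_bits(0x04) is false. An error byte of 0x40 (Uncorrectable ECC error) gives the opposite answers.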
Further evidence that the DriveStatusError (command aborted) is harmless is that
the SmartMonTools (see Linux Harddisk Monitoring with SmartMonTools (smartctl))
don't report any non-zero RAW_VALUEs for Reallocated_Sector_Ct, Seek_Error_Rate, Reallocated_Event_Count, Offline_Uncorrectable,
UDMA_CRC_Error_Count, Multi_Zone_Error_Rate or Hardware_ECC_Recovered etc., so there were
no serious errors on the harddrive; the command was just not executed or not understood
by the disk.
Maybe the firmware on the harddrive is buggy.
If I'm in error somewhere, please let me know! It is to the benefit of every Linux user
if this page is as accurate as possible. Thanks a lot in advance!
And last but not least, don't blame me if your harddrive dies!
Last-Modified: Fri, 31 Mar 2006 22:15:32 GMT