JaySee
15/06/2007, 17h02

Also, I'd add that for those of you with hardware RAID, you need to run:
smartctl -a -d 3ware,0 /dev/twe0
and
smartctl -a -d 3ware,1 /dev/twe0
to retrieve the smartctl information.
As for the method, I think that rather than parsing smartctl's output, we can also just check its exit code. Here's what the man page says about it:
Code:
RETURN VALUES
The return values of smartctl are defined by a bitmask. If all is well with the disk, the return value (exit status) of smartctl is 0 (all bits turned off). If a problem occurs, or an error, potential error, or fault is detected, then a non-zero status is returned. In this case, the eight different bits in the return value have the following meanings for ATA disks; some of these values may also be returned for SCSI disks.
Bit 0: Command line did not parse.
Bit 1: Device open failed, or device did not return an IDENTIFY DEVICE structure.
Bit 2: Some SMART command to the disk failed, or there was a checksum error in a SMART data structure (see '-b' option above).
Bit 3: SMART status check returned "DISK FAILING".
Bit 4: SMART status check returned "DISK OK" but we found prefail Attributes <= threshold.
Bit 5: SMART status check returned "DISK OK" but we found that some (usage or prefail) Attributes have been <= threshold at some time in the past.
Bit 6: The device error log contains records of errors.
Bit 7: The device self-test log contains records of errors.
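To illustrate how that bitmask maps to messages, here is a minimal POSIX sh sketch that loops over the eight bits instead of testing each one separately. The function name `decode_smart_status` is hypothetical (it is not part of smartmontools); the bit descriptions are copied from the man page excerpt above.

```shell
#!/bin/sh
# Print the meaning of every bit set in a smartctl exit status.
# decode_smart_status is a hypothetical helper, not a smartmontools command.
# POSIX sh has no "local", so rc/bit/msg are plain global variables.
decode_smart_status() {
    rc=$1
    bit=0
    while [ "$bit" -le 7 ]; do
        # Shift the status right by $bit and keep the lowest bit
        if [ $(( (rc >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) msg="Command line did not parse." ;;
                1) msg="Device open failed, or device did not return an IDENTIFY DEVICE structure." ;;
                2) msg="Some SMART command to the disk failed, or there was a checksum error in a SMART data structure." ;;
                3) msg="SMART status check returned \"DISK FAILING\"." ;;
                4) msg="SMART status check returned \"DISK OK\" but we found prefail Attributes <= threshold." ;;
                5) msg="SMART status check returned \"DISK OK\" but we found that some (usage or prefail) Attributes have been <= threshold at some time in the past." ;;
                6) msg="The device error log contains records of errors." ;;
                7) msg="The device self-test log contains records of errors." ;;
            esac
            echo "Bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}
```

For example, `decode_smart_status 10` prints one line for bit 1 and one for bit 3, since 10 = 2 + 8.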
So, since I was enjoying this, I just wrote a shell script (I don't have PHP on all my servers):
Script smartctl.sh
Code:
#!/bin/sh
# by JC, 15-06-2007
# Update: the email address is now a parameter, so there's no need to edit the script
# Tests a hard disk with smartctl and checks its exit code

if [ "$#" -ne 3 ]; then
    echo "Usage:"
    echo "  $0 email driver device"
    echo "  email is the address to notify if an error is found"
    echo "  driver can be one of: ata 3ware,0 3ware,1 (for others, check your hardware documentation)"
    echo "  device, for example: /dev/sda or /dev/hda or /dev/twe0 ..."
    echo "  full example: $0 toto@test.com \"3ware,1\" /dev/twe0"
    exit 1
fi

if ! command -v smartctl > /dev/null 2>&1; then
    echo "smartctl not installed or not in PATH, aborting"
    exit 1
fi

NOTIFY=$1
DRIVER=$2
DEVICE=$3
DATE=`date "+%Y-%m-%d %Hh%M"`
HOST=`hostname`

# Run smartctl silently, just to get the exit code.
# -a is needed so the health status, attributes and logs are actually checked;
# with no action option smartctl only opens the device, so only bit 1 could be set.
smartctl -q silent -a -d "$DRIVER" "$DEVICE"
RES=$?

if [ "$RES" -ne 0 ]; then
    MSG="Hello\n\nOn $DATE smartctl returned code $RES for device $DEVICE ($DRIVER)"
    # Bit 0: Command line did not parse.
    # We don't care about this one!
    # Bit 1: Device open failed, or device did not return an IDENTIFY DEVICE structure.
    if [ "$((RES & 2))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 1: Device open failed, or device did not return an IDENTIFY DEVICE structure."
    fi
    # Bit 2: Some SMART command to the disk failed, or there was a checksum error in a SMART data structure.
    if [ "$((RES & 4))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 2: Some SMART command to the disk failed, or there was a checksum error in a SMART data structure."
    fi
    # Bit 3: SMART status check returned "DISK FAILING".
    if [ "$((RES & 8))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 3: SMART status check returned \"DISK FAILING\"."
    fi
    # Bit 4: SMART status check returned "DISK OK" but we found prefail Attributes <= threshold.
    if [ "$((RES & 16))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 4: SMART status check returned \"DISK OK\" but we found prefail Attributes <= threshold."
    fi
    # Bit 5: SMART status check returned "DISK OK" but some (usage or prefail) Attributes have been <= threshold in the past.
    if [ "$((RES & 32))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 5: SMART status check returned \"DISK OK\" but we found that some (usage or prefail) Attributes have been <= threshold at some time in the past."
    fi
    # Bit 6: The device error log contains records of errors.
    if [ "$((RES & 64))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 6: The device error log contains records of errors."
    fi
    # Bit 7: The device self-test log contains records of errors.
    if [ "$((RES & 128))" -ne 0 ]; then
        MSG="$MSG\nError on Bit 7: The device self-test log contains records of errors."
    fi
    MSG="$MSG\n\nHere comes the full smartctl status for this disk:\n=================================================================\n"
    MSG_END="\n=================================================================\nDate was: $DATE\n"
    # printf '%b' expands the \n escapes portably ("echo -e" is not POSIX
    # and prints a literal "-e" under shells such as dash)
    {
        printf '%b' "$MSG"
        smartctl -a -d "$DRIVER" "$DEVICE" 2> /dev/null
        printf '%b' "$MSG_END"
    } | mail -s "[$HOST] - HDD $DEVICE ($DRIVER) problem" "$NOTIFY"
    exit 2
fi
exit 0
Usage:
/path/to/the/script/smartctl.sh email driver device
Example:
/path/to/the/script/smartctl.sh toto@titi.com "3ware,0" /dev/twe0
for disk 0 of the hardware RAID.
What do you think?
Since I don't have a failing disk, testing isn't easy!!! I was able to test by passing a nonexistent disk, and I do get an error on bit 1... so the rest should work...
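One possible way around having no failing disk to test with, offered as an untested suggestion rather than something done in this thread: put a fake `smartctl` first in `PATH` that ignores its arguments and exits with a chosen status. `FAKE_DIR` and `FAKE_SMARTCTL_RC` are hypothetical names used only for this test setup.

```shell
#!/bin/sh
# Fake smartctl for testing: it prints nothing and exits with the status
# given in FAKE_SMARTCTL_RC (a hypothetical variable, default 0).
FAKE_DIR=$(mktemp -d)
trap 'rm -rf "$FAKE_DIR"' EXIT   # remove the temporary directory on exit

cat > "$FAKE_DIR/smartctl" <<'EOF'
#!/bin/sh
# Ignore all arguments; just return the requested status.
exit "${FAKE_SMARTCTL_RC:-0}"
EOF
chmod +x "$FAKE_DIR/smartctl"

# Put the fake first in PATH so it shadows the real smartctl
PATH="$FAKE_DIR:$PATH"
export PATH

# 8 = bit 3 set ("DISK FAILING")
FAKE_SMARTCTL_RC=8
export FAKE_SMARTCTL_RC

rc=0
smartctl -q silent -a -d ata /dev/sda || rc=$?
echo "smartctl returned $rc"
```

Running smartctl.sh from that same shell should then produce the Bit 3 message; the "full smartctl status" part of the email will be empty because the fake prints nothing, and `mail` is still the real one, so the email is actually sent.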
I added the following script to the daily crontab to check all the disks:
Code:
#!/bin/sh
SMARTCTL=/root/scripts/smartctl.sh
EMAIL=email@notifier.com
$SMARTCTL "$EMAIL" "3ware,0" /dev/twe0
$SMARTCTL "$EMAIL" "3ware,1" /dev/twe0
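If the controller has more ports, the individual calls in the wrapper above could be replaced by a loop. This is a minimal sketch, assuming `SMARTCTL` points at smartctl.sh as in that wrapper; `check_ports` is a hypothetical helper that returns the number of failed checks, so the wrapper's own exit status reflects any problem.

```shell
#!/bin/sh
# Hypothetical helper: run smartctl.sh on several 3ware ports of one device.
# Usage: check_ports email device port...
# Returns the number of ports whose check failed (0 if all are fine).
check_ports() {
    email=$1
    device=$2
    shift 2
    failed=0
    for port in "$@"; do
        # smartctl.sh exits 0 when the disk is fine, non-zero otherwise
        if ! "$SMARTCTL" "$email" "3ware,$port" "$device"; then
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Defaults taken from the wrapper above; override them in the environment if needed
SMARTCTL=${SMARTCTL:-/root/scripts/smartctl.sh}
EMAIL=${EMAIL:-email@notifier.com}
```

The daily job would then call, for example, `check_ports "$EMAIL" /dev/twe0 0 1`.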