http://forum.havetheknowhow.com/viewtopic.php?p=1991#p1991
I have, however, thought of a better solution, which makes the output for logging and email content much more readable and informative.
To do this, I have used the contents of the /dev/disk/by-id folder.
Firstly, in a terminal, I enter:
Code: Select all
~$ ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root 9 Aug 1 09:13 ata-Hitachi_HDS724040ALE640_PK1310PAG0VMBJ -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug 1 09:13 ata-Hitachi_HDS724040ALE640_PK1310PAG0VMBJ-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Aug 1 09:13 ata-MATSHITABD-MLT_UJ240AS_WJ42_003694 -> ../../sr0
lrwxrwxrwx 1 root root 9 Aug 1 09:13 ata-OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug 1 09:13 ata-OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Aug 1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug 1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug 1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Aug 1 09:13 dm-name-Server-root -> ../../dm-0
lrwxrwxrwx 1 root root 10 Aug 1 09:13 dm-name-Server-swap_1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Aug 1 09:13 dm-name-Server-System -> ../../dm-2
lrwxrwxrwx 1 root root 10 Aug 1 09:13 dm-uuid-LVM-Z8LZg70hTKbj7AoTEU12IP81IeP5fgLdb9h3Rs23fJ2io8zxPjbpedP4eUrC3OVw -> ../../dm-2
lrwxrwxrwx 1 root root 10 Aug 1 09:13 dm-uuid-LVM-Z8LZg70hTKbj7AoTEU12IP81IeP5fgLdG5s3dBd4rFdt34hdJoyhJxB5oCw7l6RI -> ../../dm-1
lrwxrwxrwx 1 root root 10 Aug 1 09:13 dm-uuid-LVM-Z8LZg70hTKbj7AoTEU12IP81IeP5fgLdVJQezcxe7NQVplQeVfQFMlqY4dAwq73D -> ../../dm-0
lrwxrwxrwx 1 root root 9 Aug 1 09:13 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug 1 09:13 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Aug 1 09:13 scsi-SATA_OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug 1 09:13 scsi-SATA_OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Aug 1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug 1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug 1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Aug 1 09:13 wwn-0x5000cca22bc063f2 -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug 1 09:13 wwn-0x5000cca22bc063f2-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Aug 1 09:13 wwn-0x5002538043584d30 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 1 09:13 wwn-0x5002538043584d30-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug 1 09:13 wwn-0x5002538043584d30-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug 1 09:13 wwn-0x5002538043584d30-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Aug 1 09:13 wwn-0x5e83a97edd3455aa -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug 1 09:13 wwn-0x5e83a97edd3455aa-part1 -> ../../sdc1
I'm after the ID of the physical drive sdb here, so the line:
lrwxrwxrwx 1 root root 9 Aug 1 09:13 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ -> ../../sdb
is what I'm looking for.
The path to this symbolic link is therefore:
/dev/disk/by-id/scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
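For anyone who would rather not pick the line out of the listing by eye, here is a rough sketch of a helper that prints every by-id name pointing at a given device. The function name ids_for_device and its optional second argument are my own invention for illustration, and it assumes GNU readlink (for the -f option) is available.

```shell
#!/bin/bash
# ids_for_device DEVICE [DIR]
# Print the name of every symlink in DIR (default /dev/disk/by-id)
# that resolves to the same file as DEVICE.
# NOTE: hypothetical helper, not part of the original script.
ids_for_device() {
    local target dir link
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty folder)
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}
```

Calling ids_for_device /dev/sdb on the server above should list the ata-, scsi- and wwn- names for the Hitachi drive, any of which could be used in the script.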
So the modified Thermal Shutdown script for this method will look like:
Code: Select all
#!/bin/bash
#PURPOSE: Script to check temperature of installed hard drives and report/shutdown if specified temperatures exceeded
#
# Modified for this server!!
#
# AUTHOR: feedback[AT]HaveTheKnowHow[DOT]com
# Expects three arguments:
# 1. Warning temperature
# 2. Critical shutdown temperature
# 3. If argument 3 is present then just check the drive with that ID (its name in /dev/disk/by-id)
# eg. using ./DriveTemps.sh 35 45
# will warn when temperature of one or more drives reaches 35degrees and hibernate when any one of them hits 45
# eg. using ./DriveTemps.sh 35 45 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
# will warn when temperature of that drive reaches 35degrees and hibernate when it hits 45
# NOTES:
# Change the string ">>/home/server/Logs" as required
# Substitute string "myemail@myaddress.com" with your own email address in the string which starts "/usr/sbin/ssmtp myemail@myaddress.com"
# Change the command MyList='...' to the /dev/disk/by-id names of your drives, separated by spaces. In this case I'm only checking one drive
# Assumes /usr/sbin/smartctl -n standby -a /dev/disk/by-id/$i returns the string 'Temperature_Celsius' somewhere
echo "JOB RUN AT $(date)"
echo '============================'
echo ''
echo 'Drive Warning Limit set to =>' $1
echo 'Drive Shutdown Limit set to =>' $2
echo ''
echo ''
if [ $# -eq 2 ]
then
MyList='scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ'
echo 'Testing all drives'
else
MyList="$3"
echo 'Testing only the specified drive'
fi
echo ''
for i in $MyList
do
echo 'Drive /dev/disk/by-id/'$i
/usr/sbin/smartctl -n standby -a /dev/disk/by-id/$i | grep Temperature_Celsius
done
echo ''
echo ''
for i in $MyList
do
#Check state of drive 'active/idle' or 'standby'
stra=$(/sbin/hdparm -C /dev/disk/by-id/$i | grep 'drive' | awk '{print $4}')
echo 'Testing Drive with ID: '$i
if [ "${stra}" = 'standby' ]
then
echo ' Drive with ID: '$i ' is in standby'
echo ''
else
str1='/usr/sbin/smartctl -n standby -a /dev/disk/by-id/'$i
str2=$($str1 | grep Temperature_Celsius | awk '{print $10}')
if [ ${str2} -ge $1 ]
then
echo '========================================' >>/home/server/Logs/DriveWarning.Log
echo $(date) >>/home/server/Logs/DriveWarning.Log
echo '' >>/home/server/Logs/DriveWarning.Log
echo 'WARNING: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $1 '=>' $str2 >>/home/server/Logs/DriveWarning.Log
echo '' >>/home/server/Logs/DriveWarning.Log
echo '========================================' >>/home/server/Logs/DriveWarning.Log
echo '========================================'
echo $(date)
echo ''
echo 'WARNING: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $1 '=>' $str2
echo ''
echo '========================================'
fi
if [ ${str2} -ge $2 ]
then
echo '========================================' >>/home/server/Logs/DriveWarning.Log
echo $(date) >>/home/server/Logs/DriveWarning.Log
echo '' >>/home/server/Logs/DriveWarning.Log
echo 'CRITICAL: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $2 '=>' $str2 >>/home/server/Logs/DriveWarning.Log
echo '' >>/home/server/Logs/DriveWarning.Log
echo '========================================' >>/home/server/Logs/DriveWarning.Log
echo '========================================'
echo $(date)
echo ''
echo 'CRITICAL: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $2 '=>' $str2
echo ''
echo '========================================'
# Send the email before hibernating, otherwise it only goes out after the system resumes
/usr/sbin/ssmtp ******@****** </home/server/Logs/DriveWarning.Log
echo 'Email Sent.....'
/usr/sbin/pm-hibernate
exit
else
echo ''
echo ' Temperature of Drive with ID: '$i' is OK at =>' $str2
echo ''
fi
fi
done
echo 'All Drives are within limits'
echo ''
As you can see, the script is set to hibernate the system instead of shutting down, and it writes all alerts (Warning as well as Critical) to the log file.
To enable the hibernation feature, just install pm-utils:
Code: Select all
sudo apt-get install pm-utils
Running the script should give a similar result to this:
Code: Select all
server@Server:~/Scripts$ sudo ./DriveTempShutdown.sh 40 55
[sudo] password for server:
JOB RUN AT Sat Aug 4 19:32:44 BST 2012
============================
Drive Warning Limit set to => 40
Drive Shutdown Limit set to => 55
Testing all drives
Drive /dev/disk/by-id/scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
194 Temperature_Celsius 0x0002 139 139 000 Old_age Always - 43 (Min/Max 22/47)
Testing Drive with ID: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
========================================
Sat Aug 4 19:32:47 BST 2012
WARNING: TEMPERATURE FOR DRIVE with ID: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ EXCEEDED 40 => 43
========================================
Temperature of Drive with ID: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ is OK at => 43
All Drives are within limits
All working!
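One quirk is visible in the run above: the drive trips the warning and is then reported as "OK", because the "OK" branch hangs off the critical test only. The script will also throw a test error if smartctl returns no temperature at all. Below is a minimal sketch of how the parsing and the comparison could be separated out. The function names parse_temp and classify_temp are hypothetical, and it assumes the raw value is in the 10th column, as in the output shown.

```shell
#!/bin/bash
# parse_temp: read `smartctl -a` output on stdin and print the raw value
# (10th column) of the Temperature_Celsius attribute.
# Returns 1 if the attribute is missing or the value is not a plain number.
# NOTE: hypothetical helper, not part of the original script.
parse_temp() {
    local t
    t=$(awk '$2 == "Temperature_Celsius" { print $10; exit }')
    case "$t" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$t"
}

# classify_temp TEMP WARN CRIT: print exactly one of OK, WARNING or CRITICAL.
classify_temp() {
    if [ "$1" -ge "$3" ]; then
        echo CRITICAL
    elif [ "$1" -ge "$2" ]; then
        echo WARNING
    else
        echo OK
    fi
}
```

Inside the loop this could be used along the lines of temp=$(/usr/sbin/smartctl -n standby -a "/dev/disk/by-id/$i" | parse_temp) followed by a case statement on the result of classify_temp "$temp" "$1" "$2", so each drive gets exactly one status line.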
I hope that this is as useful to others as I find it to be. Changing to the disk/by-id/ symlink prevents the script from breaking when a hardware change shuffles the sd# letters, and it also tells you exactly which drive has overheated.
Further improvements could be made by looking up the sd# device name that the drive ID points to; ie: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ -> ../../sdb. Maybe adding something along the lines of: ls -l /dev/disk/by-id | grep $i might work? Giving the volume label would also be useful; perhaps this could be grep-ed from /dev/disk/by-label using the result of the previous step (ie. sdb):
Code: Select all
server@Server:~/Scripts$ ls -l /dev/disk/by-label
total 0
lrwxrwxrwx 1 root root 10 Aug 4 18:46 4TB_Storage -> ../../sdb1
lrwxrwxrwx 1 root root 10 Aug 4 18:46 Recordings -> ../../sdc1
This is, of course, all very unnecessary and tends to make everything more complicated, but every little bit of info helps us to deal with problems more quickly.
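As a starting point, here is a rough sketch of both lookups using readlink instead of grep-ing the ls output. The function names device_for_id and labels_for_device are made up for illustration, and the label match assumes sdX-style names where partitions are the device name followed by a number (sdb1, sdb2 and so on).

```shell
#!/bin/bash
# device_for_id ID [DIR]
# Print the kernel name (eg. sdb) that the by-id link ID points to.
# DIR defaults to /dev/disk/by-id. Returns 1 if there is no such link.
# NOTE: hypothetical helper, not part of the original script.
device_for_id() {
    local dir=${2:-/dev/disk/by-id}
    [ -L "$dir/$1" ] || return 1
    basename "$(readlink -f "$dir/$1")"
}

# labels_for_device NAME [DIR]
# Print the volume labels whose links point at NAME or one of its
# numbered partitions. DIR defaults to /dev/disk/by-label.
labels_for_device() {
    local dir=${2:-/dev/disk/by-label} link part
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        part=$(basename "$(readlink -f "$link")")
        case "$part" in
            "$1"|"$1"[0-9]*) basename "$link" ;;
        esac
    done
}
```

In the script's loop, dev=$(device_for_id "$i") would give the sd# name, and labels_for_device "$dev" the label(s) to add to the log and email text.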
Food for thought, anyway.