
Thermal Shutdown and UUID

Posted: August 1st, 2012, 12:35 am
by Beeblebear
Hi!

Thanks again for the great website, it's proving to be an invaluable resource.
If I may, I'd like to post a tip about scripting with a UUID for my storage drive instead of its identity as sdb. I was going to post this as a question, as I couldn't get it to work, but in the process of writing this, I realised that there was a small typo in the script and that I wasn't running it as root. (duh!)

Anyway...

The other day I was testing a USB rescue drive on my server system to see if it booted properly, and I left it plugged in by mistake when I shut down.
I'm still early in my install of the base system (Ubuntu Server 12.04) and so upon the next reboot, I happened to be playing around with hdparm parameters, continuing from my last session. After a period of some alarm and confusion, I eventually realised that this USB drive had stolen the sda position and had forced all the rest of my disks down the table. Remembering about UUIDs, I did a little research and figured that this would be the best way to go in referencing my partitions, so I set about changing my settings and scripts to use these instead.
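
To see which sdX name a UUID currently maps to, the symlink under /dev/disk/by-uuid can be followed. This is only a minimal sketch: resolve_uuid is a hypothetical helper name, and its optional second argument (the directory to look in) is an assumption added so the lookup can be pointed somewhere else.

```shell
#!/bin/bash
# Sketch only: resolve_uuid is a hypothetical helper, not part of the original script.
# $1 = filesystem UUID
# $2 = directory holding the UUID symlinks (optional, defaults to /dev/disk/by-uuid)
resolve_uuid() {
  link="${2:-/dev/disk/by-uuid}/$1"
  if [ ! -e "$link" ]; then
    echo "No device found for UUID $1" >&2
    return 1
  fi
  # readlink -f follows the symlink to the real device node, e.g. /dev/sdb1
  readlink -f "$link"
}
```

For example, resolve_uuid 72a4c040-a1d9-40fa-9cab-0a1d1f099529 prints whichever partition node that filesystem is on at the moment, even if an extra USB drive has shifted the sdX letters.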

Here is how I went about changing my Thermal Shutdown script (DriveTempShutdown.sh) to reflect this new policy:

Code: Select all

#!/bin/bash

# PURPOSE: Script to check temperature of installed hard drives and report/shutdown if specified temperatures exceeded
#
# Modified for this server!!
#
# AUTHOR: feedback[AT]HaveTheKnowHow[DOT]com

# Expects three arguments:
#    1. Warning temperature
#    2. Critical temperature (the system hibernates when this is reached)
#    3. Optional: if argument 3 is present then just check the drive with that UUID
#    eg. using ./DriveTempShutdown.sh 35 45
#    will warn when temperature of one or more drives reaches 35 degrees and hibernate when any one of them hits 45
#    eg. using ./DriveTempShutdown.sh 35 45 72a4c040-a1d9-40fa-9cab-0a1d1f099529
#    will warn when temperature of that drive reaches 35 degrees and hibernate when it hits 45

# NOTES:
#  Change the string ">>/home/server/Logs" as required
#  Substitute your own email address for the masked one in the line which starts "/usr/sbin/ssmtp"
#  Change the command   MyList='...' to the UUIDs of the drives you want to monitor (see blkid)

# Assumes  /usr/sbin/smartctl -n standby -a /dev/disk/by-uuid/$i returns the string 'Temperature_Celsius' somewhere

echo "JOB RUN AT $(date)"
echo '============================'

echo ''
echo 'Drive Warning Limit set to =>' $1
echo 'Drive Shutdown Limit set to =>' $2
echo ''
echo ''

if [ $# -eq 2 ]
then
  MyList='72a4c040-a1d9-40fa-9cab-0a1d1f099529 bd6da063-8d37-476d-9f1c-7cc31098ffcd'
  echo 'Testing all drives'
else
  MyList=$3
  echo 'Testing only the drive with UUID '$3
fi

echo ''

for i in $MyList
do
  echo 'Drive /dev/disk/by-uuid/'$i
  /usr/sbin/smartctl -n standby -a /dev/disk/by-uuid/$i | grep Temperature_Celsius
done

echo ''
echo ''

for i in $MyList
do
 #Check state of drive 'active/idle' or 'standby'
  stra=$(/sbin/hdparm -C /dev/disk/by-uuid/$i | grep 'drive' | awk '{print $4}')

  echo 'Testing Drive with UUID '$i

  if [ ${stra} = 'standby' ]
  then
    echo '    Drive with UUID '$i 'is in standby'
    echo ''
  else

    str1='/usr/sbin/smartctl -n standby -a /dev/disk/by-uuid/'$i
    str2=$($str1 | grep Temperature_Celsius | awk '{print $10}')

    if [ ${str2} -ge $1 ]
    then

      echo '============================'                                         >>/home/server/Logs/DriveWarning.Log
      echo $(date)                                                                >>/home/server/Logs/DriveWarning.Log
      echo ''                                                                     >>/home/server/Logs/DriveWarning.Log
      echo 'WARNING: TEMPERATURE FOR DRIVE with UUID '$i 'EXCEEDED' $1 '=>' $str2 >>/home/server/Logs/DriveWarning.Log
      echo ''                                                                     >>/home/server/Logs/DriveWarning.Log
      echo '============================'                                         >>/home/server/Logs/DriveWarning.Log

      echo '============================'
      echo $(date)
      echo ''
      echo 'WARNING: TEMPERATURE FOR DRIVE with UUID '$i 'EXCEEDED' $1 '=>' $str2
      echo ''
      echo '============================'

    fi

    if [ ${str2} -ge $2 ]
    then

      echo '============================'
      echo $(date)
      echo ''
      echo 'CRITICAL: TEMPERATURE FOR DRIVE with UUID '$i 'EXCEEDED' $2 '=>' $str2
      echo ''
      echo '============================'

      echo '============================'                                          >>/home/server/Logs/DriveWarning.Log
      echo $(date)                                                                 >>/home/server/Logs/DriveWarning.Log
      echo ''                                                                      >>/home/server/Logs/DriveWarning.Log
      echo 'CRITICAL: TEMPERATURE FOR DRIVE with UUID '$i 'EXCEEDED' $2 '=>' $str2 >>/home/server/Logs/DriveWarning.Log
      echo ''                                                                      >>/home/server/Logs/DriveWarning.Log
      echo '============================'                                          >>/home/server/Logs/DriveWarning.Log
      # Send the email first: pm-hibernate does not return until the system resumes
      /usr/sbin/ssmtp ******@*******.com </home/server/Logs/DriveWarning.Log
      echo 'Email Sent.....'
      /usr/sbin/pm-hibernate
      exit
    else

      echo ''
      echo '    Temperature of Drive with UUID '$i' is OK at =>' $str2
      echo ''
    fi
  fi
done

echo 'All Drives are within limits'
echo ''
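
One quirk of the script above: the warning and critical limits are checked in two separate if blocks, and the "is OK" message sits in the else branch of the critical check, so a drive between the two limits gets both a WARNING and an "is OK" line. A minimal sketch of one way to make the three states mutually exclusive follows. classify_temp is a hypothetical helper and its argument order is an assumption, not something from the original script.

```shell
#!/bin/bash
# Sketch only: classify_temp is a hypothetical helper.
# Usage: classify_temp TEMP WARN_LIMIT CRIT_LIMIT
# Prints exactly one of OK, WARNING or CRITICAL.
classify_temp() {
  if [ "$1" -ge "$3" ]; then
    echo CRITICAL          # at or above the critical limit
  elif [ "$1" -ge "$2" ]; then
    echo WARNING           # at or above the warning limit, below critical
  else
    echo OK                # below both limits
  fi
}
```

The main loop could then branch on the single word returned, for instance with a case statement, instead of running two independent tests.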


The first drive [72a4c040-a1d9-40fa-9cab-0a1d1f099529] is the only one which needs to be monitored, but the second (an SSD) [bd6da063-8d37-476d-9f1c-7cc31098ffcd] was included to check that the script accepted the UUIDs as list values and looped over both listed drives.

Code: Select all

server@Server:~$ sudo blkid

/dev/sda1: UUID="38C2-743C" TYPE="vfat"
/dev/sda2: UUID="cc1e567d-7a33-41a4-8c36-b2885a6aa6cc" TYPE="ext2"
/dev/sda3: UUID="2pjRqc-cuZc-3l0G-z1so-oOni-EB8a-Oyn1oq" TYPE="LVM2_member"
/dev/sdb1: LABEL="4TB_Storage" UUID="72a4c040-a1d9-40fa-9cab-0a1d1f099529" TYPE="ext4"
/dev/sdc1: LABEL="Recordings" UUID="bd6da063-8d37-476d-9f1c-7cc31098ffcd" TYPE="ext4"
/dev/mapper/Server-root: UUID="d5d8b383-162d-4cbe-8d93-0ed8a940370f" TYPE="ext4"
/dev/mapper/Server-swap_1: UUID="b24830bb-4f1c-4868-b623-a76b7af28142" TYPE="swap"
/dev/mapper/Server-System: UUID="17f07fbf-a141-4fbe-bf30-f74b7987125d" TYPE="ext4"
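
If the drives carry labels, as sdb1 and sdc1 do here, the UUID can be pulled out of blkid output by label rather than copied by hand. The following is a rough sketch that assumes the line format shown above. uuid_for_label is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: uuid_for_label is a hypothetical helper.
# Reads blkid-style lines on stdin and prints the UUID of the first entry
# whose LABEL equals $1. Prints nothing if no entry matches.
uuid_for_label() {
  awk -v want="$1" '
    index($0, " LABEL=\"" want "\"") {
      # The leading space keeps this from matching tags that merely end in UUID
      if (match($0, / UUID="[^"]*"/)) {
        # Skip the 7 characters of [space]UUID=" and drop the closing quote
        print substr($0, RSTART + 7, RLENGTH - 8)
        exit
      }
    }'
}
```

It would be used as sudo blkid | uuid_for_label 4TB_Storage. For a known partition, blkid -s UUID -o value /dev/sdb1 prints the UUID directly.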


Anyway, here is the result of running the script:

Code: Select all

server@Server:~/Scripts$ sudo ./DriveTempShutdown.sh 35 45
JOB RUN AT Tue Jul 31 23:45:14 BST 2012
============================

Drive Warning Limit set to => 35
Drive Shutdown Limit set to => 45


Testing all drives

Drive /dev/disk/by-uuid/72a4c040-a1d9-40fa-9cab-0a1d1f099529
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 22/47)
Drive /dev/disk/by-uuid/bd6da063-8d37-476d-9f1c-7cc31098ffcd
194 Temperature_Celsius     0x0022   128   129   000    Old_age   Always       -       128 (Min/Max 127/129)
231 Temperature_Celsius     0x0013   100   100   010    Pre-fail  Always       -       0


Testing Drive with UUID 72a4c040-a1d9-40fa-9cab-0a1d1f099529
============================
Tue Jul 31 23:45:16 BST 2012

WARNING: TEMPERATURE FOR DRIVE with UUID 72a4c040-a1d9-40fa-9cab-0a1d1f099529 EXCEEDED 35 => 40

============================

    Temperature of Drive with UUID 72a4c040-a1d9-40fa-9cab-0a1d1f099529 is OK at => 40

Testing Drive with UUID bd6da063-8d37-476d-9f1c-7cc31098ffcd
./DriveTempShutdown.sh: line 70: [: too many arguments
./DriveTempShutdown.sh: line 89: [: too many arguments

    Temperature of Drive with UUID bd6da063-8d37-476d-9f1c-7cc31098ffcd is OK at => 128 0

All Drives are within limits


As you can see, the script works fine until it reaches the SSD. That drive reports two Temperature_Celsius attributes (194 and 231), so the script picks up two values, 128 and 0, instead of one, and the numeric tests fail with "too many arguments"!
Obviously though, SSDs aren't going to have to be monitored for temperature in this way, so I didn't include it in the script on my system.
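
For anyone who does want the script to survive a drive like this, one possible approach is to read only attribute 194 and to check that the result is a single whole number before comparing it. This is a sketch under those assumptions. drive_temp and is_int are hypothetical helper names, not part of the original script.

```shell
#!/bin/bash
# Sketch only: drive_temp and is_int are hypothetical helpers.

# Reads `smartctl -a` output on stdin and prints the raw value (10th column)
# of attribute 194 only, so a second Temperature_Celsius row such as 231
# cannot end up in the result.
drive_temp() {
  awk '$1 == 194 && $2 == "Temperature_Celsius" { print $10; exit }'
}

# Succeeds only if $1 is a non-empty string of digits, which makes it safe
# to use in a [ ... -ge ... ] test.
is_int() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

In the script this would replace the grep and awk pair, along the lines of str2=$($str1 | drive_temp), followed by an is_int "$str2" check. Note that the SSD's reported 128 would then be compared as a real temperature and would exceed any sensible limit, so leaving that drive out of MyList is still the safer choice.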

Note that the script is set to hibernate the system instead of shutting down, and that it sends all output (Warning as well as Critical) to the log file.
To enable the hibernation feature, just install pm-utils:

Code: Select all

sudo apt-get install pm-utils
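
Since the script calls /usr/sbin/pm-hibernate by its full path, it may be worth confirming that the binary is really there before relying on it. Below is a minimal sketch that falls back to a plain shutdown when it is missing. pick_power_cmd is a hypothetical helper, and the fallback command is an assumption rather than anything from the script above.

```shell
#!/bin/bash
# Sketch only: pick_power_cmd is a hypothetical helper.
# $1 optionally overrides the path to check (defaults to /usr/sbin/pm-hibernate).
# Prints the command line that should be run at the critical temperature.
pick_power_cmd() {
  hib=${1:-/usr/sbin/pm-hibernate}
  if [ -x "$hib" ]; then
    echo "$hib"
  else
    # Assumed fallback: halt the machine if hibernation is unavailable
    echo "/sbin/shutdown -h now"
  fi
}
```

The critical branch could then run $(pick_power_cmd) in place of the hard-coded path.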


I have been recording and modifying a walk-through for my own reference, detailing all of the steps I have taken during my server installation so far (like the one above), so perhaps I can post it on this site when I get the opportunity.

I'm not sure that the Tips section would be the best place for it though, as it may be a bit raw and non-specific. What do you think?

EDIT: A version of this method, using disk/by-id instead of uuid, is here: http://forum.havetheknowhow.com/viewtopic.php?p=1995#p1995.

Re: Thermal Shutdown and UUID

Posted: August 1st, 2012, 9:15 am
by Ian
This is a great tip, thank you. :thumbup:

I've been meaning to look into UUIDs (I don't use them myself) and this thread will serve as a gentle reminder for me! ;)

Thanks again.

Ian.

ps. Yeah, please feel free to post as much as you like :clap:

Re: Thermal Shutdown and UUID

Posted: August 1st, 2012, 9:25 am
by Beeblebear
Morning!

I thought of a better way of doing this last night, but it was getting very late and so I didn't follow it through.

I will post it in its own thread and link to it here.

Re: Thermal Shutdown and UUID

Posted: August 1st, 2012, 9:29 am
by Ian
I'll look forward to that cos how you've done it above is how I would have done it :crazy:

Re: Thermal Shutdown and UUID

Posted: August 1st, 2012, 10:43 am
by Beeblebear
I've posted another version of this at:

viewtopic.php?f=12&t=471

It doesn't quite work yet, but I'll edit it when I figure it out.