Tag Archives: pegasus

Promise Pegasus2: Scripting a SMART check with promiseutil

We’ve found that the Promise Pegasus2 Thunderbolt 2 RAID can report that the SMART Health status of its disks is just dandy, while the unit is quietly accumulating ATA errors that may indicate the pending failure of a disk.

I want to be notified if the Pegasus either has a SMART status failure or if ATA errors are present on any of the disks.

This script does just that. It’s essentially a more refined version of the previous promiseutil scripts that grabs the simple SMART status of each disk, greps to see if it’s “OK”, then runs a line of awk that looks at the report to see if there’s an “ATA Error Count”. As always, it logs to system.log and optionally sends error reports by email.

#!/bin/bash
#
# promise_smart_check.sh
#
# Checks Promise Pegasus2 SMART status, checks for ATA errors, logs and mails the output if there's an issue.
#
# Author: AB @ Modest Industries
#
# Requires Promise Utility for Pegasus2 (http://www.promise.com), tested with v3.18.0000.18
# Requires sendemail for email alerts (http://caspian.dotconf.net/menu/Software/SendEmail/)
#
# Edit History
# 2014-04-21 - AB: Version 1.0.
# 2014-04-24 - AB: Refactored.
# 2014-05-01 - AB: Incorporate the awk script to check for ATI errors.
# 2014-05-08 - AB: Refinements.
# 2014-05-15 - AB: Update to message body construction, tmp file & sendemail sanity checks.
# 2014-05-17 - AB: Added promiseutil path check.

export DATESTAMP=`date +%Y-%m-%d\ %H:%M:%S`

# Editable variables

# Path to sendemail
sendemail_path="/Library/Scripts/Monitoring/sendemail"

# Send email alerts?
send_email_alert=true

# Variables for sendemail
# Sender's address
alert_sender="[email protected]"

# Recipient's addresses, comma separated.
#alert_recipient='[email protected], [email protected]'
alert_recipient="[email protected]"

# SMTP server to send the messages through
alert_smtp_server="smtp.example.com"

# ------------ You probably shouldn't edit below this line ------------------
# Variables

# Default the error flags to false.
smart_error_flag="false"
ata_error_flag="false"

# Alert subject
alert_subject="ALERT: Promise Pegasus2 SMART problem detected on $HOSTNAME."

# Alert header
alert_header="At $DATESTAMP, a problem was detected on this device:\n"

# Pass / Fail messages
pass_msg="Promise Pegasus SMART check successful."
fail_msg=" *** Promise Pegasus SMART check FAILED!!! ***"

# Default the message body
message_body=""

# Alert footer
alert_footer="Run 'promiseutil -C smart -v' for more information."

# Promise Pegasus command line utility default path
promiseutil_path="/usr/bin/promiseutil"

# ----------------- Check for promiseutil, sendemail & set up temp files ------------------
if [ ! -f $promiseutil_path ]; then
        echo "$0 ERROR: $promiseutil_path does not exist"
        echo  "Please download and install the Promise Pegasus Utility app from http://promise.com"
        exit 1
fi

if [ ! -f $sendemail_path ]; then
        echo "$0 ERROR: $sendemail_path does not exist"
        echo  "Please download from http://caspian.dotconf.net/menu/Software/SendEmail/ and then set the \$sendmemail_path variable inside this script"
        exit 1
fi

unit_ID_tmp=`mktemp -q "/tmp/$$_unit_ID.XXXX"`
if [ $? -ne 0 ]; then
        echo "$0: ERROR: Can't create temp file, exiting..."
        exit 1
fi

smart_results_tmp=`mktemp -q "/tmp/$$_smart_results.XXXX"`
if [ $? -ne 0 ]; then
        echo "$0: ERROR: Can't create temp file, exiting..."
        exit 1
fi

# ----------------- Run promiseutil, evaluate the results ------------------

# Get Unit ID information for this Promise unit. Includes workaround for promiseutil tty issue.
screen -D -m sh -c "$promiseutil_path -C subsys -v >$unit_ID_tmp"

# Drop the output into a variable.
unit_ID=$(<$tmpdir$unit_ID_tmp)

# Get the SMART report, put it into a tmp file.
screen -D -m sh -c "$promiseutil_path -C smart -v >$smart_results_tmp"

# Grab the header for each PdId in the Promise
smart_status=$(cat $smart_results_tmp | grep -A4 "^PdId")

# Check the header to see if SMART Health Check reports a problem
if grep "^SMART Health Status:" <<< "$smart_status" | grep -qv "OK"
then
        smart_error_flag="true"
fi

# Check for ATA errors, which may indicate that the drive is failing even if SMART Health is OK
ata_errors=$(awk '/^PdId: [1-9][0-9]*/ \
                                { a=$0; n=4; next } \
                                n { --n; a=a "\n" $0; next } \
                                /^ATA Error Count*/ \
                                { ata_err=$0; print a "\n" ata_err "\n" }' \
                                "$smart_results")
# Flag if there were ATA errors
if [ "$ata_errors" != "" ]; then
        ata_error_flag="true"
fi

# ----------------- Build the message_body ------------------

# If there's a problem, build the header.
if [ "$smart_error_flag" ==  "true" ] || [ "$ata_error_flag" == "true" ]; then
        message_body="$alert_header\n\n$fail_msg\n\n$unit_ID\n\n"

        # SMART Health status.
        if [ "$smart_error_flag" == "true" ]; then
                message_body="$message_body\nSMART Health Status is reporting one or more bad drives."
        fi

        # Always include the smart_status
        message_body="$message_body\n\n$smart_status"

        # Then the ATA errors.
        if [ "$ata_error_flag" == "true" ]; then
                message_body="$message_body\n\nOne or more drives has an ATA Error Count and may be failing.\n\n$ata_errors"
        fi
fi

#  ----------------- Logging & email ------------------

# Log the results, conditionally send email on failure.
if [ "$ata_error_flag" == "true" ] || [ "$smart_error_flag" == "true" ]; then
        message_body="$message_body\n\n$alert_footer"
        echo "$DATESTAMP: \n\n$message_body" >> /var/log/system.log
        if [ "$send_email_alert" == "true" ] ; then
                "$sendemail_path" -f $alert_sender -t $alert_recipient -u $alert_subject -m "$message_body" -s $alert_smtp_server
        fi
else
        echo "$DATESTAMP: $pass_msg\n\n$unit_ID" >> /var/log/system.log
fi

# ----------------- Cleanup ------------------

rm -f rm -f $unit_ID_tmp $smart_results_tmp


This version of the script checks for the presence of promiseutil and sendemail. We call screen here because the promiseutil seems to need a TTY in order to run properly.

Hope you find it useful.