Promise Pegasus2: Scripting an Enclosure check with promise_enclosure_check.sh

The Promise Pegasus2 has onboard sensors that monitor the power supply voltages, the fan speed, and the temperature of the controller and backplane.

These seem worth checking from time to time.

The example script below runs an initial check of the enclosure using promiseutil. If it doesn’t find that “Everything is OK”, it runs a more verbose check, logs the problem and optionally sends email.

#!/bin/bash
#
# promise_enclosure_check.sh
#
# Checks the status of a Promise Pegasus2 RAID enclosure and mails the output if there's an issue.
#
# Author: AB @ Modest Industries
#
# Works with Promise Utility for Pegasus2 v3.18.0000.18 (http://www.promise.com)
# Requires sendemail for email alerts (http://caspian.dotconf.net/menu/Software/SendEmail/)
#
# Edit History
# 2014-04-21 - AB: Version 1.0.
# 2014-05-08 - AB: Refinements.
# 2014-05-09 - AB: Better message_body if failed.

export DATESTAMP=$(date "+%Y-%m-%d %H:%M:%S")

# Editable variables

# Path to sendemail
sendemail_path="/Library/Scripts/Monitoring/sendemail"

# If a problem is found, send email?
send_email_alert=true

# Variables for sendemail
# Sender's address
alert_sender="[email protected]"

# Recipient's addresses, comma separated.
#alert_recipient='[email protected], [email protected]'
alert_recipient="[email protected]"

# SMTP server to send the messages through
alert_smtp_server="smtp.example.com"

# ------------ Do not edit below this line ------------------
# Variables

# Pass / fail flags
enclosure_pass=true

# The subject line of the alert.
alert_subject="Alert: Promise Pegasus2 enclosure problem detected on $HOSTNAME."

# Alert header
alert_header="At $DATESTAMP, an enclosure problem was detected on this device:\n"

# Pass / Fail messages
pass_msg="Promise Pegasus Enclosure check successful."
fail_msg=" *** Promise Pegasus Enclosure check FAILED!!! ***\n\n"

# Alert footer
alert_footer="Run 'promiseutil -C enclosure -v' for more information."

# Create temp files
unit_ID_tmp=$(mktemp "/tmp/$$_unit_ID.XXXX")
enclosure_results_tmp=$(mktemp "/tmp/$$_enclosure_results.XXXX")

message_body="$alert_header"

# Get the information for this Promise unit. Includes workaround for promiseutil tty issue.
screen -D -m sh -c "promiseutil -C subsys -v >$unit_ID_tmp"

# Drop the output into a variable.
unit_ID=$(<$unit_ID_tmp)

# Get the report, put it into a tmp file.
screen -D -m sh -c "promiseutil -C enclosure >$enclosure_results_tmp"

# Fail unless the report contains "Everything is OK".
if ! grep -q "Everything is OK" "$enclosure_results_tmp"
then
        enclosure_pass="false"
        # Get a more detailed report, put it into a tmp file.
        screen -D -m sh -c "promiseutil -C enclosure -v >$enclosure_results_tmp"

        # Build the message.
        message_body=$message_body$fail_msg$unit_ID$(<$enclosure_results_tmp)
fi

#  ----------------- Logging & email ------------------

# Log the results, conditionally send email on failure.
if [ "$enclosure_pass" == "false" ]; then
        message_body="$message_body\n\n$alert_footer"
        printf '%b\n' "$DATESTAMP: \n\n$message_body" >> /var/log/system.log
        if [ "$send_email_alert" == "true" ] ; then
                "$sendemail_path" -f "$alert_sender" -t "$alert_recipient" -u "$alert_subject" -m "$message_body" -s "$alert_smtp_server"
        fi
else
        echo "$DATESTAMP: $pass_msg" >> /var/log/system.log
fi
# Cleanup
rm -f "$unit_ID_tmp" "$enclosure_results_tmp"

The script was developed against a Promise Pegasus2. It hasn’t been tested with the earlier Promise Pegasus series.

2014-11-07 – Update: Merci to Stéphane Allain for catching a typo in the script.

Promise Pegasus2: The gap between a failing disk and a failed disk.

We were recently called in to diagnose a relatively new Promise Pegasus2 R6 that intermittently refused to mount. The Promise Utility app reported nothing amiss with the RAID or the drives, green lights everywhere, so we used the command line to dig a little deeper.

We ran a verbose SMART check on the unit:

promiseutil -C smart -v

The first three drives checked out. Drive 4 indicated that SMART thought everything was fine:

PdId: 4
Model Number: TOSHIBA DT01ACA2
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK

But a little further down, the log showed CRC errors:

Error 165 occurred at disk power-on lifetime: 1176 hours (49 days + 0 hours)
  When the command that caused the error occurred,
  the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 50 b0 ee 81 0d

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 80 a8 80 ee 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 a0 00 ee 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 98 80 ed 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 90 00 ed 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 88 80 ec 81 40 00      18:38:48.275  WRITE FPDMA QUEUED

Error 164 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
  When the command that caused the error occurred,
  the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 10 f0 ad 6b 0d  Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 80 80 ad 6b 40 00      18:36:07.145  WRITE DMA EXT
  35 00 80 00 ae 6b 40 00      18:36:07.144  WRITE DMA EXT
  35 00 80 00 ad 6b 40 00      18:36:07.144  WRITE DMA EXT
  35 00 80 80 ab 6b 40 00      18:36:07.139  WRITE DMA EXT
  35 00 80 00 ab 6b 40 00      18:36:07.139  WRITE DMA EXT

Error 163 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
  When the command that caused the error occurred,
  the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 f0 10 5e 5d 0d  Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
...

The client confirmed that they’d seen a warning light on drive 4, but that it had “gone away”. We had them back up the data immediately. Promise support subsequently verified from the logs that the drive had failed and sent out a replacement.

If the drive had failed completely, I assume the RAID would have kicked in, taken the bad drive offline and kept running. Because the drive hadn’t actually failed, the volume was struggling with a failing member, and that was causing the boot and performance issues.

The take-away is that there’s a generous gap between a drive that’s beginning to fail and a drive that’s failed enough for the Promise Utility app to detect it. Verbose mode is your friend.
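
That gap can be checked for in a script. Here is a minimal sketch, assuming the verbose SMART report looks like the excerpt above: a "PdId:" line starts each drive, and each logged CRC error carries "Error: ICRC" on its register line. The function name count_icrc_errors is mine, not part of promiseutil.

```shell
#!/bin/sh
# count_icrc_errors: read a verbose SMART report on stdin and print
# "PdId <id>: <n> ICRC error(s)" for every drive that has at least one
# logged "Error: ICRC" entry. The report layout is an assumption based
# on the excerpt above.
count_icrc_errors() {
    awk '
        $1 == "PdId:" { id = $2 }        # remember which drive this section covers
        /Error: ICRC/ { count[id]++ }    # tally CRC errors against that drive
        END {
            for (d in count)
                printf "PdId %s: %d ICRC error(s)\n", d, count[d]
        }
    ' | sort
}

# Example run against a shortened copy of the report above.
count_icrc_errors <<'EOF'
PdId: 4
Model Number: TOSHIBA DT01ACA2
SMART Health Status: OK
  84 51 10 f0 ad 6b 0d  Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688
  84 51 f0 10 5e 5d 0d  Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
EOF
```

On a live unit the input would be the output of promiseutil -C smart -v, captured to a file with the screen workaround from the enclosure script. Any output at all from this function is a reason to look closer, even while SMART Health Status still reads OK.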

Turn off Promise Pegasus2 power savings one-liner.

From Terminal:

promiseutil -C ctrl -a mod -s "powersavinglevel=0"

Now check your handiwork:

promiseutil -C ctrl -v | grep PowerSavingLevel

You should see:

PowerSavingLevel: 0
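
If you want to verify the setting from a script rather than by eye, something like the following could work. It is only a sketch: it assumes the controller output contains the text "PowerSavingLevel:" followed by a number, as shown above, and the function name power_saving_level is made up for this example.

```shell
#!/bin/sh
# power_saving_level: read controller settings on stdin and print the
# number that follows "PowerSavingLevel:". Prints nothing if the field
# is absent. The "Name: value" layout is assumed from the output above.
power_saving_level() {
    sed -n 's/.*PowerSavingLevel:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example: report whether power saving is off in a captured line.
level=$(echo "PowerSavingLevel: 0" | power_saving_level)
if [ "$level" = "0" ]; then
    echo "Power saving is off."
else
    echo "Power saving level is '${level:-unknown}'."
fi
```

The pattern matches the field anywhere on a line, so it should still work if the verbose output prints two fields per row.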

You may be tempted to turn off PowerManagement. Here’s what Promise’s website has to say about that:

Do not disable Power Management. Doing so will make the Pegasus2 hang when the Mac is in sleep or shut down and restarted and will require the Pegasus and the Mac to be power cycled to return to normal operation.

Promise Pegasus2 command line tools.

When I began deploying Promise Pegasus2 storage devices, I wasn’t happy with the state of the Promise Utility app. It doesn’t provide email alerts except when a user is logged in, which isn’t optimal for most of our deployments.

Then I stumbled on a couple of Ruby scripts by GriffithStudio that showed a way around many of the limitations of the Promise GUI.

When you install the Promise Utility for Promise Pegasus2, it includes a command line utility. You can view status and even change the settings of the device. In a Terminal window, type:

promiseutil

You’ll be greeted with an interactive command line.

-------------------------------------------------------------
Promise Utility
Version: 3.18.0000.18 Build Date: Oct 29, 2013
-------------------------------------------------------------
 
List available RAID HBAs and Subsystems
=============================================================
Type  #    Model         Alias                            WWN                 
=============================================================
hba   1  * Pegasus2 R4                                    2000-0001-5557-98bf 
 
Totally 1 HBA(s) and 0 Subsystem(s)
 
-------------------------------------------------------------
The row with '*' sign refers the current working HBA/Subsystem path
To change the current HBA/Subsystem path, you may use the following command:
  
  spath -a chgpath -t hba|subsys -p <path #>.
 
Type help or ? to display all the available commands
-------------------------------------------------------------
 
cliib> 

To get a list of commands, type ? and press return. Some of the available commands include:

subsys - Model, serial number, hardware revision.
enclosure - Enclosure status.
ctrl - Firmware version, array & RAID status.
phydrv - Physical drive status.
array - Array status.
logdrv - Logical drive status.
event - Event log, including abnormal shutdowns.

Many of the commands yield a brief Pass/Fail style response:

cliib> enclosure 
=============================================================
Id  EnclosureType               OpStatus  StatusDescription                    
=============================================================
1   Pegasus2-R4                 OK        Everything is OK

If you want more details, you can add the verbose flag, -v. Want the serial number, model and hardware revision?

cliib> subsys -v
 
-------------------------------------------------------------
Alias: 
Vendor: Promise Technology,Inc.        Model: Pegasus2 R4
PartNo: F29DS4722000000                SerialNo: M00H00CXXXXXXXX
Rev: B3                                WWN: 2000-0001-5557-98bf

You can grab enclosure information, including temperature of box, backplane and controller card, as well as the rotation speed of the fans and voltage on the power rails with this:

cliib> enclosure -v
 
 
-------------------------------------------------------------
Enclosure Setting:
 
EnclosureId: 1
CtrlWarningTempThreshold: 63C/145F     CtrlCriticalTempThreshold: 68C/154F
 
 
-------------------------------------------------------------
Enclosure Info and Status:
 
EnclosureId: 1
EnclosureType: Pegasus2-R4
SEPFwVersion: 1.00
MaxNumOfControllers: 1                 MaxNumOfPhyDrvSlots: 4
MaxNumOfFans: 1                        MaxNumOfBlowers: 0
MaxNumOfTempSensors: 2                 MaxNumOfPSUs: 1
MaxNumOfBatteries: 0                   MaxNumOfVoltageSensors: 3
 
=============================================================
PSU       Status                        
=============================================================
1         Powered On and Functional     
 
=============================================================
Fan Location        FanStatus             HealthyThreshold  CurrentFanSpeed 
=============================================================
1   Backplane       Functional            > 1000 RPM        1200 RPM        
 
=============================================================
TemperatureSensor   Location       HealthThreshold   CurrentTemp    Status    
=============================================================
1                   Controller     < 63C/145F        49C/120F       normal    
2                   Backplane      < 53C/127F        47C/116F       normal    
 
=============================================================
VoltageSensor  Type    HealthyThreshold         CurrentVoltage  Status         
=============================================================
1              3.3V    +/- 5% (3.13 - 3.46) V   3.2V            Operational    
2              5.0V    +/- 5% (4.75 - 5.25) V   5.0V            Operational    
3              12.0V   +/- 8% (11.04 - 12.96) V 12.0V           Operational 
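
The temperature table lends itself to a quick headroom calculation. The sketch below assumes sensor rows shaped exactly like the two above (id, location, "<", threshold, current temperature, status); the function name temp_margins is hypothetical.

```shell
#!/bin/sh
# temp_margins: read `enclosure -v` output on stdin and, for each
# temperature sensor row, print the location, current temperature,
# threshold and remaining headroom in degrees Celsius.
# Assumes rows shaped like: "1  Controller  < 63C/145F  49C/120F  normal".
temp_margins() {
    awk '
        $3 == "<" && $4 ~ /^[0-9]+C\// && $5 ~ /^[0-9]+C\// {
            limit = $4 + 0      # numeric prefix only, e.g. "63C/145F" -> 63
            now   = $5 + 0
            printf "%s: %dC now, %dC limit, %dC headroom\n", $2, now, limit, limit - now
        }
    '
}

# Example run against the temperature table above.
temp_margins <<'EOF'
=============================================================
TemperatureSensor   Location       HealthThreshold   CurrentTemp    Status    
=============================================================
1                   Controller     < 63C/145F        49C/120F       normal    
2                   Backplane      < 53C/127F        47C/116F       normal    
EOF
```

Fan and voltage rows are skipped because their third field is not "<". Logging the headroom over time would show a unit running hotter before any threshold is crossed.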

How about the state of the physical drives?

cliib> phydrv   
=============================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     
=============================================================
1    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot1   OK        Array0 No.0      
2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      
3    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      
4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3
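
A table like this is easy to scan by eye, but a script can do it too. This is a rough sketch that assumes the layout above and a one-word OpStatus; I have only seen "OK" in that column, so treat anything else it reports as a prompt to investigate. The function name drives_not_ok is invented for this example.

```shell
#!/bin/sh
# drives_not_ok: read `phydrv` output on stdin and print the PdId and
# OpStatus of every drive whose OpStatus is not "OK". Because the Model
# and Type columns contain spaces, the status is located as the field
# right after the "SlotN" field rather than by a fixed column number.
drives_not_ok() {
    awk '
        $1 ~ /^[0-9]+$/ {
            for (i = 2; i < NF; i++)
                if ($i ~ /^Slot[0-9]+$/) {
                    if ($(i + 1) != "OK")
                        printf "PdId %s: %s\n", $1, $(i + 1)
                    break
                }
        }
    '
}

# Example run against the four rows above.
sample='1    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot1   OK        Array0 No.0
2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1
3    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2
4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3'
bad=$(printf '%s\n' "$sample" | drives_not_ok)
if [ -z "$bad" ]; then
    echo "All drives report OK."
else
    echo "$bad"
fi
```

Keep in mind that an OK here only reflects what the controller reports; a drive can show OK in this table while its verbose SMART log is filling up with errors.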

You can also run commands without entering interactive mode, which is useful in bash scripts. Add the -C flag, followed by the command you want to run. For example:

krieger:~ admin$ promiseutil -C logdrv -v

will let you view the logical drives.

Many of these commands will run on the previous generation Promise Pegasus units.

Be aware: it’s possible to change the configuration of your Pegasus2 or even destroy your RAID setup from the command line, so use caution when working on production systems.