Tag Archives: promiseutil

Sometimes, you stumble on the right person…

…and they reveal to you a bit of magic you didn’t know existed.

In this case, it is an undocumented flag in Promise Technology’s command line utility for the Promise Pegasus2 Thunderbolt RAID, promiseutil.

As previously discussed, it appeared to be impossible to check the status of more than one Promise Pegasus enclosure from inside a script using promiseutil. We had filed a support ticket, hoping for some kind of resolution, but were told that promiseutil works as intended.

On a hunch, I reached out to someone at Promise and asked for their help investigating this issue. I was pleasantly surprised when the contact not only took the issue seriously but immediately looped in other support engineers to look at the problem.

After a week of back and forth about what an appropriate solution would be (perhaps a feature request), the support engineer discovered an undocumented flag that lets you specify the HBA of the Promise unit you want to run a command on.

Here’s an example. Let’s say we want to check the SMART status of two Promise Pegasus units from the command line:

promiseutil -C smart -v

will return the information for the default device only.

If you want to be explicit about which Promise Pegasus you’re checking, first get the hba numbers of the connected units:

promiseutil -C spath

The results will be something like this:

archer:~ admin$ promiseutil -C spath
=================================================
Type  #    Model        Alias   WWN          Seq
=================================================
hba   1  * Pegasus2 R4       2000-0001-5558-2fe2  1
hba   2    Pegasus2 M4       2000-0001-5558-3f92  1

Now we use the magic (apparently undocumented) -P (uppercase, not the documented lowercase) flag to specify the unit we want to look at.

promiseutil -T hba -P 1 -C smart -v

which returns the results for the first unit.

promiseutil -T hba -P 2 -C smart -v

will return results for the second unit.
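To check every attached unit in one pass, the two steps above can be combined. What follows is a minimal sketch rather than a tested tool: it assumes the spath output looks like the sample above (rows beginning with "hba" followed by the unit number), and it reuses the screen workaround from the scripts elsewhere on this page. The helper names run_promiseutil and list_hba_numbers are my own, not part of promiseutil.

```shell
#!/bin/bash
# Sketch: run a verbose SMART check against every HBA that spath reports.

# Reads `promiseutil -C spath` output on stdin, prints one HBA number per line.
# Assumes data rows start with the word "hba", as in the sample output.
list_hba_numbers() {
    awk '$1 == "hba" { print $2 }'
}

# Runs promiseutil detached under screen (the tty workaround) and prints
# whatever it wrote. Arguments must not contain spaces or quotes.
run_promiseutil() {
    local tmp
    tmp=$(mktemp "/tmp/promiseutil.XXXXXX") || return 1
    screen -D -m sh -c "promiseutil $* >'$tmp'"
    cat "$tmp"
    rm -f "$tmp"
}

# Only attempt the live check on a machine that has promiseutil installed.
if command -v promiseutil >/dev/null 2>&1; then
    for hba in $(run_promiseutil -C spath | list_hba_numbers); do
        echo "=== hba $hba ==="
        run_promiseutil -T hba -P "$hba" -C smart -v
    done
fi
```

Swap "smart -v" for any other read-only command (enclosure, phydrv and so on) to survey all units the same way.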

My sincere thanks to the people at Promise who helped us sort this out (you know who you are) and to my fellow bug wrestler, Allen Hancock of Watchman Monitoring.

As always, be cautious with promiseutil. Its power is mighty and Bad Things® can happen if used incorrectly.

Scanning more than one Promise device with promiseutil

So, comes the day when you have more than a single Promise Pegasus attached to a Mac and you’d like to leverage some of your utilities to check the second device.

“No problem,” you think, “I’ll just count the number of devices, then check each one in sequence.”

Except…

promiseutil is broken in one very important way.

From inside promiseutil, the command to switch to the second unit in the chain would be something like:

spath -a chgpath -t hba -p 2

And that command works just fine. But as we’ve seen from previous work, executing promiseutil from inside a bash script requires the use of the screen command.

Executing this command from inside promiseutil run under screen does not work correctly. promiseutil appears to ignore the command and remains on the default device.

The official response from Promise is as follows:

This has been made/designed in a way to work as it is described in the KB article (and it is not a bug,but that’s how it has been designed to work) that was given on my earlier reply and it can’t used in the way that you have given and I am sorry that there are no work around available.

If you know someone at Promise and have any influence, it would be a significant improvement to have this bug removed from the next release of promiseutil.

Heck, if you’re feeling bored, file a bug report with them here.

Promise Pegasus2: Scripting an Enclosure check with promise_enclosure_check.sh

The Promise Pegasus2 has onboard sensors that monitor the power supply voltages, the fan speed, and the temperature of the controller and backplane.

This seems worth performing the occasional check on.

The example script below runs an initial check of the enclosure using promiseutil. If it doesn’t find that “Everything is OK”, it runs a more verbose check, logs the problem and optionally sends email.

#!/bin/bash
#
# promise_enclosure_check.sh
#
# Checks the status of a Promise Pegasus2 RAID enclosure and mails the output if there's an issue.
#
# Author: AB @ Modest Industries
#
# Works with Promise Utility for Pegasus2 v3.18.0000.18 (http://www.promise.com)
# Requires sendemail for email alerts (http://caspian.dotconf.net/menu/Software/SendEmail/)
#
# Edit History
# 2014-04-21 - AB: Version 1.0.
# 2014-05-08 - AB: Refinements.
# 2014-05-09 - AB: Better message_body if failed.

export DATESTAMP=`date +%Y-%m-%d\ %H:%M:%S`

# Editable variables

# Path to sendemail
sendemail_path="/Library/Scripts/Monitoring/sendemail"

# If a problem is found, send email?
send_email_alert=true

# Variables for sendemail
# Sender's address
alert_sender="[email protected]"

# Recipient's addresses, comma separated.
#alert_recipient='[email protected], [email protected]'
alert_recipient="[email protected]"

# SMTP server to send the messages through
alert_smtp_server="smtp.example.com"

# ------------ Do not edit below this line ------------------
# Variables

# Pass / fail flags
enclosure_pass=true

# The subject line of the alert.
alert_subject="Alert: Promise Pegasus2 enclosure problem detected on $HOSTNAME."

# Alert header
alert_header="At $DATESTAMP, an enclosure problem was detected on this device:\n"

# Pass / Fail messages
pass_msg="Promise Pegasus Enclosure check successful."
fail_msg=" *** Promise Pegasus Enclosure check FAILED!!! ***\n\n"

# Alert footer
alert_footer="Run 'promiseutil -C enclosure -v' for more information."

# Create temp files
unit_ID_tmp=`mktemp "/tmp/$$_unit_ID.XXXX"`
enclosure_results_tmp=`mktemp "/tmp/$$_enclosure_results.XXXX"`

message_body="$alert_header"

# Get the information for this Promise unit. Includes workaround for promiseutil tty issue.
screen -D -m sh -c "promiseutil -C subsys -v >$unit_ID_tmp"

# Drop the output into a variable.
unit_ID=$(<$unit_ID_tmp)

# Get the report, put it into a tmp file.
screen -D -m sh -c "promiseutil -C enclosure >$enclosure_results_tmp"

# Fail unless the brief report contains the "Everything is OK" status line.
if ! grep -q "Everything is OK" "$enclosure_results_tmp"
then
        enclosure_pass="false"
        # Get a more detailed report, put it into a tmp file.
        screen -D -m sh -c "promiseutil -C enclosure -v >$enclosure_results_tmp"

        # Build the message.
        message_body="$message_body$fail_msg$unit_ID$(<"$enclosure_results_tmp")"
fi

#  ----------------- Logging & email ------------------

# Log the results, conditionally send email on failure.
if [ "$enclosure_pass" == "false" ]; then
        message_body="$message_body\n\n$alert_footer"
        echo "$DATESTAMP: \n\n$message_body" >> /var/log/system.log
        if [ "$send_email_alert" == "true" ] ; then
                # Quote the subject so it reaches sendemail as a single argument.
                "$sendemail_path" -f "$alert_sender" -t $alert_recipient -u "$alert_subject" -m "$message_body" -s "$alert_smtp_server"
        fi
else
        echo "$DATESTAMP: $pass_msg" >> /var/log/system.log
fi
# Cleanup
rm -f "$unit_ID_tmp" "$enclosure_results_tmp"

The script was developed against a Promise Pegasus2. It hasn’t been tested with the earlier Promise Pegasus series.

2014-11-07 – Update: Merci to Stéphane Allain for catching a typo in the script.

Promise Pegasus2: Scripting a disk check with promise_disk_check.sh

When you deploy a Promise Pegasus2, you want to run regular disk health checks and send an email notification if there’s a problem. The Promise Utility app can theoretically do this* when there’s someone logged in at the console, but we’re rarely running these in environments where there’s anyone logged in at the console.

The solution is to script a check of the disks using the promiseutil command line utility and then create a cronjob to run it at regular intervals.

Here’s an example disk check that parses the output of phydrv, logs each run to system.log and can optionally send email if a problem is found.

#!/bin/bash
#
# promise_disk_check.sh
#
# Checks the phydrv status of a Promise Pegasus, logs and mails the output if there's an issue.
#
# Author: A @ Modest Industries
# Last update: 2014-07-19
# 2014-07-19 - tweaked grep to allow for Media Patrol
#
# Works with Promise Utility for Pegasus2 v3.18.0000.18 (http://www.promise.com)
# Requires sendemail for email alerts (http://caspian.dotconf.net/menu/Software/SendEmail/)

export DATESTAMP=`date +%Y-%m-%d\ %H:%M:%S`

# Editable variables

# Path to sendemail
sendemail_path="/Library/Scripts/Monitoring/sendemail"
# Email alert?
send_email_alert=true

# Variables for sendemail
# Sender's address
alert_sender="[email protected]"

# Recipient's addresses, comma separated.
#alert_recipient='[email protected], [email protected]'
alert_recipient="[email protected]"

# SMTP server to send the messages through
# alert_smtp_server="smtp.example.com:port"
alert_smtp_server="smtp.example.com"

# Subject line of the alert.
alert_subject="Alert: Promise disk problem detected on $HOSTNAME."

# Header line at the top of the alert message 
alert_header="At $DATESTAMP, a problem was detected on this device:\n"

# Pass / Fail messages
pass_msg="Promise disk check successful."
fail_msg=" *** Promise disk check FAILED!!! ***"

# ------------ Do not edit below this line ------------------
# Variables
pass=true
results=""

# Create temp files
unit_ID_tmp=`mktemp "/tmp/$$_ID.XXXX"`
results_tmp=`mktemp "/tmp/$$_results.XXXX"`

# Get header information for this Promise unit. Includes workaround for promiseutil tty issue.
screen -D -m sh -c "promiseutil -C subsys -v >$unit_ID_tmp"
unit_ID=$(<"$unit_ID_tmp")

# Get status of the disks.  Includes workaround for promiseutil tty issue.
screen -D -m sh -c "promiseutil -C phydrv >$results_tmp"

# Check each drive line of the output; flag any whose status is not OK (or Media Patrol).
while read -r line
do
        if grep '^[0-9]' <<< "$line" | grep -Eqv 'OK|Media'
        then
                results=$results"BAD DRIVE DETECTED: $line\n\n"
                pass=false
        fi
done < "$results_tmp"

# Log the results, conditionally send email on failure.
if [ "$pass" = false ] ; then
        results="$alert_header$unit_ID\n\n$results"
        echo "$DATESTAMP: $fail_msg\n\n$results" >> /var/log/system.log
        if [ "$send_email_alert" = true ] ; then
                # Quote the subject so it reaches sendemail as a single argument.
                "$sendemail_path" -f "$alert_sender" -t $alert_recipient -u "$alert_subject" -m "$results" -s "$alert_smtp_server"
        fi
else
        echo "$DATESTAMP: $pass_msg" >> /var/log/system.log
fi

# Cleanup
rm -f "$unit_ID_tmp" "$results_tmp"

Note that the script uses sendemail for sending mail, a very useful little drop in for when the local machine isn’t running mail services.
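For the cron job mentioned earlier, something along these lines should do. It is a sketch under assumptions: the script path and the hourly schedule are placeholders I picked for illustration, and add_cron_entry is a hypothetical helper that keeps the install idempotent. Since the script appends to /var/log/system.log, you will most likely want this in root's crontab.

```shell
#!/bin/bash
# Hypothetical install location for the disk check script; adjust to suit.
check_script="/Library/Scripts/Monitoring/promise_disk_check.sh"

# Run at the top of every hour (an arbitrary choice for this example).
cron_entry="0 * * * * $check_script"

# Reads an existing crontab on stdin and prints it back, appending the
# entry only if an identical line is not already present.
add_cron_entry() {
    local entry="$1" current
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# To install (deliberately left as a comment so nothing changes by accident):
#   sudo crontab -l 2>/dev/null | add_cron_entry "$cron_entry" | sudo crontab -
```

Running the install line twice leaves a single entry, so it is safe to include in a deployment script.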

*I say “theoretically” because configuring email in the Promise Utility is a mess and I’ve yet to see a single successful notification after configuring it.

2014-05-04 – Updated to make the path to sendemail a variable.

2014-07-19 – Changed grep to handle false positive during Media Patrol runs

Promise Pegasus2: The gap between a failing disk and a failed disk.

We were recently called in to diagnose a relatively new Promise Pegasus2 R6 that intermittently refused to mount. The Promise Utility app reported nothing amiss with the RAID or the drives, green lights everywhere, so we used the command line to dig a little deeper.

So let’s run a verbose SMART check on the unit:

promiseutil -C smart -v

The first three drives checked out. Drive 4 indicated that SMART thought everything was fine:

PdId: 4
Model Number: TOSHIBA DT01ACA2
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK

But then a little further down, CRC errors:

Error 165 occurred at disk power-on lifetime: 1176 hours (49 days + 0 hours)
  When the command that caused the error occurred,
  the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 50 b0 ee 81 0d

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 80 a8 80 ee 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 a0 00 ee 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 98 80 ed 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 90 00 ed 81 40 00      18:38:48.276  WRITE FPDMA QUEUED
  61 80 88 80 ec 81 40 00      18:38:48.275  WRITE FPDMA QUEUED

Error 164 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
  When the command that caused the error occurred,
  the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 10 f0 ad 6b 0d  Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 80 80 ad 6b 40 00      18:36:07.145  WRITE DMA EXT
  35 00 80 00 ae 6b 40 00      18:36:07.144  WRITE DMA EXT
  35 00 80 00 ad 6b 40 00      18:36:07.144  WRITE DMA EXT
  35 00 80 80 ab 6b 40 00      18:36:07.139  WRITE DMA EXT
  35 00 80 00 ab 6b 40 00      18:36:07.139  WRITE DMA EXT

Error 163 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
  When the command that caused the error occurred,
  the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 f0 10 5e 5d 0d  Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
...

The client confirmed that he’d seen a warning light on drive 4, but that it had “gone away”. We had them back the data up immediately. Promise support subsequently verified that the drive had failed based on the logs and sent a replacement drive out.

If the drive had failed completely, I assume the RAID would have kicked in, taken the bad drive offline and continued spinning, but since the drive hadn’t actually failed, the volume was struggling with a failing member and that was causing boot and performance issues.

The take-away is that there’s a generous gap between a drive that’s beginning to fail and a drive that’s failed enough for the Promise Utility app to detect it. Verbose mode is your friend.
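Since nobody wants to read the verbose output by eye for every drive, here is a rough sketch of a filter that counts logged errors per drive, whatever the SMART health status says. It assumes the output format shown above ("PdId: N" lines followed by "Error N occurred..." entries); smart_error_summary is a name I made up, and I have only reasoned about it against the excerpt in this post.

```shell
#!/bin/bash
# Sketch: summarise SMART error-log entries per physical drive.

# Reads `promiseutil -C smart -v` output on stdin. Prints one line for each
# drive that has at least one "Error N occurred" entry in its log.
smart_error_summary() {
    awk '
        BEGIN { pd = "?" }
        /^[[:space:]]*PdId:/ { pd = $2 }
        /^[[:space:]]*Error [0-9]+ occurred/ { count[pd]++ }
        END {
            for (p in count)
                printf "PdId %s: %d logged error(s)\n", p, count[p]
        }
    '
}

# From Terminal, on a machine with promiseutil installed:
#   promiseutil -C smart -v | smart_error_summary
```

No output means no logged errors were found. Any output at all is a reason to look at that drive more closely, even if the Promise Utility app shows green lights.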

Turn off Promise Pegasus2 power savings one-liner.

From Terminal:

promiseutil -C ctrl -a mod -s "powersavinglevel=0"

Now check your handiwork:

promiseutil -C ctrl -v | grep PowerSavingLevel

You should see:

PowerSavingLevel: 0
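If you are setting up several machines, the check can be scripted. This is a small sketch that assumes the "PowerSavingLevel: N" text appears in the ctrl -v output exactly as shown above; power_saving_level is a hypothetical helper name, and the live check is meant to be run from Terminal.

```shell
#!/bin/bash
# Sketch: confirm that power saving is off.

# Reads `promiseutil -C ctrl -v` output on stdin, prints the numeric level.
power_saving_level() {
    sed -n 's/.*PowerSavingLevel:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only attempt the live check on a machine that has promiseutil installed.
if command -v promiseutil >/dev/null 2>&1; then
    level=$(promiseutil -C ctrl -v | power_saving_level)
    if [ "$level" = "0" ]; then
        echo "Power saving is off."
    else
        echo "PowerSavingLevel is '${level}', expected 0." >&2
    fi
fi
```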

You may be tempted to turn off PowerManagement. Here’s what Promise’s website has to say about that:

Do not disable Power Management. Doing so will make the Pegasus2 hang when the Mac is in sleep or shut down and restarted and will require the Pegasus and the Mac to be power cycled to return to normal operation.

Promise Pegasus2 command line tools.

When I began deploying Promise Pegasus2 storage devices, I wasn’t happy with the state of the Promise Utility app. It doesn’t provide email alerts except when a user is logged in, and this isn’t optimal for most of our deployments.

Then I stumbled on a couple of Ruby scripts by GriffithStudio that showed a way around many of the limitations of the Promise GUI.

When you install the Promise Utility for Promise Pegasus2, it includes a command line utility. You can view status and even change the settings of the device. In a Terminal window, type:

promiseutil

You’ll be greeted with an interactive command line.

-------------------------------------------------------------
Promise Utility
Version: 3.18.0000.18 Build Date: Oct 29, 2013
-------------------------------------------------------------
 
List available RAID HBAs and Subsystems
=============================================================
Type  #    Model         Alias                            WWN                 
=============================================================
hba   1  * Pegasus2 R4                                    2000-0001-5557-98bf 
 
Totally 1 HBA(s) and 0 Subsystem(s)
 
-------------------------------------------------------------
The row with '*' sign refers the current working HBA/Subsystem path
To change the current HBA/Subsystem path, you may use the following command:
  
  spath -a chgpath -t hba|subsys -p <path #>.
 
Type help or ? to display all the available commands
-------------------------------------------------------------
 
cliib> 

To get a list of commands, type ? and press return. Some of the available commands include:

subsys - Model, serial number, hardware revision.
enclosure - Enclosure status.
ctrl - Firmware version, array & RAID status.
phydrv - Physical drive status.
array - Array status.
logdrv - Logical drive status.
event - Event log, including abnormal shutdowns.

Many of the commands yield a brief Pass/Fail style response:

cliib> enclosure 
=============================================================
Id  EnclosureType               OpStatus  StatusDescription                    
=============================================================
1   Pegasus2-R4                 OK        Everything is OK

If you want more details, you can add the verbose flag, -v. Want the serial number, model and hardware revision?

cliib> subsys -v
 
-------------------------------------------------------------
Alias: 
Vendor: Promise Technology,Inc.        Model: Pegasus2 R4
PartNo: F29DS4722000000                SerialNo: M00H00CXXXXXXXX
Rev: B3                                WWN: 2000-0001-5557-98bf

You can grab enclosure information, including temperature of box, backplane and controller card, as well as the rotation speed of the fans and voltage on the power rails with this:

cliib> enclosure -v
 
 
-------------------------------------------------------------
Enclosure Setting:
 
EnclosureId: 1
CtrlWarningTempThreshold: 63C/145F     CtrlCriticalTempThreshold: 68C/154F
 
 
-------------------------------------------------------------
Enclosure Info and Status:
 
EnclosureId: 1
EnclosureType: Pegasus2-R4
SEPFwVersion: 1.00
MaxNumOfControllers: 1                 MaxNumOfPhyDrvSlots: 4
MaxNumOfFans: 1                        MaxNumOfBlowers: 0
MaxNumOfTempSensors: 2                 MaxNumOfPSUs: 1
MaxNumOfBatteries: 0                   MaxNumOfVoltageSensors: 3
 
=============================================================
PSU       Status                        
=============================================================
1         Powered On and Functional     
 
=============================================================
Fan Location        FanStatus             HealthyThreshold  CurrentFanSpeed 
=============================================================
1   Backplane       Functional            > 1000 RPM        1200 RPM        
 
=============================================================
TemperatureSensor   Location       HealthThreshold   CurrentTemp    Status    
=============================================================
1                   Controller     < 63C/145F        49C/120F       normal    
2                   Backplane      < 53C/127F        47C/116F       normal    
 
=============================================================
VoltageSensor  Type    HealthyThreshold         CurrentVoltage  Status         
=============================================================
1              3.3V    +/- 5% (3.13 - 3.46) V   3.2V            Operational    
2              5.0V    +/- 5% (4.75 - 5.25) V   5.0V            Operational    
3              12.0V   +/- 8% (11.04 - 12.96) V 12.0V           Operational 
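If you only care about the temperatures, the sensor table can be picked out of that output with a little awk. This is a sketch that assumes the column layout shown above (sensor number, location, a "<" threshold, then the current reading as "49C/120F"); enclosure_temps is a name I chose for illustration.

```shell
#!/bin/bash
# Sketch: extract current temperatures (Celsius) from `enclosure -v` output.

# Reads the verbose enclosure report on stdin and prints "Location TempC"
# for each row of the TemperatureSensor table.
enclosure_temps() {
    awk '$3 == "<" && $5 ~ /^[0-9]+C\// {
        temp = $5
        sub(/C\/.*/, "", temp)   # keep only the Celsius number
        print $2, temp
    }'
}

# From Terminal, on a machine with promiseutil installed:
#   promiseutil -C enclosure -v | enclosure_temps
# To list only sensors at or above 55C (an arbitrary example limit):
#   promiseutil -C enclosure -v | enclosure_temps | awk '$2 >= 55'
```

The fan and voltage rows do not have "<" in the third column, so they are skipped.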

How about the state of the physical drives?

cliib> phydrv   
=============================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     
=============================================================
1    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot1   OK        Array0 No.0      
2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      
3    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      
4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3

You can also run commands without entering interactive mode, which is useful when incorporating them into bash scripts. Simply add the -C flag, followed by the command you want to run. For example:

krieger:~ admin$ promiseutil -C logdrv -v

will let you view the logical drives.

Many of these commands will run on the previous generation Promise Pegasus units.

Be aware: it’s possible to change the configuration of your Pegasus2 or even destroy your RAID setup from the command line, so use caution when working on production systems.