Introduction
You will find here several simple scripts that hopefully will be useful for UNIX system administrators. My Guide to UNIX System Programming has a few programs that may be useful as well.
Follow links and get file information
If given a link, or chain of links, fstat.pl will follow them, printing out bits of information along the way, until it reaches the real file at the end of the chain. It will then print out most of the data available via the stat(1) and file(1) commands provided by most UNIX variants.
Some years ago I came across a virtual forest of symbolic links inside an application tree that required the resolution of as many as 58 links before a real file was to be found!!! While I wrote fstat.pl to help with that contract, it has proven to be so handy over the years that it has found a home in my private bin directory. Aside from the link following capability, the ability of this script to provide access to stat(3) data from the command line in a consistent way on different UNIX variants is probably its most charming feature.
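The link-following idea can be sketched in a few lines. This is only an illustration in Python, not the fstat.pl code; the function name `follow_links` and the `max_hops` guard are hypothetical, and the sketch assumes a UNIX-like system with symbolic link support.

```python
import os

def follow_links(path, max_hops=100):
    """Return the chain of paths visited, ending at the first non-link."""
    chain = [path]
    while os.path.islink(path):
        if len(chain) > max_hops:
            # Guard against link loops (hypothetical limit for this sketch).
            raise RuntimeError("too many links, possible loop: " + path)
        target = os.readlink(path)
        # A relative target is relative to the directory holding the link;
        # os.path.join returns the target unchanged when it is absolute.
        path = os.path.join(os.path.dirname(path), target)
        chain.append(path)
    return chain
```

Calling `os.stat()` on the last element of the returned chain then gives the stat(3)-style data for the real file.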
Compute Statistics
stats.pl started life as a bit of one line perl magic, and has grown over the years into the tool described here. The idea is simple: bust the text data up into columns of numeric values and then report various statistical information.
The stats.pl script is capable of some fairly sophisticated statistical computations. Of course the standard things like mean, average, max, min, count, standard deviation, variance, regression lines, and histograms are all available. On the more exotic side, the script understands combinatorial (factor) variables - a feature commonly found only in dedicated statistical packages. Another advanced feature, which is even quite rare in advanced statistical software, is the ability to generate weighted histograms. All of the various computations may be performed on the input data or on data computed from the input data -- rather like how one sometimes adds a computed column to a spreadsheet. Finally, the format in which all of the statistical computations are reported is quite customizable, allowing a range of formats from machine readable ones like CSV and TSV to human consumable reports using fixed width tables.
Perhaps the most complex, and useful, feature of the stats.pl script is the powerful techniques it uses to extract the data in the first place. After all, there is no point in having sophisticated computational capabilities if one can't extract the data and get it into the tool -- this is a barrier every working statistician learns very soon after entering the real world!! This is doubly important for UNIX geeks that tend to deal with numerous oddly formatted text files on a daily basis. Note that the script is not only capable of using the data it extracts, it is also capable of outputting the filtered and scrubbed data in various formats (like CSV). Many people tell me they primarily use the script in this mode as a sort of a general purpose "data extractor and filter" allowing them to feed data into tools like R, SAS, or (goodness forbid) Excel. I know of no other tool that even comes close in terms of flexibility in data extraction.
For simple cases, the script "just works" with the default values; however, more complex examples are easy to find in the day-to-day life of a UNIX system administrator:
- How do I extract the data from vmstat?
- The output of vmstat is funny as the second line has the titles while the first and third lines are junk with the data starting on line four. That sounds painful, but stats.pl makes it easy:
  -skipLines=3 -headerLine=2
- How do I extract the data from mpstat?
- The output of mpstat is another odd one in that the first line and every fourth line consist of column titles. How kooky is that? We note that each title line has the string CPU and none of the data lines do. So we can use something like this:
  -headerLine=1 -notRegex=CPU
- OK. I got the data from mpstat, but I want a summary for each CPU?
- The CPU is labeled in the output of mpstat in a column called CPU - the column we used in the previous FAQ entry to delete the title lines. All we need do is tell stats.pl about this column. The following options will do the trick:
  -headerLine=1 -notRegex=CPU -cats=CPU
- How do I get the data from sar?
- The output from sar is more complex. The first three lines are bogus, the fourth line has titles MIXED with data, and the last two lines are junk (a blank line and an "Average" line). Still, it isn't too bad telling stats.pl how to get the data. Because this one is so complex, there are different ways to do it. Here are three:
  -notRegex=Average -goodColCnt=5
  -stopRegex='^$' -skipLines=4
  -notRegex='(^$|Average)' -skipLines=4
- How can I get better titles from sar data?
- First, see the previous question about how to get the data. Use one of the options, and add the following to the command line:
  -colNames==time,usr,sys,wio,idle
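To make the extraction idea concrete, here is a small Python sketch of "skip some lines, take titles from one line, drop lines matching a pattern, then summarize each column". It is not how stats.pl is implemented. The parameters `skip_lines`, `header_line`, and `not_regex` only mirror my reading of the options in the FAQ above, and the fallback column names (`c1`, `c2`, ...) and the rule that rows with non-numeric fields are ignored are assumptions made for the sketch.

```python
import re
import statistics

def column_stats(text, skip_lines=0, header_line=None, not_regex=None):
    """Summarize whitespace-separated numeric columns found in text."""
    lines = text.splitlines()
    # header_line is 1-based, like the line numbers used in the FAQ.
    titles = lines[header_line - 1].split() if header_line else None
    drop = re.compile(not_regex) if not_regex else None
    columns = {}
    for line in lines[skip_lines:]:
        if not line.strip() or (drop and drop.search(line)):
            continue
        try:
            values = [float(field) for field in line.split()]
        except ValueError:
            continue  # assumption: rows with non-numeric fields are ignored
        for i, value in enumerate(values):
            name = titles[i] if titles and i < len(titles) else f"c{i + 1}"
            columns.setdefault(name, []).append(value)
    return {
        name: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": statistics.fmean(vals),
        }
        for name, vals in columns.items()
    }
```

For vmstat-like text one would call `column_stats(text, skip_lines=3, header_line=2)`, and for mpstat-like text `column_stats(text, header_line=1, not_regex="CPU")`.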
Change Solaris host IDs
SPARC based computers running Solaris have a feature available
known as the "host ID" - a 32-bit integer intended to
uniquely identify the host. This "host ID" is burned
into the PROM of older hardware, and is programmed into the NVRAM
of newer SPARC platforms. The design of the UNIX operating system
is such that software always interacts with the actual hardware
via the kernel - thus one may effectively change the host ID by
manipulating the running kernel. The kernel in a UNIX system is
nothing more than a program, and thus may be manipulated with a
debugger. The Host ID is stored in the kernel symbol
hw_serial. Unfortunately, stuffing a new Host ID
into hw_serial is rather convoluted:
- Convert the hex host ID into decimal.
- Compute the ASCII code for each digit.
- Compute the hex equivalent of each ASCII code.
- Place these hex ASCII codes into groups of 4.
- Pad the last hex number with zeros.
- Place the resulting three 32-bit hex integers into hw_serial, hw_serial+4, and hw_serial+8.
The script newhostid.pl performs the necessary conversions, and then uses the adb
debugger to make the changes to the running kernel. BTW, you can
change the Host ID by patching the binary file
/kernel/misc/sysinit on Solaris x64.
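The conversion steps listed above can be sketched as follows. This is an illustration in Python, not the newhostid.pl code, and the function name `hostid_to_words` is hypothetical. It only computes the three 32-bit hex values and does not touch the kernel. It assumes the bytes are laid out in string order, as one would expect on big-endian SPARC.

```python
def hostid_to_words(hex_hostid):
    """Turn a hex host ID into the three 32-bit hex words for hw_serial."""
    value = int(hex_hostid, 16)
    if not 0 <= value < 2 ** 32:
        raise ValueError("host ID must fit in 32 bits")
    decimal_digits = str(value)                          # hex -> decimal
    ascii_hex = decimal_digits.encode("ascii").hex()     # each digit -> hex ASCII code
    padded = ascii_hex.ljust(24, "0")                    # pad out to 3 words of 4 bytes
    return [padded[i:i + 8] for i in range(0, 24, 8)]    # groups of 4 ASCII codes
```

For example, the host ID 80a1b2c3 is 2158080707 in decimal, so the three words come out as 32313538, 30383037, and 30370000.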
Solaris Patch Tools
Solaris system administrators are often required to perform several extremely tedious tasks related to Solaris patches - hopefully the following scripts will help a bit...
- checkpatch.pl
- Takes a list of patches on the command line, or a stream of text on STDIN that contains patch IDs, and determines if the current system has the patches, or newer versions, installed. The text stream sent to STDIN can contain text other than the patch IDs, and the application will extract the patch IDs from the text - so one may simply cat a README file or e-mail into the script and let it find the patch numbers.
- diffpatches.pl
- Takes two host names as arguments. It then tells you what the differences are between the patches on the two hosts. It even tells you information regarding the versions of the various patches found on both hosts. This script can be invaluable when trying to find out why a program works on host A and doesn't on host B.
- patchmach.sh
- Takes a host name, and will tell you what patches need to be installed on the current host to bring it up to the same patch levels as the given host. This is handy when you simply want to bring a particular host up to the level of some reference host on your network.
- kerpatch.sh
- This is a template for a script that can install Solaris kernel patches. Kernel patches generally require a reboot and need to be installed at run level 2. This script checks to see if a particular patch is installed on the current host based upon the OS version. If the patch is not installed, it installs it in the correct way for the OS version. It then reboots the host. It logs EVERYTHING. If it fails, it will NOT attempt to install the patch on the next boot until the lock file is removed. This prevents a host from falling into a "reboot loop". This is a simple, but very handy little script that can make the install of a kernel patch on thousands of hosts painless. It can be changed to install other patches as well.
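The core of the patch checking idea (pull patch IDs out of arbitrary text, then see whether the same or a newer revision is present) might be sketched like this in Python. It is not the code from checkpatch.pl or patchmach.sh. It assumes the usual Solaris patch ID form of a six digit base and a two digit revision, and it takes the installed patch list as plain text rather than querying the system. The function names are hypothetical.

```python
import re

# Assumed patch ID form: six digit base, dash, two digit revision (e.g. 108528-29).
PATCH_ID = re.compile(r"\b(\d{6})-(\d{2})\b")

def extract_patch_ids(text):
    """Return (base, revision) pairs found anywhere in free-form text."""
    return [(base, int(rev)) for base, rev in PATCH_ID.findall(text)]

def missing_patches(wanted_text, installed_text):
    """List wanted patches with no equal or newer revision installed."""
    newest = {}
    for base, rev in extract_patch_ids(installed_text):
        newest[base] = max(rev, newest.get(base, -1))
    return [
        f"{base}-{rev:02d}"
        for base, rev in extract_patch_ids(wanted_text)
        if newest.get(base, -1) < rev
    ]
```

Feeding the text of a README as `wanted_text` and a saved patch listing from the host as `installed_text` yields the patches still to be installed.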
Work with syslog message files
The syslogd daemon is a part of
most versions of UNIX ranging from commercial systems like
Solaris and HP/UX to free systems like Linux and FreeBSD. This
daemon provides a central, and uniform device through which all
applications on a computer, or set of networked computers, may
log messages. Unfortunately, this venerable tool has some quirks
that make it difficult to use. These problems include:
- The "last message repeated n times" messages make line oriented tools like grep less than useful.
- It is often difficult to judge the spread of a network wide problem by looking at the messages file for a large network.
- Many times one simply wishes to count the number of errors by host in order to judge the severity of a problem on a large network.
- It is very difficult to find time dependencies in syslog data. For example, it is difficult to see hourly or daily repeating errors.
- All of the "don't care" messages can really get in the way and obscure messages that are important.
I have used a uniform naming convention for
these scripts. If the script has "xsyslog" in
the name, then it processes a syslog file that has been
"expanded" by the first script listed below. If the
script only has "syslog" in the name, then it
processes a raw syslog file.
- expandSyslog.pl
- Probably the most useful script. It expands the "last
message repeated n times" messages into "n"
copies of the previous message. It also prepends each line
with a "-" if it is a unique message found in the
file, and a "+" if it is generated from a "last
message repeated n times" message. This tool opens up the
processing of syslog files to the considerable collection of
text processing utilities available on all UNIX platforms -
like grep(1), sed(1), etc...
- countXsyslogByHost.pl
- Takes an expanded messages file, one that has been processed by expandSyslog.pl, and counts the messages that match a given regular expression by host - i.e. how many times each host has produced a message that matches the search criteria. This kind of thing is very difficult to get a feel for by just looking at the messages file on a large network.
- countSyslogByTime.pl
- Takes a raw messages file and creates a histogram based on buckets that are formed by time so that problems occurring at regular intervals may be easily identified. For example, a cron task causing a spike of errors every day at 1PM would show up as a spike in the histogram graph. This script allows one to specify the time quantum, and a regular expression to select the messages to count. countSyslogByTime.gp is an input file for gnuplot that can be used to graph the histogram.
- extractHostXsyslog.pl
- Extracts the messages that were generated from a set of hosts given in a file.
- extractMultiHostsSyslog.pl
- Extracts the messages that were generated from a set of hosts given in a set of files. The extracted messages are sorted into different files based upon the host groupings specified to the tool. This allows one to break a messages file up into classes based upon host class or group.
- filterSyslog.pl
- Extracts all the messages that do NOT match ANY of the regular expressions given to the script. Basically this is a handy way to grep for interesting messages. The filterXsyslog.pl script is similar except that it requires an "expanded" syslog file.
- syslogger.pl
- A more flexible replacement for the logger(1) command line tool that adds several features:
  - The ability to change the level of logging in the script - no need to change the /etc/syslog.conf file.
  - The ability to change the destination of the logging to a local file, STDOUT, or STDERR - very useful for debugging.
  - The ability to change both the log level and log destination based on command line options or environment variables - no need to change the /etc/syslog.conf file.
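The expansion and counting ideas behind several of these scripts can be sketched in Python. None of this is taken from the Perl scripts themselves. The sketch assumes the classic "Mon DD HH:MM:SS host message" line layout, tracks the previous message per host (an assumption that matters on a loghost collecting from many machines), and buckets by time of day only, ignoring the date. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumed classic syslog layout: "Mon DD HH:MM:SS host message".
LINE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")
REPEAT = re.compile(r"^last message repeated (\d+) times?$")

def expand(lines):
    """Yield '-' + line for real messages and '+' + copy for repeats."""
    previous = {}  # host -> most recent real line seen from that host
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE.match(line)
        rep = REPEAT.match(m.group(3)) if m else None
        if rep and m.group(2) in previous:
            for _ in range(int(rep.group(1))):
                yield "+" + previous[m.group(2)]
            continue
        if m and not rep:
            previous[m.group(2)] = line
        yield "-" + line

def count_by_host(expanded_lines, pattern):
    """Count expanded messages matching pattern, keyed by host."""
    wanted = re.compile(pattern)
    counts = Counter()
    for line in expanded_lines:
        m = LINE.match(line[1:])  # drop the leading '-' or '+'
        if m and wanted.search(m.group(3)):
            counts[m.group(2)] += 1
    return counts

def count_by_time(raw_lines, pattern, quantum=3600):
    """Count raw messages matching pattern per time-of-day bucket."""
    wanted = re.compile(pattern)
    counts = Counter()
    for line in raw_lines:
        m = LINE.match(line)
        if m and wanted.search(m.group(3)):
            hh, mm, ss = (int(x) for x in m.group(1).split()[-1].split(":"))
            counts[(hh * 3600 + mm * 60 + ss) // quantum] += 1
    return counts
```

With the default quantum of 3600 seconds the bucket number is simply the hour of the day, so a daily 1PM spike shows up in bucket 13.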
Fast filesystem traversal
The traditional way to traverse a file system is to simply use a recursive algorithm such as the one described in APUG. This algorithm is generally I/O bound; however, the culprit on modern systems is often I/O latency - not bandwidth. This is particularly true with today's transaction based I/O subsystems and network file systems like NFS. One way to alleviate this bottleneck is to have multiple I/O operations simultaneously in flight. Using this technique on a single CPU Linux box with a local file system only produces marginal performance increases, but when dealing with NFS file systems the speedup can be quite significant. Experiments with multi-CPU hosts utilizing gigabit Ethernet with large NFS servers show incredible performance improvements of well over 50x (20 hours cut down to 20 minutes). This set of programs has been used to traverse hundreds of terabytes of storage distributed across more than a billion files and 100 fileservers in just a few hours.
The idea is to first store every directory given on the command line
in a linked list. Then a thread pool is created, and the threads
pop entries off of that linked list in the order they were placed
in the list (FIFO). Each thread then reads all the entries in the
directory it popped off the list, performs user defined actions on
each entry, and stores any subdirectories at the end of the linked
list. This algorithm leads to a roughly breadth-first
directory traversal. The nature of the algorithm places a heavy
load upon the caching systems available in many operating
systems. For example, ncsize plays a role in how
effective this program is on a Solaris system. Also in Solaris the
number of simultaneous NFS connections dramatically affects
performance. Depending on what the optional processing functions
are doing, this program can place an incredible load on
nscd.
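A minimal Python sketch of the scheme just described: a FIFO work list of directories, a pool of threads that each take a directory, process its entries, and append any subdirectories to the end of the list. This is not the C code in the archive. The names `parallel_walk` and `total_file_bytes` are hypothetical, and the example action (summing apparent file sizes) is only a stand-in for the user defined actions.

```python
import os
import queue
import threading

def parallel_walk(roots, visit, num_threads=8):
    """Call visit(entry) for every directory entry below the given roots."""
    work = queue.Queue()  # FIFO list of directories still to be read
    for root in roots:
        work.put(root)

    def worker():
        while True:
            directory = work.get()
            if directory is None:  # sentinel: no more work
                work.task_done()
                return
            try:
                with os.scandir(directory) as entries:
                    for entry in entries:
                        visit(entry)
                        if entry.is_dir(follow_symlinks=False):
                            work.put(entry.path)  # goes to the end of the list
            except OSError:
                pass  # unreadable directory: skipped in this sketch
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    work.join()  # returns once every queued directory has been processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()

def total_file_bytes(root, num_threads=8):
    """Example action: sum the apparent sizes of regular files under root."""
    total = 0
    lock = threading.Lock()

    def visit(entry):
        nonlocal total
        try:
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                with lock:  # visit runs in several threads at once
                    total += size
        except OSError:
            pass

    parallel_walk([root], visit, num_threads)
    return total
```

The point of the thread pool is to keep several directory reads in flight at once, which is where the latency savings on NFS come from. The visit callback must be thread safe, hence the lock around the running total.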
The version of the code linked here is written in C, and makes use of ancient C techniques to provide for tool customization. The C++ version provides a dramatically superior extension and abstraction model, and is much less difficult to extend. While the C++ version was written at about the same time as the C version it has seen much less testing in the real world, and I am hesitant to release it into the wild. In addition, an MPI version for both the C and C++ code exists that can spread the system across many hosts in a network. Like the C++ version, I am not comfortable enough with this version to release it.
The code base is designed to be customized so that binaries may be easily produced to do special tasks as the need arises. As an example of this, several compile options exist for the code in the archive that generate different binaries that do very different things. Currently the following examples may be compiled right out of the box:
- du
- A very fast version of /bin/du. It has no command line options, and simply displays the output of a 'du -sk'.
- dux
- A very fast, extended version of /bin/du that displays much more data about the files traversed including: file sizes, number of blocks, detects files with holes, and lots of other data.
- own
- Prints the names of all files in a directory tree that are owned by a specified list of users.
- age
- Produces a report regarding the ages of the files in a directory tree.
- noown
- Prints the names of all files in a directory tree that are NOT owned by a specified list of users.
- dirgo
- Simply lists the files it finds. This is similar to a 'find ./', only it does an almost breadth-first search.