© 2009 Mitch Richling
You will find here several simple tools that may be of use for UNIX system administrators. My Guide to UNIX System Programming has a few programs that may be useful as well.
If given a link, or chain of links, fstat.pl will follow them, printing out bits of information along the way, until it finds the real file the link(s) terminate with. It will then print out most of the data available via the stat(1) and file(1) commands provided by most UNIX variants.
Some years ago I came across a virtual forest of symbolic links inside an application tree that required the resolution of as many as 58 links before a real file was to be found!!! While I wrote fstat.pl to help with that contract, it has proven to be so handy over the years that it has found a home in my private bin directory. Aside from the link following capability, the ability of this script to provide access to stat(2) data from the command line in a consistent way on different UNIX variants is probably its most charming feature.
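The heart of the link-following step can be sketched in C. This is a minimal illustration, not the fstat.pl code. The function follow_links and the MAX_HOPS cycle guard are hypothetical, and only a few stat(2) fields are printed. Relative link targets are resolved against the directory that contains the link, which is the detail that is easiest to get wrong.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_HOPS 256   /* hypothetical guard against symlink cycles */

/* Follow the chain of symbolic links starting at `path`, printing one
 * "link -> target" line per hop, then a few stat(2) fields of the real file.
 * Copies the final path into `out` and returns the number of links followed,
 * or -1 on error (dangling link, cycle, or path too long). */
int follow_links(const char *path, char *out, size_t outlen)
{
    char cur[PATH_MAX], target[PATH_MAX], next[PATH_MAX];
    struct stat sb;
    int hops = 0;

    assert(path != NULL && out != NULL);
    if (snprintf(cur, sizeof cur, "%s", path) >= (int)sizeof cur)
        return -1;
    for (;;) {
        if (lstat(cur, &sb) != 0)         /* lstat: do not follow the link */
            return -1;
        if (!S_ISLNK(sb.st_mode))
            break;                        /* reached a real file */
        if (++hops > MAX_HOPS)
            return -1;
        ssize_t n = readlink(cur, target, sizeof target - 1);
        if (n < 0)
            return -1;
        target[n] = '\0';                 /* readlink does not terminate */
        const char *slash = strrchr(cur, '/');
        int w;
        if (target[0] == '/' || slash == NULL)
            w = snprintf(next, sizeof next, "%s", target);
        else                              /* relative: join with link's dir */
            w = snprintf(next, sizeof next, "%.*s/%s",
                         (int)(slash - cur), cur, target);
        if (w >= (int)sizeof next)
            return -1;
        printf("%s -> %s\n", cur, target);
        memcpy(cur, next, sizeof cur);
    }
    if (snprintf(out, outlen, "%s", cur) >= (int)outlen)
        return -1;
    printf("%s: size=%lld mode=%04o nlink=%ld\n", cur,
           (long long)sb.st_size, (unsigned)(sb.st_mode & 07777),
           (long)sb.st_nlink);
    return hops;
}
```

The loop uses lstat(2) rather than stat(2) so that it sees each link in turn. Plain stat(2) would silently jump to the end of the chain.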
stats.pl started life as a bit of one-line Perl magic, and has grown over the years into what you see here. The idea is simple: bust the text data up into columns of numeric values and then report various statistical information.
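Here is a minimal C sketch of the "bust it into columns" idea. It assumes whitespace-separated fields. The names col_stats, col_add, col_var, add_line and MAX_COLS are hypothetical and say nothing about how stats.pl is actually written. Each column keeps running statistics updated with Welford's online method, so the raw data never has to be stored. The standard deviation is simply the square root of col_var.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COLS 64   /* hypothetical upper bound on columns per line */

/* Running statistics for one column, updated with Welford's method so the
 * raw values never need to be stored. Zero-initialize before use. */
typedef struct {
    long   n;
    double mean, m2, min, max;
} col_stats;

void col_add(col_stats *c, double x)
{
    c->n++;
    if (c->n == 1 || x < c->min) c->min = x;
    if (c->n == 1 || x > c->max) c->max = x;
    double delta = x - c->mean;
    c->mean += delta / (double)c->n;
    c->m2 += delta * (x - c->mean);
}

/* Sample variance (n - 1 denominator); 0 when fewer than two values. */
double col_var(const col_stats *c)
{
    return c->n > 1 ? c->m2 / (double)(c->n - 1) : 0.0;
}

/* Split `line` (modified in place) on blanks and feed each field that parses
 * completely as a number into the matching column. Non-numeric fields are
 * skipped. Returns the number of fields seen. */
int add_line(col_stats cols[MAX_COLS], char *line)
{
    int i = 0;
    assert(cols != NULL && line != NULL);
    for (char *tok = strtok(line, " \t\r\n"); tok != NULL && i < MAX_COLS;
         tok = strtok(NULL, " \t\r\n"), i++) {
        char *end;
        double x = strtod(tok, &end);
        if (end != tok && *end == '\0')
            col_add(&cols[i], x);
    }
    return i;
}
```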
The stats.pl script is capable of some fairly sophisticated statistical computations. Of course the standard things like mean, max, min, count, standard deviation, variance, regression lines, and histograms are all available. On the more exotic side, the script understands categorical (factor) variables, a feature commonly found only in dedicated statistical packages. Another advanced feature, which is quite rare even in advanced statistical software, is the ability to generate weighted histograms. All of the various computations may be performed on the input data or on data computed from the input data, rather like adding a computed column to a spreadsheet. Finally, the format in which all of the statistical computations are reported is quite customizable, allowing a range of formats from machine readable ones like CSV and TSV to human consumable reports using fixed width tables.
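A weighted histogram differs from an ordinary one only in what each observation adds to its bin: its weight instead of a count of one. For example, binning files by age but weighting each by its size shows where the bytes are rather than where the files are. Below is a minimal C sketch with equal-width bins over [lo, hi). The function name weighted_hist is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted histogram over [lo, hi) with `nbins` equal-width bins: each
 * value x[i] adds w[i] to its bin instead of adding 1. Values outside the
 * range are ignored. Returns the total weight that landed in a bin. */
double weighted_hist(const double *x, const double *w, size_t n,
                     double lo, double hi, size_t nbins, double *bins)
{
    assert(hi > lo && nbins > 0);
    double total = 0.0;
    for (size_t b = 0; b < nbins; b++)
        bins[b] = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < lo || x[i] >= hi)
            continue;
        size_t b = (size_t)((x[i] - lo) / (hi - lo) * (double)nbins);
        if (b >= nbins)          /* guard against rounding at the top edge */
            b = nbins - 1;
        bins[b] += w[i];
        total += w[i];
    }
    return total;
}
```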
Perhaps the most complex, and most useful, feature of the stats.pl script is the set of powerful techniques it uses to extract the data in the first place. After all, there is no point in having sophisticated computational capabilities if one can't extract the data and get it into the tool; this is a barrier every working statistician learns about very soon after entering the real world!! This is doubly important for UNIX geeks, who tend to deal with numerous oddly formatted text files on a daily basis. Note that the script is not only capable of using the data it extracts, it is also capable of outputting the filtered and scrubbed data in various formats (like CSV). Many people tell me they primarily use the script in this mode, as a sort of general purpose "data extractor and filter" that lets them feed data into tools like R, SAS, or (goodness forbid) Excel. I know of no other tool that even comes close in terms of flexibility in data extraction.
For simple cases, the script "just works" with the default values; however, more complex examples are easy to find in the day-to-day life of a UNIX system administrator:
vmstat?
vmstat is funny in that the second line has the titles, while the first and third lines are junk and the data starts on line four. That sounds painful, but stats.pl makes it easy:
-skipLines=3 -headerLine=2
mpstat?
mpstat is another odd one in that the first line and every fourth line consist of column titles. How kooky is that? Note that each title line contains the string CPU and none of the data lines do. So we can use something like this:
-headerLine=1 -notRegex=CPU
mpstat, but I want a summary for each CPU?
mpstat reports the CPU number in a column called CPU, the same column we used in the previous FAQ entry to delete the title lines. All we need to do is tell stats.pl about this column. The following options will do the trick:
-headerLine=1 -notRegex=CPU -cats=CPU
sar?
sar is more complex. The first three lines are bogus, the fourth line has titles MIXED with data, and the last two lines are junk (a blank line and an "Average" line). Still, it isn't too bad telling stats.pl how to get the data. Because this one is so complex, there are several ways to do it. Here are three:
-notRegex=Average -goodColCnt=5
-stopRegex='^$' -skipLines=4
-notRegex='(^$|Average)' -skipLines=4
sar data?
-colNames=time,usr,sys,wio,idle
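One way to read the options above is as a per-line decision procedure. The C sketch below is that reading, not the documented behavior of stats.pl, and it rests on a few assumptions:

- The struct, enum and function names are hypothetical.
- The header line is recognized before skipped lines are dropped. The vmstat example needs this, because its header sits inside the skipped lines.
- Lines are passed without their trailing newline, so that '^$' matches a blank line.

The regular expressions are POSIX extended ones compiled with regcomp(3). -cats and -colNames are not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical mirror of a few stats.pl options; 0 / NULL means "unset". */
struct line_filter {
    long     skip_lines;    /* like -skipLines   */
    long     header_line;   /* like -headerLine  */
    regex_t *not_re;        /* like -notRegex    */
    regex_t *stop_re;       /* like -stopRegex   */
    int      good_col_cnt;  /* like -goodColCnt  */
};

enum line_action { LINE_DROP, LINE_KEEP, LINE_HEADER, LINE_STOP };

/* Count whitespace-separated fields in a line. */
static int count_fields(const char *s)
{
    int n = 0, in_field = 0;
    for (; *s; s++) {
        if (isspace((unsigned char)*s)) {
            in_field = 0;
        } else if (!in_field) {
            in_field = 1;
            n++;
        }
    }
    return n;
}

/* Decide what to do with line number `lineno` (1-based, no newline). */
enum line_action classify_line(const struct line_filter *f, long lineno,
                               const char *line)
{
    assert(f != NULL && line != NULL && lineno >= 1);
    if (lineno == f->header_line)
        return LINE_HEADER;          /* header may sit inside skipped lines */
    if (lineno <= f->skip_lines)
        return LINE_DROP;
    if (f->stop_re && regexec(f->stop_re, line, 0, NULL, 0) == 0)
        return LINE_STOP;            /* caller stops reading input here */
    if (f->not_re && regexec(f->not_re, line, 0, NULL, 0) == 0)
        return LINE_DROP;
    if (f->good_col_cnt > 0 && count_fields(line) != f->good_col_cnt)
        return LINE_DROP;
    return LINE_KEEP;
}
```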
SPARC based computers running Solaris have a feature known as the "host ID": a 32-bit integer intended to uniquely identify the host. This host ID is burned into the PROM of older hardware, and is programmed into the NVRAM of newer SPARC platforms. The design of the UNIX operating system is such that software always interacts with the actual hardware via the kernel, so one may effectively change the host ID by manipulating the running kernel. The kernel in a UNIX system is nothing more than a program, and thus may be manipulated with a debugger. The host ID is stored in the kernel symbol hw_serial. Unfortunately, stuffing a new host ID into hw_serial is rather convoluted: the value has to be converted and then written, in pieces, to hw_serial, hw_serial+4, and hw_serial+8. The script newhostid.pl performs the necessary conversions, and then uses the adb debugger to make the changes to the running kernel. BTW, you can change the host ID by patching the binary file /kernel/misc/sysinit on Solaris x64.
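The conversion step can be sketched in C. This rests on assumptions not stated above: hw_serial holds the host ID as a NUL-terminated decimal ASCII string, and the three 32-bit words are stored big-endian, as on SPARC. The function name hostid_to_words is hypothetical, and the adb commands that would write the words are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: hw_serial holds the host ID as a NUL-terminated decimal ASCII
 * string and the target is big-endian (SPARC). Packs that string into the
 * three 32-bit words at hw_serial, hw_serial+4 and hw_serial+8. */
void hostid_to_words(uint32_t hostid, uint32_t words[3])
{
    unsigned char buf[12];
    memset(buf, 0, sizeof buf);                 /* unused bytes stay NUL */
    int len = snprintf((char *)buf, sizeof buf, "%lu", (unsigned long)hostid);
    assert(len > 0 && len < (int)sizeof buf);   /* at most 10 digits + NUL */
    for (int i = 0; i < 3; i++) {
        const unsigned char *p = buf + 4 * i;   /* big-endian word i */
        words[i] = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                   (uint32_t)p[2] << 8  | (uint32_t)p[3];
    }
}
```

For host ID 0x80123456 (decimal 2148676694) this yields 0x32313438, 0x36373636 and 0x39340000. Those are the ASCII codes of "2148", "6766" and "94", followed by two padding NULs.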
Solaris system administrators are often required to perform several extremely tedious tasks related to Solaris patches - hopefully the following scripts will help a bit...
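One of the patch scripts described next pulls patch IDs out of arbitrary text. Below is a minimal C sketch of that extraction step. It assumes the usual Solaris patch ID format: a six-digit base number, a dash, and a two-digit revision, such as 123456-01. The function name extract_patch_ids is hypothetical, and comparing the IDs against the installed patches is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy up to `max` patch IDs of the form NNNNNN-NN found in `text` into
 * `ids` (10 bytes each, including the NUL). Matches embedded in longer digit
 * runs are rejected so that, e.g., phone numbers are not taken for patches. */
size_t extract_patch_ids(const char *text, char ids[][10], size_t max)
{
    size_t found = 0, len = strlen(text);
    for (size_t i = 0; i + 9 <= len && found < max; i++) {
        const char *p = text + i;
        int ok = (i == 0 || !isdigit((unsigned char)p[-1])) && p[6] == '-'
                 && (i + 9 == len || !isdigit((unsigned char)p[9]));
        for (int k = 0; ok && k < 9; k++)
            if (k != 6 && !isdigit((unsigned char)p[k]))
                ok = 0;
        if (ok) {
            memcpy(ids[found], p, 9);
            ids[found][9] = '\0';
            found++;
            i += 8;                      /* resume after this ID */
        }
    }
    return found;
}
```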
One of these scripts reads a text stream on STDIN that contains patch IDs, and determines whether the current system has those patches, or newer versions of them, installed. The text sent to STDIN may contain text other than the patch IDs, and the script will extract the patch IDs from the surrounding text, so one may simply cat a README file or e-mail into the script and let it find the patch numbers.
The syslogd daemon is a part of most versions of UNIX, ranging from commercial systems like Solaris and HP-UX to free systems like Linux and FreeBSD. This daemon provides a central, uniform facility through which all applications on a computer, or set of networked computers, may log messages. Unfortunately, this venerable tool has some quirks that make it difficult to use. These problems include:
"last message repeated n times" messages, which make line oriented tools like grep less than useful.
I have used a uniform naming convention for these scripts. If the script has "xsyslog" in its name, then it processes a syslog file that has been "expanded" by the first script listed below. If the script only has "syslog" in its name, then it processes a raw syslog file.
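The expansion step can be sketched as a filter that replaces each "last message repeated n times" record with n copies of the preceding message, so that grep and friends see every occurrence. This is an assumed design, not the author's script. The names repeat_count and expand_syslog are hypothetical, the copies reuse the original message's timestamp, and lines longer than 4 KB are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If `line` is a "last message repeated N times" record, return N;
 * otherwise return 0. */
long repeat_count(const char *line)
{
    static const char key[] = "last message repeated ";
    const char *p = strstr(line, key);
    if (p == NULL)
        return 0;
    char *end;
    long n = strtol(p + sizeof key - 1, &end, 10);
    return (n > 0 && strncmp(end, " time", 5) == 0) ? n : 0;
}

/* Copy syslog lines from `in` to `out`, replacing each "repeated" record
 * with that many extra copies of the preceding message. */
void expand_syslog(FILE *in, FILE *out)
{
    char line[4096], prev[4096] = "";
    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        long n = repeat_count(line);
        if (n == 0) {
            fputs(line, out);
            memcpy(prev, line, sizeof prev);   /* remember last real message */
        } else if (prev[0] != '\0') {
            for (long i = 0; i < n; i++)
                fputs(prev, out);
        }
    }
}
```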
grep(1), sed(1), etc.
cron task causing a spike of errors every day at 1 PM would show up as a spike in the histogram graph. This script allows one to specify the time quantum and a regular expression to select the messages to count. countSyslogByTime.gp is an input file for gnuplot that can be used to graph the histogram.
grep for interesting messages. The filterXsyslog.pl script is similar, except that it requires an "expanded" syslog file.
logger(1) command line tool that adds several features:
/etc/syslog.conf file.
STDOUT, or STDERR - very useful for debugging.
/etc/syslog.conf file.
The traditional way to traverse a file system is to simply use a recursive algorithm such as the one described in APUE. This algorithm is generally I/O bound; however, the culprit on modern systems is often I/O latency, not bandwidth. This is particularly true with today's transaction based I/O subsystems and network file systems like NFS. One way to alleviate this bottleneck is to have multiple I/O operations in flight simultaneously. Using this technique on a single CPU Linux box with a local file system produces only marginal performance increases, but when dealing with NFS file systems the speedup can be quite significant. Experiments with multi-CPU hosts using gigabit Ethernet and large NFS servers show incredible performance improvements of well over 50x (20 hours cut down to 20 minutes). This set of programs has been used to traverse hundreds of terabytes of storage distributed across more than a billion files and 100 fileservers in just a few hours.
The idea is to first store every directory given on the command line in a linked list. Then a thread pool is created, and the threads pop entries off that linked list in the order they were placed in the list (FIFO). Each thread then reads all the entries in the directory it popped off the list, performs user defined actions on each entry, and appends any subdirectories to the end of the linked list. This algorithm leads to a roughly breadth-first directory traversal. The nature of the algorithm places a heavy load on the caching systems available in many operating systems. For example, ncsize plays a role in how effective this program is on a Solaris system. Also on Solaris, the number of simultaneous NFS connections dramatically affects performance. Depending on what the optional processing functions are doing, this program can place an incredible load on nscd.
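The work-list algorithm described above can be sketched with POSIX threads. This is a minimal illustration, not the code in the archive, and it makes several simplifications:

- The names traverse, worker and push are hypothetical.
- Counting entries stands in for the "user defined actions".
- Global state limits the sketch to one traversal at a time.
- lstat(2) is used to spot subdirectories, so symbolic links to directories are not followed.

The busy counter tells the threads when to quit. The traversal is finished only when the list is empty and no thread is still reading a directory that might add to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

struct node { struct node *next; char *path; };   /* one pending directory */

static struct node *head, *tail;   /* FIFO: pop at head, append at tail */
static int busy;                   /* threads currently reading a directory */
static long entries_seen;          /* stand-in for user defined actions */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;

static void push(char *path)       /* caller must hold `lock` */
{
    struct node *n = malloc(sizeof *n);
    n->path = path;
    n->next = NULL;
    if (tail) tail->next = n; else head = n;
    tail = n;
    pthread_cond_signal(&more);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (head == NULL && busy > 0)
            pthread_cond_wait(&more, &lock);
        if (head == NULL)              /* list empty and nobody busy: done */
            break;
        struct node *n = head;
        head = n->next;
        if (head == NULL) tail = NULL;
        busy++;
        pthread_mutex_unlock(&lock);   /* do the slow I/O without the lock */

        DIR *d = opendir(n->path);
        struct dirent *e;
        while (d != NULL && (e = readdir(d)) != NULL) {
            if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
                continue;
            size_t len = strlen(n->path) + strlen(e->d_name) + 2;
            char *child = malloc(len);
            snprintf(child, len, "%s/%s", n->path, e->d_name);
            struct stat sb;
            int is_dir = lstat(child, &sb) == 0 && S_ISDIR(sb.st_mode);
            pthread_mutex_lock(&lock);
            entries_seen++;            /* per-entry action goes here */
            if (is_dir) push(child);   /* subdirectory goes to the tail */
            pthread_mutex_unlock(&lock);
            if (!is_dir) free(child);
        }
        if (d != NULL) closedir(d);
        free(n->path);
        free(n);

        pthread_mutex_lock(&lock);
        busy--;
        if (busy == 0 && head == NULL)
            pthread_cond_broadcast(&more);   /* wake idle threads to exit */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Traverse the tree under `root` with `nthreads` workers; returns the number
 * of entries found below `root`. */
long traverse(const char *root, int nthreads)
{
    pthread_t tids[64];
    assert(nthreads > 0 && nthreads <= 64);
    entries_seen = 0;
    pthread_mutex_lock(&lock);
    push(strdup(root));
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return entries_seen;
}
```

While one thread is blocked in opendir or readdir waiting on a slow NFS server, the others keep their own requests in flight. This is where the latency win described above comes from.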
The version of the code linked here is written in C, and makes use of ancient C techniques to provide for tool customization. The C++ version provides a dramatically superior extension and abstraction model, and is much easier to extend. While the C++ version was written at about the same time as the C version, it has seen much less testing in the real world, and I am hesitant to release it into the wild. In addition, MPI versions of both the C and C++ code exist that can spread the work across many hosts in a network. As with the C++ version, I am not comfortable enough with these to release them.
The code base is designed to be customized so that binaries may be easily produced to do special tasks as the need arises. As an example of this, several compile options exist for the code in the archive that generate different binaries that do very different things. Currently the following examples may be compiled right out of the box:
du
/bin/du. It has no command line options, and simply displays the output of a 'du -sk'.
dux
/bin/du that displays much more data about the files traversed, including file sizes, number of blocks, which files have holes, and lots of other data.
own
age
noown
dir
go
'find ./', only it does an almost breadth-first search.