You think learning Unix is not for you? Stick with SAS because there's
still plenty there left to learn? True, there's plenty more to learn in SAS. Very
true for me, and I have been programming in it intensively since 1986.
So leave Unix for the techies because it won't help you with SAS and the
work you do? Well, think again! If you work on a Unix platform and you
don't know Unix well then I guarantee you that you are not working as efficiently
as you could be. Very far from it, in fact.
SAS is SAS and Unix is Unix, right?
WRONG. This is probably the assumption you
are making and you couldn't be more wrong. That is why you don't bother
learning more about Unix. Don't blame yourself, though, because this is
what 99% or more of SAS programmers on Unix platforms think. But, in fact,
you can combine SAS and native Unix commands easily and in so doing open
up whole new possibilities for ease and greater efficiency. Once you start
out down this route, you will never look back. But you can't see that now,
so I would like you to step into my shoes for a few minutes so you can
see things from my point of view. I will tell you about a typical day of
mine using SAS with Unix.
How I use Unix with SAS
I have written a number of Unix utilities and many of these execute SAS
within them. I use some of these utilities every day at work. I can't remember
a day, now, in the past year, when I have not used at least one of them.
I'll give you a list of the SAS/Unix utilities I use often as well as some
of the pure Unix ones that still relate to the work I do with SAS. I'd
like you to imagine what it could be like working in the following way
that I will illustrate. Put yourself in my place and imagine doing the same.
contents and contentsl
I create a dataset and place it in a library. I want to check that I have
assigned labels to all the variables. I make that directory the current
directory and type the command contents demog at the Unix prompt.
I see the contents of the demog dataset displayed on the screen.
If I want to see more details then I type in contentsl demog instead
(a longer version of my contents utility) and see the length,
variable type and formats as well. I will soon see if I have missed off
a label. Maybe I want to see the contents of all the datasets
in that library. I just type the command contents and there it is
for every dataset. If I want to route what I see to a file then I type
in something like contents > cont instead and can browse the file
later. Suppose I want to know what datasets contain the variable SESS;
then I can pipe the output of the contents command to grep,
like this: contents | grep SESS, and there, on the screen, are all
the datasets that contain the variable SESS. Do you see how SAS
and Unix can work together? Do you see how simple it is? There's more to come.
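A wrapper like contents can be sketched as a short shell function. What follows is a minimal, hypothetical reconstruction, not the author's actual script: it assumes the sas executable is on your PATH and that the current directory is the SAS library; where sas is not installed it simply prints the generated program instead.

```shell
#!/bin/sh
# contents: sketch of a wrapper that runs PROC CONTENTS on one dataset
# (or on every dataset when called with no argument) and leaves no SAS
# code or logs behind. Illustrative only, not the author's script.
contents() {
    ds=${1:-_all_}                       # no argument means the whole library
    tmp=$(mktemp -d) || return 1         # private work area
    cat > "$tmp/contents.sas" <<EOF
libname here '.';
proc contents data=here.$ds;
run;
EOF
    if command -v sas >/dev/null 2>&1; then
        # command-line options vary by site; adjust for your installation
        sas -sysin "$tmp/contents.sas" -log "$tmp/c.log" -print "$tmp/c.lst" &&
            cat "$tmp/c.lst"
    else
        cat "$tmp/contents.sas"          # no SAS here: show the generated program
    fi
    rm -rf "$tmp"                        # tidy up, like a native Unix command
}
```

Typical calls would then be contents demog, or contents | grep SESS to search every dataset's contents at once.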
allmiss and misscnt
I've created a dataset in the output library. I've used contents
to check the labels. All is well. But have I populated all the variables?
I just type in the command allmiss demog and it will tell me if
I have any all-missing character or numeric variables. I type in misscnt
demog if I want a count of the number of missing values for each variable
rather than be informed about the all-missing variables. Suppose I want
to check the whole library for all-missing variables. I just type in allmiss
with no parameters.
printalln
I come across a strange subject whose data just doesn't add up. I need
to look at all the data I have for that subject and piece together what
is going wrong by cross-referencing the information in the various datasets.
I do this nearly every day. I type in the command printalln subject=1234
> subj1234 and then all the data for subject 1234 in that library is
put in the file subj1234 where I can browse it. If I were interested
in data for an unexpected session then I could type in something like printalln
sess=99 > sess99 and go look at it in the file sess99. It's
as easy as that. Yes, it is running SAS behind the scenes. Of course it
is. But you won't find any SAS code or logs being left behind in those
directories. It just does its work and then disappears. It is just like
a native Unix command except that you have SAS working for you instead.
intitlesnoprogs
I have a tight deadline. Time is running short. There is a "titles"
dataset somewhere with all the titles and footnotes in it for all the code
that produces output. Is meeting this deadline going to be possible? How
many reporting programs haven't been written yet? Well, it's easy for me
to find out. I just type in the command intitlesnoprogs in any relevant
study directory and I get a list up on the screen of all the missing reporting
programs. The utility has read the titles dataset, has searched the program
directories (or perhaps all the programs directories for that study area)
and got a list of entries and told me which SAS programs haven't been written
yet. This is SAS and Unix working directly together to provide you with useful
SAS project information.
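The comparison step behind an intitlesnoprogs-style utility can be done with the native comm command. Extracting the expected program names from the titles dataset would really be a small SAS step; in this sketch that list is faked with printf, and the file names expected.txt and existing.txt are purely illustrative.

```shell
#!/bin/sh
# Sketch of the comparison step in an intitlesnoprogs-style utility.
# The sorted list of program names would come from the titles dataset
# via a small SAS step; here it is faked for illustration.
printf 'ae\ndemog\nlab\n' > expected.txt            # names the titles dataset expects
ls *.sas 2>/dev/null | sed 's/\.sas$//' | sort > existing.txt
comm -23 expected.txt existing.txt                  # only-in-expected = not yet written
```

comm -23 suppresses lines unique to the second file and lines common to both, so what remains is exactly the list of reporting programs still to be written.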
clash
This is a simple one I wrote years ago. I have created a library of
SAS datasets and I want to know where there are discrepancies of label,
length, format or whatever among identically-named variables throughout
that library. I just type in the command clash and then I see the
discrepancies listed. If there are a number of them I might repeat the
command but direct it to a file where I can mull over the discrepancies
at my leisure like this clash > clash.
scanlogs
This should be made compulsory for QC'ing, in my opinion. A suite of
programs has run. Have all the error messages and warning messages been
checked? What about the important note statements put out in the log? I
can just type in the command scanlogs for a directory and it will
scan all the logs for important messages that programmers need to check
out. I could pick a single log, if I wanted to, or a specific group of
logs like this scanlogs d3p*.log.
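A first cut at a scanlogs-style utility needs little more than grep. ERROR: and WARNING: are the standard SAS log prefixes; the NOTE: patterns below are my own small sample of notes commonly worth chasing, so extend the list to match your site's standards.

```shell
#!/bin/sh
# scanlogs sketch: pull suspicious messages out of SAS logs.
# The NOTE: patterns are a small illustrative sample -- extend as needed.
scanlogs() {
    [ $# -eq 0 ] && set -- *.log        # no arguments: scan every log here
    grep -n -E 'ERROR:|WARNING:|NOTE: (MERGE statement|.*uninitialized|Invalid data|Missing values were generated)' "$@"
}
```

The -n flag prefixes each hit with its line number, so you can jump straight to the offending spot in the log.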
rescue
I once managed to delete all the programs in a directory by using the
Unix command rm *.sas when I meant to type in rm *.log. I
was tired that day. Since that day I create and maintain a backup
sub-directory in all program directories and would advise others to do
the same. This was a small disaster but I still had all the logs from the
programs, so I wrote a utility called rescue. It gets back the code
from the program logs. It can be implemented using SAS talking to Unix
or using pure Unix utilities (awk or nawk to be precise).
That's got to be better than making a fool of yourself to Unix support
and waiting three days for them to bring back your backed-up SAS programs.
Especially if you have to deliver your reports the next day in any case.
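The pure-Unix version of a rescue-style utility can be sketched in one sed command. It relies on the fact that SAS echoes each submitted source line into the log prefixed with its line number (e.g. "3          data demog;"); stripping that prefix recovers the code. This is an illustrative sketch, not the author's script, and a real version would need to cope with wrapped lines and macro-generated code.

```shell
#!/bin/sh
# rescue sketch: recover submitted source code from a SAS log.
# SAS echoes each submitted line prefixed with a line number; keep only
# those lines and drop the number.
rescue() {
    sed -n 's/^ *[0-9][0-9]*  *//p' "$1"
}
```

Run as rescue deleted_prog.log > deleted_prog.sas to rebuild the program file.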
hdr and shdr
If I create a new program, I use a script I wrote called hdr,
something like this: hdr newprog. It prompts me for a program purpose
and creates the SAS member with all sorts of useful information filled
in, including the project and study identity pulled out of the directory
name, my name as the program author, and the date.
If it is a macro, and I want more in the header, then I use a macro version
of the command instead. If it is a Unix shell script then I use shdr instead. Documentation
of code is a pain, but this helps. It almost makes documentation a fun
thing to do. And when your study documentation is good and easy then things
are on the up and up, programming wise.
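An hdr-style generator is straightforward to sketch. Everything below is a guess at how such a script might look, not the author's version: it pulls the study identity from the directory name and the author from $USER, both assumptions about site layout.

```shell
#!/bin/sh
# hdr sketch: create a new SAS program with a filled-in header block.
# Study identity comes from the directory name, author from $USER --
# illustrative assumptions, not the author's actual script.
hdr() {
    prog=$1
    printf 'Program purpose: '
    read purpose
    cat > "$prog.sas" <<EOF
/*==================================================
  Program : $prog.sas
  Study   : $(basename "$(pwd)")
  Author  : ${USER:-unknown}
  Date    : $(date +%d%b%Y)
  Purpose : $purpose
==================================================*/
EOF
}
```

Calling hdr newprog prompts for the purpose and writes newprog.sas with the header already completed.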
ddiff
You may or may not have heard about the diff utility that is native
to Unix. You use it to compare two files. These files could be report output
files. You know that sometimes you need to do a complete rerun due to a
couple of data changes and you want to make sure that the outputs have
only changed in the way you want. And you are not interested in seeing a
listing of differences of who ran what when, such as lines like: "userid:/sas/programs/thisprog.sas
23JUL03 15:13 Page 2 of
88". You just want to see real differences in the figures you are
expecting. Well, I wrote one to do that, called ddiff, based on the native diff utility.
You go to a subdirectory of your output directory where you have stored
all your previous outputs and type in the command ddiff and you
will see a list of all differences between the outputs in the subdirectory
compared to those in the parent directory. This only takes seconds to run
and for you to check that the outputs match what you expected. Better than
re-QC'ing the whole lot. But in order for this to work well you will need
to know a bit about something called "pattern matching" so you can spot
and blank out those lines you are not interested in.
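The pattern-matching idea can be shown with sed and diff: blank out the lines you never want to compare before diffing. This is a single-file sketch of the technique, not the author's ddiff; the sed pattern matches a hypothetical "... Page n of m" run-stamp line, and you would adapt it to your own output layout.

```shell
#!/bin/sh
# ddiff sketch: compare two report outputs while ignoring run-stamp lines.
# The sed pattern blanks a hypothetical "path date time Page n of m"
# header line; adapt the pattern to your own outputs.
ddiff() {
    a=$(mktemp); b=$(mktemp)
    sed 's/.*Page [0-9][0-9]* of [0-9][0-9]*//' "$1" > "$a"
    sed 's/.*Page [0-9][0-9]* of [0-9][0-9]*//' "$2" > "$b"
    diff "$a" "$b"                  # only real differences survive the blanking
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

A zero exit status means the two outputs differ only in the blanked run stamps, which is exactly the quick check you want after a rerun.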
listempty and pages
More on report outputs. I can check to see if any reports don't contain
anything by just typing in the command listempty *.lst. Once I am
satisfied that all the reports contain something then I might want to print
out pages 2 to 4 only for all the reports and write them to a file. I can
do this with pages 2 4 *.lst > pages2to4.
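A pages-style extractor can lean on the form-feed character that SAS procedures put between output pages. This sketch assumes \f page breaks; awk's record separator then treats each page as one record. Again, an illustrative guess at the approach, not the author's script.

```shell
#!/bin/sh
# pages sketch: print pages FIRST..LAST of each named file, assuming the
# output separates pages with a form-feed (\f) character.
pages() {
    first=$1; last=$2; shift 2
    for f in "$@"; do
        awk -v lo="$first" -v hi="$last" 'BEGIN { RS = "\f" }
             NR >= lo && NR <= hi' "$f"    # each record is one page
    done
}
```

So pages 2 4 *.lst > pages2to4 walks every listing file and writes out just pages 2 to 4 of each.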
sasunixskeleton
These Unix utilities that I have listed above, which run SAS (or sometimes
not), go by the generic name of "Unix shell scripts". They have a
language syntax that is quite different to SAS (actually there are a few
different types each with their own peculiarities of syntax) and it might
put you off trying to learn them. So you might feel that you will never
be able to write your own utility that calls SAS and interacts with Unix
like the ones I have listed. I thought about that and the problems that
SAS programmers might face making a start on this so I wrote a utility
that writes utilities. Yes, you read that right: it's a utility that
writes utilities! You call this utility, named sasunixskeleton,
and it asks you what you want to call your new utility and what it does
and it writes the shell script for you. All you have to do is add a bit
of code where it says EDIT to supply a usage message and your SAS code.
The rest of the shell script you leave alone and it will work, even if
you haven't got a clue what it is supposed to be doing.
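A sasunixskeleton-style generator is itself only a few lines: it writes a new script from a template, leaving EDIT markers where the usage message and the SAS code go. Everything below, including the template contents and the sas invocation inside it, is an illustrative guess at how such a generator might look.

```shell
#!/bin/sh
# sasunixskeleton sketch: a utility that writes a utility. It emits a new
# shell script with EDIT markers showing where your usage message and SAS
# code go; the rest of the template can be left alone.
sasunixskeleton() {
    name=$1; desc=$2
    cat > "$name" <<'EOF'
#!/bin/sh
# EDIT: put your usage message here
tmp=$(mktemp -d) || exit 1
cat > "$tmp/prog.sas" <<'SASEOF'
/* EDIT: put your SAS code here */
SASEOF
# sas invocation varies by site; adjust for your installation
sas -sysin "$tmp/prog.sas" -log "$tmp/prog.log"
rm -rf "$tmp"
EOF
    chmod +x "$name"
    echo "created $name: $desc"
}
```

Calling something like sasunixskeleton mynewutil "check visit windows" (names here are hypothetical) leaves you an executable skeleton to fill in at the EDIT markers.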
I've by no means covered all possible examples of what would be useful
to you in your Clinical trials reporting environment. There is no such
thing as a definitive list. It all depends on the way you work. All I hope
to have achieved here, assuming you have read the above list and thought
about it, is to open up your mind to what might be possible in your own
workplace if you could make SAS talk to Unix in the form of utilities that
operate to match the way you work. Let me assure you that there is very
little "pain" to get to this stage and a great deal of "gain" to be had.
Using Unix commands, and writing Unix shell scripts, is much easier than writing
SAS code. It just looks odd, and when you make an error it is hard to debug.
But once you know how to do this you will be able to transform the way
you work and raise the efficiency of your department in a way you never
previously thought possible.
I invite you to join me in making the transition from being a pure SAS
programmer to a SAS programmer who has married their skills to the Unix
environment. You have maybe studied the many SAS macros on this web site
to see how they could be applied to achieve greater efficiency in the field
of Clinical reporting, so maybe you are ready to take one more step with
me in that direction.