Extending the power of SAS® software using other software packages
[last updated: 06 July 2008]
Introduction
I sometimes read discussions about what is the best statistical software
package. Those that feature include, but are not limited to, Matlab,
R, SAS, S-Plus, SPSS and Stata. "R" seems to dominate
these discussions at the moment with its extendable nature and its better
graphics. But all seem to agree that for handling very large amounts of
data (think tens of gigabytes) SAS is King. SAS is more a data
handling language with stats and graphics added on. The other packages
are more visualisation or statistical and most will require that all the
data be in memory. This is not always possible where large amounts of data
are involved. If you have SAS then you will probably not want it to be
replaced by another statistical package with poorer data handling capabilities
but you might wish that SAS could do some of the things that some of the
other packages can do and thereby extend its power. In this way you could
use the power of SAS to manipulate the data in the way you want to and
then call another software package to do the analysis or do the graphics
for you. "Is this possible?", you may wonder. Can you imagine SAS communicating
with another software package, passing the data to it, telling it what
code to run and then getting back the results -- or does this sound too
much like science fiction? Well, actually, it is easy to do this...
but in a very unexciting way.
Batch Mode
The way to make the functionality of the other software packages easily
available to your SAS program is to run the other software package in
batch mode using a system call with the data prepared and the code
(or "script") to run put in a file for the software package to read. Nearly
all statistical software packages can run in batch mode. All the ones I
have listed above can run in batch mode. Then, when the software package
has finished running the code in batch mode and creating the analysis or
the output that you want, your SAS program will resume and carry on running.
You can have the best of both worlds! Better still, since there is no limit
to the number of software packages you can call in this way, you can
have the best of all worlds!!
Adding the software package to your PATH
In order for your SAS program to be able to call the software package by
its executable name, using a system call, your computer has to know where
to find it. Both Windows versions and Unix/Linux/AIX have a PATH
system environment variable and this tells the computer where to look for
executable files such as the software package. You can edit the setting
of the PATH variable or it can be edited for you. On a Unix/Linux system,
if you are running the "bash" shell, you will have a file in your home
directory called .bashrc or .bashrc_own that you can edit
and change the value of the PATH variable. On a Windows system, if you
right-click on your "My Computer" icon, choose "Properties" and
then select the "Advanced" tab (for Windows XP), you will find
"Environment Variables". Click on this and you will see two panels.
The top panel is for "User Variables" (which we don't want) and the bottom
panel is for "System Variables". In the "System Variables" panel
you will find a variable called "Path", which you can select and
edit. At the end of its value, add a semicolon as a separator (Unix/Linux
uses a colon for this) followed by the full path name of the directory
that contains the ".exe" file of the statistical package. If you are on
Windows, you can find both the path and the ".exe" file name by right-clicking
on the program icon of the software package you want to call, choosing
"Properties", then the "Shortcut" tab and this information
will be in the "Target" field. By adding the directory that contains
this executable to your PATH variable you will be able to call this executable
by its name from your SAS program using your system call.
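As a quick check that the change has taken effect, you can ask SAS what PATH it sees. The sketch below is only an illustration: it uses the SYSGET function to read the environment variable and INDEX to look for a directory. The directory "C:\Program Files\R\bin" is purely hypothetical and should be replaced by wherever your package's executable lives. A SAS session that was already running when you edited PATH will normally need restarting before it sees the new value.

```sas
/* Hypothetical directory; substitute the one holding your package's executable */
%let pkgdir = C:\Program Files\R\bin;

data _null_;
  length path $32767;
  /* sysget() returns the value of an operating system environment variable */
  path = sysget('PATH');
  /* Compare in upper case because Windows paths are not case sensitive */
  if index(upcase(path), upcase("&pkgdir")) > 0 then
    put "NOTE: &pkgdir was found in PATH.";
  else
    put "WARNING: &pkgdir was not found in PATH.";
run;
```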
Inside your SAS program before the call
Before you call the other statistical software package, you have to write
the code (sometimes called a "script") to a file that you will specify.
The best way to do this is with a "data _null_" using DATALINES4 and "put"ting
_INFILE_ out to the file that will contain your code. DATALINES4 will allow
for there being semicolons in the data, which is quite possibly the case
for other computer languages. Below is an example of how to do this. I
am writing the code to a file named "roland.r" in a directory I have chosen.
    data _null_;
    file "C:\spectre\roland.r";
    input;
    put _infile_;
    datalines4;
    this is the first line of code; with; semicolons
    this is the second line of code
    ;;;;
    run;
Whatever code you provide must create the output in a format you can
use and put it in a file somewhere where you can find it. Note that for
graphics, the best format for including into printable documents is "vector
graphics". You can read about these image file formats here.
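To make this more concrete, here is a sketch of the same technique writing out a small R script that produces a PDF, which is a vector graphics format. Everything specific in it is an assumption made for illustration: the file "C:/spectre/sales.csv", the output file "C:/spectre/sales.pdf" and the column named "amount" are hypothetical, and the R code is only a minimal example of the kind of script you might supply.

```sas
data _null_;
  file "C:\spectre\roland.r";
  input;
  put _infile_;
  /* datalines4 ends at a line of four semicolons, so single
     semicolons inside the script are passed through untouched */
  datalines4;
sales <- read.csv("C:/spectre/sales.csv"); str(sales)
pdf("C:/spectre/sales.pdf")
hist(sales$amount, main = "Sales")
dev.off()
;;;;
run;
```

The first line of the script deliberately contains a semicolon to show that DATALINES4 copes with it.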
Next you have to prepare the data so that the software package can read
it. Some packages can read SAS datasets directly. This is the easiest way,
but the program you call won't know about a dataset called "sales" or "work.sales"
since it won't know about the libraries you have set up. You will have
to write this dataset to a libref whose physical location you know and then "clear"
this libref before you call the software package. The software package
will then have to be told the full pathname of the dataset file.
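A minimal sketch of that sequence follows. The libref name "xfer" and the folder "C:\spectre" are assumptions chosen for illustration; the dataset name "sales" is the one used in the text above.

```sas
/* Hypothetical folder; any location the external package can reach will do */
libname xfer "C:\spectre";

/* Copy the dataset to the known location; the file will be sales.sas7bdat */
data xfer.sales;
  set work.sales;
run;

/* Release the libref so SAS no longer holds the file */
libname xfer clear;
```

With these names, the full pathname to give the external package would be C:\spectre\sales.sas7bdat.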
For those software packages that can't read SAS datasets, you will have
to put the data in a form that they can read and put it in a place that you
tell them. You might have to write out your data to a comma-delimited file.
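One way to write a comma-delimited file is PROC EXPORT, sketched below. The output file name "C:\spectre\sales.csv" is hypothetical, and "work.sales" stands in for whatever dataset you have prepared.

```sas
/* Write work.sales as a comma-delimited file with a header row.
   REPLACE overwrites any existing file of the same name. */
proc export data=work.sales
            outfile="C:\spectre\sales.csv"
            dbms=csv
            replace;
run;
```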
In the above cases, it would be a good idea to make sure these files
are deleted before you create them -- just in case something goes wrong
and the external software uses old copies of the data or runs out-of-date
code. Also, any files the external software creates should be deleted before
the run.
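The clean-up can be done from inside SAS using the FILENAME, FEXIST and FDELETE functions. The macro below is a sketch only; its name "delfile" and the file names passed to it are hypothetical, and it assumes the path contains no commas.

```sas
%macro delfile(path);
  %local rc fref;
  /* fref is blank, so filename() generates a fileref and stores it in fref */
  %let rc = %sysfunc(filename(fref, &path));
  /* Delete the file only if the fileref was assigned and the file exists */
  %if &rc = 0 and %sysfunc(fexist(&fref)) %then
    %let rc = %sysfunc(fdelete(&fref));
  /* Clear the temporary fileref again */
  %let rc = %sysfunc(filename(fref));
%mend delfile;

/* Hypothetical files: the script, the data and the external log */
%delfile(C:\spectre\roland.r)
%delfile(C:\spectre\sales.csv)
%delfile(C:\spectre\roland.log)
```

Calling it for a file that does not exist does nothing, so it is safe to run at the start of every job.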
Making the call
To call the software package, I recommend you use the "systask command"
method rather than using "X" with which you are probably more familiar.
This is because, certainly for Windows systems, it prevents unwanted windows
popping up and disappearing again. You have to set the right system option
for this. The following is what you need.
options xsync noxwait;
With those options in effect you call the software package like this.
What you put in double quotes will have to be the correct syntax for calling
the software package in batch mode that you have read about from the documentation
for that software package.
    systask command "the .exe file plus tell it where the code is" taskname=xxx;
    waitfor xxx;
....where "xxx" shown above is a name you choose for the task
(can be anything sensible). The SAS program will (should) wait for the
task to finish before resuming execution. I say "should" because it can
vary with the software package but what I suggest should work so you should
try that first.
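As an illustration, here is what the call might look like for R, assuming the system options given above are already in effect and that the directory holding R's executable is on your PATH. The script and log file names are hypothetical, as are the task name "rjob" and the status variable "rjobrc". "R CMD BATCH" is R's own batch mode command; other packages will have their own syntax, which you will need to look up.

```sas
/* Run the R script in batch mode; R writes its output to roland.log.
   STATUS= names a macro variable that receives the task's return code. */
systask command "R CMD BATCH --vanilla C:\spectre\roland.r C:\spectre\roland.log"
        taskname=rjob
        status=rjobrc;

/* Pause the SAS program until the task named rjob has finished */
waitfor rjob;

%put NOTE: R batch run finished with return code &rjobrc;
```

A non-zero value in the status variable is a hint that the external run failed, which is worth checking before you try to use its output.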
So the software has run and it should have created your output. Yippee!
What next?
Inside your program after the call
After the external software package has finished its batch run, you are
strongly advised to copy its log file into your own SAS log, but in a
way that identifies the lines as coming from the software package you have called. I
like the system suggested in the paper "SAS to R to SAS" written by Phil
Holland, where he reads in the log created by the external software
package. In this case it is the log from an R program and that is why the
lines he writes to the log are prefixed with "**R: ". You should follow
that notation for other software packages as well, such that a Stata program
will have its log prefixed with "**STATA: ". I hope you will read his paper
for a real-life example of SAS calling R, although the focus of that paper
was more on using ODS in conjunction with "R" and some of the techniques
I recommend in this document differ from the example shown in that paper.
Note that although SAS writes profuse notes, warnings and error messages,
the software package you call might not tell you much when things go wrong.
It's not SAS's fault!
    DATA _NULL_;
    INFILE 'c:\temp\r\program.log';
    FILE LOG;
    INPUT;
    PUT '**R: ' _INFILE_;
    RUN;
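If you call more than one package, the same idea can be wrapped in a small macro so that the prefix changes with the package. This is only a sketch: the macro name "extlog", its parameters and the log file name in the example call are all hypothetical.

```sas
%macro extlog(logfile=, prefix=R);
  data _null_;
    /* lrecl=32767 guards against long log lines being truncated */
    infile "&logfile" lrecl=32767;
    file log;
    input;
    /* Prefix every line so its origin is obvious in the SAS log */
    put "**&prefix: " _infile_;
  run;
%mend extlog;

/* Example call for a hypothetical Stata log */
%extlog(logfile=C:\spectre\stata_run.log, prefix=STATA)
```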
A cheaper alternative to SAS software
Now that we can see how easy it is for SAS to use the functionality of other
software packages, the door opens to a cheaper alternative to using
SAS software. That is to use WPS,
a software package whose native language is identical to SAS
code, though its functionality, especially on the graphics and statistics
side, is more limited. WPS supports system calls using the "X" system.
The licence is cheaper than for SAS, it has the same strengths for handling
large volumes of data, and its lack of graphics and statistical functionalities
can be complemented using other software packages called in batch mode.
Within WPS you will be able to call other software packages as shown below,
but note that the WPS documentation says that making system calls is only
suitable for Windows platforms, as this extract from their user manual
says.
    Remarks
    The X statement executes the command as if it had been typed into an
    operating system command prompt or shell. The X statement is only
    suitable for use on the Windows platform.
Instead of "systask command" that I recommend for your call from a SAS
program, your WPS code would look like this.
    options xsync xwait;
    X "the .exe file plus tell it where the code is";
Note that I do not recommend giving up using SAS software and moving
over to WPS to try to save on costs. This would be a very difficult decision
to take. But you might identify situations where exactly the same job could
be done using WPS instead of SAS and then it might be worthwhile using
WPS instead.
"Moving over to R"
There is some talk in the pharmaceutical industry about "moving over to
R". That is to run down the SAS side of things for clinical reporting and
shift more analysis and graphics to R. I do not believe a complete
transition from SAS to "R" will be possible in the next few years. This
is because R cannot handle the very large volumes of data that would be
typical of lab data, for example. I guess somebody might write extensions
to R to allow it to handle large volumes of data but the emphasis of R
is on the visualisation and analysis side. There is no incentive to develop
it along the lines of handling large volumes of data so this might never
be done. The ideal is to keep SAS (or WPS) for its data handling capabilities
but have the functionality of R. You have seen from the above that it is
easy to combine R with SAS. That should be a happy marriage for the pharmaceutical
industry for the foreseeable future.
Conclusion
You have seen how SAS (and WPS) can call external statistics software packages
and then resume execution, and in this way add the functionality of other
vendors' software to its own, thereby effectively extending its power.
SAS and all other SAS Institute Inc. product or service names are registered
trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
® indicates USA registration.