Extending the power of SAS® software using other software packages

[last updated: 06 July 2008]

Introduction

I sometimes read discussions about which statistical software package is best. The packages that feature include, but are not limited to, Matlab, R, SAS, S-Plus, SPSS and Stata. R seems to dominate these discussions at the moment because of its extensible nature and its better graphics. But all seem to agree that for handling very large amounts of data (think tens of gigabytes) SAS is king. SAS is more a data handling language with statistics and graphics added on. The other packages are oriented more towards visualisation or statistics, and most require all the data to be in memory, which is not always possible where large amounts of data are involved. If you have SAS then you will probably not want to replace it with another statistical package that has poorer data handling capabilities, but you might wish that SAS could do some of the things the other packages can do. You could then use the power of SAS to manipulate the data in the way you want and call another software package to do the analysis or the graphics for you. "Is this possible?", you may wonder. Can you imagine SAS communicating with another software package, passing the data to it, telling it what code to run and then getting back the results, or does this sound too much like science fiction? Well, actually, it is easy to do this... but in a very unexciting way.

Batch Mode

The way to make the functionality of other software packages easily available to your SAS program is to run the other package in batch mode using a system call, with the data prepared beforehand and the code (or "script") to run placed in a file for the package to read. Nearly all statistical software packages can run in batch mode, including all the ones I have listed above. When the package has finished running the code in batch mode and creating the analysis or output that you want, your SAS program will resume and carry on running. You can have the best of both worlds! Better still, since there is no limit to the number of software packages you can call in this way, you can have the best of all worlds!!

Adding the software package to your PATH

In order for your SAS program to be able to call the software package by its executable name, using a system call, your computer has to know where to find it. Both Windows and Unix/Linux/AIX have a PATH system environment variable that tells the computer where to look for executable files such as the software package. You can edit the setting of the PATH variable yourself or have it edited for you.

On a Unix/Linux system, if you are running the "bash" shell, you will have a file in your home directory called .bashrc or .bashrc_own that you can edit to change the value of the PATH variable.

On a Windows system, right-click on your "My Computer" icon, choose "Properties" and then select the "Advanced" tab (for Windows XP), where you will find "Environment Variables". Click on this and you will see two panels. The top panel is for "User Variables" (which we don't want) and the bottom panel is for "System Variables". In the "System Variables" panel you will find a variable called "Path", which you can select and edit. At the end of the existing value add a semicolon as a separator (Unix/Linux uses a colon for this) followed by the full path name of the directory that contains the ".exe" file of the statistical package. You can find both the path and the ".exe" file name by right-clicking on the program icon of the software package you want to call, choosing "Properties" and then the "Shortcut" tab; the information is in the "Target" field.

Once the directory that contains the executable is on your PATH, you will be able to call the executable by its name from your SAS program using your system call.
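Because a program only picks up environment variables when it starts, a SAS session that was already running before the PATH was changed will generally not see the new value. The following is a minimal sketch for checking what your SAS session sees. It uses the SYSGET function and assumes the Windows separator (a semicolon).

```sas
/* List each directory on the PATH as seen by this SAS session. */
data _null_;
  length path $32767 dir $260;
  path = strip(sysget('PATH'));    /* value of the environment variable */
  do i = 1 to countw(path, ';');   /* use ':' instead on Unix/Linux     */
    dir = scan(path, i, ';');
    put dir=;
  end;
run;
```

If the directory you added does not appear in the log, try closing SAS and starting it again.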

Inside your SAS program before the call

Before you call the other statistical software package, you have to write the code (sometimes called a "script") to a file that you will specify. The best way to do this is with a "data _null_" using DATALINES4 and "put"ting _INFILE_ out to the file that will contain your code. DATALINES4 will allow for there being semicolons in the data, which is quite possibly the case for other computer languages. Below is an example of how to do this. I am writing the code to a file named "roland.r" in a directory I have chosen.
 
data _null_; 
  file "C:\spectre\roland.r"; 
  input; 
  put _infile_; 
  datalines4; 
this is the first line of code; with; semicolons
this is the second line of code
;;;; 
run;
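To make this more concrete, here is a sketch of the same technique writing out a few lines of real R code. The R code is only an illustration: the file names "sales.csv" and "sales.pdf" and the column names "month" and "amount" are hypothetical, so substitute your own.

```sas
/* Write a small R script to disk using DATALINES4.             */
/* sales.csv, sales.pdf, month and amount are hypothetical.     */
data _null_;
  file "C:\spectre\roland.r";
  input;
  put _infile_;
  datalines4;
sales <- read.csv("C:/spectre/sales.csv"); str(sales)
pdf("C:/spectre/sales.pdf")
plot(sales$month, sales$amount, type = "b")
dev.off()
;;;;
run;
```

Note that the lines between DATALINES4 and the four semicolons are treated as data, so macro variables in them are not resolved. R accepts forward slashes in Windows path names, which avoids having to double up backslashes inside R strings.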

Whatever code you provide must create the output in a format you can use and put it in a file somewhere where you can find it. Note that for graphics, the best format for including into printable documents is "vector graphics". You can read about these image file formats here.

Next you have to prepare the data so that the software package can read it. Some packages can read SAS datasets directly. This is the easiest way, but the program you call won't know about a dataset called "sales" or "work.sales" since it knows nothing about the libraries you have set up. You will have to write the dataset to a libref whose destination directory you know and then "clear" this libref before you call the software package. The software package will then have to be told the full path name of the dataset file.
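A minimal sketch of writing the dataset to a known location might look like this. The libref name "xfer" is my own choice for illustration, and I am assuming the dataset to pass is "work.sales".

```sas
/* Copy the dataset to a directory whose location is known. */
libname xfer "C:\spectre";

data xfer.sales;
  set work.sales;
run;

/* Release the library so SAS no longer holds the file. */
libname xfer clear;
```

On Windows with a current version of SAS the file created would be C:\spectre\sales.sas7bdat, and that full path name is what you would give to the other software package.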

For those software packages that can't read SAS datasets, you will have to put the data in a form that they can read and in a place that you tell them about. You might have to write out your data to a comma-delimited file.
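One way to write a comma-delimited file is PROC EXPORT. This is only a sketch: the dataset "work.sales" and the output file name are assumptions for illustration.

```sas
/* Write work.sales out as a comma-delimited file with a header row. */
proc export data=work.sales
            outfile="C:\spectre\sales.csv"
            dbms=csv
            replace;
run;
```

The REPLACE option overwrites an existing file of the same name. If you need more control over formats or column order, a "data _null_" step with FILE and PUT statements is an alternative.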

In the above cases, it would be a good idea to make sure these files are deleted before you create them -- just in case something goes wrong and the external software uses old copies of the data or runs out-of-date code. Also, any files the external software creates should be deleted before the run.
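The clean-up could be sketched as follows, using the FILENAME, FEXIST and FDELETE functions. The three file names are hypothetical examples of a script, a data file and an output file; list whatever files your own run uses.

```sas
/* Delete left-over files from a previous run, if they exist. */
data _null_;
  length path $260;
  fname = "oldfile";                      /* temporary fileref name   */
  do path = "C:\spectre\roland.r",
            "C:\spectre\sales.csv",
            "C:\spectre\sales.pdf";
    rc = filename(fname, trim(path));     /* assign the fileref       */
    if rc = 0 and fexist(fname) then
      rc = fdelete(fname);                /* delete the physical file */
    rc = filename(fname);                 /* deassign the fileref     */
  end;
run;
```

FDELETE returns zero when the deletion succeeds, so you could test "rc" after the call if you want to be warned about files that could not be removed.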

Making the call

To call the software package, I recommend you use the "systask command" method rather than "X", with which you are probably more familiar. This is because, certainly on Windows systems, it prevents unwanted windows popping up and disappearing again. You have to set the right system options for this. The following is what you need.
 
options xsync noxwait;

With those options in effect you call the software package like this. What you put in double quotes will have to be the correct syntax for calling the software package in batch mode that you have read about from the documentation for that software package.
 
systask command "the .exe file plus tell it where the code is" taskname=xxx;
waitfor xxx;

....where "xxx" shown above is a name you choose for the task (can be anything sensible). The SAS program will (should) wait for the task to finish before resuming execution. I say "should" because it can vary with the software package but what I suggest should work so you should try that first.
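As an illustration, a call to R might look something like the sketch below. It assumes that the directory containing the R executable is on your PATH and that the script is in "C:\spectre\roland.r"; the log file name, the task name "rjob" and the macro variable name "rjobrc" are my own choices. "R CMD BATCH" is R's batch mode command and it takes the script file followed by the file to write the log to, but do check the documentation for your version of R.

```sas
options xsync noxwait;

/* Run the R script in batch mode; R writes its log to roland.log. */
/* STATUS= names a macro variable to receive the return code.      */
systask command "R CMD BATCH --vanilla C:\spectre\roland.r C:\spectre\roland.log"
        taskname=rjob status=rjobrc;

/* Suspend the SAS program until the task has finished. */
waitfor rjob;

%put NOTE: R finished with return code &rjobrc;
```

A non-zero return code usually means something went wrong, in which case the log file written by R is the place to look.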

So the software has run and it should have created your output. Yippee! What next?

Inside your program after the call

After the external software package has finished its batch run, you are strongly advised to copy its log file into your own SAS log, but in a way that identifies the lines as coming from the software package you have called. I like the system suggested in the paper "SAS to R to SAS" written by Phil Holland, where he reads in the log created by the external software package. In his case it is the log from an R program and that is why the lines he writes to the log are prefixed with "**R: ". You should follow that notation for other software packages as well, so that a Stata program will have its log prefixed with "**STATA: ". I hope you will read his paper for a real-life example of SAS calling R, although the focus of that paper is more on using ODS in conjunction with R, and some of the techniques I recommend in this document differ from the example shown there. Note that although SAS writes profuse notes, warnings and error messages, the software package you call might not tell you much when things go wrong. It's not SAS's fault!
 
DATA _NULL_;
  INFILE 'c:\temp\r\program.log';
  FILE LOG;
  INPUT;
  PUT '**R: ' _INFILE_;
RUN;
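Since the external package may say little when things go wrong, you might extend the step above to draw attention to likely problems. The following is only a sketch: it assumes that R error messages start with the word "Error", which is common but not something you should rely on for every case.

```sas
/* Copy the R log to the SAS log and flag lines that start with "Error". */
data _null_;
  infile 'c:\temp\r\program.log' end=last;
  file log;
  input;
  put '**R: ' _infile_;
  if index(_infile_, 'Error') = 1 then nerr + 1;   /* count suspect lines */
  if last and nerr > 0 then
    put 'WARNING: the R log has ' nerr 'line(s) starting with "Error".';
run;
```

Because the final message starts with "WARNING:", SAS will highlight it in the log in the same way as its own warnings.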

A cheaper alternative to SAS software

Now that we can see how easy it is for SAS to use the functionality of other software packages, the door is open to a cheaper alternative to SAS software. That is to use WPS, a software package whose native language is identical to SAS code, though its functionality, especially on the graphics and statistics side, is more limited. WPS supports system calls using the "X" statement. The licence is cheaper than for SAS, it has the same strengths for handling large volumes of data, and its lack of graphics and statistical functionality can be made up for by other software packages called in batch mode. Within WPS you will be able to call other software packages as shown below, but note that the documentation for WPS says that making system calls is only suitable for Windows platforms, as this extract from their user manual shows.
 
Remarks
The X statement executes the command as if it had been typed into an operating
system command prompt or shell. The X statement is only suitable for use on the
Windows platform.

Instead of "systask command" that I recommend for your call from a SAS program, your WPS code would look like this.
 
options xsync xwait;
X "the .exe file plus tell it where the code is";

Note that I do not recommend giving up using SAS software and moving over to WPS to try to save on costs. This would be a very difficult decision to take. But you might identify situations where exactly the same job could be done using WPS instead of SAS and then it might be worthwhile using WPS instead.

"Moving over to R"

There is some talk in the pharmaceutical industry about "moving over to R". That is, to run down the SAS side of things for clinical reporting and shift more analysis and graphics to R. I do not believe a complete transition from SAS to R will be possible in the next few years. This is because R cannot handle the very large volumes of data that would be typical of lab data, for example. I guess somebody might write extensions to R to allow it to handle large volumes of data, but the emphasis of R is on the visualisation and analysis side. There is little incentive to develop it along the lines of handling large volumes of data, so this might never be done. The ideal is to keep SAS (or WPS) for its data handling capabilities but have the functionality of R. You have seen from the above that it is easy to combine R with SAS. That should be a happy marriage for the pharmaceutical industry for the foreseeable future.

Conclusion

You have seen how SAS (and WPS) can call external statistics software packages and then resume execution and in this way add the functionality of other software vendors to its own, thereby effectively extending its power.
 

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.