Building the Derived Datasets

(Author: Roland Rashleigh-Berry Date: 10 Jun 2006)

Introduction

The Spectre component for building the derived datasets and "stats" datasets might not be in use at your site. It may not be used at all, or it may not be used for certain studies. If it is not used at all then you will learn nothing useful here, so it is recommended that you skip this page and go back to the previous page using your browser's "Back" button.

Spectre offers very little to help programmers build their derived datasets. It is quite the opposite, in fact. You have to help the reporting system. This will be explained later.

The reporting system upon which this one is based used the "make" utility. The "make" utility analyses the dependencies between programs and creates a script to run programs in the correct order. Though it is an extremely clever utility, I witnessed much time wasted in editing the scripts it created. Also, nearly all the programs within a reporting system are there to create tables and listings, and if this is sensibly done from derived/stats/reporting-ready datasets then there should be no dependencies between these programs. This leaves only the few programs that create the derived datasets (maybe only twenty), and there are easier, more visible and more obvious ways of running these programs in the correct order.

Program numbering

The reporting system runs the derived dataset build programs in the correct order by trusting a numbering system that you are supposed to use. It sorts the programs using the number as the first sort key and the full program name as the second sort key, then runs them in that order. The number must be placed directly before the first underscore in the program name. This is best explained using examples.

The way derived datasets are built, there is usually a cut down version of an "acct" dataset that the other programs use, so this cut down "acct" dataset has to be built first. Let us assume that this cut down "acct" dataset is named "acct0". There may well be other programs that have to be run before this can be created, so a recommended name for it would be s10_acct0.sas. The "10" gives it the main order key. The full name "s10_acct0" is the name order key and will be used as the second key in the sort. Note the underscore. This is compulsory. You can have other underscores but the number must be directly before the first underscore. The actual "name" part of the program, in this case "acct0", should be the same as the dataset it builds. Finally, the program name, to qualify, must end with ".sas".
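The two sort keys can be illustrated with a short Python sketch. This is not the Spectre script itself, only an illustration of the ordering described above. It assumes that the number is compared numerically rather than as text, and the program name s11_lab.sas is made up for the example.

```python
import re

def order_key(filename):
    """Return (number, name) for a program file such as 's10_acct0.sas'."""
    # Drop the ".sas" extension to get the full program name.
    name = filename[:-len(".sas")] if filename.endswith(".sas") else filename
    # The number sits directly before the first underscore.
    before_underscore = name.split("_", 1)[0]
    digits = re.search(r"\d+$", before_underscore)
    if digits is None:
        raise ValueError("no number before the first underscore: " + filename)
    # Assumption: the number is compared as an integer, not as text.
    return (int(digits.group()), name)

# s11_lab.sas is a hypothetical program used only for illustration.
programs = ["s20_acct.sas", "s12_adv.sas", "s11_lab.sas",
            "s10_acct0.sas", "s11_dose.sas"]
run_order = sorted(programs, key=order_key)
print(run_order)
```

The two programs numbered "11" are separated by the second key, the full program name, so s11_dose.sas comes before s11_lab.sas.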

As the scripts are delivered, program names are required to start with "s" or "d", but the Spectre administrator is allowed to edit the script to change this to meet your site standards. Note that you must not leave old versions of programs around that fit this pattern, otherwise these old versions will be run as well. For example, suppose you have an old version of the program just mentioned. Do not call it s10_acct0_old.sas or this too will be run. Call it s10_acct0.old instead, or use some other way to make sure it does not match the pattern.
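To see why the "_old" name is a problem, here is a rough Python sketch of the naming rule. The regular expression is my own reading of the rules on this page (an "s" or "d", then the number, then an underscore, then the rest of the name ending in ".sas"). The pattern used by the delivered script may well differ, so treat this as an assumption.

```python
import re

# Assumed pattern: "s" or "d", digits, the first underscore,
# the rest of the name, and a ".sas" extension.
QUALIFYING_NAME = re.compile(r"[sd]\d+_.+\.sas")

def qualifies(filename):
    """Return True if the whole file name fits the assumed pattern."""
    return QUALIFYING_NAME.fullmatch(filename) is not None

for name in ["s10_acct0.sas", "s10_acct0_old.sas", "s10_acct0.old"]:
    print(name, qualifies(name))
```

Under this assumed pattern s10_acct0_old.sas still qualifies, because extra underscores are allowed after the first one, whereas s10_acct0.old does not, because it no longer ends with ".sas".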

Numbering order

The program that creates the cut down "acct" dataset should have the lowest number. "10" is recommended. It is then hoped that all the other programs that build derived datasets can have the number "11". It is recommended that the number of the program that builds the full "acct" dataset be "20", so it should be named s20_acct.sas. Once the full "acct" dataset is built, there will probably be some efficacy programs that have to be run later because they use information in the full "acct" dataset that could not be put in the cut down "acct0" dataset. These should have the number "21".

There will often be some dependencies between the derived dataset build programs. To give an example, it may be a requirement to include the dose a patient was on at the onset of every AE (Adverse Event). The "dose" dataset program therefore has to come before the "adv" (adverse event) dataset program. If the dose program is named s11_dose.sas then the adverse event dataset program should be called s12_adv.sas (assuming the dataset is named "adv").
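A check of this kind can be sketched in Python. Spectre does not provide such a check as far as this page describes. The function names and the dependency table below are hypothetical, and you would have to write down the dependencies yourself. The sketch flags any program whose number is not higher than that of a program it depends on, which is stricter than the sort itself, since it does not rely on the program name to break ties.

```python
import re

def program_number(filename):
    """Return the number placed directly before the first underscore."""
    match = re.match(r"[sd](\d+)_", filename)
    if match is None:
        raise ValueError("name does not fit the numbering system: " + filename)
    return int(match.group(1))

def misordered(dependencies):
    """List (program, prerequisite) pairs where the number is not higher.

    `dependencies` maps each program to the programs that must run before it.
    """
    problems = []
    for program, prerequisites in dependencies.items():
        for prerequisite in prerequisites:
            if program_number(prerequisite) >= program_number(program):
                problems.append((program, prerequisite))
    return problems

# Hand-written dependency table (an assumption for this example).
good = {"s12_adv.sas": ["s11_dose.sas", "s10_acct0.sas"],
        "s11_dose.sas": ["s10_acct0.sas"]}
bad = {"s11_adv.sas": ["s11_dose.sas"]}

print(misordered(good))
print(misordered(bad))
```

In the second table the adverse event program has been left at "11", the same number as the dose program it needs, so the pair is reported.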

From time to time, programs will have to be renamed when more dependencies are found. This can get out of control, with numbers going higher and higher, unless two rules are followed. These are:

No derived dataset is allowed to contain fields that are in the full "acct" dataset that are not in the cut down "acct0" dataset. This is only allowed for efficacy datasets.

You should use the lowest number you can, within the constraints of the numbering system.

The Spectre administrator is also aware of these rules and will make checks on your program names.

derorder

The script that will be called to identify all the programs to be run and the order to run them in is called "derorder". You can call this script to give you this list of programs. It will just display the list - it will not run any programs. You call it like this:

derorder

You can use this script to ensure that all the programs you want to run will be run and those you do not want to run will be left out. You will see the order in which your programs will be run.

Conclusion

You will see that the only help the reporting system gives you in the area of building the derived datasets is the script "derorder" which tells you all the derived dataset programs that match the pattern it has found and the order in which they will run. The rest of the work you must do yourself to ensure the reporting system will run the programs in the correct order.

Use the "Back" button of your browser to return to the previous page.
