Time-to-Event Analysis

(Author: Roland Rashleigh-Berry    Date: 10 Jun 2006)






The easiest way to do time-to-event analysis is to "flatten" your data so that you have only one observation per "by group" (usually a single variable such as "subject"). With all of a by group's data in one observation, and with the number of original observations for that by group stored alongside it, you can use array processing to loop through the values. Since several variables may need to be flattened, I wrote a macro named "flatten" that does all the "proc transpose" steps for these variables, counts the number of observations per by group and adds that count to the output dataset. You can view the macro below.
flatten
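
To make the idea concrete, here is a rough sketch of what flattening amounts to when done by hand. This is not the flatten macro itself. The dataset names have, dt_wide, val_wide, counts and flat are assumptions made purely for illustration, and the input is assumed to be sorted by subj and to contain the variables dt and val.

```sas
/* Sketch only: a hand-written approximation, not the flatten macro.   */
/* Assumes WORK.HAVE (hypothetical name) is sorted by subj.            */

/* One row per subject with dt1, dt2, ... */
proc transpose data=have out=dt_wide(drop=_name_) prefix=dt;
  by subj;
  var dt;
run;

/* One row per subject with val1, val2, ... */
proc transpose data=have out=val_wide(drop=_name_) prefix=val;
  by subj;
  var val;
run;

/* Number of original observations per subject */
proc sql;
  create table counts as
  select subj, count(*) as nobs
  from have
  group by subj
  order by subj;
quit;

/* Put the count and the transposed variables on one observation */
data flat;
  merge counts dt_wide val_wide;
  by subj;
run;
```

Each subject ends up on a single row carrying nobs plus the numbered dt and val variables, which is the shape that array processing relies on.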

Below is an extremely simple example of code to find the date on which a value first rose above 1000. This is far simpler than anything you will need to do in reality, but it shows how the flatten macro transforms the data and how to loop through the result.
 
data test;
  subj=1000;
  dt='01jan03'd;val=0;output;
  dt='01feb03'd;val=500;output;
  dt='01mar03'd;val=1005;output;
  dt='01apr03'd;val=2005;output;
  subj=2000;
  dt='01jan03'd;val=100;output;
  dt='01feb03'd;val=100;output;
  dt='01mar03'd;val=100;output;
  subj=3000;
  dt='01jan03'd;val=100;output;
  dt='01feb03'd;val=1110;output;
  dt='01mar03'd;val=500;output;
  format dt date7.;
run;

%flatten(dsin=test,bygroup=subj,vars=dt val)
%put ********* _maxn_=&_maxn_;

data t2event;
  set test;
  array dt {*} dt:;
  array val {*} val:;
  do i=1 to nobs;
    if val(i)>1000 then do;
      date=dt(i);
      i=nobs;
    end;
  end;
  format date date7.;
  drop i;
run;

data _null_;
  set t2event;
  put (_all_) (=);
run;

And here is part of the log, where you can see the added "date" variable. It is set to the first date on which the value exceeded 1000, or left missing if the value never did.
 

92    %put ********* _maxn_=&_maxn_;
********* _maxn_=4
93
94    data t2event;
95      set test;
96      array dt {*} dt:;
97      array val {*} val:;
98      do i=1 to nobs;
99        if val(i)>1000 then do;
100         date=dt(i);
101         i=nobs;
102       end;
103     end;
104     format date date7.;
105     drop i;
106   run;

NOTE: There were 3 observations read from the data set WORK.TEST.
NOTE: The data set WORK.T2EVENT has 3 observations and 11 variables.
NOTE: DATA statement used:
      real time           0.01 seconds
      cpu time            0.01 seconds
 

107
108   data _null_;
109     set t2event;
110     put (_all_) (=);
111   run;

subj=1000 nobs=4 dt1=01JAN03 dt2=01FEB03 dt3=01MAR03 dt4=01APR03 val1=0 val2=500 val3=1005
val4=2005 date=01MAR03
subj=2000 nobs=3 dt1=01JAN03 dt2=01FEB03 dt3=01MAR03 dt4=. val1=100 val2=100 val3=100 val4=.
date=.
subj=3000 nobs=3 dt1=01JAN03 dt2=01FEB03 dt3=01MAR03 dt4=. val1=100 val2=1110 val3=500 val4=.
date=01FEB03
NOTE: There were 3 observations read from the data set WORK.T2EVENT.
NOTE: DATA statement used:
      real time           0.01 seconds
      cpu time            0.01 seconds
 

Some notes on the code. The largest number of observations found in any by group is written to the global macro variable _maxn_. You can use this in your array statement like this:
 
array dt {*} dt1-dt&_maxn_; 

...but you usually do not need to reference this macro variable as you can refer to a list of variables using a colon trailer as was done in the code.
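
As an illustration of the explicit form, the data step from the example could be written as below. This is a sketch under assumptions: it expects the flatten call above to have already run, so that test is flattened and _maxn_ is set. The output dataset name t2event2 and the array names dts and vals are made up here. It also uses the LEAVE statement to exit the loop at the first match, a swapped-in alternative to setting i=nobs.

```sas
/* Sketch: same logic as the example, with explicit array bounds.      */
/* Assumes %flatten has run, so TEST is flattened and &_maxn_ exists.  */
data t2event2;                           /* hypothetical output name   */
  set test;
  array dts  {&_maxn_} dt1-dt&_maxn_;    /* dt1-dt4 when _maxn_=4      */
  array vals {&_maxn_} val1-val&_maxn_;
  do i=1 to nobs;                        /* only this subject's values */
    if vals(i)>1000 then do;
      date=dts(i);                       /* first date above 1000      */
      leave;                             /* stop at the first match    */
    end;
  end;
  format date date7.;
  drop i;
run;
```

Looping to nobs rather than to the array dimension skips the trailing missing elements for subjects with fewer observations than _maxn_.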
 

Note that I am using a form of "put _all_" in the code that you might not be familiar with. I have used "put (_all_) (=)" to avoid putting out the automatic variables _N_ and _ERROR_. You can read more about this on the SAS technical support web site.
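
A quick way to see the difference is to run both forms side by side on one observation. This is only a small sketch and assumes the t2event dataset from the example already exists.

```sas
/* Sketch: compare the two PUT forms on the first observation only */
data _null_;
  set t2event(obs=1);
  put _all_;          /* also writes _ERROR_ and _N_               */
  put (_all_) (=);    /* dataset variables only, as name=value     */
run;
```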
 
 
 
 
