I have seen code to do this processing using arrays. In one sense this is a good idea since it is the most efficient way of handling it. However, the solution using this method never struck me as being elegant. I feel the array method springs from an incorrect assumption of what can be achieved using merges. I'll explain in the next section.
Values to carry forward (values dataset)
date value 01jan02 1 01mar02 2 01may02 3 Dates to merge with (dates dataset)
Resulting dataset (final dataset)
|
It is true that such a merge can not be done using only the variables
shown above but what we can do is add a sequence number to the "dates"
dataset and increment it when we get a match on date with the "values"
dataset like this:
data dates;
retain _seq 0; merge dates values(in=_val); by date; if _val then _seq=_seq+1; run; |
And now we have a sequence number that retains its value corresponding
to each observation in the values dataset. So if we simply add the same
sequence number to the "values" dataset like this:
data values;
set values; _seq=_n_; run; |
.... then we can merge like this:
data final;
merge dates values; by _seq; drop _seq; run; |
... and this gives us the type of merge we need for LOCF merging. If
you need further convincing, then here is some code you can copy and paste
and run.
data values;
date='01jan2002'd;value=1;output; date='01mar2002'd;value=2;output; date='01may2002'd;value=3;output; format date date7.; run; data dates;
data dates;
data values;
data final;
options nocenter;
|
And if you look at the proc print output then this is what you will
see. And I think you will agree that this is correct.
Obs date
value
1 01JAN02
1
|
So at least, in principle, it is possible to merge values in with a time sequence using a sequence.variable we have created. The above is very simple test data, of course, but we can use the same principle in more complex cases. In reality we will have an owner variable such as a subject-id that changes. There may be subcategories like a lab parameter. We would probably use visit numbers instead of dates. But the same principle of adding a sequence number to both datasets and merging using that sequence number could still apply. And it is this principle I have used in my macro solution for this sort of processing.
(values dataset)
subj param visit value AA RBC 2 22 AA RBC 4 44 AA WBC 3 33 BB ALB 3 55 (visits dataset)
|
Now before we decide on how to merge these datasets, take a look at the "visits" dataset. Visits 2-5 are the "on-treatment" visits right up to the planned end point which is visit 5. It is for these visits we want to carry forward our last observation. Visit 1 is not shown because it is likely a baseline value and because it was not a measure taken while on and as a result of treatment we would probably not want to carry forward these baseline values. So Visits 2-5 are the "eligible" visit values for carrying forward. Now look at the values dataset. Again these only have the "eligible" visits and their values and these will be the values we will be carrying forward where required. Note that the subj/param pairings in the "visits" dataset match those in the values dataset. It makes sense to only have subj/param pairings to match values otherwise you would have nothing to carry forward. But this is an assumption. You should understand the assumptions being made. Sometimes the assumptions will not be approppriate for a study you are working on and you might have to change the logic.
Now consider how we are going to approach the merge of the two datasets. Look at the data for subject AA. We would not want to carry forward values of RBC into the following WBC so obviously we will be merging using matching on the subj/param pairing. We will have to reset our sequence count at the start of each subj/param grouping and increment from there. And when we add the sequence number to the "values" dataset we will have to do the same so that when we merge using subj/param/_seq we will get the correct matching.
First read through the following code until you understand what it is
doing. The first two data steps generate the observations as shown above.
The next two data steps add the sequence number and the final data step
merges the data before printing it. Then copy and paste the code, run it
under SAS software and convince yourself that it is producing the correct
output:
data values;
subj='AA'; param='RBC'; visit=2;value=22;output; visit=4;value=44;output; param='WBC'; visit=3;value=33;output; subj='BB'; param='ALB'; visit=3;value=55;output; run; data visits;
data visits;
data values;
data final;
options nocenter;
|
You should regard this macro as a solution rather than the
solution. You need to understand what it is doing and what assumptions
are being made. It may or may not match with your study requirements. You
might end up writing a new macro that fits your site standards better,
based on this macro. I can only go so far with it and keep it as generic
and as simple as possible. If you want to develop it further then feel
free but I will only be able to support the version you find here. To view
the macro click on the link below.
locf
Use the "Back" button of your browser to return to the previous page
SAS and all other SAS Institute Inc. product or service names are registered
trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
® indicates USA registration.