Computers are optimized to work with relatively small chunks of data, and their performance is impressive within those limits. But once you go outside this comfortable range, things slow down a lot! The worst thing you can do with lab data is to make it any larger than it already is.
People can come up with many good reasons why more variables should be added to lab data (for QC purposes, reference purposes or whatever), but the volume of data is already large, and adding more variables makes it even harder for a computer to process.
The computer can cope with pure lab data for an average study, but processing times increase in cubic proportion once you get past a certain limit (approximately half a gigabyte). Double the data and your processing can take eight times longer; increase it five-fold and it could take 125 times longer. If you have pooled study data and you add many variables to the lab data, then processing it could take days! If you are processing pooled study lab data then you already have a problem; it is best not to make that problem worse.
"Derived" or "value added" datasets are popular and it is easier to produce tables from them but for lab data you have got to be very careful in adding extra variables because of the increase in processing time that will result.
Another problem comes from text variables that are much larger than they need to be, because somebody thought it would be a good idea to "standardise" them and increased the length of some character fields to 200 bytes. If this is the case then you should explain the problem this causes and ask for them to be made smaller. Even if you are not keeping these variables you will still be inconvenienced, because it will take longer to read the lab data.
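As a minimal sketch (the dataset and variable names here are hypothetical, not from the code below), an oversized 200-byte character field can be cut down to a sensible length as the data is first read, before any sorting or merging:

```sas
*- sketch: shrink an oversized 200-byte character field (names hypothetical) -;
data lab2;
  length labval $ 20;   /* shorter length declared before the SET statement */
  set lab;              /* labval is truncated to 20 bytes as it is read in */
run;
```

SAS will issue a warning that multiple lengths were specified for the variable, but the shorter length wins because it is declared first, and every later step that reads the data benefits.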
proc sql noprint;
  create table _patinfo as (
    /*--- population ---*/
    /*--- on treatment ---*/
    /*--- pre treatment ---*/
    /*--- post treatment ---*/
    /*--- post study ---*/
  ) order by study, ptno;
  select distinct(study) into :studies
    separated by " " from _patinfo;
  *- lab grouping variables -;
  *- lab values -;
quit;
I don't need the two variables anallbl and popudc any more, because I wrote them to macro variables in the SQL step, so I drop them here to save space. It is not very important, because once we have the lab data in study / patient order things run fast after that, but every small thing helps.
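A minimal sketch of that drop, using the variable names above (the exact step it belongs in is assumed):

```sas
*- sketch: drop variables already captured as macro variables -;
data _patinfo;
  set _patinfo(drop=anallbl popudc);
run;
```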
/************************
   Extend patinfo
************************/

data _patinfo2;
*- create a format to map treatment code to the decode label -;
*- create a format for treatment arm totals -;
See how I am adding flag variables and also only keeping lab values for patients in the population by using "if _a and _b". This is the best time to drop lab observations for patient populations we are not interested in, and we know it will be done efficiently this way.
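A sketch of that subsetting merge, assuming _lab is the raw lab data and the IN= flags _a and _b mark the contributing datasets (only the "if _a and _b" idiom is taken from the text; the rest is illustrative):

```sas
*- sketch: keep only lab records for patients in the population -;
data _lab2;
  merge _lab(in=_a) _patinfo2(in=_b);
  by study ptno;
  if _a and _b;   /* subset as early as possible, while merging */
run;
```

Dropping unwanted observations in the same step as the merge means they are never written out, so nothing downstream has to read them.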
See how I sort it at the end with the variable "_period" added. This will not be a problem, because the lab data is already nearly in this order, so the sort step will not take long. The sort algorithm can detect this and use an efficient method.
You can see that I am creating three flag variables for the different time periods. This is inefficient, but it was written as QC code and I need to do a "proc compare" against the original dataset. What I have done, though, is declare them with a length of 3 to save space. If some of your text variables are too large then you should reduce their lengths at this stage, so the following sort is faster.
/************************
   Add lab flags
************************/

*- _period is used as a work variable which will be dropped -;

proc sort data=_lab2;
  by study ptno labnm _period visno subevno;
run;
/************************
   Flag REPEAT values
************************/

data _lab3;
  length _fgrept 3;
  set _lab2;
  by study ptno labnm _period visno subevno;
  if _fgprev and not last.visno then _fgrept=&reptval;
  if _period=99 and not last.visno then _fgrept=&reptval;
  label _fgrept="Repeated Values";
run;
The last data step merges the baseline and last values back in and flags them, using the variables _fgbslv (baseline value) and _fglastv (last value). Again, these flag variables make it easy to visually QC the data to make sure the correct values are being selected. It builds a dataset with all the flags in it that can optionally be saved for QC purposes and will later be the source dataset for creating the final report.
/**********************************
   Baseline and Last Value Flag
**********************************/

*- Sometimes this value is not in the correct visno order so it -;
*- keep only the last -;
*- Sometimes this value is not in the correct visno order so it -;
*- keep only the last -;
*- merge and set flags -;
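A minimal sketch of that final merge, assuming the selected baseline and last-value records have been extracted to datasets _base and _last carrying the same keys (only the flag variable names _fgbslv and _fglastv come from the text; the dataset names and keys are hypothetical):

```sas
*- sketch: merge selected records back in and set the flags -;
data _laball3;
  length _fgbslv _fglastv 3;          /* minimum numeric length to save space */
  merge _lab3
        _base(in=_b)                  /* baseline record per patient/parameter */
        _last(in=_l);                 /* last-value record per patient/parameter */
  by study ptno labnm visno subevno;
  if _b then _fgbslv=1;
  if _l then _fglastv=1;
run;
```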
/**************************
   Summarize
**************************/

%let wherecls=and _hasbase=1 and _haslast=1;

*- Summarize and transpose by statistics name/_tp value using the -;
%unistats(dsin=_laball3(where=((_fgprev or _fgontv or _fgpostv)
          &wherecls and missing(_fgrept))),
/**************************
   Add lab group order
**************************/

*- add in the lab group info for proc report ordering -;
Note that I have used "order=internal" for the order variables, so that proc report orders by the stored values rather than their formatted values. This is always a good idea for order variables.
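As a small illustration (column names hypothetical), this is how "order=internal" appears on a define statement, typically on a numeric ordering variable that is hidden from the output:

```sas
*- sketch: order by the stored value, not the formatted value -;
proc report data=final nowd missing;
  columns labgrpord labnm mean;
  define labgrpord / order order=internal noprint;  /* numeric group order */
  define labnm     / order "Lab@Parameter";
  define mean      / display "Mean";
run;
```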
/**************************
   Produce report
**************************/

*- adjust column width according to how many statistics we have in each group -;
*- for normalised values only show if we have a non-missing rounding variable value -;
*- The _statkeys_ content is dynamically handled in the proc report call. -;

proc report missing headline headskip nowd split="@" spacing=2
  data=_transtat2&wherecls;
  /* NOTE: _statkeys_ contains a list of the transposed numeric */
  ("_Value at visit_" %suffix(1STR,&_statkeys_))
Use the "Back" button of your browser to return to the previous page.