[This site is not connected with the SAS Institute]
[last updated - 08 September 2007]
Introduction
My prediction for 2003 was that "data _null_" would come into its own in
the field of clinical reporting. I was proved wrong. The reason
I thought so at the time was because at the time the FDA were insisting
that tables had to use a 12 point font. The information that used to fit
across the page using a 10 point font or smaller now has to be crammed
in somehow. This led to some fields being "stacked" one under the other
in columns. Proc report will happily "flow" columns for you if you specify
the "flow" option and you can control the flow with split characters. However,
following lines in proc report are placed after where the previous line
has flowed to. "Flow"ing effectively makes a line "fatter". The flowing
won't fill up space that following lines don't need. So suppose your "order"
variable contains age, race, sex and weight separated by a split character
then this will flow most nicely. But the second data line for that group
will be printed on the line below weight. So that leaves a gap of three
lines. Now this would hardly be an issue if there were 60 lines on a page.
But using a 12 point font limits a page to about 43 lines, even when the
spacing between lines is being compressed. Now you maybe have five title
lines and five footnotes. You have a gap before you start the columns and
because you are stacking values like age, race, sex, weight your columns
are going to be 4 deep. And underneath that you have a header line. And
because the data lines will leave a gap before the first footnote then
you are reduced to only 26 lines of data. And if you have a "flow" gap
of 3 lines out of 26 then the report starts to look absurd. The 3 line
gap is now a yawning chasm. The table looks incorrect and unacceptable.
And there is worse to come. Your age/race/sex/weight column has more chance
of starting near the bottom of the page and so getting separated. If you
have less than 4 lines left on the page then you shouldn't start it. How
is proc report going to handle that?
It is possible to still use proc report and get round this "flow" problem
by manipulation of the dataset going into it. You then use proc report
to display your data rather than using it naturally. But the code can become
very complex You will need to know how many lines will fit on a page because
you will have to artificially put out a new age/race/sex/weight on a following
page if the previous group has data lines that flow onto a following page.
The code can end up being more complex than if you use "data _null_". But
there is a niggling problem with "data _null_" reports. Thankfully they
can automatically give us titles and footnotes using the options "titles
footnotes" on the file statement but there is one thing it will not be
able to use that other procedures can. And that is make substitutions for
#byvar
and #byval entries. If a data _null_ report could interface with
the titles fully, as does the other reporting procedures, then I think
data _null_ would be unleashed. For some of the 12 point font tables, programmers
could use data _null_ instead of doing massive manipulation of datasets
so that proc report can display it properly. The more manipulation you
do, the less you can trust the final result. I would say that if your code
becomes so complex that a good programmer could not follow it then it could
never be considered to be "validated". Sure, it might work properly on
the example datasets you put through it. You may not have had any problems
reported. But what about the next time? Do you know that the routine is
correct or are you trusting to luck? I would say that if you were trusting
to luck then the routine is unacceptable. Using data _null_, the logic
would be easier in some cases, so as long as you did not have a problem
with the #byval and #byvar in titles then in certain circumstances it could
and should be used in preference to proc report.
%bytitle - a first attempt
In almost all cases, the line that contains #byval and #byvar entries is
the last title line. We could check the contents of the last title line
by looking in the sashelp.vtitle view and if we found it contained "#byvar"
or "#byval' then we could write the information out to a global macro variable
and then nullify that title line. We could then test for whether the global
macro variable had a value in it and "put" the title out at the start of
the header routine, making substitutions. But in making substitutions we
have to know the case used for #byval and #byvar. Proc report and other
reporting procedures are oblivious to the case used for #byvar and #byval.
It could be #ByVaR and #byVAL for all they care. It will still be handled
the same. So if we do detect any forms of these strings in the titles and
extract it to a global macro variable we need to be sure we have spotted
it. And it is wise to convert it into a standard form like uppercase, leaving
all other characters with unchanged case. This converting of a mixed case
string to a specific form required me to write a macro called %casestrvar.
So to extract the last title and write it to a global macro variable if
it is a "by" title, I came up with this macro %bytitle.
I'll follow this with a piece of code you can run, but I want you to
understand that this method has limitations. These #byvar's and #byval's
could be in any title line - not just the last title line. Also we have
limited knowledge about the titles. We can find out what they are in sashelp.vtitle
but we do not know if there were any trailing spaces defined to them. It
could be that that last title line was followed by 50 spaces or more to
align the title to the left. So to do a proper job we would need to attempt
substitutions in all the title lines and somehow recover the original length
of the titles, adjust for the substitution and then "put" it at the correct
column. I'll be showing you how to do that in the next section, but for
now, take a look at the code required to get data _null_ to use #byval
and #byvar in this very simple case. I'll be using some of the logic in
the next section in any case so it would be useful to study the code. You
might like to copy the following code and run it.
data test; do year=2000 to 2003; do month=1 to 12; output; end; end; label year='What a Year !!'; run;
data _null_; length newtitle $ 200; retain bytitle "&_bytitle_" ls 0; file print titles footnotes header=header; set test; by year; if _n_=1 then do; ls=getoption('ls'); end; if first.year and not (_n_=1) then put _page_; put @49 month 2.; return; header: if bytitle ne ' ' then do; if vlabel(year) ne ' ' then newtitle=tranwrd(bytitle,'#BYVAR1',trim(vlabel(year))); else newtitle=tranwrd(bytitle,'#BYVAR1','year'); newtitle=tranwrd(newtitle,'#BYVAL1',left(year)); put @ (floor((ls-length(newtitle))/2)+length(newtitle)-length(left(newtitle))+1)
newtitle; end; put; put @48 'month'; put @48 '-----'; return; run;
The final solution
In order to make our system for handling titles reliable we should assume
it is possible for #byvar and #byval to appear in any title line and process
accordingly. We can let footnotes come out using "footnotes" on the file
statement but we will have to set "notitles" and do the titles manually.
In the previous section I mentioned the problem with not knowing the title
lengths. We have the text in sashelp.vtitle but we do not know if any trailing
spaces were defined to assist alignment. We need to get this somehow. I
do not know where SAS actually stores their real titles nor how to get
at them. The only way I could think of to get this information was to create
a dummy report, read the titles (and footnotes) back in again, compare
with what is in sashelp.vtitle and try to work out the original lengths
from the alignment in the report. To do this I wrote a macro called %titlelen.
This adds a length variable as best it can. It also turns mixed-case #byval
and #byvar into uppercase form so that it is easier to make substitutions.
Note that this macro contained a Windows-type DOS system command to delete
the temporary print file it creates. You will need to amend this line if
you are running on a different platform.
I was going to implement this solution as a macro but I changed my mind.
I think it is better if you are fully in control of the code you have running
in a data _null_. If you could follow the code in the previous section
then you will have no problems understanding the following code. Try copying
it and running it. Experiment using trailing blanks on the title lines.
Although this solution to getting data _null_ to talk to #byvar and #byval
is hardly elegant, it does at least work. And, for me, it unleashes data
_null_ reports and will allow data _null_ to cope with some of the complex
reporting issues that the FDA 12 point font guideline will cause. In the
following code example you will see much better control of the report position
and an option to add a "byline" as per proc report or proc print.
Please not that I am not entirely happy with the solution myself. I
expected to set the "notitles" option in the "file" statement and have
it all working fine. But for some reason, it throws extra pages if I do
this. So I have had to nullify the titles and keep the "titles" option
in my "file" statement. After hours trying to resolve this problem I find
it is a known bug in SAS, with the identifier SN-003818.
The titles can be recreated after the data _null_ has finished by feeding
the "titles" dataset into the %titlegen macro,
so having to nullify the titles to get the page throws right isn't a big
problem.
data test; do year=2000 to 2003; do month=1 to 12; output; end; end; label year='What a Year !!'; run;
title1;
data _null_; length newtitle $ 200; retain ls 0 startcol 0 byline 0 repwidth 5; file print titles footnotes header=header; set test; by year; if _n_=1 then do; ls=getoption('ls'); startcol=floor((ls-repwidth)/2)+1; if getoption('byline')='BYLINE' then byline=1; end; if first.year and not (_n_=1) then put _page_; put @startcol month 4.; return;
header: do tptr=1 to tnobs; set titles point=tptr nobs=tnobs; if text EQ ' ' then put; else do; oldlen=length(text); if vlabel(year) ne ' ' then newtitle=tranwrd(text,'#BYVAR1',trim(vlabel(year))); else newtitle=tranwrd(text,'#BYVAR1','year'); newtitle=tranwrd(newtitle,'#BYVAL1',left(year)); newlen=length(newtitle); length=length+newlen-oldlen; if length>ls then length=ls; put @ (floor((ls-length)/2)+length(newtitle)-length(left(newtitle))+1)
newtitle; end; end; if byline then do; put; _file_=repeat('-',ls-1); if vlabel(year) ne ' ' then newtitle=trim(vlabel(year))||'='||left(year); else newtitle='year='||left(year); substr(_file_,floor((ls-length(newtitle)+2)/2)+1,length(newtitle)+2)='
'||trim(newtitle)||' '; put; end; put; put @startcol 'month'; put @startcol '-----'; return;