[This site is not connected with the SAS Institute]
[last updated - 08 September 2007]
Introduction
There is nothing on this page that is not already on the two pages I have
written about data _null_ reporting. The page that deals with interfacing
with the titles system is here and the one
that deals with doing stacked-column reports is here.
This page only explains techniques used on those pages with more detail.
This page assumes you are a programmer wwho is not used to creating reports
using "data _null_" who has been thrown into the field of data _null_ reporting
because you have been given the task of creating reports that look like
they could be done with proc report but somehow can not.
To do data _null_ programming well, you have to learn from somebody
elses experience. Old-time programmers like me pre-date proc report. It
didn't exist when I first started using SAS. And when it was offered to
us, it was at extra cost and we couldn't see the point in having it. Many
programmers who got into SAS were used to other computer languages. Some,
like me, are ex-Cobol programmers. They are used to total control of every
field on every output page -- even controlling the throw of that output
page. But times change and the limited control that proc report had to
offer was deemed good enough for most situations. You could do a good report
in a short time using proc report, so people slowly converted over to using
it. Yes, it has limitations, but if you can accept those limitations then
using proc report is more productive than using data _null_. And so nearly
all programmers around, at the time, converted over to using proc report
and have long since forgotten their data _null_ skills. But sometimes a
table shell will not allow you to use "proc report" and if the layout can't
be negotiated into a layout that proc report can do then you might be forced
to use data _null_. This page is about the techniques you should use with
data _null_ reporting.
Never "put" without an observation
Never "put" a line without an observation. "if last.subject then put;"?
That's no good ! If that was the last observation for the subject and for
all the data in the input dataset and it occured at the very end of the
report on the very last line then that "put" would cause a page throw and
a new title with column headers being output. So you have a page being
output but you have come to the end of your data in any case. You end up
with a blank page with no data on it. Sure - it's unlikely. But you can
not trust to luck. If it can happen then it
will at some
time and so you can not use code with this weakness in it. So how do you
get round this? By all means, check for "last.subject" but use an end=
flag after your dataset so you know when you have got the the last observation.
If you have specified "end=last" then "if last.subject and not last
then put;" will do the trick. But there are exceptions. If you are
doing a stacked column report then you could be on your very last observation
right at the top of a new page and you need to complete your stacked identifier
values. So suppose there were four of these and you had a variable named
"count" to keep track then you would have to use "if not last or count<5
then put;".
The same applies to your "linesleft" variable. You will probably want
to leave a blank line before your first footnote and will want to start
a new page. Suppose your variable is named "ll" as it often is.
Then "if ll<2 then put _page_;" sounds good. But if it is at
the end of the data step then there may not be another observation after
that so you would have to use "if ll<2 and not last then put _page_;".
But if it is directly after the "set" statement then there must be an observation
otherwise there would not have been that iteration of the data step code
so "if ll<2 then put _page_;" is good.
Now this section may seem very trivial. But you have got to look after
silly details like this. You make sure the little things are right before
you can go on to the more significant things. You have got to start from
the trivial and work your way up to the important. It is like programming
backwards and you have got to do it like that. "Look after the pennies
and the pounds will look after themselves" is an English saying and that
is the way you have to program "data _null_" reports. Never "think big"
and then later fill in the smaller details. Always "think small" and work
your way up when doing "data _null_" reports.
Make your report "linesize" independent
You should not "put" at a specific column, assuming the linesize in effect.
You need to read the linesize in effect into your data _null_ and you should
also have a variable you set to your report width and so work out your
start column from that (startcol=floor((ls-repwidth)/2)+1). You can read
any option you like in the data _null_ step using the getoption() function
so you can get all the information you need. At the first iteration of
the data step (_n_=1) then get and store the linesize value and from the
report width work out the start column for your report and store it. The
code has the algorithm for that. Use this "start column" value to "put"
your values and "put" things not at the start column as an offset from
this value.
Using the "titles" and "footnotes" options on the "file" statement
Use both the "titles" and "footnotes" statement on the file statement even
if you intend to handle the titles and footnotes manually. Read the titles
into a dataset, if required, and then nullify them with a "title1;"
before outputting them but always have that "titles" option present. You
may get strange page throws if you do not. This is a known bug.
Titles can always be recreated from a suitable dataset using the %titlegen
macro if required after you nullified them, so long as the titles are in
a dataset somewhere.
Give your data _null_ titles the "proc print" and "proc report" feel
The more you can make your "data _null_" behave like "proc print" or "proc
report", when it comes to titles, the better. If the "byline" option is
set then put out a dashed "byline". If not, then don't. Does your "by"
variable have a label? If so, then your dashed byline should have the text
of your variable label in it on the left of the equals sign. If the variable
has no label then it should just be the variable name. If that's the way
"proc report" or "proc print" does it then do it that way with your "data
_null". Get it to interface with #byval and #byvar entries if they exist.
Give people the feel that "data _null_" reporting is no different to using
"proc print" or "proc report" and you will have less opposition to your
"data _null_" reports. How you do this is already in those two pages. You
are going to find yourself in a situation whereby you really must
do the report using data _null_ but other people are not comfortable with
it becasue they feel it is "too different" than using "proc report". The
more you can give it the same feel, the more easily you can dip into "data
_null_" when you have to without people worrying.
Watch out for leading spaces
When you "put" a string then be careful of leading spaces. If you "put"
it at a specific column and the string has leading spaces then you will
lose those leading spaces. You should put it at a position compensated
for leading spaces. You do this by offsetting by an extra "length(str)-length(left(str))".
put @;
Don't forget that a "put" statement need not be a commital. If you follow
it with a "@" then you have not written out the line yet. You can use this
technique to "put" out identifying information before you actually "put"
out a data value. It's called a "trailing at". You can use this to ready
identifying information and then you can complete it with "put"ting out
your field or if it is blank then just use a plain "put;" statement to
commit what you have to output.
_file_
You can load information into the file output buffer and manipulate it
there before you write it out. If you check the code examples you will
see that for the "byline" I fill _file_ with dashes up to the length of
linesize and then substring it with the by= value string.
Labelled sections of code
You will have a section to put out your titles or column headers. Nearly
everybody calls it "header" and they specify "header=header" in the file
statement. It is not a reserved word for the label. You can call it anything,
within syntax limitations. But it doesn't have to be the one and only labelled
section in your data step. You can have as many as you wish. The "header"
label will automatically get linked to when a page is thrown so you don't
have to worry about "link"ing to it. But if you have other pieces of logic,
like for handling identifier variable information, then you could always
put this in labelled sections and "link" to it. Don't forget those "return;"
statements, though.
"Stacked" identifier information
Suppose you have identifier information that is being stacked four deep
like age/race/sex/weight. Don't assume you have four observations always
per identifier. You might only have one observation for this group. But
if you only show the one observation then you will still have to show the
full age/race/sex/weight. So you will have to keep a count of where you
are from the start of the group and act if the number of observations was
not enough to complete the group identifying information. Again, the code
will show how I have handled this.
Also consider whether you want to start a new group like this at the
bottom of a page. If there are only two lines left and your identifier
information is displayed over four lines then maybe it is better to throw
a new page before you start. It might be worth your while keeping an observation
count for each stacked group so if you were stacking four deep and you
had only one observation then you could easily put this group in the last
four lines of the page. Since good use of space might be important then
this might be a wise idea.