A Programming Infrastructure for Clinical Reporting
Introduction
This document attempts to explain what a "programming infrastructure"
is in the context of the statistical reporting of clinical trials, and
why it is a desirable thing. This is very much the author's opinion rather
than an agreed-upon, industry-held view.
What a "programming infrastructure" means
People normally associate the word "infrastructure" with physical things
such as roads, water supply, a sewage system, drainage and power supply:
the basic physical systems that serve a country's or community's population.
That was the original meaning, but the word has since broadened to cover
systems in general. For the statistical reporting of clinical trials,
a more fitting definition would be "an underlying base or foundation
especially for an organization or system" (this definition is taken from
a linked source, where the evolution of the meaning of the word is expanded
upon).
If we use the definition of "an underlying base or foundation for
a system" then we can think of this infrastructure as being ubiquitous,
omnipresent or, in other words, "present everywhere at once". The
foundations of a building support us wherever we walk on its floors.
Wherever we go in a city we expect running water and electricity to be
available, and to move around it we use the roads and follow the one-way
signs. In the same way, a programming infrastructure should have utilities
we can tap into and a basic structure that gives us direction in any area
we work in while doing statistical reporting.
What a programming infrastructure is not is a collection of systems
that are independent of each other, even if this collection is expansive
enough to cover all areas of reporting activity. Large pharmaceutical companies
tend to buy or write a system for doing their "safety" reporting that is
entirely separate from any consideration of "efficacy" reporting. In most
cases the efficacy reporting function cannot draw upon what lies at the
foundations of the safety reporting system or, if it can, it is not made
obvious how to do so. This is therefore not an example of a programming
infrastructure. If a true programming infrastructure were in place
then both the safety reporting system and the efficacy function would draw
from the same pool of utilities and follow the same methodology. A programming
infrastructure would give cohesion to these two areas so that they have
the same look and feel and work in a similar way, and the two functions
could be run together if the choice were made to do that.
Components of a programming infrastructure
Here I will attempt to list the components of a programming infrastructure
as it applies to the statistical reporting of clinical trials, with emphasis
on what makes each one part of an infrastructure. I will assume that there are
four types of programs: safety programs, efficacy programs, "extra analysis"
programs (programs that run on the same data as the safety and efficacy
programs but are written at a later date) and ad-hoc reporting programs.
-
Printing capabilities: A method of printing, or of writing to a print-ready
file (such as a PostScript file), that produces correctly formatted reports
with the right layout. This should work in the same way for efficacy reports,
safety reports and "extra analysis" reports and perhaps for ad-hoc reports
as well.
-
Titles and footnotes: A method of storing titles and footnotes separately
from program code so that they can be included in program output. This
should work in the same way for both safety and efficacy reporting and
any "extra analysis" reporting. For ad-hoc reporting you might not want
to mix the titles with the others, so the titles and footnotes for ad-hoc
reports might be hard-coded instead.
-
Programming standards: A set of programming standards that is the
same for all types of programs is required.
-
Utility macros: A comprehensive set of utility macros useful in
the field of clinical reporting with good documentation to back it up so
that programmers know what is available to them and what those macros can
do.
-
Standard reporting macros: Both safety reports and efficacy reports
tend to show counts and percentages, so what is needed is a macro that
produces counts and percentages and that both safety and efficacy programs
can call. The same goes for descriptive statistics. There should therefore
be two macros, one for counts and percentages and one for descriptive
statistics, that can be called by all types of reporting programs.
These two macros should be able to handle different numbers of treatment
arms without extra coding being put in place.
-
Production scripts: Scripts, in whatever language is appropriate, to
automate the reporting processes.
-
Utility scripts: Utility scripts for extra tasks the programmer
might need to do outside the normal programming field.
-
Running the programs: The same utility should run every individual
program and give the same set of diagnostics at program conclusion, whether
the program is a safety, efficacy or ad-hoc report.
-
Collection of outputs: A utility is required to collect outputs
in the correct order. It should work in the same way for safety programs,
efficacy programs and any extra analysis programs that run on the same
data. This will likely involve writing a list of outputs to a log, so if
an artificial log is created for ad-hoc report outputs then a similar
utility could collect these ad-hoc reports if required.
-
Running all reports: The same utility should be able to run all
safety reports, all efficacy reports or all of them together if required.
This same utility should be able to run all the "extra analysis" reports
written later, if any exist. Since such a system would probably rely on
the titles and footnotes system to get a list of programs to run, it
probably would not cover ad-hoc reporting.
-
Page number labels: Clinical report output pages are usually numbered
in the form "Page x of Y", so a utility is required that adds these labels
and works on the text output of all types of reporting program.
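As an illustration of the last item, the following is a minimal sketch of a
page-labelling utility, written in Python rather than as a Spectre macro or
script. It is not Spectre's actual implementation. It assumes that pages in
the text output are separated by form-feed characters and that each page
carries the literal placeholder "Page x of Y"; the function name
label_pages and both of these conventions are hypothetical.

```python
def label_pages(text, page_break="\f", placeholder="Page x of Y"):
    """Replace the placeholder on each page with 'Page n of N'.

    Assumptions (not taken from Spectre): pages are separated by
    form feeds and each page contains the literal placeholder.
    """
    # A trailing page break would otherwise be counted as an empty page.
    trailing = text.endswith(page_break)
    pages = text.split(page_break)
    if trailing:
        pages = pages[:-1]

    total = len(pages)
    labelled = [
        page.replace(placeholder, f"Page {number} of {total}")
        for number, page in enumerate(pages, start=1)
    ]

    result = page_break.join(labelled)
    if trailing:
        result += page_break
    return result
```

Because the function only looks at the finished text, it does not matter
whether that text came from a safety, efficacy or ad-hoc program, which is
the property the list item asks for. A real utility would also need to keep
the label aligned when the page numbers are wider than the placeholder.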
Spectre and the infrastructure
The background to my writing Spectre was that the biostatistics department
I was writing it for intended to take back in-house the reporting function
that was at that time largely farmed out to CROs. It was an attempt to
save on costs and to improve consistency and reliability. Before
this could be done, there was an awareness that the department lacked an
infrastructure that would enable it to make a success of this with limited
resources, and so it was decided that an infrastructure needed to be put
in place. Spectre was the solution, but it is important to focus on the
infrastructure that Spectre embodies. Spectre is more infrastructure than
reporting system. If you are used to other, larger reporting systems, you
may wonder why Spectre does not seem as fully developed as they are. The
reason, as you can hopefully now see, is that to overdevelop it would be
to obscure its infrastructure, which is the whole reason for its existence.
With Spectre putting the infrastructure in place, and with its two powerful
reporting macros as described above, it then became possible to make the
programming function efficient and to add speed and consistency. With the
two major reporting macros "validated", a lot of program validation can be
avoided, further improving the efficiency of the programming function. These
two "validated" macros could then lie at the core of other reporting macros,
which in turn become validated, to cover most reporting needs.
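To make the idea of a shared counts-and-percentages utility more concrete,
here is a minimal sketch in Python. It is not one of Spectre's macros and
does not reproduce their interface. It assumes one record per subject, with
the hypothetical keys "arm" and "category", and it uses the number of
records in each arm as the denominator. The point it illustrates is that
the treatment arms are discovered from the data, so the same call works for
two arms or for five without extra coding.

```python
from collections import Counter


def counts_and_percentages(records, arm_key="arm", category_key="category"):
    """Return {category: {arm: (count, percent)}} for any number of arms.

    Assumptions (illustrative only): one record per subject, and the
    denominator for each arm is the number of records in that arm.
    """
    arm_totals = Counter(r[arm_key] for r in records)
    cell_counts = Counter((r[category_key], r[arm_key]) for r in records)

    arms = sorted(arm_totals)
    categories = sorted({r[category_key] for r in records})

    table = {}
    for category in categories:
        table[category] = {}
        for arm in arms:
            # Counter returns 0 for a category that never occurs in an arm.
            n = cell_counts[(category, arm)]
            table[category][arm] = (n, round(100.0 * n / arm_totals[arm], 1))
    return table
```

A safety program and an efficacy program could both call this one function
with their own records, which is what allows a single validation effort to
cover both.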
An added advantage of Spectre is that its infrastructure covers being
able to report in different client styles. The different styles are handled
by a "client titles macro". CROs often do not try to implement a programming
infrastructure because each client's layout is different, but the Spectre
infrastructure makes this possible.
I know that most people who look at Spectre will see it as a fixed reporting
system that would cost money to amend to their way of working. And those
same people will not have read this page that tries to explain it as an
"infrastructure". But the only things about Spectre that are fixed are
its key components listed in the "Components of a programming infrastructure"
section above. This is not to say they will never change (indeed it is
likely that they will) but they are essential elements that will always
be there fulfilling that role. They are the building blocks of Spectre
and Spectre is really only that collection of building blocks. The actual
implementation of a Spectre reporting system can be anything you like,
subject to using those building blocks. It's like building a house (or
whatever you like) out of "Lego": the building blocks don't change but
the possible constructions are infinite.
Just a hint, though. If you want to be able to tailor Spectre to fit your
reporting needs, then you will have to learn shell scripting so you can
link its components together. This can be a tough task for those who have
never done shell scripting before but there is a lot on this web site to
help you get up to speed with that. Once you get used to shell scripts
you will discover that they are easy to write and maintain and you will
be able to make Spectre work for you in a variety of situations.
Conclusion
This document has attempted to make clear what is meant by a "programming
infrastructure" and has stressed that an infrastructure is not just a collection
of systems but rather a set of utilities and methods that underlies the
whole field of statistical reporting. It was explained that Spectre is
more an infrastructure than a reporting system, that it is not a fixed
reporting system and is capable of taking on a variety of forms once you
learn to link its components together with shell scripts. The advantages
of having an infrastructure in place were described.