[Copyright © January 2008, Roland Rashleigh-Berry. This document may not be reproduced in any form without permission from the author]
[You are expressly forbidden by the copyright holder to make a reproduction of any part of this page (beyond legally-permissible purposes)]
This page represents my current thinking on this topic. It may be subject to change.
Where your clinical reporting is split into a separate programming function and a separate statistical function, and the organisation is built around that split, a methodology and approach emerges that moulds each function in a different way. Taken to its logical conclusion, it ends up as "statisticians are great thinkers whereas programmers are great doers". Note that both functions are working towards the same goal of reporting clinical studies, but their approaches to achieving that goal end up almost diametrically opposed.
In a mixed team of programmers and statisticians dedicated to a single drug or indication, that drug or indication becomes the "common ground" that binds people together. In contrast, where you have separate functions you are left with very little common ground. Common ground is the basis of mutual understanding and of generally "getting along" with each other. The lack of it between individuals or groups leads to division. And where two divided groups are required to work together, this can lead to distrust and misunderstanding. Frustration can build up as a result and can harden into an entrenched "us versus them" attitude. This is something that needs to be avoided.
The programmers' problem is that they have to produce many tables and listings within demanding timescales. To plan for this and achieve it, they must know well in advance exactly what is required and exactly how it will be achieved. To make the deadlines achievable, the programming function will have aligned itself around standard software routines to do the bulk of the analysis and reporting. Although this might save a huge amount of time, it comes at the expense of flexibility. As deadlines approach, special processing or late changes to processing may not be possible within the timescales.
From the statistician's point of view, the analysis of the results must be correct. Although a method for this has been decided on, the exact details of how to do it might have to change at a late stage. The exact form of a test to be used might depend on the distribution of the data, and this may have to wait until all the data has been collected. The statistician cannot agree at an early stage in the project to an unchanging methodology and be held to the exact methods to be used, whereas the programmer needs exactly that to plan the programming tasks and ensure that deadlines can be met. Here we have a clear conflict in approach that is sometimes unavoidable, and it arises in an area, the use of statistical procedures, that the programmers may not have much familiarity with.
The statistician will likely have no appreciation of the standard software the programmers have to use to save time and will not appreciate the lack of flexibility this can introduce. Enhancing standard software to add functionality may sound conceptually simple but to implement this requires the programmers to raise change requests to whoever maintains the standard software. The maintainers too have to raise paperwork such as a risk assessment and a business case that has to be approved by higher management before they can start on this task. Being able to change the software depends on resources available and the software, if changed, will have to undergo testing and validation before it can be released back into production. The whole process can stretch out to the extent that what can conceptually be achieved in minutes can take as many months and so is unlikely to be achieved during the lifetime of a study.
Programmers might be "validating" their outputs and logging the results in a recording system of some sort that says what programs have been written and who has validated them, so small late changes to tables can result not only in double programming but will likely have an administrative overhead as well. Every small late change causes a chain of events at the programming end that can be costly in resources and can threaten deadlines.
The programming administrative overhead should not be underestimated. Using the standard software, producing outputs in an interactive SAS session can be done in seconds. However, doing this "officially" for batch runs requires that each program be created in a standard way with a standard name along with a program header containing standard information -- all in accordance with the SOPs (Standard Operating Procedures). These SOPs have to be rigidly followed for regulatory purposes. For each program written as described, a "validation programmer" might be required and assigned to the task. This all has to be co-ordinated, and the usual way to do this is with a "tracking spreadsheet" in which details such as program change date, program author or maintainer, version number, validation programmer, validation program date and validation programming "findings" must all be entered. If the study is close to its delivery deadline then programmers might be working intensely and updating the spreadsheet often. Programmers then have to contact each other and ask for the spreadsheet to be "unlocked" so that they can update it. It might be messy and frustrating and cause delays, but if this is part of the SOPs then it must be followed. There is no option to "opt out".
If the programming function and statistical function are entirely separate such that statisticians do no programming, then what can seem a simple and obvious task to a statistician might, to a programmer, seem a near insurmountable one. The most common case is where a programmer is expected to code a result from a free-form text field. The statistician can read any and all of the entries in the free-form text field and the result will be obvious at a glance. It will be immediately obvious to a programmer as well. The trouble is that programming this in a generically reliable way, such that it is maintainable and workable, is generally impossible from a practical programming point of view. One option is to program for that study and that data so that the free text gets coded into the correct result. But that amounts to "introducing data". In other words, by using "fix" code in that way, the programmer ends up "deciding" what a result should be and in so deciding they are "supplying data". A programmer is not allowed to do this. The data has to come from the trial. If decisions as to results are made, they must be made by the trial investigator and be recorded. They must be part of the trial data.
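The distinction can be shown with a minimal sketch, written in Python purely for illustration. It assumes the coding decisions have already been made and recorded as part of the trial data, so that the program only looks them up and never decides anything itself. Every name here (`CodingDecision`, `code_free_text`, the example entries) is hypothetical and does not come from any real standard software or SOP. Matching is deliberately limited to the exact text with surrounding whitespace trimmed, which is itself an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingDecision:
    """One recorded coding decision (hypothetical structure for illustration)."""
    free_text: str     # the verbatim free-form entry
    coded_result: str  # the result as decided and recorded in the trial data
    decided_by: str    # who made the decision, e.g. the trial investigator


def code_free_text(entries, decisions):
    """Apply recorded decisions to free-text entries.

    Returns (coded, queries): 'coded' maps each entry that has a recorded
    decision to its coded result; 'queries' lists the entries that have no
    recorded decision. The program never guesses a result for those.
    """
    # Exact match after trimming surrounding whitespace only (an assumption).
    lookup = {d.free_text.strip(): d.coded_result for d in decisions}
    coded = {}
    queries = []
    for entry in entries:
        key = entry.strip()
        if key in lookup:
            coded[entry] = lookup[key]
        else:
            queries.append(entry)  # to be referred back, not "fixed" in code
    return coded, queries
```

Anything that ends up in `queries` would go back to whoever is entitled to make the decision. The contrast with "fix" code is that no mapping from text to result is written by the programmer.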
As I think of them I will add them but here is my current list of typical "gripes" that programmers and statisticians can have against each other. In reading them, what is important is understanding how they occur from the situation I have described previously. Only if you understand how the differences in approach lead to the gripes listed below can you attempt to resolve these problems.
Statistician gripes against programmers
- Why does it take so long?
- Why can't your standard software be changed to give me what I want?
- I didn't want that. Why have you done it that way?
- We need to make a few changes. It shouldn't take more than a few minutes.
Programmer gripes against statisticians
- Last minute changes
- Constant changes in requirements
- Constant protocol changes
- Unclear specifications
- Late delivery of study table shells
- Additional late requests
And now we come to the resolution of the conflict. There is very little that the programming function can proactively do to solve this conflict, though a few recommendations will be given. The programmers are the people who undertake to produce the results on time and within budget. For that to be possible, they have to know in advance what is needed so that they can plan the resources. It is the statisticians who are the customers. The programmers are the service providers to the statisticians. Understanding and accepting this division into the two roles is important. A lack of recognition of this leads to wrong solutions to this conflict. I note that some large pharmaceutical companies give courses to programmers on basic statistical analysis. I think this is a good idea. It is good to know what you are doing and why, and this helps the programmers ask the right questions when they need to. But if the hope is to address and remove this conflict then that method will not work. After all, a few programmers have advanced qualifications in statistics and yet the conflict affects them as much as those without qualifications in statistics. You can educate the programmers in statistics but it is not they who decide what statistical methods should be used to analyse clinical trials, nor should they be encouraged or allowed to. Neither is it for them to interpret their tasks using their own statistical knowledge. It is not their job.
As just stated in this section, it is the statisticians who are the customers and the programmers who are the service providers. This does not mean that the programmers are the servants of the statisticians. Rather, the programmers are like tradesmen. Imagine you were asking a tradesman to build a garage for you. The tradesman would need to know what was required in advance so that they could give you a price and plan their time. Late changes to the garage specification could be very difficult for the tradesman to achieve: not only could they cost more -- they may not be possible given the existing methods and materials. The tradesman's time beyond the agreed delivery date might already be planned for other work. It is up to the customer to decide what they want in detail well in advance and to stick with what they have decided once the decision is made. Some late changes might be possible -- others not. Where some decisions can only be made at a later date, it helps if the customer is aware of which changes are easy and which are not.
Just as the customer and the tradesman would have a written contract between them, I have suggested that the programming function and statistical function have a written contract between them as well that defines their responsibilities to each other.
Having stated that the statisticians are the customers and the programmers the service providers, the following are my recommendations for resolving the "statisticians vs. programmers" conflict. The first set of recommendations applies to the statisticians as customers.