Bayesian Sample Size & n-of-1 Trials - with Ronan Fitzpatrick & Prof. Stephen Senn

March 21, 2019

Ronan Fitzpatrick, Statsols Head of Statistics and nQuery Lead Researcher, sat down to chat with Professor Stephen Senn about Bayesian sample size and n-of-1 trials.

This video is an excerpt from a feature-length video titled "nQuery Interviews Professor Stephen Senn". 



The full video interview is available to watch on-demand.



About The Speakers

Ronan Fitzpatrick is Head of Statistics at Statsols and the Lead Researcher for nQuery Sample Size Software. He is a guest lecturer for many institutions including the FDA.

Stephen Senn is a statistical consultant for the pharmaceutical industry. He has worked as an academic statistician at the Luxembourg Institute of Health, the University of Glasgow and University College London, and led one of the work packages on the EU FP7 IDEAL project for developing treatments in rare diseases. His expertise is in statistical methods for drug development and statistical inference.


Interview Transcript

Ronan: So, in terms of assurance, it's generally considered a Bayesian-like statistic for sample size determination. Another school of Bayesian sample size is the approach that integrates utility and cost. This is obviously something which hasn't seen a lot of practice in clinical trials or elsewhere, but I think you have some interest in its theoretical justification, how much of an impact it could make, and whether the way it asks the question could change how we even think about sample size.
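
For readers who want to see the assurance idea concretely, here is a minimal sketch in Python. It is not nQuery's implementation, and the prior and design numbers are invented for illustration; the point is simply that assurance averages the power over a prior for the treatment effect instead of assuming one fixed delta.

```python
# A minimal sketch of assurance (expected power) for a two-arm trial.
# Not nQuery's implementation; all numbers below are illustrative.
import numpy as np
from scipy.stats import norm

def power_two_sample(delta, n_per_arm, sigma=1.0, alpha=0.025):
    """Approximate one-sided power of a z-test comparing two means."""
    z_alpha = norm.ppf(1 - alpha)
    ncp = delta / (sigma * np.sqrt(2.0 / n_per_arm))   # signal-to-noise ratio
    return norm.cdf(ncp - z_alpha)

def assurance(n_per_arm, prior_mean, prior_sd, sigma=1.0, alpha=0.025, draws=100_000):
    """Assurance: power averaged over a normal prior for the true effect."""
    rng = np.random.default_rng(1)
    deltas = rng.normal(prior_mean, prior_sd, draws)
    return power_two_sample(deltas, n_per_arm, sigma, alpha).mean()

# 86 per arm gives roughly 90% power at delta = 0.5, but the assurance is
# lower once uncertainty about delta is acknowledged.
print(power_two_sample(0.5, 86), assurance(86, prior_mean=0.5, prior_sd=0.25))
```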

Stephen: Yes, there have been various attempts to incorporate cost into sample size determination, the argument being that it's the big missing thing, the thing we don't ask about. It's dealt with informally, in the sense that what the statistician does is help the clinical advisors or life-scientist colleagues explore possible sample sizes, and then discovers that their faces grow longer and longer when the answer is "too many", which is a typical statistical answer. Then back comes the reply, "well, my budget won't stretch to that", so one has to think about adapting what one is trying to do.

People sometimes get terribly shocked if delta is changed in some way in response to that, but it may well be that in a particular disease area it has to be recognized that, given the rarity of the disease, we would only feasibly succeed in proving something if that something has a remarkable effect, and so we have to increase delta because of that. It might be a reasonable thing to do. Equally well, of course, it may be that the alpha and beta we tend to regard as being fixed a priori also have to be changed.

We have to realize that we cannot always achieve the degree of what I call "decision precision". If you look at most sample size formulae, you will find that the z-alpha and the z-beta together determine, to a certain extent, the target for the signal-to-noise ratio, and that decision precision sometimes has to be amended. In a recent collaborative project I was involved in we were looking at rare diseases, so that was certainly one of the things we had to think about, simply because the number of patients available is not large.
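
As a concrete illustration of that point, here is a small sketch of the standard normal-approximation formula for a two-arm comparison of means, in which z-alpha and z-beta jointly set the target signal-to-noise ratio. The delta, sigma and error rates are purely illustrative.

```python
# Sketch of the usual two-arm sample size formula: z_alpha + z_beta fixes
# the required signal-to-noise ratio ("decision precision") for a given delta.
import math
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm, two-sided alpha."""
    snr_target = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (sigma * snr_target / delta) ** 2)

print(n_per_arm(delta=0.5, sigma=1.0))                          # ~85 per arm
# Relaxing alpha and beta, as may be necessary in rare diseases,
# reduces the decision precision and hence the required n.
print(n_per_arm(delta=0.5, sigma=1.0, alpha=0.10, power=0.80))  # ~50 per arm
```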

Ronan: When you look at those, there's the true Bayesian approach to sample size. I think there's a paper from 1995 by Lindley, "The Choice of Sample Size", where he talks about that kind of decision-theoretic approach. The one thing I took from that paper was an interesting side question: should these things be fixed a priori, or should we be asking how much you are willing to pay to get an extra percentage point of precision, a 0.1 decrease or something like that? Is that something people should be asking more often, or does the current approach, where we come with almost a Platonic ideal of what we want the study to be, with this beta and this alpha, mean we need to be a little more flexible, as you're mentioning there now?

Stephen: Yes, maybe that would be interesting. Maybe one could consider, within a frequentist frame, what the effect on sample size is of increasing the power from, say, 85 percent to 90 percent. People might then discover that, just to get that five percent more power, there's been a considerable inflation of the sample size. So that's one thing one could consider.
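
The inflation Stephen mentions is easy to quantify with the same normal-approximation formula; the delta and sigma below are illustrative only.

```python
# How much extra sample size does 90% power cost relative to 85%?
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.90):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (sigma * z / delta) ** 2

n85 = n_per_arm(0.5, 1.0, power=0.85)
n90 = n_per_arm(0.5, 1.0, power=0.90)
print(f"{n85:.0f} vs {n90:.0f} per arm ({100 * (n90 / n85 - 1):.0f}% more patients)")
```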

I mean, Lindley's particular approach is perfectly logical provided one has all the information and one is a single decision-maker. Basically it's a double optimization problem. You have to know, for any given fixed sample size and for each possible outcome of the results, what the probability of that outcome is, what the optimal decision is if you see that outcome, and, given the optimal decision, what the loss is for that outcome. So you then know exactly what the optimal course of action is for each possible sample size, and you also know what the expected value for each sample size is, because you have all the possible outcomes, you know their probabilities, and you can average over them.

Then you also have the cost of each possible sample size, and now you have the second optimization step, in which you optimize over all the optimal decisions to decide which sample size actually gives the optimum of the optima.
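
A toy Monte Carlo version of that double optimization might look as follows. This is a sketch of the structure Stephen describes rather than Lindley's exact formulation, and the prior, gain and cost figures are invented for illustration.

```python
# Toy decision-theoretic sample size: for each candidate n, simulate the
# trial, take the optimal terminal decision for each simulated outcome,
# average the resulting utility and subtract the sampling cost (first
# optimization), then pick the n with the best expected net utility
# (second optimization).
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                  # outcome standard deviation
gain = 5000.0                # utility per unit of true effect if adopted
cost_per_patient = 1.0       # cost of recruiting one patient
prior_mean, prior_sd = 0.1, 0.3

def expected_net_utility(n_per_arm, draws=200_000):
    delta = rng.normal(prior_mean, prior_sd, draws)                  # true effect
    delta_hat = rng.normal(delta, sigma * np.sqrt(2 / n_per_arm))    # trial estimate
    se2 = 2 * sigma**2 / n_per_arm
    # Posterior mean of delta (normal-normal conjugacy with known variance).
    post_mean = (prior_mean / prior_sd**2 + delta_hat / se2) / (1 / prior_sd**2 + 1 / se2)
    adopt = post_mean > 0                       # Bayes-optimal terminal decision
    utility = np.where(adopt, gain * delta, 0.0)
    return utility.mean() - cost_per_patient * 2 * n_per_arm

candidates = [10, 25, 50, 100, 200, 400]
best = max(candidates, key=expected_net_utility)
print("optimal n per arm among candidates:", best)
```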

So that is essentially how it proceeds. It has, understandably, not been particularly popular, partly because it's so difficult. Not just difficult in terms of calculation, some smart Bayesian will have solved it somewhere and given the algorithm in R, I'm sure, or whatever, so it would be possible to do it, but because it's simply so difficult to get all the inputs that are necessary, and also because it's a single-user approach.

So you find some other approaches. For example, Gittins and Pezeshk have looked at a Bayesian statistician, let's say, trying to satisfy a frequentist regulator: how should this Bayesian statistician behave as regards sample size in order to optimize the revenue to the company?

Ronan: One other thing you've talked about, in a recent paper on n-of-1 trials, is that different studies have different objectives, different ideas of what exactly they're trying to prove. Some are trying to make predictions, some are trying to establish efficacy. I think one of the big contrasts there is between what might be the traditional objective, making an inference about efficacy for a future group, and personalized medicine, which is obviously very relevant for n-of-1 trials, where you're trying to make predictions for future individuals, and the big difference that ends up making in that paper to the overall sample size required. So do you want to talk about the insights you got specifically from engaging with personalized medicine as an area and doing sample size work for trials focused on that area?

Stephen: Yes, I think that people, and statisticians, tend to forget that there are a number of different basic inferential models. Under certain circumstances they don't have much of an effect, but in others they do. One question is whether one thinks of the particular population from which the clinical trial is somehow drawn as consisting of a population of randomizations: we imagine we have exactly the same patients under exactly the same circumstances, but we could simply have allocated them differently to the different treatments, and we ask what the results would have looked like.

So one way of exploring this is to think of the permutation distribution of a particular test statistic when this is the case. Another is to think of the patients as somehow drawn from a normal distribution. This is very typically done in simulations, and lots of linear models sort of implicitly assume it, and then you can regard the patients as random realizations of some particular hyper-population. What exactly this population is, is never expressed, but people sometimes say "oh, it's the population of patients, or possible patients", and then maybe they say "well, actually, all possible patients satisfying the inclusion criteria."
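
To make the contrast concrete, here is a minimal sketch on simulated data: a randomization (permutation) test that keeps the observed patients fixed and re-allocates the treatment labels, next to a t-test that implicitly treats the patients as draws from a hypothetical population. The data are invented for illustration.

```python
# Permutation (randomization) inference versus model-based inference
# on the same simulated two-arm data set.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
treated = rng.normal(0.6, 1.0, 20)    # illustrative outcomes
control = rng.normal(0.0, 1.0, 20)

observed = treated.mean() - control.mean()
pooled = np.concatenate([treated, control])

# Same patients, many alternative allocations of the treatment labels.
perm_diffs = []
for _ in range(10_000):
    labels = rng.permutation(pooled)
    perm_diffs.append(labels[:20].mean() - labels[20:].mean())
perm_p = np.mean(np.abs(perm_diffs) >= abs(observed))

# Patients treated as random draws from a normal "hyper-population".
model_p = ttest_ind(treated, control).pvalue
print(f"permutation p = {perm_p:.3f}, t-test p = {model_p:.3f}")
```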

Of course this is not particularly logical, because the population of patients satisfying the inclusion criteria could very easily be very, very different from the sample of patients satisfying the inclusion criteria, but nevertheless we sort of carry out clinical research trying to say something about future patients. So one could describe these purposes, as a shorthand, as the causal question, did the treatment work for the patients we actually studied, where we fix the patients, consider all possible permutations and answer that, and the predictive question, how would it work for future patients?

Now it turns out that if certain components of variation are assumed to be zero, the two analyses are exactly the same, so we can in fact use the causal results that we've found, and the uncertainties associated with them, for the group of patients we studied to say something about the future effect of treatment. In other circumstances that's not the case. This is also reflected in other fields, for instance in the endless debate between fixed- and random-effects meta-analyses, often described as being about whether there's heterogeneity, which in my opinion is quite, quite wrong.

The situation is that if the heterogeneity is zero, the two analyses give very similar answers, but they are still serving a different purpose, and that is something which is not sufficiently understood. So this is maybe something we should think about a bit more in clinical trials. I'm both a cynic and an optimist on this. I'm a cynic in the sense that I don't actually believe that the random-effect models we use capture all of the necessary components of variation.
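
The meta-analysis version of this point is easy to see in a small sketch: when the estimated heterogeneity is zero, the fixed-effect and DerSimonian-Laird random-effects estimates coincide numerically, yet one speaks about the trials at hand and the other about future, exchangeable trials. The trial results below are invented for illustration.

```python
# Fixed-effect vs DerSimonian-Laird random-effects pooling on invented data.
import numpy as np

effects = np.array([0.30, 0.25, 0.35, 0.28])         # trial effect estimates
variances = np.array([0.020, 0.030, 0.025, 0.020])   # their variances

w = 1 / variances
fixed = np.sum(w * effects) / np.sum(w)

# DerSimonian-Laird estimate of between-trial variance tau^2.
q = np.sum(w * (effects - fixed) ** 2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (len(effects) - 1)) / c)

w_re = 1 / (variances + tau2)
random_effects = np.sum(w_re * effects) / np.sum(w_re)
print(f"tau^2 = {tau2:.3f}, fixed = {fixed:.3f}, random = {random_effects:.3f}")
# With tau^2 = 0 the two estimates are identical, but they still answer
# different questions.
```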

There are so many different ways, and it's unreasonable to imagine that we can somehow sample from the future, which is effectively what we're having to assume. On the other hand, I'm an optimist in the sense that I sometimes feel people make the perfect the enemy of the good. Actually, just proving that treatments work at all is so hard, and life is so short, that maybe it's enough to prove that the treatment worked for the patients in the clinical trial for us to consider that, in that case, it's worth using in practice.

It's simply that there are other things we should be looking at rather than finding yet another proof that somehow it works in this subgroup or that subgroup. We should say, well, the resources we're spending there are actually competing with other drugs in the pipeline, which might render these results historic anyway. You know, look at the way in which proton pump inhibitors came along and improved upon the previous generation of H2 antagonists, and so on.

Ronan: I suppose with n-of-1 studies, in general or specifically, that's obviously an idea related to personalized medicine, which I assume is how you ended up working on this and trying to improve practice there. So in terms of the overall culture around personalized medicine, or the ambition of personalized medicine, what is your feeling, now that you've been involved in that area, about its promise going forward?

Stephen: Sorry, I didn't really answer your question about n-of-1 trials. N-of-1 trials are useful if you want to establish to what extent the variation in response to treatment that you observe amongst patients is genuine, and not just random, not just down to the fact that patients are not the same from day to day. So they are a principled way of doing that, and potentially extremely powerful, but unfortunately only applicable to diseases which are chronic and fairly stable, because, for example, you can't use them for survival analysis; we can't resurrect individuals and say whether they would have lived longer on the alternative treatment. It's just not possible.

We can't use them for infectious diseases; we can't say, "I've cured your sexually transmitted disease, go out and infect yourself again so I can try a second time." These are not options available to us. But they are a principled way of doing that, and I think in some areas they would be useful, because they would tell us what the scope is for personalized medicine. If we see that the component of variation is large, that there is an appreciable difference in the way in which patients respond to treatment, then maybe we could dig deeper and see if we can find some way in which this can be predicted, and maybe develop treatments that work better for certain subgroups.
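
As a rough sketch of how repeated treatment cycles make this possible, the simulation below uses a simple method-of-moments estimate to separate patient-by-treatment variation from within-patient noise. All values are invented for illustration.

```python
# Separating patient-by-treatment variation from within-patient noise
# in a simulated series of n-of-1 trials (method-of-moments estimate).
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_cycles = 30, 4
mean_effect, sd_ptx, sd_noise = 1.0, 0.5, 1.0    # sd_ptx: patient-by-treatment SD

patient_effect = rng.normal(mean_effect, sd_ptx, n_patients)
# d[i, j] = treatment-minus-control difference for patient i in cycle j.
d = rng.normal(patient_effect[:, None], sd_noise, (n_patients, n_cycles))

within_var = d.var(axis=1, ddof=1).mean()         # estimates sd_noise**2
between_var = d.mean(axis=1).var(ddof=1)          # sd_ptx**2 + sd_noise**2 / n_cycles
ptx_var = max(0.0, between_var - within_var / n_cycles)

print(f"estimated patient-by-treatment SD: {ptx_var ** 0.5:.2f} (simulated truth {sd_ptx})")
```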

If we don't see much variation, then maybe we say, let's find another field, let's find somewhere else to see whether we can improve existing treatments by personalization.

Ronan: So a lot of fieldwork still needs to be done to answer that fundamental question of how much variation is really due to individuals rather than shared across us.

Stephen: Yes, I have to say that I have done a lot of practical work on clinical trials, but most of my work on n-of-1 trials has been rather theoretical. It's been difficult to get involved in the area, and it's been difficult to get them started. As I often say to statisticians, if you're flying in a plane and somebody on the tannoy asks whether there's a doctor on the plane, just because you've got a PhD is not a reason to answer; they're not looking for someone like you. Statisticians don't actually run clinical trials, they don't treat patients, and so all of these ideas have to be tested against reality.

There are practical realities to overcome. Treating a patient a number of times and switching the treatment involves quite a long period of observation; it involves willingness on the part of patients to record their symptoms for quite a long time; it means bringing them back into the clinic to get the second, third, fourth, maybe fifth or sixth course of treatment that they're going to get as part of the design. These things have to be tested to see whether they can in fact be made to work in reality.

