In this webinar, we review common survival analysis models & evaluate the impact of delayed effect models and adaptive design.
Oncology trials are among the most common and complex clinical trials. Survival analysis is the most common statistical area of interest for oncology trials. Recent years have seen growing interest in two areas regarding oncology trials: delayed effects and adaptive designs.
Immunotherapies have commonly had delayed effects not traditionally seen in oncology trials. This has led to a suite of proposed models that can flexibly deal with these delayed effects.
Adaptive designs are increasingly common in clinical trials, but survival analysis presents unique considerations when implementing principled adaptive designs, whether that be group sequential design, sample size re-estimation or multi-arm multi-stage (MAMS) designs.
In this webinar, we will examine the role of delayed effects and adaptive design in oncology trials.
We give an in-depth look at sample size determination in oncology trials and consider the effect of these additional considerations.
Transcripts of webinar
*Please note, this is auto-generated, some spelling and grammatical differences may occur*
So, hello, and welcome to today's webinar, Sample Size for Oncology Trials: Survival Analysis, Delayed Effects, and Adaptive Design. My name is Ronan Fitzpatrick, I'm the Head of Statistics here at Statsols and nQuery Lead Researcher.
So in terms of what we'll be covering in today's webinar, the first part will be an introduction to oncology and the specific challenges of sample size for survival analysis.
And then we'll be looking at two, I suppose, more specialized areas in sample size for survival.
Namely, the issue of non-proportional hazards, particularly due to the increased interest in this area as immunotherapies have shown this so-called delayed effect.
And then adaptive designs for survival analysis, specifically focusing on group sequential design and sample size re-estimation. And then we'll have some conclusions and discussion.
Before we get started, this webinar is demonstrated using nQuery and sponsored by nQuery. nQuery is one of the world's leading sample size and trial design software packages, used by over 90% of companies who had trials reach Phase III clinical trials.
You can see here the variety of options available for early-phase and confirmatory trials.
And you can see here the statistic about how widely nQuery is used and some of the reviews that we've had over the years.
So, moving straight into Part 1: oncology and survival sample size. Before we can talk about the sample size aspect, we need to talk about oncology clinical trials, as well as the question implicit in the title of this section: why are we talking about survival analysis?
Oncology trials, we know, are one of the most common types of trial; about 30% of all trials, I believe, are related to oncology.
Now, I believe that figure is slightly smaller for phase III trials, which I suppose reflects the wide variety of treatments proposed for oncology at the early stages. But this reflects the fact that oncology, the study of cancer, is such a huge area. As populations continue to age, particularly in the developed world, cancer has become one of the most common end-of-life diseases, and obviously one of the most severe and difficult diseases to deal with.
Of course, cancer as a category is actually a much wider range of diseases when you get under the hood and look at the genetics and the specific elements of each type of cancer. But cancer as a whole is an area in which a huge amount of money has been invested, at both the governmental and pharmaceutical sponsor levels, to try to find treatments that ameliorate or, hopefully, at some stage, cure some of these cancers.
The important point for us is that when we're talking about oncology phase III trials, our confirmatory trials where we're recruiting thousands of people, survival or time-to-event modeling is by far the most common approach, constituting nearly 60% of the total trials at this stage.
And this makes sense, of course, because in oncology we're looking at severe outcomes, namely death.
And therefore, having survival analysis as the key component for deciding whether our treatment is effective makes a lot of sense, rather than proportions.
Binary proportions, like yes/no type categories, are also quite common, at around 30%, and in early-stage designs, including some of the methods for early-stage (phase II) survival designs, proportion-based analysis will be more common.
But at phase III, we're usually talking about survival analysis. In many ways, the rest of this webinar reflects what we've done in previous survival webinars, so if there's anything I don't end up covering in detail here, it's probably been covered in one of those previous survival webinars.
So, in terms of the general measures: the gold standard measure, unsurprisingly, is overall survival (OS). Basically, did you survive or not, comparing how people survived in one group versus the other. Then there's progression-free survival (PFS).
That is, there's no deterioration in your condition, for example the tumors are not growing. That is also a very common endpoint, because progression will tend to happen sooner than death, compared to OS.
And so there is a lot of interest in models that model these jointly where, for example, progression-free survival might be your intermediate endpoint but overall survival is your final endpoint. That's not part of the discussion today, but as an example:
when we're talking about multi-arm multi-stage designs, there's a set of designs from Royston where the choice of which arms to drop in an oncology trial will be based on something like PFS, but the decision at the end of the trial about which arms are most successful will be based on OS. That's the so-called I-D design from Royston for multi-arm multi-stage trials.
Group sequential designs that model these endpoints jointly have also attracted a lot of interest recently. I won't go into those today, but if that's an area of interest to you, that's very useful for us to know, both for our development priorities for nQuery and for future webinar topics.
So, in terms of sample size for survival analysis: first off, sample size determination is usually the process of defining how many people we need in our study to have suitable power, or probability of success. Success in this case means, in basic terms, a significant p-value, given some assumptions about the expected effect size and any nuisance parameters, and obviously some fixed parameters like our significance level.
Survival analysis usually relies on the most common models, which are the log-rank test and the Cox regression model. But the big point to make about sample size for survival specifically is that power is related to the number of events, not the sample size. So when we do sample size calculations today, and when you're looking at sample size for survival in general, the sample size is really our best guess for how many people we think we need to achieve the required number of events.
It's important to note that it's the number of events that drives the analysis, the number of events that will define the p-value, and therefore it's the number of events that we're really targeting.
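As a minimal sketch of why events, not patients, drive power: under Schoenfeld's normal approximation for the log-rank test (an assumption here, with equal allocation and a one-sided test), power depends only on the number of events and the hazard ratio. The function name `logrank_power_from_events` is hypothetical, not an nQuery API.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power_from_events(events: float, hazard_ratio: float,
                              alpha_one_sided: float = 0.025,
                              allocation: float = 0.5) -> float:
    """Approximate power of a one-sided log-rank test (Schoenfeld approximation).

    Only the number of events and the hazard ratio enter the formula;
    the number of patients matters only through the events they generate.
    `allocation` is the proportion of patients randomized to one arm.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha_one_sided)
    # Expected value of the standardized log-rank statistic under the alternative
    drift = sqrt(events * allocation * (1 - allocation)) * abs(log(hazard_ratio))
    return std_normal.cdf(drift - z_alpha)


# The same 282 events give the same power whether they come from 352 patients or 1,000
power = logrank_power_from_events(282, hazard_ratio=2 / 3)
print(f"Approximate power with 282 events: {power:.3f}")
```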
So, if things turned out better than expected, in the sense that maybe we recruited quicker than we expected, or, on the less optimistic side, perhaps events were coming in quicker than expected, then the events might happen sooner than you expected. And if you are doing an interim analysis, or, as usual, stopping the trial at some specific number of events, then that analysis will happen earlier than you might have expected, and not necessarily at the sample size you initially planned.
Now, in a clinical trial context, you may be asked to continue until the total planned sample size has been recruited, particularly if that cohort already exists in the trial, rather than just right-censoring them there.
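To see how faster accrual pulls the event target forward, here is a rough Python sketch of the expected number of events by calendar time. It assumes uniform accrual, exponential survival in each arm, equal arm sizes and no dropout; the function names and the 15-month "faster accrual" scenario are hypothetical, while the other inputs (medians of 9 and 6 months, 176 per arm, 18.5 months of accrual, 282 events) come from the pancreatic example worked through later in this webinar.

```python
from math import exp, log


def expected_events(t: float, n_per_arm: float, accrual_months: float,
                    hazards: tuple[float, ...]) -> float:
    """Expected cumulative events by calendar time t (months since first patient in).

    Assumes uniform accrual over [0, accrual_months], exponential event times
    with one constant hazard per arm, equal arm sizes and no dropout.
    """
    rate = n_per_arm / accrual_months  # patients enrolled per month, per arm
    a = accrual_months
    total = 0.0
    for lam in hazards:
        if t <= a:
            # Integrate P(event) over entry times u in [0, t]; follow-up is t - u
            total += rate * (t - (1 - exp(-lam * t)) / lam)
        else:
            # Everyone is enrolled; integrate over entry times u in [0, a]
            total += rate * (a - (exp(-lam * (t - a)) - exp(-lam * t)) / lam)
    return total


def time_to_target_events(target: float, n_per_arm: float, accrual_months: float,
                          hazards: tuple[float, ...], t_max: float = 240.0) -> float:
    """Calendar time at which the expected event count reaches `target` (bisection)."""
    if expected_events(t_max, n_per_arm, accrual_months, hazards) < target:
        raise ValueError("target not reached by t_max")
    lo, hi = 0.0, t_max
    for _ in range(100):  # expected_events is increasing in t
        mid = (lo + hi) / 2
        if expected_events(mid, n_per_arm, accrual_months, hazards) < target:
            lo = mid
        else:
            hi = mid
    return hi


hazards = (log(2) / 9, log(2) / 6)  # exponential hazards from medians of 9 and 6 months
planned = time_to_target_events(282, 176, 18.5, hazards)
faster = time_to_target_events(282, 176, 15.0, hazards)  # hypothetical quicker accrual
print(f"282 events expected at ~{planned:.1f} months; with faster accrual ~{faster:.1f}")
```

Under these assumptions, enrolling the same patients over a shorter window gives everyone more follow-up at any calendar time, so an event-triggered analysis arrives earlier.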
But basically, the important point is that it's the events that we're following. The sample size calculation is really a meta-problem sitting above the events calculation, and I think the fact that it's a meta-problem is why we have so much flexibility in sample size determination for survival: you can play around with this part.
Because it's not really affecting the underlying hypothesis part of the problem; it's the more practical problem of how many people we need to get the number of events that we require, based on the statistical hypothesis part of our sample size determination. And, of course, that also reflects that survival analysis methods are quite complex in the first place, as you can see in this slide, which shows a small selection of the considerations you would have if you are designing a survival analysis trial: from what you expect the survival curves to look like, to which tests are appropriate.
How do you deal with unequal follow-up at the planning stage? How long do you think the trial is going to be? Are we going to follow up everyone until the end of the study, or will we stop follow-up for subjects after a fixed period of time? Dropout, censoring, crossover: these are all problematic aspects, but in my experience they can be integrated more directly into the survival sample size calculation because of that meta-model process.
That's in contrast to other calculations, where typically you're using an ad hoc adjustment, like your N divided by one minus your proportion of dropouts. And for things like crossover, how are you going to deal with that?
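For contrast, the ad hoc dropout adjustment mentioned above is a one-liner. A minimal sketch (the helper name `inflate_for_dropout` is hypothetical); with the 352 patients and 10% dropout used later in the worked example, it gives 392.

```python
from math import ceil


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Ad hoc dropout adjustment: N / (1 - dropout proportion), rounded up."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    # Small tolerance so floating-point error cannot push an exact result
    # just above an integer and add a spurious patient
    return ceil(n / (1 - dropout) - 1e-9)


print(inflate_for_dropout(352, 0.10))
```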
To be fair, if we're talking just about sample size calculations, some of these more in-depth considerations, like crossover and the effect of the estimand in particular, are less commonly dealt with at the moment.
Now, I think, as time goes on, those considerations will be talked about more. But I have an inkling that
when you introduce these kinds of additional considerations, you have to make additional guesses about what's going to happen. From the perspective of a sensitivity analysis, it will be useful to know what that effect is.
But for your core sample size calculation, the one that you put in your paper or your protocol, I'm not sure that these considerations will typically be used for that purpose. I think it'll be more for internal evaluation purposes that you might look at things like crossover and drop-in.
OK, so that's an introduction to the sample size area. What I'm going to do now is take a real-world example from the New England Journal of Medicine of ... for advanced pancreatic neuroendocrine tumors. We're looking at advanced pancreatic cancer here, and we have the real sample size determination statement used in the study, which is summarized on the right here in tabular format.
Because we're dealing with advanced cancer, the median survivals we're talking about are quite short: six months under placebo, and only nine months for our treatment group using ....
So we're dealing here with people at a very advanced, problematic stage of their cancer diagnosis, and pancreatic cancer
is one of the more severe types of cancer and one of the more difficult to treat. So the whole hope is to lead to longer overall survival for people, but, to some extent, it's also about extending people's lifetime and, hopefully, improving their quality of life at this point.
And so we can see in this particular calculation that we have those median survivals, and that the trial is going to go on for about two and a bit years. The durations are given in weeks here: 74 weeks for the accrual period,
and then a minimum follow-up period of around 39 weeks. In total, that's a bit over two years, about 28 months. So what I'm going to do now is replicate this example up front and then show some of the additional flexibility you could add if you wanted to do a sensitivity analysis, for example, to see the effect of different assumptions within nQuery. But the same holds for other sample size tools that you might have, for example, in R or SAS.
So, for anyone who's not familiar with nQuery, the main calculations happen in the table in the top left-hand corner; that's where the action happens. The names on the left-hand side are the required inputs for this calculation, each column is an independent calculation, and the yellow rows are the solver rows: these are the ones we can solve for once all of the required information has been provided. Gray just means that something is read-only.
You can see that as we select rows, there are help cards on the right-hand side to give you context. Below this table we have something called a side table, which is really just a helper function that will allow us later on to convert the median survivals from the original statement into the exponential parameters required in these two rows of the main table.
And that's a relatively easy process to do in nQuery using this particular tool.
OK, so let's replicate that example first, just to ensure that we have a good baseline and that we can do the calculation that they proposed. They had a significance level of 0.025 at the one-sided level.
Now, you'll notice that the accrual period was in weeks. I think it's important to note that in any kind of survival analysis, and in similar areas such as Poisson or negative binomial models, you want to ensure that all the time-dependent quantities are on the same time scale. That is to say, if your median survival is in months, then you should have your accrual period in months, or vice versa if you wanted to have them both in weeks or both in years, or whatever your particular preference is. The only important thing is that they are on the same time scale. So we're going to stick with the month scale that we used for the median survival, which means we need to convert the accrual period and the follow-up period into months as well. I'm going to do that simply by dividing them by four. I could use a calculator from the assistants menu there.
But I've done this calculation before: if you take 74 and divide it by four, you get 18.5.
And if I take 74, add 39 and divide that by four, I get 28.25.
So let's put in that accrual period of 18.5, and a maximum length of follow-up of 28.25.
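The conversion above relies on a simple four-weeks-per-month convention. As a hedged sketch, the snippet below reproduces 18.5 and 28.25 months and, for a sensitivity check, shows what an average calendar month (365.2425 / 7 / 12, about 4.35 weeks) would give instead; which convention the original study used is not stated here, and the constant names are illustrative.

```python
WEEKS_PER_MONTH_WEBINAR = 4.0                  # simple convention used in this example
WEEKS_PER_MONTH_CALENDAR = 365.2425 / 7 / 12   # average Gregorian month, about 4.35 weeks


def weeks_to_months(weeks: float, weeks_per_month: float = WEEKS_PER_MONTH_WEBINAR) -> float:
    """Convert a duration in weeks to months under the chosen convention."""
    return weeks / weeks_per_month


accrual_weeks, min_follow_up_weeks = 74, 39
accrual = weeks_to_months(accrual_weeks)                                # 18.5
total_duration = weeks_to_months(accrual_weeks + min_follow_up_weeks)  # 28.25
print(accrual, total_duration)

# Sensitivity check: the same durations with calendar-average months come out shorter
print(round(weeks_to_months(74, WEEKS_PER_MONTH_CALENDAR), 2),
      round(weeks_to_months(113, WEEKS_PER_MONTH_CALENDAR), 2))
```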
So what this is saying is that for the first 18.5 months, we'll be recruiting subjects. And in this table, we're assuming uniform accrual. So we're getting an equal amount of people per month, effectively. But then for the last, basically 10 months or so, from 18.5 to 28.25 months, we'll have stopped recruiting.
But we'll be continuing to follow up the subjects who have not had the event at that point in time. This means that, in theory, someone recruited at the very start of the trial could have a follow-up time of 28.25 months, but someone recruited at the end of the accrual period may have a maximum follow-up of only around 10 months. So that's where you need to adjust for the fact that different people have a different chance of having the event, of dying: obviously, if you've been in the trial longer, you're more likely to have had the event. That's why this kind of calculation becomes useful. So then we have this specified here, and then we have our exponential parameters. We are assuming here that the two survival curves are exponential. We're not using an exponential parametric model for the analysis; we're using a log-rank test.
We're just assuming that the distribution of the two curves is exponential. And so, if we go to our side table down here and enter our median survivals in months, 9 and 6 months respectively for the first and second group,
you will see that we get exponential parameters of about 0.077 and 0.116 respectively, giving a hazard ratio of around 0.667, basically two over three,
or, in this case, six divided by nine.
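The side-table step can be sketched in a few lines, assuming exponential survival so that the hazard is ln(2) divided by the median. This is an illustration of the arithmetic, not nQuery's code, and `median_to_exponential_rate` is a hypothetical name.

```python
from math import log


def median_to_exponential_rate(median: float) -> float:
    """Exponential hazard with the given median: S(median) = exp(-rate * median) = 0.5."""
    return log(2) / median


rate_treatment = median_to_exponential_rate(9.0)  # months
rate_control = median_to_exponential_rate(6.0)
hazard_ratio = rate_treatment / rate_control      # equals 6 / 9 for exponential curves
print(f"{rate_treatment:.3f} {rate_control:.3f} {hazard_ratio:.3f}")
```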
So if we click Transfer, that will move them up to the main table.
Then all we need to do is enter our power of 92.6%.
And then we'll get 176 per group and 282 events required. If we quickly go back to our slide, you'll see we had a total of 352 patients, which, divided by two, is 176, and 282 events, as given in the original statement. So we've now replicated that example. The 392 number here is really just 352 divided by 0.9, if you wanted to do the dropout calculation here.
I suppose one thing to notice is that, in the first paragraph, the events calculation doesn't include any of the information about accrual and so on. To show why that's important: in this particular table, you have the option to use the Schoenfeld approximation for the number of events calculation, independent of the sample size calculation. And if you go to the original paper this is based on, the Lachin and Foulkes paper, you'll see two terms. On the left-hand side you will see the calculation that includes the alpha term, the hazard ratio and the beta term, that is, the significance level and power, through the inverse cumulative distribution of the test statistic for each of those.
So that's the events calculation. On the right, you'll have this probability-of-event term in brackets, and that's where all of these other parameters come in, to give you how many people we need to get that number of events.
To illustrate that, we can quickly show that if we enter only the significance level, the hazard ratio and the power, we get a total required number of events of 282. Now, note that this isn't true in every single table in nQuery. In some tables, such as this table STT6, the sample size and the events calculations are much more closely intertwined, and therefore you can't separate them. This one is based on the Lakatos method, which uses a non-stationary Markov process: you start with the sample size and then iterate over time,
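Putting the two terms together, here is a minimal Python sketch: Schoenfeld's events formula (the left-hand term) and the probability of an event under uniform accrual with exponential survival and no dropout (the right-hand term), averaged over two equal arms. The helper names are hypothetical and the rounding convention (round events up, then patients per arm up) is an assumption, so this is not guaranteed to match nQuery's table exactly.

```python
from math import ceil, exp, log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, alpha_one_sided: float, power: float,
                      allocation: float = 0.5) -> float:
    """Events needed for a one-sided log-rank test (Schoenfeld approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha_one_sided)
    z_beta = std_normal.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * log(hazard_ratio) ** 2)


def prob_event(hazard: float, accrual: float, total_duration: float) -> float:
    """P(event observed by the analysis) for one patient.

    Exponential survival, entry uniform on [0, accrual], analysis at
    total_duration, no dropout: 1 - (exp(-h*f) - exp(-h*T)) / (h*a),
    where f = T - a is the minimum follow-up.
    """
    min_follow_up = total_duration - accrual
    return 1 - (exp(-hazard * min_follow_up) - exp(-hazard * total_duration)) / (hazard * accrual)


rate_t, rate_c = log(2) / 9, log(2) / 6
events = schoenfeld_events(rate_t / rate_c, alpha_one_sided=0.025, power=0.926)
# Probability-of-event term, averaged over two equally sized arms
p_event = 0.5 * (prob_event(rate_t, 18.5, 28.25) + prob_event(rate_c, 18.5, 28.25))
n_per_arm = ceil(ceil(events) / (2 * p_event))
print(f"events ~ {events:.1f}, P(event) ~ {p_event:.3f}, n per arm = {n_per_arm}")
```

Under these assumptions the sketch gives roughly 282.4 events before rounding and 176 patients per arm, in line with the figures in the original statement.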
assigning people to different states, from the initial state of being at risk to having the event or having dropped out. So in that case, you're starting from the sample size, and the events can't be calculated independently.
But you can see here, for example, the events row in this column, which replicates the first example.
I'm not going to go into too much detail because of time constraints and the more interesting topics we're going to be looking at later on. But you can see here that we get the same sample size, and that the number of events is around the same: it's actually 285 if you add these two numbers together, only three more than the 282 from the original calculation.
And I think, on average, simulations have shown that the Lakatos approach is a bit better, but I do know that some regulators prefer the Lachin and Foulkes approach, perhaps because it's easier to verify.
For example, there are some free parameters in the Lakatos calculation that might lead to slight variation, similar to, for example, the effect of using simulation versus an analytic formula.
And you can see here that I have also briefly shown in this table that there are alternative linear rank tests available in nQuery for sample size calculations, namely the ..., Breslow. And you can see that if you were to deal with non-compliance, the log-rank test becomes less efficient; the test that requires a smaller sample size is the more efficient one. As non-compliance goes up here, you can see that the log-rank test starts losing out to these other two, and that's mainly because these other two tend to weight earlier events more. What's happening here is that non-compliance means lots of people are leaving the treatment group and going into the control group, and therefore there are fewer people available on treatment as time goes on.
Unfortunately, that means the log-rank test, which weights all event times equally, relies more on those later time periods where the effect has become weaker, compared to these other tests, which, to use an analogy, have already done most of their work early on.
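To make the weighting idea concrete, here is a pure-Python sketch of a weighted log-rank statistic. The test names elided above are not recoverable from the transcript, so for illustration it assumes two standard early-weighting variants, Gehan-Breslow-type weights (the number at risk) and Tarone-Ware weights (its square root), alongside the unweighted log-rank test. `weighted_logrank_z` is a hypothetical helper, not a library function.

```python
from math import sqrt


def weighted_logrank_z(times, events, groups, weight="logrank"):
    """Standardized weighted log-rank statistic, group 1 versus group 0.

    times: follow-up times; events: 1 = event, 0 = censored; groups: 0 or 1.
    weight: "logrank" (w = 1), "gehan" (w = number at risk, Gehan-Breslow type)
    or "tarone-ware" (w = sqrt(number at risk)). Negative Z means group 1 had
    fewer events than expected, i.e. it did better.
    """
    data = sorted(zip(times, events, groups))
    at_risk = len(data)
    at_risk_1 = sum(g for _, _, g in data)
    numerator = variance = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = deaths_1 = leaving = leaving_1 = 0
        while i < len(data) and data[i][0] == t:  # everyone with this exact time
            _, e, g = data[i]
            deaths += e
            deaths_1 += e * g
            leaving += 1
            leaving_1 += g
            i += 1
        if deaths > 0 and at_risk > 1:
            # KeyError for an unknown weight name
            w = {"logrank": 1.0, "gehan": float(at_risk),
                 "tarone-ware": sqrt(at_risk)}[weight]
            frac_1 = at_risk_1 / at_risk
            numerator += w * (deaths_1 - deaths * frac_1)  # observed minus expected
            variance += (w * w * deaths * frac_1 * (1 - frac_1)
                         * (at_risk - deaths) / (at_risk - 1))
        at_risk -= leaving  # events and censorings at t leave the risk set after t
        at_risk_1 -= leaving_1
    if variance == 0:
        raise ValueError("no informative event times")
    return numerator / sqrt(variance)


times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
groups = [0, 0, 0, 0, 1, 1, 1, 1]
for w in ("logrank", "gehan", "tarone-ware"):
    print(w, round(weighted_logrank_z(times, events, groups, w), 3))
```

On this toy data all three statistics are negative because group 1's events all come later; the weighted versions give more influence to the first event times, where the risk sets are largest.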
So hopefully that gives you an idea of the basic calculations. But of course, there are a lot of assumptions in this table, namely that you have constant event rates and uniform accrual, and it is worth noting that nQuery provides a lot of options for dealing with situations that are not like that. For example, we have tables that allow you to do the sample size calculation without knowing the accrual period
and the follow-up period a priori, but rather calculating what the accrual and follow-up periods should be, based on a fixed rate of accrual.
So, for example, you can see in this table that the row here is the accrual rate, that is, how many people we're recruiting per unit of time, and then we can calculate the accrual period based on that information. There's also a piecewise version of this table, and I'll be using a very similar piecewise table shortly. So if you're not sure here, the important thing is that if you have an accrual rate instead of an accrual period,
so you know you're going to be recruiting 100 people per month, rather than knowing that you're going to be recruiting people for 12 months, then there are tables available for that, and I'm happy to go through this if you contact us via e-mail. The piecewise one here is quite similar. But I'll just say that there are simulation tools available in nQuery for survival, and if we wanted to verify our original example by simulation, we could do that as follows. We could create a simulation with two time periods: the first before the trial starts, and the second for the accrual,
with 100% of accrual happening in this 18.5-month period. Then we just need to get our hazard rates from the original table; we can just copy and paste them across.
You can fill those across, and we could deal with dropout here. Then you can set the number of simulations; let's just set a small number to be quick. With 176 per group, we should get power of around 92.6%, and if we increase the number of simulations, we will probably get something closer to the original. I'm just doing this briefly here because we will be returning to this table to show the effect of non-proportional hazards if we were to use the original log-rank test. But I just want to show it off here. This first column is just pre-specifying what's happening before the trial started. This column is for the first 18.5 months, basically the original accrual period we specified earlier, and we're saying that 100% of total accrual happens here. Again, we're assuming uniform accrual, but if we split this into, say, 3 or 4 parts, we could have non-uniform accrual by assigning a different percentage of the accrual to each of those sections. I've done that in previous survival webinars.
So, if you're interested in that, get in touch and I'll be happy to share an example. And then we have this constant hazard rate, and each of these gives the expected survival percentages in each group.
And so, that, hopefully, gives you a kind of whirlwind tour of introduction to, kind of, basic, sample size calculations. For the Log Rank test.
Now, there are plenty of tables also available for the Cox regression model, ranging from ones that deal with the classic two-sample case to tables that treat it like a regression problem, where you have your log hazard ratio, your beta term, so you can deal with that case as well. But given the time constraints, I'm not going to go over that today; once again, if you're interested, get in touch and we'll be happy to share examples.
So hopefully you now have a good grasp of sample size calculations for survival, and we've used the example of a treatment for pancreatic cancer, so we're dealing with the oncology area. These two areas, survival analysis and oncology, are very tightly intertwined, particularly, as I said, in phase III clinical trials. But both the Cox regression model and the log-rank test have traditionally relied on the hazard ratio, and in particular on the assumption that the hazard ratio is constant, that is, that we have proportional hazards.
As we're about to see in a moment, there has been a lot of work and interest in dealing with studies where the proportional hazards assumption breaks down.
So why is it a problem? Well, the problem here is that if you ignore the fact that you have non-proportional hazards, you can get misleading results. There's a lot of debate about this, which I won't go into today,
because I'm not necessarily falling on either side at the moment. But effectively, if the hazard ratio is not constant, then the hazard ratio could give a misleading picture of how effective the treatment is, and you may end up saying it's worse or better than it is. That's not to say that the hazard ratio from a standard log-rank test or Cox regression model in the presence of non-proportional hazards is illegitimate: many would argue that it represents what's called the average hazard ratio, and that this does have some clinical meaningfulness. But some have argued that it would be better to model what you believe the non-proportional hazards structure will be.
There are many different structures. On the right-hand side here, we have delayed treatment effects, where the effect of our treatment doesn't kick in until a certain amount of time has passed. We can see a crossing treatment effect, where something that starts off better gets worse by the end of our trial. And then we can have a diminishing treatment effect, where initially we're doing quite well, but by the end the curves converge to be more or less equivalent.
So this is a situation where, ideally, if we knew this was going to happen beforehand, or that it was a possibility, we would have an analysis strategy that was robust to it, or at least more robust than the average hazard ratio from the standard analysis, and that better reflects our scientific understanding of why these effects exist.
Because there are a lot of different proposals for why this happens. For example, for the delayed treatment effect, the most obvious idea is just that our treatment takes a little longer to get going than the standard treatment. This is the big thing in immunotherapies, where what they've seen is a lot of these large delayed effects, where it can take a few months for the effect to emerge at all. And of course, that in some sense makes sense.
Immunotherapies are more targeted, and I suppose less aggressive, interventions than traditional oncology treatments such as chemotherapy and radiotherapy. Your body knows very quickly when chemotherapy is happening, for very obvious reasons, whereas immunotherapies, much like perhaps more traditional drugs in other areas, may not be expected to immediately take over and have a huge effect. So that is one hypothesis: that it just takes longer. But there are also people who have pointed out that you could get the exact same or a very similar trajectory, with this initial delayed effect, if the immunotherapy is targeted and is effective only for certain types of people, certain responders, due to, for example, their genetic profile, while others are non-responders. That effectively creates something that looks like a delayed effect. The problem is that whether the mechanism is a genuine delayed effect or a responder versus non-responder split would actually affect which analysis models make sense. For example, in the responder case, where you have this unknown covariate, like genetic profile, a frailty type of approach might make more sense.
Or, if you have some genetic markers that you believe are good candidates for explaining responders versus non-responders, you may want to model those covariates directly in your model.
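To see how a responder/non-responder mix can resemble a delayed effect, here is a small sketch with hypothetical numbers: 40% of treated patients respond with a hazard ratio of 0.3, non-responders behave like control (median 6 months), and survival within each subgroup is exponential. The population hazard ratio versus control starts at 0.72 and then falls steadily as non-responders have their events and the surviving treated patients are increasingly responders, so the treatment effect appears to strengthen over time even though no individual's effect is delayed. `mixture_hazard` is an illustrative helper.

```python
from math import exp, log


def mixture_hazard(t, p_responder, hazard_responder, hazard_non_responder):
    """Population hazard at time t for a mixture of two exponential subgroups."""
    surv_r = exp(-hazard_responder * t)
    surv_n = exp(-hazard_non_responder * t)
    survival = p_responder * surv_r + (1 - p_responder) * surv_n
    density = (p_responder * hazard_responder * surv_r
               + (1 - p_responder) * hazard_non_responder * surv_n)
    return density / survival


# Hypothetical numbers: control median 6 months; 40% of treated patients respond
# with hazard ratio 0.3, while non-responders do no better than control.
rate_control = log(2) / 6
for t in (0, 3, 6, 12, 24):
    hr = mixture_hazard(t, 0.4, 0.3 * rate_control, rate_control) / rate_control
    print(f"month {t:>2}: hazard ratio vs control = {hr:.2f}")
```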
The important thing to take from this is that this is a very active area of research, and there have been a huge number of proposals for models that can deal with one, several, or all of these different types of NPH situation. But there's no consensus on the best approach. As I said, some people think we should just use the log-rank test, use the average hazard ratio, and be happy enough with that.
Whereas others propose very aggressive modeling strategies, which are perhaps more fragile in the more standard proportional hazards case but which try to explicitly target and model the various reasons why the NPH has emerged. So this slide provides a whirlwind tour of the different proposals for dealing with NPH: from, as I say, using the average hazard ratio, to using weighted tests like the ones we talked about earlier. Tests like the ... and Tarone-Ware put different emphasis on different types of events. So, for example, if you're more interested in differences early on, then you could use the ... or Tarone-Ware tests, which basically weight events, or the difference in events, earlier in the study more heavily than later on. And in terms of distributions, if you're talking about a log-normal type survival distribution, these are often considered to be more efficient.
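As a rough illustration of what "weighting" means here, the sketch below computes a weighted log-rank Z statistic with NumPy. The `weight_fn` hook and the two example weight functions are hypothetical names for this sketch: a constant weight of 1 gives the ordinary log-rank test, and other choices shift emphasis toward earlier or later events. It is an illustrative implementation under those assumptions, not the code used by nQuery or any particular package.

```python
import numpy as np


def weighted_logrank_z(time, event, group, weight_fn):
    """Weighted log-rank Z statistic for two groups (group coded 0/1).

    weight_fn(t, s_minus) receives the distinct event times and the pooled
    Kaplan-Meier survival just before each one, and returns one weight per time.
    A positive Z means group 1 had fewer events than expected.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=int)

    # Distinct event times and the total number of events at each
    t_ev, d = np.unique(time[event], return_counts=True)
    # Events in group 1 at each distinct event time
    ev1 = np.sort(time[event & (group == 1)])
    d1 = np.searchsorted(ev1, t_ev, side="right") - np.searchsorted(ev1, t_ev, side="left")
    # Numbers still at risk (time >= t), overall and in group 1
    all_sorted = np.sort(time)
    g1_sorted = np.sort(time[group == 1])
    n = len(all_sorted) - np.searchsorted(all_sorted, t_ev, side="left")
    n1 = len(g1_sorted) - np.searchsorted(g1_sorted, t_ev, side="left")

    # Pooled Kaplan-Meier survival just before each event time, S(t-)
    s_after = np.cumprod(1.0 - d / n)
    s_minus = np.concatenate(([1.0], s_after[:-1]))

    w = np.asarray(weight_fn(t_ev, s_minus), dtype=float)
    p1 = n1 / n
    numerator = np.sum(w * (d * p1 - d1))  # weighted (expected - observed) in group 1
    variance = np.sum(w**2 * d * p1 * (1.0 - p1) * (n - d) / np.maximum(n - 1, 1))
    return numerator / np.sqrt(variance)


def standard_weights(t, s_minus):
    """Ordinary log-rank test: every event gets the same weight."""
    return np.ones_like(t)


def delayed_weights(t, s_minus):
    """Hypothetical delayed-effect weighting: ignore events before month 6."""
    return (t >= 6).astype(float)
```

Because the weights depend only on pooled quantities, relabelling the groups simply flips the sign of Z.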
But then there are also the piecewise models, or change-point models, where you explicitly say: we think there's no effect, or a minimal effect, for the first X months, and after that the effect emerges. So we have an explicit model that says exactly that. It doesn't require a distributional assumption in a parametric sense: you can do it non-parametrically, using a Kaplan-Meier piecewise model as well. But the piecewise exponential would be a parametric example of a change-point model.
And then you have models which combine those two concepts, with different weightings for different periods. So you might use the ..., sorry, the ... earlier on and an exponential later on.
Or perhaps you would have different effect sizes for different times. So for the first six months, assume a hazard ratio of one, but after that point assume a hazard ratio of 0.5, and then piece those together and use the appropriate weighting on each of those parts.
One of the most commonly suggested approaches, and, as inevitably happens when something gets ..., one that has created a lot of feedback on its weaknesses and strengths, is the combination test, which is effectively a combination of weighted tests. So these three are linked together, and item 2 is linked to item 5, because item 5 is really a mechanism to take multiple tests from the second category and combine them to make an overall inference which is robust. The idea is that if you take enough of these tests, they should cover more or less all the different types of survival curve you can have. And basically you have these Fleming-Harrington curves.
The Fleming-Harrington family is the most common one used for the maximum combination tests: the current proposal usually has four different versions of the Fleming-Harrington test to deal with all the different types of NPH that could occur. That's something we're actively looking at at the moment in terms of sample size determination, with the hope of having something for it in the next release. So if you're interested in that area, we are actively working on it. Then there's another approach, the restricted mean survival time, which is basically the difference between the curves up to some pre-specified time T*. And there are other models out there: responder-based models, frailty models, Rényi-type tests. There's a wide variety of different options out there, but hopefully this gives you a flavor of some of the bigger ones. The two we'll be talking about today are the weighted piecewise models and the restricted mean survival time.
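For reference, the Fleming-Harrington FH(ρ, γ) weight at time t is S(t−)^ρ (1 − S(t−))^γ, where S is the pooled Kaplan-Meier estimate: ρ > 0 up-weights early events, γ > 0 up-weights late events, and FH(0,0) is the ordinary log-rank test. A MaxCombo-style test computes several of these weighted statistics and takes the largest, handling the multiplicity through their correlated joint normal distribution, which this small sketch does not attempt. The four (ρ, γ) pairs below are a commonly cited choice, stated here as an assumption rather than as nQuery's definition.

```python
def fleming_harrington_weight(s_minus, rho, gamma):
    """Fleming-Harrington FH(rho, gamma) weight: S(t-)**rho * (1 - S(t-))**gamma."""
    return s_minus**rho * (1.0 - s_minus) ** gamma


# Four (rho, gamma) pairs often cited for MaxCombo-style tests (an assumption here)
MAXCOMBO_PAIRS = [(0, 0), (0, 1), (1, 1), (1, 0)]

# Show how each weight changes as pooled survival falls from 100% to 25%
for rho, gamma in MAXCOMBO_PAIRS:
    weights = [fleming_harrington_weight(s, rho, gamma) for s in (1.0, 0.75, 0.5, 0.25)]
    print(f"FH({rho},{gamma}):", [round(w, 3) for w in weights])
```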
So, let's look at those briefly in this demonstration, starting with the weighted piecewise log-rank test. As I said, it's a test which combines two aspects of the other models we talked about above, namely the piecewise part and the weighted part. The piecewise part is basically saying we expect a different hazard ratio per time period; technically, you could take that even further and have completely different models over time.
Obviously, talking about a hazard ratio implicitly means we're really talking about piecewise exponential models here. And "weighted" means that the weight each event gets in the overall test statistic will change over time.
Namely, for the model we'll talk about today, which is called the APPLE model from ..., we're going to assume that up to some time T the hazard ratio is one. In effect, this is a delayed effect model, where we don't think there's going to be any difference between our treatments up to time T, and then we assume a constant hazard ratio from that point on. So that's basically a piecewise exponential model: the two exponential curves have the same rate up to time T, and then there's a constant hazard ratio after that point.
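A minimal sketch of the survival curves this kind of piecewise exponential delayed-effect model implies is shown below. The function names are illustrative, and the parameter values (control median of 6 months, hazard ratio 0.66, six-month delay) are borrowed from the example later in the talk purely for concreteness.

```python
import math


def control_survival(t, control_rate):
    """Exponential survival in the control arm: S(t) = exp(-rate * t)."""
    return math.exp(-control_rate * t)


def delayed_effect_survival(t, control_rate, hazard_ratio, delay):
    """Treatment-arm survival: control hazard up to `delay`, then hazard_ratio * control hazard."""
    if t <= delay:
        return math.exp(-control_rate * t)
    cumulative_hazard = control_rate * delay + hazard_ratio * control_rate * (t - delay)
    return math.exp(-cumulative_hazard)


rate = math.log(2) / 6  # control median survival of 6 months
for month in (3, 6, 12, 24):
    s_c = control_survival(month, rate)
    s_t = delayed_effect_survival(month, rate, 0.66, 6)
    print(f"month {month:>2}: control {s_c:.3f}, treatment {s_t:.3f}")
```

The two curves coincide until month 6 and only separate afterwards, which is exactly the information a standard log-rank test dilutes by weighting all events equally.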
I suppose this is probably the model that most explicitly tries to model a delayed effect, particularly for sample size calculations. That's good in the sense that we're actively trying to deal with a problem that we believe will exist in our study for an immunotherapy, and then see what improvement we get by using this weighted piecewise test statistic compared to the average hazard ratio from the standard test. But of course, the big problem is that we have to make a strong assumption, before we do the study, about how long the delay duration T is going to be. That's quite a strong assumption to make, and there are also some issues with baseline hazards. There have been some extensions to deal with the former problem: namely, Zhu et al. extended this method to have a random time lag, so you could say we think the effect could kick in anywhere between, say, 3 and 9 months, instead of specifying, as in the APPLE model, that it's going to happen exactly at six months.
So you can deal with a random time lag there. We're not going to cover that today, but the ... APPLE+ method is available in nQuery if you're interested.
Then the other one I mentioned is the restricted mean survival time. The ..., or, excuse me, in some sense it is the mean survival time under some restricted time horizon, which we call T*. If we're talking about the RMST for a single survival curve, it's basically just the area under the Kaplan-Meier curve up to time T*.
Obviously, you could have an unrestricted mean survival time, which is the area under the whole survival curve. But for any practical survival analysis you're going to have some right censoring at some point anyway. So what we're really talking about here is our choice of T*, typically at or just before the point where the right censoring occurs.
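To make "area under the Kaplan-Meier curve up to T*" concrete, here is a small sketch that integrates the Kaplan-Meier step function from 0 to `tau` (standing in for T*). It is a plain illustration of the definition, not a production estimator, and it omits the variance calculation needed for testing.

```python
import numpy as np


def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier curve from 0 to tau (the RMST estimate)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    area, surv, last_t = 0.0, 1.0, 0.0
    for t in np.unique(time[event]):
        if t > tau:
            break
        area += surv * (t - last_t)      # rectangle under the current step
        n_at_risk = np.sum(time >= t)
        d = np.sum(event & (time == t))
        surv *= 1.0 - d / n_at_risk      # Kaplan-Meier update at this event time
        last_t = t
    area += surv * (tau - last_t)        # final step up to tau
    return area
```

With no censoring this reduces to the average of min(T, tau) over subjects, which is the "mean survival time restricted to T*".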
And the RMST difference is basically just the difference between these two parameters, which is equivalent to the area between the two survival curves: if I take the area under one curve and subtract the area under the other curve, I'm getting the area between the two curves. The reason some have said the RMST is a superior solution is that, one, it is robust to non-proportional hazards, and two, it has an intuitive meaning: the RMST difference is the mean additional survival time you would expect per subject, on average, up to time T*. So if I'm saying that, for a restricted mean survival time at 12 months, the difference is two months, I'm saying that on average someone would have two months of extra survival up to 12 months. Obviously, the big issue, a bit like the previous example of the weighted piecewise log-rank test, is that we have to make a choice about T*. We have to ask: is there some clinically meaningful time where we go, we really only care about survival up to that point?
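When both arms are exponential, the RMST has a closed form, RMST(τ) = (1 − e^(−λτ)) / λ, so the difference can be computed directly. The sketch below assumes proportional hazards, a control median of 6 months, and a hazard ratio of 0.66 (values taken from the example used in this webinar) and shows how the RMST difference depends on the choice of T*.

```python
import math


def exponential_rmst(rate, tau):
    """RMST up to tau for S(t) = exp(-rate * t): the integral of S from 0 to tau."""
    return (1.0 - math.exp(-rate * tau)) / rate


control_rate = math.log(2) / 6          # control median survival of 6 months
treatment_rate = 0.66 * control_rate    # hazard ratio of 0.66, proportional hazards assumed
diffs = []
for tau in (6, 12, 24):
    diff = exponential_rmst(treatment_rate, tau) - exponential_rmst(control_rate, tau)
    diffs.append(diff)
    print(f"T* = {tau:>2} months: RMST difference of about {diff:.2f} months")
```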
Well, we care mostly about survival up to, say, one, two, or five years, and it's only really if we can make that choice that we can make the RMST clinically relevant. So if we're talking about an aggressive cancer, like the example with advanced pancreatic cancer, I suppose an RMST at 12 months would make sense, because that's a very aggressive disease where we have a median survival without any intervention of six months. So it might make sense to pick 12 months as the cutoff at which we're hoping to see additional survival.
It's worth noting that the sample size methods are developing very fast here. We have one method in nQuery at the moment, and we are looking at a lot of other methods as well. But that question of T* is probably why some people are reticent to use the RMST difference, and the fact that the inference is very sensitive to the choice of T* does cause some concern. This is particularly true for values of T* late in the trial, at a point where very few subjects are left to have the event: in those cases, it is known that the RMST difference estimate itself is very volatile, and obviously that's not a characteristic you would like.
But if we're talking about clinically relevant periods where there's still a fair amount of survival, then I don't see why the RMST wouldn't be a very interesting candidate to look at. So, to speed things up, what I'm actually going to do is just take the previous example.
Basically, for the weighted piecewise linear rank test, I'm going to assume that for the first six months there's no effect, and then see what the effect on power would have been of using a standard log-rank test instead. And then for the RMST, I'm going to assume that there is no delayed effect, and show what the effect of using ... would be compared to the original example. So hopefully you get some decent insights from that. Let's start with the weighted piecewise linear rank test. As I said, what we want to do is really just replicate the original example that we had in the very first table.
In this case most of the parameters are quite similar: let's assume the accrual period and the study length are the same as before, 18.5 and 28.25 months respectively.
We have a hazard ratio of 0.66 after our time to treatment effect, the T we talked about earlier on. Let's assume 90% power to keep it round, rather than the 93.6% in the original example.
Then 50% of people are in the treatment group, a proportion of 0.5. And then we need our time to treatment effect. We're saying in this example that for the first six months the two treatment groups are the same. Obviously, for a really aggressive drug like chemotherapy that wouldn't really be relevant, but let's assume there's an immunotherapy that we're proposing which would replicate more or less the original effect, and assume that for the first six months it doesn't have an effect.
I've chosen this, perhaps conveniently, so that if we're assuming a median survival in the control group of six months, then of course the proportion surviving up to six months would be around 50%.
Like, that kind of makes sense.
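The link between a six-month median and 50% survival at month 6 comes from the exponential model: the rate is ln(2) divided by the median, which is also where the control event rate of about 0.1155 used below comes from. A quick check:

```python
import math

median_months = 6.0
control_rate = math.log(2) / median_months   # exponential rate implied by the median
surv_at_6 = math.exp(-control_rate * 6.0)    # proportion still alive at month 6
print(f"rate = {control_rate:.4f} per month, S(6) = {surv_at_6:.2f}")
# prints: rate = 0.1155 per month, S(6) = 0.50
```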
So we can see here that we end up with a sample size of around 768, or 384 per group.
But if we go back to our simulation table, we can see what would happen if we had used the original log-rank test. Remember, it's STD3 here, the simulation table for the log-rank test.
We can model what the power would be using a standard log-rank test, with just the average hazard ratio and the P value associated with that, compared to the weighted piecewise log-rank test in this case. To do that, we're going to add an additional time period compared to our previous example.
We're going to assume the accrual is still constant, but to do that we just need to divide 6 by 18.5 to get the 32.4% of accrual that happens in the first six months, up to our turning point. Then, obviously, the remaining 67.6% happens between then and 18.5 months. So nothing special is really happening here.
This is really just to make the calculation work. It's the same assumption as having uniform accrual: we divide 6 by 18.5 to get this, then 100 minus this to get the remainder, and then, obviously, there's no recruitment in the last 10 months or so.
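The accrual split is just uniform accrual cut at the six-month change point; a one-line check of the 32.4% / 67.6% figures:

```python
accrual_months = 18.5
change_point = 6.0  # end of the delayed-effect period
first = change_point / accrual_months
second = 1.0 - first
print(f"{100 * first:.1f}% of accrual in months 0-6, {100 * second:.1f}% in months 6-18.5")
# prints: 32.4% of accrual in months 0-6, 67.6% in months 6-18.5
```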
Then, for the exponential hazard rate in the control group, which is 0.1155,
we're going to set that constant for all of these periods. So group one here is the control group.
We're saying that in the control group the event rate is going to be the same for the entire period. That makes sense, right? For the placebo group it doesn't really matter whether a delayed effect exists or not: it's going to have the same rate no matter what.
But for the treatment group, we're saying that for the first six months the event rate is the same as in the control group. So for this first six-month period we're setting these two rates to be identical to each other.
But in the second and third periods, we're going to assume that the event rate from the original example, based on the 0.66 hazard ratio, is the new event rate from that point on.
So if we go back to the first column of our original example here, in this case we had switched the groups, sorry about that.
But basically, you can see that the event rate was constant at 0.077 for the treatment group over the whole study,
and equal to about 0.1155 across the entire time for the control, or placebo, group. Whereas in this new example with a delayed effect, we're assuming that for the first six months both groups have the same event rate,
and only after those six months does the event rate from the original example emerge.
Then we just set the dropout to zero and run the simulation as before.
And, of course, there are two ways to do this. First, let's just check what would have happened if we had used our original sample size. You can see the power has gone way down, if this were what was really happening, compared to the original example. That's unsurprising, because the first six months are quite important for a study where median survival is six months.
But even if we were to increase the sample size all the way up to 768, or 384 per group, which is what is required according to the weighted piecewise log-rank test, we see that with 384 per group, which, let's be honest, is already nearly double the original sample size, we only get around 55% power. That's a reduction of around 35 percentage points. So if we use a standard log-rank test for this analysis, we only have around 55% power, compared to 90% power for the weighted log-rank test.
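The same comparison can be sketched by simulation outside nQuery. The code below is a rough illustration under stated assumptions: uniform accrual over 18.5 months, a study end at 28.25 months (my reading of the example), no dropout, a control median of 6 months, a six-month delay, and a hazard ratio of 0.66 afterwards. As a simplified stand-in for the piecewise weighted test, the "weighted" version just gives weight 0 to events before month 6 and 1 afterwards; this is not the exact APPLE weighting, so the power estimates should not be expected to match nQuery's figures.

```python
import numpy as np
from statistics import NormalDist


def step_logrank_z(time, event, group, t_min=0.0):
    """Log-rank Z using only events at times >= t_min (t_min=0 gives the standard test)."""
    t_ev, d = np.unique(time[event], return_counts=True)
    ev1 = np.sort(time[event & (group == 1)])
    d1 = np.searchsorted(ev1, t_ev, side="right") - np.searchsorted(ev1, t_ev, side="left")
    all_sorted, g1_sorted = np.sort(time), np.sort(time[group == 1])
    n = len(all_sorted) - np.searchsorted(all_sorted, t_ev, side="left")
    n1 = len(g1_sorted) - np.searchsorted(g1_sorted, t_ev, side="left")
    w = (t_ev >= t_min).astype(float)  # 0/1 weights, so w**2 == w
    p1 = n1 / n
    num = np.sum(w * (d * p1 - d1))    # positive when group 1 has fewer events than expected
    var = np.sum(w * d * p1 * (1 - p1) * (n - d) / np.maximum(n - 1, 1))
    return num / np.sqrt(var)


def simulate_trial(rng, n_per_group, control_rate, hr, delay, accrual, study_end):
    """One trial: uniform accrual, piecewise exponential delayed effect, no dropout."""
    n = 2 * n_per_group
    group = np.repeat([0, 1], n_per_group)
    e = rng.exponential(size=n)         # unit-exponential draws, inverted through the cumulative hazard
    cut = control_rate * delay          # cumulative hazard reached at the end of the delay
    hr_after = np.where(group == 1, hr, 1.0)
    t_event = np.where(e <= cut, e / control_rate, delay + (e - cut) / (control_rate * hr_after))
    entry = rng.uniform(0.0, accrual, size=n)
    follow_up = study_end - entry       # administrative censoring at the end of the study
    return np.minimum(t_event, follow_up), t_event <= follow_up, group


def estimate_power(n_per_group=384, n_sims=500, seed=1234):
    rng = np.random.default_rng(seed)
    crit = NormalDist().inv_cdf(0.975)  # one-sided alpha = 0.025
    rate = np.log(2) / 6                # control median of 6 months
    hits_std = hits_wtd = 0
    for _ in range(n_sims):
        time, event, group = simulate_trial(rng, n_per_group, rate, 0.66, 6.0, 18.5, 28.25)
        hits_std += step_logrank_z(time, event, group) > crit
        hits_wtd += step_logrank_z(time, event, group, t_min=6.0) > crit
    return hits_std / n_sims, hits_wtd / n_sims


power_std, power_wtd = estimate_power()
print(f"standard log-rank power: {power_std:.2f}, step-weighted power: {power_wtd:.2f}")
```

The ordering of the two estimates, rather than their exact values, is the point: ignoring the uninformative early events recovers much of the power the standard test loses.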
Of course, if this model were misspecified, if the time to treatment effect were not equal to six but three or nine, then this model would have issues, or could break down. But assuming we're correct about when the time to treatment effect is and how many people will be alive at that point, this model is obviously more powerful and requires a lower sample size than if we were to do the same analysis, under the same delayed effect, using a standard log-rank test.
And, as I said, for the ... example, what we're briefly going to do is replicate the very first original example, and then see what the effect on power would be of choosing different RMST T* values, that is, the time at which we're going to evaluate the restricted mean survival time.
Now, this table actually uses a Weibull distribution, but it's very easy to convert between a Weibull distribution and an exponential distribution: we simply set the shape parameter equal to one, and then set the scale parameter to one divided by the exponential rate. So let's go back to our calculator.
If we take one and divide it by the rate in the treatment group, we get approximately 12.98.
So we'll put that here.
Then we take the control rate of 0.1155, go to our calculator again, take one, and divide it by that value, and we get approximately 8.656. We'll just copy and paste that in. So, because we're dealing with a Weibull distribution here, we've converted the exponential rates from the original example into the relevant Weibull parameters. That's relatively simple, because there's a very well-known relationship when the shape parameter equals one. Then we have our truncation time here, which we'll ignore for now. This is a non-inferiority table, but if we set the non-inferiority margin to zero, that's equivalent to a superiority hypothesis. We need a seed, so we'll use 1234 again, then one thousand iterations, a sample size ratio of one, and our total sample size of 352.
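The exponential-to-Weibull conversion can be checked numerically. This sketch assumes the shape/scale parameterization S(t) = exp(−(t/scale)^shape); some software parameterizes the Weibull by a rate instead, so it is worth confirming which form a given table uses before entering values.

```python
import math


def exponential_survival(t, rate):
    """Exponential survival: S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def weibull_survival(t, shape, scale):
    """Weibull survival in the shape/scale form S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


rates = {"treatment": 0.077, "control": math.log(2) / 6}
scales = {arm: 1.0 / rate for arm, rate in rates.items()}  # shape = 1 reduces Weibull to exponential
for arm, scale in scales.items():
    print(f"{arm}: Weibull shape 1, scale {scale:.3f}")
```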
Remember, we're looking at the original example here, so that's 176 per group. And we're going to see, even in the simple case where there is no crossing of the curves and so on, how the ... compares. We're going to look at three different versions of this, assuming different truncation times, that is, evaluating ... at six months, 12 months, and 24 months.
Now, these simulation tables take a little bit of loading time.
But from my perspective, the main takeaway here is that there's a fair amount of sensitivity depending on what you select as your truncation time. You can see here that at six months we'd have a power of 62.8% for the ... difference; it's about 85.9% at 12 months, and 94.2% at 24 months.
So you can see that if we wanted the power for the log-rank test and the RMST to be around the same, for the exact same example, T* would need to be somewhere between one and two years. But if we were to look at a much earlier truncation time, we would have a much lower power.
So there's a big question here: what would be a clinically meaningful choice for the truncation time?
What is the time at which we want to make our inference when deciding how useful this treatment is? Here, we're saying that after 24 months we expect X additional months of life for someone, compared to some other number of additional months of life at 12 months. Where do you draw that line? I think that's a big question.
But there's a lot of interesting analysis and discovery going on in terms of what makes sense. And of course, the big thing about this is that it is robust: it doesn't rely on distributional assumptions. We're kind of looking at two ends of the same scale. The piecewise weighted log ... test, the APPLE model, is a very specified model, with a very specific idea of what's going to happen, and builds the inference around that. The ..., by contrast, basically says: take a cutoff, and then use the non-parametric Kaplan-Meier curves and the difference in the area between those two curves. So there are a lot fewer assumptions going on in this case.
OK, and then finally, since I think we only have 10 minutes or so, we'll do a very brief introduction to adaptive design. I've covered adaptive design for survival analysis in much more detail previously, so if there's anything you want to know more about, do get in touch: I'll be happy to send on our material, or to cover it in future webinars. But in brief, an adaptive design is basically any trial where a change or decision is made to the trial while it's still ongoing, done in a formal, pre-specified way, typically if we're talking about phase three clinical trials. This could be things like stopping the trial early, increasing the sample size, enrichment, or adaptively finding doses. The idea is that we want to give the trialist more control to improve the trial based on information as it becomes available.
The analogy I would use is: if you knew exactly what was going to happen a priori, before the study occurred, then there's a certain way you would have designed your trial, a platonic ideal.
The adaptive design perspective is that if you can make changes to a trial while it's ongoing, you will end up with a trial that's much closer to that platonic ideal than if you had to make lots of guesses about what was going to happen without any information from the trial itself. The hope is that by using adaptive designs you're making earlier decisions, reducing costs, and increasing your chance of success. But obviously, that comes at the cost of some additional complexity and additional logistical issues, in terms of dealing more with independent data monitoring committees, and maybe test statistics that differ from the ones you would use for fixed-term trials. And of course, from a regulatory point of view, the more innovation you're doing, the more back and forth you will have with the regulator. But it is important to note that regulators, the FDA in particular, are very open to adaptive design if you collaborate with them early. That's the big thing they emphasize in their adaptive design guidance: come to us early, talk to us, and if you're able to show via simulation that the characteristics of this trial make sense, we'll be happy to help you along with that. So things have definitely changed, and you should not feel that this is a huge barrier.
We've even seen the power of early decisions in the context of the COVID-19 trials, which happened recently; obviously that's a bit off topic for an oncology trial webinar. But of the COVID-19 vaccine trials, the ones which succeeded nearly all stopped early due to the high efficacy seen in those trials, over 90% for Pfizer. That meant those vaccines were out at a much earlier stage than if they had to guess everything beforehand, or if they'd had to go the whole 150 events they originally needed for power purposes. They were able to stop closer to 50 events. So, for the purposes of this section, our example is basically just the previous example, extended to a group sequential design with an O'Brien-Fleming spending bound for efficacy and a Hwang-Shih-DeCani futility bound, over three looks. I'm going to briefly go through that example here in nQuery.
This is basically exactly the same in the top left-hand panel as previously, because we have the same accrual period, and it's nearly exactly the same as the table from the previous example.
So we have a hazard ratio and a sample size ratio of one. Before we fill in the power, let's set our group sequential design parameters. If you're not familiar with group sequential design parameters in nQuery, I'll be happy to send on examples of how that works. Basically, on the left-hand side here, we're setting our group sequential design input parameters: namely, how many looks we have, in this case three in total, including the final analysis, and our spending functions. The Hwang-Shih-DeCani spending function requires an additional gamma parameter, which we're setting in here. And then we set our power equal to 92.6%, as per the original example. You can see that we now need 187 per group, compared to 176, so an increase of about 10 per group. But now we have the additional flexibility that if the upper efficacy bound of 3.71 is crossed after around 100 events, we can stop the trial early and declare it successful. And if the statistic is below -0.26, we would stop the trial early for futility, because there's such a low chance of this trial succeeding. In terms of P values, if the P value at look one were below about 0.0001, we would stop early for efficacy, but if it were above about 0.6, we would stop early for futility. The P value thresholds are slightly less aggressive at the later looks.
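For intuition about where numbers like these come from, the sketch below evaluates the Lan-DeMets O'Brien-Fleming-type alpha spending function and the Hwang-Shih-DeCani spending function at three equally spaced looks, and converts the first-look bound and the -0.26 futility bound into one-sided P values. Equal spacing, a total beta of 0.074 (one minus 92.6% power) and γ = −2 are assumptions for illustration; the bounds at later looks require recursive numerical integration, which this sketch does not do.

```python
import math
from statistics import NormalDist

N = NormalDist()


def obf_alpha_spent(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type spending: cumulative one-sided alpha at information fraction t."""
    return 2.0 * (1.0 - N.cdf(N.inv_cdf(1.0 - alpha / 2.0) / math.sqrt(t)))


def hsd_spent(t, total, gamma):
    """Hwang-Shih-DeCani spending: total * (1 - exp(-gamma*t)) / (1 - exp(-gamma)), gamma != 0."""
    return total * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))


looks = [1 / 3, 2 / 3, 1.0]  # three equally spaced looks (an assumption)
for t in looks:
    print(f"t={t:.2f}: alpha spent={obf_alpha_spent(t):.5f}, beta spent={hsd_spent(t, 0.074, -2.0):.4f}")

# At the first look nothing has been spent before, so the bound follows directly
z1 = N.inv_cdf(1.0 - obf_alpha_spent(looks[0]))
print(f"first-look efficacy bound approx {z1:.2f}, one-sided P approx {1 - N.cdf(z1):.4f}")
print(f"futility Z of -0.26 corresponds to one-sided P approx {1 - N.cdf(-0.26):.2f}")
```

The tiny amount of alpha spent at the first look is what makes the O'Brien-Fleming design conservative early on.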
But you can see here that there's a relatively low chance of stopping early for efficacy in this trial, which means that the P value required at the final analysis, 0.023, is only a little bit below the standard fixed-term trial P value of 0.025. That, I'd argue, is the reason people use the O'Brien-Fleming bound, which is a conservative bound that doesn't spend much alpha early on.
Because there's comparatively very little chance of stopping early for efficacy, that reduces the area of controversy. For example, in this trial, if you got a P value of 0.024, you can imagine that people might be annoyed, because they'd be expecting that to be successful.
But because you've spent some of the alpha, you need to adjust for that; otherwise it would be cheating. You're getting multiple chances, multiple bites of the cherry, to finish the trial early, and you need to adjust for that fact in a principled fashion that doesn't lead to type one error inflation.
And so that's what a group sequential design looks like.
I suppose it's worth going back to the very first point I made: it's the number of events that drives the interim analysis, not the sample size. It's only after 101 events occur that we really care what's going on; before that, it's not really that interesting.
So there could be 200 people in ..., or 300; it's only when 100 events have happened that we do an interim analysis.
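The reason events, not patients, drive the analysis shows up in the standard Schoenfeld approximation for a fixed-design log-rank test: the required number of events depends only on the hazard ratio, the error rates, and the allocation ratio. This is a textbook approximation shown for illustration under one-sided α = 0.025 and 90% power; nQuery's group sequential calculations add inflation for the interim looks, so the numbers will not match the tables above.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.025, power=0.90, alloc=0.5):
    """Approximate events needed for a log-rank test (one-sided alpha), Schoenfeld's formula."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(power)) ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)


d = schoenfeld_events(0.66)
print(f"about {math.ceil(d)} events for a hazard ratio of 0.66")
```

How many patients you need then depends on how quickly those events accrue, which is why accrual and follow-up assumptions matter so much in survival sample size.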
Which means that, to a certain extent, you don't really know in calendar time when those analyses are likely to happen, even just based on the accrual. We could extend that previous example by using sample size re-estimation, which is an adaptive design where you have the option to increase the sample size. We're going to focus here on unblinded sample size re-estimation, where we're looking at the effect size, the hazard ratio in this case, rather than the blinded case, where we're looking at nuisance parameters, like, let's say, the control rate. We don't really have time today to go through the details, and I've gone through this a lot in previous webinars. But basically, the most commonly cited unblinded sample size re-estimation framework is the idea of increasing the sample size when your interim effect size is promising, hence the commonly used name, the promising zone design, where "promising" is kind of user-defined: in your own opinion, what effect sizes are promising. But depending on which methodology you use, you may be curtailed in how much flexibility you have.
In my opinion, you should think of this as basically an extension of group sequential design: instead of having two options, either stopping early or continuing to the next look, you now have an additional option, which is to increase your sample size. The FDA describes this as one that could provide efficiency, because you can power for a more optimistic effect than what you might expect a priori.
But you still have the option to increase the sample size if the effect size happens to be smaller than what you hoped for but is still clinically relevant, still worth rescuing and getting a significant P value.
Usually, we base that increase, or when we can do an increase, on conditional power, which is just the power given what has happened in the trial thus far. There are two main methods we talk about here, CDL and CHW. With CDL, you can use the same approach as a standard group sequential design, but it's much more restricted in when you can do a sample size increase: namely, it has to be at the penultimate look, and the conditional power has to be greater than 50%, or, for a given design, there's some range below 50% that you can use where you're guaranteed not to inflate the type one error. And then there's CHW, which is a weighted test statistic approach that is way more flexible. It's really a special case of the adaptive group sequential design framework as a whole.
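Both ideas can be sketched in a few lines. Conditional power below uses the standard Brownian-motion approximation with a current-trend effect estimate (for a 1:1 log-rank test, statistical information is roughly events / 4), and the CHW-style statistic combines stage-wise Z values with weights fixed from the original design, which is why the type one error is preserved even if the second stage is enlarged. The function names and the example inputs are hypothetical, and this ignores the interim efficacy and futility bounds that a full group sequential calculation would include.

```python
import math
from statistics import NormalDist

N = NormalDist()


def conditional_power(z_k, info_k, info_final, crit_final, theta=None):
    """Probability of finishing with Z >= crit_final, given interim Z = z_k at information info_k.

    theta defaults to the current-trend estimate z_k / sqrt(info_k).
    """
    if theta is None:
        theta = z_k / math.sqrt(info_k)
    remaining = info_final - info_k
    shortfall = crit_final * math.sqrt(info_final) - z_k * math.sqrt(info_k) - theta * remaining
    return 1.0 - N.cdf(shortfall / math.sqrt(remaining))


def chw_statistic(z_stage1, z_stage2, t1):
    """CHW-style weighted statistic with weights fixed from the ORIGINAL design.

    z_stage2 uses stage-2 data only (after any sample size increase); since the
    squared weights sum to one, the statistic is N(0, 1) under the null hypothesis.
    """
    return math.sqrt(t1) * z_stage1 + math.sqrt(1.0 - t1) * z_stage2


# Hypothetical inputs: interim after 100 of 300 planned events, final critical value 1.993
cp = conditional_power(1.21, 100 / 4, 300 / 4, 1.993)
print(f"conditional power under the current trend: {cp:.2f}")
```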
To briefly illustrate this, if we go to our previous example and select interim monitoring and sample size re-estimation, it will open up a table like this, which works very similarly to the group sequential design table. On the left-hand side we set the rules for our sample size re-estimation. The default is the Chen, DeMets and Lan approach, where it's only at the penultimate look that sample size re-estimation is an option, and it will happen if the conditional power is above 50%. However, we're allowed to derive a lower bound using Mehta and Pocock ... .
If we do that, you can see that for this very specific design you can go all the way down to a minimum conditional power of around 33.78% and increase the sample size without causing an increase in the type one error.
So let's assume that we're going to allow a maximum number of events of 906.
That's probably quite aggressive and may be impractical, but this is just a hypothetical example. We're going to assume that we only increase the sample size until the conditional power equals our target power, which by default is usually assumed to be the power from the original design.
You have the option to always increase up to 906 if you're worried about people involved in the trial back-calculating the interim effect size, but that's quite an extreme strategy for dealing with that particular problem. So we're going to assume here an interim hazard ratio of 0.8.
So we have a 20% reduction versus the 33% reduction in the design. That would give us an interim Z statistic of around 1.21, which gives us the conditional power.
If we were to just continue the trial as is, this says there's around a 47% chance that we would find significance at the very last look, given what's happened so far, that is, given this interim statistic. Then we go to the second look, where we now have the option to increase the sample size.
If we enter this interim hazard ratio of 0.8, we would obviously find that our conditional power is between the minimum conditional power bound of 33% and the maximum conditional power of 92%, and therefore it recommends increasing the number of events needed from 300 to around 853. That's a very large increase, of course, so you might say, well, maybe it's not worth rescuing at this point, but obviously this is a hypothetical example. If we had actually followed this strategy and entered 0.8 for this number of events, we would find for efficacy, with a test statistic of around 3.25, well above the threshold of 1.993. Then, of course, because it's a toy example, I can run the clock back and remove the sample size re-estimation.
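The re-estimation step can be sketched as a promising-zone style rule: compute conditional power under the current trend at the planned number of events; if it falls between the minimum bound and the target, find the smallest event total (up to the cap) that restores the target. This is a minimal sketch under loud assumptions: information is taken as events/4 (1:1 allocation), the final test is the conventional unweighted statistic (as in the CDL setting), the interim is assumed to occur after 120 of 300 events (the webinar does not state this), and the target conditional power is assumed to be 0.90. `cp_events` and `reestimate_events` are hypothetical names, not nQuery functions, so the output is not expected to reproduce the quoted 853 exactly.

```python
from math import ceil, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

def cp_events(z1, d1, d_final, crit):
    """Conditional power under the current trend if the final analysis happens at
    d_final events, with information taken as events / 4 (1:1 allocation) and the
    conventional (unweighted) final Z statistic compared against crit."""
    i1, i_final = d1 / 4.0, d_final / 4.0
    theta_hat = z1 / sqrt(i1)                 # interim estimate of -log(HR)
    inc = i_final - i1                        # information still to come
    # Final score z1*sqrt(i1) + N(theta_hat*inc, inc) must reach crit*sqrt(i_final)
    num = crit * sqrt(i_final) - z1 * sqrt(i1) - theta_hat * inc
    return 1.0 - PHI(num / sqrt(inc))

def reestimate_events(z1, d1, d_planned, d_max, crit, cp_min, cp_target):
    """Promising-zone style rule: raise the event target only when
    cp_min <= CP < cp_target, choosing the smallest total (capped at d_max)
    whose conditional power reaches cp_target."""
    cp = cp_events(z1, d1, d_planned, crit)
    if cp < cp_min or cp >= cp_target:
        return d_planned, cp                  # unfavourable or favourable zone: no change
    if cp_events(z1, d1, d_max, crit) < cp_target:
        return d_max, cp                      # even the cap cannot reach the target
    lo, hi = float(d_planned), float(d_max)
    for _ in range(60):                       # bisection; CP rises with events when theta_hat > 0
        mid = 0.5 * (lo + hi)
        if cp_events(z1, d1, mid, crit) >= cp_target:
            hi = mid
        else:
            lo = mid
    return ceil(hi), cp

# Hypothetical inputs: interim after 120 of 300 planned events (assumed), cap of 906,
# final critical value 1.993, promising zone [0.3378, 0.90).
new_events, cp_now = reestimate_events(z1=1.21, d1=120, d_planned=300, d_max=906,
                                       crit=1.993, cp_min=0.3378, cp_target=0.90)
print(f"CP at planned events: {cp_now:.3f}; re-estimated events: {new_events}")
```

With these assumed inputs, conditional power at 300 events comes out a little under 0.5, in the same region as the roughly 47% quoted, and the rule asks for a substantially larger event target within the 906 cap.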
This would be an example where the original design would not have found significance. Remember, this is exactly the same, except we've stripped out the sample size re-estimation; we're still getting the conditional powers.
If you're interested in that information, you can see here that we would have failed to find significance at the end, because this test statistic is just below our threshold of 1.993. And this is quite close. It's not above 1.96, which would have been the area of controversy, for lack of a better term, but it's still quite high. Perhaps it could be described as "approaching significance", even though that's a term statisticians would hate to hear, and I personally wouldn't enjoy hearing it either. Anyway, it's just worth noting that if you're interested in doing more complex group sequential design analysis in nQuery that allows for piecewise accrual or piecewise hazard curves, those options are available in nQuery as well. These are basically analogous to the tables that I showed earlier, which work very similarly to the simulation tables we talked about.
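The test statistics quoted above can be roughly checked with the usual large-sample approximation for the log-rank test, Z ≈ -log(HR) × sqrt(d × p(1 - p)), which for 1:1 allocation is -log(HR) × sqrt(d/4). This sketch assumes equal allocation, and `z_from_hr` is a hypothetical helper; nQuery's internal calculation may differ in detail.

```python
from math import log, sqrt

def z_from_hr(hr, events, alloc=0.5):
    """Approximate log-rank Z for an observed hazard ratio after `events` events,
    using Var(log HR) ~ 1 / (events * p * (1 - p)) with p the allocation fraction."""
    return -log(hr) * sqrt(events * alloc * (1 - alloc))

for d in (300, 853):
    print(d, round(z_from_hr(0.8, d), 2))
# prints "300 1.93", then "853 3.26"
```

Under these assumptions, a hazard ratio of 0.8 at 853 events gives about 3.26 (around 3.25 in the webinar), while at the original 300 events it gives about 1.93, just below both 1.993 and 1.96, matching the description of the design without re-estimation.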
OK, I think I've run slightly over time. I apologize for that, but obviously this is being recorded, so I suppose that's less of an issue, hopefully.
I think it's worth noting that there are some complications with doing survival analysis using adaptive design. The first, of course, is that your certainty about when interim analyses, for example, are going to happen is much lower than for an equivalent test where you're using fixed follow-up for a mean or proportion, which means you have to do more work in terms of predicting when interims can occur.
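Predicting when an interim will happen amounts to asking when the expected number of events reaches a milestone. A minimal sketch, assuming uniform accrual, exponential survival in each arm, 1:1 allocation and no dropout; the design values (600 subjects over 20 months, control median 12 months, hazard ratio 0.67) and the helper names are hypothetical, and real event-prediction tools typically use the trial's observed data rather than design assumptions alone.

```python
from math import exp, log

def expected_events(t, n_total, accrual_months, hazards):
    """Expected events by calendar time t (months): uniform accrual of n_total
    subjects over [0, accrual_months], exponential event times, no dropout,
    equal allocation to arms with the given hazards."""
    a = accrual_months
    per_arm_rate = n_total / a / len(hazards)     # subjects per month per arm
    m = min(t, a)                                 # accrual completed so far
    total = 0.0
    for lam in hazards:
        # Integral over entry times s in [0, m] of P(event by t - s) = 1 - exp(-lam*(t - s))
        total += per_arm_rate * (m - (exp(-lam * (t - m)) - exp(-lam * t)) / lam)
    return total

def time_to_events(target, n_total, accrual_months, hazards, t_max=240.0):
    """Calendar time at which expected events first reach `target` (bisection)."""
    if expected_events(t_max, n_total, accrual_months, hazards) < target:
        return float("inf")                       # milestone not reachable by t_max
    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_events(mid, n_total, accrual_months, hazards) >= target:
            hi = mid
        else:
            lo = mid
    return hi

lam_c = log(2) / 12                  # hypothetical control median of 12 months
arms = (lam_c, 0.67 * lam_c)         # hypothetical design hazard ratio of 0.67
for milestone in (120, 300):
    months = time_to_events(milestone, n_total=600, accrual_months=20, hazards=arms)
    print(f"{milestone} events expected at about month {months:.1f}")
```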
In next month's webinar, I'm hoping to give you a preview of a feature in nQuery.
We'll be adding a tool that will give you the option to make informed decisions, based on what's happened in your trial so far, about when a given interim analysis might occur, basically when some milestone number of events will happen, so that's a little teaser there. But it's also worth noting that there are likely to be higher numbers of subjects in the active cohort when an interim analysis occurs, which might be an issue, particularly if you're dealing with overrunning, which happens often in group sequential design.
Note that the assumptions for group sequential design and sample size re-estimation can be quite strong: you're assuming a constant treatment effect and so on.
On average, though, as is often the case, for example when we were talking about the average hazard ratio in a non-proportional hazards (NPH) problem, even with deviations from this, results can still be interpreted carefully and meaningfully; they just don't necessarily mean exactly the same thing as what you'd get from a fixed-sample analysis. And one small thing about sample size re-estimation for survival, of course, is that if you need to increase your number of events, you have two options available to you. One, obviously, is to add more subjects.
But if the total number of events you're increasing to is less than the original sample size, you can obviously also choose to just increase the length of the study, which basically allows more events to occur among the subjects who are still available to have the event. But this optionality could introduce bias: suppose you know that the effect is better earlier on in the trial, for example there's a wide gap where subjects early on do better on your treatment versus the control group.
Then you might be incentivized to increase the sample size rather than the length of the study, because if you increase the sample size, you're adding a lot of new subjects in the period where you believe your effect size is doing the best.
Vice versa, if your drug's characteristics are better in the tail of the distribution, towards the longer end of follow-up, then you have an incentive to increase the length of the study, because then you have more of the people who've been in the study for a long time, and who will therefore contribute the better effect size that you're seeing in that part of the survival curve.
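The two routes to more events can be compared with a small sketch: extend follow-up with the subjects already enrolled, or enrol more subjects at the same accrual rate. It assumes uniform accrual at a hypothetical 30 subjects per month, exponential survival, 1:1 allocation, no dropout, and a hypothetical re-estimated target of 450 events. It only shows the mechanics (and that follow-up alone cannot deliver more events than subjects); it does not model the bias discussed above.

```python
from math import exp, log

def expected_events(t, n_total, accrual_months, hazards):
    """Expected events by month t: uniform accrual over [0, accrual_months],
    exponential event times, no dropout, equal allocation across arms."""
    per_arm_rate = n_total / accrual_months / len(hazards)
    m = min(t, accrual_months)
    return sum(per_arm_rate * (m - (exp(-lam * (t - m)) - exp(-lam * t)) / lam)
               for lam in hazards)

def time_to_events(target, n_total, accrual_months, hazards, t_max=240.0):
    """Month at which expected events reach `target`; inf if never reached by t_max."""
    if expected_events(t_max, n_total, accrual_months, hazards) < target:
        return float("inf")
    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_events(mid, n_total, accrual_months, hazards) >= target:
            hi = mid
        else:
            lo = mid
    return hi

lam_c = log(2) / 12                      # hypothetical control median of 12 months
arms = (lam_c, 0.67 * lam_c)             # hypothetical hazard ratio of 0.67
rate = 30                                # hypothetical accrual, subjects per month
keep_n = time_to_events(450, 600, 600 / rate, arms)   # option 1: extend follow-up only
add_n = time_to_events(450, 800, 800 / rate, arms)    # option 2: enrol 200 more subjects
too_many = time_to_events(650, 600, 600 / rate, arms) # target above N: follow-up alone fails
print(f"extend follow-up: month {keep_n:.1f}; add subjects: month {add_n:.1f}; "
      f"650 events with 600 subjects: {too_many}")
```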
So Freidlin and Korn in 2017 showed that there is an ability to optimize this in a way that would increase your chance of getting a significant p-value, in a way that isn't based on some actual clinically relevant criterion. Now, obviously, if you're doing a phase three trial, you usually have an independent data monitoring committee and so on, so that should mitigate some of that.
But the fact that there is this optionality has the potential to tilt the scales, so it's something to keep an eye on when it comes to sample size re-estimation for survival analysis.
So, to discuss and conclude: survival analysis is the most common approach for oncology trials in phase three.
Survival analysis implies a requirement for flexibility, both when you're doing the analysis and in the design and sample size considerations for the trial. Remember that sample size for survival is based on the target number of events, not the sample size itself. The sample size is really just a matter of calculating how many subjects we need to get that number of events.
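Because survival sample size is driven by events, a common workflow is to compute the required events with Schoenfeld's formula, then divide by the probability that a subject has an event by the time of the analysis. A minimal sketch, assuming 1:1 allocation, uniform accrual, exponential survival and no dropout; the inputs (hazard ratio 0.67, one-sided alpha 0.025, 90% power, control median 12 months, 20-month accrual, 36-month study) are hypothetical.

```python
from math import ceil, exp, log
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.025, power=0.9, alloc=0.5):
    """Required number of events for a one-sided log-rank test (Schoenfeld)."""
    z = NormalDist().inv_cdf
    return ceil((z(1 - alpha) + z(power)) ** 2 / (alloc * (1 - alloc) * log(hr) ** 2))

def prob_event(lam, accrual, total_duration):
    """P(event by the end of the study) for exponential hazard lam, uniform
    accrual over [0, accrual] and analysis at total_duration (no dropout)."""
    min_fu = total_duration - accrual
    return 1 - (exp(-lam * min_fu) - exp(-lam * total_duration)) / (lam * accrual)

def subjects_needed(hr, median_control, accrual, total_duration, alpha=0.025, power=0.9):
    """Events first, then subjects = events / average event probability (1:1)."""
    lam_c = log(2) / median_control
    lam_t = hr * lam_c
    p_avg = 0.5 * (prob_event(lam_c, accrual, total_duration)
                   + prob_event(lam_t, accrual, total_duration))
    d = schoenfeld_events(hr, alpha, power)
    return d, ceil(d / p_avg)

events, n = subjects_needed(hr=0.67, median_control=12, accrual=20, total_duration=36)
print(f"events: {events}, subjects: {n}")
```

Changing the accrual or follow-up assumptions changes the number of subjects, but not the number of events, which is the quantity that actually drives power.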
There's increasing interest in non-proportional hazards, particularly due to immunotherapies and the delayed effect that we're seeing there now.
There are questions about whether the delayed effect is coming from a responder effect or from an actual delayed effect, in the sense that it takes time for .... But the way it looks in the current Kaplan-Meier curves is as a delayed effect.
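A delayed effect is often modelled with a piecewise hazard: the treatment-arm hazard equals the control hazard until some delay, then drops. A minimal sketch of the resulting survival curves, assuming exponential control survival with a 12-month median, a 6-month delay and a post-delay hazard ratio of 0.6 (all hypothetical values); the curves overlap until the delay and only separate afterwards, which is the Kaplan-Meier pattern described here.

```python
from math import exp, log

def survival_delayed(t, lam, hr_after, delay):
    """Treatment-arm survival when the hazard equals the control hazard `lam`
    until `delay`, then drops to hr_after * lam (piecewise exponential)."""
    if t <= delay:
        return exp(-lam * t)
    return exp(-lam * delay - hr_after * lam * (t - delay))

lam_c = log(2) / 12          # hypothetical control median of 12 months
for month in (0, 3, 6, 12, 24):
    s_ctrl = exp(-lam_c * month)
    s_trt = survival_delayed(month, lam_c, hr_after=0.6, delay=6)
    print(f"{month:>2} months  control {s_ctrl:.3f}  treatment {s_trt:.3f}")
```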
So we're dealing here with situations where various different options are available for modelling this. And adaptive design, such as sample size re-estimation, is also increasing in popularity for survival analysis, but remember that there are unique challenges for survival adaptive design.
As I mentioned at the start, if you have any questions after seeing this recording, feel free to e-mail us at info at ... dot com, or if you want to know more, you can go to ... dot com directly. And if there's any feature here that you don't have in nQuery right now, or you're missing one of the higher module tiers like nQuery Pro, you can try those for free by going to statsols dot com forward slash trial. We just need your e-mail; no additional details are needed. You can try the software in your browser: we have a virtual-machine type of option where you can see what it looks like and how it feels, and then decide whether you wish to go ahead and purchase the product itself.
And if you're interested in any of the points made today, I've included a pretty comprehensive set of references for each of these sections.