Sample size determination is a key part of trial planning but is typically focussed on statistical success criteria such as Type I error and power.
Prediction calculations instead focus on more practical considerations such as accuracy or timing. In this webinar, we explore where these two areas intersect and how each can help us improve our understanding of the other.
Sample size determination is a commonly required step in the planning and design of trials. Sample size calculations are usually focused on statistical success criteria such as statistical power or interval length.
At the same time, there is significant interest in prediction calculations for practical considerations such as milestone timing prediction and the development of clinical prediction models.
These two tracks of understanding how to create successful trials have a number of interesting interactions which help us understand and improve upon the value of each.
In this webinar, we focus on two areas of interaction:
1 - Clinical trial milestone prediction for a survival group sequential design
2 - Sample size calculations for clinical prediction models.
*Please note, this is auto-generated, some spelling and grammatical differences may occur*
So, hello, and welcome to today's webinar, Sample Size and Prediction Calculation - Milestone Prediction and Sample Size for Prediction Models.
We will be going over sample size determination and prediction calculations, particularly in the context of clinical trials, looking at some of the areas of overlap and the interesting synergy between these two distinct philosophies and areas of calculation.
The two case studies that we'll be focusing on today are milestone prediction in the context of group sequential design, and sample size determination for clinical prediction models. If you have any questions, please feel free to enter them into the questions tab on the right-hand side of your webinar software, and I'll try to answer as many of those as possible at the end, but most likely, I will answer via e-mail any that I don't actually get to.
My name is Ronan Fitzpatrick. I'm the Head of Statistics here at nQuery, and ..., a researcher, since nQuery. I've given workshops at places like the FDA and JSM, and hopefully in the near future we'll be getting back out there on the road again. But obviously I've been doing a lot of these webinars and seminars, both internally here and with other organizations as well.
So, what are we covering today? Firstly, we'll just have a brief introduction to both sample size determination and prediction calculations, just to give a little bit of context, but that's not a huge focus today. And then we'll focus on the two case studies where the areas of sample size determination and prediction calculations interact quite strongly. Namely, the prediction of milestones, such as when our interim analysis is going to happen in a group sequential design, and then sample size determination for clinical prediction models. And then there'll be a brief moment for conclusions and discussion at the end of the webinar, including answering some of the questions if we have time.
As mentioned, this webinar is presented by nQuery. nQuery is the complete solution for optimizing your clinical trial from early phase, up to confirmatory designs.
It covers a wide variety of fixed-term, Bayesian and adaptive designs, and over 90% of organizations who had trials approved by the FDA have a license of nQuery. You can see some of the reviews here.
So, with that out of the way, let's just get into the very brief introduction to both sample size determination and prediction, just to make sure everyone's on the same page.
So, sample size and prediction. Sample size determination, as we know quite well from these nQuery webinars, is focused on finding the appropriate sample size for your study. Usually, this is based on some type of statistical goal, or success criterion, and it is mostly done on a pretrial basis. Now, of course, adaptive design moves a little beyond this, but the primary focus is: how many people do I need in my study to be able to say that I have a high chance of success? Success might be defined by something such as statistical power, which is your probability of rejecting the null hypothesis when the null hypothesis is false; by the interval width, if a confidence interval is your primary goal; and then there are other things like interim stopping rules in the context of group sequential design, which is what we'll talk about later on.
On the other hand, prediction is usually focused on projecting future outcomes in our trials. So we're interested in more practical outcomes, such as: when is X going to happen? How likely is X to happen? And this is usually based on models that use pre-existing data.
That data could come from other trials, or from other data sources, databases, et cetera. Or, perhaps more interestingly, it could be the interim data that has become available in your trial as it's ongoing. From a sample size perspective, the data coming in is usually only used to decide what's going to happen based on some pre-specified rules.
Whereas in the prediction area, because we're typically not making a statistical inference and we're just talking about practical things like enrollment goals, we have more flexibility to use the interim data fully in that exercise and to make dynamic choices based on it. So you have things like milestone prediction, such as reaching events or enrollment goals, clinical prediction tools, success and failure probabilities, stuff like this. And these are where prediction is usually used.
So while sample size is usually focused on these pretrial statistical success criteria, prediction is usually focused on these more practical success criteria, in inverted commas. So power is very tied in with the inference framework that you chose to use for your clinical trial, whereas prediction is focused on practical things, like how long it's actually going to take me to get to the point where I can make a statistical decision or not, based on the information that has become available.
And that's where these interaction areas become quite interesting, because sample size is tied more toward the statistical side and based on pretrial assumptions, whereas prediction is more about practical, real-world considerations, on a more ongoing basis. So, the two interaction areas we're going to talk about today, as I mentioned earlier, are milestone prediction for group sequential design, and sample size determination for clinical prediction models.
So, we'll get straight into the first case study, clinical trial milestone prediction.
So, there are two main milestones that we're going to talk about today: enrollment prediction, which is relevant for pretty much any clinical trial, and then event prediction, which is more specialized towards survival or time-to-event trials. So, let's briefly talk about enrollment prediction first, albeit our case study will focus on the more complex problem, event prediction.
So, enrollment prediction is pretty simple. Basically, you have a clinical trial and you've probably done a sample size calculation. So you're now going, OK, we need a thousand people in our study to be able to make the inferences that we want. But now we have the problem that we need to find sites, open sites, and recruit people. And before you even recruit them, you need to screen them, and you need to assign them to their respective treatment groups.
And there's a whole pipeline that's required to get from the abstract concept of a sample up to actual people in your study being assigned your treatment or not. And that's where enrollment prediction comes in. It's taking that pipeline and turning it into a statistical model that can be used to generate reasonable predictions of what you think is going to happen, usually based on what's happened so far, or perhaps based on other similar studies which had similar enrollment plans.
So, these enrollment milestones are a key milestone for the vast majority of clinical trials. And you can find statistics showing that the majority of trials fail to reach their enrollment goals. So this is obviously something that is very important to trialists on the ground. The level to which you want to treat this as a statistical problem, or as a more logistical problem with Markov-type modeling, is really up to yourself, but for today's purposes, we'll treat the enrollment process as if it were just a statistical process, ignoring the complications of screening and such.
But even within that, we can't ignore the practical realities of our trial completely. There are multiple levels at which you could think of the enrollment process. You could think of it as a global process. So, basically, you're recruiting 10 people per month, and you could go, OK, that's going to roughly give us what's going to happen in future months. We're not too interested in how those are distributed across our sites, so we model it as if it were a global process. If we wanted to go down another layer, we could look at the region, as some regions might be doing better than others, like states, or countries in the EU. And then one level below that would be modeling individual sites, for example hospitals, seeing how they're doing, and then adding or removing hospitals based on their performance thus far. These are the kind of key decisions that people are making, trying to make the best decision possible to ensure that they reach their goal, which is getting enough people into the trial so that they can actually do the analysis that the trial is designed to do, be it, let's say, by adding and dropping sites and such.
If you're asking how we do this modeling in this simplified context, where we're treating this as a statistical process rather than a logistical process, then there's a variety of different ways you could do it. There are fairly simple equations that you could use, there's simulation, and there are Bayesian approaches if you want to borrow from previous studies. All of these are probably based around one statistical model or another, such as gamma-Poisson, Poisson, piecewise Poisson, or similar.
And so that's one type of goal, and that's one that's very relevant for pretty much any trial.
On the other hand, if we're talking about survival trials or time-to-event trials, then the key thing that's of interest at that point is our event. So the actual happening of the event of interest, whether that be progression in the context of an oncology trial, or a death, which means you're looking at overall survival. And so we're usually doing inference, using Cox regression or the log-rank test, about how long it takes for that event to occur. And, of course, we're trying to see whether the time that it takes in your treatment group is longer than the time in the control group, assuming that the event is negative, which it usually is, such as death.
So, yeah, overall survival and progression are common endpoints, but the important thing in the context of survival trials is that the statistical decisions, particularly when to end the trial and do the right-censoring, or when to do an interim analysis, are based on the event process, not the enrollment process.
So, really, if you have a thousand people but no one's had the event, then from a practical point of view, in a survival trial, that cohort is not giving you any fully usable information. Now, obviously, if everyone has survived for 12 months already, that's giving you some information. But from the inferential point of view of distinguishing between your two groups, you're not getting what you need to be able to do the time-to-event comparison.
So, in a survival trial, similar to what I often say in the context of sample size determination for survival trials, it's the events that are driving the inference, not the sample size. The sample size is really just a meta process, creating people who can have the event.
So, the big thing that means is that if we're doing a survival trial, and we're modeling when we want to do the interim analysis, or when we think the trial is going to end, we need to model the event process explicitly, not just the enrollment process. We can't just focus on how recruitment is going.
And you now have this more complex situation where you need to model not only the survival process, but probably the enrollment process as well, particularly if enrollment is still ongoing at the point that you're making this prediction. You also need to model competing processes, like dropout, cure fraction, things like this. So you're in a situation where a much more comprehensive level of modeling is required to get an accurate prediction of when events are going to occur, and when the milestone number of events is going to occur.
And enrollment itself, as we mentioned, very often deviates from the projection assumed pretrial. That's even more true for survival trials, where the expectations up front can often be quite different from what actually happens in the trial. To a certain extent, of course, that's why we're doing trials: because we don't know what's going to happen. If we knew what was going to happen beforehand, we wouldn't need to do clinical trials. We could just approve drugs based on our priors, or based on information that we thought was true, or on phase two or phase one type trials. So, this all makes sense. But when we combine enrollment prediction and survival prediction, if we're talking to other stakeholders, people involved in the trial, sponsors, and the trial's been ongoing for some time and it's behind schedule, it's really important at that point to understand: if we just stick with what we're doing, how long is this going to take? And then also, what type of things could we do to get things back on track? That's where this kind of prediction makes sense.
We're talking about group sequential design, which I won't really talk about here today in any detail, but basically, in a group sequential design, you have multiple pre-specified interim analyses. For example, let's just say you have one interim analysis after 50% of the sample size, if you're talking about means or proportions, or, let's say, after 50% of the events have occurred.
In that case, we may want, at that interim analysis, to know when the study is going to end. Or perhaps, when we start the trial, we want to know not only when the trial is going to end, but when the trial is going to have its interim analysis.
So today, we're going to focus on the former scenario, where we have a group sequential design and we assume that we've got to the first interim analysis. At that point we have that information, because the data has to be unblinded: that's how the design works, you have to do a comparative analysis, or, more accurately, the data monitoring committee is likely to be doing that. So it makes sense that, as we already have access to this data, we would use it to get a more accurate projection of when the study is going to end, compared to when we thought it was going to end using our pretrial sample size determination.
So, just to briefly mention, there's a variety of different methods available for enrollment and event prediction, and some of these are on the slide here, but we'll be focusing on the simulation approach in today's webinar.
So, as I said, we're just going to take a relatively simple group sequential design. We're going to take the scenario where we designed this trial and then we reach our goal for the interim analysis, which, as this is a survival trial, will be based on the number of events. And then we're going to do some playing around with prediction-type tools to see, based on that, how much deviation we have from the initial assumptions, and then what kind of things we could perhaps do to try and get this trial back on track, such as increasing the sample size, for example.
So, there's nothing too crazy here. We're talking about a trial where the placebo median survival is five months and it's seven months in the treatment group, with a 0.025 one-sided test, so your standard alpha level for a phase three clinical trial. The initially expected accrual period is 24 months, with a total study length of 30 months. So for most of the study length recruitment will be ongoing; only in the last six months, the last half a year, will recruitment be finished. And then we can see that they expect dropout of around 4% per year. And they are going to have a sample size of 230 per group, which we'll see quite quickly actually gives us a power of around 90%. There's a chicken and egg kind of thing here: the way they describe this, they talked about sample size first, but I assume that this was derived from the power calculation. And then, in terms of the group sequential design aspect, we're going to talk about a two-look design.
So, one interim analysis and one final analysis, with the interim analysis happening halfway through the trial in terms of the event information. So, very importantly, that's not when half the subjects are recruited, that's when half the target events have occurred.
And of course, we haven't got the target events here yet, but we're going to do that calculation in a moment.
And just in terms of the spending functions, both the efficacy and the futility bounds will use the Hwang-Shih-DeCani gamma spending function with a gamma value, or HSD parameter, of .... And just to note that the futility bound will be non-binding, as expected. It's very rare in clinical trials that you would have a binding futility boundary. I think most sponsors like to have the flexibility that, if they cross the futility boundary, they aren't forced to stop the trial early, and can continue without affecting the overall type one error, which is what would happen if you had a binding futility bound.
So let's do this sample size calculation in nQuery. Then we'll have an idea of what the overall design looks like, what the number of events is, and where the interim analyses are likely to occur, based on that information. So what I'm using here is ... 25. This is a sample size table for group sequential design with piecewise accrual, piecewise event rates and piecewise dropouts. So if we wanted to, in this table we could specify that the event rate changes over time, or that the dropout rate changes over time. But in this case, we just have constant event rates and constant dropout rates, so things are a little bit simpler.
So we'll specify the things that are quite easy, that we have from before. So we saw there's a one-sided 0.025 significance level, and we're going to specify two periods for our table, because we want to have one period where accrual happens and then one period for the post-accrual period. So, from zero to 24 months is when accrual happens, assumed to be uniform in the original sample size calculation, and then the remaining six months is where accrual does not occur.
So we're going to convert these median survivals into exponential rates, as that's what is used in nQuery. And with that, I will also need to convert this 4% dropout rate into an exponential rate as well. That's just the way nQuery generally uses this information.
Before we do that, let's just focus on the group sequential design. So, as mentioned, there'll be two looks in total, so just to be clear about that: one interim analysis and one final analysis. By default, this is an equally spaced design, with equally spaced looks, and that's what we're going to have today. So, once half the total events have occurred, we're going to do an interim analysis. And then once the total events that we're targeting have occurred, we're going to do our final analysis. But, remember, the enrollment process might be in various states of completion by the time we reach that particular events threshold.
And then we want to add our two spending functions, which, as mentioned, were both Hwang-Shih-DeCani, or the gamma family, with parameters of ... in each of them.
As I've mentioned, it's a non-binding futility bound, because that means you have the flexibility that, if you cross the futility boundary and you choose to continue the trial for whatever reason, your trial would not have an inflated type one error from making that choice. Obviously, flexibility for very little cost makes a lot of sense. And that's why it's generally the standard way this is done.
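As a rough illustration of the gamma family mentioned here, the sketch below writes out the standard Hwang-Shih-DeCani spending function, which gives the cumulative alpha spent by a given information fraction. The gamma value used in the webinar is not recoverable from this transcript, so `ILLUSTRATIVE_GAMMA` is purely an assumption, and the function name `hsd_spending` is hypothetical rather than anything from nQuery.

```python
import math

def hsd_spending(info_fraction, alpha, gamma):
    """Cumulative error spent at an information fraction in [0, 1]
    under the Hwang-Shih-DeCani (gamma family) spending function."""
    if gamma == 0:
        # The gamma -> 0 limit spends error linearly in information.
        return alpha * info_fraction
    return alpha * (1 - math.exp(-gamma * info_fraction)) / (1 - math.exp(-gamma))

ALPHA = 0.025               # one-sided significance level from the design
ILLUSTRATIVE_GAMMA = -4.0   # assumption: NOT the value used in the webinar

# Two equally spaced looks: half the events, then all the events.
for fraction in (0.5, 1.0):
    spent = hsd_spending(fraction, ALPHA, ILLUSTRATIVE_GAMMA)
    print(f"information {fraction:.0%}: cumulative alpha spent = {spent:.5f}")
```

More negative gamma values spend less alpha at the interim look, which keeps the final boundary close to the fixed-design 0.025. The same functional form can be applied to the type two error to obtain a futility bound.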
OK, so, as I mentioned, we need to get the hazard ratio. And to do that, we're going to go to the Assistants menu, and we're going to select the Survival Parameter Converter.
And in this, we're now going to enter our 5 and 7 months to get our hazard ratio. So remember, in group one we have a median survival of five months, and in the other group we have seven months. We get something like this, but we can also do it the other way round. It's basically the same thing.
Basically, this is just one divided by that, the reciprocal of it.
You can see that the exponential rate for the treatment group is around 0.099, or basically 0.1, and around 0.14 in the control group. Note that we're talking about exponential rates on the month scale here. If we were to turn this into a year scale or a week scale, by, say, dividing the times by 12 for years or multiplying by four for weeks, these numbers would obviously change. So we're implicitly choosing the time unit by whatever time unit we put into this median survival row.
So we'll take this other hazard ratio here of 0.714.
And we'll enter that into the original table.
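The conversion being done in the converter tool can be sketched in a few lines. Under an exponential survival model the hazard rate is ln(2) divided by the median, and the hazard ratio is the ratio of the two rates. This is only a minimal check of the arithmetic; the function name `median_to_rate` is hypothetical and is not part of nQuery.

```python
import math

def median_to_rate(median_time):
    """Exponential hazard rate implied by a median survival time.
    The rate is per whatever time unit the median is given in."""
    return math.log(2) / median_time

control_rate = median_to_rate(5.0)     # months, control (placebo) group
treatment_rate = median_to_rate(7.0)   # months, treatment group

# Hazard ratio of treatment versus control, and its reciprocal.
hazard_ratio = treatment_rate / control_rate
reciprocal = control_rate / treatment_rate

print(f"control rate   = {control_rate:.4f} per month")    # roughly 0.139
print(f"treatment rate = {treatment_rate:.4f} per month")  # roughly 0.099
print(f"hazard ratio   = {hazard_ratio:.3f} (reciprocal {reciprocal:.1f})")
```

Because the medians are entered in months, the rates are per month. Entering the medians in years instead would give rates twelve times larger, while leaving the hazard ratio unchanged.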
OK, and then we open this side table here. This looks quite similar to stuff we've done in previous webinars for survival, like STT3. Basically, in the first row, we need to specify the end of the time periods that are of interest, in columns two and three. So we want the end of the column two period to be 24 months, which is the end of our accrual period, and then the end of the study to be 30 months. And the only difference really between these two columns is that the accrual percentage has got to equal 100 in column two and zero in column three. So, all we're specifying here is that we want 100% of total recruitment to have happened between 0 and 24 months on a uniform basis, which is what we saw in the original sample size determination. And then we want no recruitment from 24 months to 30 months, which is when the study is going to end.
That's when we're planning to right-censor the study.
So now we need to get our exponential hazard rates, so we can easily get that by going in here and taking the value here.
So we're going to take the control rate of about 0.14 and enter that into the group one exponential hazard rate.
You can see here that we get back 0.099 for the other group, as we had from this table. So you can see we're getting the correct exponential hazard rate for this particular hazard ratio.
You can see here that, if the trial had been fully stocked from the beginning, you'd see ... a 3.6% survival rate after 24 months and then around 1.5% after 30 months in the control group, but that's closer to 9% and 5% respectively in the other group.
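The survival percentages quoted here follow directly from the exponential model, where the proportion still event-free at time t is exp(-rate × t). The sketch below just reproduces those figures from the two medians; the function name `survival_at` is hypothetical.

```python
import math

def survival_at(t, rate):
    """Proportion still event-free at time t under an exponential model."""
    return math.exp(-rate * t)

control_rate = math.log(2) / 5.0     # median survival of 5 months
treatment_rate = math.log(2) / 7.0   # median survival of 7 months

control_24 = survival_at(24, control_rate)
control_30 = survival_at(30, control_rate)
treatment_24 = survival_at(24, treatment_rate)
treatment_30 = survival_at(30, treatment_rate)

print(f"control:   {control_24:.1%} at 24 months, {control_30:.1%} at 30 months")
print(f"treatment: {treatment_24:.1%} at 24 months, {treatment_30:.1%} at 30 months")
```

These are the proportions for a subject followed from time zero, which is why they assume everyone was enrolled at the start, and they ignore dropout.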
Now, the other thing is you have these two exponential dropout rates, which again are allowed to be piecewise, but they're constant in this case. And they specified that the exponential dropout rate should correspond to a 4% dropout rate per annum.
And you can easily do that calculation in the same Survival Parameter Converter tool. All we need to do is specify our exponential rate on the month scale. So that means if we're talking about rates per year, we need to have 12 months be the time over which the dropout process acts, and we think of it in terms of how many people survive the dropout process. So, this is a survival proportion of 0.96, right?
So we're saying that after 12 months, 96% of people would survive the dropout process. And that gives you an exponential parameter, or exponential dropout rate, of around 0.003, which is significantly lower than the two event rates. That's what we'd probably expect in most clinical trials.
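The dropout conversion is the same exponential relationship run in reverse: given the proportion remaining after some time, the rate is minus the log of that proportion divided by the time. A minimal sketch, with the hypothetical helper name `proportion_to_rate`:

```python
import math

def proportion_to_rate(remaining_proportion, time):
    """Exponential rate implied by the proportion remaining after `time`."""
    return -math.log(remaining_proportion) / time

# 4% dropout per year means 96% remain after 12 months.
dropout_rate = proportion_to_rate(0.96, 12.0)
print(f"dropout rate = {dropout_rate:.4f} per month")   # roughly 0.0034

# For comparison, the monthly event rates implied by the two medians.
control_rate = math.log(2) / 5.0
treatment_rate = math.log(2) / 7.0
print(f"event rates  = {control_rate:.4f} and {treatment_rate:.4f} per month")
```

Using 12 as the time keeps the dropout rate on the same month scale as the event rates entered in the table.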
Now, we basically need to enter this value into the relevant rows, these four rows. OK, so now, this table is fully complete.
So let's complete the rest of this calculation. We set the sample size ratio to one, and then we enter our sample size in each group of 230.
Then, we will see that we get approximately 90% power for this design, and that the Group one events is around 197.
And then the group two events is around 177, so unsurprisingly, we're getting more events in the control group versus the treatment group, more deaths in the control group versus the treatment group, effectively.
And we can see here that the total is basically 197 plus 177. So, this isn't exactly an integer, but it's pretty close.
So if we go back to our original looks table, which we can do by simply selecting this Looks tab at the bottom, then you can see that this table on the right has now been filled with the group sequential information.
And you can see here that the total number of events is effectively around 375, and we would expect that this interim analysis, based on the exponential rates given, would occur around 17.34 months into the study.
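To see where figures like these can come from, here is a minimal closed-form sketch of the expected number of events over calendar time, assuming uniform accrual, exponential event times and independent exponential dropout. It is a sanity check under those stated assumptions, not a description of how nQuery computes its table, and the function names are hypothetical.

```python
import math

ACCRUAL_MONTHS = 24.0
STUDY_END = 30.0
N_PER_ARM = 230
CONTROL_RATE = math.log(2) / 5.0      # median 5 months
TREATMENT_RATE = math.log(2) / 7.0    # median 7 months
DROPOUT_RATE = -math.log(0.96) / 12   # 4% dropout per year

def expected_events(t, n, hazard, dropout, accrual_months):
    """Expected events in one arm by calendar month t."""
    exit_rate = hazard + dropout            # rate of leaving the risk set
    elapsed = min(t, accrual_months)        # accrual time used up by t
    # Integral over entry times u in [0, elapsed] of 1 - exp(-exit_rate*(t-u))
    integral = elapsed - (math.exp(-exit_rate * (t - elapsed))
                          - math.exp(-exit_rate * t)) / exit_rate
    return (n / accrual_months) * (hazard / exit_rate) * integral

def total_events(t):
    """Expected events across both arms by calendar month t."""
    return sum(expected_events(t, N_PER_ARM, hazard, DROPOUT_RATE, ACCRUAL_MONTHS)
               for hazard in (CONTROL_RATE, TREATMENT_RATE))

def time_to_events(target, low=0.0, high=STUDY_END, tol=1e-6):
    """Calendar time at which total_events reaches target (bisection)."""
    while high - low > tol:
        mid = (low + high) / 2
        if total_events(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

control_final = expected_events(STUDY_END, N_PER_ARM, CONTROL_RATE,
                                DROPOUT_RATE, ACCRUAL_MONTHS)
treatment_final = expected_events(STUDY_END, N_PER_ARM, TREATMENT_RATE,
                                  DROPOUT_RATE, ACCRUAL_MONTHS)
interim_time = time_to_events(total_events(STUDY_END) / 2)

print(f"control events   = {control_final:.1f}")
print(f"treatment events = {treatment_final:.1f}")
print(f"interim (half the events) at about {interim_time:.2f} months")
```

Under these assumptions the sketch lands close to the 197 and 177 events and the 17.34 month interim timing quoted above, which suggests the design table is consistent with a simple exponential model.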
And basically, you could make a fairly quick calculation of how that would correspond to the sample size expected at that time, as well.
So if we take 17.341 and divide that by 24, which is the total expected accrual length, then we see that that's around 72% of the total accrual period. So assuming uniform accrual, if we multiply that by our total sample size of 460, so 230 multiplied by two, we would expect that at this interim analysis around 332 people would have been recruited. So, what we're saying here is that, with a fairly simple calculation, we just take the ratio of the expected calendar time divided by the expected total accrual time.
Then we multiply that by our total sample size, and we're basically getting back the enrollment expected to have occurred at this 17.34 month time.
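The quick calculation just described can be written out as follows. This is only the uniform-accrual arithmetic from the design stage, with a hypothetical function name.

```python
def expected_enrolled(t, total_n, accrual_months):
    """Expected number enrolled by calendar time t under uniform accrual."""
    fraction = min(t, accrual_months) / accrual_months
    return total_n * fraction

TOTAL_N = 2 * 230          # 230 per group
ACCRUAL_MONTHS = 24.0
INTERIM_TIME = 17.34       # expected interim timing from the design table

enrolled_at_interim = expected_enrolled(INTERIM_TIME, TOTAL_N, ACCRUAL_MONTHS)
print(f"fraction of accrual period used = {INTERIM_TIME / ACCRUAL_MONTHS:.1%}")
print(f"expected enrolled at interim    = {enrolled_at_interim:.0f}")
```

The `min` simply caps enrollment at the full sample size once the accrual period is over.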
You can also see that these efficacy bounds are on the Z scale. These are basically just the log hazard ratio divided by the square root of the sum of one over the events in each group.
And you can see here that we have an efficacy bound of around 2.9 at the interim analysis and ... at the futility boundary. And then, obviously, these converge at the final analysis. You either have to find for futility or efficacy at the final analysis; you have to make a decision at the final analysis. And you can see here that on the alpha scale, the p-value scale, if we're just using a standard log-rank analysis, you need to have a p-value of less than about 0.002 to find for significance at this interim analysis. But it's very close to the original 0.025 at the final analysis. I think that reflects what's usually done: you're usually conservative at the interim analysis so that the final analysis is relatively representative of, or close to, what the fixed-term analysis would be.
This reduces what I would call the area of controversy where, effectively, you get a p-value at the end of 0.023, and they're going, that should be significant, and you're like, well, no, we spent alpha, so it isn't. You usually want to avoid those kinds of controversies when dealing with trials, though some people like them.
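The link between the Z-scale bound and the p-value scale is just the upper tail of the standard normal distribution. The short sketch below checks that an interim bound of about 2.9 corresponds to a one-sided p-value of about 0.002, using Python's standard library; the helper names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def one_sided_p(z):
    """One-sided p-value for a Z statistic (upper tail)."""
    return 1 - STANDARD_NORMAL.cdf(z)

def z_for_one_sided_p(p):
    """Z boundary corresponding to a one-sided p-value threshold."""
    return STANDARD_NORMAL.inv_cdf(1 - p)

interim_p = one_sided_p(2.9)
fixed_design_z = z_for_one_sided_p(0.025)

print(f"interim bound 2.9 -> one-sided p = {interim_p:.4f}")   # roughly 0.002
print(f"fixed design at 0.025 -> Z = {fixed_design_z:.3f}")    # roughly 1.96
```

A final-analysis threshold only slightly below 0.025 therefore corresponds to a Z bound only slightly above 1.96.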
So that's the design that we've done. So, we've kind of replicated the sample size calculation.
So, what we have here is basically a full sample size calculation.
We have 90% power. But now, the actual interim analysis happens when we reach 187 or 188 events. Let's just say 187.
Then the big question is: look, this is what we thought was going to happen before the trial started. If we were to take the real data as it became available, which we're going to use for our interim analysis regardless, could we not make better predictions about what we think the rest of the trial is going to do based on that information, and then get practical ideas about how much we are going to overshoot or undershoot the original total study length? We were targeting 30 months as the time we needed to get these 374 or 375 events. And for this example, we actually have access to that data. So, we're going to take the real data that existed at this interim analysis.
And we're going to use that to make some calculations about what happened in the actual trial versus what we expected would happen pretrial, and then look at some of the things you might do to ameliorate the differences that might have existed between those two.
So to do that, we'll just move over to the Predict tool that was added in our most recent update, nine, available in the Expert tier.
We're going to basically do the simple calculation to allow the easiest comparisons. We're going to use the unblinded events fully: we're going to know the group assignment in the original dataset and we're going to use that group assignment. We're going to assume that enrollment status is ongoing, because it was in this case. And we'd expect that, of course, from the original calculation: remember that the time we set for the interim analysis was well below the accrual period. So of course, we would assume that accrual is still ongoing at this interim analysis.
And we're going to use subject data only. I've talked in the last three webinars a lot about this tool, and looked at site data, but for comparison's sake, it will be easier to use this subject-level data only, and then be able to quickly compare what happened to what we thought was going to happen at the design stage.
So, we're going to use the subject example dataset. We're going to have the treatment ID equal to treatment here.
And basically these are just the columns in the dataset that we have. So if we go to the subject example, you can see we have a country, a site ID, and an arrival time, so when did you arrive in this study? Then there's how long you have been on the study, either until now, or, in the case that you had an event or dropout, until you had that event or dropout. There's what treatment group you were in, as well as the status. The status here is just whether you were available at the current time when this dataset was taken, so you had been recruited but you hadn't had the event; whether you had had the event, which is indicated by one; and then ... would mean that you dropped out. Just to say that, obviously, the dropout process doesn't have to be literally dropout. It is anything that means that you, as a subject, were no longer available to have the event.
So I think the big thing to take away from this first set of information, the accrual options, which we've generated from the real data, is that the current sample size is 402.
As we'll see on the next screen, we've basically reached the interim analysis: we have 187 events. But the current sample size is 402, which is fairly substantially higher than the 332 that we were expecting.
There could be multiple reasons why that happens, right? It could be that recruitment is going slower than expected. It could be that the event process is going slower than expected. It could be that the dropout rate is higher than expected, because obviously, if you're dropping out, that takes away from the number of people who can have the event.
All of these are effectively competing hypotheses, but using this tool we can drill down and see which one is most likely to be true.
The first hypothesis we'll explore is: is the event process, sorry, is the enrollment process slower than expected? Are we recruiting people slower than we were hoping to? We can check that by looking at what the accrual rate is up to this point.
So in this case, it's around 19.4 per month, OK? How does that compare to what we originally had? We can very easily calculate that by considering the expected overall recruitment rate, which is simply the total sample size divided by the expected length of accrual. We had 460 people that we were expecting by the end of the study, and we thought that would take 24 months to recruit. Remember, our accrual period is 24 months. So we divide 460 by 24, and we should get about 19.2.
So the original recruitment rate that we were expecting was around 19.2 per month, which is actually slightly lower than the achieved recruitment rate. Based on this, it seems that the reason the trial has taken longer to reach the target number of events for the interim analysis is not that the recruitment rate was slower. That seems to be more or less on target.
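As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are my own, chosen for illustration; the figures are the ones quoted in the webinar.

```python
# Planned accrual rate from the design stage versus the rate observed so far.
planned_sample_size = 460     # total subjects expected at the design stage
planned_accrual_months = 24   # planned accrual period, in months
observed_rate = 19.4          # subjects per month, as reported by the tool

planned_rate = planned_sample_size / planned_accrual_months

print(f"Planned accrual rate:  {planned_rate:.2f} per month")
print(f"Observed accrual rate: {observed_rate:.2f} per month")
print(f"Observed / planned:    {observed_rate / planned_rate:.3f}")
```

A ratio just above 1 is what supports the conclusion that recruitment is not the cause of the delay.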
So we click Next, and then we get into the Event Dropout Information. And obviously, very quickly, we see this is where the action is. So, if we go back to our original design, what were the event rates we were expecting?
In each group, we were expecting event rates of about 0.14 and 0.1 in the control and treatment groups, respectively. If we go to our prediction engine and look at the best exponential assumptions, so the best-fitting exponential values for the control and treatment groups, they're about 0.1085 and 0.0813. Both are below what we were expecting, and the control rate in particular is lower than expected by a fairly substantial amount.
So, remember, that's 0.1085 and 0.0813, versus 0.14 and 0.1, which is what we expected at the design stage.
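To put those rates side by side, here is a small Python sketch. It assumes the first value in each quoted pair is the control group and the second is the treatment group, which is my reading of the transcript rather than something stated explicitly. Under exponential survival the hazard ratio is simply the ratio of the two rates.

```python
# Monthly event (hazard) rates: design assumptions versus the best-fitting
# exponential values. The group labelling is an assumption, as noted above.
design = {"control": 0.14, "treatment": 0.10}
observed = {"control": 0.1085, "treatment": 0.0813}

for group in design:
    ratio = observed[group] / design[group]
    print(f"{group}: observed rate is {ratio:.2f} times the design rate")

# Under exponential survival the hazard ratio is the ratio of the rates.
design_hr = design["treatment"] / design["control"]
observed_hr = observed["treatment"] / observed["control"]
print(f"Design hazard ratio:   {design_hr:.3f}")
print(f"Observed hazard ratio: {observed_hr:.3f}")
```

Both rates come out lower than planned, and the observed hazard ratio sits closer to 1 than the design value.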
So the event process in both groups seems to be slower than expected, and we can also see that the hazard ratio here is slightly more conservative than what we were hoping for in the initial clinical trial design. I could certainly go through something like a sample size re-estimation and look at whether we need to change our sample size. But in this case, I think the power for that hazard ratio, with the sample size we targeted initially, would still be around 85%. So it's not a huge difference. It's probably not a case where sample size re-estimation would be strictly necessary, though you may choose to do it if you wanted to. Here we're focusing on the practical problem.
We're not really worrying about the statistical inference side. We're assuming that the power is still relatively high and that there's no huge divergence. What we're really interested in here is: given that the event rate has turned out to be slower than expected, what is that practically going to mean for the length of the study, for how long it's going to take us to get to the point where we're finished, where we've reached the 374, or 375, events we were targeting initially?
So this is slightly off what we expected, but the important thing from our perspective is really that the event rate is slower than expected. The target number of events here is 374. You could make that 375 to correspond exactly to the original design; they're very close, and it depends on which algorithm you use, but it wouldn't make much of a difference in this case.
Just to check, did dropout make much of a contribution to this? No. We can see that the dropout rate in both groups is about or below 0.003. So dropout is actually doing better than expected, in the sense that there's less dropout than we initially planned for or were worried about. That means that what's really driving the fact that this trial is taking longer than expected to reach its targets, why we're behind schedule, is the event process.
So remember, in this scenario we're about 20.7 months into the study, so we're about three months behind where we hoped to be. We were initially targeting our interim analysis for around 17.3 or 17.4 months into the study, and we actually had to wait until 20.7 months. So about three extra months passed beyond where we were hoping to do the interim analysis before we reached the interim target number of events, the 187 events that we were initially looking for. It's taken three months longer than expected to get to that target and do this interim analysis.
So what are the implications for the actual end-of-study time if we assume that this trend continues for the rest of the study?
Well, we can model that using the Predict function. If we just use the default assumptions, which are the best-fitting assumptions, assuming a constant, uniform Poisson accrual process and an exponential survival process, then we will see very quickly that the total length of the study, based on the current trajectory, is a little bit behind where we were hoping it would be.
So if we scroll over here to the results on the right.
Based on where we are right now, the accrual period is actually expected to finish a little bit earlier than expected, by around 0.3 of a month, so maybe a week and a half early. But the total study length is nearly four months longer than we expected. Now, whether that's such a considerable difference that you need to start considering some changes, like getting more aggressive in recruitment, obviously depends on the people on the ground and the decisions they want to make. But basically, what we know right now is that we're expecting this study to take around four months longer than expected.
So that's, you know, not a huge deal. In the context of clinical trials, many would probably accept that, because these kinds of delays are almost routine; it's much rarer for a trial to end up ahead of schedule. There are cases where things go better than expected, like the COVID-19 vaccine trials in 2020, which went better than expected, unfortunately, because COVID-19 was so prevalent. But in most clinical trials, it's the opposite problem that we're usually worried about. And we can see that even if we look at the 95% range for the study duration, it's really between 32.3 and 35.7 months. So that's a range that sits entirely above 30 by a significant amount. So certainly, it's hard to believe that, the way the trial is going right now, we're likely to end up where we wanted to be initially.
And if we go to the enrollment prediction plot, you can see that there was a little bit of a slowing down in recent months, but we were using the long-term trend. If this decrease in slope here were to continue going forward, then we would need to be more aggressive in terms of getting more sites back open, and things like that. But we can see that the events prediction is mostly fine. And then the dropout prediction is kind of wide, because dropouts are relatively rare, so we're not getting much information there.
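The kind of projection described here can be sketched as a plain Monte Carlo simulation in Python. This is not the Predict tool's algorithm, which the webinar does not describe; it is a minimal sketch under the same stated assumptions (Poisson accrual, exponential event and dropout times). The split of subjects still at risk between the arms, the 1:1 allocation of new subjects, and the decision to ignore dropouts that have already occurred are all hypothetical simplifications of mine.

```python
import random
import statistics

def simulate_end_time(rng, now=20.7, at_risk=(107, 108), events_needed=374 - 187,
                      to_enrol=460 - 402, accrual_rate=19.4,
                      hazards=(0.1085, 0.0813), dropout_hazard=0.003):
    """Return the calendar time (months) at which the target event count is reached."""
    # Subjects already enrolled and still at risk: by the memoryless property of
    # the exponential, their remaining time to event can be drawn afresh from `now`.
    subjects = [(now, arm) for arm in (0, 1) for _ in range(at_risk[arm])]
    # Future subjects: Poisson accrual means exponential gaps between arrivals.
    t = now
    for _ in range(to_enrol):
        t += rng.expovariate(accrual_rate)
        subjects.append((t, rng.randrange(2)))  # hypothetical 1:1 allocation
    event_times = []
    for start, arm in subjects:
        t_event = start + rng.expovariate(hazards[arm])
        t_drop = start + rng.expovariate(dropout_hazard)
        if t_event < t_drop:  # an event is only observed if it precedes dropout
            event_times.append(t_event)
    event_times.sort()
    if len(event_times) < events_needed:
        return None  # target never reached in this simulated trial
    return event_times[events_needed - 1]

rng = random.Random(2024)
runs = [simulate_end_time(rng) for _ in range(2000)]
end_times = sorted(t for t in runs if t is not None)

median_end = statistics.median(end_times)
lower = end_times[int(0.025 * len(end_times))]
upper = end_times[int(0.975 * len(end_times))]
print(f"Median projected study end: {median_end:.1f} months")
print(f"Central 95% of simulations: {lower:.1f} to {upper:.1f} months")
```

Because the inputs here are rounded and partly assumed, the output should only be expected to land in the same general region as the tool's prediction, not to reproduce it.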
Right, so the logical thing to ask is: OK, if we play around with the degrees of freedom that we have, is there any way we can get this trial back on track? Could we do something like an ad hoc sample size increase? Because, remember, recruitment itself is not our issue.
Certainly, you would have to go through the regulator and your DMC to do this, because you could potentially bias your trial. If you're adding people to your trial, and the event process is more aggressive in the initial stages, so, say, the difference between the control and the treatment group is more obvious at early stages of disease, then you could be making your trial look more optimistic than it actually is. Whereas if you were to choose instead to increase the length of time you wait for the 374 or 375 events to occur, you might be biasing more towards the latter end of the range. There's a very good paper by Freidlin and Korn about sample size increases in the context of clinical survival trials and how they could potentially bias things. But for now, we'll assume that you have those degrees of freedom, and that you're going to focus on perhaps increasing the sample size a little bit.
And I've pre-done a calculation here that would perhaps solve the problem. Let us assume that we were going to be more aggressive in recruitment going forward. We would open up a bunch of new sites and get people on board very quickly. We kind of had these in reserve.
So the recruitment rate for the final, well, effectively in this case four months or so, we're going to increase by about 50% over the roughly 20 per month that we had initially. And then we're going to increase the target sample size from 460 to 540. So we're going to increase the overall sample size requirement by 80, basically.
Remember, our initial target sample size from the calculation was 460. We're now bumping this up a bit. We're going to assume that we can't really change the event process, since the entire point of the trial is to understand it and get an accurate inference about it. So we're using the degrees of freedom that we potentially could have, which are increasing the sample size and being more aggressive in recruitment, and we're seeing if that could potentially get our trial back on track.
Obviously, this is a pre-done calculation. You could go through a lot of scenarios and see how they each do. But take this particular scenario, where we follow this very aggressive plan and get it approved by the regulator because, say, they believe that the event rate, or the difference between the two groups, is constant over time, which is consistent with the exponential assumption. In that case, it's not as problematic to do something like this from a statistical inference point of view.
And you can see here that if we did this, we would end up with a total study length that is much closer to what we had initially. It ends up being around 29.97 months, and the accrual duration only had to be increased by about 1.5 months to achieve that. So we've recruited 80 extra people, which is obviously not a trivial matter. We've had to be more aggressive to get more people into the trial quicker. Obviously that's highly non-trivial, and it very much depends on the context of your trial. But if you were able to do that, then you could get this trial back on track, or at least you would expect it to be back on track. And now you have a plan of action for what you're going to do going forward.
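The accrual side of this scenario can be checked with simple expected-value arithmetic. The sketch below is in Python and uses my own variable names. It assumes the 50% increase is applied to the observed rate of 19.4 per month (the webinar says "about 50%" over "roughly 20 per month", so this is an approximation) and that the boosted rate applies immediately from month 20.7.

```python
now = 20.7                  # months elapsed at the interim analysis
enrolled = 402              # subjects recruited so far
observed_rate = 19.4        # subjects per month achieved so far
planned_accrual_end = 24.0  # months, from the original design

# Current plan: keep recruiting at the observed rate up to 460 subjects.
baseline_end = now + (460 - enrolled) / observed_rate

# Revised plan: raise the target to 540 and assume a 50% higher rate from now on.
boosted_rate = 1.5 * observed_rate
revised_end = now + (540 - enrolled) / boosted_rate

print(f"Accrual end, current plan: {baseline_end:.2f} months")
print(f"Accrual end, revised plan: {revised_end:.2f} months")
print(f"Extension beyond planned 24 months: {revised_end - planned_accrual_end:.2f} months")
```

Under these assumptions the current plan finishes accrual slightly before month 24, and the revised plan extends accrual by somewhere between one and two months, which is in line with the figures quoted in the webinar.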
And so, hopefully, that gives you an idea of where group sequential design and prediction can interact.
If you want to know the details of the prediction inputs, you can see them on this slide here. I inferred most of it from the original group sequential design and just rounded some things. For example, the sample size calculation gave 375 events, whereas this is actually 374 using the method that they use. They rounded their events instead of using the decimal values, and that's how you get the slight differences here, but in a practical sense they're more or less the same.
And if we had more time, I could go over the effect of sites and of blinded versus unblinded data, but I've covered that a lot more in previous webinars. So if you're interested, feel free to look at those. There's a link at the end of the slides where you can find all the previous webinars, in which I talk a lot more about this example from a prediction point of view.
So for our next section, we're going to look at sample size for prediction models. What we're talking about here specifically is clinical prediction models. These are models developed, usually using demographic or other biomarker-type data, to help you make better risk evaluations, better probability assessments, for diagnostic or prognostic purposes. Diagnostic means we're trying to find out whether you have the disease or not: using all these characteristics, can we predict, or give a probability, that you have the disease? Prognostic means: given these risk factors and other considerations, how likely is it that you're going to get the disease in the future?
So, how likely are you to have something happen to you going forward? More or less, they're doing similar things, but they can have quite different considerations. In the diagnostic case, you can get the outcome information quite quickly, because you can do the gold standard test and find out whether people actually do have the disease or not. Whereas for prognostic models, you need to follow up a cohort for a period of time, find out how well the model did on those people after follow-up, and use that to infer whether the prognostic predictions were actually accurate enough.
And the important thing with these models is that you don't just want a model that has high performance metrics, like R squared and things like that. You also want it to have good discrimination and good calibration. Discrimination is basically the ability of the model to rank diseased people higher than non-diseased people, assuming that higher means more likely to have the disease on whatever the scale of interest is, whether that's a probability or some kind of risk score. So that's discrimination: we want to be able to distinguish the two groups, or the multiple groups, of interest from each other for our diagnostic or prognostic purposes. But calibration is also very important, because we want to see how well the output, particularly the probability output, of our model corresponds to what actually happened.
So you could have a model that's good at discrimination, that always puts the diseased people higher than the other people, but where the probability it gave that you actually have the disease is a fair way off. That's obviously more problematic when you apply the model at the population level: you may be assigning more people to the disease group than there should be. That's where you want calibration to be good, and calibration is probably the most difficult part of developing a prediction model. Anyone can put in data and get an R squared out. I think most people can understand a comparison like, here's what the scores were on average for diseased people versus non-diseased people; were they higher for diseased people? Great, that's decent discrimination. Or you could do something like a 2 by 2 table.
Whereas for calibration, you need to map the statistical model's probabilities onto what actually happened, and then see whether they were accurate, at the most fundamental level, at the lowest level of prediction or output available from the model.
And, you know, prediction models have had a lot of success. The Wells score, QRISK, the Nottingham prognostic score: these are widely used, and they're growing ever more popular.
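To make the distinction between discrimination and calibration concrete, here is a small Python sketch on simulated data. Everything in it is hypothetical: the data-generating model, the "inflated" predictions, and the choice of the c-statistic and the mean predicted risk as summary measures. It shows two sets of predictions that rank subjects identically, and so discriminate equally well, while only one of them is calibrated.

```python
import math
import random

def c_statistic(probs, outcomes):
    """Chance that a random diseased subject gets a higher prediction than a
    random non-diseased subject (ties count as one half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(1)
n = 2000
# Hypothetical truth: one risk factor x, with true risk following a logistic model.
xs = [rng.gauss(0, 1) for _ in range(n)]
true_risk = [1 / (1 + math.exp(-(-1 + 1.5 * x))) for x in xs]
outcomes = [1 if rng.random() < p else 0 for p in true_risk]

calibrated = true_risk                       # predictions equal to the true risk
inflated = [math.sqrt(p) for p in true_risk]  # same ranking, probabilities too high

prevalence = sum(outcomes) / n
auc_calibrated = c_statistic(calibrated, outcomes)
auc_inflated = c_statistic(inflated, outcomes)
mean_calibrated = sum(calibrated) / n
mean_inflated = sum(inflated) / n

print(f"Observed event rate:       {prevalence:.3f}")
print(f"Calibrated model: c = {auc_calibrated:.3f}, mean predicted risk = {mean_calibrated:.3f}")
print(f"Inflated model:   c = {auc_inflated:.3f}, mean predicted risk = {mean_inflated:.3f}")
```

The two c-statistics agree because the square root preserves the ordering of the predictions, but the inflated model's average predicted risk is well above the observed event rate, which is the calibration problem described above.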
Every day, there are more and more papers proposing new prediction models for new disease types, across any area that you can think of, from cardiology to psychology to social studies. Basically, anywhere that prediction can happen, it's expected to happen. And obviously this follows on from the huge success that we've seen for prediction, in the predictive analytics and machine learning area, in the context of information technology, social media, and areas like that, where these approaches have been hugely successful in generating a lot of value, a lot of shareholder and financial value. And inevitably there's the question: if these approaches are so good at creating incredibly sophisticated prediction models in the context of ads and things like that, then why wouldn't we use the same type of approach for medical decision making?
Why wouldn't we use all this data we have available, all these medical databases from our various national authorities, to make better predictions, better tools to quickly find out how much someone is at risk, or how likely it is that they actually have the disease, without having to do more invasive procedures, which gold standard tests often are?
So we've seen this huge growth in the development of these models, but it has been well noted for several years now that the quality of the models has not quite been where you would want it to be. A lot of models are generated, but not much time is being spent making sure that these models actually stand up to scrutiny in real-world clinical settings. There's been a lot of work from a lot of great people on creating frameworks to hopefully produce higher quality prediction models. In particular, there's the TRIPOD statement, which was made 3 or 4 years ago at this point. It gives a very good rundown of everything that you should set out: basically a checklist of, I think, 21 items that you should be able to specify and describe.
The idea is to give a good description of how you're going to set up your study. Who are you going to be using for your study? What are the success criteria for your study? How are you going to assess calibration and discrimination? How are you going to publish your results? Are you going to make sure your results are all public? You know, transparency.
And that's been designed to give an easy-to-use framework that allows people to generate these models in a principled fashion, which means that an erroneous clinical prediction model that looks better than it should is less likely to get through the gate.
If we're talking about the issues involved, they talk about selection bias: which subjects you pick or select has a huge effect on how representative that clinical prediction model might be in the real world. There's very little validation of models, which is where you see how well the model performs out of sample.
So basically, say you have a thousand people and you develop your prediction model; then you see how it does for 100 actual patients going forward, to assess its prognostic quality. Very little of that happens, and the validation is very poor even when it does happen. The reporting of these models, in terms of being able to replicate them and understand what went into actually generating the model, is very poor. And then there are other standard statistical problems, like dichotomania: taking continuous measures and turning them into binary measures, basically throwing away a lot of useful information that could be used for prediction.
And when we're talking about prediction, we're talking about this more practical output, really. We're not worried as much about the inferential statistics, like the p-values and the confidence intervals. We're worried about: how well does this model discriminate diseased people from non-diseased people, or people who are likely to get the disease from people who are not likely to get it? How well do the probabilities that come out of it for an individual subject correspond to how likely that person actually is to get the disease? You assess that by taking a cohort of similar people, a subgroup, and seeing how they actually perform, like how many of them actually have the disease.
A very recent example was the COVID-19 prediction models, both prognostic and diagnostic. There was a review done by Wynants: 232 models were found, 226 of them were found to be at high risk of bias, and zero of them were well validated.
So there have been a lot of papers generated on the creation of prediction models for COVID-19, but the quality of that work is highly questionable based on this study.
So this is definitely happening here, but what are the low-hanging fruit in this area? Dealing with selection bias is kind of complex. You need to think about epidemiology, about things like confounding, collider bias, and so on. Validation requires you to have access to a whole suite of new information, new subjects to actually try your model on. Poor reporting is low-hanging fruit as well, and dichotomizing is appalling, but many of these things take more effort to fix.
One simple thing that researchers could be doing to improve their ability to know whether their model is worth developing, based on the amount of information that they have, is a formal sample size calculation. Traditional sample size calculations are based on statistical criteria such as power or interval width, which are inferential statistics. With prediction models, we're instead talking about discrimination, calibration, and model performance criteria. But these are still success criteria, so we can still get a reasonable idea of how likely we are to reach a minimum level of those criteria, using simulation or sample size calculations, rather than using what is done right now, which is ad hoc rules like 10 events per variable.
So if you imagine a binary model, yes or no, disease or not disease, then the rule of thumb is that you need 10 diseased people per variable. And that was found to be wholly inadequate for the actual purposes that the authors claimed it should be used for. There was a lot of good work done by people like Maarten van Smeden in terms of elucidating why that is.
And then, in developing calculations that more accurately reflect that, people like Riley ... have been really involved in generating a lot of sample size calculations that actually target the things you should be targeting if you want an effective prediction model. Now, to be clear, we're focusing here on prediction models that use more, I suppose, traditional multivariable GLM regression-type models: linear; logistic for binary data; Cox if we're talking about survival. These are the standard tools that you're very familiar with from statistics. But rather than using these for p-values and inference, although obviously you can still do that if you want to, we're focusing here on their performance in terms of being able to output a probability of survival, a probability of disease, or a predicted value of a continuous measure, like their weight or something like that.
So how good is this model at making predictions about future samples, based on the multivariable model that we created using all of the variables that we know about: age, gender, starting weight, socioeconomic factors, et cetera?
And these authors came up with four important criteria that should all be fulfilled if you want to have a good prediction model, or at least a higher chance of having a good prediction model. The first is the mean performance, which is basically: how well does this model do for the average person? You can use relatively easy measures, such as the margin of error from your confidence intervals, to assess that. Individual performance is an attempt to assess how well the model does over the whole suite of individual subjects, not just for the average person but also for the people who are out on the extremes, who are more unusual. You could even go into subgroups if you wanted to go further, but basically it's how well the model does over the entirety of the population. In this case, we're looking at things like the mean absolute prediction error or, in the case of continuous data, the residual variance.
That's just some measure of how well the model does overall across individuals in aggregate. Obviously you could, in some cases, look at certain subgroups individually and see whether they hold up, particularly if those happen to be more important, but we're focusing more on the macro level of individual performance.
And then there are two other concepts that we need to think about: overfitting and optimism adjustment. Overfitting is basically dealing with the fact that our model is built on the data we have available, so inevitably there's going to be some overfitting to what's in that actual dataset. Our model will almost certainly work best on the dataset that it was built on, compared to other datasets. I think, logically, that makes sense. But you can use shrinkage or ... to deal with that. There are various ways you can do this, in terms of removing variables and things like that. But basically, if you know that a priori, you can apply an appropriate shrinkage factor to bring the estimates back towards the mean values, so that the model doesn't get pulled too much by outliers and so on.
That creates a model which is more likely to give better average performance out of sample, in validation samples, for example, or in the actual real world. Overfitting is highly common and highly problematic, so you should at least make some attempt to deal with the effect of overfitting and with your optimism. You do that by making adjustments a priori for your model, rather than just playing it by ear and saying, well, this model is overfitting a bit, it's a bit optimistic, it's fine, we'll just deal with that. You can actually account for it at the sample size and planning stage of the development of the prediction model.
And you could do that using the optimism-adjusted R squared. There are also more methods proposed for sample size for updating your model, so what sample size allows you to update your model once you have new data available, and for the external validation of the model: how many people do I need in my external validation set, my out-of-sample set, to validate my model? There's a lot more work going on in this area. It's a very active area of research, and very interesting. So if you're interested, certainly look at that.
Just to say, here on the right there are the equations for the four criteria for the logistic, or binary, model. There's the margin of error. For the mean absolute prediction error, this equation is taken from a simulation study. For the overfitting sample size requirement, you can see the R squared here; this is your ... R squared. And then you have your optimism-adjusted shrinkage, which basically is just inserted into this equation to get the n required for your optimism adjustment.
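The slide itself is not reproduced in the transcript, so the following Python sketch is my reconstruction of the four binary-outcome calculations as I understand them from the published work by Riley, van Smeden and colleagues; it should be checked against the original papers before use. The function names are my own, the R squared input is taken to be an anticipated Cox-Snell R squared (an assumption on my part, since the transcript elides the name), and the example inputs at the bottom are purely hypothetical.

```python
import math

def n_overall_risk(prevalence, margin=0.05):
    """Overall outcome risk estimated to within +/- margin (95% confidence)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))

def n_mape(prevalence, n_params, target_mape=0.05):
    """Target mean absolute prediction error, using a simulation-based
    approximation (reported as suitable for up to about 30 parameters)."""
    log_n = (-0.508 + 0.259 * math.log(prevalence)
             + 0.504 * math.log(n_params) - math.log(target_mape)) / 0.544
    return math.ceil(math.exp(log_n))

def n_shrinkage(r2_cs, n_params, shrinkage=0.9):
    """Expected uniform shrinkage factor of at least `shrinkage`."""
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

def n_optimism(r2_cs, n_params, prevalence, delta=0.05):
    """Apparent and adjusted Nagelkerke R squared differing by at most delta."""
    ln_l_null = (prevalence * math.log(prevalence)
                 + (1 - prevalence) * math.log(1 - prevalence))
    max_r2_cs = 1 - math.exp(2 * ln_l_null)
    implied_shrinkage = r2_cs / (r2_cs + delta * max_r2_cs)
    return n_shrinkage(r2_cs, n_params, shrinkage=implied_shrinkage)

# Hypothetical inputs, for illustration only.
prevalence, n_params, r2_cs = 0.3, 10, 0.1
criteria = {
    "overall risk": n_overall_risk(prevalence),
    "mean absolute prediction error": n_mape(prevalence, n_params),
    "shrinkage": n_shrinkage(r2_cs, n_params),
    "optimism in R squared": n_optimism(r2_cs, n_params, prevalence),
}
for name, n in criteria.items():
    print(f"{name}: n = {n}")
print(f"Required sample size (maximum): {max(criteria.values())}")
```

The final sample size is the largest of the four, so whichever criterion is most demanding for the inputs at hand drives the answer.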
So I'm going to take a very brief example here of calculating sample size, from one of the foundational ... papers in this area, from Riley and Davis. They took another paper, by Hudda, which developed a prediction model for predicting fat-free mass. They had a target adjusted R squared of 90%, and they expected a mean of 26.7 kilograms and a standard deviation of 8.7. And you can see that for their model, they hoped to have 20 predictors to accurately predict what your value was going to be on the scale of fat-free mass in children from 4 to 15.
And they saw that, for this case, you would need 254 people to develop a model that they feel would be likely to be useful for making predictions in this domain.
So in terms of how this works in nQuery, there's not really much to go through here, because this is effectively a very simple nQuery table.
So if we go to nQuery, and we go to the compute minimum sample size option here. Just to note, this isn't a table in nQuery right now. It will be included in our 9.1 update, which we're expecting to be released at the end of November. So this isn't available in the software right now; that's the disclaimer. But it's going to be available in approximately a month. This is a preview build, obviously just to get you interested in what we'll be doing in our next update, which will focus mostly on the standard nQuery tables, as opposed to 9.0, our recent update, which focused on the Predict module that we used previously.
So because it's relatively simple, we'll just quickly go through it. We need our adjusted R squared. So this isn't the raw R squared that we expect from the model; it's an adjusted R squared. We can see what that means here on the right-hand side. It's the adjusted version of the ... R squared, which in the linear case is really just the normal R squared: the percentage of the variation explained by the model. So in this case, we're just using the adjusted R squared.
We have 20 candidate parameters. So just to mention here that we're using approaches that might remove some of these parameters if they happen to be not very useful for making our assessment.
We have our target level of shrinkage, which is 0.9.
We have our model intercept, which is equivalent to the mean value in this case, and that's equal to 26.7. So the intercept and the mean are basically the same thing.
In this case, with a standard deviation of 8.7, you can then see that we need this multiplicative margin of error. This is set to 1.1. This is just kind of a common default suggested by the original authors. So you can see the definition here. It's the maximum margin of error in the estimation of the intercept and the residual standard deviation.
So 1.1, or 10%, would be the most commonly used value for this.
You can see here, the target level of shrinkage is used to adjust predictor effects to account for overfitting.
So, you can see here that we end up getting 254, which is what we saw from the original paper. And you can see that the four criteria that we used are explained here as well. In my slides, like, here's the four criteria; we're basically taking the maximum sample size from these four calculations. So, if we very quickly go back to our slides, go back a slide here.
There's four equations. These are for logistic, but they're quite similar for linear regression. You know, we have four criteria. We're going to do all four of the sample size calculations, and the one which has the maximum sample size is the one that we end up actually using. For our purposes, in this case, I believe it's the fourth criterion: the optimism-adjusted sample size. The shrinkage one actually ends up at the same sample size as well. And you can see here that's around 12.7, or nearly 13, people per predictor, which you can see is a little bit above the kind of 10 per predictor rule that people use traditionally, albeit that was for logistic regression.
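As a rough check on the 254 and 12.7 figures quoted above, here is a minimal Python sketch. It is not the nQuery calculation and it covers only one of the four criteria: it assumes the commonly cited rule that a multiplicative margin of error of about 1.1 for the residual standard deviation needs at least 234 + p participants, where p is the number of candidate parameters. The function and variable names are made up for illustration, and which of the four criteria this corresponds to in nQuery's numbering is not assumed here.

```python
import math


def min_n_residual_sd(n_parameters: int) -> int:
    """Assumed rule of thumb: n >= 234 + p for a multiplicative
    margin of error of about 1.1 in the residual standard deviation.
    The constant 234 is tied to that 1.1 margin; other margins differ."""
    return 234 + n_parameters


# Inputs quoted in the webinar example
n_parameters = 20    # candidate predictor parameters
sd_outcome = 8.7     # standard deviation of fat-free mass (kg)
r2_adjusted = 0.90   # anticipated adjusted R squared

n_min = min_n_residual_sd(n_parameters)

# Participants per candidate parameter, to compare with the "10 per predictor" habit
subjects_per_parameter = n_min / n_parameters

# Residual SD implied by the anticipated R squared (approximation: SD * sqrt(1 - R^2))
residual_sd = sd_outcome * math.sqrt(1 - r2_adjusted)

print(f"minimum n (residual SD criterion only): {n_min}")
print(f"participants per parameter: {subjects_per_parameter:.1f}")
print(f"implied residual SD (kg): {residual_sd:.2f}")
```

Under these assumptions, the sketch reproduces the 254 participants and the roughly 12.7 participants per predictor from the example. The shrinkage and intercept criteria are left out on purpose, so it should not be read as the full minimum sample size calculation.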
But I assume it's kind of similar for linear, because people have kind of transposed that to other models erroneously. You see, the actual level of shrinkage is a little bit higher than the target level of shrinkage.
But that's not a disaster, and you can see that the confidence interval for the intercept slash mean is around 26.58 to 26.82, a relatively tight margin of error, I would say, there. So, you would probably be happy with the performance here for our sample size of 254. And you can see here that we get a full output, kind of, summary statement of what happened here, as well.
In nQuery, there will also be similar tools for the case of binary outcomes with logistic regression, you know, slightly different inputs here, but the principles are more or less the same. And then, for the survival Cox regression model as well, there are similar ideas going on here. Each one has its own distinct ideas available, but the overall principles remain the same across these, unsurprisingly, considering that these are within, you know, the GLM, the Generalized Linear Model framework. I suppose we're kind of all familiar with that modeling approach to these types of problems these days. So this will probably be talked about in future webinars. So, if you're interested in this topic, please get in touch and please let me know.
I'll be happy to let myself, or some of my colleagues who will be doing future webinars, know that we may want to look into this subject in depth in future. So, this is just kind of a taste, a preview of what's coming up in an upcoming release. I just thought it'd be interesting, and it kind of fits with the other prediction stuff that we talked about earlier. This is kind of an area where, you know, the ideas of sample size and prediction are intersecting and kind of finding overlap here in ways which I find interesting, even though these two areas, you know, maybe upfront, you may not have considered to be natural partners for each other.
So, to finish up: sample size and prediction are important aspects of clinical trials, and there are areas of interaction despite their different kind of objectives and perspectives. Milestone prediction provides a practical view on when interim analyses, for example, are likely to happen, or when the study is likely to end. Enrollment prediction can help us find when the interim analysis is going to happen, for example, if we're looking at mean data or binary data. But if we're looking at survival data, we need to go one additional level and look at event prediction, which builds on the enrollment prediction, which itself builds on having to model stuff like dropout. Now, we're adding in the effect of, like, the actual survival process itself.
So, there's a lot of additional modeling needed. And we also saw that there are sample size methods now being developed for the production of clinical prediction models. It's part of a much wider suite of, you know, guidance that has been put out there to help create more robust clinical prediction models, which have a higher probability of actually succeeding in real-world clinical scenarios, given how many of them are failing, right? COVID-19 clinical prediction models are an obvious failure case, it feels like, at the moment. So we're talking about these important model performance criteria, and kind of making sure that these reach the thresholds of interest. Just to mention that we only talked here about, I suppose, the model performance criteria, like, how well we want this model to do.
But if we're talking about, like, say, validating the model, there are methods available for that, and, you know, as this area expands, I imagine there will be a lot more sample size calculations that look more specifically at things like discrimination versus calibration and stuff like that. But for now, we're focusing on these kind of model performance criteria, on criteria to help us make the model more robust against overfitting and over-optimism, and to get one that's probably more representative, one that will actually work in the real world and hasn't just been perfectly tuned to perfectly predict the data we already have, which isn't a very useful solution.
So if you have any questions after this webinar, feel free to get in touch at info at statsols dot com.
And if you don't have access to nQuery, or you don't have access to one of the additional tiers of nQuery, like adaptive design, feel free to go to ... dot com forward slash trial, where you can try nQuery for free for 14 days using just your e-mail address. Nothing more. And you'll be able to use it within your browser while you're doing that. And, as I mentioned, if you want to see previous webinars or previous training, or other videos that I've done, or my colleagues have done, you can go to ... dot com, forward slash start. I'll just mention that all the references used for this demonstration are available at the end of this slide deck, which will be sent to you very shortly after this webinar.
I will be answering your questions via e-mail. Hopefully, you sign up for future webinars, and I hope you have a good day. So thank you so much, and goodbye.