Engineering the Emergency Room, Part IV
So, I think we’ve come to the final installment here. In part one we talked about what a system is, and why an emergency room (ER) is one. In part two we described how to actually build a simulation of an ER, and in part three we discussed the validation of the model. Today, I’m going to describe how to do simulated experiments, and relate that information back to the real world.
So, if we were careful about how we built our model, and we built a fairly finely grained one, and we validated it well, then we can perform basic experimentation. Here we again run into a problem with how this is often done in the literature: frequently, these systems aren’t used for true scientific experimentation. They’re just deployed, people fiddle with the parameters, and then report the results of the fiddling. That’s fun, and can even occasionally be informative, but it isn’t scientific experimentation.
To do true experimentation, we need to think like scientists instead of engineers. First, we sequester one copy of our simulation and use it as a control. We then need to formulate a hypothesis about our system, and determine an output metric which is both informative and reproducible with the simulation we have. It won’t do to use, for example, medical error rates as our output measure unless they, and some means of influencing them, are explicitly coded into the simulation. People sometimes get this wrong: they report medical error rates as their outcome when the simulation actually measures something like average required physician shift duration as a proxy (tired people make more mistakes).
So, a well-formed hypothesis might look something like this: increasing the number of treatment beds in the ER will reduce overall patient time in system by 10%. Then, we take our sequestered control simulation and run it a large number of times with varying random number seeds. I can’t give you a particular number that is sufficient, because that’s going to depend on the dynamics of the system. In a small tertiary facility that doesn’t see a lot of patients and has a narrowly distributed time-in-system, it might be reasonable to run the simulation only 25 times. A large facility with very diverse patients might require more than 100 runs.
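As a sketch of what that replication loop looks like, here is a toy stand-in for a validated ER model. To be clear: the bed count, arrival rate, and service distribution below are invented for illustration, not taken from any real ER, and a real model would be far more finely grained.

```python
import random
import statistics

def simulate_er(seed, n_patients=500, n_beds=10):
    """Toy stand-in for a validated ER simulation: returns mean
    patient time-in-system (minutes) for one replication.
    All parameters here are invented for illustration."""
    rng = random.Random(seed)           # one recorded seed per run
    free_at = [0.0] * n_beds            # time each bed next comes free
    clock = 0.0
    times = []
    for _ in range(n_patients):
        clock += rng.expovariate(1 / 8.0)     # ~8 min between arrivals
        bed = min(range(n_beds), key=lambda i: free_at[i])
        start = max(clock, free_at[bed])      # wait if all beds busy
        free_at[bed] = start + rng.expovariate(1 / 70.0)  # ~70 min treatment
        times.append(free_at[bed] - clock)    # wait + treatment
    return statistics.mean(times)

# Control group: many replications, each with its own recorded seed,
# so every run can be reproduced exactly later.
control = [simulate_er(seed) for seed in range(100)]
print(statistics.mean(control), statistics.stdev(control))
```

The point of fixing and recording the seeds is reproducibility: any single run can be re-created exactly, and the experimental copy of the model can later be driven by the same seeds.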
But wait! I hope some of you have spotted some problems already. Not all simulation runs, even of the control, are created equal. First of all, there’s the problem that when the simulation begins, the ER is empty. This rarely happens in real life. Therefore, simulations are generally given a burn-in period, during which no data is collected, to ensure that the system is operating at steady state rather than in the midst of a transient response. For ER simulations, three days is generally sufficient. Second, we need to decide how long a simulation run lasts. Do we simulate a week of ER activity? A month? A year? There’s no hard and fast answer. But generally, as in life, longer simulations will produce smoother results. If I run a simulation 100 times for a year, I’m going to get a much nicer distribution of outcome measures than if I run the same simulation 100 times for a week, because random and rare phenomena have a smaller impact on the output.
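A minimal sketch of how a burn-in period works in code, again with made-up parameters: the model starts empty, so we simply refuse to record any patient who arrives before the three-day mark, and only collect data once the system has (we hope) reached steady state.

```python
import random
import statistics

BURN_IN = 3 * 24 * 60.0    # three simulated days, in minutes
HORIZON = 30 * 24 * 60.0   # then one simulated month of data collection

def run_with_burn_in(seed, n_beds=10):
    """Toy queue model with a warm-up period: patients arriving
    during the transient (the first BURN_IN minutes) are simulated
    normally but excluded from the output statistics."""
    rng = random.Random(seed)
    free_at = [0.0] * n_beds
    clock = 0.0
    recorded = []
    while clock < BURN_IN + HORIZON:
        clock += rng.expovariate(1 / 8.0)         # next arrival
        bed = min(range(n_beds), key=lambda i: free_at[i])
        start = max(clock, free_at[bed])
        free_at[bed] = start + rng.expovariate(1 / 70.0)
        if clock >= BURN_IN:                      # transient: don't record
            recorded.append(free_at[bed] - clock)
    return statistics.mean(recorded)

print(run_with_burn_in(1))
```

Note that the burn-in patients still occupy beds and generate queues; they are simulated, just not counted, which is exactly what lets the recorded data start from a loaded system rather than an empty one.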
So. We have a hypothesis, we decide how long to burn in, how long to simulate (often the same length of time as the real-world data collection), and how many instantiations of the simulation to do. We assiduously record all of this data (science!) and then run our experimental group: we modify a different copy of the simulation, perform the same simulation runs with the same burn-in and the same duration of simulated time, and record that data.
So, now, we have three groups of data. The original real-world data, the simulated control data, and the simulated experimental data. Now, we can do two hypothesis tests (we should already have validated the simulation against the real-world data). We compare the simulated experimental data to both the real-world data and the control data. Hopefully, we will find a statistically significant reduction in patient length of stay at or near the level we hypothesized.
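The control-versus-experimental comparison is typically a two-sample test on the replication outputs. If SciPy is available, `scipy.stats.ttest_ind(control, experimental, equal_var=False)` does this in one call; here is a stdlib-only sketch of Welch’s t statistic instead. The replication outputs below are invented numbers, purely to make the example runnable.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of
    freedom (Welch-Satterthwaite), for samples with unequal variance."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical replication outputs: mean time-in-system, in minutes.
control      = [212, 198, 205, 220, 201, 208, 215, 199, 210, 204]
experimental = [188, 179, 195, 183, 190, 176, 185, 192, 181, 187]

t, df = welch_t(control, experimental)
print(f"t = {t:.2f} on ~{df:.1f} df")
```

The same test is run twice, per the post: experimental vs. control, and experimental vs. the real-world data (the control vs. real-world comparison having already been done during validation).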
Now, this is where the vast, vast majority of papers in this field end. Because it is generally, as I’ve discussed, impossible to conduct the experiments in the real world. And even when we have a strong, evidence-based recommendation, as the above result would be, it’s often hard to get buy-in from administration and staff to make the changes in the real world exactly as they were done in simulation, in order to achieve the same results. Generally speaking, when administration agrees to take the difficult and arduous step of intervening in a complex system like an ER, they want to make all their changes at once, so as to disrupt the system for as short a time as possible. This can make it hard to check our results.
So, why not just simulate the whole shebang of potential changes that the administration is interested in implementing? Why not rearrange the whole simulation, and examine the new system entirely? Well, you can. But I am much more skeptical of the results. Because these systems are very complex. And making changes to the simulation always introduces some error. The real-world intervention is never going to look precisely like the simulated intervention. And so, when we make a large number of interventions at once, we accumulate small errors, and introduce new system behaviors which may or may not be replicated in the real world. These simulations are generally very accurate for predicting the consequences of changes in isolation. They are generally less good at determining the results of wholesale reorganization.
So there. Dr24hours’s four-part primer on engineering the emergency room. I hope I haven’t wasted a lot of your time! If you have any questions, leave them in the comments, and I’ll get around to it eventually. Probably!
I would like a case study where a model suggested room for improvement, an ER took it up, and it succeeded or failed. Case studies FTW.
I would also like to state that bronironi was simply a word I made up when first asked to create a gravatar. Since I didn’t know how it worked yet, I didn’t want my name there. Now it never lets me change it and I die a bit of mortification with every post. Also, created it years before bronies existed.
I recently published the very first (to my knowledge) paper on exactly that.
That was the first paper on that? SERIOUSLY? How is it possible that no one is doing this sort of stuff and publishing it?
For exactly the reasons in the post: it’s very hard to directly compare the real world pre- and post- intervention to the simulation pre- and post- intervention. It’s hard to get the parameters exactly the same. Also, most simulations being produced out there suck balls. I’ll start my post-pub peer review feature soon.
But that’s science, isn’t it? I mean, maybe this is a social science/engineering debacle. It’s really hard to get good data when you’re dealing with people. Social scientists are used to that – we do what we can with it and acknowledge the shortcomings (in archaeology it is worse because we don’t know what we’re missing, or how much we’re missing). But we’re okay with that. You have to start somewhere – so you have to show the world what you did do, even if it’s not ideal. Otherwise we’re never going to get anywhere.
Sure. And there are other things in the pipeline. I know U Kentucky-Chandler hospital did an intervention based on DES. I hope they publish the results.
As a silly virologist with a public health background, I’m reading these posts with interest, but I feel like I don’t really have all the tools I need to understand them. Can we back up a bit? Can you explain simulations a little better? All I’m getting is that it’s a math problem that you chuck information at.
Gotcha. Post upcoming next week about what is Discrete Event Simulation, and what it’s for. Maybe with an example!