Engineering the Emergency Room, Part III.

26 April 2012

In the past two posts on this subject, we talked about why emergency rooms are systems and described how to build the model of the system. Today’s post will focus on how to validate the model, and perform experiments with it. Because having this nice shiny model is wonderful, but how do we know it works? That is, how can we be sure that we’ve accurately captured the real world?

Determining this is called the validation process. Validation is the least well understood and least well performed aspect of discrete event simulation model building, to the extent that there are papers, published in fine journals, which do not discuss validation even in passing. This is inexcusable. Every publication needs to have a comment on the validation of a discrete event model, so that other interested parties can determine how effective the researchers have been at capturing the system. If they haven’t done it well, then there’s no reason to believe the outcomes of their model.

Unfortunately, there are no best practices. There are not even any basic standards. And in the medical world, we frequently have physicians and undergraduates using very sophisticated tools and coming to important conclusions with unvalidated or badly validated models. Separating those results from good results driven by experienced engineers can be extraordinarily difficult. So, I am working (with a physician collaborator) on developing standards for what constitutes a well-validated model.

First, there are three basic forms of validation: (1) face validity, (2) internal validity, and (3) external validity. Face validity is essentially just the eyeball test. You show the model to the people who work in the system, you show them what it does and how it works, you point out which little icon represents them, and you ask them if the system looks right. There are many papers in which, if validation is mentioned at all, this is the sole method. It’s better than nothing, but totally insufficient. The second form of validation is the model’s internal validity. This answers the question: “does our model do what we think it does?” This is accomplished in two basic ways: first, by code review, going through line by line, debugging, and making sure that everything flows in the model the same way that it does in the flowcharts developed last time; and secondly, by performing system stress tests.

System stress tests consist of overwhelming the system. Discrete event simulations basically model complicated queueing systems. Not exclusively, but largely. And an emergency room is essentially just a very complicated queueing system. There are a large number of queues, in both series and parallel, which interact in often non-obvious ways. And any queueing system can be forced into instability* by overwhelming it with arrivals. But, for internal validity, all those entities who succeed in making it to the head of the queue should be processed normally, without deviations in flow.

*For a queueing system, instability is defined along similar lines to other engineering systems. Engineering systems are generally described as “unstable” if the output (or some internal state) of the system is unbounded over time. For queueing systems, this means the actual queue, the line for service, increases without bound as time goes on.
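As a rough illustration of the stress test idea, here is a minimal sketch using a single-server queue with exponential interarrival and service times. This is a toy stand-in, not the emergency room model from this series; the function name `simulate_queue` and all of the rates are assumptions chosen for illustration. The point is only that when arrivals outpace service, the wait grows without bound, while the service times of the entities that do reach the server should look the same as in the unstressed run.

```python
import random

def simulate_queue(arrival_rate, service_rate, n_entities, seed):
    """Toy single-server FIFO queue (hypothetical stand-in for a real model).

    Returns the time each entity waited in line and its service time.
    """
    rng = random.Random(seed)
    clock = 0.0           # arrival time of the current entity
    server_free_at = 0.0  # time the server finishes its current job
    waits, services = [], []
    for _ in range(n_entities):
        clock += rng.expovariate(arrival_rate)
        start = max(clock, server_free_at)   # wait if the server is busy
        service = rng.expovariate(service_rate)
        server_free_at = start + service
        waits.append(start - clock)
        services.append(service)
    return waits, services

def mean(values):
    return sum(values) / len(values)

n = 20000
# Normal load: arrivals at half the service rate (assumed numbers).
normal_waits, normal_services = simulate_queue(0.5, 1.0, n, seed=1)
# Stress load: arrivals at twice the service rate, so the queue is unstable.
stress_waits, stress_services = simulate_queue(2.0, 1.0, n, seed=2)

quarter = n // 4
print("normal wait, first vs last quarter:",
      mean(normal_waits[:quarter]), mean(normal_waits[-quarter:]))
print("stress wait, first vs last quarter:",
      mean(stress_waits[:quarter]), mean(stress_waits[-quarter:]))
print("mean service time, normal vs stress:",
      mean(normal_services), mean(stress_services))
```

In the stressed run the late waits should dwarf the early ones, while the mean service time stays near the same value in both runs. If the service behaviour changed under load in a model where nothing was built to make it change, that would be a sign of an internal validity problem.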

Lastly, there is external validity. This is the really crucial element. This is the element that answers: do the outputs and performance measures of our simulation match the real-world outputs under similar input conditions? So, for an emergency room simulation, this means we generate an arrival stream of patients whose characteristics are distributed identically to those of the real-world arrival stream (in terms of severity, laboratory/imaging needs, frequency of arrivals, etc.), and then measure various outputs and determine if they are distributed the same as in the real world. I believe that this should be done with traditional statistical hypothesis testing. So, for example, take the time in the system for simulated patients and real-world patients, and perform t-tests to determine if they belong to the same population. In these circumstances, we will generally be seeking non-significance.
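To make the t-test comparison concrete, here is a minimal sketch using only the Python standard library. It computes Welch's t statistic (the unequal-variance form, which is my choice here, not something prescribed above) and approximates the p-value with a normal distribution, which is only reasonable for large samples. In practice one would more likely call `scipy.stats.ttest_ind` with `equal_var=False`. The data below are synthetic placeholders drawn from made-up gamma distributions, not real patient times.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic, its degrees of freedom, and an approximate p-value.

    The two-sided p-value uses a normal approximation (large samples only).
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na   # squared standard error of mean a
    vb = variance(sample_b) / nb   # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p

rng = random.Random(42)
# Synthetic "time in system" samples in minutes (hypothetical numbers).
real_times = [rng.gammavariate(4.0, 45.0) for _ in range(500)]
sim_times = [rng.gammavariate(4.0, 45.0) for _ in range(500)]
# A deliberately mismatched simulation, for contrast.
bad_sim_times = [rng.gammavariate(4.0, 30.0) for _ in range(500)]

for label, sample in (("matched", sim_times), ("mismatched", bad_sim_times)):
    t, df, p = welch_t(real_times, sample)
    print(f"{label}: t = {t:.2f}, df = {df:.0f}, p = {p:.4f}")
```

The mismatched sample should produce a very small p-value. For the matched sample, a large p-value is the outcome we are hoping for, keeping in mind that non-significance is a failure to detect a difference rather than proof that none exists.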

Another way of looking at external validity is to look at interstitial queues and compare them to the real-world queues. So for example, look at the number of people in the waiting room who are post-triage but pre-ER bed. The simulation should not have this waiting time prescribed. It should be a consequence, as in the real world, of system dynamics. Then, compare the hourly census of these patients to the hourly census of the entities in the simulation. These curves should match up well. However, there’s no definition of “match up well”, and I’m not sure I can propose one now. In fact, I’m sure I can’t.
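For what it’s worth, building the hourly census itself is straightforward. The sketch below counts how many entities are in the waiting room at each hour mark, given the times they entered and left it. The intervals are made-up numbers, and the mean absolute difference at the end is just one arbitrary way to put a number on the gap between the curves; it is an assumption for illustration, not a proposed definition of “match up well”.

```python
def hourly_census(intervals, n_hours=24):
    """Count entities present at each hour mark.

    intervals: (enter, leave) pairs, in hours since midnight.
    An entity counts toward hour h if enter <= h < leave.
    """
    census = [0] * n_hours
    for enter, leave in intervals:
        for hour in range(n_hours):
            if enter <= hour < leave:
                census[hour] += 1
    return census

def mean_abs_difference(curve_a, curve_b):
    """One arbitrary discrepancy measure between two census curves."""
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) / len(curve_a)

# Hypothetical post-triage, pre-bed waiting intervals (hours since midnight).
real_waiting = [(8.5, 10.2), (9.1, 9.8), (9.5, 12.0), (13.0, 13.4)]
sim_waiting = [(8.7, 10.1), (9.2, 10.5), (9.6, 11.5), (13.2, 13.9)]

real_census = hourly_census(real_waiting)
sim_census = hourly_census(sim_waiting)

print("real census, hours 8-13:", real_census[8:14])
print("sim census,  hours 8-13:", sim_census[8:14])
print("mean absolute difference:", mean_abs_difference(real_census, sim_census))
```

Note that a stay which begins and ends between two hour marks never shows up in an on-the-hour census, which is one reason the choice of sampling interval matters when comparing these curves.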

One basic aspect of simulation dynamics vs. real-world dynamics is that the simulation is necessarily a simplification. There’s no way to capture every aspect of the real-world system. So we basically try to capture 90%-95% of the processes, all of the big, common, and time-consuming ones, and accept that there’s no way to anticipate or characterize everything that’s going to happen in such a complex system. This is also true of climate science, economics, and social systems design. We simply cannot model all the factors. We’d need to model the whole universe at the subatomic level. It can’t be done. So we make deliberate, intelligent, justified exclusions.

And so, discrete event simulations of human interactive Hybrid Dynamic Systems tend to (though do not universally) outperform their real-world counterparts. And that’s fine. As long as they consistently, reliably, and predictably outperform them, they are still valid. So, if my simulated entities are reliably processed 8% faster than the real-world patients, that still allows me to draw conclusions about the real world. And if I make a change to the simulation which induces a significant change in the distribution of my entity outputs, then I can postulate with reasonable confidence that the same change in the real world would result in the same change in the real-world distributions, modulated by that 8% discrepancy.
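As a small worked example of “modulated by that 8% discrepancy”, here is a sketch that treats the discrepancy as a constant multiplicative factor on mean time in system. That is a simplifying assumption on my part, and every number below is hypothetical. The helper names `bias_factor` and `project_to_real` are invented for illustration.

```python
def bias_factor(real_baseline_mean, sim_baseline_mean):
    """Ratio of simulated to real mean time in system at baseline."""
    return sim_baseline_mean / real_baseline_mean

def project_to_real(sim_value, factor):
    """Scale a simulated mean back to the real-world scale.

    Assumes the same multiplicative bias holds after the change.
    """
    return sim_value / factor

# Hypothetical baseline: real patients average 200 minutes, simulated
# entities average 184 minutes, i.e. the simulation runs 8% faster.
factor = bias_factor(real_baseline_mean=200.0, sim_baseline_mean=184.0)

# Hypothetical experiment: a change brings the simulated mean to 165.6 minutes.
projected_real_mean = project_to_real(165.6, factor)

print(f"bias factor: {factor:.2f}")
print(f"projected real-world mean after change: {projected_real_mean:.1f} minutes")
```

Under these assumed numbers, a 10% improvement in the simulation projects to a 10% improvement in the real world, from 200 to about 180 minutes. Whether the bias really stays constant after a change is itself something that would need checking.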

So. That’s a description of how to validate an emergency room simulation. I had been planning to write about experimentation today as well, but I think we’ve covered enough for one day. Up next: how do we conduct experiments using this tool? And for the love of god, WHY?
