We present two case studies here. The first is another application to medical records, in which we study patients' length of hospital stay. The second concerns graduate student academic records. Both studies focus on searching for specific groups of records and comparing groups to evaluate the analysts' hypotheses, although the medical case study is more mature.
4.1 Heparin-Induced Thrombocytopenia
Thrombocytopenia is a medical condition in which the platelet count in the bloodstream is low. Heparin, a drug used as an anticoagulant, is known to cause this adverse side effect in 0.5-10% of patients. Heparin-induced thrombocytopenia (HIT) is characterized by a sharp (usually greater than 50%) drop in platelet count within 5 to 9 days after the first administration of heparin. However, not every drop indicates HIT; a lowered platelet count can simply be a normal side effect of heparin. To ascertain whether a patient has HIT, an additional test, the HIT antibody test, is ordered. Unfortunately, the HIT antibody test has high sensitivity but low specificity. When the test returns negative, the patient is likely not to have HIT (98% accuracy), but when the result is positive, the test is only about 25% accurate. In clinical care, a hospital does not have the luxury of spending an additional 5-7 days on a more accurate test. Instead, patients whose HIT antibody test returns positive are treated as if they have HIT. A recent medical study of 22 patients showed that a hospital treating 50 HIT patients a year can incur costs of $70,000 to $1,000,000, and that HIT can lengthen each patient's hospital stay by at least 14.5 days. These patients can therefore increase the financial cost and stretch the resources of a healthcare facility.
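Why a test with high sensitivity but low specificity is reliable when negative yet weak when positive follows from Bayes' rule. Reading the "98% accuracy" of a negative result and the "about 25% accurate" positive result as the negative and positive predictive values is our interpretation. With sensitivity Se, specificity Sp, and prevalence p of HIT among tested patients (symbols introduced here for illustration):

```latex
\mathrm{PPV} = P(\mathrm{HIT} \mid +) = \frac{Se\,p}{Se\,p + (1 - Sp)(1 - p)},
\qquad
\mathrm{NPV} = P(\neg\mathrm{HIT} \mid -) = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
```

A low Sp inflates the false-positive term (1 - Sp)(1 - p) in the PPV denominator, while a high Se keeps the false-negative term (1 - Se)p small, so NPV stays high.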
Our physician partners at Washington Hospital Center are interested in verifying these results and seeing whether clinical care data differ from the study. In particular, they want to focus on patients who have been admitted to the intensive care units (ICU). Our collaborators could query for the relevant data in their medical database and perform the analysis that way with the help of the database administrator. However, they preferred to see these events in temporal order and to narrow the data down interactively, because the temporal ordering of events and the temporal constraints between them are central to determining whether a patient actually has HIT.
Over a period of one and a half months, the developer (Wang) and an additional case study observer (Plaisant) visited Washington Hospital Center three times to meet with the physician collaborators (Mukherjee, Smith) and the database administrator (Roseman). Each meeting lasted approximately two hours. Much work was devoted to understanding the medical database and cleaning it up to make it suitable for this case study. During this period, Roseman and Wang worked via email to obtain de-identified medical data and convert it into a format Lifelines2 accepts. Microsoft Amalga was the clinical information system that served as the data source for the de-identified data on which the heparin-induced thrombocytopenia case study was conducted. Its data-centric architecture enabled relatively easy extraction of the requisite dataset. Overall, more than 30 emails were exchanged between the developer and the collaborators to discuss the topic, to refine or add data, and to decide on logistics (when and where to meet, etc.). When all of us met face-to-face, we spent most of the time exploring the data together using Lifelines2. In the following exposition, "we" refers to everyone involved in the case study.
We obtained de-identified data on all 841 patients who visited the hospital and had a HIT test in calendar year 2008. For each patient, we have platelet counts as categorical medical designations (High, Normal, Low, Critical), HIT test results (Positive, Negative, Borderline), administrations of any of the 9 heparin variants, admissions to and releases from ICUs, and the hospital discharge code (Dead or Alive). The categories were further preprocessed to include higher-level categories. For example, platelet Normal and High events are treated the same in this investigation, so we created a new category, High/Normal (while also keeping the existing High and Normal categories just in case), to facilitate our exploration.
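The preprocessing step that adds higher-level categories while keeping the originals can be sketched as below. This is a minimal sketch, not the Lifelines2 implementation: the `Event` record, the category strings, and the `"Heparin:"` prefix used to recognize the 9 heparin variants are all hypothetical naming assumptions.

```python
from collections import namedtuple

# Hypothetical event record: one row per event in a patient's history.
Event = namedtuple("Event", ["patient_id", "time", "category"])

# Hypothetical mapping from original platelet categories to higher-level ones.
PLATELET_PARENTS = {
    "Platelet High": "Platelet High/Normal",
    "Platelet Normal": "Platelet High/Normal",
    "Platelet Low": "Platelet Low/Critical",
    "Platelet Critical": "Platelet Low/Critical",
}


def parent_category(category):
    """Return the higher-level category for `category`, or None."""
    if category in PLATELET_PARENTS:
        return PLATELET_PARENTS[category]
    # Assumption: heparin variants are labeled "Heparin:<variant name>".
    if category.startswith("Heparin:"):
        return "Heparin"
    return None


def add_derived_categories(events):
    """Keep every original event and add a copy under its parent category,
    so both the fine-grained and the merged categories remain queryable."""
    out = list(events)
    for e in events:
        parent = parent_category(e.category)
        if parent is not None:
            out.append(e._replace(category=parent))
    # Stable sort: each derived copy follows its original at the same time.
    out.sort(key=lambda e: (e.patient_id, e.time))
    return out
```

Keeping the originals alongside the derived copies mirrors the "just in case" choice described above: any filter can still target Normal or High individually.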
Fig. 6. The temporal summaries show discharge patterns (Discharged Alive in green, Discharged Dead in black) aligned by the first admission to ICU. In (a), raw event counts are shown, but the large disparity in the number of patients between the two groups makes them hard to compare. In (b), the counts are normalized by the number of patients. It is clear that patients in ICU-Hep-HIT+ tended to stay in the hospital longer than those in ICU-Hep-NoHIT+, where over 80% of patients were discharged within 1 month.
Fig. 7. Normalized hospital discharge data from 4 groups, aligned by first admission to ICU, are compared here (Discharged Alive in green, Discharged Dead in black). Each group is a subset of the one above it and a closer approximation to true HIT patients. We hypothesized that the closer the approximation, the more stretched out the discharge pattern should be. The first three groups seem to support our hypothesis, but the last one does not: it has far more Discharged Dead events than the others in the first month, which may be skewing the data.
From the original dataset, we filtered to find patients who were admitted to the ICU and were also exposed to heparin. This ICU-Hep group has 450 patients. We then applied another filter to divide this group into two subgroups: the 93 patients who had a positive HIT test (ICU-Hep-HIT+) and the 357 who had not (ICU-Hep-NoHIT+). The hypothesis is that the length of hospital stay differs between those who may have HIT and those who almost certainly do not. We aligned the patients by their first admission to ICU and compared the two groups to see whether there were noticeable differences in the distribution of discharge events (Figure 6 (a)).
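The filter, split, and alignment steps above can be approximated in a few lines. This is an illustrative sketch only, assuming each patient record is a list of `(time_in_days, category)` pairs keyed by patient id; the category strings `"ICU Admit"`, `"Heparin"`, and `"HIT Positive"` are hypothetical labels, not the hospital's actual codes.

```python
def has(events, category):
    """True if any event in the record has the given category."""
    return any(c == category for _, c in events)


def first_time(events, category):
    """Time of the first event of the given category, or None."""
    times = [t for t, c in events if c == category]
    return min(times) if times else None


def split_icu_hep(records):
    """Keep patients with an ICU admission and heparin exposure, then split
    them by whether they ever had a positive HIT test."""
    icu_hep = {pid: evs for pid, evs in records.items()
               if has(evs, "ICU Admit") and has(evs, "Heparin")}
    hit_pos = {pid: evs for pid, evs in icu_hep.items()
               if has(evs, "HIT Positive")}
    no_hit_pos = {pid: evs for pid, evs in icu_hep.items()
                  if pid not in hit_pos}
    return icu_hep, hit_pos, no_hit_pos


def align(records, anchor="ICU Admit"):
    """Re-express event times relative to each patient's first anchor event
    (which becomes time 0); patients without the anchor are dropped."""
    aligned = {}
    for pid, evs in records.items():
        t0 = first_time(evs, anchor)
        if t0 is not None:
            aligned[pid] = [(t - t0, c) for t, c in evs]
    return aligned
```

Events before the alignment point get negative relative times, which is how a temporal summary can show what happened before the first ICU admission.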
The large difference in patient numbers (93 vs. 357) makes raw counts meaningless to compare and makes it impossible for our physician partners to detect trends. We normalized the counts by selecting "Events (Normalized by Records)". In the new summaries, the counts are divided by the number of patients in each group, and the bar heights share a common scale across the two summaries for direct visual comparison (Figure 6 (b)). Our collaborators easily recognized that the discharge distribution in ICU-Hep-HIT+ looked more stretched out (wider and shorter), indicating that patients tended to stay in the hospital longer when they had a positive HIT antibody test result.
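What "normalized by records" computes can be illustrated as follows: events of one category are counted per time bin relative to the alignment point, then divided by the number of records in the group, and a common maximum is taken so the two summaries are drawn on the same scale. This is a sketch under assumptions: aligned records are `(relative_days, category)` lists, and the 30-day bin width is a hypothetical choice, not necessarily the tool's.

```python
import math
from collections import Counter


def summary(aligned, category, bin_days=30, normalize=True):
    """Count events of `category` per time bin (relative to the alignment
    point); optionally divide each count by the number of records."""
    counts = Counter()
    for evs in aligned.values():
        for t, c in evs:
            if c == category:
                counts[math.floor(t / bin_days)] += 1
    if normalize and aligned:
        n = len(aligned)
        return {b: k / n for b, k in counts.items()}
    return dict(counts)


def shared_scale(*summaries):
    """Common maximum bar height so summaries are visually comparable."""
    return max((v for s in summaries for v in s.values()), default=0)
```

Dividing by group size turns each bar into "events per patient", so a group of 93 and a group of 357 can be compared directly.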
We then created further subgroups from ICU-Hep-HIT+ that more closely approximated the ideal temporal ordering of events for HIT patients. Our physician partners hypothesized that by narrowing down to patients who are more likely to actually have HIT, the discharge patterns in the temporal summaries would stretch out even more. First, we used the sequence filter to find patients who had no Platelet Low/Critical before a Platelet Normal/High, which was followed by any type of Heparin, then a Platelet Low/Critical, and finally a HIT Positive test. This filter identifies patients whose platelet levels were normal until they were exposed to heparin, after which they experienced a drop in platelets and a HIT Positive result was returned. This group, named Sequence, contains 63 patients. From Sequence, we then selected, via the temporal range filter in the temporal summaries, only those who had a HIT Positive result within 5-9 days after their first exposure to heparin. This 5-9 Days group has 20 patients.
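The two filters in this step can be sketched as a backtracking sequence matcher with negated elements, plus a simple time-window test. This is not the Lifelines2 filter code; the semantics of a negated element (the category must not occur between the previous matched element, or the start of the record, and the next matched element) is our assumption, as are the category labels and the use of days as the time unit.

```python
def matches_sequence(events, pattern, i=0, p=0, banned=frozenset()):
    """Backtracking matcher for an ordered pattern over time-sorted events.

    events:  list of (time, category) pairs sorted by time.
    pattern: list of (category, negated) pairs; a negated element forbids
             its category before the next matched (non-negated) element.
    """
    if p == len(pattern):
        return True
    category, negated = pattern[p]
    if negated:
        # Remember the forbidden category and move to the next element.
        return matches_sequence(events, pattern, i, p + 1, banned | {category})
    for j in range(i, len(events)):
        c = events[j][1]
        # Try to match this event, then the rest of the pattern after it.
        if c == category and matches_sequence(events, pattern, j + 1, p + 1):
            return True
        if c in banned:
            return False  # every later candidate would follow a forbidden event
    return False


# Hypothetical encoding of the Sequence filter described above.
HIT_SEQUENCE = [
    ("Platelet Low/Critical", True),
    ("Platelet Normal/High", False),
    ("Heparin", False),
    ("Platelet Low/Critical", False),
    ("HIT Positive", False),
]


def hit_within_window(events, lo=5, hi=9):
    """True if a HIT Positive result falls lo..hi days after first heparin."""
    heparin_times = [t for t, c in events if c == "Heparin"]
    if not heparin_times:
        return False
    t0 = min(heparin_times)
    return any(c == "HIT Positive" and lo <= t - t0 <= hi for t, c in events)
```

Backtracking matters because the first occurrence of an element is not always the one that lets the rest of the pattern succeed; for a leading negation, however, an early forbidden event rules out the record entirely.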
The hypothesis was that as we used more stringent filters to create patient groups that better approximate the true group of patients with HIT, the discharge pattern would become more and more spread out. The comparison of these 4 groups in Figure 7 shows that while this seems to be the general trend in the first three groups, the last group, 5-9 Days, does not follow it. We believe this is due to the small number of patients in that group and its higher-than-average number of Discharged Dead patients in the first month.
Through these exploratory analyses, we found that among ICU patients, those who had a HIT Positive result tended to stay in the hospital longer than those who had not. A longer stay generally translates into increased cost, but we thought it worth exploring what the data at hand could show about resource use. We wanted to see whether patients in groups that better approximate true HIT patients received more resources, measured by the number of platelet tests performed. In this comparison, we included the last three groups in Figure 7, along with the group of patients who were admitted to ICU, were exposed to heparin, and had a negative HIT test result. We aligned by each patient's first admission to ICU and compared the normalized platelet data to see how many platelet tests were performed per patient in each month (Figure 8). We had expected the lower groups to have more platelet tests, but there was little visual evidence to support that hypothesis. There was indeed a large difference in the number of platelet tests per patient per month between the first and second groups, but little difference among the other three. The reason is that the hospital treats all HIT Positive patients (the last three groups) with the same diligence and caution, even though the HIT test is only about 25% accurate when it is positive.
We were pleased that Lifelines2 succeeded in allowing our physician partners to investigate and gather visual evidence with respect to their hypotheses. The comparisons of discharge patterns showed that the data seem to support the hypothesis that HIT patients tended to stay in the hospital longer. However, because hospitals do not know a priori whether a patient has HIT and can only rely on the result of the low-specificity HIT test, we see evidence that the hospital treats all HIT Positive patients with heightened diligence and care with regard to monitoring platelets. If HIT patients do incur more cost, that cost would have to come from elsewhere. It is interesting to note that although both are comparisons of categorical data, the first comparison highlights the behaviour of patients (how soon they get well with the hospital's help), while the second highlights the behaviour of the hospital (how well the hospital treats the patients).
Fig. 8. Normalized platelet data for 4 groups, aligned by the first admission to ICU (Platelet Normal/High in pink, Platelet Low/Critical in red). The number of platelet tests increases dramatically only from the first group to the second.
4.2 Monitoring Graduate Student Progress
A second application of the temporal summaries is to monitor and evaluate graduate student progress. Progress through a PhD program can be measured loosely by grades in course work, advancing through program milestones, publishing research papers, etc. Each year, the faculty of our department review the progress of each student, considering each of these factors, with the intent to offer advice to students and their advisors. The tool described in this paper is the first visualization tool to be applied to the process.
With various SQL queries, the review can identify students who have fallen behind or are approaching a program deadline. However, temporal summaries can help to answer two classes of questions that SQL supports poorly. First, are there factors that may predict falling behind schedule? Below, we ask how well being a teaching assistant (TA) for four or more semesters predicts a longer time to advance to candidacy. Second, is there evidence that the graduate review helps students become more aware of milestones and make better progress? The review has incidental benefits, for example making all advisors aware of graduate program deadlines, but quantitative evaluation is difficult.
There are three fundamental differences between student time lines and the patient histories in the other studies. First, whereas the "outcome" in the medical setting is clear (months spent in the hospital, how patients were discharged), the outcome for a graduate student is much less well defined. Further, we consider and maintain only the information that describes currently enrolled students; those who have left the program without a degree, who are on leave, or who have even completed the program are not evaluated and are not (currently) in the graduate review data. This limitation reduces the precision of any conclusions: for example, of the students who entered the program six years ago, only those who have not yet completed their dissertations are included, potentially inflating time-to-milestone statistics. Second, the confidentiality of current student records and the lack of a good data de-identifier constrain how we present results; for this reason, we do not include screenshots. Third, student time lines do not precisely match chronology: students may start in the spring or take a leave of absence, so time spent in the program does not correspond directly to calendar time.
How might we use detailed information about a student's time line to better predict timely completion of milestones? After an introduction to the tool, the analyst set out to determine how spending many semesters as a TA affected the time to propose a thesis. Events represented the start of a graduate career, each semester as a TA, and advancing to candidacy. To test the hypothesis that more TA'ing implies a longer time to advance to candidacy, the analyst constructed three groups: those who had advanced, those who had advanced after four or more semesters of TA'ing, and those who had advanced after three or fewer. To do so, he aligned all students by the Advanced to candidacy event, implicitly selecting only those students who had advanced. From this group, he used filters to choose two disjoint, approximately equally sized sub-populations: those who had TA'ed four or more semesters and those who had TA'ed three or fewer. Comparing these two groups and their union, it appears that students who TA four or more semesters indeed tend to take, on average, one additional year to propose a thesis. Whether time spent TA'ing is a cause of delay or merely a symptom is not easily determined, but that does not diminish its predictive value.
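The TA comparison can be approximated outside the tool as a group split followed by a mean. This sketch assumes each student's time line is a list of `(time_in_years, category)` pairs with hypothetical categories `"Start"`, `"TA"` (one event per semester), and `"Advanced"`; it counts only TA semesters before advancing and ignores the spring-start and leave-of-absence caveats noted above.

```python
from statistics import mean


def time_to_advance_by_ta(students, threshold=4):
    """Mean time from start to advancing to candidacy for students with at
    least `threshold` TA semesters before advancing vs. those with fewer.
    Returns (mean_heavy, mean_light); None for an empty group."""
    heavy, light = [], []
    for evs in students.values():
        adv = [t for t, c in evs if c == "Advanced"]
        start = [t for t, c in evs if c == "Start"]
        if not adv or not start:
            continue  # like aligning by Advanced: non-advanced students drop out
        t_adv = min(adv)
        n_ta = sum(1 for t, c in evs if c == "TA" and t < t_adv)
        group = heavy if n_ta >= threshold else light
        group.append(t_adv - min(start))
    return (mean(heavy) if heavy else None, mean(light) if light else None)
```

As in the analysis above, a difference between the two means shows predictive value only; it says nothing about whether TA'ing causes the delay.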
To quickly evaluate the potential benefit of the graduate review, the analyst next constructed groups of students who were classified (by the SQL-query-based tests) as falling behind schedule, to explore their success in later years. At each annual review, each student is given a high-level categorization that attempts to capture whether the student is On Target, Concerned, or Very Concerned. After the students were aligned by their first occurrence of Concerned, the events that followed showed that over 70% of the students who received such a mark were no longer marked Concerned the following year.
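The follow-up measure described here can be sketched as below, assuming each student's reviews are `(year, status)` pairs using the three status labels above. Two choices are assumptions of this sketch: students with no review in the following year (e.g. those who left or graduated) are excluded, and "no longer Concerned" is read literally, so a move to Very Concerned would count as no longer Concerned.

```python
def recovery_rate(reviews, mark="Concerned"):
    """Among students with at least one `mark`, the fraction whose review in
    the year after their first such mark is not `mark`. Students without a
    review in that following year are excluded. Returns None if no student
    qualifies."""
    followed, recovered = 0, 0
    for evs in reviews.values():
        marked_years = [y for y, s in evs if s == mark]
        if not marked_years:
            continue
        t0 = min(marked_years)  # align by the first occurrence of the mark
        next_year = [s for y, s in evs if y == t0 + 1]
        if not next_year:
            continue
        followed += 1
        if mark not in next_year:
            recovered += 1
    return recovered / followed if followed else None
```

Because only currently enrolled students are in the data, the exclusion rule matters: students who leave after a Concerned mark silently disappear from the denominator.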
These analyses are preliminary; their results cannot be considered conclusive because of the biases in the partial dataset. Nevertheless, by helping the analyst explore the data and quickly test simple hypotheses, the temporal summaries provided significant help in answering key questions about the review of student progress. This initial portion of a continuing user study was conducted over a month. The developer and the analyst met and worked together for about four hours. The analyst drove the application and spent a significant amount of time using the tool on his own, and the developer and the analyst communicated steadily via e-mail.