Physician agreement on the diagnosis of sepsis in the intensive care unit: estimation of concordance and analysis of underlying factors in a multicenter cohort

Background: Differentiating sepsis from the systemic inflammatory response syndrome (SIRS) in critical care patients is challenging, especially before serious organ damage is evident, and with variable clinical presentations of patients and variable training and experience of attending physicians. Our objective was to describe and quantify physician agreement in diagnosing SIRS or sepsis in critical care patients as a function of available clinical information, infection site, and hospital setting.

Methods: We conducted a post hoc analysis of previously collected data from a prospective, observational trial (N = 249 subjects) in intensive care units at seven US hospitals, in which physicians at different stages of patient care were asked to make diagnostic calls of either SIRS, sepsis, or indeterminate, based on varying amounts of available clinical information (clinicaltrials.gov identifier: NCT02127502). The overall percent agreement and the free-marginal, inter-observer agreement statistic kappa (κfree) were used to quantify agreement between evaluators (attending physicians, site investigators, external expert panelists). Logistic regression and machine learning techniques were used to search for significant variables that could explain heterogeneity within the indeterminate and SIRS patient subgroups.

Results: Free-marginal kappa decreased between the initial impression of the attending physician and (1) the initial impression of the site investigator (κfree 0.68), (2) the consensus discharge diagnosis of the site investigators (κfree 0.62), and (3) the consensus diagnosis of the external expert panel (κfree 0.58). In contrast, agreement was greatest between the consensus discharge impression of site investigators and the consensus diagnosis of the external expert panel (κfree 0.79). When stratified by infection site, κfree for agreement between initial and later diagnoses had a mean value of +0.24 (range −0.29 to +0.39) for respiratory infections, compared to +0.70 (range +0.42 to +0.88) for abdominal + urinary + other infections. Bioinformatics analysis failed to clearly resolve the indeterminate diagnoses and also failed to explain why 60% of SIRS patients were treated with antibiotics.

Conclusions: Considerable uncertainty surrounds the differential clinical diagnosis of sepsis vs. SIRS, especially before organ damage has become highly evident, and for patients presenting with respiratory clinical signs. Our findings underscore the need to provide physicians with accurate, timely diagnostic information in evaluating possible sepsis.

Electronic supplementary material: The online version of this article (10.1186/s40560-019-0368-2) contains supplementary material, which is available to authorized users.


Introduction
In this Supplement, we analyze the subset of patients diagnosed with respiratory infections (both pneumonia and non-pneumonia) on the first day of ICU admission.
We focus on physicians' attempts to diagnose whether these patients are septic or not. Inter-observer agreement statistics indicate that this diagnosis is very difficult to make with certainty.

Methods
Inter-observer agreement statistics: We calculated two inter-observer agreement statistics: the % overall agreement, and the free-marginal kappa (κfree). For these calculations, we used the web applet described by Randolph (2005). To obtain an overall sense of the difficulty of diagnosing sepsis in cases of respiratory infection, we performed the following analysis. (1) For the set of patients diagnosed with respiratory infections, we conducted comparisons (A-J) and physician at admission; one vote from the site investigator at admission; two votes from the two site investigators at discharge; one vote from the adjudicator at discharge (only in cases of disagreement between site investigators); three votes from the three external RPD panelists. The total number of votes was therefore either seven or eight, depending on whether an adjudicator was used for a particular patient at the discharge evaluation. The fraction of votes that were indeterminate could then be calculated directly.
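
The two statistics named above can be sketched in a few lines. This is an illustrative implementation of the standard free-marginal kappa formula, κfree = (Po − 1/k) / (1 − 1/k), and not the web applet used in the study. The function names, the toy ratings, and the choice of k = 3 categories (SIRS, sepsis, indeterminate) are assumptions made for illustration. Averaging pairwise agreement per case is one way to accommodate a varying number of raters per case; with a fixed number of raters it reduces to the usual multirater overall agreement.

```python
from collections import Counter
from typing import Sequence


def overall_agreement(ratings: Sequence[Sequence[str]]) -> float:
    """Mean, over cases, of the proportion of rater pairs that agree (P_o)."""
    if not ratings:
        raise ValueError("need at least one rated case")
    total = 0.0
    for case in ratings:
        n = len(case)
        if n < 2:
            raise ValueError("each case needs at least two ratings")
        counts = Counter(case)
        # Ordered pairs of raters that chose the same category.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        total += agreeing_pairs / (n * (n - 1))
    return total / len(ratings)


def free_marginal_kappa(ratings: Sequence[Sequence[str]], n_categories: int) -> float:
    """Free-marginal kappa: chance agreement is fixed at 1 / n_categories."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    p_o = overall_agreement(ratings)
    p_e = 1.0 / n_categories
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: two evaluators, four patients (not study data).
toy_ratings = [
    ("sepsis", "sepsis"),
    ("SIRS", "SIRS"),
    ("sepsis", "indeterminate"),
    ("SIRS", "SIRS"),
]
print(overall_agreement(toy_ratings))       # proportion of agreeing pairs
print(free_marginal_kappa(toy_ratings, 3))  # kappa with k = 3 categories
```

In the toy data the evaluators agree on three of four patients, so the overall agreement is 75% and κfree is 0.625. A κfree of 0 corresponds to the agreement expected by chance alone, and negative values indicate agreement below chance.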

Results
High frequency of discordant and indeterminate classifications: We examined in greater detail those patients diagnosed with pneumonia (17 in VENUS and 25 in VENUS Supplement, 42 in total). We also repeated the analyses with 7 additional patients diagnosed with non-pneumonia respiratory infections (bronchitis, severe influenza, pulmonary edema, tracheitis, lung abscess, pharyngitis, other): 3 from VENUS and 4 from VENUS Supplement. Adding these 7 patients did not materially alter the conclusions drawn from the 42 pneumonia patients.

An overall summary of findings (% overall agreement and free marginal kappa) is presented in Table S4-1 for respiratory infections, and in Table S4-2 for nonrespiratory infections + SIRS. Figure S4-1 compares the cumulative distributions of the free marginal kappa statistic (κfree) for the two conditions. By the Kolmogorov-Smirnov test, the two distributions are very different (p < 0.0001).
For all physicians, the patients with respiratory infections (pneumonia and other types) proved to be the most difficult on which to reach consensus with respect to the presence or absence of sepsis (Tables S4-1 and S4-2).

Figure S4-2: Cumulative distributions of the free marginal kappa statistic (κfree) for respiratory infections vs. non-respiratory infections + SIRS.
Distributions of the free-marginal kappa statistic (κfree) were calculated from the data reported in Tables S4-1 and S4-2. The Kolmogorov-Smirnov test indicated that the distributions are different, at a high significance level (p < 0.0001).

Respiratory patients classified as SIRS:
In the USA cohort (VENUS + VENUS Supplement), none of the 49 patients with pneumonia or other respiratory infections were diagnosed initially as SIRS by either the attending physician or the site investigator (Figure S4-3A). Upon discharge assessment, 6.1% (3/49) of these patients were diagnosed unanimously as SIRS by the combination of the site investigators' discharge assessment and the external RPD assessment (Figure S4-3D). This difference in estimated % SIRS between admission and discharge assessments was not statistically significant (p = 0.08; chi-square test). In sharp contrast, patients not suspected of pneumonia or other respiratory infections had a significantly higher frequency of unanimous SIRS diagnoses (p < 0.001), both at ICU admission (108/200, 54.0%; Figure S4-4A) and at discharge assessment (122/200, 61.0%; Figure S4-4D). which is still a highly significant difference (Z score = 4.61; p < 0.0001).

Further analysis of Indeterminates:
We performed a further analysis on the indeterminate calls in the patients suspected of pneumonia or non-pneumonia respiratory infections (N=49) versus those with other conditions (N=200). We determined the fraction of all votes that were indeterminate, constructed the cumulative distribution of the indeterminate vote fraction (separately for the two strata), and applied the Kolmogorov-Smirnov (K-S) test to determine the significance of the difference statistic (D) for the two cumulative distributions.
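
The admission versus discharge comparison for respiratory patients (0/49 vs. 3/49 unanimous SIRS diagnoses) can be reproduced with a short calculation. The sketch below assumes an uncorrected Pearson chi-square test on the 2×2 table; the text does not state whether a continuity correction was applied, and the function name is introduced here for illustration only. For one degree of freedom, the upper-tail p-value equals erfc(sqrt(χ²/2)), so no statistics library is needed.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: admission, discharge. Columns: unanimous SIRS, not unanimous SIRS.
chi2, p_value = chi_square_2x2(0, 49, 3, 46)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the p-value rounds to 0.08, in line with the value reported above. Because the expected counts in the SIRS column are small, the chi-square approximation is rough for this table.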

Figure S4-3: Measured discordance in the classification of SIRS, Indeterminate and sepsis cases, for patients with respiratory infections in the VENUS + VENUS Supplement cohorts (N=49).
The D statistic had the value 0.4064, indicating that the two distributions were different at the p < 0.001 level (Figure S4-5). This provides additional evidence that patients with pneumonia or non-pneumonia respiratory infections are especially difficult to diagnose with respect to sepsis vs. SIRS, as they show a greater fraction of indeterminate calls.
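
The indeterminate-vote analysis can be sketched as follows: compute each patient's fraction of indeterminate votes, then take the largest vertical gap between the empirical cumulative distributions of the two strata, which is the K-S D statistic. The function names and the vote lists are hypothetical and are not study data. The sketch computes D only; the associated p-value is not derived here.

```python
from bisect import bisect_right
from typing import Sequence


def indeterminate_fraction(votes: Sequence[str]) -> float:
    """Fraction of one patient's votes (seven or eight) that are indeterminate."""
    if not votes:
        raise ValueError("a patient needs at least one vote")
    return sum(v == "indeterminate" for v in votes) / len(votes)


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample K-S statistic D: maximum gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so check those.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical votes for a handful of patients in each stratum.
respiratory = [
    ["sepsis", "indeterminate", "indeterminate", "sepsis", "sepsis", "sepsis", "sepsis"],
    ["indeterminate"] * 4 + ["sepsis"] * 4,
    ["sepsis"] * 7,
]
other = [
    ["SIRS"] * 7,
    ["SIRS"] * 6 + ["indeterminate"],
    ["sepsis"] * 8,
]
frac_resp = [indeterminate_fraction(v) for v in respiratory]
frac_other = [indeterminate_fraction(v) for v in other]
print(ks_statistic(frac_resp, frac_other))
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap). A larger D for the respiratory stratum versus the other patients corresponds to the separation between cumulative curves described above.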