The development and validation of a “5A” severity scale for predicting in-hospital mortality after accidental hypothermia from J-point registry data

Background: Accidental hypothermia is a serious condition that requires immediate and accurate assessment to determine severity and treatment. Currently, accidental hypothermia is evaluated using the Swiss grading system, which uses core body temperature and clinical findings; however, research has shown that core body temperature is not associated with in-hospital mortality in urban settings. Therefore, we developed and validated a severity scale for predicting in-hospital mortality among urban Japanese patients with accidental hypothermia.
Methods: Data for this multi-center retrospective cohort study were obtained from the J-point registry. We included patients with accidental hypothermia who were admitted to an emergency department. The total cohort was divided into a development cohort and a validation cohort, based on the location of each institution. We developed a logistic regression model for predicting in-hospital mortality using the development cohort and assessed its internal validity using bootstrapping. The model was then subjected to external validation using the validation cohort.
Results: Among the 572 patients in the J-point registry, 532 were ultimately included and divided into the development cohort (N = 288, six hospitals, in-hospital mortality 22.0%) and the validation cohort (N = 244, six hospitals, in-hospital mortality 27.0%). The 5 “A” scoring system based on age, activities-of-daily-living status, near arrest, acidemia, and serum albumin level was developed based on the variables’ coefficients in the development cohort. In the validation cohort, the prediction performance was validated.
Conclusion: Our “5A” severity scoring system could accurately predict the risk of in-hospital mortality among patients with accidental hypothermia.
Electronic supplementary material: The online version of this article (10.1186/s40560-019-0384-2) contains supplementary material, which is available to authorized users.


Background
Accidental hypothermia (AH) involves an unintentional decrease in core body temperature to ≤ 35°C [1]. This condition is associated with high risks of hemodynamic collapse and mortality (24-40%) [2][3][4], as the cooling heart results in decreased cardiac output and electrical conduction abnormalities leading to life-threatening dysrhythmias, such as bradycardic atrial fibrillation or ventricular fibrillation [1]. Therefore, patients with AH must be immediately assessed to determine their severity and select appropriate advanced resuscitation and critical care techniques.
Although AH patients require immediate severity assessment and critical care, there is no established risk assessment tool specific to AH. This might lead to inappropriate decision-making because accurate prognostic information is lacking. The severity of AH is traditionally evaluated using the Swiss grading system [1], which is based on core body temperature and simple clinical findings. However, other research has indicated that core body temperature is not associated with in-hospital mortality in urban settings [2,4,5]. Moreover, mortality is known to be associated with various other factors, such as age, activities of daily living (ADL), hemodynamic instability, hyperkalemia, and acidemia [1,2,4-9]. Unfortunately, it is difficult to assess how these factors might influence mortality, especially in an emergency setting. Thus, a simple and user-friendly severity scale is needed to estimate mortality after AH in urban settings. The present study aimed to develop and validate a severity scale for predicting in-hospital mortality using data from Japanese patients who experienced AH in urban settings.

Methods
This multi-center retrospective cohort study complied with the TRIPOD statement (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) regarding the reporting of the study's methods and results [10].

Data source
We obtained epidemiological and clinical information from the J-point registry database, which collects data from a network of Japanese centers that treat patients with AH [2]. Eight centers are designated as Critical Care Medical Centers (CCMCs), and four sites are the emergency departments (EDs) of non-CCMC general hospitals in urban areas of the Kyoto, Osaka, and Shiga prefectures in Japan. Each year, the centers had a median of 19,651 ED visits (interquartile range 13,281-27,554 visits). In Japan, CCMCs are EDs certified by the Ministry of Health, Labour and Welfare that treat patients for shock, trauma, resuscitation, and critical care and that serve approximately 500,000 residents in each region; in these CCMCs, advanced treatment such as extracorporeal membrane oxygenation (ECMO) is generally available [11]. The non-CCMC centers are public or private general hospitals that cover a smaller regional community, where advanced treatment such as ECMO is generally unavailable.
The J-point registry includes patients who were retrospectively identified at each center using the International Classification of Diseases, Tenth Revision (ICD-10) code for hypothermia (T68). These patients were treated for hypothermia between April 1, 2011, and March 31, 2016, and had a body temperature that was either ≤ 35.0°C or unknown. Patients were excluded from the registry if they or their family members explicitly refused to be included. Clinical data were extracted by emergency physicians using a predefined data extraction sheet. The collected data were re-checked by the J-Point Registry Working Group members and, if there were concerns regarding the data's validity, queried with the appropriate institution. Based on these criteria, 572 patients were registered in the J-point registry. The ethics committee of each center approved the registry protocol and retrospective analysis of de-identified data.

Study population
The present study included adult patients (≥ 16 years old) with a core body temperature of ≤ 35°C at ED admission. We excluded patients with a non-AH core body temperature (> 35°C or unknown) and patients with missing data regarding age, sex, or mortality. The model was planned to undergo both internal and external validation [12,13]. Thus, a development cohort was created based on centers from Kyoto city (four CCMCs and two non-CCMCs), while the validation cohort was created based on centers from the Shiga and Osaka prefectures and from Kyoto prefecture outside Kyoto city (four CCMCs and two non-CCMCs). This approach was selected because random sample splitting is not recommended for relatively small cohorts (to avoid over-fitting the data), which should instead be subdivided based on a time period or geographical location [12,14]. The validation cohort was considered sufficient for external validation because the sample splitting was based on geographical location and not random allocation [12,14].
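The geographical split described above can be illustrated with a minimal Python sketch. The field name `center_location` and the location labels are assumptions made for illustration; the registry's actual data layout is not given in the text.

```python
from collections import defaultdict

# Hypothetical location label for the development cohort (centers in Kyoto city).
DEVELOPMENT_LOCATIONS = {"Kyoto city"}

def split_by_location(patients):
    """Assign each patient record to a cohort by the location of the treating center."""
    cohorts = defaultdict(list)
    for patient in patients:
        in_development = patient["center_location"] in DEVELOPMENT_LOCATIONS
        cohorts["development" if in_development else "validation"].append(patient)
    return dict(cohorts)

# Made-up records, for illustration only.
patients = [
    {"id": 1, "center_location": "Kyoto city"},
    {"id": 2, "center_location": "Osaka"},
    {"id": 3, "center_location": "Shiga"},
    {"id": 4, "center_location": "Kyoto prefecture (outside Kyoto city)"},
]
cohorts = split_by_location(patients)
```

Because every patient from a given center lands in the same cohort, the validation cohort contains no institution that contributed to model development.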

Data collection and patient outcomes
The institutions were categorized as CCMC or non-CCMC, and the annual number of ED visits, the average number of hospital beds, and patients' characteristics including sex, age, independent or disturbed ADLs, and comorbidities were collected (Additional file 1). The patients' clinical characteristics were defined as vital signs at hospital arrival (core body temperature, systolic blood pressure [SBP], and Glasgow [GCS] and Japan [JCS] Coma Scales) and biological data (serum pH, potassium [K+, mEq/L], and albumin [g/dL]). Details of the patients' clinical characteristics are provided in Additional file 1. Treatment characteristics were defined as external and minimally invasive rewarming methods (warm intravenous fluid, forced warm air, warm blanket, and others) and active internal rewarming (lavage, intravascular rewarming device, and veno-venous and veno-arterial ECMO) (Additional file 1: Table S1). The outcome of interest was defined as in-hospital mortality, which was also determined retrospectively.
Prognostic variable selection, data preparation, and handling missing data
Based on previous studies and expert opinions, we selected the admission values for age, ADL, body temperature, level of consciousness, hemodynamic state, serum pH, albumin, and K+ as potential predictor candidates of in-hospital mortality [1,2,4-9]. To ensure that the model is user-friendly, especially for emergency settings, we categorized the potential covariates based on their normal limit or commonly used ranges. Level of consciousness was classified as mild (GCS 13-15 or JCS 0-3), moderate (GCS 9-12 or JCS 10-30), and severe (GCS < 9 or JCS 100-300). Details of the JCS are described in Additional file 1. A status of "near arrest" was defined as an SBP of ≤ 60 mmHg, an unmeasurable SBP, or cardiac arrest. In terms of missing values, variables with < 3% missing data were analyzed using complete case analysis, which is considered feasible at that level of missingness [15]. If missing values were > 3%, missing data were categorized as "unknown," because unmeasured values might be informative in clinical settings (e.g., in minor cases, blood gas analysis tends to be omitted). Tables 1 and 2 show the distributions of the covariate categories for each cohort. We did not calculate the required sample size, because the J-point registry contains the largest number of AH cases among the available literature, and we aimed to empirically include all available data to maximize the model's power and generalizability [14]. There is a consensus on the importance of having an adequate sample size; however, there is no generally accepted approach for estimating the required sample size when developing and validating risk prediction models [14].
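The categorization rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, a missing SBP (`None`) is taken to mean "unmeasurable", and the pH cutoff of 7.35 is only the conventional lower normal limit, since the cutoffs actually used are given in Additional file 1 rather than in the text.

```python
def consciousness_level(gcs=None, jcs=None):
    """Classify disturbance of consciousness from GCS or JCS as defined in the text."""
    if gcs is not None:
        if gcs >= 13:
            return "mild"      # GCS 13-15
        if gcs >= 9:
            return "moderate"  # GCS 9-12
        return "severe"        # GCS < 9
    if jcs is not None:
        if jcs <= 3:
            return "mild"      # JCS 0-3
        if jcs <= 30:
            return "moderate"  # JCS 10-30
        return "severe"        # JCS 100-300
    return "unknown"

def near_arrest(sbp_mmhg, cardiac_arrest=False):
    """Near arrest: SBP <= 60 mmHg, unmeasurable SBP (None, an assumption), or cardiac arrest."""
    if cardiac_arrest or sbp_mmhg is None:
        return True
    return sbp_mmhg <= 60

def with_unknown(value, categorize):
    """Keep a missing measurement as its own 'unknown' category instead of dropping the case."""
    return "unknown" if value is None else categorize(value)

ILLUSTRATIVE_PH_CUTOFF = 7.35  # assumption: conventional lower normal limit, not taken from the paper

def ph_category(ph):
    return "acidemia" if ph < ILLUSTRATIVE_PH_CUTOFF else "no acidemia"
```

Treating "unknown" as a category lets a patient without a blood gas result stay in the model, which matches the stated rationale that unmeasured values may themselves be informative.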

Development and evaluation of the prediction model
In the development cohort, predictors were selected from the potential predictor candidates mentioned above using a stepwise backward method based on the lowest Akaike's information criterion. This allowed us to retain a parsimonious set of predictors, to which multivariable logistic regression was subsequently applied. Backward elimination is generally preferred as an automated selection procedure because all correlations between the predictors are considered in the modeling procedure [14]. Each variable's coefficient β and odds ratio were reported with the 95% confidence interval (CI). The model's performance was evaluated based on Somers' Dxy, the C index, the R² value, the calibration intercept and slope, and the Brier score. Calibration plots were also created to graphically depict the association between the predicted and observed in-hospital mortality rates based on locally weighted scatterplot smoothing [13].
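Three of the performance measures named above have short standard definitions, sketched here in plain Python for a binary outcome. The function names and the toy data are illustrative assumptions; the study itself used R with the "rms" package.

```python
def c_index(y_true, y_prob):
    """Share of (death, survivor) pairs in which the death has the higher predicted risk.

    Tied predictions count as half-concordant.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    nonevents = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

def somers_dxy(y_true, y_prob):
    """Somers' Dxy is a rescaling of the C index: Dxy = 2 * (C - 0.5)."""
    return 2.0 * (c_index(y_true, y_prob) - 0.5)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Made-up outcomes and predictions, for illustration only.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
```

The C index and Dxy describe discrimination only, whereas the Brier score also penalizes poorly calibrated probabilities, which is one reason the measures are reported together.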
Internal validation involved a bootstrapping procedure using 200 samples drawn with replacement from the original sample [13].
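The text does not spell out the bootstrap steps. A common form of bootstrap internal validation, and the one we assume here, is the optimism correction: the model is refitted on each of the B = 200 resamples, and the average drop in performance when each refitted model is applied back to the original sample is subtracted from the apparent performance. In the notation below (ours, not the authors'), M is a performance measure such as the C index, f is the model fitted on the original data D, and f_b is the model refitted on bootstrap sample D_b.

```latex
\[
\hat{O} \;=\; \frac{1}{B}\sum_{b=1}^{B}\Bigl[\,M(f_b, D_b) - M(f_b, D)\,\Bigr],
\qquad
M_{\text{corrected}} \;=\; M(f, D) - \hat{O},
\qquad B = 200 .
\]
```

Under this reading, the C statistic values reported in the Interpretation section (0.794 apparent, 0.746 corrected) would correspond to an estimated optimism of 0.794 − 0.746 = 0.048.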
The fixed model was applied to the validation cohort for external validation, and the discrimination and calibration performances were compared to those from the development cohort. Finally, we derived a clinically useful, simplified risk stratification using a simple integer risk score based on each variable's coefficient β [13]. To assess discrimination performance, we compared the C index of our risk scoring system with that of the core body temperature on admission, categorized according to the Swiss grading system, which is commonly used to assess the severity of AH [1]. The diagnostic abilities [sensitivity, specificity, positive likelihood ratio (LR+), and negative likelihood ratio (LR−)] of each score were calculated. The calibration performance of the risk stratification was graphically evaluated in terms of the relationship between the predicted and observed in-hospital mortality. All statistical results were considered significant at two-sided P values of < 0.05. Statistical analyses were performed using JMP Pro® 14 software (SAS Institute Inc., Cary, NC) and R software (version 1.1.456; R Studio Inc.) with the "rms" package [15].
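The diagnostic abilities listed above follow from a 2 × 2 table of score threshold versus outcome. The sketch below uses the standard definitions; the function name and the counts are made up for illustration and are not results from the study.

```python
def diagnostic_abilities(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+ and LR- from the four cells of a 2x2 table.

    tp/fn: deaths at or above / below the score threshold
    fp/tn: survivors at or above / below the score threshold
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Likelihood ratios are undefined when the denominator is zero; report infinity in that case.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }

# Made-up counts for illustration only.
result = diagnostic_abilities(tp=40, fp=30, fn=10, tn=120)
```

With these illustrative counts, sensitivity and specificity are both 0.8, so LR+ is about 4 and LR− about 0.25.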

Results
Patient characteristics
Among the 572 patients in the J-point registry, we excluded 31 patients with a non-AH body temperature (> 35°C or unknown), 8 non-adult patients (< 16 years old), and 1 patient with missing data. Thus, 532 patients were ultimately included, with an overall in-hospital mortality of 24.4%. The patients were then divided into the development cohort (N = 288, six hospitals [four CCMCs and two non-CCMCs], in-hospital mortality 22.0%) and the validation cohort (N = 244, six hospitals [four CCMCs and two non-CCMCs], in-hospital mortality 27.0%) (Fig. 1). The characteristics of the institutions and patients are shown in Tables 1 and 2, with the characteristics and distributions being generally similar between the cohorts. Missing values exceeded 3% for both pH and albumin; thus, missing values in these variables were categorized as "unknown," and complete case analysis was applied to the remaining variables.

Performance and internal and external validation of the model
The 5 "A" predictors (age, ADL, near arrest state, acidemia, and albumin) were selected. The variables' coefficient β, adjusted odds ratio with 95% CI, and the formula for predicted in-hospital mortality are shown in Additional file 1: Table S2 and Formula. The evaluation of the model and the calibration plots for the development and validation cohorts are shown in Additional file 1: Table S3 and Figure S1, respectively. The calibration plots in both cohorts revealed relatively good calibration, although the bias-corrected line revealed slight overestimation of the mortality risk.
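The fitted coefficients and the exact formula are given only in Additional file 1 and are not reproduced here. For orientation, a multivariable logistic regression model has the general form below, where the x_k are indicator variables for the categories of the five predictors (our notation, an assumption about how the categories are coded) and each adjusted odds ratio is the exponential of the corresponding coefficient.

```latex
\[
\hat{p}(\text{in-hospital death})
  \;=\; \frac{1}{1 + \exp\!\Bigl(-\bigl(\beta_0 + \sum_{k=1}^{K} \beta_k x_k\bigr)\Bigr)},
\qquad
\mathrm{OR}_k \;=\; e^{\beta_k}.
\]
```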

Discussion
The present study revealed that a "5A" severity scoring scale (based on age, ADL, near arrest, acidemia, and albumin) predicted mortality after AH better than the Swiss grading system, which is based on core body temperature, and showed good discrimination and calibration on internal and external validation. Furthermore, the severity scoring system will help emergency physicians to rapidly predict a patient's prognosis and make management decisions. To the best of our knowledge, this is the first scale to be subjected to internal and external validation for predicting prognosis among patients with AH in urban areas.
Previous literature and the present study's strengths
Two reports have described methods for predicting prognosis after cardiac arrest due to AH [16,17]. The ICE survival score (based on sex, asphyxiation, and serum K+) and the HOPE score (based on sex, asphyxia, age, K+, duration of cardiopulmonary resuscitation, and temperature) could predict prognosis after treatment using extracorporeal life support for AH cardiac arrest. However, these scores were developed based on a literature review, which included observational cohorts and case reports, and might have been affected by publication bias and selection bias. Moreover, these scores were not evaluated using bootstrapping as internal validation or a separate dataset for external validation, which might have increased the risk of overfitting and raises questions regarding the applicability of these scores to other populations.
In contrast, the present study has several strengths. First, to our knowledge, ours is the largest cohort of patients with AH, which allowed us to create two cohorts based on geographical location and subject the model to external validation. Second, we performed a bootstrapping procedure to assess overfitting and over-optimism during internal validation. Third, most patients were elderly, which agrees with a recent report that indicated that most AH cases in Japan involve elderly people [2,3,9]. Population aging is a common public health issue in industrialized countries all over the world, and it is assumed that most victims of AH in developed countries will also be elderly. Most previous studies regarding AH have focused on younger patients [6][7][8][16][17][18][19], with average ages of 35 years in the ICE score study and 36 years in the HOPE score study [16,17]; therefore, these scores are not applicable for the general population. Thus, we believe that our model is more generalizable for patients who experience AH in urban areas.
Fig. 2 Calibration plot for each cohort. In the development cohort, the ideal dashed line reflects perfect calibration between the predicted and observed mortality. The apparent performance, indicated by the dotted line, reflects the calibrated performance of the model. The solid line reflects the bias-corrected performance based on bootstrapping. The validation cohort also has ideal dashed lines. The solid lines reflect the fitted logistic calibration curve. The dotted lines reflect a smooth nonparametric fit using locally weighted scatterplot smoothing.

Interpretation
The present study evaluated clinically relevant variables that can be summarized as the "5A"s (age, ADL, near arrest, acidemia, and albumin). In this context, the patient's values for age, ADL, and serum albumin may reflect a vulnerable physiological status, and these variables are commonly used as prognostic factors in critical care [20][21][22][23]. Hemodynamic instability and pH are also important factors in major critical care severity scoring scales [24] as they reflect the extent of vital organ hypoperfusion. Thus, we believe that the variables in our model could reflect hypothermia-related physiological changes. Similar to other studies [4], we did not incorporate body temperature as a predictor, as we hypothesized that a hypothermia-related decrease in organs' oxygen consumption could protect them despite the presence of hypoperfusion, which would prevent body temperature from being strongly associated with prognosis.
During the model's development, we used bootstrapping to account for slight over-optimism (e.g., correcting the C statistic from 0.794 to 0.746). We also found overestimation among the severe population in the bias-corrected calibration plot which appears to be mainly related to the small number of severe cases. The validation process also revealed slight overestimation among the severe population. Thus, we should interpret the findings carefully among severe cases.

Clinical implications
We believe that this severity scoring scale allows clinicians to rapidly assess the severity of AH patients, provide patients and their families with accurate information, and improve their prognosis by more appropriately selecting severe cases which require advanced resuscitation and critical care. On the other hand, in urban areas, most AH patients are elderly [3,9]. Since there is no established risk assessment tool available for their treatment, we remain uncertain about the appropriate treatment because sufficient prognostic information is not available. For instance, elderly patients with impending death might be treated too invasively, without discussing the prognosis with their relatives, or those with good survival prospects might undergo early withdrawal of treatment. The severity scoring system based on easily accessible data ("5A") enables easy prognosis assessment by physicians. Aggressive treatment might be reasonable for patients found to be in the low-risk group (≤ 3 points), even if they are elderly. For patients in the severe-risk group (≥ 6 points), physicians can readily recognize a condition requiring critical care and, based on the likely prognosis, decide on the indication for treatment after discussion with the patient's relatives. Therefore, our risk scoring system can support rational decision-making based on prognostic probability.
Fig. 3 Predicted and observed mortality based on the 5A scoring system. The median predicted mortality rate is shown for the quartile-based sums of the risk scores in each cohort. The observed mortality rate reflected the proportion of in-hospital mortality. The predictions were well calibrated with the observations. The 5A scoring system provided a simple and rapid prediction of post-accidental hypothermia prognosis. ADL, activities of daily living; SBP, systolic blood pressure. Arrest was defined as SBP of ≤ 60 mmHg, unmeasurable values, or confirmed arrest.
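The two cut points mentioned in this section (≤ 3 points for the low-risk group and ≥ 6 points for the severe-risk group) can be written as a small lookup. The function name and the label "intermediate" for totals of 4 or 5 points are assumptions, since this section does not name the middle group; the per-item points come from Additional file 1: Table S4 and are not reproduced here.

```python
def risk_group(total_score):
    """Map a total 5A score to a risk group using the cut points quoted in the text."""
    if total_score <= 3:
        return "low"
    if total_score >= 6:
        return "severe"
    return "intermediate"  # assumed label for 4-5 points; not named in this section

# Example: group labels for a range of total scores.
groups = {score: risk_group(score) for score in range(0, 9)}
```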

Limitations
This study has several limitations. First, despite the generalizability of our model to similar urban areas, it is unclear whether the model can be applied to other settings, such as outdoor activities (e.g., skiing or mountain climbing), in which most patients are healthy young athletes. Because our model was developed from an urban population, in which treatment is focused on elderly people who stay indoors [2,9], the populations and their characteristics differ substantially between these settings. A second limitation is the relatively small sample size, which could have increased the risk of overfitting and optimism [14]. A third limitation is the absence of complete detailed data in the J-point registry regarding the AH event, the clinical course after admission, treatment, the neurological outcome, and the cause of death. Fourth, we did not compare the usefulness or diagnostic ability of our scale with general risk assessment tools for critically ill patients, such as SOFA or APACHE II. Therefore, further research is needed to determine the validity, generalizability, and clinical usefulness of our model in other cohorts.

Conclusion
The present study revealed that the 5A severity scale had good discrimination and calibration for predicting in-hospital mortality after AH based on internal and external validation. We believe that this severity scoring scale can be useful to rapidly assess the severity of patients with AH.

Additional file
Additional file 1: The definition of patient characteristics and laboratory data. Table S1. The range of the laboratory data on arrival at the emergency department. Table S2. Coefficient β and adjusted odds ratio with 95% confidence intervals. Table S3. Model performance in the development cohort assessed by bootstrap and that in validation cohort. Figure S1. Calibration Plot in each cohort. Table S4. The conversion of the coefficient values to the score. Table S5. Comparing the discrimination performance in validation cohort. (DOCX 353 kb)