The operative risk stratification models in cardiac surgery: EuroSCORE II model - risk groups categorization

Background/Aim. At present, the literature does not uniformly define the thresholds that separate low-, moderate- and high-risk patients for the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II). The aim of this study was to suggest a risk group categorization within the EuroSCORE II risk stratification model. Methods. A total of 7,641 consecutive patients were scored preoperatively using EuroSCORE II. The end point of the study was in-hospital mortality across the risk group categories. Patients with EuroSCORE II values of ≤ 2.50%, > 2.50-6.50%, and > 6.50% were defined to be at low, moderate, and high perioperative risk, respectively. The discriminative power of the model was tested by calculating the area under the receiver operating characteristic curve (AUC). The calibration of the model was assessed by the Hosmer-Lemeshow statistics and by the observed/expected (O/E) mortality ratio. Results. In-hospital mortality observed in our sample was 3.86% (295 out of 7,641 patients). The EuroSCORE II discriminative power was acceptable (AUCs > 0.70) for the low and high risk groups, while it failed to confirm good discrimination in the moderate risk group. The Hosmer-Lemeshow statistics confirmed good calibration across risk group categories. The O/E mortality ratio failed to confirm good calibration in the low and high risk groups (in the high risk group, a slight but significant underprediction with a ratio of 1.24; 95% confidence interval 1.05-1.43), but confirmed good calibration in all three subcategories of the high risk group. Conclusion. The results of this study showed an acceptable overall performance of EuroSCORE II in terms of discrimination and accuracy of model predictions for perioperative mortality across risk group categories. Validation of EuroSCORE II performance across risk group categories needs to be further studied for a continuous improvement of patients' risk stratification before planned cardiac surgery.


Introduction
Although there has been important progress in preoperative screening, surgical techniques, myocardial protection, and intensive care unit (ICU) treatment, open-heart surgery still carries a certain risk of mortality and morbidity. As the most useful tool for improving patient selection and counseling, scoring systems have been developed over the last two decades and used to predict perioperative risk in cardiac surgery. Consequently, the risk-adjusted perioperative mortality rate following cardiac surgery has been widely adopted as an indicator of quality of care, as well as for comparison of outcomes among institutions and surgeons (in the United Kingdom). The predicted probability of postoperative death has also enabled stratification of patients into different clinical risk groups (low, moderate, high) 1, and, subsequently, made it possible to target high-risk surgical patients in need of new therapeutic interventions 2, 3. Being the most widely used worldwide, the Society of Thoracic Surgeons (STS) Predicted Risk of Mortality (PROM) score and the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II) have recently been adopted by guidelines 4. In presenting the original additive EuroSCORE model, the EuroSCORE study group 5 stratified patients into low (score 0-2), moderate (score 3-5) and high (score ≥ 6) perioperative risk groups. Although both versions (additive and logistic) of the old EuroSCORE have retained very good discriminatory power, the old models no longer accurately predict operative mortality, because they overestimate the surgical risk of adult cardiac patients by two- to threefold (poor calibration) 6,7. Therefore, the aged EuroSCORE has recently been updated and renewed into EuroSCORE II 6. However, there are only a few reports 8-10 in which the authors tried to determine risk group categories based on the score values of the EuroSCORE II model.
Our arbitrarily determined risk group boundaries are based on predicted risk values, which should represent a real-world scenario and should be more clinically meaningful than the previously reported arithmetic quartile grouping (with a similar number of patients in each group), which resulted in very low score values for moderate, and especially for high risk patients 8,9. Therefore, the aim of our study was to suggest a more realistic risk group categorization using the EuroSCORE II model.

Methods
EuroSCORE II data were prospectively calculated using the online calculator (http://www.euroscore.org) 11 and stored in the institutional database for a series of 7,641 consecutive patients who underwent adult (≥ 18 years of age) cardiac surgery at the "Dedinje" Cardiovascular Institute in Belgrade, Serbia, from 1st January 2012 to 31st December 2015. Due to the low number of patients with a postinfarction ventricular septal defect (VSD) included in the developmental database of EuroSCORE II, no risk coefficient is assigned to the postinfarction VSD closure procedure any longer 6. Therefore, patients with postinfarction VSD were excluded from our study, as they were from several subsequent EuroSCORE II validation studies 12,13. Only the first procedure for each patient was entered into the registry, while reinterventions for any cause during the same admission as the primary operation were coded as a complication. The primary end point of the study was in-hospital mortality (any-cause postoperative death occurring during the index hospitalization, in the hospital in which the operation took place) across the arbitrarily determined risk group categories. Patients with EuroSCORE II values of ≤ 2.50%, > 2.50-6.50%, and > 6.50% were defined to be at low, moderate, and high perioperative risk, respectively. High risk patients were further divided into three subcategories (higher, very high and extremely high perioperative risk), with EuroSCORE II values of > 6.50-13.50%, > 13.50-20.00%, and > 20.00%, respectively. The Institutional Ethics Committee approved the study, and the requirement for informed written consent was waived because patients' identities were masked.
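The cut-off values above can be written down as a small lookup, which makes the handling of boundary values explicit. The following is only an illustrative sketch: the function names are hypothetical and are not part of the study's software, and the input is assumed to be the EuroSCORE II predicted mortality expressed in percent.

```python
from typing import Optional


def euroscore2_risk_group(score_percent: float) -> str:
    """Map a EuroSCORE II value (in percent) to the basic risk group.

    Cut-offs follow the text: <= 2.50 low, > 2.50-6.50 moderate, > 6.50 high.
    """
    if score_percent < 0:
        raise ValueError("EuroSCORE II value cannot be negative")
    if score_percent <= 2.50:
        return "low"
    if score_percent <= 6.50:
        return "moderate"
    return "high"


def high_risk_subcategory(score_percent: float) -> Optional[str]:
    """Return the high risk subcategory, or None if the patient is not high risk.

    Cut-offs follow the text: > 6.50-13.50 higher, > 13.50-20.00 very high,
    > 20.00 extremely high.
    """
    if euroscore2_risk_group(score_percent) != "high":
        return None
    if score_percent <= 13.50:
        return "higher"
    if score_percent <= 20.00:
        return "very high"
    return "extremely high"
```

A value lying exactly on a boundary (for example 6.50%) falls into the lower of the two adjacent categories, as implied by the "≤" and ">" signs in the definitions.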
Statistical analyses were performed using the statistical package SPSS version 17.0 (SPSS, Inc., Chicago, IL, USA). Categorical variables were expressed as percentages, and continuous variables were expressed as mean ± standard deviation (SD). Comparisons were performed using Fisher's exact test or the χ² test for categorical variables and the t-test for continuous variables. A p-value of less than 0.05 was considered significant.
The performance of EuroSCORE II was analyzed focusing on discriminative power and calibration. Discrimination measures the capacity of the model to separate the individuals of a cohort who will suffer an event (in this case perioperative death) from those who will not, thus distinguishing low-risk from high-risk patients. Discrimination can be assessed by the area under the receiver operating characteristic (ROC) curve (AUC). The AUC is the proportion of randomly drawn pairs (each pair consisting of one patient who died and one who survived) in which the patient who died had a higher risk score than the patient who survived. The discriminative power is thought to be excellent if the AUC is > 0.80, very good if it is > 0.75 and good (acceptable) if it is > 0.70 14.
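The pairwise definition of the AUC given above can be illustrated directly. The sketch below is not the SPSS procedure used in the study; it is a minimal brute-force implementation under the common convention (an assumption here) that tied scores count as half a concordant pair. The function names and the toy data are hypothetical.

```python
from itertools import product
from typing import Sequence


def auc_by_pairs(scores: Sequence[float], died: Sequence[bool]) -> float:
    """AUC as the share of (death, survivor) pairs ranked correctly by the score."""
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for s_dead, s_alive in product(dead, alive):
        if s_dead > s_alive:
            concordant += 1.0   # patient who died had the higher score
        elif s_dead == s_alive:
            concordant += 0.5   # tie: counted as half (assumed convention)
    return concordant / (len(dead) * len(alive))


def discrimination_label(auc: float) -> str:
    """Verbal grading of the AUC using the thresholds quoted in the text."""
    if auc > 0.80:
        return "excellent"
    if auc > 0.75:
        return "very good"
    if auc > 0.70:
        return "good (acceptable)"
    return "not acceptable"


# Toy data (invented for illustration): five patients, two deaths.
toy_scores = [1.0, 2.0, 3.0, 8.0, 12.0]
toy_died = [False, False, True, False, True]
toy_auc = auc_by_pairs(toy_scores, toy_died)  # 5 of 6 pairs are concordant
```

In the toy data there are 2 × 3 = 6 death-survivor pairs, of which 5 are ranked correctly, so the AUC is 5/6. The quadratic loop is adequate for illustration only; for large cohorts a rank-based computation would be preferable.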
Calibration refers to the agreement between observed events and the predicted probability of occurrence of these events. The Hosmer-Lemeshow (H-L) goodness-of-fit test has been the most popular test to validate calibration, measuring the differences between observed and expected outcomes over deciles of risk. A well-calibrated model gives a corresponding p-value > 0.05 15. We also evaluated EuroSCORE II calibration using the observed to expected (O/E) mortality ratio. Ideally, this ratio equals one (the observed mortality equals the expected mortality, thus the predictive model is perfectly calibrated). A value above one means that the model underestimates mortality, and a value below one means that the model overestimates mortality. If the 95% confidence interval (CI) of the O/E mortality ratio includes the value of 1.0, the model is considered well calibrated 15.
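The O/E criterion can be sketched as follows. The method used to derive the confidence intervals is not described in the text, so the interval below rests on an assumption: a normal approximation that treats the observed number of deaths as a Poisson count, giving a standard error of √O / E. The function names are hypothetical; the whole-sample figures (295 deaths, 7,641 patients, 3.62% predicted mortality) are those reported in the Results.

```python
import math
from typing import Tuple


def oe_ratio_with_ci(observed_deaths: float, expected_deaths: float,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Return (O/E ratio, lower bound, upper bound) of an approximate 95% CI.

    Assumption: observed deaths follow a Poisson distribution, so the standard
    error of O/E is sqrt(O) / E. This is NOT necessarily the study's method.
    """
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    ratio = observed_deaths / expected_deaths
    se = math.sqrt(observed_deaths) / expected_deaths
    return ratio, ratio - z * se, ratio + z * se


def is_well_calibrated(lower: float, upper: float) -> bool:
    """Calibration criterion from the text: the 95% CI must include 1.0."""
    return lower <= 1.0 <= upper


# Whole-sample figures reported in the Results.
n_patients = 7641
observed = 295
expected = 0.0362 * n_patients  # predicted mortality (3.62%) times cohort size
ratio, lower, upper = oe_ratio_with_ci(observed, expected)
```

With these inputs the ratio is about 1.07 and the approximate interval includes 1.0, in line with the good whole-sample calibration reported in the Results; because the expected mortality is taken from a rounded percentage, the figures are indicative only.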

Results
A total of 7,641 patients fulfilled the study criteria (patients < 18 years of age and patients with postinfarction VSD were excluded). The baseline patient characteristics and operative details (EuroSCORE II risk factors) for our study population are presented in Table 1. There were no missing data for the variables necessary for EuroSCORE risk calculation. Definitions of all EuroSCORE II variables are available on the website http://www.euroscore.org 11. Discriminatory and calibration abilities of EuroSCORE II for the whole sample and across the basic (low, moderate, high) risk group categories are summarised in Table 2.
The in-hospital mortality observed in our sample was 3.86% (295 out of 7,641 patients), while the EuroSCORE II predicted mortality was 3.62%. The discriminative power of the EuroSCORE II model was determined by calculating the AUCs. Very good discrimination was confirmed for the whole cohort and for all subgroups of performed procedures (CABG, valve(s), combined, aortic, other), with all AUCs > 0.75. The EuroSCORE II discriminative power was acceptable (AUCs > 0.70) for the low and high risk groups, while it failed to confirm good discrimination in the moderate risk group. In the low risk group, only the valve(s) surgery subgroup showed good discrimination, as did two subgroups (aortic, other) in the moderate risk group. Surprisingly, almost all results in the high risk category confirmed acceptable discrimination [mostly AUCs > 0.70; close to borderline in the CABG subgroup (AUC = 0.69); failed in the subgroup "other"]. Although the H-L statistics confirmed good calibration in all risk group categories (overall and in all subgroups), it failed to confirm good calibration of the EuroSCORE II model for the whole cohort and for the CABG and aortic surgery subgroups (Table 2). However, the O/E mortality ratio confirmed good calibration for the whole sample and for all subgroups of performed cardiac procedures, excluding aortic surgery (significant underestimation of mortality; O/E mortality ratio = 1.63; 95% CI 1.25-2.01). Within the risk group categories, the O/E mortality ratio confirmed good calibration in the moderate risk group (including all subgroups), but it failed to confirm good calibration in the low and high risk groups (whole risk group sample and aortic surgery), as well as for CABG patients in the low risk group (Table 2). Discriminatory and calibration abilities of EuroSCORE II across the high risk group subcategories are summarised in Table 3.

Abbreviations used in the tables: mort. - mortality; O - observed; E - expected; CI - confidence interval; AUC - area under the curve; CABG - coronary artery bypass grafting; H-L - Hosmer-Lemeshow test.
In the high risk group subcategories, the best discrimination was confirmed in the extremely high risk group [close to borderline for the whole group and for aortic surgery (AUCs = 0.69); acceptable (AUCs > 0.70) in valve(s) and combined surgery; while it failed in CABG surgery]. In the other two subcategories (higher and very high operative risk), good discrimination was recorded only for aortic surgery in the higher risk group (AUC = 0.72) (Table 3).
The H-L statistics confirmed good calibration in all high risk group subcategories (higher, very high and extremely high) for all tested procedures, except for the category of all patients in the very high operative risk subcategory (H-L p = 0.01) (Table 3). In the high risk group subcategories, the O/E mortality ratio failed to confirm good calibration only for aortic surgery in the higher and extremely high risk groups [O/E ratio of 1.80 (95% CI 1.12-2.48) and O/E ratio of 1.44 (95% CI 1.13-1.75), respectively] (Table 3).

Discussion
Risk estimation is one of the most powerful tools for the improvement of the standard of care and the correct allocation of clinical and economic resources 16. Owing to perioperative risk stratification models, the predicted probability of perioperative death has enabled stratification of patients into different clinical risk groups (low, moderate, high), and, subsequently, made it possible to plan the optimal schedule for cardiac surgery, moderate the postoperative workload in the ICU and rationally allocate hospital resources 1. It has been confirmed that the additive EuroSCORE model correlated significantly with the cost of cardiac surgery 17, and that ICU and postoperative stay were significantly prolonged across increasing EuroSCORE II risk group categories (subsequently increasing the cost of open heart surgery) 18. Therefore, it appears that stratification into clinical risk group categories should be an integral part of cardiac surgical practice, as a component of risk assessment, decision-making, and informed consent.
Validation of the risk stratification abilities of the old, additive EuroSCORE was conducted and presented in the basic manuscript 5 by the EuroSCORE study group. The validation confirmed good calibration for the medium risk group (score 3-5; O/E mortality ratio of 1.04; 95% CI 0.89-1.19), as well as for the high risk group (score ≥ 6; O/E mortality ratio of 0.99; 95% CI 0.91-1.07). For the low risk group (score 0-2; O/E mortality ratio of 0.62; 95% CI 0.42-0.82), the model significantly overestimated mortality (O/E mortality ratios and 95% CIs were calculated using the data from the quoted manuscript). The AUCs and H-L test p-values were not presented for the risk groups.
At present, the literature does not uniformly define the thresholds that separate low, moderate and high/very high risk patients for EuroSCORE II. Several groups have attempted to present and clarify this topic. Paparella et al. 10 categorised almost 6,200 patients into five risk groups (low ≤ 1.5%, mild 1.6-5.0%, moderate 5.1-10.0%, high 10.1-20.0%, and very high > 20%). However, that categorisation (supported by the formation of a hierarchical tree and subsequent statistical analysis) was conducted using observed mortality rather than predicted mortality. In our opinion, categorisation of the risk groups should be performed according to EuroSCORE II predicted mortality, and only then should the O/E mortality ratio and statistical analysis be performed. Therefore, we consider that study not valid for EuroSCORE II risk group categorization. Velicki et al. 8 divided a cohort of 1,247 patients into quartiles, with the result that all patients with a EuroSCORE II predicted risk of more than 2.35% (4th quartile) were categorised as high-risk patients. Bai et al. 9 also divided their sample (4,507 patients) into quartiles, resulting in a high-risk group (4th quartile) with EuroSCORE II values of more than 1.64%. We believe that it is unacceptable to categorize all patients with a EuroSCORE II of more than 1.64%, or even of more than 2.35%, as high-risk patients. Even with such low risk group boundaries, EuroSCORE II underestimated mortality for the "high-risk group" in both papers. Two other groups reported risk group stratification and also presented EuroSCORE II values, but the categorisation was conducted using the old EuroSCORE models. Di Dedda et al. 13 presented a cohort of 1,090 patients divided into quintiles of distribution, but risk stratification was created according to the old logistic EuroSCORE values. In their patient population, for the very high risk patients (observed mortality 11%), the EuroSCORE II predicted mortality was 6.5% (significant underestimation). Kalender et al. 19 reported on octogenarians (105 patients) who underwent isolated coronary artery surgery, but the old additive EuroSCORE was used for risk group categorisation. The discriminative power of the EuroSCORE II model was not shown for risk group categories in any of the aforementioned papers.
The perioperative mortality related to cardiac surgery has decreased due to improved surgical techniques and perioperative patient management, despite the fact that sicker and more complex patients (baseline patient characteristics, case mix, etc.) are undergoing surgery. Although the EuroSCORE II values are generally lower (compared with additive EuroSCORE values, except for the very high risk category) for the tested group of patients 18, we decided to set the boundaries for risk group categorisation so as to stay close to the basic manuscript 5 by the EuroSCORE study group, as follows: low risk category ≤ 2.5% (basic manuscript score 0-2), moderate risk category > 2.5-6.5% (basic manuscript score 3-5) and high risk category > 6.5% (basic manuscript score ≥ 6). Arangalage et al. 20 were the only ones who searched for corresponding threshold values for high risk patients between the old logistic EuroSCORE and EuroSCORE II, and they proposed a threshold of ≥ 7% of EuroSCORE II for high risk patients, which is very close to our suggested threshold for EuroSCORE II high risk patients.
We confirmed acceptable discriminative power of EuroSCORE II in the low risk group (AUC = 0.72) and the high risk group (AUC = 0.71). Among the high risk group subcategories, discrimination was borderline acceptable only for the extremely high risk subcategory (AUC = 0.69). Good discrimination was also confirmed for some subgroups of performed surgical procedures across risk group categories, as well as across high risk group subcategories (Tables 2 and 3). The explanation for the reduced discriminative power is statistically simple. When patients are stratified according to the risk score, and then only one stratum is analyzed, the regressors and their coefficients within the stratum are different from those which allocated them to that risk group in the first place 21. Therefore, we should not be surprised if discrimination drops to a lower level within the stratum 21. Furthermore, a minimum of 100 (and preferably 200) events (perioperative deaths) should be included in the sample so that model performance can be adequately assessed 22. The Hosmer-Lemeshow statistics confirmed good calibration in all risk group categories and subcategories of the high risk category, and for all subgroups of performed cardiac procedures. It failed to confirm good calibration only for the whole sample (all patients) in the subcategory of very high risk patients (H-L p = 0.01). According to the O/E mortality ratio, in the low risk group the model significantly overestimated mortality for the whole sample and the CABG surgery subgroup, while it significantly underestimated mortality for the aortic surgery subgroup. In the moderate risk group, prediction was good for the whole sample, as well as for all subgroups of performed cardiac surgery. In the high risk group, the model slightly but significantly underpredicted mortality for the whole sample (O/E mortality ratio = 1.24; 95% CI 1.05-1.43).
In contrast, further analysis of the high risk group subcategories confirmed good calibration for the category of all patients in all three subcategories. Therefore, our results are not in accordance with previous statements that EuroSCORE II significantly underestimates mortality in the high risk group category 2,8,9,13. In the high risk group category, our study is in keeping with the results of Barili et al. 7, who showed optimal EuroSCORE II calibration up to a predicted mortality of 30%.

Limitations of the study
One limitation of our study is its single-center design; therefore, the results may not represent national and international practice and outcomes. Although our cohort included more than 7,600 patients, another limitation is sample size: stratification generated relatively small subgroups with a limited number of events (in this case perioperative deaths), which restricts the precision of the subgroup analysis.

Conclusion
The results of this study show an acceptable overall performance of EuroSCORE II in terms of discrimination and accuracy of the model predictions for perioperative mortality across risk group categories (except overprediction of mortality in the low risk group, O/E mortality ratio). Validation of EuroSCORE II performances across risk group categories needs to be further studied for a continuous improvement of patients' risk stratification before planned cardiac surgery.