Sex assessment from the proximal femur in the Spanish population based on three-dimensional computed tomography metric analysis

Background/Aim. Studies published in recent years have shown that linear measurements on three-dimensional computed tomography (3D-CT) clinical images of the hip bone, skull or breastbone can serve as a reliable alternative method for sex estimation. Although the proximal femur exhibits high dimorphism in skeletal material, there is still a lack of morphometric studies on CT imaging of this anatomical region that would confirm the relevance of the previously obtained results. The aim of this study was to validate the reliability and precision of some proximal femur measurements obtained in vivo from 3D-CT models and to compare the accuracies of our findings with those formerly reported by other relevant research. Methods. A total of 146 CT scans (73 male and 73 female) were selected, and 6 measurements were taken using traditional osteometric methods. The 3D reconstruction was done at 1 mm and 1.25 mm thick slices with OsiriX (v.4.1). Univariate and multivariate discriminant functions (DFs) were formulated for assessing sex. Results. The vertical diameter of the neck and the vertical diameter of the head contributed the most when considered independently (90.4%–91.8%). When these were combined with the other dimensions, the prediction accuracy increased up to 97.3%. The accuracy of the CT measurements is in accordance with that obtained in traditional morphometric studies on skeletonized femurs of contemporary populations. The 3D-CT approach showed a remarkably higher predictive ability than the 2D technique. Conclusion. 3D-CT is a suitable tool for the objective quantification of osteological data. Medical scans and measurements on living individuals offer a valuable source of data from which highly reliable skeletal standards can be developed for estimating sex, even from fragmented remains.
The method proposed here can be highly useful especially in the identification of mass disaster victims when the direct osteometry is difficult to apply and maceration of the remains is not an option.


Introduction
Assessing the suitability of poorly preserved or fragmented skeletal remains as the only source of data available for sex diagnosis is a task that forensic anthropologists frequently face. Owing to its robustness and density, the femur is less susceptible to damage and can be better preserved than other long bones. When the shaft or distal end is missing, the proximal femoral epiphysis can be highly useful in fragmentary forensic contexts. As an important area of muscle insertion and of upper body weight transmission, the upper extremity of the femur is undoubtedly affected in terms of size and shape, which could subsequently have an effect on its dimorphic potential 1,2.
A review of the published literature showed that the proximal femur has been extensively examined to assess its efficacy in sex assessment. For this purpose, some researchers have identified a triangle on the posterior aspect 3,4, while others have focused on different features of the proximal epiphysis 5,6. In the cited studies, the metric data were recorded on modern cadaveric femora following traditional morphometric techniques. They were subsequently subjected to univariate and multivariate discriminant function (DF) analyses (DFA).
As for the single dimensions, the vertical diameter of the femoral head (VHD) and the vertical diameter of the femoral neck (VND) were those that most correctly assigned males and females in a variety of populations, e.g., Spanish, Guatemalan and Thai [7][8][9]. However, all scholars agreed that the percentages of correct classification vary considerably within the same ethnicity and among different ethnicities as a consequence of specific genetic, environmental, sociocultural and secular changes that the proximal femoral epiphysis undergoes over time 10. Consequently, the existing methods are constantly being re-examined and novel techniques developed in order to establish more reliable standards for the estimation of sex.
With regard to the use of image-processing techniques for the prediction of sex from the proximal femur, the published literature has primarily explored how the classical osteometric measurements perform when employed on digital radiographs 11-13. Secondly, it has compared the level of accuracy obtained directly on the dry skeletal material with that obtained from standard digital images of the same dry element 14. These studies were carried out to validate the relevance of some morphometric parameters in forensic examinations as well as to provide population-specific patterns for sexing the proximal femoral epiphysis.
In recent years, computed tomography (CT) has proved to be a suitable tool for the estimation of sex (e.g., tali and radii, os coxae and sternum), providing reliable and precise results comparable to those obtained by the traditional morphometrics [15][16][17] . However, the number of studies that made use of the clinically relevant CT database to quantify the sex differences in the proximal femur and develop the accurate standards for that purpose is still low in the current literature 18,19 .
Therefore, the aims of this study were: to examine how accurately the proximal epiphysis of the femur predicts sex in a sample of the adult living population of Spain, employing data derived from CT scans and traditional osteometry; to explore and validate, using the medical imaging dataset, some discriminant functions obtained from skeletal remains; to formulate new discriminant functions based on the same sample; and to compare the classification success rates achieved in several ethnicities for the same dimensions by means of the same or different approaches.

Methods
This study was performed on a randomly selected sample consisting of a total of 146 clinical CT scans (73 male and 73 female subjects) of individuals aged between 17 and 84 years (the male mean age was 62.63 ± 14.86 years and the female mean age was 56.44 ± 13.09 years) who were referred for abdominopelvic, abdominal and thoracoabdominal CT scanning between 2009 and 2011. The material examined was provided to the Laboratory of Physical Anthropology at the University of Granada by the Castilla-La Mancha Health Care Service (SESCAM). Subjects with a history of femoral pathology or surgery were excluded from the study. To describe the anthropometric measurement error and assess the side differences, a random sample comprising 30 specimens (approximately 20% of the cases) was measured twice by the first anthropologist on different days and was also measured by a second examiner. This sample confirmed the symmetry; for the rest of the sample, therefore, only one side was measured.
Some DFs built in our recent study 4 from a data set of 186 adults' femurs (109 female and 77 male), derived from the San José identified skeletal collection housed in the Laboratory of Anthropology at the University of Granada, Spain, were employed to validate their efficacy on the sample obtained from the medical imaging data.
In compliance with Spanish law (Article 16, Law 41/2002; see also 20), the patients' data were anonymized at the source before the anthropologists received them, with only the sex and age information retained. The CT scans used for this study were saved as DICOM files. The postprocessing was performed using OsiriX (v.4.1) on Mac OS X (10.7.2). The 3D reconstruction was based on 1 mm and 1.25 mm thick slices, and six linear measurements were obtained in the anterior and posterior views of the surface rendering images (resolution 512×512 pixels).
Following the standard anthropometric techniques and literature (see below), the observers located the reference points of each variable on the surface of the 3D models by rotating the bone, so that the starting and ending points best fitted the described length. The distances and their respective values in centimetres were subsequently established by the same software. The selected dimensions are illustrated in Figure 1 and described as follows:
-Greater-lesser intertrochanteric distance (GLT): the distance between the apex of the greater trochanter and the apex of the lesser trochanter 21.
-Length STH: the straight distance measured between the inferior point of the length STD and the superior point of the length VHD, respectively. It is a distance devised by Kranioti et al. 14 and also used in a study on an Egyptian population 11.
The assessment of classification accuracies was performed by applying the discriminant functions (DFs) developed from the San José sample of dry femora to the medical image sample data.

Statistical analyses
The statistical analysis was performed using the software program SPSS v.24 (IBM, Somers, NY, USA). Descriptive statistics were obtained for each of the measurements. The normal distribution of the data was evaluated by the Kolmogorov-Smirnov test. To assess the side differences, the paired t-test was applied, and to describe the anthropometric measurement error, the technical error of measurement (TEM), the relative technical error of measurement (%TEM), and the coefficient of reliability (R) were calculated [24][25][26]. The mean values of both groups were compared using the t-test. Stepwise DFA was performed to formulate the univariate and multivariate discriminant equations. The leave-one-out classification procedure was used to report the accuracy rate for the original sample and the rate obtained under cross-validation. The posterior probabilities were computed for each model. A p-value of less than 0.05 was considered statistically significant. We determined beforehand that the method would be considered acceptable when at least 85% of individuals were correctly classified, with a sex bias lower than 5%.
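As an illustration of the measurement-error statistics named above, the following minimal Python sketch applies the commonly used two-session definitions (TEM as the square root of the summed squared differences divided by 2n, %TEM as TEM relative to the grand mean, and R as one minus the ratio of TEM squared to the total variance). It assumes these standard formulas rather than reproducing the SPSS workflow of this study; the function name `tem_stats` and the sample values are hypothetical.

```python
import math


def tem_stats(first, second):
    """Return (TEM, %TEM, R) for two repeated measurement series.

    Assumes the usual two-session definitions; `first` and `second`
    hold the same specimens in the same order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    n = len(first)
    # Absolute technical error of measurement: sqrt(sum(d^2) / 2n)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    tem = math.sqrt(squared_diffs / (2 * n))
    # Grand mean and sample variance over all recorded values
    pooled = list(first) + list(second)
    grand_mean = sum(pooled) / len(pooled)
    variance = sum((x - grand_mean) ** 2 for x in pooled) / (len(pooled) - 1)
    rel_tem = 100.0 * tem / grand_mean      # %TEM
    reliability = 1.0 - tem ** 2 / variance  # coefficient of reliability R
    return tem, rel_tem, reliability


# Hypothetical repeated measurements (mm), not data from this study
session_1 = [10.0, 20.0]
session_2 = [12.0, 18.0]
tem, rel_tem, r = tem_stats(session_1, session_2)
```

Under the cut-off adopted in this paper, an R value above 0.95 would be read as sufficiently precise.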

Results
The results of the measurement error for each variable are presented in Table 1. The intraobserver %TEM and R values range from 0.988% to 2.396% and from 0.950 to 0.988, respectively, while the interobserver %TEM and R values range from 1.160% to 2.468% and from 0.951 to 0.981, respectively. The symmetry was also confirmed by the paired t-test at a significance level of 0.05. Table 2 presents the mean and the standard deviation by sex for each measurement. The male average exceeds the female average in all cases. Furthermore, the results of Student's t-test reported in the same table demonstrate highly significant differences between the sexes.
The DFs developed from the dry femora were applied to the medical image sample data, and the classification accuracies obtained range from 74.4% to 90.7%. The results are given in Table 3 (for the abbreviations, see Table 1). Table 4 presents the coefficients of seventeen discriminant function equations: four univariate (1 to 4), six bivariate (5 to 10), six three-variate (11 to 16) and one using four variables, which have the fewest attribution errors and best separate the two groups. The sectioning points are all zero (making the corresponding calculations). From the univariate functions, the threshold values can be calculated as the absolute value of the constant divided by the coefficient of the variable (the slope model). In this study, the threshold value for STH is 87.91 mm, for GLT 57.90 mm, for VND 32.563 mm and for VHD 43.97 mm.
The classification rules developed here can be considered an accurate and easy way to help differentiate sex. The discriminant coefficients in Table 4 are used as follows: multiply each measurement by the appropriate coefficient and add the products to the constant; a value greater than or equal to the sectioning point, zero (≥ 0), is classified as male, and a value less than zero (< 0) is classified as female. For example, using the discriminant function 12, an adult with the following measurements: is classified as male.
The Wilks' lambda values, which measure how well each function separates cases into groups, were calculated (smaller values indicate greater discriminatory ability of the function). Table 4 also presents the accuracy percentages, cross-validated accuracy percentages and posterior probabilities for all of the DFs developed. The percentage of correct assignment ranges from 85.6% to 97.3% (85.3% to 97.3% after cross-validation).
Of the six variables analysed in the current paper, five have also been used in several other studies focused on sexing the proximal femur. Their classification accuracies are compared in Table 5.
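The scoring rule and the univariate threshold described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the coefficients and constant are hypothetical round numbers chosen for readability, not the values of Table 4, and the function names are invented for this example.

```python
def discriminant_score(measurements, coefficients, constant):
    """Constant plus the sum of coefficient * measurement (in mm)."""
    missing = set(coefficients) - set(measurements)
    if missing:
        raise KeyError(f"missing measurements: {sorted(missing)}")
    return constant + sum(
        coefficients[name] * measurements[name] for name in coefficients
    )


def classify_sex(score, sectioning_point=0.0):
    """Scores at or above the sectioning point are assigned to males."""
    return "male" if score >= sectioning_point else "female"


def univariate_threshold(coefficient, constant):
    """Measurement value at which a univariate score crosses zero,
    computed as |constant| / coefficient, as described in the text."""
    if coefficient == 0:
        raise ValueError("coefficient must be non-zero")
    return abs(constant) / coefficient


# HYPOTHETICAL coefficients and constant, NOT taken from Table 4
HYPOTHETICAL_COEFFICIENTS = {"GLT": 0.1, "VHD": 0.2, "VND": 0.3}
HYPOTHETICAL_CONSTANT = -25.0

larger_femur = {"GLT": 60.0, "VHD": 46.0, "VND": 34.0}
smaller_femur = {"GLT": 54.0, "VHD": 41.0, "VND": 29.0}

for femur in (larger_femur, smaller_femur):
    score = discriminant_score(
        femur, HYPOTHETICAL_COEFFICIENTS, HYPOTHETICAL_CONSTANT
    )
    print(round(score, 2), classify_sex(score))

# Hypothetical univariate function: score = 0.5 * x - 22.0
print(univariate_threshold(0.5, -22.0))
```

With real use, the coefficient dictionary and constant would be replaced by one row of Table 4, and the measurements would be entered in the same unit as the one used to derive the functions.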
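To make the cross-validated percentages more concrete, the sketch below shows a leave-one-out check for a single variable in Python. It does not reproduce the stepwise SPSS procedure used in this study; it swaps in a simple midpoint-of-means rule, which is what a two-group linear discriminant function on one variable reduces to when equal prior probabilities are assumed. It also assumes that the male mean exceeds the female mean, and all values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def leave_one_out_accuracy(male_values, female_values):
    """Return (overall %, male %, female %) under leave-one-out.

    Each case is held out in turn, the cut-off is recomputed as the
    midpoint of the two group means from the remaining cases, and the
    held-out case is then classified with that cut-off.
    """
    male_values = list(male_values)
    female_values = list(female_values)
    correct_males = 0
    for i, value in enumerate(male_values):
        rest = male_values[:i] + male_values[i + 1:]
        cutoff = (mean(rest) + mean(female_values)) / 2
        correct_males += value >= cutoff
    correct_females = 0
    for i, value in enumerate(female_values):
        rest = female_values[:i] + female_values[i + 1:]
        cutoff = (mean(male_values) + mean(rest)) / 2
        correct_females += value < cutoff
    total = len(male_values) + len(female_values)
    overall = 100.0 * (correct_males + correct_females) / total
    male_pct = 100.0 * correct_males / len(male_values)
    female_pct = 100.0 * correct_females / len(female_values)
    return overall, male_pct, female_pct


# Hypothetical head diameters (mm), not data from this study
males = [46.0, 47.5, 45.0, 48.0, 42.0]
females = [40.0, 41.5, 39.0, 42.5, 44.5]

overall, male_pct, female_pct = leave_one_out_accuracy(males, females)
sex_bias = male_pct - female_pct  # difference between per-sex accuracies
```

The overall percentage and the sex bias computed this way can then be compared against an acceptance criterion such as the one stated in the statistical analyses (at least 85% correct, sex bias below 5%).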

Discussion
Computed tomography is increasingly proving its forensic relevance in osteological sex assessment. This imaging technique facilitates easy, rapid, non-invasive and direct examination of unknown deceased individuals. In this way, an extensive and time-consuming maceration procedure can be avoided. The acquisition of 3D volume rendered images enables detailed inspection and visualization of any osseous structure and, consequently, accurate virtual measurements. The CT scan method can be a highly useful option in mass fatality incidents where the state of the recovered remains (fragmented, semifleshed, mummified, charred) does not allow the traditional forensic procedures to be carried out correctly (e.g., manual data acquisition). Additionally, in the absence of suitable skeletal collections, multislice CT (MSCT) scans can serve as a reliable alternative source of contemporary data from which specific morphometric standards for the estimation of sex can be developed or validated.
In the present study, we aimed to explore how accurately the proximal epiphysis of the femur predicts sex in a sample of the adult living population of Spain by applying traditional osteometry to the data derived from the 3D scans, and to compare the accuracies of our findings with those formerly obtained in other relevant studies. To that end, several anthropometric measurement errors were calculated for the six variables selected for the study. According to Weinberg et al. 26, the TEM scores revealed very good inter- and intraobserver reproducibility. Following Ulijaszek and Kerr 24, we took into account a cut-off value of 0.95, i.e., a measurement error of up to 5%, which leads us to consider R values greater than 0.95 to be sufficiently precise. The ranges obtained in Table 1 confirmed a high level of repeatability for all the dimensions considered, indicating that accurate osteometric measurements can be obtained from the reconstructed 3D-CT image data and that this approach seems to be suitable and reliable for the assessment of the proximal femoral epiphysis.
Note to Table 4: Coefficients and constants are used to construct the discriminant equations; a jackknife leave-one-out method is used for cross-validation. For the abbreviations, see Table 1.
Three models developed in our recent study on dry femora 4 were validated on the sample of 3D image data. Two of the three functions show possible applicability in sexing skeletal remains. The poor result that VND exhibited for the female group could be related to the secular changes that affected female VND, in contrast to male VND, which was not notably altered by this trend, as stated previously in studies conducted on French, Caucasian and Afro-American individuals born prior to the turn of the century as well as those born after 1910, respectively [6][7][8][9][10]. These studies assert that the secular increases in the female neck morphology decreased the distance between the male and female distributions and consequently led to a decrease in the overall classification accuracy. It is possible that our original skeletal sample 4, tested on the clinical data, was also affected by the trend, because 39.78% of it comprised individuals born before 1909 (see Table 3, also Ref. 10). This analysis was carried out to ascertain whether the imaging-based models performed worse than, better than, or comparably to those previously formulated from the sample of skeletal remains. Because compatibility was established for two functions, the forensic context will determine which of them is more appropriate to apply. Nevertheless, the formulae obtained from the CT scans should be used when they show better predictive ability; when the dry bone standard for the given variable is not available; or in the identification of mass disaster victims when traditional forensic methods are not an option. In such circumstances, when a rapid and accurate sex assessment is crucial, both forensic pathologists and forensic anthropologists who work closely in the identification of human remains can use the CT scans.
In cases of degraded and contaminated DNA and severe soft tissue injuries, the identification tasks can be very complicated for a forensic pathologist, even with CT. If the bone fragments are better preserved, the imaging technique will favour the forensic anthropologist. After the recovered remains are scanned, the imaging software provides the 3D reconstruction and removes the soft tissue. The measured data then undergo the multivariate statistical analysis, and the resulting discriminant scores are compared with the corresponding sectioning points established for each function (zero, in this case). The bones are classified as male or female based on whether the discriminant scores are higher or lower than the sectioning points. In this way, the formulae previously developed for the examined anatomical region are applied to assess the sex of the deceased person.
The multivariate DFA to which our CT-scan data were subjected showed that the most accurate single parameters were VND and VHD, with 90.4% and 91.8% of correct classification after cross-validation, respectively. Because of their high correlation, a model comprising both variables would not be as useful as the other models obtained when either of them was combined with GLT and FNL. Although the latter performed more poorly as an independent model, it gave noteworthy results in combination with GLT, VND and VHD, which was the main reason to include it in the finally selected functions. We emphasize that GLT and STH as single prediction models were less sexually dimorphic than the others obtained here (below 88%). However, they gave more accurate functions when joined together or combined with other variables selected for the study. When these were grouped into functions based on two, three and four variables (see Table 4), the prediction accuracy increased up to 97.3%.
We assessed the percentages of correctly classified individuals for the five dimensions that the present survey has in common with studies on a variety of different ethnicities in which different approaches for assessing the proximal femur were employed. We found no significant difference for the Spanish population between the measurements taken by MSCT and the measurements of defleshed bones, except for VHD, which assigned sex better on virtual models. On the other hand, our accuracy rates are generally in consonance with those obtained from the skeletal samples of other populations. Furthermore, the CT measurements provided a remarkably higher percentage of correct classification than those obtained from 2D digital radiographs of both living subjects and skeletal remains. As Rubin et al. 27 asserted, standard radiographs are somewhat limited for precise morphometric analysis owing to the lack of 3D data on a planar X-ray, which most likely introduces errors into the final geometry. Such distortion was not observed in our 3D-CT images, which was subsequently reflected in the percentage of correct classification (see Table 5 and Refs. 11 and 13).
Overall, our results suggest that 3D-CT is a suitable alternative tool for the objective quantification of osteological data that can provide highly accurate models for estimating sex. The standards developed here should be considered specific to Spaniards. Their possible applicability to other Mediterranean populations needs to be examined on comparative samples of osteometric and CT data. Moreover, further research based on morphometric evaluation using the CT imaging technique is needed in order to expand the number of anatomically relevant features that could enable novel and reliable modern population standards applicable to identification in forensic settings.

Conclusion
This study demonstrates that linear measurements based on clinical 3D-CT images are a reliable alternative method for the assessment of the proximal epiphysis of the femur in the modern adult population of Spain. Overall, the differences between the traditional bone measurements on the skeletal sample and those taken on the patients' 3D-CT images are negligible for Spaniards, so the two approaches can be used interchangeably. Our accuracy rates are generally in consonance with those previously obtained from groups of different geographical origin. In comparison with the 2D technique, the 3D-CT approach provided a remarkably higher predictive ability. The discriminant functions can be extremely useful in the assessment of fragmented femurs, especially in mass disaster victim identification, where direct morphometry is difficult to apply and image processing techniques such as computed tomography are the only remaining option.