The Depression Anxiety and Stress Scales – 21: Bifactor Statistical Indices in Support of the Total and Depression scores

This study explored several latent factor models of the Depression Anxiety and Stress Scales–21 (DASS–21) using both a sample of clinically depressed patients and a Facebook sample from Serbia. The DASS–21, the Beck Depression Inventory–II, and the State-Trait Anxiety Inventory-Trait were administered to a sample of depressed individuals (N = 296; M age = 52.21, SD age = 11.56). A Facebook sample (N = 376; M age = 29.12, SD = 8.96) completed the DASS–21 only. A bifactor model with one general distress (G) factor and two specific factors (Depression and Anxiety) was supported. The three factors had high omega coefficients, whereas omega hierarchical for Depression and Anxiety was low. Based on all evidence from our study (external validation, factor determinacy, and replicability), we concluded that the Serbian version of the DASS–21 reliably assesses general distress and anhedonia both in people with a clinical level of distress and in the general population. The Anxiety subscale can be safely used in clinical settings when one wishes to assess individuals presenting with a mixture of distress and anxious arousal. However, until further validation support is provided, we do not recommend use of this scale when the task is to estimate precisely anxious arousal only. The Stress subscale assesses general distress only. Low omega hierarchical coefficients of the Depression and Anxiety subscales could be addressed by re-selecting their items from the initial item pool from which the DASS–21 was created, using those with substantial loadings on both the G and their respective group factors.

subscales might be due to a common vulnerability factor which influences all three states. Their explanation was influenced by the Tripartite model of anxiety and depression (Clark & Watson, 1991).
This model posits that negative affectivity (NA) is a dimension common to both depression and anxiety, but each syndrome has its specific characteristics: depression is characterized by low positive affectivity, whereas anxiety is characterized by physical hyperarousal (Clark & Watson, 1991). Hence, responses to the Depression and the Anxiety subscales seem to reflect two sources of variance: one general dimension (NA) and specific dimensions of low positive affectivity and high physiological arousal, respectively.
However, the status of the Stress scale has been more controversial. On the one hand, it was suggested that it was defined by the items tapping an affective state different from those captured by the Depression and Anxiety subscales but that it shares with them a common vulnerability such as, for example, negative affectivity or common environmental influences (Lovibond & Lovibond, 1995b).
Others have also argued for its specificity, in addition to its common element of NA, in both English and non-English speaking countries (e.g., Bottesi et al., 2015; Henry & Crawford, 2005; Jovanović et al., 2014). However, the exact nature of the Stress scale has remained elusive in the literature. For example, Lovibond and Lovibond (1995b) have suggested that this scale measures constructs similar to those originating in stress research (e.g., appraisal and coping). On the other hand, Brown, Chorpita, Korotitsch, and Barlow (1997) have argued that this scale is a measure of general negative affect/distress. Additionally, some authors obtained less conclusive evidence for the existence of the Stress factor in their factor analytic studies (e.g., Szabó, 2010). Finally, the specificity of both the Stress and Anxiety factors (Chin, Buchanan, Ebesutani, & Young, 2018) and of all three DASS-21 specific factors (Osman et al., 2012) has been questioned recently.
Given the proposed idea that the variance of the DASS-21 items stems from two sources simultaneously, one general dimension (NA) and the specific dimensions (low positive affectivity, high physiological arousal, and possibly stress), the most promising analytic tool for resolving the issue of specificity versus non-specificity is bifactor modelling (Reise, Bonifay, & Haviland, 2018). Using this statistical tool in both clinical (Bottesi et al., 2015) and non-clinical samples (Henry & Crawford, 2005; Jovanović et al., 2014), various authors supported the specificity of the three DASS-21 subscales. In these studies, a bifactor model with one general (G) factor and three specific factors was superior to a correlated three-factor solution or a bifactor model with one G factor and Depression and Anxiety as specific factors while the stress items were constrained to load only on the G factor. However, Tully, Zajac, and Venning (2009) and Szabó (2010) opted for a bifactor solution with two specific factors in adolescent samples. More recently, Chin et al. (2018) reported the results of both exploratory and confirmatory bifactor models on non-clinical samples supporting a bifactor solution in which only the revised Depression subscale had sufficient validity.
However, in many bifactor modelling studies, the conclusion regarding the specificity of all three scales was reached by focusing mainly on the overall model fit. The authors rarely reported additional indicators such as the existence and number of substantive factor loadings per specific factor, or how much reliable variance remains within the scales once the reliability due to the G factor has been accounted for. The studies on non-clinical samples that did consider these additional criteria did not provide evidence for sufficient specificity of all three DASS-21 subscales (Osman et al., 2012), or of the Anxiety and Stress scales (Chin et al., 2018). No bifactor modelling study on clinical samples, with detailed reliability and dimensionality indices, has been reported, leaving open the possibility that some inconsistent findings might have arisen from insufficient reporting practices. Also, examining a clinical sample and a sample drawn from a general population (i.e., Facebook sample) in a single study could clarify whether the factor structure varied in prior studies, in part, because the DASS-21 could have a different factor structure across the various levels of negative affectivity seen in different samples.
In contrast to the recent bifactor studies in non-clinical samples, a review of the literature suggests that most of the factor analytic studies in clinical samples provided support for the correlated three-factor model (e.g., Antony et al., 1998; Clara et al., 2001). Even though bifactor modelling has been offered as a useful tool in discerning the dimensionality of psychological instruments (e.g., Reise et al., 2018), it also has certain limitations, such as overfitting, or fitting better than second-order or correlated models even when the tested model does not have a bifactor structure (Eid, Krumm, Koch, & Schulze, 2018; Watts, Poore, & Waldman, 2019). Hence, it is advisable to base decisions about the preferred model not solely on model fit but also on a comparison of the external validity of competing models (Watts et al., 2019).
Besides comparing the bifactor and correlated-factor models, we believe that it would be informative to test a second-order model with three first-order factors (anhedonic depression, physiological hyperarousal, and stress), and one higher-order factor. Even though this model is statistically equivalent to the correlated three-factor model, the second-order model can provide a more interpretable solution, especially in an area where one can hypothesize the existence of one higher-order factor such as general negative affectivity (Chen, Sousa, & West, 2005).
Also, a comparison between the second-order and bifactor models seems important because it can inform our understanding of the way the G and putative specific factors influence the common DASS-21 variance. Both models have been used to represent various psychopathological phenomena which are believed to be hierarchically organized (see Markon, 2019, for a review). However, they differ in important ways. The second-order model hypothesizes that the specific factors, which are subordinate to the G factor, mediate the effects of the G factor on the DASS-21 item variance. On the other hand, the bifactor model tests the possibility that both the G and specific factors directly and independently influence the common DASS-21 variance (Reise et al., 2018).
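The difference between the two models can be made concrete with the Schmid-Leiman transformation, which rewrites a second-order model in bifactor form. The following is a minimal sketch under stated assumptions: the function name and the loadings are hypothetical and are not estimates from this study. It illustrates that, in a second-order model, each item's implied loading on G is the product of its first-order loading and the second-order loading of its factor, so the ratio of general to specific loadings is forced to be the same for all items of a subscale. A bifactor model estimates these loadings freely.

```python
import math

def schmid_leiman(first_order_loadings, second_order_loading):
    """Rewrite one first-order factor of a second-order model in bifactor form.

    first_order_loadings: standardized item loadings on the first-order factor.
    second_order_loading: standardized loading of that factor on G.
    Returns (general_loadings, specific_loadings).
    """
    gamma = second_order_loading
    residual = math.sqrt(1.0 - gamma ** 2)  # part of the factor not explained by G
    general = [lam * gamma for lam in first_order_loadings]
    specific = [lam * residual for lam in first_order_loadings]
    return general, specific

# Hypothetical loadings for three items of one subscale (not from the study).
item_loadings = [0.80, 0.70, 0.60]
general, specific = schmid_leiman(item_loadings, second_order_loading=0.90)

# The general-to-specific ratio is identical for every item of the subscale.
ratios = [g / s for g, s in zip(general, specific)]
print([round(r, 3) for r in ratios])
```

Because of this proportionality constraint, the second-order model can be seen as a restricted version of the bifactor model, which is one reason the bifactor model tends to fit at least as well.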
Apart from factor analytic studies, another line of evidence regarding construct validity of the DASS-21 explores its relations with various dimensional measures of stress, depression and anxiety (e.g., Jovanović et al., 2014; Lovibond & Lovibond, 1995b), and also with different symptom measures (Brown et al., 1997). Based on these relations, some authors concluded that the Depression and the Anxiety scales have a certain amount of specificity whereas the Stress scale taps into general distress (Brown et al., 1997). Given the recent finding that most of the DASS-21 item variance stems from one general dimension (Osman et al., 2012), it remains unclear what underlies the different correlation patterns between the specific DASS-21 scales and external criteria, i.e., whether the systematic variance due to the G factor, the specific factors, or both is responsible for the observed relations. Additionally, the number of validation studies conducted in clinical samples is still small. More importantly, in these studies there was no correspondence between the supported factor analytic models of the DASS-21 and the scores used for validation. For example, Bottesi and colleagues (2015) supported a bifactor model; however, inferences about validity were based on partial correlations rather than the scores implied by the supported model. Given that the bifactor model was supported, a more appropriate way to analyze the data would have been to use general and residualized group factor scores (i.e., what remains after the G factor is accounted for) while exploring their relation with external measures.
The aim of this study was to provide a comprehensive test of several competing latent factor models of the DASS-21 that were reported in research studies, using both a sample of clinically depressed patients and a Facebook sample from Serbia. The tested models were: a) the single-factor model; b) the correlated three-factor model; c) the second-order model with one second-order factor subsuming three specific factors; d) the bifactor model with three specific factors; and e) the bifactor model with two specific factors (Depression and Anxiety) while the stress items were allowed to load only on the G factor. The last model has never been tested in clinical samples, and some previous studies reported contradictory findings in the general population (e.g., Henry & Crawford, 2005; Szabó, 2010). Using samples drawn from different populations can clarify whether the DASS-21 factor structure varied across prior studies partly because of symptom severity.
Another aim was to provide further evidence for the validity of the DASS-21. In this study, convergent validity of the DASS-21 was tested using the Beck Depression Inventory-II (BDI-II) and the State-Trait Anxiety Inventory (trait form; STAI-T) in a sample of depressed individuals, using the scores supported by the latent structure of the instrument. In light of recent criticism of bifactor modelling, especially its tendency to overfit, it would be important to compare the contributions of the factors stemming from the bifactor and other competing models while predicting external criteria.

Method

Participants
A patient sample was recruited from nine psychiatric hospitals in Serbia between 2014 and 2016. In total, 302 patients (52% females; 48% inpatients; Mage = 52.21, SDage = 11.56) signed consent forms to participate in the study. The most common level of education was high school (41.1%). Patients were classified according to the International Classification of Diseases-10 by one mental health professional, based on clinical interviews, the results of clinical psychological testing, and case notes. Psychosis and dementias were exclusion criteria.
A non-patient sample included 376 participants (76.9% females; Mage = 29.12, SD = 8.96) who were recruited via Facebook in 2015, using a Google Forms online survey. Participants were informed that the survey was anonymous, and that if they agreed to fill out the survey they would be asked about their typical emotional, cognitive, and behavioral reactions and experiences in everyday life. The study invitation was shared in different student Facebook groups.

Measures
The Depression Anxiety and Stress Scales-21 (DASS-21). DASS-21 (Lovibond & Lovibond, 1995a; Jovanović et al., 2014) is a self-assessment inventory for registering the presence of symptoms of depression (e.g., "I couldn't seem to experience any positive feeling at all"), anxiety (e.g., "I was aware of dryness of my mouth"), and stress (e.g., "I found it hard to wind down") in the past two weeks. It consists of 21 items, with seven items per scale, each followed by a 4-point Likert scale (from 0 = did not apply to me at all to 3 = applied to me very much, or most of the time). Psychometric properties of the Serbian adaptation of the instrument were previously reported (Jovanović et al., 2014). In the present study, internal consistencies (α) of the scales were very high: αdepression = .91, αanxiety = .87 and αstress = .88.

The Beck Depression Inventory-II (BDI-II; Beck, Steer, & Brown, 1996; Mihić & Novović, 2019) is a multiple-choice, 21-item (e.g., "Concentration Difficulty") self-report measure of severity of symptoms of depression. Validity and reliability indices of the Serbian adaptation of the BDI-II were reported previously (Mihić & Novović, 2019). Each answer is scored on a scale ranging from 0 to 3. In the present study, internal consistency was very high, α = .94.
The State-Trait Anxiety Inventory, trait form (STAI-T; Spielberger et al., 1983). Validity and reliability indices of the Serbian adaptation of the STAI were reported previously (Mihić & Novović, 2018). Each item is followed by a 4-point Likert scale (from 1 = not at all to 4 = very much so). Its internal consistency in the present study was very high, α = .95.
All three questionnaires are adaptations of the original questionnaires into the Serbian language.

Data Analytic Plan
In the clinical sample, a small number of missing values (from 0.4% to 3.9%) was replaced using the Expectation Maximization algorithm implemented in the SPSS software v21 (IBM Corp., 2011). Two patients were identified as univariate outliers on the standardized scores, and their scores were winsorized. There were no missing values or univariate outliers in the Facebook sample. Multivariate outliers (6 patients and 10 Facebook users) were excluded from the data sets.
For a better understanding of the DASS-21 dimensionality, the following indicators were calculated: omega for the total score and the subscales (ω and ωs), omega hierarchical (ωh), omega hierarchical for each subscale (ωhs), the H index, and factor determinacy (FD; Dueber, 2017). ω and ωs reflect the proportion of reliable variance attributable to all sources of common variance, i.e., variance due to the G and specific factors (Reise et al., 2018). The ωh coefficient estimates how much of the total DASS-21 score variance is attributable to the G factor, whereas ωhs reflects the systematic variance that is left once individual variability due to the general factor has been partitioned out (Reise et al., 2018). A value of ωh > .80 suggests that the scale can be assumed to measure a unidimensional construct (Reise et al., 2018). Given that some authors have pointed out the limitations of ωhs as a reliability coefficient (Rodriguez, Reise, & Haviland, 2016), we additionally considered the H index as a measure of factor replicability, with values > .80 suggesting well-defined latent variables (Hancock & Mueller, 2001). Finally, FD, the correlation between factor scores and the factors, was considered, with values > .90 suggesting a reliable measure (Gorsuch, 1983). However, there is also a recent recommendation that FD² or H values above .70 might suggest the presence of specific factors (Rodriguez et al., 2016).
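To illustrate how these indices follow from a bifactor solution, the sketch below computes ω, ωh, ωs, ωhs, and H from standardized loadings using their usual definitions. It is a minimal illustration, not the calculator used in the study: it assumes standardized items and orthogonal factors, the function name `bifactor_indices` is hypothetical, and the toy loadings are invented rather than taken from Table 3. FD is omitted because it requires inverting the model-implied correlation matrix.

```python
def bifactor_indices(general, specific):
    """Omega-type indices and H from standardized bifactor loadings.

    general:  dict mapping item -> loading on G.
    specific: dict mapping factor name -> dict of item -> specific loading.
    Assumes orthogonal factors and standardized items, so an item's
    uniqueness is 1 minus the sum of its squared loadings.
    """
    # Squared specific loading per item (0 for items that load on G only).
    spec_sq = {item: 0.0 for item in general}
    for loadings in specific.values():
        for item, lam in loadings.items():
            spec_sq[item] += lam ** 2
    unique = {i: 1.0 - general[i] ** 2 - spec_sq[i] for i in general}

    def h_index(loadings):
        # Construct replicability H (Hancock & Mueller, 2001).
        ratio = sum(lam ** 2 / (1.0 - lam ** 2) for lam in loadings)
        return 1.0 / (1.0 + 1.0 / ratio)

    g_part = sum(general.values()) ** 2
    s_part = sum(sum(lds.values()) ** 2 for lds in specific.values())
    total_var = g_part + s_part + sum(unique.values())
    out = {
        "omega": (g_part + s_part) / total_var,   # all common variance
        "omega_h": g_part / total_var,            # G only
        "H_G": h_index(general.values()),
    }
    for name, loadings in specific.items():
        g_sub = sum(general[i] for i in loadings) ** 2
        s_sub = sum(loadings.values()) ** 2
        sub_var = g_sub + s_sub + sum(unique[i] for i in loadings)
        out[name] = {
            "omega_s": (g_sub + s_sub) / sub_var,  # G plus specific factor
            "omega_hs": s_sub / sub_var,           # specific factor only
            "H": h_index(loadings.values()),
        }
    return out

# Toy example with invented loadings: two items per subscale, and stress
# items loading on G only (as in the two-specific-factor bifactor model).
general = {"d1": 0.7, "d2": 0.7, "a1": 0.7, "a2": 0.7, "s1": 0.7, "s2": 0.7}
specific = {"Depression": {"d1": 0.4, "d2": 0.4},
            "Anxiety": {"a1": 0.3, "a2": 0.3}}
result = bifactor_indices(general, specific)
print(round(result["omega_h"], 3), round(result["Depression"]["omega_hs"], 3))
```

In this toy example the subscale ωs is much larger than ωhs, which mirrors the pattern discussed in the text: a subscale score can be a reliable composite while carrying little reliable variance beyond G.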

Confirmatory Factor Analysis (CFA)
Five CFA models were tested in both samples. Fit indices for all models are shown in Table 2.
In the clinical sample, all fit indices for Models 4 and 5 were satisfactory, showing that these two models fitted the data slightly better than the other models. However, the lower AIC value for Model 4 in comparison to Model 5 was in favour of the former (ΔAIC > 10; Burnham & Anderson, 2004). In the Facebook sample, the best fitting model was Model 5, while Model 4 did not converge.

>> insert Table 2 about here <<

Standardized factor loadings for the best fitting models in both samples are shown in Table 3a.
Given that it was difficult to select one best fitting model in the clinical sample based only on the fit criteria, both Models 4 and 5 were considered. As can be seen, all items had significant loadings on the G factor in all models. All items from the Depression factor had significant loadings in all models in both samples. All items that were expected to load on the Anxiety factor had significant loadings in both samples, with the exception of the A9 item in the clinical sample ("I was worried about situations in which I might panic and make a fool of myself."). However, none of the Stress items loaded significantly on their putative specific factor in Model 4 in the clinical sample. Hence, although the fit indices were slightly in favour of Model 4 compared to Model 5 in this sample, the size of the factor loadings suggested that there was not enough evidence to extract the Stress factor.
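The AIC comparison referred to above can be sketched as follows. This is only an illustration of the ΔAIC rule (Burnham & Anderson, 2004): the AIC values are hypothetical placeholders, since the actual values are reported in Table 2, and the function name is an assumption.

```python
def delta_aic(aic_values):
    """Return each model's AIC difference from the lowest-AIC model."""
    best = min(aic_values.values())
    return {model: aic - best for model, aic in aic_values.items()}

# Hypothetical AIC values (placeholders, not the values from Table 2).
deltas = delta_aic({"Model 4": 12000.0, "Model 5": 12015.0})

for model, delta in deltas.items():
    # A difference above 10 is the threshold cited in the text.
    clearly_worse = delta > 10
    print(model, delta, clearly_worse)
```

As the surrounding text notes, such a difference speaks only to relative fit; it does not show that the additional specific factor is well defined by its indicators.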
We also considered various reliability indices (see the lower part of Table 3) obtained for Model 5 in both samples. The values of ωh suggest that 92% and 91% of the variance of the composite DASS-21 score was accounted for by the G factor in the clinical and Facebook samples, respectively. Even though ωs for the Depression and the Anxiety subscales were high in both samples, their relative omega coefficients suggested that only 20% (.186/.938) and 22% (.204/.909) of the reliable variance within these scales was left once the reliability due to the G factor was controlled for in the clinical sample. Similar statistics for both scales were obtained in the Facebook sample, 24% and 25%.
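The relative omega values quoted above are simply the ratio of ωhs to ωs for each subscale. A short sketch of the arithmetic for the clinical sample, using the coefficients reported in the text (the function name is hypothetical):

```python
def relative_omega(omega_hs, omega_s):
    """Share of a subscale's reliable variance that is independent of G."""
    return omega_hs / omega_s

# Coefficients reported in the text for the clinical sample.
depression = relative_omega(0.186, 0.938)
anxiety = relative_omega(0.204, 0.909)
print(f"Depression: {depression:.0%}, Anxiety: {anxiety:.0%}")
# prints: Depression: 20%, Anxiety: 22%
```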
Finally, only the G factor satisfied both sets of recommendations for FD and H in both samples (see the data analysis section). However, the conclusions about replicability and determinacy of the Depression and the Anxiety factors varied, depending on the criteria. According to one set of recommendations (Gorsuch, 1983; Hancock & Mueller, 2001) […] Table 3).

>> insert Table 3 about here <<

As can be seen from Table 3, the Stress factor was not detected in the best bifactor model in either sample. However, it was well-defined in the correlated three-factor model (see Table 4). This model had satisfactory fit indices in both samples, which were very close to those of the bifactor models (Table 3). As was pointed out in the introduction, model fit indicators are not sufficient to support the validity of the bifactor model over the competing models. Hence, the question of its validity was addressed next.

Convergent Validity of the DASS-21 Scores in the Clinical Sample
Before examining validity, several comments are in order. In the best fitting bifactor model in both the clinical and Facebook samples, the G factor appeared unequivocally reliable and stable. However, additional indices for the DASS-21 Depression and Anxiety subscales were somewhat mixed, with their ωhs being most troublesome. Yet some concerns have been raised about the limitations of ωhs as a reliability coefficient (Rodriguez et al., 2016). Besides, the Depression and Anxiety factors, despite their small number of items and low to moderate loadings, had satisfactory FD and ωs values as well as stability across our two samples. Therefore, we decided to examine their validity, in addition to G. We created latent variables based on the best fitting model in order to investigate the differential associations with the external variables. We wanted to determine whether these latent variables can incrementally predict external criteria in a theoretically meaningful way after controlling for the G factor variance. For example, we wanted to see whether the Depression and Anxiety factors from the DASS-21 could predict the BDI-II depression and the STAI-T anxiety over and above the G factor. Finally, given that the Stress factor does not exist according to the bifactor model but does according to the correlated-factor model, we compared the predictive power of the latent variables from both models. The G, Depression, and Anxiety factors from the bifactor model were positive and significant predictors of the BDI-II scores (see Table 5).

>> insert Table 5 about here <<

Discussion
In this study we explored the structural characteristics of the DASS-21 in two samples, clinical and Facebook. We identified several methodological issues in the previous studies of the structural characteristics of the DASS-21 which precluded clear conclusions regarding its structure.
In our study conducted in Serbian, we contrasted several factor analytic models that were reported in the literature and found strong support for the view that all DASS-21 items measure a single underlying construct in both samples, and that its bifactor structure is similar across the clinical and Facebook samples. All items had substantial factor loadings on the G factor and this factor accounted for 92% and 91% of the common variance in the clinical and Facebook samples, respectively. The items from all three subscales were strongly and similarly related to this factor, supporting the interpretation of the G factor as a tendency to experience general distress, i.e., a tendency to experience a mixture of negative emotions such as depression and anxiety, and to be stress-reactive. This interpretation of the G factor is consistent with our finding that the G factor predicted both the BDI-II and the STAI-T scores. The BDI-II measures severity of depression whereas the STAI-T measures a composite of negative affect and anxiety (Bados, Gomez-Benito, & Balaguer, 2010). Hence, the general DASS-21 factor seems to represent severity of general distress. Finally, the similarity between factor structures in the clinical and Facebook samples supports the idea that the structure of negative affect, measured via the DASS-21, is isomorphic across individuals regardless of the intensity of their affect.
Existence of the G factor, at least when emotional disorders are considered, is in accordance with some older theories purporting to explain comorbidity among emotional disorders, such as the tripartite model (Clark & Watson, 1991), but also with more recent theorizing such as the hierarchical taxonomy of psychopathology (e.g., Kotov et al., 2017). According to these theories, comorbidity between anxiety and depression is explained by the existence of one common vulnerability contributing to each of these conditions and their mixed states. It is usually viewed as a temperamental liability such as negative affectivity and/or neuroticism. Hence, those wishing to pursue preventative research in this area or treatment selection should feel safe to use the total Serbian DASS-21 score to identify vulnerable individuals.
Evidence for the existence of two specific factors, Depression and Anxiety, in both samples was somewhat mixed and more difficult to discern. There were several indicators supporting the existence of these two specific factors: a) a sufficient number of items with satisfactory loadings on their respective specific factors, b) omega coefficients (ωs) for the Depression and Anxiety subscales were over .90 in both samples, suggesting that both are highly reliable multidimensional composites, i.e., their unit-weighted scores are precise estimates of all sources of common variance, c) their FD scores were acceptable, d) they replicated between the two samples in our study, and e) both specific factors predicted the external criteria, controlling for the contribution of the G factor: the Depression factor predicted both the BDI-II and the STAI-T, whereas the Anxiety factor predicted only the BDI-II. The relation between various depression scales and the STAI-T is often reported in the literature (e.g., Bados et al., 2010), demonstrating once again that the STAI-T items reflect depression and negative affect rather than anxiety.
The predictive relation between the Anxiety factor and the BDI-II seems somewhat surprising, but it could suggest that somatic aspects of anxiety, which are measured by the Anxiety scale, are better represented in some BDI items (e.g., irritability, agitation) than in the STAI-T. Also, this physiological arousal is often seen in depressed patients, either as a standalone symptom or as a comorbid symptom with fear disorders characterized by physiological arousal (panic disorder, social anxiety disorder, or specific phobias). Finally, the STAI-T is supposed to assess feelings of stress, worry, and discomfort (Spielberger et al., 1983), whereas the DASS-21 Anxiety subscale taps more of those fear-related symptoms. Future research should include a more encompassing set of outcome measures to demonstrate further external validity of the specific factors, especially the Anxiety factor. In particular, one might hypothesize that this subscale might predict, independently from the Depression and G factors, a wider range of fear-related constructs such as panic and panic-like symptoms or specific phobias.
However, it seems that both the Depression and the Anxiety subscales possess sufficient reliability only when it comes to assessment of a composite variance stemming from both the general distress and their respective specific factors. Once the contribution of the general factor was left out, the reliability of the subscales did not seem remarkable (see ωhs). However, as others noted, these scores represent residualized variables and not "observed scores", making their interpretation as reliability coefficients difficult (Rodriguez et al., 2016). Given these problems, others have suggested that among the bifactor model indices a greater emphasis should be given to the H index and to how likely the specific factors are to replicate across studies (Watts et al., 2019). Considering all evidence, one can recommend that, in addition to the total score, the summed Depression subscale score can be used as a measure of anhedonia in both clinical and research settings. Concerning the Anxiety subscale, until further external validation is obtained, it is safe to conclude that this subscale has the potential to reliably estimate its specific content (physiological arousal). However, given that this subscale is a reliable composite of common and specific variance, it can reliably identify those individuals who are highly distressed and, at the same time, physiologically reactive in fear-evoking situations. Many individuals seen in therapeutic contexts can be characterized as such. Hence, use of the summed Anxiety subscale score can be safe in this context. However, if one wanted to use this scale in a research context (e.g., to examine the hierarchical structure of emotional disorders, when it would be important to estimate precisely only physiological arousal), one would be advised to use different instruments until further evidence regarding its external validity is obtained.
Finally, low ωhs for the Depression and Anxiety subscales might be a result of the small number of items comprising them. One could resolve this problem by reconsidering the initial item pool from which the DASS-21 was created and selecting the items which have substantial loadings on both the G and their respective specific factors.
Another option would be to consider addition of items that would emphasize the specific component of each subscale.
In contrast to some studies, including one previously conducted in Serbian (Jovanović et al., 2014), the best fitting bifactor model did not support retention of the Stress subscale. In the clinical sample, although the bifactor model suggested that this factor could be isolated, there was an insufficient number of items to justify creation of the specific subscale. In the Facebook sample, the bifactor model with the three specific factors did not converge, suggesting misspecification of the model. The Stress factor was clearly identified in the correlated three-factor solution with satisfactory fit indices in both samples. However, it failed to predict the external criteria over and above the Depression and Anxiety factors, supporting our interpretation that this subscale reflects general distress. It is unfortunate that the Stress subscale cannot tap tension, apart from the negative affect, given that some neurophysiological studies of anxiety disorders make a clear distinction between anxious apprehension (akin to worry and tension) and anxious arousal (panic) (Burdwood et al., 2016; Nitschke, Heller, Palmieri, & Miller, 1999). The Stress subscale item content seems to reflect anxious apprehension, whereas the Anxiety subscale reflects somatic anxiety. If one wants to further revise the DASS-21 in order to make this distinction, our recommendation would be to include more items that would measure apprehension of danger. It should also be emphasized that our validation measures were not perfectly matched to demonstrate validity of this scale. Hence, further studies should aim at including measures which would be more appropriate, such as physiological measures of tension, blood pressure, or patterns of brain activity.

Limitations
The study has a number of limitations. We focused on self-report data only. As was pointed out in the discussion section, further validation of the DASS-21 subscales might be obtained by including different sets of physiological measures. We used convenience samples. Even though both samples exceeded 200 cases, which could be considered "large" (Kline, 2005), it would still be advisable to replicate the findings given that the ratio of cases to the number of free parameters was below 10 (Jackson, 2003). Finally, bifactor models have been criticised for fitting better than correlated or second-order models even when the true model does not have a bifactor structure (e.g., Watts et al., 2019). However, we believe that our results support the validity of the best fitting bifactor model. For example, the G, Depression, and Anxiety factors were well represented by their respective indicators. The relations between the Depression and Anxiety factors and external correlates were similar in the bifactor and the correlated three-factor models. However, the Stress factor that was only identified in the latter did not receive validation support, justifying our preference for the bifactor model.