MONTGOMERY–ÅSBERG DEPRESSION RATING SCALE IN CLINICAL PRACTICE: PSYCHOMETRIC PROPERTIES IN SERBIAN PATIENTS

Background/Aim. Various rating scales for depression are available, but the Montgomery-Åsberg Depression Rating Scale (MADRS) is one of the most frequently used. The aim of this study was to analyze the measurement properties of the Serbian version of the MADRS for quantifying depression severity in the clinical setting. Methods. Two studies were conducted to validate the MADRS. The first study included 64 adult patients with major depressive disorder (MDD) assessed in one test-retest situation, and the second included 19 participants (also with MDD) assessed in six test-retest situations. Psychometric evaluation included descriptive analysis, internal consistency and test-retest reliability, and concurrent validity (correlations with the 17-item Hamilton Depression Rating Scale, HAMD-17). Results. The intraclass correlation coefficient for test-retest reliability was 0.93 for the MADRS total score, and 0.95 across the six test-retest situations. The MADRS had a one-factor structure, with explained variance of 66.26% for the first testing and 61.29% for the retest. There were statistically significant correlations between the MADRS and the HAMD-17 (r = 0.96 for the test and r = 0.94 for the retest). High test-retest correlations were also found for all MADRS items and for the total score (r = 0.89). Conclusion. The MADRS showed good psychometric results and could be used in everyday clinical practice for the assessment of MDD.


Introduction
The diagnostic code for major depressive disorder (MDD) is based on episodic course, current severity, presence of psychotic features, and remission status (1). Quantifying MDD severity and defining remission in research and clinical settings is mainly based on symptom rating scales, which are either self-rated or administered by clinicians. Various rating scales for depression are available (2), but the Montgomery-Åsberg Depression Rating Scale (MADRS) is one of the most frequently used scales for quantifying severity in clinical trials and everyday clinical practice (3).
Accumulated evidence from studies with different groups of people with depressive disorders indicates that the MADRS has sound psychometric properties in terms of good internal consistency, test-retest stability, and convergent validity (4,5,6,7). It has also been shown that the MADRS total score has sound construct validity as a unidimensional measure targeting core depressive symptoms (4,5) and that it provides the most accurate overall reflection of depression severity (7). Some studies reported that the construct of the MADRS might be represented by two to three factors underlying different depressive symptoms, such as dysphoria, retardation, and vegetative symptoms (8,9), which should be considered when evaluating depression treatment. Good reliability and validity were also reported for the MADRS in different language versions, such as Bangla (10), Brazilian (11), Chinese (12), French (13), Korean (14), Malay (15), Persian (16), Spanish (17,18), and Thai (19).
Research comparing the self-rated version with the original clinician-rated MADRS showed a moderate to high association between patient and physician ratings (13,20). It was also examined whether MADRS results were better when the scale was administered with or without a structured interview, and the scale had satisfactory reliability in both cases (3). When each item is analyzed individually, all MADRS items are responsive, and the total score is more sensitive to changes during treatment (21).
The MADRS shows greater sensitivity than the HAMD in distinguishing between moderate and severe depression (sensitivity 93.5%, specificity 83.3%) (22). MADRS scores are also significantly higher than HAMD scores, which is attributed to the different calibration of the two instruments; the results would be equivalent if the MADRS cut-off score for depression were 12 instead of the original 6 (23). Possible shortened versions of the HAMD and MADRS, without items related to somatic symptoms (e.g., sleep, appetite), were also examined (24). If only a rough screening is needed, a short version of the instruments can be used, but if the scale is used for diagnostic purposes, the full version of both scales is recommended.

Participants
All adults aged 18 years and above admitted to the day hospital between June and September 2017 were eligible. The main inclusion criterion was the diagnosis of a unipolar MDD episode. Exclusion criteria were the presence of any other psychiatric and/or neurological disorder or a major somatic problem (e.g., chronic illness, impairment). All patients were diagnosed according to the International Classification of Diseases, 10th revision (ICD-10; 27), and all began some form of treatment: antidepressant medication, social therapy, and/or psychotherapy.
A total of 64 patients participated in the study: 36 females (56.3%) and 28 males (43.8%). The age of the subjects ranged from 24 to 68 years, with a mean of 46.11 years (SD = 10.85). Only subjects who provided complete data were included in the study, and only they were considered in each analysis shown.
Assessments. The MADRS was administered to all subjects independently by the first author. The same rater administered the HAMD-17. The MADRS and HAMD-17 were administered again to all subjects by the same rater two weeks later (test-retest assessment). Only subjects who appeared at the scheduled assessment after four weeks were assessed with the MADRS.

Psychometric analysis
The reliability assessment of the MADRS included internal consistency (Cronbach's α) and test-retest reliability, tested by the intraclass correlation coefficient (ICC; two-way random model, absolute agreement; 28). Concurrent validity was assessed using Pearson's correlation coefficient, and a paired-samples t-test was used to compare item scores between test and retest.
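As a minimal sketch of the ICC named above, the following Python function computes the two-way random-effects, absolute-agreement coefficient from a subjects-by-occasions table of scores. The single-measures form, ICC(A,1), is an assumption, since the text does not say whether single or average measures were used; the function name `icc_a1` and the sample scores are hypothetical and are not study data.

```python
def icc_a1(scores):
    """ICC(A,1): two-way random effects, absolute agreement, single measures.

    `scores` is a list of rows, one row per subject, one column per
    testing occasion (for example, test and retest total scores).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of occasions
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: every subject scores 2 points higher at retest.
shifted = icc_a1([[10, 12], [20, 22], [30, 32]])  # about 0.980
```

In this hypothetical example the ranking of subjects is perfectly preserved, yet the coefficient falls slightly below 1 because the absolute-agreement form also penalizes a systematic shift between occasions.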

Results
Paired-samples t-tests were used to estimate test-retest differences for the items of the HAMD-17 and the MADRS, respectively. There were statistically significant differences for several items (Table 1 and Table 2). For the items with statistically significant test-retest differences, Cohen's d showed that HAMD-17 items 3, 6, 7, 10, 11, and 13 had a small effect, item 9 had no appreciable effect, and only item 8 had a moderate effect. For the MADRS, items 1, 6, 7, 8, and 10 had a significant but small effect, and item 3 had no effect according to Cohen's d. For both instruments, the differences in total scores were statistically significant, with a small effect size (d = 0.31 for the MADRS and d = 0.32 for the HAMD-17).
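The paired comparison and effect size described above can be sketched as follows. The text does not state which variant of Cohen's d was used, so this sketch assumes the paired form d_z (mean difference divided by the standard deviation of the differences). The labels follow Cohen's conventional 0.2/0.5/0.8 benchmarks, and the function names and scores are hypothetical.

```python
import math
import statistics


def paired_t_and_d(test, retest):
    """Return (t statistic, Cohen's d_z) for paired test and retest scores."""
    diffs = [a - b for a, b in zip(test, retest)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)           # sample SD of the differences
    d_z = mean_diff / sd_diff                   # assumed effect size variant
    t = mean_diff / (sd_diff / math.sqrt(n))    # paired-samples t, df = n - 1
    return t, d_z


def label_effect(d):
    """Label |d| using Cohen's conventional benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical scores for four subjects, not study data.
t_value, d_value = paired_t_and_d([20, 22, 25, 30], [18, 21, 22, 27])
```

Under these benchmarks, the total-score effect sizes reported above (d = 0.31 and d = 0.32) fall in the small range.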
The ICC for test-retest reliability was 0.93 in total (95% CI 0.88-0.96; p < 0.001) for the MADRS, and 0.92 in total for the HAMD-17 (95% CI 0.88-0.95; p < 0.001). At the item level, all MADRS items had significant and high ICC values (ICC = 0.76-0.94), and the HAMD-17 gave similar results. The exception was HAMD-17 item 5, concerning transitory insomnia, for which there was no statistical significance. All other HAMD-17 items had high and significant ICC values (ICC = 0.81-0.92). These results show that both instruments are stable over time and that they can reflect changes in patients' response to the treatment of depression.
All items on the HAMD-17 showed significant reliability, with α = .89 or higher, and the MADRS items showed α = .91 or higher. According to George and Mallery, α values above .7 are acceptable, above .8 good, and above .9 excellent (29). Following that rule, the MADRS had better reliability coefficients for each item than the HAMD-17, but the total scores showed similar reliability, which is considered excellent (HAMD-17: α = .94 for test, α = .95 for retest; MADRS: α = .95 for test, α = .94 for retest).
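For illustration, Cronbach's α and the George and Mallery labels cited above can be sketched in Python as follows. This is the standard textbook formula, not necessarily the exact procedure of the statistical package used in the study; the function names and item scores are hypothetical.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per subject, one column per item."""
    k = len(rows[0])  # number of items
    item_variances = [
        statistics.variance([row[j] for row in rows]) for j in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def label_alpha(alpha):
    """Label alpha with the thresholds quoted in the text (29)."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    return "below acceptable"


# Hypothetical item scores for four subjects on three items, not study data.
# The items move in lockstep here, so alpha reaches its maximum of 1.
alpha = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
```

With these thresholds, the total-score values reported above (α = .94 and α = .95) are labelled excellent, while α = .89 is labelled good.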
The correlation results showed high correlations between item scores at test and retest (Table 3). The MADRS had significant test-retest correlations for each item, with r ranging from .62 to .89 (p < .001). The total scores were also highly correlated (r = .89, p < .001). Similar correlations were found for the HAMD-17, where the test-retest correlations were significant for all items except one (item 5 showed non-significant correlations for test and retest). The correlations ranged from r = .42 to r = .86 (p < .001), and the correlation for the total scores was also significant (r = .87, p < .001). There were statistically significant correlations between the MADRS and the HAMD-17: r = 0.96 (p < .001) for the first testing and r = 0.94 (p < .001) for the retest.

[Table 3, recovered fragment: SUM (17) = .866*; SUM (10) = .878*. *Correlation is significant at the .001 level.]

Factor analysis showed that one factor can be extracted for both test and retest items (Table 4). For the test situation, one factor explained 66.26% of the variance, and for the retest it explained 61.29% of the variance. These results are as hypothesized, because the MADRS is expected to yield one factor, on the assumption that it measures a single construct, depression.
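The percentage of variance attributed to a single factor can be illustrated with a principal component calculation: the largest eigenvalue of the item correlation matrix divided by the number of items. This is only a sketch under the assumption that a principal component extraction was used (the caption of Table 4 mentions a "principal analysis"). The power-iteration routine, the function name, and the small correlation matrix are hypothetical and are not study data.

```python
import math


def first_component_share(corr, iterations=500):
    """Share of total variance carried by the first principal component.

    `corr` is a square item correlation matrix (list of lists). The largest
    eigenvalue is approximated by power iteration starting from a vector of
    ones, which suits matrices whose correlations are all positive.
    """
    p = len(corr)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue / p


# Hypothetical 3-item matrix with all inter-item correlations equal to 0.5.
share = first_component_share(
    [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
)  # first eigenvalue is 2, so the share is 2/3
```

Under the same assumption, an explained variance of 66.26% for the 10 MADRS items would correspond to a first eigenvalue of about 6.63.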

Method
In the second study, only the MADRS was used; its characteristics are described above. The scale was administered by the first author. Unlike the first study, which had only one test and one retest, this study included six testing occasions.

Participants
Nineteen subjects participated in this study: 9 females (47.4%) and 10 males (52.6%). There was one dropout, because one patient (female) did not attend the follow-up visit after the fourth administration. The age of the participants ranged from 28 to 63 years (M = 47.32, SD = 11.06). Participants in this study had also been included in the first study, which covered only the first two assessments.

Psychometric analysis
The assessments were similar to those in the first study: test-retest reliability by the intraclass correlation coefficient, Pearson's correlation for concurrent validity, and a t-test for the six testing situations.

Results
The ICC for test-retest reliability was 0.95 in total (95% CI 0.90-0.98; p < 0.001), as shown in Table 5. The items were significant across the test-retest situations at p < .001, and the ICC ranged from 0.77 to 0.95. This shows that, with six tests, the MADRS still has good stability over time, at least for a period of one and a half months of treatment in clinical conditions.
As for the reliability analysis, all six tests showed α = .91 or higher for each item, as well as for the total scores, indicating excellent reliability for each item and for the MADRS in total (Table 5). The correlations between testing occasions were high (Table 6). The MADRS showed significant correlations across all six testing occasions, ranging from r = .51 to r = .98, with significance at p < .01 or p < .05. Correlations were higher between tests that were closer in time than between tests that were further apart. Also, correlations significant at the 0.01 level were found for the first testing, whereas the sixth retest showed lower correlations, significant at p < .05.

Discussion
The multivariable analysis showed that the scale possesses appropriate reliability and concurrent validity. The internal consistency reliability of the MADRS in Serbian is high, as are the corrected item-total correlations, which reflects high homogeneity among the items in measuring the intended concept and consistency in rating severity across the items, even when individual assessments are considered (28). The ICC was 0.93 in total for the first study (95% CI 0.88-0.96; p < 0.001) and 0.95 in total for the second study (95% CI 0.90-0.98; p < 0.001). High internal consistency reliability for the MADRS total score, with Cronbach's alpha coefficient above 0.8, was previously observed across studies using the original and different language versions (5,7,16,19). In addition, the test-retest reliability of the MADRS in Serbian was excellent in both studies, with α of .91 or higher in each, indicating satisfactory stability in repeated measurements.
The factor analysis shows that one factor explains most of the variance (66.26% for the first testing and 61.29% for the retest), as expected. Other studies have found more factors that could explain the variance, namely three (30) or two (31), depending on the study. This difference may be due to the smaller sample size in our study and should be confirmed in later research.
Finally, the concurrent validity reported previously (11,15,16) was also evident for the MADRS total score of the Serbian version when tested against the HAMD-17 total score. The correlation of the MADRS with the HAMD-17 is high and significant (r = 0.96, p < .001 for the test, and r = 0.94, p < .001 for the retest). Other studies have shown smaller correlations (r = 0.58) (32).
The higher correlations in our research might be due to the smaller sample size, so the results might differ in future research with a bigger sample. There are also significant test-retest correlations for the items of the MADRS (r = 0.62-0.89, p < .001) and of the HAMD-17 (r = 0.42-0.86, p < .001). Correlations between the six tests in the second study are also significant (r = 0.51-0.98), mostly at the p < .01 level. The correlations between the MADRS and the HAMD-17 are significant as well (r = 0.96, p < .001 for the first test, and r = 0.94, p < .001 for the retest). These results are consistent with other studies, which found correlations between items and between the MADRS and the HAMD-17 (31,33).

Limitations
The study has several limitations. First, the small number of participants did not allow us to study changes in mental health in those who deteriorated during the study period. Also, the sample in both studies was small, which limits the generalizability of the findings to other settings. Further research should include a bigger sample, in both the first and the second study, as well as a comparison with the general population, for the purpose of better validity testing.
In summary, this study of the MADRS in Serbian demonstrated that it is an appropriate measure for routine clinical assessment of individuals with MDD. It showed that the measure can produce reliable and valid assessments of MDD severity and that it is possible to distinguish a clinically important improvement from measurement error with a large degree of certainty. However, given the limitations of the present study, additional investigations with different samples will be needed in order to establish the MADRS as a gold standard in routine psychiatric practice.
Translation into Serbian of the MADRS instrument was done twice. The first translation was made in 2008 and the second in 2012 (Burkov, M., 2008 for TransCom Global Ltd. and MAPI Institute, 2012) (Appendix 1). The MADRS in the Serbian language has not yet been standardized. The aim of this study is to analyze the psychometric properties of the Serbian version of the MADRS in the clinical setting.
Study 1

Table 1
HAMD-17 test and retest results using t-test, ICC, and reliability for each item

Table 3
Pearson's correlation for HAMD-17 and MADRS for test and retest by items

Table 4
Factor loadings and communalities based on a principal analysis for 10 items, for both test and retest situations

Table 5
MADRS six test-retest results using ICC and reliability for each item

Table 6
Pearson's correlation for MADRS for six test-retest situations