The assessment of effects of partialling out of socially desirable responding variance on personality traits scores in instructed applicant situations

The aim of this study was to examine the extent to which socially desirable responding (SDR) distorts the results of the HEDONICA personality inventory (an acronym based on the eight dimensions of this inventory: Honesty, Disintegration, Impulsiveness, Openness, Extraversion, Neuroticism, Conscientiousness, and Agreeableness). The HEDONICA inventory was merged with components of the Balanced Inventory of Desirable Responding (BIDR) as a control inventory and was administered to a sample of 227 students under two experimental situations (contexts), operationalized by two instructions: the standard (S) one (such as “be honest”) and the “fake good” (FG) one (such as “portray yourself in a most positive way”). A comparison of scores in the S and FG situations using MANOVA revealed a clear distortion of all personality traits in socially desirable directions. When, however, the BIDR subscales in the FG situation were entered into MANOVA as covariates, differences between personality scores in the S and FG situations were considerably reduced and became statistically insignificant on five personality dimensions. When the variance of the BIDR dimensions was removed from the variance of HEDONICA traits in the FG situation, the change in the intercorrelations of personality dimensions between the S and FG situations did not attain statistical significance. This led to the conclusion that the SDR bias, even if it does affect test results (i.e., enhances scores in the FG situation), does not affect the scale structure and predictive validity of the examined personality inventory.

personality traits (Hough, 1998; Ones et al., 1996; Ones & Viswesvaran, 1998). A low to negligible increase in validity was found when SDR was controlled, and, in certain cases, the relationships between variables disappeared, indicating that SDR behaves as a meaningful personality trait (McCrae & Costa, 1983; Ones et al., 1996).
According to Ellingson, Sackett, and Hough (1999), insofar as SDR behaves as a personality trait, treating SDR variance as error is questionable. In a study using a within-subjects design, they compared honest, faked, and corrected personality trait scores in order to evaluate whether a social desirability correction is really effective, but the correction failed to fulfil this expectation. Hough, Eaton, Dunnette, Kamp, and McCloy (1990) concluded that job applicant-like individuals do not distort their responses to a considerable degree and that personality scales could thus be used successfully in personnel selection. This called into question the construct validity of social desirability and the attempts to determine whether social desirability constitutes error variance. The use of personality tests, particularly those based on Big Five models, in applicant selection is encouraged by meta-analytic findings that their validity in the prediction of individual traits is not too sensitive to various contexts (Hough & Ones, 2001; Murphy & Dzieweczynski, 2005; Ones et al., 1996; Ones, Viswesvaran, & Dilchert, 2005). In a recent paper, Paunonen and LeBel (2012), by means of Monte Carlo statistical analysis, supported the finding of only a minor decrease in criterion prediction accuracy, even when personality scores were massively infused with desirability bias.
A survey of the available literature shows that researchers are uncertain about the effects of control scales on the validity of personality tests in applicant situations, and that new contributions would be useful.
The aim of this study was to examine the effects of partialling out the variance of control scales on the validity of personality tests under applicant instruction conditions. The HEDONICA personality inventory (HEDONICA; Knežević, 2008, 2014) was used. Two BIDR subscales, SDE and IM, were used as a control inventory. The strategy of the study was the following. Correlations between HEDONICA dimensions measured in the two experimental situations, the "fake good" and the "standard" one, were examined after partialling out the BIDR variance from the dimensions measured in the FG situation: a) if the correlations remain unchanged, the effect of response bias would be considered negligible; b) if the correlations increase, it could be concluded that response bias suppresses authentic responding; c) if the correlations decrease, it could be concluded that a substantial amount of variance is shared between the response bias and personality measures.
To answer the question posed by this strategy, the zero-order correlations of corresponding HEDONICA scales in the fake good and standard conditions were compared with the correlations of the same scales obtained after partialling out the variance of the BIDR scales in the FG situation, using ANCOVA calculations. The results are presented in the Results and Discussion section.

Sample
Participants were first-year students of the Faculty of Special Education and Rehabilitation in Belgrade, who were awarded pre-exam points for participation in the study. Since there were considerably more female than male students (227 vs. 17), responses of male students were excluded from the statistical procedures. The average age of the included participants was 19.91 years (SD = 1.62). For this reason, the results of this study are generalizable to the female population only. A future study with male participants only could investigate the influence of gender, since the literature indicates that the BIDR is sensitive to gender (Bobbio & Manganelli, 2011).
The BIDR (Paulhus, 1984) consists of 40 items that measure two dimensions of socially desirable responding: Self-deception (SDE) and Impression management (IM). An existing Serbian version of the original BIDR-6, Form 40A inventory was used in this study, adapted according to the guidelines of the International Test Commission (2005). The basic concept of these guidelines is incorporated in the Serbian version of the test adaptation guidelines (Hedrih, 2018). Psychometric characteristics of the Serbian version of this inventory were presented in Subotić, Dimitrijević, and Lovrić (2016).
As can be seen in Table A1, both dimensions of the BIDR scale display similar and satisfactory reliabilities in both conditions.
The available literature shows that impression management, socially desirable responding, and faking are often used as synonyms (see Rust & Golombok, 2009). Although some more recent authors point out subtle but important differences between these concepts, proposing, for instance, a distinction between "image enhancement" and "self-deceptive enhancement" (Guion, 2011), we retained the older concepts in this study.
The HEDONICA personality inventory (HEDONICA; Knežević, 2008, 2014) was designed for the purpose of applicant selection in the public sector in Serbia. HEDONICA is based on a hierarchical model of relations between basic personality dimensions. This implies that the domain dimensions are based on lower-order dimensions (modalities). The inventory measures eight personality dimensions. The first five dimensions are taken from the NEO-PI-R (Costa & McCrae, 1995). These dimensions are Neuroticism, Extraversion, Openness to experience, Agreeableness, and Conscientiousness, each consisting of six facets. The sixth dimension, Honesty (alternatively called Amoral), is measured by a scale composed of the following six facets: projection and rationalisation of amoral impulses, resentment, Machiavellianism, sadism, lust for revenge, and passive amorality. The seventh dimension is Disintegration, which captures the inclination toward "psychosis" and is measured by the DELTA-10, consisting of 10 facets: GEI (general executive dysfunction), PD (distortion of perception), P (paranoia), D (depression), FA (flattened affect), SOD (somatoform dysregulation), EA (enhanced consciousness), MT (magical thinking), M (mania), and SA (social anhedonia). The eighth dimension is measured by the Impulsivity scale, which consists of three subscales: low impulse control, hedonism, and laziness. For the last three dimensions, the result is displayed as a score on each subscale, as well as the total score. The inventory consists of 257 items, 150 of which originate from the NEO-PI-R and 107 from the other scales. More details on the self-report test DELTA-10 may be found in Knežević, Savić, Kutlešić, and Opačić (2017), while more details about the psychometric properties and practical use of the instrument may be found in a recent monograph by the author of HEDONICA (Knežević, 2014).
The validity of its factor structure was confirmed elsewhere (Knežević, 2008). The eight basic personality dimensions for the sample of this study were calculated as sum scores of the items belonging to each dimension of the personality test. Reliability coefficients were found to be satisfactory in both the standard and the fake good condition (see Table A1).

Procedure
The personality inventory and the social desirability inventory were merged and applied as a single questionnaire. Within the repeated measures research design, participants were asked to respond to the complete questionnaire under two instructions, which induced two different experimental situations (contexts).
The first instruction presented to the participants, providing what is called the standard (S) situation, was the following: "Please, read the questionnaire items carefully, and then give an answer that describes you mostly. You don't have to think a lot about the meaning of the item. You will provide best answers if you give the answer that first came to your mind, after you are sure that you understood what the item means. Do respond to each item. If you made a mistake, just mark again the appropriate answer. Please give answers for all items. Do not miss any of them." The second instruction, providing what is called the fake good situation, was the following: "Imagine that you are applying for a job that you consider to be very attractive and you are likely to get it, and that getting the job depends only on the answers in this test. Give your answers in such a way that you maximize your chance for getting the job, presenting yourself in the best possible way".
Although fake bad and fake good instructions may lead to asymmetric deviations of results relative to the standard situation, in this study we did not consider the fake bad instruction, in order to focus on the situation that is more relevant to practice and to simplify the whole procedure.
Each subject responded to the battery of two instruments first under the standard and then under the fake good instruction. Participants could not return to previous items once they had provided a final answer to an item. The time lag between the test under the first and under the second instruction was two weeks.
Items were presented one at a time, using personal computers equipped with 19-inch monitors. Participants answered test items by selecting a number on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). Answers on each scale were collected automatically using the "Psycho" computer program (Knežević, 2014). Answers were stored in scrambled .csv files that were converted to the SPSS database format.
Only the BIDR, SDE, and IM scale results obtained in the S situation were used for further statistical analysis. As the BIDR author (Paulhus, 1991) recommended, the scoring key is balanced. After negatively keyed items were reversed, one point was added for each extreme response (4 or 5), and a score was calculated for the first 20 items (SDE scale) and then for the remaining 20 items (IM scale). Thus, total scores on the SDE and IM scales range between zero and 20. This scoring procedure ensures that only participants who exaggerate desirable responses can attain high scores.
The results of the statistical analysis are presented in Tables 1 to 5 and Table A1.
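The scoring rule described above can be illustrated with a minimal sketch. It assumes that the subscale score is the count of extreme responses, as the stated 0 to 20 range suggests. The function name, the argument names, and the positions of the reverse-keyed items are hypothetical and are not taken from the BIDR manual or from the "Psycho" program.

```python
def score_bidr_scale(responses, reverse_keyed, scale_max=5, extreme=(4, 5)):
    """Count extreme desirable responses on one BIDR subscale.

    responses     -- raw answers (1..scale_max) to the items of one subscale
    reverse_keyed -- zero-based positions of negatively keyed items
                     (hypothetical positions, for illustration only)
    """
    score = 0
    for position, answer in enumerate(responses):
        # Reverse negatively keyed items first: 1 <-> 5, 2 <-> 4 on a 5-point scale.
        if position in reverse_keyed:
            answer = scale_max + 1 - answer
        # One point for each extreme response after reversal.
        if answer in extreme:
            score += 1
    return score


# Four illustrative items, the second of which is negatively keyed.
print(score_bidr_scale([5, 1, 3, 4], reverse_keyed={1}))  # prints 3
```

With 20 items per subscale, a respondent who gives no extreme answers scores 0 and one who gives only extreme desirable answers scores 20.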

Results and Discussion
First, the effect of the instructional set was examined using MANOVA for repeated measures, in order to test whether the scores on the eight personality dimensions differ between the standard (S) and the "fake good" (FG) condition. The within-person factor was the test condition, and the dependent variables were the personality dimensions (H = Amoral tendencies, E = Extraversion, D = Disintegration, O = Openness, N = Neuroticism, I = Impulsivity, C = Conscientiousness, A = Agreeableness). The results are presented in Table 1.
The results in Table 1 show that mean scores were higher in the FG than in the S situation for the Extraversion, Openness, Conscientiousness, and Agreeableness scales. As expected, scores on Amoral tendencies, Disintegration, Neuroticism, and Impulsivity were lower in the fake good situation. The largest discrepancies between test scores were obtained on the Extraversion, Impulsivity, Conscientiousness, and Neuroticism scales. The results show that the instructional set had a clear and expected effect on the results of the HEDONICA personality test. Differences between S and FG scores were obtained on all dimensions, in the expected direction and size, i.e., the respondents changed their scores on the personality test in a socially desirable manner. These results are consistent with previous studies that confirmed the effects of instructed faking. Viswesvaran and Ones (1999) found that, across the Big Five dimensions, the fake good instruction shifts mean scale scores by about half a standard deviation. The effect sizes for the mean differences between personality scores in the S and FG conditions found in this study, although somewhat higher (Table 1), were also similar to those presented in Ellingson, Sackett, and Hough (1999), who obtained effect sizes of .51 for Neuroticism, .35 for Agreeableness, and .53 for Conscientiousness. In line with Topping and O'Gorman (1997) and Holden, Wood, and Tomashewski (2001), the results obtained in this study indicate that, relative to other personality dimensions, Openness (effect size .17) appears to be less susceptible to the socially desirable bias effect.
We can conclude that the problem of transparency of personality tests (Piedmont, McCrae, Reimann, & Angleitner, 2000) really exists, and that individuals can effectively distort their responses in situations where it is advantageous to do so (Griffith & McDaniel, 2006; Viswesvaran & Ones, 1999; Birkeland et al., 2006; Griffith, Chmielowski, & Yoshita, 2007).
While in the standard situation HEDONICA dimensions correlate mutually in the expected way and to the expected extent, in the FG situation correlations between all dimensions were higher. Average correlations between scales for the S and FG situations were .33 and .56, respectively. Our results confirm earlier findings that faking increases correlations between Big Five dimensions (Pauls & Crost, 2005; Schmit & Ryan, 1993; Ziegler & Buehner, 2009). One should note that in the FG situation some correlations are even higher than the autocorrelations. As a rule, autocorrelations are higher than correlations between scales measuring different constructs. However, since the inventory dimensions here are measured in different situations, the autocorrelation coefficients are considerably reduced. Consequently, some correlation coefficients between different scales could exceed them.
Table 3 shows correlations between HEDONICA personality traits and the SDE and IM scores of the BIDR inventory in both the S and FG test situations. In the standard situation, SDE scores of the BIDR correlate significantly with the H, D, N, and C traits, while IM scores do not correlate significantly with any of the HEDONICA traits. Under the FG condition, correlations between personality traits and SDR scores were all significant and considerably higher than those obtained for the standard condition. As already reported (Ellingson, Smith, & Sackett, 2001; Smith & Ellingson, 2002; Marshall, De Fruyt, Rolland, & Bagby, 2005), this fact has implications for the factor structure of the applied instrument.
In the next step of this study, ANCOVA for Repeated Measures was used first in order to obtain differences in personality test scores for the eight dimensions (i.e., H, E, D, O, N, I, C, A) in both the standard and fake good conditions. Then, personality scores in both conditions were entered into MANCOVA for repeated measures as the within-group factor, while the BIDR scales (i.e., SDE and IM) in the FG condition were entered as covariates. The calculated differences between personality dimensions, when SDE and IM variances were partialled out from FG scores on personality traits, are presented in Table 4.
The data presented in Table 1 show that, before the variance of social desirability was partialled out from the personality measures, differences on all personality test scores were statistically significant. However, as shown in Table 4, after SDE and IM variances were partialled out from the FG scores of personality traits, differences on all personality traits were considerably reduced and became statistically insignificant for H, D, O, I, and A. Partialling out the SDE variance caused a more pronounced decrease in the size of the statistical indicators than did partialling out the IM variance. From these results one could conclude that impression management, as a component of socially desirable responding, is a fairly transparent task with a sizeable effect on personality scores, while self-deception enhancement apparently does not have such an obvious effect. The results of this analysis are in line with the expectation that BIDR scales act as validity scales. However, the crucial test of this expectation is the answer to the question: does the removal of the variance of BIDR scales under the FG condition increase correlations between the corresponding personality measures in the S and FG situations? If social desirability, as captured by BIDR scores, operates as a suppressor, partial correlations should be higher than the corresponding zero-order correlations. To test this hypothesis, zero-order and corrected (partial) correlations of corresponding HEDONICA scales under the FG and S conditions were compared. Differences between the two correlation coefficients, the zero-order and the partial one, were calculated using the algorithm proposed by Raghunathan, Rosenthal, and Rubin (1996) for non-overlapping dependent r. The obtained results, in terms of the Z statistic and its significance (1-tailed p), are presented in Table 5.
Note. H = Amoral tendencies; E = Extraversion; D = Disintegration; O = Openness; N = Neuroticism; I = Impulsivity; C = Conscientiousness; A = Agreeableness. R is taken from the diagonal of Table 2, and r-corrected is the corrected value obtained after partialling out the variance of the BIDR scales; label ** means statistical significance p < .05.
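The comparison of zero-order and corrected correlations can be made concrete with a short sketch of how a correlation is recomputed after control variables are partialled out. It uses the standard recursive formula for partial correlations. This is an illustration only: the text does not state whether the corrected values were full partial or semipartial correlations, the function names are hypothetical, and the Raghunathan, Rosenthal, and Rubin (1996) significance test is not reproduced here.

```python
import math


def pearson(x, y):
    """Zero-order Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation of x and y with every variable in `covariates` partialled out.

    Applies r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    recursively, removing one covariate at a time.
    """
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (not from the study): x and y are related only through z.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [0, 5, 0, 5]
print(round(pearson(x, y), 3))               # zero-order correlation
print(round(abs(partial_corr(x, y, [z])), 3))  # magnitude after removing z
```

In the design described above, x would be a HEDONICA scale in the S situation, y the same scale in the FG situation, and the covariates the SDE and IM scores from the FG situation. A partial correlation higher than the zero-order one would point to suppression, and a lower one to variance shared with the control scales.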
One may see from Table 5 that, after SDE and IM variances from the FG condition were partialled out, only the correlation between scores on H in the two situations slightly increased, and the correlation between scores on C decreased almost significantly. Other changes, in general, did not reach statistical significance. Apparently, the pattern of the observed correlations among personality traits differs between experimental conditions, but the underlying pattern of correlations obtained by partialling out the response bias variance remains unchanged. This supports the assumption of multiple "levels" in self-assessment. This result confirms an earlier finding that using validity scales to "correct" possible biases in personality scores had no effect on validity or even decreased it (Piedmont, McCrae, Reimann, & Angleitner, 2000). Moreover, many studies examining correlations between control scales and external criteria (Paulhus, 1991; Douglas et al., 1996; Dunnette et al., 1962; Rosse et al., 1998; Schmit & Ryan, 1993; Hough, 1998; Griffith & McDaniel, 2006) reported that removing this variance could jeopardize validity.
Let us recall (see the discussion of Table 2) that the average absolute correlations between scales for the S and the FG situation were .33 and .56, respectively. After controlling for the BIDR variance, the average correlation between the FG personality scores decreased to .42, but the pattern of correlations remained very similar to that obtained in the FG situation. Similarly, Galić and Jerneić (2013) found that intercorrelations between personality measures obtained after removing SDR variance are more or less similar to those obtained in the FG situation, leading to the conclusion that the correction for SDR had no effect on personality measures.
The results of the present study are also in line with other previously published results (Hough & Ones, 2001; Murphy & Dzieweczynski, 2005; Ones et al., 1996; Ones et al., 2005) indicating that the validity of personality tests for the prediction of personal properties is not too sensitive to various contexts. In other words, a control for socially desirable responding in personality test scores does not harm operational validities, which should mean that social desirability is neither a mediator nor a suppressor variable in the personality-performance relationship.
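For readers who wish to reproduce summary figures of this kind, the average absolute correlation between scales can be sketched as the mean of the absolute off-diagonal entries of a correlation matrix. The function name and the 3 x 3 matrix below are hypothetical illustrations, not the study's data, and the exact averaging procedure used for the reported values is assumed rather than documented.

```python
def mean_abs_offdiagonal(matrix):
    """Mean absolute value of the entries above the diagonal of a square matrix.

    Each pair of scales is counted once, assuming the matrix is symmetric.
    """
    n = len(matrix)
    values = [abs(matrix[i][j]) for i in range(n) for j in range(i + 1, n)]
    return sum(values) / len(values)


# Hypothetical correlations among three scales.
toy = [
    [1.0, 0.2, -0.4],
    [0.2, 1.0, 0.6],
    [-0.4, 0.6, 1.0],
]
print(round(mean_abs_offdiagonal(toy), 2))  # prints 0.4
```

Absolute values are used so that negative correlations, such as those between desirable and undesirable traits, do not cancel positive ones in the average.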

Limitations of the Study
In this section we list some constraints of this study, which may suggest further investigations on this topic.
First of all, the insufficient number of male participants was the reason why only female participants were included. Thus, the conclusions of this study may be generalizable to the female population only. This suggests that, in the future, the study should be repeated with male participants only, to observe the effects of gender separately, as well as with the test results combined, i.e., on a sample with a balanced gender composition. The relatively low number of participants (227) is also a disadvantage.
Furthermore, since the examinations in this study were carried out in an experimental (instructed) situation, results obtained in the standard situation may not be considered truly representative of a real personnel selection situation. We suggest that the results may be further extended, and probably improved, if results from a real personnel selection situation are compared with those obtained in the FG situation, although no substantially different conclusions are to be expected. The omission of the fake bad situation, intended to focus attention on the practically more important fake good situation, may be added to the limitations of this study.
Finally, an older, two-factor version of Paulhus's model, one of the widely used SDR models, was used in this study. Bearing in mind, as emphasized in the Introduction section, that Paulhus provided a more comprehensive four-factor SDR model operationalized with the Comprehensive Inventory of Desirable Responding (Paulhus, 2006), this study might be repeated on the basis of this newer model, with its more precisely differentiated factor structure.

Conclusion
In this study, the effects of partialling out the variance of control scales from the personality test variance on the validity of personality tests were examined, using the HEDONICA inventory to measure personality and components of the Balanced Inventory of Desirable Responding as control scales. Different contexts were created by the use of two instructions: the standard and the fake good one.
The initial MANOVA confirmed that the instructional set had a clear and expected effect on the results of the personality test, i.e., participants changed their scores on the personality test in a socially desirable manner. This confirmed the susceptibility of self-report scores to intentional distortion. Faking also changed the factor structure by increasing correlations between test scores. The results of this analysis are in line with the expectation that BIDR scales function as validity scales.
To answer the question of whether removing the variance of the BIDR scales in the fake good condition increases the correlations between the corresponding personality measures in the S and FG conditions, zero-order correlations of corresponding HEDONICA scales in the fake good and standard conditions were compared with the corrected (partial) correlations of the same scales after MANCOVA calculations. The correlations remained unchanged, i.e., the effect of response bias was negligible. This comparison also did not confirm the hypothesis that partialling out the variance of control scales would improve the validity of personality tests in personnel selection. Thus, the results of this study indicate that the HEDONICA inventory may be used without any consideration of socially desirable responding.