The aim of this thesis is to develop, evaluate, and apply a multilevel structural equation model for the validation of 360-degree feedback instruments. The concept of these instruments is that a manager evaluates his or her own leadership competencies and is additionally evaluated by peers, subordinates, and the supervisor to gain a multi-perspective understanding of the manager’s strengths and development potential. Even though 360-degree feedback instruments are very popular, “probably no more than one in four has been professionally developed and adequately tested for validity and reliability” (Lepsinger & Lucia, 2009, p. 55). However, validation is an important prerequisite for the acceptance of the assessment by all stakeholders in a company as well as for the validity of inferences drawn from the instrument. 360-degree feedback assessments are a special case of multisource ratings and can be conceptualized in the multitrait-multimethod (MTMM) framework. Structural equation models that have been developed to define reliability, discriminant validity of the traits, and convergent validity of the different methods (i.e., the different raters) differentiate between structurally different and interchangeable methods (Eid et al., 2008). In the following chapters, I thoroughly elucidate the complex data structure of 360-degree feedback assessments and demonstrate that peers and subordinates constitute two different sets of interchangeable methods, both nested within the respective target manager. They therefore require a multilevel model that can accommodate two distinct level-1 populations. As such a model does not yet exist, it is developed and presented in this thesis. It uses a planned missing data structure to incorporate peers and subordinates as two level-1 populations. Chapter 2 shows how this model is defined and how it is implemented in statistical software.
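The planned missing data structure can be illustrated with a minimal sketch (hypothetical variable and column names, not the actual Benchmarks® items): every level-1 rater row contains observations on exactly one group’s indicators, while the other group’s indicators are missing by design, which allows a single multilevel model to hold two distinct level-1 populations.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format level-1 data for one target manager. Peers rate
# the parcels p1/p2; subordinates rate the parcels s1/s2. The other group's
# columns are missing by design (planned missingness), not by nonresponse.
raters = pd.DataFrame({
    "manager_id": [1, 1, 1, 1],
    "rater_group": ["peer", "peer", "subordinate", "subordinate"],
    "p1": [3.0, 4.0, np.nan, np.nan],  # peer item parcel 1
    "p2": [4.0, 3.0, np.nan, np.nan],  # peer item parcel 2
    "s1": [np.nan, np.nan, 2.0, 3.0],  # subordinate item parcel 1
    "s2": [np.nan, np.nan, 2.0, 4.0],  # subordinate item parcel 2
})

# By construction, each rater has complete data on exactly one indicator set,
# so the missingness pattern identifies the two level-1 populations.
peers = raters["rater_group"] == "peer"
assert raters.loc[peers, ["p1", "p2"]].notna().all().all()
assert raters.loc[peers, ["s1", "s2"]].isna().all().all()
```

This layout is only a schematic of the data arrangement; the actual model definition and software implementation are the subject of Chapter 2.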
The chapter includes an empirical application to Benchmarks®, a widely used 360-degree feedback instrument. The validation reveals acceptable to good reliabilities of the item parcels, low discriminant validity of the subscales, and low convergent validity between the managers’ self-ratings and the ratings of peers and subordinates. About one quarter of the variance in peer and subordinate ratings is shared not with the manager’s ratings but with the other group members. The lion’s share of the variance in the individual peer and subordinate ratings, however, is shared neither with the manager nor with the other peers or subordinates but is idiosyncratic. The convergent validity between the group of peers and the group of subordinates rating one manager is high, indicating that peers and subordinates share a common view that diverges from the manager’s self-perception. The Monte Carlo simulation in Chapter 3 demonstrates that samples of 100 target managers with two peers and two subordinates each are sufficient to achieve unbiased parameter estimation. Precise estimation of standard errors necessitates 400 target managers or at least four raters per group. If this sample size is not reached, mainly the standard errors of the common method factors are biased. The simulation study also uncovers a strong leniency bias of the χ²-test statistic: inferential decisions based on this test would result in too many falsely accepted models. This bias can be partly reduced by a correction of the degrees of freedom that is necessary due to the special planned missing data structure. However, the bias remains substantial, and it is not recommended to trust the χ²-test for this type of model. Discrepancies between self- and others’ ratings, which are typically encountered in 360-degree feedback assessments, are often interpreted as a lack of the manager’s self-awareness.
However, there are many other possible reasons for diverging ratings, such as personality and individual characteristics of the raters, contextual and cultural variables, and cognitive and motivational aspects. Therefore, Chapter 4 reviews the existing literature on the association between self-other agreement (SOA) on leadership competencies and self-awareness. As previous studies yielded inconsistent results, used different statistical approaches, and included different rating perspectives, a systematic analysis of the correlation between SOA and self-awareness is conducted. Three different multilevel structural equation models reveal that the correlations found between the two constructs are almost completely attributable to shared method variance. When this method variance is explicitly modeled with the multilevel structural equation model presented in this thesis, no substantial association between SOA and self-awareness remains. In light of this result, it is not advisable to interpret disagreement in ratings as an indicator of lacking self-awareness. The concluding general discussion highlights the scientific contribution of the thesis. Possible adaptations of the model to other contexts, approaches for dealing with variations of the data structure, and limitations of the present thesis are discussed.