Because the performance of weather and climate forecasting systems and of their benchmark systems is generally not homogeneous in time and space and may vary in specific situations, improvements in certain situations or subsets have different effects on overall skill. We present a decomposition of skill scores for the conditional verification of such systems. The aim is to evaluate the performance of a system individually for predefined subsets with respect to the overall performance. The overall skill score is decomposed into a weighted sum of subset contributions, where each individual contribution is the product of the following: (1) the subset skill score, assessing the performance of a forecast system compared to a reference system for a particular subset; (2) the frequency weighting, accounting for varying subset size; and (3) the reference weighting, relating the performance of the reference system in the individual subsets to its performance on the full data set. The decomposition and its interpretation are exemplified using synthetic data. Subsequently, we use it for a practical example from the field of decadal climate prediction: an evaluation of the Atlantic European near-surface temperature forecast from the German “Mittelfristige Klimaprognosen” (MiKlip) initiative decadal prediction system that is conditional on different Atlantic Multidecadal Oscillation (AMO) phases during initialization. With respect to the chosen western European North Atlantic sector, the decadal prediction system “preop-dcpp-HR” performs better than the uninitialized simulations, mostly due to contributions during the positive AMO phase that are driven by the subset skill score. Compared to the low-resolution system (preop-LR), no overall performance benefit is found in this region, but positive contributions are achieved for initialization in neutral AMO phases.
Additionally, the decomposition reveals a strong imbalance among the subsets (defined by AMO phases) in terms of reference weighting, which aids the interpretation of the subset contributions. This skill score decomposition framework for conditional verification is a valuable tool for analyzing the effect of physical processes on forecast performance and, consequently, supports model development and the improvement of operational forecasts.
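The three-factor product described above can be illustrated with a minimal sketch. It assumes a negatively oriented score with a perfect value of zero that is a plain mean over cases (the squared error is used here), so that the overall skill score is 1 minus the ratio of forecast score to reference score. The function name `decompose_skill_score`, the subset labels, and the synthetic data are hypothetical and are not taken from the evaluation described in this abstract.

```python
import numpy as np


def decompose_skill_score(fc_err, ref_err, labels):
    """Split an overall skill score into per-subset contributions.

    Assumption: fc_err and ref_err hold per-case values of a negatively
    oriented score (perfect score 0) whose overall value is the mean over
    all cases, e.g. squared errors.
    """
    fc_err = np.asarray(fc_err, dtype=float)
    ref_err = np.asarray(ref_err, dtype=float)
    labels = np.asarray(labels)

    ref_score_all = ref_err.mean()  # reference score on the full data set
    terms = {}
    for k in np.unique(labels):
        mask = labels == k
        fc_score_k = fc_err[mask].mean()
        ref_score_k = ref_err[mask].mean()

        subset_ss = 1.0 - fc_score_k / ref_score_k  # (1) subset skill score
        freq_w = mask.mean()                        # (2) frequency weighting
        ref_w = ref_score_k / ref_score_all         # (3) reference weighting

        terms[str(k)] = {
            "subset_ss": subset_ss,
            "freq_w": freq_w,
            "ref_w": ref_w,
            "contribution": subset_ss * freq_w * ref_w,
        }
    return terms


# Synthetic illustration: three hypothetical subsets of unequal size.
rng = np.random.default_rng(0)
labels = np.repeat(["neg", "neu", "pos"], [30, 50, 20])
obs = rng.normal(size=labels.size)
fc = obs + rng.normal(scale=0.5, size=labels.size)   # forecast system
ref = obs + rng.normal(scale=1.0, size=labels.size)  # reference system

fc_err = (fc - obs) ** 2
ref_err = (ref - obs) ** 2

terms = decompose_skill_score(fc_err, ref_err, labels)
overall = 1.0 - fc_err.mean() / ref_err.mean()
total = sum(t["contribution"] for t in terms.values())

for name, t in terms.items():
    print(name, {key: round(val, 3) for key, val in t.items()})
print("sum of contributions:", round(total, 6))
print("overall skill score: ", round(overall, 6))
```

Under these assumptions the contributions sum to the overall skill score up to floating-point error, because the frequency weights sum to one and the subset reference scores cancel in each product. A subset in which the reference performs poorly receives a large reference weighting and can therefore dominate the overall skill even when it is small.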