Forests provide important ecosystem functions such as carbon sequestration and climate regulation, particularly in countries with high forest cover. Extreme weather events induced by climate change have a negative impact on many forest ecosystems. In Germany, for instance, the drought from 2018 to 2020 resulted in signs of damage in almost 80% of trees. This decline in forest vitality has additionally led to severe bark beetle infestations and widespread tree mortality, making it difficult for forest managers to obtain a complete picture of the state of their forests. Because fully ground-based monitoring of forest condition is not feasible given the forests' vast extent, remote sensing, and particularly the analysis of multispectral satellite image time series (SITS), has been suggested as an efficient alternative. Transformers, a state-of-the-art deep learning (DL) architecture, have shown promising results in the classification of multivariate SITS for other applications. Here, we combine Transformers with Sentinel-2 (S2) time series data to test whether they can improve forest disturbance detection in comparison to conventional methods by automatically separating relevant information from background variability throughout the whole time series. To meet the large training data needs of Transformers, we use a two-step approach consisting of pre-training and fine-tuning. During pre-training, we use outputs of previously published SITS approaches, while during fine-tuning, we use detailed reference data of known disturbances covering between 10 and 100% of a Sentinel-2 pixel, as extracted from aerial images. We test three setups: DL base using ten S2 bands, DL IND using ten vegetation indices (VIs), and DL +IND using both as model input. F1-scores across all six of our study sites range between approximately 0.65 (DL +IND) and 0.72 (DL base) in a binary classification (undisturbed vs. disturbed) when considering both full and partial disturbances. 
DL base outperforms the other setups in forest disturbance detection and detects disturbance extents as small as 40 m² within pixels of 100 m². Because DL base performs best, we conclude that handcrafted VIs do not improve the model. Our model is competitive with existing approaches and slightly outperforms most previously reported results, although a direct comparison is challenging. Since the trained model can be further refined as additional reference data become available, we conclude that a combination of Transformers and Sentinel-2 time series can be developed into an effective tool for monitoring forest disturbance in Central European forests at fine spatial grain.
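
The binary evaluation described above can be illustrated with a minimal sketch. It assumes that each reference pixel carries a disturbed area fraction and that any pixel at or above a minimum fraction counts as disturbed. The function names, the `min_fraction` parameter, and the toy values are hypothetical and are not taken from the study.

```python
def to_binary_label(disturbed_fraction, min_fraction=0.1):
    """Map a disturbed area fraction (0.0 to 1.0) to a binary label.

    Assumption: pixels with at least `min_fraction` disturbed area count
    as disturbed (1), which treats partial and full disturbances alike.
    """
    return 1 if disturbed_fraction >= min_fraction else 0


def f1_score_binary(y_true, y_pred):
    """Compute the F1-score of the positive (disturbed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct detections: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up values: disturbed fractions of six pixels
# and the corresponding model predictions.
fractions = [0.0, 0.4, 1.0, 0.0, 0.1, 0.0]
predictions = [0, 1, 1, 1, 0, 0]

labels = [to_binary_label(f) for f in fractions]
f1 = f1_score_binary(labels, predictions)
print(f"F1-score: {f1:.2f}")
```

In this toy case a pixel with 40% disturbed area (40 m² of a 100 m² pixel) counts as disturbed in the same way as a fully disturbed pixel, which is what "considering both full and partial disturbances" implies for a binary score.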