Background
The analysis of differential splicing (DS) is crucial for understanding physiological processes in cells and organs. In particular, aberrant transcripts are known to be involved in various diseases, including cancer. Exon arrays are a widely used technique for studying DS. Over the last decade, a variety of algorithms for detecting DS events from exon arrays have been developed. However, no comprehensive comparative evaluation that includes sensitivity to the most important data features has been conducted so far. To this end, we created multiple simulated data sets to assess the strengths and weaknesses of seven published methods as well as a newly developed method, KLAS. Additionally, we evaluated all methods on two cancer data sets that included RT-PCR validated results.

Results
Our studies indicated that ARH is the most robust method when results are integrated over all scenarios and data sets. Nevertheless, special cases or requirements favor other methods. While FIRMA was highly sensitive on experimental data, SplicingCompass, MIDAS and ANOSVA showed high specificity throughout the scenarios. On experimental data, ARH, FIRMA, MIDAS, and KLAS performed best.

Conclusions
Each method shows different characteristics regarding sensitivity, specificity, susceptibility to certain data settings, and robustness over multiple data sets. While some methods can be considered generally good choices across all data sets and scenarios, other methods show heterogeneous prediction quality on the different data sets. The appropriate method has to be chosen carefully and with a defined study aim in mind.