In the neuroscience of language, phonemes are frequently described as multimodal units whose neuronal representations are distributed across perisylvian cortical regions, including auditory and sensorimotor areas. A different position views phonemes primarily as acoustic entities with posterior temporal localization, which are functionally independent from frontoparietal articulatory programs. To address this current controversy, we here discuss experimental results from neuroimaging (fMRI) as well as transcranial magnetic stimulation (TMS) studies. On first glance, a mixed picture emerges, with earlier research documenting neurofunctional distinctions between phonemes in both temporal and frontoparietal sensorimotor systems, but some recent work seemingly failing to replicate the latter. Detailed analysis of methodological differences between studies reveals that the way experiments are set up explains whether sensorimotor cortex maps phonological information during speech perception or not. In particular, acoustic noise during the experiment and ‘motor noise’ caused by button press tasks work against the frontoparietal manifestation of phonemes. We highlight recent studies using sparse imaging and passive speech perception tasks along with multivariate pattern analysis (MVPA) and especially representational similarity analysis (RSA), which succeeded in separating acoustic-phonological from general-acoustic processes and in mapping specific phonological information on temporal and frontoparietal regions. The question about a causal role of sensorimotor cortex on speech perception and understanding is addressed by reviewing recent TMS studies. We conclude that frontoparietal cortices, including ventral motor and somatosensory areas, reflect phonological information during speech perception and exert a causal influence on understanding.