We propose to use a comprehensive path model of vocal emotion communication, encompassing encoding, transmission, and decoding processes, to empirically model data sets on emotion expression and recognition. The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. Lens model equations, hierarchical regression, and multivariate path analysis are used to compare the relative contributions of objectively measured acoustic cues in the enacted expressions and subjective voice cues as perceived by listeners to the variance in emotion inference from vocal expressions for four emotion families (fear, anger, happiness, and sadness). While the results confirm the central role of arousal in vocal emotion communication, the utility of applying an extended path modeling framework is demonstrated by the identification of unique combinations of distal cues and proximal percepts carrying information about specific emotion families, independent of arousal. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, open up new research strategies for work in psychology (specifically emotion and social perception research) and engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars).