Background: A small number of previous studies have provided evidence that cocaine users (CU) exhibit impairments in complex social cognition tasks, while the more basic facial emotion recognition is largely unaffected. However, prosody and cross-modal emotion processing have not been systematically investigated in CU so far. Therefore, the aim of the present study was to assess complex multisensory emotion processing in CU in comparison to controls and to examine a potential association with drug use patterns. Method: The abbreviated version of the Comprehensive Affect Testing System (CATS-A) was used to measure emotion perception across the three channels of facial affect, prosody, and semantic content in 58 CU and 48 healthy control (HC) subjects who were matched for age, sex, verbal intelligence, and years of education. Results: CU had significantly lower scores than HC on the quotient scales "emotion recognition" and "prosody recognition" and on the subtests "conflicting prosody/meaning - attend to prosody" and "match emotional prosody to emotional face," which require either attending to prosody or integrating cross-modal information. In contrast, no group difference emerged for the "affect recognition quotient." Cumulative cocaine doses and duration of cocaine use correlated negatively with emotion processing. Conclusion: CU show impaired cross-modal integration of different emotion processing channels, particularly with regard to prosody, whereas their performance on more basic aspects of emotion processing, such as facial affect perception, is comparable to that of HC.