Compositionality refers to a structural property of human language, according to which the meaning of a complex expression is a function of the meaning of its parts and the way they are combined. Compositionality is a defining characteristic of all human language, spoken and signed. Comparative research into the emergence of human language aims at identifying precursors to such key features of human language in the communication of other primates. While it is known that chimpanzees, our closest relatives, produce a variety of gestures, facial expressions and vocalizations in interactions with their group members, little is known about how these signals combine simultaneously. Therefore, the aim of the current study is to investigate whether there is evidence for compositional structures in the communication of chimpanzees. We investigated two semi-wild groups of chimpanzees, with focus on their manual gestures and their combinations with facial expressions across different social contexts. If there are compositional structures in chimpanzee communication, adding a facial expression to a gesture should convey a different message than the gesture alone, a difference that we expect to be measurable by the recipient’s response. Furthermore, we expect context-dependent usage of these combinations. Based on a form-based coding procedure of the collected video footage, we identified two frequently used manual gestures (stretched arm gesture and bent arm gesture) and two facial expression (bared teeth face and funneled lip face). We analyzed whether the recipients’ response varied depending on the signaler’s usage of a given gesture + face combination and the context in which these were used. Overall, our results suggest that, in positive contexts, such as play or grooming, specific combinations had an impact on the likelihood of the occurrence of particular responses. Specifically, adding a bared teeth face to a gesture either increased the likelihood of affiliative behavior (for stretched arm gesture) or eliminated the bias toward an affiliative response (for bent arm gesture). We show for the first time that the components under study are recombinable, and that different combinations elicit different responses, a property that we refer to as componentiality. Yet our data do not suggest that the components have consistent meanings in each combination—a defining property of compositionality. We propose that the componentiality exhibited in this study represents a necessary stepping stone toward a fully evolved compositional system.