In the quest to bridge the gulf between the fields of linguistics and animal communication, interest has recently been drawn to turn-taking behavior in social interaction. Vocal turn-taking is the core form of language usage in humans, and has been examined in numerous species of birds and primates. Recent studies on great apes have shown that they engage in a bodily form, gestural turn-taking, to achieve mutual communicative goals. However, most studies on turn-taking have neglected the fact that signals are predominantly perceived and produced in a multimodal format. Here, I propose that research on animal communication may benefit from a more holistic and dynamic approach: studying turn-taking using a multimodal, conversation-analytic framework. I will discuss recent comparative research that implemented this approach via a specific set of parameters. In sum, I argue that a conversation-analytic framework might help substantially to pinpoint the ways in which crucial components of language are embodied in the ‘human interaction engine’.