Infants and adults frequently observe actions performed jointly by more than one person. Research in action perception, however, has focused largely on actions performed by an individual person. Here, we explore how 9- and 12-month-old infants and adults perceive a block-stacking action performed by either one agent (individual condition) or two agents (joint condition). We used eye tracking to measure the latency of participants' gaze shifts towards action goals. Adults anticipated goals in both conditions significantly faster than infants did, and their gaze latencies did not differ between conditions. By contrast, infants anticipated goals faster in the individual condition than in the joint condition. This difference was more pronounced in 9-month-olds than in 12-month-olds. Further analyses of fixations examined the role of visual attention in action perception. We cautiously interpret these findings in terms of low-level processing in infants and higher-level processing in adults. More precisely, our results suggest that adults are able to infer the overarching joint goal of two agents, whereas infants are not yet able to do so and might rely primarily on visual cues to infer the respective sub-goals. In conclusion, our findings indicate that the perception of joint action in infants develops differently from that of individual action.