Theories of incremental sentence production make different assumptions about when speakers encode information about described events and when verbs are selected, accordingly. An eye tracking experiment on German testing the predictions from linear and hierarchical incrementality about the timing of event encoding and verb planning is reported. In the experiment, participants described depictions of two-participant events with sentences that differed in voice and word order. Verb-medial active sentences and actives and passives with sentence-final verbs were compared. Linear incrementality predicts that sentences with verbs placed early differ from verb-final sentences because verbs are assumed to only be planned shortly before they are articulated. By contrast, hierarchical incrementality assumes that speakers start planning with relational encoding of the event. A weak version of hierarchical incrementality assumes that only the action is encoded at the outset of formulation and selection of lexical verbs only occurs shortly before they are articulated, leading to the prediction of different fixation patterns for verb-medial and verb-final sentences. A strong version of hierarchical incrementality predicts no differences between verb-medial and verb-final sentences because it assumes that verbs are always lexically selected early in the formulation process. Based on growth curve analyses of fixations to agent and patient characters in the described pictures, and the influence of character humanness and the lack of an influence of the visual salience of characters on speakers' choice of active or passive voice, the current results suggest that while verb planning does not necessarily occur early during formulation, speakers of German always create an event representation early.