Abstract
The Potsdam Textbook Corpus (PoTeC) is a corpus of eye-tracking-while-reading data in which participants (N=75) read a series of short German texts taken from college-level physics and biology textbooks. The experiment followed a 2x2 fully crossed factorial design with the reader’s expertise (advanced vs. beginner) and major (physics vs. biology) as factors. Reading comprehension was assessed with text comprehension questions. In addition, background questions that required knowledge beyond the presented text tested general domain knowledge.
The repository contains the eye-movement data (1000 Hz) as well as the stimulus texts with extensive linguistic feature annotations at the sub-lexical, lexical, and supra-lexical levels. PoTeC is therefore well suited for studying cognitive processes related to sentence comprehension at all linguistic levels (e.g., lexical, syntactic, discourse) as well as higher-level text comprehension.