Observer ratings are often used to measure instructional quality. They are, however, usually based on observations gathered over short periods of time. Few studies have attempted to determine whether these periods are sufficient to provide reliable measures of instructional quality. Using generalizability theory, this study investigates (a) how three dimensions of instructional quality – classroom management, personal learning support, and cognitive activation of students – vary between the lessons of a specific teacher, and (b) how many lessons per teacher are necessary to establish sufficiently reliable measures of these dimensions. Analyses are based on ratings of five lessons for 38 teachers. Classroom management and personal learning support were stable across lessons, whereas cognitive activation showed high variability. Consequently, one lesson per teacher suffices to measure classroom management and personal learning support, whereas nine lessons would be needed for cognitive activation. The importance of advancing our theoretical understanding of cognitive activation is discussed.
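
The link between lesson-to-lesson variability and the number of lessons required can be made explicit with the standard decision-study formula from generalizability theory. What follows is a simplified sketch, not the study's actual model: it assumes a single facet (lessons) and ignores rater effects. The symbols are notation introduced here for illustration: \sigma^2_t is the between-teacher variance, \sigma^2_\delta is the lesson-specific variance (including residual error), n_l is the number of lessons averaged per teacher, and \rho^* is a target reliability chosen by the analyst.

```latex
% Generalizability coefficient for the mean over n_l lessons
% (simplified one-facet design; rater facet omitted by assumption)
\[
E\rho^2(n_l) \;=\; \frac{\sigma^2_t}{\sigma^2_t + \sigma^2_\delta / n_l}
\]

% Solving E\rho^2(n_l) \ge \rho^* for n_l gives the minimum number of lessons
\[
n_l \;\ge\; \frac{\rho^*}{1-\rho^*}\cdot\frac{\sigma^2_\delta}{\sigma^2_t}
\]
```

Under this sketch, a dimension whose lesson-specific variance is small relative to the between-teacher variance reaches a given target with a single lesson. A dimension with large lesson-specific variance needs proportionally more lessons, which is consistent with the pattern reported here for cognitive activation.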