
Implementation and validation of an item response theory scale for formative assessment


Berger, Stéphanie. Implementation and validation of an item response theory scale for formative assessment. 2019, University of Twente.

Abstract

This PhD thesis was motivated by practical challenges related to implementing and validating a vertical Rasch scale to measure students’ mathematics abilities throughout compulsory school in Northwestern Switzerland. The goal of this vertical scale is to provide third through ninth grade students with objective, reliable, and valid assessment reports based on two different assessment instruments. In Chapters 1 and 2, the two assessment instruments are introduced. By integrating their similarities and differences with the theoretical background on data-collection designs and item calibration within a Rasch framework, a four-step item calibration process is proposed to establish a vertical scale and link the two instruments. Subsequently, three studies are presented that examine specific aspects of the implementation and validation of the vertical scale. The first study (Chapter 3) uses simulations to investigate whether calibration efficiency under the Rasch model can be enhanced through targeted multistage calibration designs, which take ability-related background variables and performance into account when assigning suitable items to students. Furthermore, it evaluates whether uncertainty about item difficulty could impair the assembly of an efficient calibration design. The second study (Chapter 4) shifts the focus from efficient item calibration to efficient ability estimation. Through simulations, the efficiency of a targeted multistage test design is compared with that of a traditional targeted test design and a multistage test design. The study also analyzes the extent to which each design’s efficiency depends on the correlation between the ability-related background variable and students’ true abilities, on each student’s ability level and categorization into an ability group, and on the length of the starting module. The third study (Chapter 5) is based on data from preliminary calibration assessments for establishing the vertical scale.
The psychometric properties of the scale are examined through item analysis and by comparing concurrent and grade-by-grade calibration procedures. The content-related validity of the scale is evaluated by contrasting the empirical item difficulty estimates with the content-related item difficulties reflected in the underlying competence levels of the curriculum. In conclusion, this PhD thesis supports the justification for an assessment system that offers a unique opportunity to monitor students’ learning trajectories throughout compulsory school.

Additional indexing

Item Type: Dissertation (cumulative)
Referees: Toonen T A J, Eggen T J H M, Moser Urs, Verschoor A J, Veldkamp B P, Glas C A W, van der Ark L A, van der Schaaf M F, Schildkamp K, Joosten-ten Brinke D
Communities & Collections: 06 Faculty of Arts > Institute of Educational Evaluation
Dewey Decimal Classification: 370 Education
Language: English
Date: 26 June 2019
Deposited On: 06 Jan 2020 15:07
Last Modified: 07 Apr 2020 07:24
ISBN: 9789036547932
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.3990/1.9789036547932

Download

Hybrid Open Access

Download PDF: 'Implementation and validation of an item response theory scale for formative assessment'.
Content: Published Version
Language: English
Filetype: PDF
Size: 5MB