Session Overview

Session: PA1: Measurement 1
Time: Thursday, 23/Jul/2015, 9:45am - 11:15am
Session Chair: Klaus D. Kubinger
Location: KOL-G-204 (II)
Capacity: 85

Presentations

On designing data-sampling for Rasch Model calibrating an achievement test

Klaus D. Kubinger1, Dieter Rasch2, Takuya Yanagida3

1University of Vienna, Austria; 2University of Natural Resources and Applied Life Sciences, Vienna; 3University of Applied Sciences, Austria; klaus.kubinger@univie.ac.at

Though calibration of an achievement test within the psychological and educational context is very often carried out by means of the Rasch model, data sampling is hardly ever designed according to its statistical foundations. Kubinger, Rasch, and Yanagida (2009) suggested an approach for determining the sample size according to given Type I and Type II risks and a certain effect of model misfit when testing the Rasch model; this approach is now supported by some new results. The approach uses a three-way analysis of variance design (A > B) x C with mixed classification: a (fixed) group factor A, a (random) factor B of testees nested within A, and a (fixed) factor C of items cross-classified with (A > B). In accordance with Andersen's Likelihood-Ratio test, the testees must be divided into at least two groups according to some criterion suspected of causing differential item functioning (DIF). The Rasch model's quality of specific objective measurement then corresponds to the absence of an interaction effect A x C. The results of the simulation studies are: the approach works given several restrictions, and its main aim, the determination of the sample size, is attained. Additionally, the approach's power is consistently higher than that of Andersen's test.
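For reference (in notation supplied here, not used in the abstract itself), the dichotomous Rasch model gives the probability that testee v solves item i in terms of an ability parameter θ_v and an item difficulty β_i; under the design above, fit of the Rasch model corresponds to the null hypothesis of no A x C interaction, i.e., item parameters that are invariant across the groups g of factor A:

```latex
P(X_{vi} = 1 \mid \theta_v, \beta_i)
  = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)},
\qquad
H_0\colon\ \beta_i^{(g)} = \beta_i \ \text{for all items } i \text{ and all groups } g.
```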

Examining fit in covariance modeling with ordinal data

Christine DiStefano1, Grant Morgan2, Phillip Sherlock1

1University of South Carolina, USA; 2Baylor University, USA; distefan@mailbox.sc.edu

Fit indices are routinely used in covariance modeling to provide information about the goodness of fit between the hypothesized model and the data. These indices include absolute fit indices (e.g., Goodness of Fit Index, Root Mean Square Error of Approximation, Standardized Root Mean Square Residual) and incremental fit indices (e.g., Tucker-Lewis Index (Non-normed Fit Index), Comparative Fit Index, Incremental Fit Index). Recommendations and rules of thumb for interpreting various fit indices have been presented in the literature; however, these guidelines are largely built from investigations using continuous, multivariate normal data and normal-theory estimators (maximum likelihood or generalized least squares). As most of the data used in empirical studies are not continuous and may not be normally distributed, these recommendations may not hold when ordered-category data are analyzed and/or robust estimators are used. Little is known about how ad hoc fit indices behave under non-normal and/or ordinal data. The purpose of this study is to examine the performance of fit indices with categorical and non-normal data. Conditions such as sample size, number of ordered categories, degree of non-normality, and estimation technique will be manipulated to examine the performance of fit indices.
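For concreteness (standard definitions, in notation not taken from the abstract), two of the most widely used indices are computed from the model χ² statistic and its degrees of freedom, where M denotes the hypothesized model, B the baseline (independence) model, and N the sample size:

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\, 0)}{df_M\,(N-1)}},
\qquad
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\, 0)}{\max(\chi^2_M - df_M,\ \chi^2_B - df_B,\ 0)}.
```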

Statistical and theoretical reductionism in research on scientific thinking: How much can the Rasch model tell us?

Peter Adriaan Edelsbrunner1, Fabian Dablander2

1ETH Zurich, Switzerland; 2University of Tübingen, Germany; dostodabsi@gmail.com

In recent research on scientific thinking, Rasch modeling was employed to investigate the dimensionality of items that were meant to cover a wide variety of skills. Based on generic fit statistics and model comparisons, it was concluded that scientific thinking represents a unidimensional psychological construct. Using simulations, we argue that generic fit statistics and model comparisons based on the Rasch model merely warrant crude conclusions about the use of composite scores for practical assessments. Without strong prior theory, results from the Rasch model do not warrant theoretical conclusions about the dimensionality of the underlying psychological construct. In the simulations, we compare the adequacy of various alternative measurement models for examining structural assumptions about scientific thinking. Based on the simulations, crucial assumptions of the Rasch model and their implications for theory development are discussed, expanding the discussion by drawing parallels to reductionism in intelligence and psychiatry research. We conclude that an undue reliance on Rasch models might not benefit, and might even hinder, theory development in research on scientific thinking. Alternative measurement models and experimental studies might provide more thorough insight into the structure of scientific thinking. Finally, we discuss our study's implications for other fields in which Rasch models are frequently applied.
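To make the contrast concrete (a minimal sketch; the particular alternative models are assumptions here, as the abstract does not name them): the Rasch model constrains every item to the same discrimination, whereas, for example, the two-parameter logistic (2PL) model frees an item-specific discrimination a_i, and a multidimensional extension replaces the single ability θ_v by a vector:

```latex
\text{Rasch:}\ \operatorname{logit} P(X_{vi}=1) = \theta_v - \beta_i,
\qquad
\text{2PL:}\ \operatorname{logit} P(X_{vi}=1) = a_i(\theta_v - \beta_i),
\qquad
\text{MIRT:}\ \operatorname{logit} P(X_{vi}=1) = \mathbf{a}_i^{\top}\boldsymbol{\theta}_v - b_i.
```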

Teaching statistical inference and the null hypothesis significance controversy

Ernest Kwan, Irene R. R. Lu

Carleton University, Canada; ernest.kwan@carleton.ca

Null hypothesis significance testing (NHST) is the predominant procedure for statistical inference in the social sciences. Quantitative methodologists, however, have debated the legitimacy of NHST, and the American Psychological Association convened a task force to evaluate the role of NHST in quantitative research. We describe an approach to teaching statistical inference that illustrates the problems of NHST and reviews the recommendations for reform made by the task force and other renowned methodologists. This pedagogical approach is designed for a statistics course taken by graduate students in a research-oriented doctoral program. Accordingly, our approach also illustrates how NHST should and should not be used to evaluate substantive theories or hypotheses of interest.
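One point that recurs throughout this debate (stated here in generic notation for reference) is the definition of the p-value: it is a probability computed under the null hypothesis, not the probability of the null hypothesis given the data, although it is routinely misread as the latter:

```latex
p = P\!\left(T \ge t_{\mathrm{obs}} \mid H_0\right)
\quad\ne\quad
P\!\left(H_0 \mid \text{data}\right).
```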