A Comparison of Differential Item Functioning Detection in National Tests of Literacy, Numeracy, and Reasoning Abilities at the Grade Three Level Using HGLM, MIMIC and IRT-LR Methods

สุธาทิพย์ ตรีสิน
ปิยะทิพย์ ประดุจพรม


The objectives of this research were to analyze the quality of the National Tests (NT) and to investigate possible Differential Item Functioning (DIF) in three subjects: Literacy, Numeracy, and Reasoning, using the HGLM, MIMIC, and IRT-LR methods. The research was divided into three phases: 1) analyzing the quality of the NT items in the three subjects; 2) testing the NT items for DIF using the HGLM, MIMIC, and IRT-LR methods; and 3) comparing the results of the three DIF methods, using secondary data from the NT examination of 9,600 Grade 3 students in the 2013 academic year.
Results were as follows:
1. The national test items had IRT difficulty parameters at relatively difficult levels, discrimination parameters capable of differentiating among examinees at a good level, and guessing parameters not exceeding 0.30.
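The three parameters above are those of the standard three-parameter logistic (3PL) IRT model, under which the probability of a correct response rises with ability θ but never falls below the guessing floor c. A minimal sketch (the specific a, b, c values are illustrative, not the paper's estimates):

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL IRT model:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A hypothetical "relatively difficult" item (b = 1.0) with good
# discrimination (a = 1.2) and a guessing parameter below the 0.30
# threshold noted above (c = 0.2).
for theta in (-2.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={p_3pl(theta, a=1.2, b=1.0, c=0.2):.3f}")
```

At θ = b the probability is exactly c + (1 − c)/2, which is why a high b value marks an item as difficult: even average examinees sit well below that midpoint.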
2. The examination of possible DIF in the three subjects revealed that gender affected the test scores: female students had an advantage on the Literacy and Reasoning tests, while male students had an advantage on the Numeracy test. In addition, the HGLM method flagged the largest share of items as exhibiting DIF, at 69% of the test, followed by the IRT-LR method at 54% and the MIMIC method at 16%.
3. Comparison of the DIF test results revealed that the HGLM method outperformed the MIMIC method in DIF detection by 70% for Literacy, 36% for Numeracy, and 53% for Reasoning, and outperformed the IRT-LR method by 37% for Literacy and 13% for Numeracy. The IRT-LR method outperformed the MIMIC method by 33% for Literacy, 43% for Numeracy, and 40% for Reasoning. The HGLM method also outperformed the IRT-LR method for the Numeracy subject only (7%) (p < .05).
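The three compared methods all rest on the same underlying question: after matching examinees on ability, does one gender still answer a given item correctly more often? The model-based HGLM, MIMIC, and IRT-LR procedures require full estimation runs, but the core DIF idea can be illustrated with the simpler classical Mantel-Haenszel odds ratio (not one of the paper's three methods; the counts below are hypothetical):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across ability (score) strata.
    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    A ratio far from 1.0 suggests the item favors one group even after
    matching on ability, i.e. the item may exhibit DIF."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical counts for one item, male (reference) vs. female (focal),
# stratified by total-score band so ability is held roughly constant.
strata = [
    (30, 20, 20, 30),  # low-score band
    (40, 10, 30, 20),  # mid-score band
    (45,  5, 40, 10),  # high-score band
]
print(f"MH odds ratio: {mantel_haenszel_or(strata):.2f}")
```

Here the ratio exceeds 1, meaning the reference group has higher odds of success at every matched ability level, which is the pattern a DIF flag reports; the paper's model-based methods formalize the same comparison through group effects in HGLM, MIMIC, and IRT-LR models.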
