Next-generation Enem assessment with fewer items and high reliability using CAT

Authors

A. Jaloto
R. Primi

DOI:

https://doi.org/10.18222/eae.v35.10142

Keywords:

Student Evaluation, Psychometrics, Item Response Theory, Higher Education

Abstract

The Exame Nacional do Ensino Médio [Brazilian High School Exam] (Enem) comprises an essay and four 45-item tests. Because its reliability and the effect of fatigue on scores are important concerns, Computerized Adaptive Testing (CAT) may offer a way to address both issues. The present study therefore aimed to verify whether the number of items administered in the Enem could be reduced using a CAT. Drawing on tests from the 2009 to 2019 editions of the Enem, we simulated a CAT that ended when the standard error of the ability estimate fell below 0.30 or when 45 items had been administered. Average test length ranged from 12.0 items (Languages and Codes – LC) to 29.2 items (Mathematics – MT). The results point to the potential of reducing the Enem to 20 items for a proportion of the population ranging from 39.8% (MT) to 94.8% (LC).
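
To make the stopping rule concrete: with abilities scaled to unit variance, a standard error below 0.30 corresponds to a reliability of at least 1 − 0.30² = 0.91, which is the sense in which the title claims high reliability. What follows is a minimal sketch of this kind of simulation using the mirtCAT package (Chalmers, 2016) listed in the references; the item bank, examinee ability, and maximum-information selection rule are illustrative assumptions, not the study's actual data or settings.

# R sketch: a CAT that stops when SE(theta) < 0.30 or after 45 items.
# Hypothetical 3PL item bank -- not the Enem's calibrated parameters.
library(mirtCAT)

set.seed(1)
n_items <- 45
pars <- data.frame(a1 = rlnorm(n_items, 0.2, 0.3),  # discrimination
                   d  = rnorm(n_items),             # easiness (intercept)
                   g  = 0.2)                        # pseudo-guessing
mod <- generate.mirt_object(pars, itemtype = '3PL')

# Simulate one examinee (theta = 0.5) and run the adaptive administration:
# items selected by maximum information ('MI'), EAP scoring,
# terminating at min_SEM = 0.30 or max_items = 45.
pattern <- generate_pattern(mod, Theta = matrix(0.5))
res <- mirtCAT(mo = mod, local_pattern = pattern,
               criteria = 'MI', method = 'EAP',
               design = list(min_SEM = 0.30, max_items = 45))
print(res)  # number of items administered, final theta estimate and SE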

References

Ayala, R. J. de. (2009). The theory and practice of item response theory. The Guilford Press.

Barichello, L., Guimarães, R. S., & Figueiredo, D. B., Filho. (2022). A formatação da prova afeta o desempenho dos estudantes? Evidências do Enem (2016) [Does test layout affect student performance? Evidence from the 2016 Enem]. Educação e Pesquisa, 48, Article e241713. https://doi.org/10.1590/s1678-4634202248241713por

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06

Chalmers, R. P. (2016). Generating adaptive and non-adaptive test interfaces for multidimensional item response theory applications. Journal of Statistical Software, 71(5), 1-39. https://doi.org/10.18637/jss.v071.i05

Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50(2), 164-185. https://doi.org/10.1111/jedm.12009

Domingue, B., Kanopka, K., Stenhaug, B., Sulik, M., Beverly, T., Brinkhuis, M. J. S., Circi, R., Faul, J., Liao, D., McCandliss, B., Obradovic, J., Piech, C., Porter, T., Soland, J., Weeks, J., Wise, S., & Yeatman, J. D. (2020). Speed accuracy tradeoff? Not so fast: Marginal changes in speed have inconsistent relationships with accuracy in real-world settings. PsyArXiv. https://doi.org/10.31234/osf.io/kduv5

Ferreira-Rodrigues, C. F. (2015). Estudos com o Enem a partir de uma abordagem psicométrica da inteligência [Studies on the Enem from a psychometric approach to intelligence] [Doctoral dissertation]. Universidade São Francisco. https://www.usf.edu.br/galeria/getImage/427/2977366806369866.pdf

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (Inep). (2009). Exame Nacional do Ensino Médio (Enem): Textos teóricos e metodológicos [Theoretical and methodological texts]. MEC/Inep.

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (Inep). (2010). Guia de elaboração e revisão de itens [Guide to item writing and review]. MEC/Inep. http://download.inep.gov.br/outras_acoes/bni/guia/guia_elaboracao_revisao_itens_2012.pdf

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (Inep). (2012a). Entenda a sua nota no Enem: Guia do participante [Understand your Enem score: Participant's guide]. MEC/Inep. http://download.inep.gov.br/educacao_basica/enem/guia_participante/2013/guia_do_participante_notas.pdf

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (Inep). (2012b). Guia de elaboração de itens Provinha Brasil [Provinha Brasil item-writing guide]. MEC/Inep. http://download.inep.gov.br/educacao_basica/provinha_brasil/documentos/2012/guia_elaboracao_itens_provinha_brasil.pdf

Kalender, I., & Berberoglu, G. (2017). Can computerized adaptive testing work in students’ admission to higher education programs in Turkey? Educational Sciences: Theory & Practice, 17(2), 573-596. https://doi.org/10.12738/estp.2017.2.0280

Kan, A., & Bulut, O. (2015). Examining the language factor in mathematics assessments. Journal of Education and Human Development, 4(1), 133-146. https://doi.org/10.15640/jehd.v4n1a13

Masri, Y. H. E., Ferrara, S., Foltz, P. W., & Baird, J.-A. (2017). Predicting item difficulty of science national curriculum tests: The case of key stage 2 assessments. The Curriculum Journal, 28(1), 59-82. https://doi.org/10.1080/09585176.2016.1232201

Mizumoto, A., Sasao, Y., & Webb, S. A. (2019). Developing and evaluating a computerized adaptive testing version of the Word Part Levels Test. Language Testing, 36(1), 101-123. https://doi.org/10.1177/0265532217725776

Muñiz, J. (1997). Introducción a la teoría de respuesta a los ítems [Introduction to item response theory]. Psicología Pirámide.

Nicewander, W. A., & Thomasson, G. L. (1999). Some reliability estimates for computerized adaptive tests. Applied Psychological Measurement, 23(3), 239-247. https://doi.org/10.1177/01466219922031356

Pasquali, L., & Primi, R. (2003). Fundamentos da teoria da resposta ao item – TRI [Fundamentals of item response theory – IRT]. Avaliação Psicológica, 2(2), 99-110.

Peres, A. J. de S. (2019). Testagem adaptativa por computador (CAT): Aspectos conceituais e um panorama da produção brasileira [Computerized adaptive testing (CAT): Conceptual aspects and an overview of Brazilian research]. Revista Examen, 3(3), 66-86.

Primi, R., Nakano, T. de C., & Wechsler, S. M. (2018). Using four-parameter item response theory to model human figure drawings. Avaliação Psicológica, 17(4), 473-483. https://doi.org/10.15689/ap.2018.1704.7.07

Primi, R., Silvia, P. J., Jauk, E., & Benedek, M. (2019). Applying many-facet Rasch modeling in the assessment of creativity. Psychology of Aesthetics, Creativity, and the Arts, 13(2), 176-186. https://doi.org/10.1037/aca0000230

R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17(1), 321-335. https://files.eric.ed.gov/fulltext/EJ1130806.pdf

Setzer, J. C., Wise, S. L., van den Heuvel, J. R., & Ling, G. (2013). An investigation of examinee test-taking effort on a large-scale assessment. Applied Measurement in Education, 26(1), 34-49. https://doi.org/10.1080/08957347.2013.739453

Spenassato, D., Trierweiller, A. C., Andrade, D. F. de, & Bornia, A. C. (2016). Testes adaptativos computadorizados aplicados em avaliações educacionais [Computerized adaptive tests applied in educational assessments]. Revista Brasileira de Informática na Educação, 24(2), 1-12. https://doi.org/10.5753/rbie.2016.24.02.1

Tillé, Y., & Matei, A. (2016). sampling: Survey sampling (R package). https://CRAN.R-project.org/package=sampling

Ulitzsch, E., von Davier, M., & Pohl, S. (2020). A multiprocess item response model for not-reached items due to time limits and quitting. Educational and Psychological Measurement, 80(3), 522-547. https://doi.org/10.1177/0013164419878241

Veldkamp, B. P., & Matteucci, M. (2013). Bayesian computerized adaptive testing. Ensaio: Avaliação e Políticas Públicas em Educação, 21(78), 57-72. https://doi.org/10.1590/S0104-40362013005000001

Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1-27. https://doi.org/10.2458/jmm.v2i1.12351

Wu, M., Tam, H. P., & Jen, T.-H. (2016). Educational measurement for applied researchers: Theory into practice. Springer. https://doi.org/10.1007/978-981-10-3302-5

Published

2024-12-20

How to Cite

Jaloto, A., & Primi, R. (2024). Next-generation Enem assessment with fewer items and high reliability using CAT. Estudos em Avaliação Educacional, 35, e10142. https://doi.org/10.18222/eae.v35.10142

Issue

v. 35 (2024)

Section

Articles