Reliability and agreement among evaluators: applications in the educational area
DOI: https://doi.org/10.18222/eae255920142750

Keywords: Rate of Reliability, Vestibular examination, Evaluators, Essay

Abstract
The aims of this study were to: (1) investigate strategies for verifying reliability and agreement among evaluators, focusing on applications in the educational area; (2) review the national literature on techniques of reliability and agreement among judges and their areas of application; and (3) illustrate the application of these techniques by analyzing the corrections of Vestibular (college entrance exam) essays from a public university in Minas Gerais. We used the intraclass correlation coefficient to analyze reliability and agreement among evaluators in the correction of essays from 2005 to 2010. We found that inter-rater agreement techniques are rarely used in educational research. In the analysis of the essay corrections, some results were satisfactory (for example, the mean reliability of the evaluators for total essay scores) and others were unsatisfactory (for example, low agreement on some correction criteria).
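For readers who want to see how the statistic mentioned above works in practice, the sketch below computes a one-way intraclass correlation coefficient from a targets-by-raters score matrix, following the standard Shrout and Fleiss (1979) Case 1 formulas. The essay scores in the example are hypothetical, and the code is only an illustration of the technique, not the procedure or data used in the study.

import numpy as np

# Hypothetical scores: 5 essays (rows) scored by 3 evaluators (columns).
scores = np.array([
    [7.0, 6.5, 7.5],
    [4.0, 5.0, 4.5],
    [9.0, 8.5, 9.0],
    [6.0, 6.0, 5.5],
    [3.0, 4.0, 3.5],
])

n, k = scores.shape                      # n targets (essays), k raters
grand_mean = scores.mean()
target_means = scores.mean(axis=1)

# One-way ANOVA mean squares (Shrout & Fleiss, 1979, Case 1).
ss_between = k * ((target_means - grand_mean) ** 2).sum()
ss_within = ((scores - target_means[:, None]) ** 2).sum()
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# ICC(1,1): reliability of a single rater's score.
icc_1_1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# ICC(1,k): reliability of the mean of the k raters' scores.
icc_1_k = (ms_between - ms_within) / ms_between

print(f"ICC(1,1) = {icc_1_1:.3f}, ICC(1,{k}) = {icc_1_k:.3f}")

ICC(1,1) estimates the reliability of a single evaluator's score, while ICC(1,k) estimates the reliability of the average score across the k evaluators; which form is appropriate depends on whether decisions are based on individual or averaged ratings.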
License
Authors who publish in this journal agree to the following terms:
a. Authors retain the copyright and grant the journal the right of first publication, with the paper simultaneously licensed under the Creative Commons Attribution License, which allows sharing of the work with acknowledgment of authorship and initial publication in this journal.
b. Authors are authorized to enter into additional contracts separately, for non-exclusive distribution of the version of the paper published in this journal (for example, publishing it in an institutional repository or as a book chapter), with acknowledgment of authorship and initial publication in this journal.
c. Authors are permitted and encouraged to publish and distribute their paper online (for example, in institutional repositories or on their personal webpage) at any point before or during the editorial process, as this can lead to productive exchanges as well as increase the impact and citation of the published work (see The Effect of Open Access).