Reliability and agreement among evaluators: applications in the educational area
DOI: https://doi.org/10.18222/eae255920142750

Keywords: Rate of Reliability, Vestibular examination, Evaluators, Essay

Abstract
The aims of this study were to: (1) investigate strategies for verifying reliability and agreement among evaluators, focusing on applications in the educational area; (2) review the national literature on techniques of reliability and agreement among judges and their areas of application; and (3) illustrate the application of these techniques by analyzing the corrections of Vestibular (college entrance exam) essays from a public university in Minas Gerais. We used the intraclass correlation coefficient to analyze reliability and agreement among evaluators in the correction of essays from 2005 to 2010. We found that inter-rater agreement techniques are rarely used in educational research. In the analysis of the essay corrections, some results were satisfactory (for example, the mean reliability of the evaluators for total essay scores) and others were unsatisfactory (for example, low agreement on some correction criteria).
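For readers who want to see how the statistic mentioned above works in practice, the sketch below computes a one-way intraclass correlation coefficient from a targets-by-raters score matrix, following the standard Shrout and Fleiss (1979) Case 1 formulas. The essay scores in the example are hypothetical, and the code is only an illustration of the technique, not the procedure or data used in the study.

import numpy as np

# Hypothetical scores: 5 essays (rows) scored by 3 evaluators (columns).
scores = np.array([
    [7.0, 6.5, 7.5],
    [4.0, 5.0, 4.5],
    [9.0, 8.5, 9.0],
    [6.0, 6.0, 5.5],
    [3.0, 4.0, 3.5],
])

n, k = scores.shape                      # n targets (essays), k raters
grand_mean = scores.mean()
target_means = scores.mean(axis=1)

# One-way ANOVA mean squares (Shrout & Fleiss, 1979, Case 1).
ss_between = k * ((target_means - grand_mean) ** 2).sum()
ss_within = ((scores - target_means[:, None]) ** 2).sum()
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# ICC(1,1): reliability of a single rater's score.
icc_1_1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# ICC(1,k): reliability of the mean of the k raters' scores.
icc_1_k = (ms_between - ms_within) / ms_between

print(f"ICC(1,1) = {icc_1_1:.3f}, ICC(1,{k}) = {icc_1_k:.3f}")

ICC(1,1) estimates the reliability of a single evaluator's score, while ICC(1,k) estimates the reliability of the average score across the k evaluators; which form is appropriate depends on whether decisions are based on individual or averaged ratings.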
License
Authors who publish in this journal agree to the following terms:
a. Authors retain the copyright and grant the journal the right of first publication, with the paper simultaneously licensed under the Creative Commons Attribution License, which allows sharing of the work with acknowledgment of authorship and initial publication in this journal.
b. Authors are authorized to enter into additional contracts separately, for non-exclusive distribution of the version of the paper published in this journal (for example, publishing it in an institutional repository or as a book chapter), with acknowledgment of authorship and initial publication in this journal.
c. Authors are permitted and encouraged to publish and distribute their paper online (for example, in institutional repositories or on their personal webpage) at any point before or during the editorial process, as this can lead to productive exchanges as well as increase the impact and citation of the published work (see The Effect of Open Access).