Leveraging Machine Learning in Student Peer Review: A Systematic Literature Review

Theresia Devi Indriasari; Yohanes Sigit Purnomo W.P.

doi:10.24002/jbi.v17i1.14753

Authors

Theresia Devi Indriasari, Informatics Study Program, Faculty of Industrial Technology, Universitas Atma Jaya Yogyakarta
Yohanes Sigit Purnomo W.P., Informatics Study Program, Faculty of Industrial Technology, Universitas Atma Jaya Yogyakarta

Keywords:

Automated Feedback, Machine Learning, Peer Assessment, Student Peer Review, Systematic Literature Review

Abstract

Our study examines how machine learning techniques are integrated into student peer review processes, focusing on the challenges that motivate their adoption and the methods used to address them. Using Kitchenham’s systematic literature review framework, 328 articles were screened, and 25 empirical studies on machine learning applications in student peer review were selected. The findings show that machine learning is mainly used to manage large volumes of reviews, support automated grading, and improve feedback quality. Common techniques include classification, prediction, ranking, and clustering, which help improve the fairness, efficiency, and objectivity of peer review. This study provides a rigorous synthesis of machine learning adoption in student peer review and highlights its potential to enhance assessment accuracy, support learning outcomes, and guide future research and broader implementation in educational contexts.
