Rethinking Similarity Scores in Programming Education: A Three-Year Study Across Assignments and Detection Tools
1  Faculty of Organization and Informatics, University of Zagreb, Zagreb, Croatia
Academic Editor: Mike Joy

Abstract:

Code similarity tools are widely used in programming courses to support academic integrity, yet fixed percentage cutoffs are often applied without considering how score distributions change over time, vary by assignment type, or differ across detection tools. This study examines similarity patterns over three academic years (2023/2024–2025/2026) using two programming assignments, with approximately 50 students per assignment each year, and multiple similarity detection tools. For each assignment–year–tool combination, we analyzed distributional indicators, including mean, median, quartiles, standard deviation, the 99th percentile, and maximum, to capture central tendency, spread, and upper-tail behavior.
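As an illustration only, the indicators listed above could be computed per assignment–year–tool combination along the following lines. This is a minimal sketch, not the procedure used in the study: the function name `summarize`, the choice of the "inclusive" quantile method, and the use of the sample standard deviation are all assumptions.

```python
import statistics


def summarize(scores):
    """Return distributional indicators for one list of similarity scores.

    Hypothetical helper: `scores` holds the similarity percentages (0-100)
    reported by one tool for one assignment in one academic year.
    """
    ordered = sorted(scores)
    # Quartiles via linear interpolation over the observed range (assumed method).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    # n=100 yields 99 cut points; the last one is the 99th percentile.
    p99 = statistics.quantiles(ordered, n=100, method="inclusive")[98]
    return {
        "mean": statistics.fmean(ordered),
        "median": median,
        "q1": q1,
        "q3": q3,
        "std": statistics.stdev(ordered),  # sample standard deviation (assumed)
        "p99": p99,
        "max": ordered[-1],
    }
```

With roughly 50 submissions per group, the 99th percentile is interpolated between the two highest scores, so it tracks the maximum closely and is best read alongside it rather than as an independent indicator.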

The results show that similarity dynamics are not uniform. In one assignment pattern, central values rose consistently over time (e.g., increasing medians and lower quartiles), indicating that the shift was not driven solely by a small number of extreme submissions. In the other pattern, similarity levels started higher and then either increased mildly or stabilized, depending on the tool. Across both patterns, maxima and upper-tail indicators generally declined in later years, accompanied by lower variability, suggesting a transition toward higher baseline similarity with fewer extreme outliers.

These findings support a more context-aware interpretation of similarity scores in programming education. Rather than relying on a single fixed threshold, educators may benefit from a distribution-based reading of results that accounts for assignment characteristics, year-to-year shifts, and tool-specific behavior. This perspective provides a more robust basis for course-level monitoring and more defensible decision-making in integrity and assessment workflows.
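To make the contrast between a fixed threshold and a distribution-based reading concrete, the sketch below compares two flagging rules. It is an illustration under stated assumptions, not the method proposed in the study: the function names, the 50% cutoff, and the upper-fence rule (third quartile plus 1.5 times the interquartile range, a common outlier heuristic swapped in here for demonstration) are all hypothetical choices.

```python
import statistics


def flag_fixed(scores, cutoff=50.0):
    """Flag every score at or above a single fixed cutoff (assumed value)."""
    return [s for s in scores if s >= cutoff]


def flag_relative(scores, k=1.5):
    """Flag scores above an upper fence derived from the cohort itself.

    Assumed rule: fence = Q3 + k * IQR, computed per assignment-year-tool group.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [s for s in scores if s > fence]


# Hypothetical cohorts: one with a low baseline and a single outlier,
# one with a uniformly high baseline and no outlier.
low_baseline = [10, 12, 14, 16, 18, 45]
high_baseline = [55, 57, 59, 61, 63, 65]

for name, cohort in [("low", low_baseline), ("high", high_baseline)]:
    print(name, flag_fixed(cohort), flag_relative(cohort))
```

In this toy data the fixed cutoff misses the single unusual submission in the low-baseline cohort and flags every submission in the high-baseline cohort, while the cohort-relative rule does the opposite. Either rule would only mark cases for manual review, not decide them.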

Keywords: academic integrity, code similarity, similarity detection tools, distribution-based analysis, programming assignments

 
 