EARLY RISK WARNING IN A SOFT SKILLS COURSE WITH CALIBRATED MULTICLASS XGBOOST: A PREDICT → TIER → INTERVENE FRAMEWORK


Tang Thi Vinh* tranvansan@hitu.edu.vn Ho Chi Minh City Industry and Trade College 20 Tang Nhon Phu street, Phuoc Long ward, Ho Chi Minh City, Vietnam
Tran Van San tranvansan@hitu.edu.vn Ho Chi Minh City Industry and Trade College 20 Tang Nhon Phu street, Phuoc Long ward, Ho Chi Minh City, Vietnam
Summary: 
We propose a three-step Predict → Tier → Intervene framework for a college Soft Skills course. Weekly T1–T7 activity traces, attendance/bonus, mini-tasks, and peer/self reports are standardized into individual-level and group-level features. A multiclass XGBoost model with isotonic calibration and group-aware splits yields well-calibrated probabilities that are mapped to RED/YELLOW/GREEN alerts and to class summaries of tier mix and at-risk students and groups. On a held-out semester, the system achieves Accuracy = 0.772, Macro-F1 = 0.520, AUPRC(C) = 0.739, Brier = 0.1008, and ECE = 0.0577. The model clearly separates high from average achievers and, despite the rarity of tier C, provides effective risk ranking in precision-recall analysis. The pipeline is low-cost (Google Forms/Sheets plus concise Python), transparent, and reproducible, supporting timely, tiered interventions.
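The Tier step and the calibration metrics named above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that alerts are driven by the calibrated probability of tier C, that the RED and YELLOW cut-offs are 0.50 and 0.20, that Brier is computed one-vs-rest for tier C, and that ECE uses ten equal-width bins over the top-label confidence. None of these choices is stated in the summary, and all function names, field names, and sample records are hypothetical.

```python
from collections import Counter

# Hypothetical cut-offs on the calibrated probability of tier C (at risk);
# the summary does not state the thresholds actually used.
RED_THRESHOLD = 0.50
YELLOW_THRESHOLD = 0.20


def assign_alert(p_c, red=RED_THRESHOLD, yellow=YELLOW_THRESHOLD):
    """Map a calibrated P(tier C) to a RED/YELLOW/GREEN alert."""
    if p_c >= red:
        return "RED"
    if p_c >= yellow:
        return "YELLOW"
    return "GREEN"


def class_summary(records):
    """Per class section: count alerts (tier mix) and list RED students."""
    summary = {}
    for r in records:
        alert = assign_alert(r["p_c"])
        entry = summary.setdefault(r["section"], {"mix": Counter(), "red": []})
        entry["mix"][alert] += 1
        if alert == "RED":
            entry["red"].append(r["student"])
    return summary


def brier_binary(probs, labels):
    """Mean squared error between P(tier C) and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over the top-label confidence."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(accuracy - avg_conf)
    return ece


# Made-up records for illustration only; not data from the study.
records = [
    {"student": "s01", "section": "K1", "p_c": 0.62},
    {"student": "s02", "section": "K1", "p_c": 0.25},
    {"student": "s03", "section": "K1", "p_c": 0.05},
    {"student": "s04", "section": "K2", "p_c": 0.10},
]
summary = class_summary(records)
```

Thresholding the calibrated probability, rather than taking the most likely class, is one way to keep a rare tier visible in the alerts. The cut-offs would need to be tuned against the precision-recall trade-off on validation data.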
Keywords: 
Educational Data Mining
early warning
tiered instruction
soft skills
probability calibration
XGBoost
