A Comparative Study of Machine Learning Algorithms for Predicting Graduate Unemployment Duration in Sri Lanka

Publisher

Center for Data Science, University of Colombo, Sri Lanka.

Abstract

Graduate unemployment continues to pose a significant socio-economic challenge in Sri Lanka, with many graduates experiencing delays in securing suitable employment. This study predicts the duration of post-graduation unemployment and compares the performance of machine learning algorithms on this task, using a dataset of approximately 49,000 records obtained from the 2019 Unemployed Graduates Training Program facilitated by the Presidential Secretariat. Unemployment duration was categorised into short-, medium-, and long-term classes, and the resulting dataset exhibited substantial class imbalance. To address this imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied. Nine machine learning algorithms (Naive Bayes, Logistic Regression, K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Random Forest, XGBoost, LightGBM, CatBoost, and a Neural Network) were evaluated using multiple performance metrics, including accuracy, precision, recall, and F1-score. Although KNN yielded the highest initial test accuracy (94.12%), learning-curve diagnostics indicated overfitting, so it was excluded from the final comparative analysis. Among the remaining models, Random Forest demonstrated the most favourable balance between predictive accuracy and generalisation, achieving a test accuracy of 90.61% and a cross-validation accuracy of 92.86%. Class-wise evaluation revealed strong performance for the majority class but reduced precision and recall for the minority classes, consistent with the underlying imbalance. Macro-averaged metrics (precision = 0.50, recall = 0.66, F1 = 0.54) and weighted averages (precision = 0.95, recall = 0.91, F1 = 0.92) together gave a more informative picture of model behaviour than accuracy alone. Feature importance analysis identified age and internal/external degree type as the predictors most strongly associated with unemployment duration, followed by district.
The findings offer actionable insights to policymakers and higher education stakeholders for designing targeted employability interventions to reduce prolonged unemployment among graduates in Sri Lanka.
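
The abstract names SMOTE but not its configuration. As a minimal sketch under stated assumptions, the core SMOTE step can be written in plain Python: for a randomly chosen minority-class sample, pick one of its k nearest minority-class neighbours and create a synthetic point at a random position on the line segment between them. Assumptions (not from the study): features are numeric (categorical variables such as district or degree type would first need encoding, or a variant such as SMOTE-NC), one minority class is oversampled per call, and k = 5 is used as a default. The function `smote` and its parameters are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE sketch: create n_synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours.
    Assumes all features are numeric."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(minority) - 1)      # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k-nearest-neighbour search within the minority class,
        # excluding the base sample itself (fine for illustration only).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()             # uniform in [0, 1)
        # New point lies on the segment from `base` towards `other`.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_synthetic=6, k=2)
for point in new_points:
    print([round(x, 3) for x in point])
```

For a three-class target such as short-, medium- and long-term unemployment, a sketch like this would be called once per minority class, with enough synthetic samples to approach the majority-class count. In practice a library implementation such as imbalanced-learn's `SMOTE`, which resamples multi-class targets directly, would normally be preferred over a hand-written version.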
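
The abstract reports both macro-averaged and weighted-averaged precision, recall and F1. A short plain-Python sketch on a hypothetical toy label set (not the study's data) shows why the two diverge under class imbalance: macro averaging gives every class equal weight, whereas weighted averaging weights each class by its support (its number of true instances), so the majority class dominates. The function names are illustrative; scikit-learn's `precision_recall_fscore_support` with `average="macro"` or `average="weighted"` computes the same quantities.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def per_class_prf(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> Dict[str, Scores]:
    """One-vs-rest precision, recall and F1 for each class label."""
    scores: Dict[str, Scores] = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0 when never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def averaged(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[Scores, Scores]:
    """Return (macro, weighted) averages of per-class precision, recall, F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = per_class_prf(y_true, y_pred, labels)
    support = Counter(y_true)          # true instances per class
    n = len(y_true)
    macro = tuple(sum(scores[c][m] for c in labels) / len(labels) for m in range(3))
    weighted = tuple(sum(scores[c][m] * support[c] for c in labels) / n for m in range(3))
    return macro, weighted


# Hypothetical imbalanced toy data: 8 short, 1 medium, 1 long.
y_true = ["short"] * 8 + ["medium"] + ["long"]
y_pred = ["short"] * 7 + ["medium"] + ["short"] + ["long"]
macro, weighted = averaged(y_true, y_pred)
print("macro P/R/F1:", macro)        # → (0.625, 0.625, 0.625)
print("weighted P/R/F1:", weighted)  # → (0.8, 0.8, 0.8)
```

Here the medium class is never predicted correctly, which pulls the macro averages down sharply, while the weighted averages stay close to the majority-class scores. The same pattern (macro averages well below weighted averages) appears in the results reported above.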

Citation

Aththanayake, A. M. K. S., & Rajapaksha, R. R. L. U. I. (2025). A comparative study of machine learning algorithms for predicting graduate unemployment duration in Sri Lanka. In Proceedings of the 3rd International Conference in Data Science 2025 (p. 9). Center for Data Science, University of Colombo, Sri Lanka.