A Comparative Study of Machine Learning Algorithms for Predicting Graduate Unemployment Duration in Sri Lanka

dc.contributor.author: Aththanayake, A. M. K. S.
dc.contributor.author: Rajapaksha, R. R. L. U. I.
dc.date.accessioned: 2026-01-16T07:00:51Z
dc.date.issued: 2025
dc.description.abstract: Graduate unemployment remains a significant socio-economic challenge in Sri Lanka, with many graduates facing delays in securing suitable employment. This study predicts the duration of post-graduation unemployment and compares machine learning algorithms on that task, using a dataset of approximately 49,000 records from the 2019 Unemployed Graduates Training Program facilitated by the Presidential Secretariat. Unemployment duration was categorised into short-, medium-, and long-term classes, and the resulting dataset was substantially imbalanced across these classes. To address this, the Synthetic Minority Oversampling Technique (SMOTE) was applied. Nine machine learning algorithms (Naive Bayes, Logistic Regression, K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Random Forest, XGBoost, LightGBM, CatBoost, and a Neural Network) were evaluated on accuracy, precision, recall, and F1-score. Although KNN yielded the highest initial test accuracy (94.12%), learning curve diagnostics indicated overfitting, so it was excluded from the final comparative analysis. Among the remaining models, Random Forest showed the most favourable balance between predictive accuracy and generalisation, achieving a test accuracy of 90.61% and a cross-validation accuracy of 92.86%. Class-wise evaluation revealed strong performance for the majority class but reduced precision and recall for the minority classes, consistent with the underlying imbalance. Macro-averaged metrics (precision = 0.50, recall = 0.66, F1 = 0.54) and weighted averages (precision = 0.95, recall = 0.91, F1 = 0.92) together gave a more informative picture of model behaviour than accuracy alone. Feature importance analysis identified age and internal/external degree type as the most important predictors, followed by district.
The findings offer actionable insights to policymakers and higher education stakeholders for designing targeted employability interventions to reduce prolonged unemployment among graduates in Sri Lanka.
dc.identifier.citation: Aththanayake, A. M. K. S., & Rajapaksha, R. R. L. U. I. (2025). A comparative study of machine learning algorithms for predicting graduate unemployment duration in Sri Lanka. Proceedings of the 3rd International Conference in Data Science 2025. Center for Data Science, University of Colombo, Sri Lanka. (p. 9).
dc.identifier.uri: http://repository.kln.ac.lk/handle/123456789/31105
dc.publisher: Center for Data Science, University of Colombo, Sri Lanka.
dc.subject: Predictive modelling
dc.subject: Random Forest
dc.subject: SMOTE
dc.subject: Class imbalance
dc.subject: Machine learning
dc.title: A Comparative Study of Machine Learning Algorithms for Predicting Graduate Unemployment Duration in Sri Lanka
dc.type: Article
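
The gap between the macro-averaged and weighted figures reported in the abstract follows from how each average treats class size: a macro average gives every class equal weight, while a weighted average scales each class by its number of true instances, so the majority class dominates. The sketch below illustrates this on a three-class problem. It is illustrative only: the class labels, counts, and predictions are made-up toy values, not the study's data, and the helper names (`per_class_metrics`, `macro_and_weighted`) are hypothetical.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    out = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[c] = (precision, recall, f1, tp + fn)  # support = true instances
    return out


def macro_and_weighted(metrics):
    """Return ((P, R, F1) macro-averaged, (P, R, F1) support-weighted)."""
    rows = list(metrics.values())
    total = sum(r[3] for r in rows)
    macro = tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total for i in range(3))
    return macro, weighted


# Toy, deliberately imbalanced labels (assumed values, NOT the study's data).
y_true = ["long"] * 90 + ["medium"] * 6 + ["short"] * 4
y_pred = (
    ["long"] * 85 + ["medium"] * 3 + ["short"] * 2  # true "long"
    + ["medium"] * 3 + ["long"] * 3                 # true "medium"
    + ["short"] * 2 + ["long"] * 2                  # true "short"
)

metrics = per_class_metrics(y_true, y_pred)
macro, weighted = macro_and_weighted(metrics)

for name, (p, r, f1) in (("macro", macro), ("weighted", weighted)):
    print(f"{name:9s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case the two minority classes each score 0.50 on precision and recall, which pulls the macro averages well below the weighted ones. The weighted recall equals overall accuracy, which is why weighted figures alone can hide weak minority-class performance.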

Files

Original bundle

Name: CONFER~1.PDF
Size: 1.28 MB
Format: Adobe Portable Document Format
