Dharmadasa, T. K. R. S.
Rupasingha, R. A. H. M.
Kumara, B. T. G. S.
2025-10-09
2025
Dharmadasa, T. K. R. S., Rupasingha, R. A. H. M., & Kumara, B. T. G. S. (2025). A deep learning-based approach for detecting duplicate GitHub issues in open-source repositories using LSTM. In Proceedings of the International Research Conference on Smart Computing and Systems Engineering (SCSE 2025). Department of Industrial Management, Faculty of Science, University of Kelaniya.
http://repository.kln.ac.lk/handle/123456789/30070
GitHub is a platform used alongside the popular version control tool Git to host software repositories. Users can publish GitHub issues to notify repository contributors about bugs, questions, and feature requests. GitHub hosts open-source repositories that receive contributions from developers across the globe. The asynchronous and uncoordinated nature of these contributions increases the probability of duplicate GitHub issues being posted, resulting in redundant effort. The standard mechanism introduced by GitHub to mark a duplicate is to add a comment to the issue that mentions the original issue. GitHub then adds the corresponding duplicate tag and closes that issue. However, because finding duplicates requires manual labor, developers are discouraged from searching for similar issues before publishing a new one. The study’s main objective is to address this problem by proposing an automated solution based on deep learning algorithms. Our research introduces a novel approach that combines feature extraction and similarity calculations to identify duplicate GitHub issues. For the proposed methodology, over 4000 GitHub issues were collected, covering different programming languages and repositories. After pre-processing, features were extracted using multiple feature extraction techniques, and semantic similarity metrics such as cosine similarity were used to create the feature vector.
The feature vector was used to train several algorithms, namely Artificial Neural Network (ANN) and Recurrent Neural Network (RNN), as well as deep learning algorithms such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). The results of these algorithms were compared to identify the most suitable approach for detecting duplicate GitHub issues. Across the evaluations, LSTM performed best, achieving 88% accuracy with the highest precision, recall, and F-measure and the lowest error rates. With the proposed methodology, duplicate GitHub issues can be detected easily, reducing manual work.
Deep Learning
Duplicate Detection
GitHub Issues
A Deep Learning-Based Approach for Detecting Duplicate GitHub Issues in Open-Source Repositories Using LSTM
Article
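As an illustration of the similarity step described in the abstract, the following minimal sketch computes cosine similarity between two issue texts and assembles a small pairwise feature vector. The abstract does not say which feature extraction techniques the authors used, so the bag-of-words term counts, the function names, and the `title`/`body` dictionary layout are assumptions made for this example, not the paper's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the tokens that the two texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A text with no tokens cannot be compared.
    return dot / (norm_a * norm_b)


def pair_features(issue_a, issue_b):
    """Feature vector for an issue pair (hypothetical layout: title, body)."""
    return [
        cosine_similarity(issue_a["title"], issue_b["title"]),
        cosine_similarity(issue_a["body"], issue_b["body"]),
    ]
```

A vector like the one returned by `pair_features` could then be passed to a classifier that labels the pair as duplicate or not. The architecture and training setup of the LSTM reported in the abstract are not described there and are not reproduced here.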