Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/26951
Title: Machine learning model to predict bank customer's next expenditure with relevant merchant category
Authors: Umayanga, A. M. K. H.
Dissanayake, D. M. L. M.
Keywords: Expenditure prediction, Gradient Boosting Regressor, Personalized Financial planning, Random Forest Classifier, Random Forest Regressor
Issue Date: 2023
Publisher: Faculty of Science, University of Kelaniya Sri Lanka
Citation: Umayanga A. M. K. H.; Dissanayake D. M. L. M. (2023) Machine learning model to predict bank customer's next expenditure with relevant merchant category, Proceedings of the International Conference on Applied and Pure Sciences (ICAPS 2023-Kelaniya) Volume 3, Faculty of Science, University of Kelaniya Sri Lanka. Page 116
Abstract: The banking industry's increasing reliance on debit card transactions has generated a wealth of valuable data for understanding consumer behaviour. This study aims to develop a machine learning model that predicts a customer's next expenditure and the corresponding merchant category, using 11 years of debit card transaction data from 50 customers. Unlike existing research, which focuses on bankrupt users and fraud detection, this study addresses next-expenditure prediction together with merchant categories. For the bank, predicting a customer's next expenditure and merchant category enables targeted marketing. The bank can send alert messages with discount offers tailored to each customer's spending habits, reducing marketing costs by targeting only relevant customers for relevant merchant types. Customers, in turn, benefit from early reminders that help them manage their finances effectively. For instance, a customer can receive a reminder about an upcoming insurance payment and allocate funds accordingly, avoiding unnecessary expenses. This proactive approach can help reduce the number of bankrupt customers and strengthen long-term customer relationships. A key challenge in this study was obtaining a dataset, since such data are not readily available on the internet. The dataset was provided by the Digital Banking Department at the Head Office of the People's Bank, with data privacy ensured. Data preprocessing involved removing null values and unnecessary columns and replacing account numbers with customer IDs. Then, 36 customers who consistently used debit cards were identified, and merchant names were categorised into 11 groups. The dataset was split into training and testing sets at a specific date. Three machine learning algorithms were employed: gradient boosting regressor, random forest regressor, and random forest classifier. The gradient boosting regressor was used to predict both expenditures and merchant categories after the categories were one-hot encoded.
The random forest regressor was used for expenditure prediction, and the random forest classifier was used for merchant category prediction. Ordinal encoding was used to convert categories into numerical values. Model performance was optimised by tuning hyperparameters (learning rate, number of trees, maximum depth of each decision tree, minimum number of samples required to split an internal node, and minimum number of samples required at a leaf node, with a fixed random seed for reproducibility) using grid search, which evaluated combinations of hyperparameters through cross-validation. The models were run on each customer's own dataset, since spending patterns differ from customer to customer. The results showed that the method based on the random forest regressor and random forest classifier achieved higher accuracy than the gradient boosting regressor. This was evident from the R² scores (0.9866 and 1.0605) and mean squared error (MSE) values (313165.9622 and 5.6257). However, the method yielded R² scores exceeding 1 and a high MSE value due to an unbalanced dataset, in which customers' debit card usage frequency varied. Obtaining a balanced dataset with an equal number of transactions for each customer is challenging, especially when requesting data from a bank. In the future, this study could be extended to predict the exact time and date of transactions using techniques such as long short-term memory (LSTM) networks with a larger dataset, for example 1000 customers.
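The per-customer workflow described in the abstract (date-based split, ordinal encoding of merchant categories, and grid search over random forest hyperparameters) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: the bank data is not public, so the transactions are synthetic, and the column names, lag features, split date, and grid values are all hypothetical choices made for this example. It uses pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature names and grid values; the abstract does not list them.
FEATURES = ["prev_amount", "prev_category", "day_of_week", "month"]
PARAM_GRID = {
    "n_estimators": [20, 40],       # number of trees
    "max_depth": [3, None],         # maximum depth of each tree
    "min_samples_split": [2, 4],    # samples needed to split an internal node
    "min_samples_leaf": [1, 2],     # samples needed at a leaf node
}


def make_synthetic_transactions(n_customers=3, n_rows=120, seed=0):
    """Synthetic stand-in for the bank data (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    categories = ["grocery", "fuel", "insurance", "restaurant"]
    frames = []
    for customer_id in range(1, n_customers + 1):
        frames.append(pd.DataFrame({
            "customer_id": customer_id,
            "date": pd.date_range("2020-01-01", periods=n_rows, freq="7D"),
            "merchant_category": rng.choice(categories, size=n_rows),
            "amount": rng.gamma(2.0, 1500.0, size=n_rows).round(2),
        }))
    return pd.concat(frames, ignore_index=True)


def build_features(df):
    """Ordinal-encode categories and add simple lag and calendar features."""
    df = df.sort_values("date").copy()
    encoder = OrdinalEncoder()
    codes = encoder.fit_transform(df[["merchant_category"]]).ravel()
    df["category_code"] = codes.astype(int)
    df["prev_amount"] = df["amount"].shift(1)
    df["prev_category"] = df["category_code"].shift(1)
    df["day_of_week"] = df["date"].dt.dayofweek
    df["month"] = df["date"].dt.month
    # The first row has no previous transaction, so it is dropped.
    return df.dropna(subset=["prev_amount", "prev_category"])


def fit_customer_models(df, split_date, cv=3):
    """Fit and evaluate both random forest models for a single customer."""
    feats = build_features(df)
    cutoff = pd.Timestamp(split_date)
    train = feats[feats["date"] < cutoff]
    test = feats[feats["date"] >= cutoff]

    # Regressor for the next expenditure amount.
    reg = GridSearchCV(RandomForestRegressor(random_state=42), PARAM_GRID,
                       cv=cv, scoring="neg_mean_squared_error")
    reg.fit(train[FEATURES], train["amount"])

    # Classifier for the next merchant category.
    clf = GridSearchCV(RandomForestClassifier(random_state=42), PARAM_GRID,
                       cv=cv, scoring="accuracy")
    clf.fit(train[FEATURES], train["category_code"])

    amount_pred = reg.predict(test[FEATURES])
    category_pred = clf.predict(test[FEATURES])
    return {
        "best_reg_params": reg.best_params_,
        "best_clf_params": clf.best_params_,
        "mse": mean_squared_error(test["amount"], amount_pred),
        "r2": r2_score(test["amount"], amount_pred),
        "accuracy": accuracy_score(test["category_code"], category_pred),
    }


if __name__ == "__main__":
    data = make_synthetic_transactions()
    for customer_id, group in data.groupby("customer_id"):
        print(customer_id, fit_customer_models(group, "2021-10-01"))
```

Because the synthetic amounts and categories are random, the scores from this sketch carry no meaning; it only shows how one model pair per customer can be tuned and evaluated on transactions after a cutoff date. For the gradient boosting variant, `GradientBoostingRegressor` could be substituted and `learning_rate` added to the grid.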
Appears in Collections: ICAPS 2023