Symposia and Conferences

Permanent URI for this community: http://repository.kln.ac.lk/handle/123456789/151

Search Results

Optimization of SpdK-means Algorithm
(Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Gunasekara, R.P.T.H.; Wijegunasekara, M.C.; Dias, N.G.J.
This study was carried out to enhance the performance of the k-means data-mining algorithm by using parallel programming methodologies. As a result, the Speedup k-means (SpdK-means) algorithm, an extension of the k-means algorithm, was implemented to reduce the cluster building time. Although SpdK-means speeds up the cluster building process, its main drawback was that the cumulative cluster density of the clusters it created differed from the initial population. This means that some elements (data points) were missed in the clustering process, which reduces cluster quality. The aim of this paper is to discuss how the drawback was identified and how the SpdK-means algorithm was optimized to overcome it. The SpdK-means clustering algorithm was applied to three datasets gathered from a Ceylon Electricity Board dataset, varying the number of clusters k. For k = 2, 3 and 4, there was no significant difference between the cumulative cluster density and the initial dataset. When the number of clusters was more than 4 (i.e., k >= 5), there was a significant difference in cluster densities. The density of each cluster was recorded, and it was found that about 1% of the elements of the total population were missing after the clusters were formed. To overcome this drawback, the SpdK-means clustering algorithm was studied carefully, and it was found that elements at equal distances from several cluster centroids were missed in intermediate iterations. When an element was equidistant from two or more centroids, the SpdK-means algorithm was unable to determine which cluster the element should belong to, and as a result the element was not included in any cluster.
If such an element were included in every cluster at equal distance, and this were repeated for all such elements, the cumulative cluster density would greatly exceed the initial population. Therefore, SpdK-means was optimized by selecting one of the cluster centroids that are equidistant from the element. After many selection methods and their outcomes had been studied, the SpdK-means algorithm was modified to find a suitable cluster for each equidistant element. Since such an element can belong to any of these clusters, it is not possible to give priority to any one of them. As all the candidate centroids are at equal distance from the element, the algorithm selects one of them at random. The optimized SpdK-means algorithm successfully solved the identified problem by identifying the missing elements and including them in the correct clusters. Analysis of the iterations on the datasets showed that the number of iterations was reduced by 20% compared with the former SpdK-means algorithm. When the optimized SpdK-means algorithm was applied to the above-mentioned datasets, it reduced the cluster building time by a further 10% to 12% compared with the former SpdK-means algorithm.
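The tie-breaking rule described in the abstract above can be illustrated with a minimal sketch. This is not the authors' SpdK-means code: the function names, the tolerance `tol` and the fixed random seed are assumptions made for illustration, and only the assignment step is shown, not the parallel implementation.

```python
import math
import random


def assign_with_random_tie_break(point, centroids, rng, tol=1e-9):
    """Return the index of the nearest centroid, choosing at random among ties."""
    distances = [math.dist(point, c) for c in centroids]
    nearest = min(distances)
    # Every centroid whose distance equals the minimum (within tol) is a candidate.
    candidates = [i for i, d in enumerate(distances) if abs(d - nearest) <= tol]
    return rng.choice(candidates)


def assign_all(points, centroids, seed=0):
    """Assign every point to exactly one cluster, so no element is dropped."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    clusters = [[] for _ in centroids]
    for p in points:
        clusters[assign_with_random_tie_break(p, centroids, rng)].append(p)
    return clusters


points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]  # (1, 0) is equidistant from both centroids
centroids = [(0.0, 0.0), (2.0, 0.0)]
clusters = assign_all(points, centroids)
```

Because each element is placed in exactly one cluster, the cluster sizes always sum to the size of the initial population, which is the property the abstract reports as violated before the optimization.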
T-Moms for Restaurants
(Faculty of Graduate Studies, University of Kelaniya, Sri Lanka, 2016) Medhavi, Y.A.U.; Wijegunasekara, M.C.
The aim of this study was to identify the drawbacks of restaurant order management systems and suggest a solution. Several such systems were studied, and it was identified that the customers’ waiting time to receive an order is considerably high. This is because during peak hours the waiting staff is not sufficient and the service offered is not up to the required standard. In addition, paper menus can be flimsy, hard to navigate, and outdated. To reduce customers’ waiting times, management must ensure that sufficient staff are present during peak hours and that they are properly trained to provide excellent customer service. These staffing requirements can lead to substantial costs for the business. As a result, the Tablet based Menu and Order Management System (T-MOMS) was implemented to resolve these problems using mobile devices. T-MOMS contains four systems: a mobile application for customers and three web based systems for the admin panel, kitchen and cashier. The order is taken by a mobile device, namely a tablet placed on the restaurant table, which acts as a waiter. The mobile application is started by a waiter, who logs into the system and assigns the table number and a waiter identification. The waiter identification and table number are saved in the application until that particular waiter logs out. The mobile application has four subsystems: the display subsystem, assistance subsystem, commenting subsystem and ordering subsystem. The display subsystem displays the complete restaurant menu and special offers, and allows the customer to browse all the currently available menu items by category. The assistance subsystem allows the customer to call a waiter for any assistance needed. The commenting subsystem allows customers to create user accounts for adding comments and to share their experience on Facebook/Twitter. The ordering subsystem allows the customer to select the desired items and place the order.
Once the customer places the order, they can first view the order information, including the payment with and without tax and service charge. After the customer confirms the order, it is transmitted to the kitchen via the Internet for meal preparation. The kitchen web system displays all the order information received from the tablets. This includes the customer details, table number, waiter identification and the details of the order. After the order is prepared, the waiter delivers it to the customer. At the same time, the cashier web system receives the details of the delivered order and the bill is prepared. The web based admin panel allows the restaurant’s management to add, view, remove and update menu items and waiters, view reservation information along with cooking status and payment status, update the service charge and tax, and view revenue information over a time period. T-MOMS consists of a central server and a database. All ordering and expenditure information is stored in the central database. Eclipse and PHPStorm were used as the IDEs. The main languages used are HTML, JavaScript, PHP, Java and XML. The menu application is designed for 7" tablets and will also be supported on smaller screen sizes. As future development, features such as restaurant statistics and paying the bill directly through the menu application should be implemented.
A Study on Loan Performance Using Data Mining Techniques
(Faculty of Graduate Studies, University of Kelaniya, 2015) Thisara, E.B.; Wijegunasekara, M.C.
Most modern financial companies offer loans to customers to help them build up their own businesses. Such companies face a major problem in recovering loans, as customers do not pay the installments according to the signed contract. As there is a high potential for non-performing loans, it is crucial to devise appropriate strategies and to identify risk-free customers. Data mining techniques were considered in order to predict the risk factors that lead to non-performing loans. This research discovered the factors behind non-performing loans using data from a reputed finance company. The research focused on eighteen attributes regarded as factors affecting the non-performing loan state, and the dataset of 750 records was divided into 70% training data and 30% test data. Among those attributes, eleven key attributes, namely Age, Area, Branch Name, Customer Job, Income, Loan State, Mortgage, Number of Terms, Overdue days, Product Type and Interest Rate, were selected to create the data mining models. The mining models considered were Neural Networks (NN), Decision Trees (DT) and Clustering (CL). These models were created using the Business Intelligence tool, and the database was created in SQL Server Management Studio 2008 R2. The predicted probabilities of the non-performing loan state for the Neural Network, Decision Tree and Clustering models were 1.57%, 0.44% and 10.46% respectively. As the Clustering model had the highest value, it was chosen as the best algorithm to evaluate the loan state, using the Microsoft Clustering method. The Clustering model produced ten clusters, numbered 1 to 10, and a comparative analysis identified five clusters, namely 3, 6, 8, 9 and 10, as the most inclined towards the non-performing loan state.
The predicted probabilities of the selected clusters were 23%, 41%, 32%, 23% and 35% respectively; cluster 6 showed the highest value and cluster 10 the next highest. Clusters 1, 2, 4, 5 and 7 had a high probability of being performing loans and thus were not included in the analysis. According to the states of the attributes within each cluster profile, Product Type, Customer Job, Mortgage, Income, Number of Terms and Interest Rate were shortlisted as the factors that most affect the non-performing loan state. The research identified that a loan tends to be non-performing if the customer is self-employed or an individual, is a small property owner, or has a low income, and depending on the type of mortgage (building, vehicle or non-mortgage). A longer loan repayment period or a higher interest rate will also cause a loan to be non-performing. From these results it can be concluded that high interest loans provided to unemployed customers or customers with low income have a higher potential to become non-performing, resulting in a monetary loss for the financial company. Therefore, a financial company will be able to improve its profits if it pays closer attention to such customers and takes suitable decisions. The model will support the financial sector in identifying the amount of loans that could move into the non-performing state. The findings of this research will therefore help the financial industry to reduce the risk involved in granting loans in future.
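The abstract above states that the 750 records were divided into 70% training data and 30% test data, but it does not say how the split was performed. The sketch below assumes a simple random shuffle; the function name `split_records`, the seed and the placeholder record values are hypothetical.

```python
import random


def split_records(records, train_percent=70, seed=42):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumed: random split with a fixed seed
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


records = list(range(750))  # placeholders standing in for the 750 loan records
train, test = split_records(records)
print(len(train), len(test))  # prints "525 225"
```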
A Virtual Dressing System
(University of Kelaniya, 2013) Rajasinghe, R.M.C.N.A.; Wijegunasekara, M.C.
Virtual dressing rooms are a relatively new concept that is slowly becoming a trend on various fashion websites. A virtual dressing room allows a customer who is at home to virtually try on dresses and other fashions online. This allows the consumer to gauge whether the style and the fit are an appropriate match before adding the item to the virtual shopping cart of a webstore. Customers are nervous about purchasing garments electronically because they are unsure of what size to order and how the clothes will look on them. Merchants are nervous about the high volume of apparel returns. For a merchant, handling an apparel return can cost up to four times what it costs to process the initial sale of the garment. Industry analysts have estimated that apparel returns for electronic merchants range from about 10% for very basic items to between 35% and 40% for high end clothing. The single biggest reason for returns of apparel purchased electronically is poor fit. The objective of this research is to address the above stated issues: firstly, to improve the ability to make the right buy, with better opportunities to experiment with the dress style, which are competitive advantages; secondly, to reduce the buying risk, time, effort, discomfort, queues at shops, and the proportion of returned items. To address these issues, image processing techniques were used: template matching (which finds small parts of an image) and thresholding, the simplest method of image segmentation. .NET was the main framework for this application, and C# and C++ were used as the development languages. The OpenCV libraries were also used. The main functions implemented in this system can be categorized as follows: 1. Loading the video stream to the form. 2. Embedding textile images. 3. Enabling the user to move the textile image embedded in the video according to requirements.
Any user who is new to the system must select an item and a background. The selected values are written to a text file. These values are read by logic files, which load the appropriate images into the forms. The cvtColor() function in OpenCV converts the input image from one color space to another. For transformations to or from the RGB color space, the ordering of the channels is specified explicitly (RGB or BGR). For non-linear transformations, the input RGB image is normalized to the proper value range in order to get correct results, so the image is scaled before the transformation. Transformations within the RGB space include adding or removing an alpha channel, reversing the channel order, conversion to or from 16-bit RGB color (R5:G6:B5 or R5:G5:B5), and conversion to or from grayscale. For 8-bit and 16-bit images, the R, G and B values are converted to floating-point format and scaled to the range 0 to 1, and the results are then converted to the destination data type. The system is driven by threshold colors, and all the detection functions work according to them. The OpenCV threshold method is used for this. The Bayer pattern used in CCD and CMOS cameras allows color pictures to be obtained from a single plane in which R, G and B pixels (sensors of a particular component) are interleaved. The output RGB components of a pixel are interpolated from 1, 2 or 4 neighbors of the pixel with the same color. The implemented system can be used to overcome the problems identified in this study. The system performed successfully under the illumination conditions used to test it.
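The two image operations named in the abstract above, conversion to grayscale and thresholding, can be sketched in plain Python. This is an illustration only, not the system's C#/C++ code: it assumes the standard luma weighting (0.299 R + 0.587 G + 0.114 B) for RGB-to-gray conversion and a binary threshold that keeps values strictly above the threshold, and the function names and sample image are hypothetical.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a gray level with the standard luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_gray_image(rgb_image):
    """Convert a 2-D list of (R, G, B) pixels to a 2-D list of gray levels."""
    return [[rgb_to_gray(p) for p in row] for row in rgb_image]


def binary_threshold(gray_image, thresh, max_value=255):
    """Set each value above thresh to max_value and every other value to 0."""
    return [[max_value if v > thresh else 0 for v in row] for row in gray_image]


# Hypothetical 2x2 image: white, black, pure red, pure green.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
gray = to_gray_image(image)
mask = binary_threshold(gray, thresh=100)
```

With a threshold of 100, the white and green pixels survive in `mask` while the black and red pixels are set to 0, which shows how a threshold can separate a garment or background color from the rest of a frame.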
Performance of k-mean data mining algorithm with the use of WEKA-parallel
(University of Kelaniya, 2013) Gunasekara, R.P.T.H.; Dias, N.G.J.; Wijegunasekara, M.C.
This study is based on enhancing the performance of the k-means data mining algorithm by using parallel programming methodologies. To measure the effect of parallelization, the k-means algorithm was first studied using WEKA on a stand-alone machine and then compared with the performance of k-means in WEKA-parallel. Data mining is a process of discovering similar patterns in databases or datasets in areas such as finance, the retail industry, science, statistics, medical sciences, artificial intelligence and neuroscience. To discover patterns in large datasets, clustering algorithms such as k-means, k-medoid and balanced iterative reducing and clustering using hierarchies (BIRCH) are used. In data mining, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters (where k is the number of selected groups) in which each observation belongs to the cluster with the nearest mean. The grouping is done by minimizing the sum of squared Euclidean distances between items and the corresponding centroid (the center of mass of the cluster). As datasets are growing exponentially, high performance technologies are needed to analyze them and recognize their patterns. The applications and algorithms used for these processes have to access the data records iteratively, many times. Therefore, for very large datasets the process is very time consuming and uses a large amount of device memory. During the study, it was identified that the data mining algorithms developed for parallel processing were based on distributed, cluster or grid computing environments. Nowadays, algorithms need to be implemented for multi-core processors in order to utilize their full computational power.
The widely used machine learning and data mining software WEKA was first chosen to analyze clusters and measure the performance of the k-means algorithm. The k-means clustering algorithm was applied to an electricity consumption dataset to generate k clusters. As a result, the dataset was partitioned into k clusters with their mean values, and the time taken to build the clusters was recorded. (The dataset consists of 30000 entries and was collected from the Ceylon Electricity Board.) Secondly, to reduce the time consumed, a parallel environment was selected using WEKA-parallel (machine learning software). This is a new option of WEKA, used for multi-core programming, that can connect several servers and client machines. Here, threads are passed among machines to carry out the task. WEKA-parallel was installed and set up on several distributed server machines with one client machine. The same electricity consumption dataset was used with k-means in WEKA-parallel. The speed of building clusters increased when the parallel software was used, but the mean values of the clusters did not exactly match those of the previously obtained clusters. By visualizing both sets of clusters, it was identified that some border elements of the first set of clusters had jumped to other clusters. The mean values of the clusters changed because of those elements. The experiment was done on a single core i3, 3.3 GHz machine with the Linux operating system to find the execution time taken to create k clusters using WEKA for several different datasets. The same experiment was repeated on a cluster of machines with similar specifications to compute the execution time taken to create k clusters in a parallel environment using WEKA-parallel, varying the number of machines in the cluster. According to the results, WEKA-parallel significantly improves the speed of k-means clustering.
The results of the experiment for a dataset on the electricity consumption of consumers in the North Western Province are shown in Table 1. This study shows that the use of WEKA-parallel and parallel programming methodologies significantly improves the performance of the k-means data mining algorithm for building clusters.
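The quantity that k-means minimizes, as described in the abstract above, is the sum of squared Euclidean distances between each item and the centroid of its cluster. The sketch below computes that quantity for two hypothetical groupings of the same four points, to show how a border element jumping to another cluster shifts the cluster means. It does not use WEKA or WEKA-parallel, and the function names and data are assumptions for illustration.

```python
import math


def centroid(points):
    """Return the mean point (center of mass) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_cluster_sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        if not pts:
            continue  # an empty cluster contributes nothing
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


# Hypothetical data: the same four points grouped in two different ways.
grouping_a = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
grouping_b = [[(0.0, 0.0)], [(2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]]  # (2, 0) has jumped

sse_a = within_cluster_sse(grouping_a)
sse_b = within_cluster_sse(grouping_b)
mean_a = centroid(grouping_a[1])
mean_b = centroid(grouping_b[1])
```

In this example the mean of the second cluster moves from (11, 0) to (8, 0) once the border point (2, 0) joins it, and the sum of squared distances rises from 4 to 56.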
