An online news crawling framework for an aggregated news site

Wagawaththa, W.A.I.K.; Wijayanayake, W.M.J.I.

Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/15729

Title:	An online news crawling framework for an aggregated news site
Authors:	Wagawaththa, W.A.I.K. Wijayanayake, W.M.J.I.
Keywords:	Information retrieval Category classification Topic classification News ranking
Issue Date:	2016
Publisher:	Faculty of Science, University of Kelaniya, Sri Lanka
Citation:	Wagawaththa, W.A.I.K. and Wijayanayake, W.M.J.I. 2016. An online news crawling framework for an aggregated news site. In Proceedings of the International Research Symposium on Pure and Applied Sciences (IRSPAS 2016), Faculty of Science, University of Kelaniya, Sri Lanka. p 75.
Abstract:	The internet has become one of the most widespread platforms for information exchange and retrieval as the number of news websites is increasing rapidly. During the last decade, most of the major newspapers have developed web sites providing news and other information. In addition, web-only newspapers have also appeared. News aggregator is a good substitute for news sites like BBC news. Because news aggregators can index not just the content of the BBC news but all other news sites, giving it a huge advantage in coverage. On the other hand, news aggregators may complement online news sites. Because news consumers incur costs (time and effort) in searching for news that are important to them and also they will compare the expected benefit from visiting a news site to the expected search cost, where that cost includes becoming aware of the existence of the site and finding how to navigate it. There are few news aggregators like Google News, News Look Up, Fark which provide news aggregation facilities, but they are proprietary and there are privacy concerns about the user along with the biasness of these aggregators. In order to benefit more from the available information, the objective of the research is to develop a technical framework, gathering online news and approach to recognize most important latest news and display the recognized news items that society is interested in without any bias. Presenting crawled news items in a way that it displays the trending topics in society will increase the awareness of the reader. In order to do that news classification and ranking is a needed. News items for the framework will be gathered through (RSS) feeds. Gathered news feeds will be stored and will be preprocessed. Keywords will be extracted from an algorithm that can be worked with any language that has basic Morphological tools for language processing. Category classification of the news items will be done using a method that is based on the keyword extraction algorithm. Topic detection and classification of the news items will be done to the category classified news items using an algorithm that requires no corpus for statistics or training data. The ranking of the news article, topic and source will be done using an approach which is based on the virtual graph model. In the ranking process, similarity between articles are calculated manually and it will be automated using the cosine similarity.
URI:	http://repository.kln.ac.lk/handle/123456789/15729
ISBN:	978-955-704-008-0
Appears in Collections:	IRSPAS 2016

Files in This Item:

File	Description	Size	Format
75.pdf		218.64 kB	Adobe PDF	View/Open

Show full item record

DSpace JSPUI

DSpace preserves and enables easy and open access to all types of digital content including text, images, moving images, mpegs and data sets