Saudi Journal of Engineering and Technology (SJEAT)
Volume-3 | Issue-06 | 410-419
Original Research Article
Ontology Based Automatic Text Mining Using TF and IDF Algorithms for Summarization of Multiple Files
Chinmayee C, Dr. S Meenakshi Sundaram, Keerthana N S, Manikya S, Nitya Hegde M
Published: June 30, 2018
Abstract
In the present world, owing to rapid developments in technology, a huge
amount of information is available everywhere. It is therefore difficult and
time-consuming for users to grasp the main content of an entire document. In this work
we use extractive text summarization to produce a
summary of one or more files or documents. We present an approach that maps
sentences to nodes of a hierarchical ontology. An ontology describes what exists in a
particular domain. To create the ontology, vocabularies are collected. The ontology serves as
background knowledge and helps to find the related meanings of the terms that occur in
the source documents. Text mining is the technique by which high-quality information
is derived from text. Clustering is a significant task in this process: the clustering method groups similar
or related terms into a single group. In the first stage, data collection takes place. The
preprocessing stage includes stemming and stop-word removal. TF-IDF weighting follows, after
which clustering takes place. Ontology creation begins by determining the main
subtopics of the article of interest. We classify sentences into nodes of a
predefined hierarchical ontology. Each ontology node has a bag of words obtained from a web
search. We represent sentences by subtrees, which allows us to apply measures of similarity
and find relations between sentences. The ontology used in this work is not
domain-specific, and it does not require labelled data. This work can be extended to a
topic-focused summarization framework for news articles or blogs, and also to various machine
learning approaches.
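
The stages named in the abstract (stemming and stop-word removal, TF-IDF weighting, and classifying sentences to ontology nodes by their bags of words) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the suffix-stripping `crude_stem`, the choice to treat each sentence as one document for IDF, the overlap-count rule in `assign_node`, and the scoring of a sentence by the sum of its TF-IDF weights are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "for", "on", "by"}

def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def tf_idf(token_lists):
    """Return one {term: weight} dict per sentence (each sentence = one document)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return weights

def assign_node(tokens, ontology):
    """Pick the node whose (stemmed) bag of words overlaps most with the tokens."""
    token_set = set(tokens)
    overlaps = {
        node: len(token_set & {crude_stem(w) for w in bag})
        for node, bag in ontology.items()
    }
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None  # None when nothing matches

def summarize(sentences, k=2):
    """Keep the k sentences with the highest summed TF-IDF weight, in source order."""
    weights = tf_idf([preprocess(s) for s in sentences])
    scores = [sum(w.values()) for w in weights]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Here `ontology` is a hypothetical mapping from node name to a set of words, for example `{"mining": {"clustering", "terms"}}`. Calling `summarize(sentences, k=2)` returns the two highest-scoring sentences in their original order, and `assign_node(preprocess(s), ontology)` returns the best-matching node for a sentence `s`, or `None` when no bag of words overlaps.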