Saturday, August 2, 2008

Development of an auto-summarization tool

by Your Name 0 comments

Tag


Share this post:
Design Float
StumbleUpon
Reddit

Abstract:
Auto-summarization is a technique used to generate summaries of electronic documents. This has some applications like summarizing the search-engine results, providing briefs of big documents that do not have an abstract etc. There are two categories of summarizers, linguistic and statistical. Linguistic summarizers use knowledge about the languange (syntax/semantics/usage etc) to summarize a document. Statistical ones operate by finding the important sentences using statistical methods (like frequency of a particular word etc). Statistical summarizers normally do not use any linguistic information.

In this project, an auto-summarization tool is developed using statistical techniques. The techniques involve finding the frequency of words, scoring the sentences, ranking the sentences etc. The summary is obtained by selecting a particular number of sentences (specified by the user) from the top of the list. It operates on a single document (but can be made to work on multiple documents by choosing proper algorithms for integration) and provides a summary of the document. The size of the summary can be specified by the user when invoking the tool. Pre-processing interfaces are there to handle the following document types: Plain Text, HTML, Word Document

Comments 0 comments

Subscribe feeds via e-mail
Subscribe in your preferred RSS reader

Subscribe feeds rss Recent Entries

Advertise on this site Sponsored links

Categories

Subscribe feeds rss Recent Comments

Technorati

Technorati
My authority on technorati
Add this blog to your faves