Sunday, November 2, 2008

Grammar Search in Unstructured Information Management Architecture (UIMA)

by Your Name


 Abstract - This report gives an overview of our approach to building an application with the Unstructured Information Management Architecture (UIMA). The application is designed to search unstructured data according to a specified grammar; it works solely on unstructured data rather than structured data. To gain insight into grammar search, we examined the GSearch tool, which searches text corpora according to syntactic criteria. Because UIMA is an open framework for building analytic applications, we intend to build our module with its tools.

INTRODUCTION

UIMA

UIMA is an open framework for building analytic applications that find latent meaning, relationships, and relevant facts hidden in unstructured text. UIMA defines a common, standard interface that enables text analytics components from multiple vendors to work together. It provides tools for either creating new interoperable text analytics modules or enabling existing text analytics investments to operate within the framework. In analyzing unstructured information, Unstructured Information Management (UIM) applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing, Information Retrieval, and machine learning. The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations alongside other independently developed components, and with which they can build and deploy UIM applications. The framework is not specific to any IDE or platform. The fundamental focus of UIMA is unstructured data, rather than structured data.

UIMA is an architecture in which basic building blocks called Analysis Engines (AEs) are composed in order to analyze a document. At the heart of AEs are the analysis algorithms that do all the work to analyze documents and record analysis results. These algorithms are packaged within components that are called Annotators. AEs are the stackable containers for annotators and other analysis engines.
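The stacking of annotators inside analysis engines can be illustrated with a small toy model. The sketch below is plain Python and does not use the UIMA API; the names `AnalysisEngine`, `token_annotator`, `capitalized_annotator`, and `analyze` are hypothetical, and a plain dictionary stands in for the shared analysis results. It only shows the idea that an engine can contain annotators as well as other engines.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for shared analysis results:
# annotation type name -> list of (begin, end) character spans.
Analysis = Dict[str, List[Tuple[int, int]]]
Component = Callable[[str, Analysis], None]


def token_annotator(text: str, analysis: Analysis) -> None:
    """Record a (begin, end) span for each whitespace-separated token."""
    pos = 0
    for word in text.split():
        begin = text.index(word, pos)
        pos = begin + len(word)
        analysis.setdefault("Token", []).append((begin, pos))


def capitalized_annotator(text: str, analysis: Analysis) -> None:
    """Mark tokens (found by an earlier annotator) that start with a capital."""
    for begin, end in analysis.get("Token", []):
        if text[begin].isupper():
            analysis.setdefault("Capitalized", []).append((begin, end))


class AnalysisEngine:
    """Toy container: runs its components in order.

    A component may be an annotator function or another AnalysisEngine,
    which is what makes the containers stackable.
    """

    def __init__(self, *components: Component) -> None:
        self.components = components

    def __call__(self, text: str, analysis: Analysis) -> None:
        for component in self.components:
            component(text, analysis)


def analyze(engine: AnalysisEngine, text: str) -> Analysis:
    analysis: Analysis = {}
    engine(text, analysis)
    return analysis


# An engine wrapping one annotator, nested inside a larger engine.
tokenizer = AnalysisEngine(token_annotator)
pipeline = AnalysisEngine(tokenizer, capitalized_annotator)
result = analyze(pipeline, "UIMA analyzes text")
```

Here `result["Capitalized"]` holds only the span of "UIMA", because the second annotator builds on the token spans produced by the nested engine.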

How Annotators represent and share their results is an important part of the UIMA architecture. To enable composition and reuse, UIMA defines a Common Analysis Structure (CAS) precisely for these purposes. The CAS is an object-based container that manages and stores typed objects having properties and values. Object types may be related to each other in a single-inheritance hierarchy. Annotators are given a CAS having the subject of analysis, in addition to any previously created objects (from annotators earlier in the pipeline), and they add their own objects to the CAS. The CAS serves as a common data object, shared among the annotators that are assembled for an application.
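To make the role of the CAS more concrete, here is a minimal sketch of a container holding typed objects in a single-inheritance hierarchy. It is a toy model in plain Python, not the UIMA CAS API; `ToyCas`, `Annotation`, `Token`, `Number`, `tokenize`, and `mark_numbers` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Type, TypeVar


@dataclass
class Annotation:
    """Root type: a span over the subject of analysis."""
    begin: int
    end: int


@dataclass
class Token(Annotation):
    """Subtype of Annotation."""


@dataclass
class Number(Token):
    """Subtype of Token with one extra property."""
    value: int = 0


T = TypeVar("T", bound=Annotation)


@dataclass
class ToyCas:
    """Toy container: the subject text plus every object added so far."""
    text: str
    objects: List[Annotation] = field(default_factory=list)

    def add(self, obj: Annotation) -> None:
        self.objects.append(obj)

    def select(self, kind: Type[T]) -> List[T]:
        # Returns objects of the given type, including its subtypes.
        return [o for o in self.objects if isinstance(o, kind)]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


def tokenize(cas: ToyCas) -> None:
    """First annotator: adds a Token for each whitespace-separated word."""
    pos = 0
    for word in cas.text.split():
        begin = cas.text.index(word, pos)
        pos = begin + len(word)
        cas.add(Token(begin, pos))


def mark_numbers(cas: ToyCas) -> None:
    """Later annotator: reads existing Tokens and adds Number objects."""
    for tok in cas.select(Token):
        covered = cas.covered_text(tok)
        if covered.isdigit():
            cas.add(Number(tok.begin, tok.end, int(covered)))


cas = ToyCas("section 3 has 12 figures")
tokenize(cas)
mark_numbers(cas)
numbers = [n.value for n in cas.select(Number)]
```

Because `Number` inherits from `Token`, a later query for `Token` also returns the `Number` objects, which mirrors how a type hierarchy lets downstream annotators work at whatever level of detail they need.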

Many UIM applications analyze entire collections of documents. UIMA supports this analysis through its Collection Processing Architecture. This part of the architecture allows specification of a "source-to-sink" flow by reading the data from the source, processing it, and storing the results in a data sink of our choice.
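The "source-to-sink" flow can be sketched as a reader that yields documents, a list of annotators applied to each one, and a sink that stores the results. This is a simplified illustration in plain Python rather than UIMA's Collection Processing Architecture itself; `read_collection`, `count_words`, `run_collection`, and `index_sink` are hypothetical names, and an in-memory dictionary plays the part of both the source and the data sink.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Document = Dict[str, object]


def read_collection(docs: Dict[str, str]) -> Iterator[Document]:
    """Source: yield one document record at a time, in name order."""
    for name in sorted(docs):
        yield {"name": name, "text": docs[name], "results": {}}


def count_words(doc: Document) -> None:
    """Processing step: store a simple analysis result on the document."""
    doc["results"]["word_count"] = len(doc["text"].split())


def run_collection(
    source: Iterable[Document],
    annotators: List[Callable[[Document], None]],
    sink: Callable[[Document], None],
) -> int:
    """Read each document, analyze it, hand it to the sink; return the count."""
    processed = 0
    for doc in source:
        for annotate in annotators:
            annotate(doc)
        sink(doc)
        processed += 1
    return processed


# Sink: an in-memory index standing in for a database or search index.
index: Dict[str, dict] = {}


def index_sink(doc: Document) -> None:
    index[doc["name"]] = doc["results"]


corpus = {
    "b.txt": "grammar search over text",
    "a.txt": "unstructured data",
}
processed = run_collection(read_collection(corpus), [count_words], index_sink)
```

Swapping `index_sink` for a function that writes to a file or database changes where results end up without touching the reader or the annotators.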
