Intelligent Data Classification
Many organisations need to extract key data from large volumes of documents. Common use cases include:
- Extracting data from insurance policy documents, loan agreements or contracts
- Extracting knowledge from research papers
- Extracting information from candidate profiles in recruitment or HR
- Categorising content so it can be routed to the most appropriate person or group
Symilarity’s Intelligent Data Classification system makes this task straightforward and requires only training data specific to the task at hand. The training data is supplied in a standard Excel spreadsheet.
How can Symilarity help?
The system has a two-stage architecture, although Stage 1 can operate on its own:
Stage 1
Training data, supplied in a simple Excel spreadsheet, is used to build a model for the classification process. Symilarity can provide three types of model and, where multiple document types need to be processed, the system can support an individual model for each document type.
The system classifies each document in small segments, assigning a label to each. Rules can then be set up to extract data based on the labelled text.
Documents are submitted, and the labelled output and extractions are returned, through an API (Application Programming Interface), so the system can be integrated with other processes.
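As an illustration only, the Stage 1 idea of labelling small segments and then applying extraction rules to the labelled text can be sketched in Python. Everything here is an assumption made for the example: the sentence-based segmentation, the keyword classifier standing in for a trained model, the label names and the rule patterns are hypothetical and do not describe Symilarity's actual models or API.

```python
import re

def split_into_segments(document: str) -> list[str]:
    # Assumption: one segment per sentence; the real segmentation may differ.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def classify(segment: str) -> str:
    # Hypothetical keyword classifier standing in for a trained model.
    lowered = segment.lower()
    if "premium" in lowered:
        return "PREMIUM"
    if "expires" in lowered or "expiry" in lowered:
        return "EXPIRY"
    return "OTHER"

# Hypothetical extraction rules: each pattern is applied only to
# segments that carry the matching label.
RULES = {
    "PREMIUM": re.compile(r"£\s?([\d,]+(?:\.\d{2})?)"),
    "EXPIRY": re.compile(r"(\d{1,2} \w+ \d{4})"),
}

def extract(document: str) -> dict[str, list[str]]:
    results: dict[str, list[str]] = {}
    for text in split_into_segments(document):
        label = classify(text)
        rule = RULES.get(label)
        if rule is None:
            continue  # no extraction rule defined for this label
        match = rule.search(text)
        if match:
            results.setdefault(label, []).append(match.group(1))
    return results

if __name__ == "__main__":
    sample = (
        "The annual premium is £1,250.00. "
        "The policy expires on 31 March 2025. "
        "Contact us for details."
    )
    print(extract(sample))
```

Because each rule runs only on segments with the matching label, a date that appears in an unrelated sentence would not be picked up as an expiry date in this sketch.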
Stage 2
Symilarity provides a secondary step that can be used in conjunction with Stage 1. It adds a commentary to the output based on the labelled text.
Comments are output alongside the original document, with markers that link each comment to the text it refers to.
Comments can be colour coded to differentiate commentary of different types.
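The Stage 2 behaviour described above (comments driven by labels, linked to the text by markers, and colour coded by type) can be pictured with a small Python sketch. The label names, comment wording, colours and the numbered-marker format are all hypothetical choices for the example, not the product's actual output format.

```python
from dataclasses import dataclass

@dataclass
class LabelledSegment:
    text: str
    label: str

# Hypothetical commentary rules: label -> (comment text, colour).
COMMENTARY = {
    "EXCLUSION": ("Check whether this exclusion applies.", "red"),
    "PREMIUM": ("Premium amount stated here.", "green"),
}

def annotate(segments: list[LabelledSegment]) -> tuple[str, list[dict]]:
    marked_text = []
    comments = []
    for seg in segments:
        rule = COMMENTARY.get(seg.label)
        if rule is None:
            marked_text.append(seg.text)  # no commentary for this label
            continue
        comment, colour = rule
        marker = len(comments) + 1  # numbered marker ties the comment to the text
        marked_text.append(f"{seg.text} [{marker}]")
        comments.append({"marker": marker, "colour": colour, "comment": comment})
    return " ".join(marked_text), comments

if __name__ == "__main__":
    text, comments = annotate([
        LabelledSegment("The annual premium is 1,250 pounds.", "PREMIUM"),
        LabelledSegment("Flood damage is not covered.", "EXCLUSION"),
        LabelledSegment("Contact us for details.", "OTHER"),
    ])
    print(text)
    for c in comments:
        print(c)
```

Keeping the comments in a separate list, keyed by marker number, leaves the original text readable while still letting a viewer render each comment in its own colour.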