Technical Details

01.


Based on the concept of Repositories

These act like folders and can be used to manage access to particular types of information.

Documents are ingested into the repositories. During ingestion they can be enriched (by adding a category, sentiment or coordinates) or have information extracted (such as the names of people, places and organisations). The software then enables all repositories to be searched at the same time.

02.


Can process the most common document formats

Including Word, Excel, PowerPoint, plain text and PDF.

It will consume data derived from the web or email, in the form of HTML/MHTML. It will also enable multiple records to be created from one document, where that document uses CSV or XML formats.
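As an illustration of creating multiple records from one document, the sketch below turns each row of a CSV file into its own record. It is a minimal example using Python's standard library, not the product's own ingestion code; the function name `records_from_csv` and the sample columns are assumptions made for the example.

```python
import csv
import io

def records_from_csv(text):
    """Return one record (a dict keyed by column name) per CSV data row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]

# Hypothetical sample document: one header row and two data rows.
sample = "title,author\nAnnual report,J. Smith\nRisk review,A. Jones\n"
records = records_from_csv(sample)

for record in records:
    print(record)
```

Each dictionary could then be treated as a separate searchable record, even though all of them came from a single uploaded file.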

03.


Cloud-based and simple to use

The software can be set up without complex technical expertise. It is provided on Amazon AWS as a cloud-based application, but your system will be dedicated to your company's use only.

It is priced based upon the size of the server you need. This means you can use it for small-scale applications without a large cost overhead.

04.


Enhanced search and API functionality

The software provides traditional free-text searching with full Boolean operators (e.g. AND/OR). It also supports boosting, for when one term is more important than the others, and fuzzy matching, for when a term may be misspelled.
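To show what boosting and fuzzy matching mean in practice, here is a small sketch in Python. It is not the product's search engine: the functions `fuzzy_match` and `score`, the 0.8 similarity threshold and the sample documents are all assumptions chosen for the example, and the fuzzy comparison uses the standard library's `difflib` rather than any particular search algorithm.

```python
from difflib import SequenceMatcher

def fuzzy_match(term, word, threshold=0.8):
    """Treat two words as matching if they are similar enough (assumed threshold)."""
    return SequenceMatcher(None, term.lower(), word.lower()).ratio() >= threshold

def score(document, boosted_terms):
    """Add each term's boost to the score if any word in the document fuzzily matches it."""
    words = document.split()
    total = 0.0
    for term, boost in boosted_terms.items():
        if any(fuzzy_match(term, w) for w in words):
            total += boost
    return total

docs = ["Quarterly finance report", "Holiday rota"]
# "finanse" is deliberately misspelled and carries twice the weight of "report".
query = {"finanse": 2.0, "report": 1.0}
scores = [score(d, query) for d in docs]
```

The misspelled term still matches "finance", and because it carries the larger boost it contributes more to the first document's score than "report" does.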

It also uses unsupervised machine learning to create a high-dimensional vector representation of each word and document. This enables repositories to be searched by vector, using either terms or whole documents as the query.
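A common way to compare vectors of this kind is cosine similarity, sketched below in Python. This is an illustration only: the three-dimensional vectors are made-up values (real models use far more dimensions), the document names are hypothetical, and the source does not state which similarity measure the software uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for learned document representations.
document_vectors = {
    "doc_finance": [0.9, 0.1, 0.0],
    "doc_travel": [0.1, 0.8, 0.3],
}
# Made-up vector standing in for a query term or query document.
query_vector = [0.8, 0.2, 0.1]

ranked = sorted(
    document_vectors,
    key=lambda name: cosine_similarity(query_vector, document_vectors[name]),
    reverse=True,
)
```

Documents whose vectors point in a similar direction to the query rank first, whether or not they share any exact keywords with it.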

All search functionality is available via API.

05.


Provides tools to capture external data and hold in repositories

These tools include multi-level web mining to extract content from websites.  Algorithms enable keywords or phrases to be generated based on the extracted content.

The mined data can be enriched through the ingestion process.

It will also consume RSS feeds and enable any tags included with the feeds to be used in faceted search.
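As a rough illustration of how feed tags can drive faceted search, the Python sketch below counts the `category` tags attached to the items of an RSS 2.0 feed. The feed content and the function name `facet_counts` are invented for the example, and the sketch does not represent how the software itself stores tags.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical RSS 2.0 feed with category tags on each item.
RSS = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Item one</title><category>Energy</category><category>Policy</category></item>
<item><title>Item two</title><category>Energy</category></item>
</channel></rss>"""

def facet_counts(rss_text):
    """Count how many feed items carry each category tag."""
    root = ET.fromstring(rss_text)
    counts = Counter()
    for item in root.iter("item"):
        for category in item.findall("category"):
            if category.text:
                counts[category.text.strip()] += 1
    return counts

facets = facet_counts(RSS)
```

Counts like these are what a faceted search screen typically shows beside each tag, so that a user can narrow the results to one tag with a single click.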

06.


Searches across repositories concurrently

The software searches all repositories concurrently. The number of search results is listed by repository, enabling the user to explore the results from each repository separately. Search can be limited to specific repositories or fields.

Search results can be faceted, clustered and searched by cluster.
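The idea of searching several repositories at once and reporting a count for each can be sketched in Python as follows. The in-memory `repositories` dictionary, the function names and the simple substring match are assumptions made for the example; the real software's storage and matching are not described here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical repositories, each holding a few document titles.
repositories = {
    "news": ["Wind farm approved", "Council budget agreed"],
    "reports": ["Wind energy outlook", "Wind turbine safety review"],
}

def search_repository(name, term):
    """Return the repository name and the documents containing the term."""
    hits = [doc for doc in repositories[name] if term.lower() in doc.lower()]
    return name, hits

def search_all(term, names=None):
    """Search the named repositories (default: all of them) in parallel threads."""
    names = list(names or repositories)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda n: search_repository(n, term), names)
        return dict(results)

results = search_all("wind")
counts = {name: len(hits) for name, hits in results.items()}
news_only = search_all("wind", names=["news"])
```

Passing a list of names restricts the search to those repositories, mirroring the option to limit a search to specific repositories.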

07.


Permissions-based

User access to the information in each repository is governed by permissions.  Users can only search repositories they have permission to access.

These permissions also govern whether users can view, add or delete documents within those repositories.
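A permissions model of this kind can be pictured with the short Python sketch below. The user names, the `PERMISSIONS` table and both functions are hypothetical, and the example assumes that the "view" permission is what makes a repository searchable, which the description above does not spell out.

```python
# Hypothetical permissions table: user -> repository -> allowed actions.
PERMISSIONS = {
    "alice": {"news": {"view", "add", "delete"}, "reports": {"view"}},
    "bob": {"news": {"view"}},
}

def can(user, repository, action):
    """True if the user holds the given action on the repository."""
    return action in PERMISSIONS.get(user, {}).get(repository, set())

def searchable_repositories(user):
    """Repositories the user may search (assumed here to require 'view')."""
    return sorted(
        repo
        for repo, actions in PERMISSIONS.get(user, {}).items()
        if "view" in actions
    )

print(searchable_repositories("bob"))
print(can("alice", "reports", "delete"))
```

Unknown users and unlisted repositories fall through to an empty set, so access is denied by default.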

08.


Provides classification functionality

The software provides simple classification functionality, enabling the user to classify a set of documents based upon a training set held in a separate repository.

This can be useful for auto-tagging documents to improve the search process.
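To give a feel for classification from a training set, the Python sketch below labels a new document by comparing its words with the words seen under each label in a small set of examples. It is a deliberately simple word-overlap approach chosen for illustration, not the algorithm the software uses, and the training examples, labels and function names are all invented.

```python
from collections import Counter

# Hypothetical training set, as might be held in a separate repository.
training_repository = [
    ("invoice payment overdue account", "finance"),
    ("budget forecast revenue account", "finance"),
    ("flight hotel booking itinerary", "travel"),
    ("train ticket booking refund", "travel"),
]

def build_profiles(examples):
    """Count how often each word appears under each label."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles

def classify(text, profiles):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in profiles.items()
    }
    return max(scores, key=scores.get)

profiles = build_profiles(training_repository)
label = classify("hotel booking confirmation", profiles)
```

The resulting label could then be stored as a tag on the document and used as a search filter.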

09.


Connects to Graph Database

The software uses natural language processing (a form of machine learning) to extract "subject-verb-object" triples from the text.

The software can export these triples and other entity data to a Neo4j graph database. 

Extractions can cover a whole repository, selected documents or the results of a specific query.
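As an illustration of what exporting triples to a graph database might involve, the Python sketch below turns each triple into a parameterised Cypher `MERGE` statement. The `Entity` label, the `RELATES` relationship type, the `verb` property and the sample triples are assumptions made for the example; the source does not describe the graph schema the software actually writes.

```python
# Assumed schema: every subject and object is an Entity node, and the verb
# is stored as a property on a RELATES relationship between them.
CYPHER = (
    "MERGE (s:Entity {name: $subject}) "
    "MERGE (o:Entity {name: $object}) "
    "MERGE (s)-[:RELATES {verb: $verb}]->(o)"
)

def to_statements(triples):
    """Pair the Cypher template with one parameter dict per triple."""
    return [
        (CYPHER, {"subject": s, "verb": v, "object": o})
        for s, v, o in triples
    ]

# Hypothetical triples of the kind extracted from text.
triples = [("Acme", "acquired", "Widget Ltd"), ("Jane Doe", "leads", "Acme")]
statements = to_statements(triples)

for query, params in statements:
    print(query, params)
```

Using `MERGE` rather than `CREATE` means an entity that appears in several triples, such as "Acme" here, ends up as a single node in the graph.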

Symilarity’s software products can be used to solve many business problems.

Contact us today for a free trial or to discuss your requirements.