Organizations have access to increasing amounts of information. Some is generated within the organization by its people and processes and some is created outside. Most of this information is not stored in neat databases. Instead it is unstructured, sitting in documents, spreadsheets, presentations, log files and so on, where it cannot easily be searched or used in any analytics process. This is sometimes referred to as “Dark Data”.
We have all become used to having very powerful search tools for online content, but very few of us get that experience when looking through our internal documents. This usually stems from a lack of good search tools and from data being held in disparate places within the organization. We can waste nearly 20% of our day looking for documents (source: McKinsey, 2012, The social economy: Unlocking value and productivity through social technologies).
Symilarity’s Helix Insight Engine can help address these issues by providing AI/Machine Learning powered search and enrichment tools and a simple set of repositories to hold data that needs to be available for search. It has a wide range of uses, including improved document search, matching tender opportunities to potential bidders, automated classification of businesses based on their website content, and research across large quantities of academic papers.
How do we do that?
Upload.
We upload the documents and ingest them, creating indexes that enable them to be searchable, even if there are millions of them.
Documents can be added by users or in an automated batch process.
We support most common file formats including plain text, Word, Excel, PowerPoint, PDF, XML, RSS, MHT and HTML.
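To illustrate what "creating indexes" can mean in practice, here is a minimal sketch of an inverted index, the general technique that makes large document sets searchable without scanning every file. It is not Helix's actual implementation, which is not described here. The function names (`tokenize`, `build_index`, `lookup`) and the sample documents are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, already converted to plain text.
documents = {
    "tender-001": "Tender for road maintenance in Leeds",
    "report-002": "Annual report on road safety",
    "memo-003": "Office move to Leeds confirmed",
}
index = build_index(documents)
print(lookup(index, "road leeds"))
```

Because a lookup only touches the entries for the query terms, its cost depends on how many documents match rather than on the total size of the collection.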
Extract & Enrich.
We process the data using natural language processing (a machine learning technology). We extract information such as place names, or references to people or organisations. We can also add information, such as the latitude and longitude coordinates of the place names found.
We can also use IBM’s Watson Natural Language Understanding (NLU) services to add topics and sentiment to the data.
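The extract-and-enrich step can be sketched as follows. This example uses a simple dictionary lookup as a stand-in for a trained natural language processing model, so it only shows the shape of the output, not how Helix or Watson NLU work internally. The `GAZETTEER` table, its approximate coordinates, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "London": (51.5074, -0.1278),
    "Leeds": (53.8008, -1.5491),
}


def extract_places(text, gazetteer=GAZETTEER):
    """Find known place names in the text and attach their coordinates."""
    found = []
    for name, (lat, lon) in gazetteer.items():
        # Whole-word match so that "Leeds" does not match inside other words.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append({"name": name, "lat": lat, "lon": lon})
    return found


def enrich(document):
    """Return a copy of the document with a 'places' field added."""
    enriched = dict(document)
    enriched["places"] = extract_places(document["text"])
    return enriched


doc = {"id": "d1", "text": "The supplier is based in Leeds."}
print(enrich(doc))
```

A real entity extractor would also recognise people and organisations and would handle ambiguous names; the enrichment pattern of adding structured fields alongside the original text stays the same.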
Learn.
Vector Space Modelling is used to analyse the words contained in each document to learn how those words are related to each other. Because this is an “unsupervised machine learning” process, no manual labelling is needed, and each search term carries the context of the words related to it. As a result, a search can return documents that are relevant to the search term even when they do not contain an exact match.
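As a rough illustration of the idea, the sketch below represents each word as a vector of its counts across documents and compares words by cosine similarity. This is one of the simplest forms of vector space model and is an assumption made for the example; the specific model Helix uses is not stated here. All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_vectors(documents):
    """Represent each word by how often it occurs in each document."""
    vectors = defaultdict(Counter)
    for doc_id, text in enumerate(documents):
        for token in tokenize(text):
            vectors[token][doc_id] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus: "car" and "automobile" share a document.
documents = [
    "car engine repair",
    "automobile engine service",
    "car automobile dealer",
    "bread flour recipe",
]
vectors = word_vectors(documents)
print(cosine(vectors["car"], vectors["automobile"]))
print(cosine(vectors["car"], vectors["bread"]))
```

Words that appear in similar documents end up with similar vectors, so a search for "car" can be widened to documents that only mention "automobile", while unrelated words such as "bread" score zero.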
Search.
We provide powerful search technology, in a simple-to-use format, enabling you to search across all your data.
We can search across all your repositories at the same time (sometimes called Federated Search).
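Federated search can be sketched as running the same query against each repository and merging the results into one ranked list. The example below is a minimal sketch under that assumption, using simple term counts as the score; the repository names, the scoring, and the function names are hypothetical and not taken from Helix.

```python
def search_repository(repository, query):
    """Score each document by how many times the query terms occur in it."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in repository.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((score, doc_id))
    return hits


def federated_search(repositories, query):
    """Query every repository and merge the hits into one ranked list."""
    merged = []
    for repo_name, repository in repositories.items():
        for score, doc_id in search_repository(repository, query):
            merged.append(
                {"repository": repo_name, "doc_id": doc_id, "score": score}
            )
    # Highest score first; ties are broken by repository and document id.
    merged.sort(key=lambda hit: (-hit["score"], hit["repository"], hit["doc_id"]))
    return merged


# Hypothetical repositories, each mapping a document id to its text.
repositories = {
    "contracts": {"c1": "road maintenance contract"},
    "reports": {"r1": "road safety report road works", "r2": "annual budget"},
}
for hit in federated_search(repositories, "road"):
    print(hit)
```

Keeping the repository name on each hit lets the user see where a result came from even though the list is presented as a single set of results.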
We can also search for related data based on the latitude and longitude coordinates of the place names contained in the document.
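A location-based search of this kind can be sketched with the haversine formula, which gives the great-circle distance between two latitude and longitude points. The example assumes documents have already been enriched with a `places` list; that field name, the sample documents, the approximate coordinates, and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def documents_near(documents, lat, lon, radius_km):
    """Return ids of documents that mention a place within radius_km."""
    matches = []
    for doc in documents:
        for place in doc["places"]:
            if haversine_km(lat, lon, place["lat"], place["lon"]) <= radius_km:
                matches.append(doc["id"])
                break  # one nearby place is enough for this document
    return matches


# Hypothetical enriched documents with approximate coordinates.
documents = [
    {"id": "d1", "places": [{"name": "Leeds", "lat": 53.8008, "lon": -1.5491}]},
    {"id": "d2", "places": [{"name": "London", "lat": 51.5074, "lon": -0.1278}]},
]
# Search within 100 km of central Manchester (approximate coordinates).
print(documents_near(documents, 53.4808, -2.2426, 100))
```

For large collections a spatial index would normally replace the linear scan shown here, but the distance test itself stays the same.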