Findings of Agent 007 in Text Analysis

According to Deloitte forecasts, 80 of the world’s 100 largest software developers will be using cognitive technologies (natural-language text analysis, speech recognition, neural networks, etc.) as early as 2019, which is 50% more than in 2018.
What if your company is not in the top 100? And what if you know nothing about text analytics and Big Data technologies? Communication in the Big Data market still resembles one big secret: you need to be Agent 007 to figure out who plays what role. Almost every market player talks about intelligent data analysis systems, convenient visualization, cloud solutions, machine learning, language detection, etc.

So what are the differences? What questions should you ask yourself first if you intend to implement text analytics for Big Data but have no technical background? How exactly can these technologies be applied to your business? Let’s figure it out.

What tasks does text analytics solve using Big Data?

The term “text analytics” is not as popular as the phrase “Big Data”. Yet, according to a report by International Data Corporation, about 80% of all accumulated information exists as unstructured text. What should be done with it, and how?

Dmitry Torshin, IT Director for Investment and Vice President of Aplana, is sure: “Adopting modern technologies based on text analytics is one of the most important tasks that the heads of growing companies in Russia should set themselves. Their colleagues in developed countries have already done this, and we are all already using the results, often without realizing it. A virtual secretary has already been created to coordinate meetings (it is a program, but nothing gives that away except its email address: it answers people’s questions and suggestions in correspondence perfectly). The App in the Air chat on Facebook Messenger lets me find out, in plain language, what I can and cannot take on a plane in a particular country, and find the flight I need. And the latest version of Apple’s desktop operating system, macOS, which came out just the other day, includes Siri and a search that lets you ask your computer to find “the documents Petya sent me last week.” People get used to this instantly, and if tomorrow your business cannot communicate with a client in human language just as well, competitors will seriously squeeze it out.”

Nevertheless, the most common applications of text analytics are found in the advertising market, in banking, and in online retail (where the trend is only emerging, but its benefits are already obvious).

What tasks can be solved by text analytics of unstructured data in advertising and customer service:

– Compiling brand loyalty ratings,
– Increasing CTR by making native advertising more effective (matching the ad to the content it is placed in),
– Content analysis (tagging and classification) to create the next sub-product or adjust the current one,
– Applying text analysis technologies in chat rooms for community management,
– Automatic identification of various kinds of entities and word frequency analysis,
– Monitoring the sentiment (tonality) of brand mentions as an indicator of the company’s health,
– Detecting trends at the moment they emerge,
– Improving the effectiveness of loyalty programs (by analyzing not only the public space, but also the text data of chats, call center messages and emails).
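
Two of the tasks above, word frequency analysis and tonality monitoring, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny word lists, the sample reviews and the function names are hypothetical, and commercial solutions rely on far larger dictionaries or trained models rather than on simple word counting.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "rude", "broken", "terrible"}
STOP_WORDS = {"the", "a", "is", "was", "and", "but", "i", "it", "my", "to"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each non-stop word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts


def tonality(text):
    """Naive score: number of positive words minus number of negative words."""
    tokens = tokenize(text)
    positive_hits = sum(t in POSITIVE for t in tokens)
    negative_hits = sum(t in NEGATIVE for t in tokens)
    return positive_hits - negative_hits


# Made-up customer reviews used as sample input.
reviews = [
    "I love the new app, support was fast and helpful",
    "Delivery was slow and the courier was rude",
    "The app is great but delivery is slow",
]

freq = word_frequencies(reviews)
scores = [tonality(r) for r in reviews]

print(freq.most_common(3))
print(scores)
```

A score above zero suggests a positive mention and a score below zero a negative one. Tracking the average score over time gives a rough version of the “company health” indicator mentioned in the list, although a word-counting approach like this cannot handle negation, sarcasm or context.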

Banks are perhaps the leaders in applying analytics to unstructured Big Data. Sergey Dobridnyuk, Director of Research and Innovation at Diasoft Systems, which actively studies the banking sector, says: “In my opinion, trying to structure everything is a dead end. Up to 80% of the information “digitized” by humanity every day is unstructured. The reason is the complexity of both the data and the classification systems. For example, to classify sales receipts for PFM systems, you would need a classifier with at least 1.5 million SKU headings. That is an unrealistically large dictionary in which it is easy to make mistakes: even the great Pushkin had a vocabulary of only about 30 thousand words. IT is successfully fighting this complexity: there are hundreds of NoSQL database management systems (DBMS) for which unstructured data is the native element. Algorithms are improving greatly: multilayer neural networks and Bayesian neural networks, for example, find connections and process text, speech and images thousands of times faster than 10 years ago. Very high-quality, free, open source software libraries have appeared that make these technologies available to everyone. Today’s breakthrough technology is machine learning, in which the computer establishes causal relationships through statistical analysis and even a human cannot explain the logic, preferring to treat it as an unknowable “black box”. All this matters for offering the client comprehensive services based on behavioral models and accumulated customer experience (CX). And the quality of the offer keeps improving thanks to continuous monitoring of the client and all of his “digital traces”, structured and unstructured.
The intrigue also lies in the fact that not only banks can do this, but also retailers, telecom operators, and suppliers of goods and services, who already know the client and can offer him financial services no worse than the “average bank”. That is why classical banking today is in a zone of deep turbulence and is rethinking its activities.”

Sergey Zablodsky, director of IBS Data Lab, has no doubt that analytics of unstructured data is “real”: “The question of whether Big Data analytics is applicable in business solutions no longer arises. The task now is to do it effectively. You do not need to look far for examples of effective solutions: look at Uber, Airbnb, Netflix, Walmart. And these are only the names everyone has heard. All of them use Big Data analytics actively and successfully in their business solutions, and for some the entire business is built on it. For example, the likelihood of commercial success for series produced by Netflix reaches 70%, while the market average is only 35%.”

Where to start and what is important to know for implementing text analytics?

The best-known companies in the field of Big Data and linguistics are the social media and media monitoring companies (Brand Analytics, Brandwatch, Radian6, Cribrum, etc.), mainly thanks to high-profile cases and the fact that they offer visualization (an interface). However, the text analytics industry is not limited to them; on the contrary, it is becoming extremely difficult to understand the differences between the solutions on offer.

First of all, when choosing a text analytics solution, decide for yourself which characteristics matter to you (assuming you have already decided to use text analysis technologies to solve a specific business problem).

Answer the questions below:

1. Do you really have big data? Is a large share of it really unstructured?
2. What is more important for you: depth of analysis or speed?
3. In which languages do you need to analyze texts? Each solution on the market has its own technical approach to text analysis and language detection. All international corporate machine learning solutions work perfectly well with English, but Russian causes many problems. It is a rich and powerful language, so to speak!
4. Are you ready to export data?
5. In general, do you want a technology or a finished highly specialized product?
6. Do you need data collection or just text analysis of big data, or both?
7. Is it essential that the solution run inside your internal perimeter (on-premises), or is a cloud-based solution accessed through a REST API acceptable?
8. Do you have the resources to visualize the analyzed data?
9. Which of the main areas of text analytics do you need: search (information retrieval methods) or descriptive / predictive analytics (text mining and sentiment, or tonality, analysis)?
10. Do you need to extract commercially useful knowledge from text online?
11. Do you have professionals on the team who can correctly interpret the results of the analysis, implement the technology and create a product, or do you expect this from the technology supplier?
12. Finally, what budget are you willing to invest in such solutions? Keep in mind that big data analysis yields the most benefit over the long term (that is, when you evaluate the results of the analysis over time), which means a subscription model rather than a one-time project.

Who uses which approach?

So, you have more or less answered the questions above. The next step is choosing a partner. Solutions for analyzing unstructured information can be roughly divided into three types:

– Finished products based on text analytics technologies: not aimed at a mass audience, and therefore quite expensive and tailored to a specific segment of B2B clients.
– Point solutions at the junction of text analytics and big data for the “mass market” of B2B, so to speak: simpler to implement and designed for different B2B segments.
– Modular text analytics technologies: perhaps the most flexible to implement and suitable for a wide range of tasks, a kind of Lego brick in a text analytics construction set for business.

The first group includes genuine artificial intelligence solutions, which can not only perform text analytics tasks but also provide cognitive services in general, as well as combinations of them. For example, IBM Watson, officially launched in 2007, works with big data regardless of its type and format, is capable of self-learning, and is suited to quickly finding answers to questions. A demo is available by subscription on the IBM website.

Both startups and highly targeted products from well-known corporations fall into the second category. For example, in the summer of 2016 ABBYY announced the launch of Findo, a search assistant for email messages, files and documents in the cloud. And in 2014 ABBYY launched Compreno, which provides intelligent search and identification of entities in texts. Among non-corporate solutions, there are innovative companies and startups such as Textocat (which also offers smart search), as well as a “chat bot” product. SAS has also released two key solutions for text mining and sentiment (tonality) analysis: SAS Text Miner and SAS Sentiment Analysis.

Among modular technologies, players such as Yandex Data Factory and EurekaEngine are active on the market. Both help companies make commercial use of their accumulated data by building end services into existing business processes rather than deploying software and visualizations. YDF relies on its corporate experience and machine learning technologies, while EurekaEngine relies on high-speed text analytics, especially for the Russian-language space, since the company has its roots in Russia. (EurekaEngine, by the way, is used by Brand Analytics, one of the leaders in the social media and media monitoring market, which took first place for quality among social media monitoring systems in the TECH INDEX 2016 ranking by AdIndex.)

Advertisers, especially DMP systems and advertising auditors, also have their own text analysis developments, but they are used mainly for internal tasks: segmentation, more precise targeting, semantic comparison of audiences (for example, audiences of mobile applications), etc. As you know, the devil is in the details: almost everyone has problems with the accuracy of the analysis, with the inability to separate the advertising content of a text block from the text of the article in the media, and with subsequently bringing the product to market.


What do clients think about the usefulness of text analytics in solving business problems? Ivan Tretyakov, managing partner of the Association A.R.Z.A.M.A. and the POSonline service:

“In an era of growing consumption and growing demand for quality of service, business (the banking and retail segments in particular) has begun to look more deeply at how to get closer to the client and make him more loyal. Big Data tools, namely analytics and, in particular, the analysis of text arrays, already show amazing results today: you can adjust your service based on people’s feedback in the media, chats and forums; you can offer people interesting promotions and discounts by studying what drives their demand for and interest in specific product groups and brands; you can expand your own list of services or lower the loan rate by studying your customers’ behavior on the Internet.

Text analytics is applicable not only to businesses concentrated in the Internet space but also to offline players. For example, by analyzing user behavior and reviews of certain goods and services on the Internet, and armed with geolocation services for working with a potential audience, you can offer people interesting products, services and solutions in the offline space: courses, travel, workshops, etc. And thanks to the availability of ready-made SaaS services for text analytics, business gets a powerful tool for growing its profits and the number of satisfied customers.”

Amram David

Senior Contributor at DFI Club
Amram is a technical analyst and partner at DFI Club Research, a high-tech research and advisory firm. He has over 10 years of technical and business experience with leading high-tech companies including Huawei, Nokia and Ericsson in ICT, semiconductors, microelectronic systems and embedded systems. Amram focuses on the business-critical points where new technologies drive innovation.