#NLPAnthology Archives

Key Concept Extraction: Intelligent Audio Transcript Analytics Extracting Key Phrases for Scaling Industrial NLP Applications

The COVID‐19 pandemic that hit us last year brought a massive cultural shift, causing millions of people across the world to switch to remote work environments overnight and use various collaboration tools and business applications to overcome communication barriers.

This shift generates enormous amounts of data in audio format. Converting this data to text gives businesses a significant opportunity to distill meaningful insights.

One of the essential steps for an in-depth analysis of voice data is ‘Key Concept Extraction,’ which determines the main topics of business calls. Accurate identification of these topics enables many downstream applications.

One way to extract key concepts is to use Topic Modelling, an unsupervised machine learning technique that clusters words into topics by detecting patterns and recurring words. However, it cannot guarantee precise results, and its output suffers from the many transcription errors introduced when audio is converted to text.

Let’s glance at the existing toolkits that can be used for topic modelling.

Some Selected Topic Modelling (TM) Toolkits

Stanford TMT: It is designed to help social scientists or researchers analyze massive datasets with a significant textual component and monitor word usage.

VISTopic: It is a hierarchical visual analytics system for analyzing extensive text collections using hierarchical latent tree models.

MALLET: It is a Java-based package that includes sophisticated tools for document classification, NLP, TM, information extraction, and clustering for analyzing large amounts of unlabelled text.

FiveFilters: It is a free software solution that builds a list of the most relevant terms from any given text in JSON format.

Gensim: It is an open-source TM toolkit implemented in Python that leverages unstructured digital texts, data streams, and incremental algorithms to extract semantic topics from documents automatically.

Anteelo’s AI Center of Excellence (AI CoE)

Our AI CoE team has developed a custom solution for key concept extraction that addresses the challenges we discussed above. The whole pipeline can be broken down into four stages, which follow the “high recall to high precision” system design using a combination of rules and state-of-the-art language models like BERT.

Pipeline:

1) Phrase extraction: The pipeline starts with basic text pre-processing, such as eliminating redundancies and lowercasing the text. Next, it uses specific rules to extract meaningful phrases from the text.
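The post does not spell out the extraction rules, so the following is only a minimal sketch of what a rule-based phrase extractor could look like. It assumes one simple rule: after lowercasing, a candidate phrase is a run of consecutive non-stop-words inside a punctuation-delimited fragment. The STOP_WORDS list, the max_len limit, and the function name extract_phrases are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical stop-word list; a production system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "we", "to", "of", "and", "our", "is", "are",
              "for", "in", "on", "need", "want", "with", "this", "that"}


def extract_phrases(text, max_len=4):
    """Return candidate phrases: runs of non-stop-words within a fragment."""
    phrases = []
    # Pre-processing: lowercase, then split into fragments at punctuation.
    for fragment in re.split(r"[.,;:!?]+", text.lower()):
        current = []
        for token in re.findall(r"[a-z0-9][a-z0-9\-]*", fragment):
            if token in STOP_WORDS:
                # A stop word closes the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(" ".join(current))
    # Eliminate redundancies: drop over-long phrases and duplicates, keep order.
    seen, result = set(), []
    for phrase in phrases:
        if len(phrase.split()) <= max_len and phrase not in seen:
            seen.add(phrase)
            result.append(phrase)
    return result


sample = "We need to reduce the cloud migration cost. The cloud migration cost is high."
print(extract_phrases(sample))
# ['reduce', 'cloud migration cost', 'high']
```

A rule this permissive deliberately favors recall: it lets weak candidates such as "high" through, which is consistent with leaving precision to the later stages.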

2) Noise removal: This stage of the pipeline takes the phrases extracted above and removes the noisy ones based on the following signals:

Named Entity Recognition (NER): Phrases tagged with certain entity types, such as quantity, time, and location, are most likely to be noise for the given task and are dropped from the set of phrases.
Stop-words: A dynamically generated list of stop words and phrases, obtained from the casual talk removal (CTR) module (refer to the first blog of the series for details), is used to identify noisy phrases.
IDF: Inverse document frequency (IDF) values of phrases are used to remove common recurring phrases, such as those that are part of the usual greetings in an audio call.
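To make the stop-word and IDF signals concrete, here is a small sketch under stated assumptions. It computes a plain IDF, log(N / document frequency), over the phrase sets of several transcripts and drops phrases that are either on a stop-phrase list or fall below an IDF threshold. The threshold value, the function names, and the sample data are hypothetical. The NER signal is left out because it needs a trained entity recognizer.

```python
import math


def phrase_idf(docs_phrases):
    """Compute IDF per phrase from a list of phrase collections (one per transcript)."""
    n_docs = len(docs_phrases)
    doc_freq = {}
    for phrases in docs_phrases:
        for phrase in set(phrases):
            doc_freq[phrase] = doc_freq.get(phrase, 0) + 1
    return {p: math.log(n_docs / count) for p, count in doc_freq.items()}


def remove_noise(phrases, idf, stop_phrases, min_idf=0.5):
    """Keep phrases that are not stop phrases and are rare enough across calls."""
    kept = []
    for phrase in phrases:
        if phrase in stop_phrases:
            continue
        # Phrases never seen in the reference corpus are treated as rare and kept.
        if idf.get(phrase, float("inf")) < min_idf:
            continue
        kept.append(phrase)
    return kept


# Hypothetical phrase sets from three call transcripts.
corpus = [
    {"good morning", "cloud migration"},
    {"good morning", "pricing model"},
    {"good morning", "data governance"},
]
idf = phrase_idf(corpus)
print(remove_noise(["good morning", "cloud migration", "thank you"], idf, {"thank you"}))
# ['cloud migration']
```

Here "good morning" appears in every transcript, so its IDF is 0 and it is removed, while "thank you" is removed by the stop-phrase list.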

3) Phrase normalization: After removing the noise, the pipeline combines semantically and syntactically similar phrases. To learn phrase embeddings, the module uses the BERT language model together with domain-trained word embeddings. For example, “Price Efficiency Across Enterprise” and “Business-Venture Cost Optimization” will be grouped together by this stage, as they mean essentially the same thing.
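The grouping step can be sketched independently of how the embeddings are produced. The example below assumes each phrase already has a vector and groups phrases greedily by cosine similarity. The three-dimensional vectors are made-up toy numbers standing in for real BERT or domain-trained embeddings, and the 0.8 threshold and the greedy strategy are illustrative assumptions rather than the method used in the actual pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_phrases(embeddings, threshold=0.8):
    """Greedy grouping: a phrase joins the first group whose representative
    vector is at least `threshold` similar, otherwise it starts a new group."""
    groups = []  # list of (representative_vector, member_phrases)
    for phrase, vector in embeddings.items():
        for representative, members in groups:
            if cosine(representative, vector) >= threshold:
                members.append(phrase)
                break
        else:
            groups.append((vector, [phrase]))
    return [members for _, members in groups]


# Toy vectors (hypothetical); real ones would come from an embedding model.
toy_embeddings = {
    "price efficiency across enterprise": [0.90, 0.10, 0.20],
    "business-venture cost optimization": [0.85, 0.15, 0.25],
    "data governance": [0.10, 0.90, 0.10],
}
for group in group_phrases(toy_embeddings):
    print(group)
```

With these toy vectors the two cost-related phrases end up in one group and "data governance" in another. A greedy pass is order-dependent, so a real system might prefer a proper clustering algorithm.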

4) Phrase ranking: This is the final stage of the pipeline, which ranks the remaining phrases using various metadata such as frequency, number of similar phrases, and linguistic POS patterns. This list of signals is not exhaustive, and other signals may be added based on any additional data present.
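One simple way to combine such signals is a weighted sum, sketched below. The post does not say how the signals are combined, so the scoring formula, the weights, and the field names (frequency, similar_count, pos_match) are assumptions made purely for illustration.

```python
def rank_phrases(candidates, w_freq=1.0, w_similar=0.5, w_pos=2.0):
    """Sort candidate phrases by a weighted sum of their metadata signals."""
    def score(candidate):
        return (
            w_freq * candidate["frequency"]
            + w_similar * candidate["similar_count"]
            # pos_match: whether the phrase fits a preferred POS pattern.
            + w_pos * (1 if candidate["pos_match"] else 0)
        )
    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates with pre-computed signals.
candidates = [
    {"phrase": "pricing model", "frequency": 7, "similar_count": 0, "pos_match": False},
    {"phrase": "cloud migration cost", "frequency": 5, "similar_count": 2, "pos_match": True},
    {"phrase": "vendor", "frequency": 1, "similar_count": 1, "pos_match": True},
]
print([c["phrase"] for c in rank_phrases(candidates)])
# ['cloud migration cost', 'pricing model', 'vendor']
```

Adding a new signal only requires another weighted term in the score, which matches the point that the signal list can grow with the available data.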

Intelligent Audio Transcript Analytics: The Next Big Thing for Scaling Industrial NLP Applications

Over the last few years, Natural Language Processing (NLP) has made significant strides in mastering language models to understand the nuances of different languages, dialects, and voices. NLP has unlocked countless new possibilities, and the market has corroborated this with a high rate of adoption. However, many in the industry, across the Fortune 500, are still skeptical about implementing NLP-based tools to derive value from texts; instead, they rely on their experts to do this manually, resulting in low efficiency, inconsistency, and challenges at scale.

Tredence is helping enterprises address these challenges through our AI CoE. We’ve recently partnered with a Fortune 100 Research and Advisory client to solve several challenging NLP problems using audio transcripts for conversations between the client’s analysts and their customers (usually Business Directors and above). These problems have potential use cases, including Research Guidance, Evolving Categorizations, Automated Reports, and Process Automation.

This is the first post in a four-part blog series. It gives an overview of the problem and the motivation behind the solution, along with some of the challenges faced during the solution’s development.

The Problem: Rising Metadata and Lack of Actionable Insights

With companies using a host of call monitoring and recording applications, a large amount of unstructured call data gets generated every day. But a manual approach, with its inherent resource constraints, cannot turn this data into valuable insights.

NLP solutions can play a vital role in mining and categorizing call data and in providing actionable insights. For example, they can be applied to call transcripts to quickly extract the key topics covered with little or no human input. Further, using such a solution to understand call transcripts can improve workplace efficiency, reduce human capital costs, and improve training and feedback for employees. It can also help identify business problems algorithmically, making it easier for the organization to deploy resources in an evidence-based manner.

We have built an NLP-enabled Audio Transcript Analytics Solution that helps systematically understand the business calls by using three key components:

Key Concepts Identification
Natural Language Intent Extraction
Multi-label Document Tagging

We will discuss each component in detail in the next three blogs of this series.

Our solution has been successfully applied to the transcription needs of many Fortune 500 industrial clients across multiple domains.

The tools can be combined to form a full-spectrum Natural Language Understanding and Processing System that’s customized for new domains relatively easily.

Data & Present Framework

Roughly 100,000 analyst-client calls, each lasting 30 to 40 minutes, take place every year. Before our solution was deployed, the domain experts had to analyze and extract the key elements of each call transcript manually.

Before we discuss the critical components used by our Audio Transcript Analytics Solution, let’s glance at some of the challenges.

Challenges

Ambiguity is inherent to human language. On top of that, speech-to-text output poses many problems for NLP systems because of transcription errors: incorrect words, spelling errors, and incorrect sentence segmentation.
The lack of speaker text segregation hinders the application of NLP algorithms to the segments spoken by the client.
Off-topic conversations or casual talk also significantly reduce the algorithms’ effectiveness. To address this issue, we’ve developed a Casual Talk Removal method in which we treat casual talk identification as a sentence classification problem using:

A supervised approach: We trained an ensemble model on nearly 10,000 sentences, using quantitative features derived from each sentence, such as the sentence’s position and its counts of tokens, stop words, entities, person names, and geographic locations. We observed that the sentence’s position is the most important feature, since the transcripts have a high density of casual talk at the beginning. This approach performed well in classifying sentences at the beginning of the call transcript.
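As an illustration of what such quantitative features might look like, here is a minimal sketch that derives three of them per sentence: relative position in the transcript, token count, and stop-word count. The feature names and the stop-word list are hypothetical. The entity, person-name, and location counts are omitted because they would need an NER model, and the ensemble classifier itself is not shown.

```python
def sentence_features(sentences, stop_words):
    """Build one feature dict per sentence of a transcript."""
    n_sentences = len(sentences)
    features = []
    for index, sentence in enumerate(sentences):
        tokens = sentence.lower().split()
        features.append({
            # 0.0 for the first sentence, 1.0 for the last one.
            "relative_position": index / max(n_sentences - 1, 1),
            "token_count": len(tokens),
            # Strip trailing punctuation before the stop-word lookup.
            "stop_word_count": sum(t.strip(".,!?") in stop_words for t in tokens),
        })
    return features


stop_words = {"how", "are", "you", "the", "us"}  # hypothetical list
transcript = [
    "Hi John, how are you?",
    "Let us discuss the cloud migration budget.",
]
for row in sentence_features(transcript, stop_words):
    print(row)
```

Each dict could then be turned into a row of a feature matrix and passed, together with casual/non-casual labels, to whichever classifier is being trained.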

However, this approach had two significant limitations:

It required a sizeable labeled corpus to train the model.
It had poor classification accuracy in later sections of the transcript.

To overcome these limitations, we developed an unsupervised method to classify casual talk sentences.

An unsupervised approach: Some information, such as people’s names, geographical location names, and certain stop words, was removed from the sentences. We then used part-of-speech (POS) tags, such as nouns and proper nouns, and IDF values at the sentence level to classify casual talk.
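The sentence-level IDF signal can be sketched as follows. This is an assumption-laden simplification: it scores a sentence by the mean IDF of its tokens, computed across a small set of transcripts, and flags the sentence as casual talk when that mean falls below a threshold. The threshold, the function names, and the sample transcripts are hypothetical, and the POS-tag signal is left out because it requires a tagger.

```python
import math
import re


def token_idf(transcripts):
    """IDF per token, treating each transcript as one document."""
    n_docs = len(transcripts)
    doc_freq = {}
    for text in transcripts:
        for token in set(re.findall(r"[a-z']+", text.lower())):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    return {tok: math.log(n_docs / count) for tok, count in doc_freq.items()}


def is_casual(sentence, idf, threshold=0.3):
    """Flag a sentence as casual talk when its mean token IDF is low."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return True
    # Tokens unseen in the corpus are treated as maximally rare.
    max_idf = max(idf.values(), default=0.0)
    mean_idf = sum(idf.get(t, max_idf) for t in tokens) / len(tokens)
    return mean_idf < threshold


# Hypothetical transcripts that all open with the same greeting.
transcripts = [
    "hello how are you today. we should review cloud costs",
    "hello how are you. the pricing model needs work",
    "hello how are you. data governance is a priority",
]
idf = token_idf(transcripts)
print(is_casual("Hello, how are you?", idf))           # True
print(is_casual("We should review cloud costs", idf))  # False
```

Because the score does not depend on where the sentence sits in the call, a signal like this is not tied to the opening of the transcript the way sentence position is.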

We hope you liked our approach to call data analysis and our framework for removing ambiguity and casual talk from call transcripts so that meaningful analysis can be performed.

Tag: #NLPAnthology

Key Concept Extraction from NLP Anthology (Part 2)

Intelligent Audio Transcript Analytics – NLP Anthology (Part 1)

Delivering excellence, collaborating across time zones.

Take a look at our global hideouts.

Contact

India (HQ)

Atlanta, USA

London, UK

Dubai, UAE

Melbourne, Australia

Surabaya, Indonesia
