Artificial Intelligence in Real Time Communications

  • ID: 4655697
  • Report
  • August 2018
  • Region: Global
  • 147 Pages
  • Kranky Geek Research
Research Highlights AI and Machine Learning Opportunities and Challenges in Speech and Video Conferencing Technologies


  • 2Hz
  • batvoice
  • Dolby
  • Google
  • Mitel
  • SightCorp

This study examines the role of Artificial Intelligence (AI) and Machine Learning in Real Time Communications (RTC). Advances in consumer-oriented AI technologies are now finding new applications and use cases as these capabilities become democratized. The communications industry, which was once at the forefront of many of these technologies, is now presented with a plethora of new options for improving existing applications, finding new cost advantages, and redefining existing communications modalities.

This study examines the use of machine learning and AI technologies in 4 distinct domains:

  • Speech analytics - extracting transcription and paralinguistic information from speech to provide insights;
  • Voicebots - the role of conversational AI as a feature and enabler for applications such as Interactive Voice Response (IVR);
  • Computer vision (CV) - use of advanced vision processing algorithms on video calling streams;
  • RTC optimization - using machine learning to optimize end-to-end network architecture and lower-level VoIP protocols.

The report authors are active industry participants who help companies build and enhance products. They bring years of experience in technical, product management, and consulting roles, along with recent practical work on speech analytics, computer vision, voicebots, and performance algorithms. The research was compiled from more than 40 in-depth one-on-one interviews with key industry technology stakeholders, a web survey with nearly 100 distinct company respondents, and detailed reviews of leading machine learning-based products and AI in RTC solutions.

The research is designed to help product, strategy, and business development decision makers at communications service providers, technology vendors, communications-centric app providers, and enterprise information technology organizations.

Note: Product cover images may vary from those shown

1 Executive Summary

  • Use Cases
  • Market Dynamics
  • Drivers
  • Inhibitors
  • Market Landscape
  • Recommendations

2 Scope & Methodology

  • Scope
  • Real Time Communications Applications
  • Artificial Intelligence Technologies
  • Research Methodology
  • Expertise
  • Chad Hart
  • Tsahi Levent-Levi

3 Machine Learning Overview

  • Introduction
  • AI and ML
  • Learning Approaches
  • Supervised
  • Unsupervised
  • Deep Learning
  • Data Flow in Machine Learning
  • Product Aspects of Machine Learning
  • Limitations
  • AI in RTC
  • Edge versus Client
  • Voice and Video
  • What Next?

4 Speech Analytics

  • Introduction
  • Use Cases
  • Speech Analytics Technology Stack
  • Media Server
  • Speech-to-Text Engine
  • NLU Engine
  • Analytics Applications
  • Market Dynamics
  • Drivers
  • Deep Learning
  • Open Source
  • Start-Ups
  • Cloud Platforms Driving STT Commoditization
  • Inhibitors
  • Language Support
  • Legacy System Recording Quality
  • Custom Vocabularies
  • Compliance
  • Using the Data
  • Emerging Features and Trends
  • Paralinguistics
  • Speaker Separation
  • Speaker Recognition
  • Real Time
  • Deeper Analytics
  • Edge Transcription
  • Market Landscape
  • Stakeholder Groups
  • Vendor Groups
  • Selection Criteria
  • Media Server
  • Speech-to-Text (STT) Engine
  • NLU Engine
  • Analytics Applications
  • Recommendations
  • Dealing with Legacy Telephony Environments
  • Be Careful with Word Error Rates
  • SIPREC Recording in Multi-Vendor Environments
  • Speech Engine Build vs. Buy

5 Voicebots

  • Introduction
  • Use Cases
  • Inbound Interactive Voice Response (IVR)
  • Outbound IVR
  • Agent Assistant
  • User Assistant
  • Conference Call Assistant
  • Smart Conference Room Devices
  • Voicebot Technology Stack
  • RTC-Bot Gateway
  • Wake Word Detector
  • Speech-to-Text Engine (STT)
  • Bot Engine
  • Natural Language Understanding (NLU)
  • Text-to-Speech (TTS)
  • Bot Application
  • Market Dynamics
  • Drivers
  • Chatbots
  • Consumer Voice Market
  • Speech Technology Improvements
  • Call Center Economics
  • Inhibitors
  • Telephony Integration
  • Language Support
  • Voice User Interface (VUI) Expertise
  • Emerging Features and Trends
  • Knowledge Extraction
  • API Simplification
  • Graphical Development
  • Speaker Recognition
  • Market Landscape
  • Major Vendor Groups
  • Implementation Approaches and Implications
  • Selection Criteria
  • RTC-Bot Gateway
  • Wake Word Detector
  • Speech-to-Text
  • Text-to-Speech
  • Bot Engine
  • Recommendations
  • Cloud Implementations for Agility
  • Replace Outdated IVR Technology
  • Many VoIP Devices Could be Smart Devices
  • Owning the Voicebot Technology Stack

6 Computer Vision

  • Introduction
  • Computer Vision Technology Stack
  • Image Data
  • From Image to Video
  • Train
  • Optimize
  • Inference
  • Cloud Inference
  • Edge Inference
  • Use Cases
  • Silly Hats
  • Image Enhancement
  • Improving Colors
  • Replacing the Background
  • Improving a Participant’s Looks
  • Face Recognition
  • Face Tracking
  • Automatic Zoom
  • Head Counting
  • Assist with Speaker Diarization
  • Emotion Detection
  • Body and Gesture Tracking
  • Not Safe for Work (NSFW)
  • Image Classification and Object Detection
  • Whiteboard Detection
  • AR, IoT, and Healthcare
  • Optical Character Recognition (OCR)
  • Market Dynamics
  • Drivers
  • Deep Learning
  • Open Source Projects
  • Cloud APIs
  • iOS and Android Edge Inference
  • Inhibitors
  • Video Algorithms
  • Real Time Processing
  • Cloud Cost
  • Inference on Edge Devices
  • Big Brother Concerns
  • Market Landscape
  • Selection Criteria
  • Recommendations
  • Start from the Business Value
  • Focus on Quick Wins
  • Decide on Cloud vs. Edge Inference
  • Owning the Data

7 RTC Quality and Cost Optimization

  • Introduction
  • Technology Overview
  • Network Level Optimization
  • The Media Processing Pipeline
  • Capture
  • Encode
  • Send
  • Receive
  • Decode
  • Play
  • Market Dynamics
  • Drivers
  • AI Adoption
  • Media Quality
  • Inhibitors
  • Existing Heuristic Algorithms
  • ROI Calculation
  • Edge Inference
  • Market Trends
  • Edge Inference
  • In-House Implementations
  • Selection Criteria
  • Recommendations
  • Differentiate on Quality
  • RTC Quality Build vs. Buy
  • Pick Algorithms to Focus On
  • Noise Suppression
  • Bandwidth Estimation
  • Packet Loss Concealment

8 RTC Survey Results

  • Demographics
  • AI Adoption Challenges
  • AI Adoption Drivers
  • AI Initiatives
  • AI Technology Use
  • Inference Locations
  • Open Source vs. Commercial
  • Top Vendors
  • Machine Learning Tools and Frameworks
  • Speech Analytics
  • Voicebots
  • Computer Vision

List of Figures
Figure 1: Top AI in RTC drivers and inhibitors web survey results
Figure 2: Example of hotdog not hotdog app made by HBO's Silicon Valley
Figure 3: Twitter trending topics for San Francisco region on May 26, 2018
Figure 4: Neural network architecture in deep learning
Figure 5: Communications versus machine learning architectures
Figure 6: Typical layers of a speech analytics application
Figure 7: Speech analytics dashboard application example
Figure 8: Google data showing transcription accuracy improvement
Figure 9: Speech analytics stakeholder groups and vendor types
Figure 10: Possible SIPREC configuration
Figure 11: Voicebot technology stack for a telephony system
Figure 12: Visualization of a speech waveform
Figure 13: Consumer voicebot market; notable milestones
Figure 14: Results from a January 2018 consumer survey of smart speaker users
Figure 15: Marks & Spencer used voicebot technology to replace DTMF IVRs
Figure 16: Cloud API interaction options
Figure 17: Voicebot implementation approaches
Figure 18: Cisco's Spark Assistant is a voicebot for its video conferencing hardware
Figure 19: Computer vision model development and deployment process
Figure 20: Computer vision cloud APIs receiving and decoding the media stream in parallel to the actual service
Figure 21: Computer vision making use of the edge device encoder to reduce processing requirements
Figure 22: Cloud inference in computer vision
Figure 23: Edge inference in computer vision
Figure 24: Image enhancements using Facebook AR Studio
Figure 25: Improving a participant’s look
Figure 26: The media processing pipeline
Figure 27: Survey respondents segmented by company type
Figure 28: AI adoption challenges
Figure 29: Primary drivers of AI technology adoption for communications applications
Figure 30: AI strategy within company
Figure 31: AI technology used or provided by respondents
Figure 32: Locations where ML inference is run
Figure 33: Preference to leverage open source vs. commercial preferences
Figure 34: Top named machine learning tools and frameworks
Figure 35: Top named speech analytics solutions
Figure 36: Top named voicebot platforms
Figure 37: Top named computer vision tools and platforms

List of Tables
Table 1: Summary of AI in RTC use cases by domain
Table 2: Summary of AI in RTC drivers
Table 3: Summary of AI in RTC inhibitors
Table 4: Supervised vs. unsupervised learning
Table 5: Speech analytics use case examples
Table 6: Speech analytics drivers and inhibitors summary
Table 7: Examples of speech analytics start-ups (<3 years old)
Table 8: Major cloud provider transcription and analytics offers
Table 9: List of regulations where compliance may be required and where recording and speech analytics can help
Table 10: Speech analytics enabling technology and application vendor categories
Table 11: Telephony media server selection criteria
Table 12: STT engine selection criteria
Table 13: NLU engine selection criteria
Table 14: Analytics application selection criteria
Table 15: TTS and NLU technology build vs. buy considerations
Table 16: Voicebot in RTC market drivers and inhibitors
Table 17: Comparison of major cloud platform TTS vendors
Table 18: Sample of LinkedIn job posting titles that include VUI requirements
Table 19: Voicebot stakeholder groups and trends
Table 20: RTC-Bot gateway options for major cloud vendor bot systems
Table 21: Wake Word Detector selection criteria
Table 22: Speech-to-text for voicebots selection criteria
Table 23: Text-to-speech selection criteria
Table 24: Bot engine selection criterion
Table 25: Computer vision use cases in real time communications
Table 26: Computer vision drivers and inhibitors summary
Table 27: Computer vision services by cloud vendors
Table 28: Computer vision related services in mobile operating systems
Table 29: Major cloud provider video analytics features
Table 30: Specialized computer vision vendors
Table 31: Selection criteria in computer vision
Table 32: RTC quality and cost optimization drivers and inhibitors summary
Table 33: Vendors and optimization use cases across the media processing stack


The report indicates that most AI efforts in communications companies are focused on speech analytics. Established vendors and a growing number of startups are looking to AI technologies to improve their product offerings, create better experiences for their customers, and increase their competitiveness. With AI skills in short supply, it is crucial for companies to start early on their journey towards AI support. Most communications vendors are only starting this journey and will require major effort to catch up with current technology leaders.

"AI and ML will transform every industry. When computers were introduced they were like bicycles for the mind. The introduction of AI is like the space shuttle," said Omar Javaid, Chief Product Officer at Vonage. "This is an excellent and thorough report. It is written by people that actually understand technology and how products are made."

"There is a real demand for AI in communication products. Our partners are expecting us to have a plan for AI and the timing of this research is spot on," said John Logsdon, CEO and Founder of This is Drum Technologies Ltd. "This report contains a wide and detailed view of the industry, assisting us in honing our AI roadmap for our web collaboration product."

Communications developers are interested in leveraging machine learning, but lack qualified staff to drive these efforts. Major AI cloud platform providers like Amazon, Google, and Microsoft present both promise and peril for communications app makers. They provide advanced AI APIs that can be used in RTC applications, but at the same time they often use better versions of these same APIs in their own competing communications services. The report authors advise, "The RTC industry needs to take a more proactive role in training its existing employees around ML methods and attracting ML graduates or it will continue to lose ground to other industries and outside players who will eventually come back to take RTC customers."

  • 2Hz
  • Affectiva
  • Agora.io
  • Amazon
  • Apple
  • Aspect
  • AT&T
  • Avaya
  • aisense
  • batvoice
  • Blippar
  • CallMiner
  • Callstats.io
  • Chorus.ai
  • Cisco
  • Crowd Emotion
  • Deepgram
  • Dialpad
  • Dolby
  • Etherlabs
  • ExecVision
  • Eyeris
  • Face++
  • Facebook
  • Five9
  • Genesys
  • Google
  • Gong.io
  • i2x
  • IBM
  • iMotions
  • Impelo
  • Kairos
  • Lifesize
  • Logitech
  • Microsoft
  • Mitel
  • Mozilla
  • Mycroft
  • NewVoiceMedia
  • NICE inContact
  • Nuance
  • nViso
  • Plivo
  • Polycom
  • SightCorp
  • SignalWire
  • SkyBiometry
  • Twilio
  • Verint
  • Vidyo
  • Voca
  • Voicebase
  • Voicera
  • Voxbone
  • Voximplant
  • Vonage