Speech Recognition for Sales - Why Owning the Tech Stack Matters

August 4, 2017 Micha Breakstone

Over the past two years, Chorus.ai has invested heavily in perfecting our proprietary Speech Recognition technology to achieve unparalleled accuracy. Our platform records and transcribes every sales call in real time using our state-of-the-art engines. Accurately transforming conversations into text serves a threefold goal:

  1. It allows reps and managers to easily search for key moments in the conversation and share them
  2. It allows Chorus.ai to measure call performance and identify what the top reps are doing so that managers can replicate behaviors across the sales team
  3. It enables us to guide reps in real time and to create alerts when deals are at risk

None of this would be possible without extremely accurate Speech Recognition technology that can pick up every single word, including important keywords and phrases such as competitor names, product names, acronyms, or vernacular that is industry- or company-specific.

It’s a Game of Accuracy

To achieve exceptional accuracy, we had to build the technology ourselves, training our linguistic and acoustic models on a specialized dataset of 500,000+ sales and customer success conversations, and using state-of-the-art algorithms including Recurrent Neural Networks, cross-entropy minimization, and Deep Learning for automatic phoneticization.


Our engines use technology very similar to that employed by Google, IBM Watson, Microsoft Bing and VoiceBase, the main difference being that our engines self-learn and optimize themselves using 500,000+ of our customers’ sales calls.

The leading Speech Recognition engines (Google Voice, IBM Watson, Bing Speech API, VoiceBase, etc.) are all great for general-purpose transcription, but when they are thrown into the wild on noisy calls full of non-standard phrases (e.g. competitor names, product features), their performance quickly deteriorates (see chart above).
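
This post does not say how accuracy was measured. A common way to compare transcription engines is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn an engine's transcript into a human reference, divided by the number of reference words. The sketch below is a generic illustration of that metric, not Chorus.ai's evaluation code, and the function name and example sentences are made up for the purpose.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Generic WER sketch (an assumption, not the metric behind the chart).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a general-purpose engine mishears a competitor name.
reference = "we compete with acme on pricing"
hypothesis = "we compete with acne on pricing"
wer = word_error_rate(reference, hypothesis)
```

A single misheard word in a six-word sentence looks small as a WER, but if that word is the competitor's name, a keyword search for it comes back empty, which is why domain-specific vocabulary matters so much here.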

Other Advantages of Owning the Stack: Security, Confidence, and Real Time

Security & Real-time - Since our tech is entirely proprietary, our customers’ data is never sent to an external service (Google or otherwise), and the conversations remain entirely private and secure. Also, no external provider means no added processing delay, so all the insights and guidance can be delivered to reps and managers in 3 seconds or less.

Confidence - Every Speech engine is sensitive to low-quality audio and speaker clarity. Chorus.ai has turned this potential weakness into a strength, developing a unique “confidence” metric, so that when we detect low confidence we can notify reps that there's a good chance the prospect will have difficulty understanding them as well.
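
How the confidence metric is actually computed is not described in this post. As a rough illustration only, many speech engines emit a per-word confidence score between 0 and 1, and one simple approach is to average those scores per utterance and flag the utterances that fall below a threshold. Everything in the sketch below (the threshold value, the data layout, the function name) is a hypothetical assumption.

```python
from statistics import mean

# Hypothetical cutoff; a real system would tune this on labeled calls.
LOW_CONFIDENCE_THRESHOLD = 0.6


def low_confidence_utterances(utterances, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Return (index, speaker, average confidence) for unclear utterances.

    `utterances` is a list of (speaker, word_confidences) pairs, where
    word_confidences is a list of floats in [0, 1]. This layout is an
    assumption made for the example.
    """
    flagged = []
    for index, (speaker, word_confidences) in enumerate(utterances):
        if not word_confidences:
            continue  # nothing was recognized, so there is nothing to score
        score = mean(word_confidences)
        if score < threshold:
            flagged.append((index, speaker, round(score, 2)))
    return flagged


# Made-up call data: the rep's second utterance was hard to recognize.
call = [
    ("rep", [0.90, 0.95]),
    ("rep", [0.40, 0.50, 0.60]),
    ("prospect", []),
]
flagged = low_confidence_utterances(call)
```

The idea mirrors the point above: audio that the engine struggles to recognize is likely audio the prospect struggles to follow, so a flagged utterance can double as a prompt to the rep.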

Explaining exactly how we automatically train our engines to optimize for each customer is beyond the scope of this post (if you want to read more about our tech, here’s a more technical explanation). Most importantly, more accurate Speech Recognition allows Chorus.ai to produce the most effective, precise, and insightful analysis of calls, holding true to our goal of delivering true Conversation Intelligence in real time.
