About the technology

Under The Hood: Speech Synthesis

Submitted by Rasmus Dall on Wed, 06/15/2022 - 16:21

We’ve previously written about one of our core technologies at Cobalt Speech & Language - automatic speech recognition (ASR). When you speak, the ASR system converts your spoken words into text. Another core technology at Cobalt is text-to-speech (TTS), or speech synthesis, which converts written words into spoken audio.

One place where TTS voices currently reach large audiences is in virtual assistants. These assistants can perform a multitude of tasks, and it would be impossible to pre-record all of their answers. Hence, virtual assistants use TTS technology to generate responses automatically, allowing them to talk to you about a wide range of topics.

Bringing Voice Technology to your Organisation

Submitted by Catherine Breslin on Wed, 12/18/2019 - 09:46

Talking is a natural way for people to interact with each other. Small children can speak long before they can read, write, or type. People can carry on a conversation even when they’re busy doing other tasks. The introduction of virtual assistants like Amazon’s Alexa, Apple’s Siri and Google’s assistant has brought voice interfaces to the mainstream as a way for people to communicate with devices. These are popular with customers who love the ease with which they can be used. Voice holds the promise of providing a frictionless way to interact with computers, and organisations are looking for ways to take advantage.

Adapting Speech Recognition to Your Domain

Submitted by Ryan Lish on Wed, 11/20/2019 - 10:08

Automatic speech recognition (ASR) and other natural speech and language processing techniques have become ubiquitous in the technologies that surround us in today’s world. From my cell phone to my dashcam to my nightstand, I always have some form of digital assistant nearby, which I can summon with the sound of my voice.

Introducing Telefol: Cobalt’s Phonetic Indexing Engine

Submitted by Julie Sheffield on Wed, 10/30/2019 - 10:26

Imagine you work for a late-night comedy show and want to put together a montage of news anchors saying the word "covfefe". You could employ an army of interns to listen to hundreds of hours of recorded broadcasts, or you could use Cobalt's Telefol engine to search. Technology to the rescue!

Understanding Keyword Spotting

Many companies and organizations have access to large volumes of recorded speech, but it can be challenging to leverage the full value of that wealth of information because it is costly and time-consuming to search through audio. One strategy is to use automatic speech recognition to transcribe the audio, then search the transcript. The "covfefe" example illustrates one limitation of that approach: because it's not a word in the English lexicon, the transcript would not include it.
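
To make that limitation concrete, here is a minimal Python sketch that contrasts searching a word transcript with searching a phoneme sequence. It is an illustration only and not Telefol's actual algorithm: the transcript, the phoneme stream, the pronunciation assumed for "covfefe", and the find_phonemes helper are all invented for this example.

```python
def find_phonemes(stream, query):
    """Return every start position where the query phonemes occur in the stream."""
    n = len(query)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == query]


# Hypothetical ASR output: the recognizer can only emit words it knows,
# so the unknown word comes out as a similar-sounding in-vocabulary word.
transcript = "press coverage"

# Hypothetical phoneme-level index of the same audio ("press covfefe").
phoneme_stream = ["p", "r", "eh", "s", "k", "ah", "v", "f", "eh", "f", "eh"]

# Assumed pronunciation of the search term (an assumption, not a standard).
query_phonemes = ["k", "ah", "v", "f", "eh", "f", "eh"]

# Searching the transcript text misses the word entirely.
found_in_transcript = "covfefe" in transcript.split()  # False

# Searching the phoneme stream finds it at position 4.
hits = find_phonemes(phoneme_stream, query_phonemes)  # [4]
```

A real system would need to tolerate recognition errors and pronunciation variants rather than rely on exact matching, which this sketch deliberately leaves out.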

Under the Hood: Automatic Speech Recognition

Submitted by Catherine Breslin on Wed, 10/23/2019 - 10:25

Automatic Speech Recognition (ASR) is a key component of a virtual assistant - it converts audio into text. As well as being crucial for conversational AI, ASR has applications as a standalone technology in places like automated subtitling, call centre transcription and analytics, meeting transcription, and more. This post takes a deeper look at what’s under-the-hood of Cobalt’s Cubic speech recognition technology.


An automatic speech recognition system has three models: the acoustic model, the language model and the lexicon. They’re used together in an engine that ‘decodes’ the audio signal into a best-guess transcription of the words that were spoken.
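
As a rough illustration of how the three models might combine, here is a toy Python sketch. It is not how Cubic works internally: the lexicon entries, the probabilities, and the acoustic_logprob and decode functions are all assumptions chosen to keep the example small. The lexicon expands words into phonemes, the acoustic model scores those phonemes against what was heard, the language model scores the word sequence, and the decoder picks the candidate with the best combined score.

```python
import math

# Lexicon (toy): maps each word to its pronunciation as phonemes.
LEXICON = {
    "two": ["t", "uw"],
    "too": ["t", "uw"],
    "cats": ["k", "ae", "t", "s"],
}

# Language model (toy): log-probability of each candidate word sequence.
LM_LOGPROB = {
    ("two", "cats"): math.log(0.6),
    ("too", "cats"): math.log(0.1),
}


def acoustic_logprob(observed, expected):
    """Toy acoustic model: reward phonemes that match what was heard."""
    if len(observed) != len(expected):
        return float("-inf")
    return sum(
        math.log(0.9) if o == e else math.log(0.1)
        for o, e in zip(observed, expected)
    )


def decode(observed, candidates):
    """Return the candidate with the highest acoustic + language model score."""
    def score(words):
        phonemes = [p for w in words for p in LEXICON[w]]
        return acoustic_logprob(observed, phonemes) + LM_LOGPROB[words]
    return max(candidates, key=score)


heard = ["t", "uw", "k", "ae", "t", "s"]
best = decode(heard, list(LM_LOGPROB))  # ("two", "cats")
```

"two" and "too" sound identical, so the acoustic score cannot separate them; the language model tips the decision towards "two cats". Production decoders search a far larger space of hypotheses than this exhaustive comparison of two candidates.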

Scaling Virtual Assistants

Submitted by Catherine Breslin on Wed, 09/18/2019 - 09:45

Virtual assistants allow us to interact with technology by voice. They are built on a complex pipeline of AI technology that understands the breadth and complexity of spoken language. This pipeline includes automatic speech recognition, natural language understanding, dialogue management and text-to-speech components. The technology in the pipeline is based on machine learning - a subset of AI algorithms that learn their behaviour from data instead of being explicitly programmed.

What are Virtual Assistants?

Submitted by Catherine Breslin on Wed, 09/11/2019 - 09:29

“Hey Computer, tell me the latest”

With the rise of virtual assistants like Amazon’s Alexa, Apple’s Siri and Google’s assistant, we’re all beginning to get used to talking to our devices. In contrast to computers that have a keyboard and mouse, or tablets and phones with a touchscreen, virtual assistants let us interact using natural spoken language. Voice interfaces drastically simplify our interaction with technology. 

To fulfill requests, virtual assistants are built on a complex pipeline of AI technology:

  • A Wakeword (WW) detector runs on the device, listening for the user to say a particular word or phrase to activate the assistant. It’s also possible to activate the assistant in other ways, like a push-to-talk button.

  • Automatic Speech Recognition (ASR) converts spoken audio from the user into a text transcription.


Our Latest Posts

Jun 3, 2020
By Arif Haque

Improving Speaker Diarization

Many people have used automatic speech recognition systems to transcribe audio to text, but there is a host of other information that is useful to extract from a stream of audio. One such task is diarization: working out who spoke when. Knowing this can help with a range of downstream applications. For example, in meeting summarization, knowing who said what means you can take accurate notes and allocate action items.