We’ve previously written about one of our core technologies at Cobalt Speech & Language: automatic speech recognition (ASR). When you speak, an ASR system converts your spoken words into text. Another core technology at Cobalt is text-to-speech (TTS), or speech synthesis, which does the reverse and converts written words into spoken audio.
Many people have used automatic speech recognition systems to transcribe audio to text, but there is plenty of other information worth extracting from a stream of audio. One task in particular is called diarization, which answers the question “who spoke when?” Knowing this can help with a range of downstream applications. For example, in meeting summarization, knowing who said what means you can attribute notes accurately and assign action items to the right person.
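
To make “who spoke when” concrete, here is a minimal Python sketch of how diarization output might be combined with an ASR transcript. It assumes the diarizer returns speaker turns with start and end times in seconds and that the recognizer returns word-level timestamps. The `SpeakerTurn` and `Word` types, the function names, and the sample data are all hypothetical and chosen for illustration; they are not Cobalt's API.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTurn:
    """Hypothetical diarization result: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Word:
    """Hypothetical ASR result: one recognized word with its timestamps (seconds)."""
    text: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words, turns):
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best = max(
            turns,
            key=lambda t: overlap(w.start, w.end, t.start, t.end),
            default=None,
        )
        if best is None or overlap(w.start, w.end, best.start, best.end) == 0.0:
            labeled.append(("unknown", w.text))  # no turn covers this word
        else:
            labeled.append((best.speaker, w.text))
    return labeled


def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into one transcript line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in lines]


# Made-up sample data for illustration only.
turns = [
    SpeakerTurn("speaker_1", 0.0, 2.0),
    SpeakerTurn("speaker_2", 2.0, 4.5),
]
words = [
    Word("Please", 0.2, 0.6),
    Word("send", 0.6, 0.9),
    Word("the", 0.9, 1.0),
    Word("report", 1.0, 1.6),
    Word("Sure", 2.3, 2.7),
    Word("today", 2.8, 3.4),
]

transcript = group_by_speaker(attribute_words(words, turns))
for speaker, line in transcript:
    print(f"{speaker}: {line}")
```

With a speaker-attributed transcript like this, a meeting summarizer can tie each request or commitment to the person who said it. A real system would also need to handle overlapping speech and map anonymous labels such as `speaker_1` to actual names, which this sketch leaves out.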