Speech Recognition & Transcription

Transcribe™

Cobalt’s Latest End-to-End Speech Recognition Engines

Cobalt Transcribe is a state-of-the-art speech recognition system powered by Deep Neural Networks (DNNs). It delivers fast, accurate transcription wherever you need it.

We offer two exciting DNN architectures to cater to your unique needs:

Hybrid models: Enjoy the flexibility of separately tunable Acoustic Models, Lexicons, and Language Models, allowing for high customization to suit specific use cases. What’s more, these models support incredibly low-latency partial results, ensuring swift responses.

End-to-end models: These models convert sounds directly to words within a single DNN. Ideal for general use cases, they deliver accurate results, especially when instant responses aren’t critical.

Cobalt Transcribe is designed to fit your preferences, running on-premises, in your private cloud, or fully embedded on your device. And the best part? You have complete control over your data, so both audio and transcripts stay securely in your hands.

Experience the future of speech recognition with Cobalt Transcribe. Join us and embrace a world of effortless communication today!

Cobalt Transcribe and Cobalt ASR Speech Recognition engines

  • Both end-to-end and hybrid deep learning models
  • Compatible with pre-built Kaldi models
  • Can be run on Intel™, AMD™, and ARM-64 processors

Cobalt ASR has been used in a variety of applications, including

  • Contact center call transcription
  • Transcription of courtroom proceedings & legal depositions
  • Conversational interfaces for controlling apps and appliances
  • Smart home applications
  • Warehouse & inventory control
  • Inspection & reporting (e.g. in agriculture, hospitality, machinery)


A large call center analytics company could not transcribe their clients’ audio using their usual solution because regulatory and security requirements disallowed sharing data via a third party cloud vendor in a multi-tenant environment.


The company licensed Cobalt’s best-in-breed technology and deployed it in its clients’ data centers, where the audio is processed securely.


The OEM partner has significantly grown market share in regulated verticals such as financial services, healthcare and insurance.


Speech Recognition

Regulated Industries for PHI & PCI information

Transcribe Tuner™

Use your own data to improve your speech recognition models.

Supply audio files with human-corrected transcripts to make the model more robust to your unique acoustic environment (noisy background, specific accents, etc.) or business-specific needs.

Add in-domain text documents such as user manuals, customer service scripts, or other written work similar to the kind of dialogue you expect to transcribe. This biases your transcription model toward the reading of ambiguous phrases that best fits your usage. For example, “low number” and “loan number” sound nearly identical; a bank could customize the model to favor “loan number”.
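As a rough illustration of the idea (not Cobalt's implementation), the toy Python sketch below counts word pairs in a few made-up bank sentences and uses those counts to choose between two similar-sounding phrases. The sample text and the function names are assumptions invented for this example.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count adjacent word pairs across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def prefer(candidates, counts):
    """Pick the candidate phrase whose word pairs are most frequent in the text."""
    def score(phrase):
        words = phrase.lower().split()
        return sum(counts[pair] for pair in zip(words, words[1:]))
    return max(candidates, key=score)

# Hypothetical in-domain text, e.g. lines from a bank's customer service scripts.
bank_text = [
    "please confirm your loan number",
    "the loan number is printed on your statement",
    "enter a low number to start",
]

counts = bigram_counts(bank_text)
best = prefer(["low number", "loan number"], counts)
print(best)
```

Because “loan number” appears more often than “low number” in the sample text, it scores higher. A production language model is far more sophisticated, but the biasing principle is similar.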

Add lists of custom vocabulary words, with optional pronunciations and comparable common words. For example, if your new word is “Hooli”, you can provide “Google”, “Amazon”, or “Facebook” as comparables and the Tuner will generate sentences for the new word similar to contexts in which those commonly used company names appear.
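One simple way to picture the comparable-word idea is substitution: take sentences that mention a comparable word and swap in the new word. The Python sketch below does only that, on a made-up corpus. It is an assumption about the general technique, not a description of how the Tuner works internally, and the function name is hypothetical.

```python
import re

def generate_sentences(new_word, comparables, corpus):
    """Copy each corpus sentence that mentions a comparable word, swapping in the new word."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, comparables)) + r")\b")
    return [pattern.sub(new_word, s) for s in corpus if pattern.search(s)]

# Hypothetical sample sentences; a real corpus would be much larger.
corpus = [
    "I applied for a job at Google",
    "She ordered it from Amazon yesterday",
    "The weather is nice today",
]

generated = generate_sentences("Hooli", ["Google", "Amazon", "Facebook"], corpus)
for sentence in generated:
    print(sentence)
```

Sentences without a comparable word are skipped, so the new word only appears in contexts where a similar company name was already used.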

Voice Channel™

Multi-speaker separator

Cobalt’s Voice Channel engine differentiates between speakers in a conversation based on distinct characteristics of their voices. Our engine greatly improves the utility of automatic speech recognition when multiple speakers are recorded on a single channel.

By incorporating Neural Networks, Cobalt’s Voice Channel diarization system can detect each speaker and segment their speech into a separate channel.

With the growing volume of multi-speaker audio from broadcasts, Google Meet™, Zoom™, and Webex™ meetings, and many other sources, the need to separate speakers for analysis or editing has grown.

Cobalt Detect™

Word and Phrase Detection

With Cobalt Detect, you can spot words or phrases on the fly, whether it’s during live interactions or from a collection of recorded audio.

The best part? Cobalt Detect works phonetically, so it’s not limited to dictionary words. It effortlessly recognizes search terms that might not be found in ordinary conversation.

Imagine the incredible Customer Experience when you can pick up on specific words and phrases being spoken during a conversation.

And guess what? Cobalt Detect doesn’t stop there! It’s a superhero when it comes to contact center escalations and risk management.

With automated detection of words, phrases, combinations, and gaps, understanding your conversations becomes a breeze. Trust Cobalt Detect to elevate your interactions to new heights!

Need Something Custom?

Cobalt’s products are very flexible and can be adapted to many scenarios.

For example, we can train models to handle:

  • domain-specific language (e.g. medical, legal, agricultural, industrial, specific applications)
  • noisy or unusual acoustic conditions
  • alternate microphone configurations
  • accented speech (e.g. non-native speakers)
  • children’s speech
  • dysarthric speech