What are Virtual Assistants?


“Hey Computer, tell me the latest”

With the rise of virtual assistants like Amazon’s Alexa, Apple’s Siri and Google Assistant, we’re all beginning to get used to talking to our devices. In contrast to computers with a keyboard and mouse, or tablets and phones with a touchscreen, virtual assistants let us interact using natural spoken language. Voice interfaces drastically simplify our interaction with technology.

To fulfil requests, virtual assistants are built on a complex pipeline of AI technology:

  • A Wakeword (WW) detector runs on the device, listening for the user to say a particular word or phrase to activate the assistant. It’s also possible to activate the assistant in other ways, like a push-to-talk button.

  • Automatic Speech Recognition (ASR) converts spoken audio from the user into a text transcription.

  • Natural Language Understanding (NLU) takes the transcription of what the user said and predicts their intention in a way that’s actionable. This component understands that users can make the same request in a multitude of different ways that should all have the same outcome.

  • The Dialogue Manager (DM) decides what to say back to the user, whether to take any action, and manages the back-and-forth of any conversation.

  • Text to Speech (TTS) synthesises the response as audio - it’s the output voice of the assistant.

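To make the pipeline concrete, here is a toy end-to-end sketch in Python. Every stage is a deliberately simple stand-in for what would, in a real assistant, be a trained machine-learning model; all the function names and intents below are made up for illustration.

```python
# Toy stand-ins for the pipeline stages. In a real assistant, each
# of these would be a trained machine-learning model, not a rule.

def recognise_speech(audio: str) -> str:
    # ASR: we use text in place of audio to keep the sketch self-contained.
    return audio

def understand(transcription: str) -> dict:
    # NLU: map many phrasings of a request onto one actionable intent.
    text = transcription.lower()
    if "weather" in text:
        return {"intent": "GetWeather"}
    if "music" in text:
        return {"intent": "PlayMusic"}
    return {"intent": "Unknown"}

def decide_response(intent: dict) -> str:
    # DM: decide what to say back (and, in a real system, what action to take).
    return {
        "GetWeather": "It's sunny today.",
        "PlayMusic": "Playing your playlist.",
        "Unknown": "Sorry, I didn't catch that.",
    }[intent["intent"]]

def synthesise(text: str) -> str:
    # TTS: a real system renders audio; here we simply return the text.
    return text

def handle_utterance(audio: str) -> str:
    # The wakeword detector would gate this whole function on-device.
    return synthesise(decide_response(understand(recognise_speech(audio))))

print(handle_utterance("What's the weather like?"))  # It's sunny today.
```

Note how the NLU stub maps different phrasings ("What's the weather like?", "tell me the weather") to the same intent - in a deployed system, that generalisation comes from a model trained on data rather than keyword rules.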


The technology in this pipeline needs to cope with the breadth and ambiguity of natural language. Hence, alongside manually defined rules, it’s based on machine learning - a group of AI algorithms that learn their behaviour from data instead of being explicitly programmed. This allows assistants to learn how people speak and be able to generalise to new speakers or requests. 

Types of virtual assistant

AI assistants can be deployed in many ways - e.g. on a smartphone app, over a phone call, or on a dedicated device like a smart speaker. There are many places where virtual assistants are proving useful, and new applications are continually being built. The simplest setup is a command and control system. Here the user has just a few commands available to speak to control a device, with not much in the way of dialogue. Simple assistants are often used in environments where hands-free control improves efficiency, for example giving machine operators additional voice control on the factory floor. 

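As a sketch of just how simple such a system can be, consider a fixed table mapping a handful of recognised phrases to machine actions. The commands and action names here are invented for illustration:

```python
# A minimal command-and-control vocabulary: a few fixed spoken
# commands, each mapped directly to a machine action. There is no
# dialogue - out-of-vocabulary speech is simply ignored.
COMMANDS = {
    "start conveyor": "CONVEYOR_ON",
    "stop conveyor": "CONVEYOR_OFF",
    "raise platform": "PLATFORM_UP",
    "lower platform": "PLATFORM_DOWN",
}

def command_for(transcription: str):
    return COMMANDS.get(transcription.strip().lower())

assert command_for("Stop conveyor") == "CONVEYOR_OFF"
assert command_for("tell me a joke") is None
```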

In a step up from command and control systems, many of today’s assistants are task-oriented. The user and computer work together to achieve well-defined tasks like making a bank transfer or finding a mortgage recommendation. These assistants typically work in narrow domains like finance or customer service and require some dialogue back and forth with the user to complete the task. 
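One common way to implement this is slot filling: the dialogue manager keeps asking until it has every piece of information the task requires, then confirms before acting. A minimal sketch, assuming a bank transfer that needs a recipient and an amount (the slots and prompts are illustrative):

```python
# Slot filling for a task-oriented dialogue: ask for the first
# missing slot, and confirm once every required slot is filled.
REQUIRED_SLOTS = ["recipient", "amount"]
PROMPTS = {
    "recipient": "Who would you like to pay?",
    "amount": "How much would you like to send?",
}

def next_prompt(filled: dict) -> str:
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot]
    return (f"Transfer {filled['amount']} to {filled['recipient']}. "
            "Shall I go ahead?")

print(next_prompt({}))                                       # asks for recipient
print(next_prompt({"recipient": "Alice"}))                   # asks for amount
print(next_prompt({"recipient": "Alice", "amount": "£50"}))  # confirms
```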


More general virtual personal assistants like Amazon’s Alexa or Apple’s Siri handle many different enquiries across a number of domains. They allow you to play music, ask for the weather, control your smart home devices, ask for jokes and much more. Their interactions remain task-oriented, though they typically have some chatty responses to general enquiries. 


Academic research is moving beyond task-oriented dialogue towards new forms of conversational interaction. Fully conversational agents are some way from being built and deployed at scale, but current research is looking towards social forms of human-computer interaction. Competitions like the Alexa Prize - a university competition to build assistants that converse coherently and engagingly with humans - are showcasing some of these results.


Looking to the future

Despite their widespread adoption, AI assistants at scale are still in their infancy. Apple’s Siri launched on the iPhone as recently as 2011, and Amazon’s Alexa followed in 2014. The underlying technology is continually improving. In the next few years, we expect to see AI assistants become:

  • Customisable - organisations will more easily be able to build custom interactions. We are already starting to see the first tools to allow easy customisation of voice assistants.

  • Contextual - assistants will incorporate context from different sources. Relevant context can come from real-world knowledge, from personalised information about the user, or from the history of the current conversation.

  • Conversational - while human levels of conversation are still a long way off, AI assistants will soon incorporate more rudimentary conversational capability.


At Cobalt Speech, we specialise in building custom voice and language technology for our clients, whether one part of the pipeline or a complete virtual assistant. Are you interested in building a secure and private virtual assistant for your enterprise? Get in touch to see how we can help you.


About the Author


Catherine Breslin is a machine learning scientist with experience in a wide range of voice and language technology. Her work spans from academic research improving speech recognition algorithms through to scaling Amazon’s Alexa. She’s based in Cambridge, UK, and coordinates Cobalt’s UK efforts.

Introducing the CoBlog: Cobalt Blog

On any given day, my colleagues and I at Cobalt Speech and Language are tackling a diverse range of problems in speech processing, machine learning, and natural language processing. To invite others to have a look into our virtual workshop, we’re starting this blog. About once a week, you’ll find a short feature here that will help acquaint you with the people and projects that put Cobalt Speech at the forefront of progress in the world of speech technology.

To kick off this “CoBlog”, I thought it appropriate to share a little bit about why we started Cobalt in the first place.

Five years ago, I sat in my office at Amazon, getting ready for the upcoming Echo & Alexa launch. I thought back on the incredible three-year adventure that had prepared us for that moment. It had been an amazing experience growing the speech & NLU team at Amazon, defining and discovering what Alexa could do, and working with the hardware, applications, data, and other teams at Amazon. One thing that stood out to me was how much effort had gone into developing this novel product. I guessed that Amazon had invested hundreds of millions of dollars into preparing for a successful launch of Alexa and the Echo products.

The idea for Alexa was simple – a voice-based home assistant that could do everyday things like play music and tell you the time or weather – but only a giant like Amazon could have pulled it off. I wondered how many other entrepreneurs had similarly innovative ideas, but lacked the technical resources to execute their vision. In particular, having built speech & language research teams at Nuance, Yap, and now Amazon, I knew how hard it was to recruit top speech & language scientists and engineers and convince them to join a new company.

That’s when the idea for Cobalt came to me.  Amazon, Microsoft, Google, and a few others have their own world-class speech & language teams.  Cobalt would be the speech team for everyone else. I would hire world-class speech & language scientists and engineers, we would develop our own, independent core software and tools, and we would use those tools to help build the dreams of everyone who needed that technology.  We would be the speech & language development partner for everyone who needed one.

Over the past 5 years, we have worked with about 100 different companies, developing all manner of applications with our technology, and it has been an incredible adventure. One of the interesting aspects is that while we have focused on speech & language technology, our customers focus on their respective specialties. Our partnerships have taught us things about those diverse domains, and we have enjoyed learning about speech applications that were new and different. We have learned about language pedagogy as we have developed tools to help language learners improve their pronunciation. We have learned about agriculture as we developed tools to help farmers and agronomists. We have learned about the worlds of finance, education, entertainment, call centers, assistive technologies, child development, government, and so many other areas. It has been very rewarding.

Over the coming months and years, we will share more in this blog about some of the projects we have worked on, some details about Cobalt’s technological suite, and introduce you to our team and how we work together. 

We hope you’ll enjoy getting to know us.


Jeff Adams