By Sivani Voruganti, MEng ’23 (EECS)
This Life in Tech interview is part of a series from E295: Communications for Engineering Leaders. In this course, Master of Engineering students were tasked with conducting an informational interview to learn more about working in tech. They then submitted a written account of the interview, edited and organized to create a clear, compelling narrative.

Ever asked Siri to tell you a joke in Airplane mode? As he chronicles on LinkedIn, making this very situation a reality was the self-imposed mandate of my interviewee and his team of software engineers at Apple Park in Cupertino ahead of last year’s iOS 15 release. Yes, jokes are serious business for Siri engineers. They trailblazed with one goal in mind: to make Siri’s core conversational experience, and the cutting-edge technology behind it, robust enough to perform the feat of running completely without internet access. Not too long from now, I’m sure we’ll be in space asking, “Hey Siri, how far is the next galaxy?”

Following a master’s in computer science at USC and short stints at a few tech companies, my interviewee found his true calling at Apple, working at a company he admired, with people he admired, on a product he admired: Siri, Apple’s voice assistant. He recalls that the voice assistant boom began with Apple’s release of Siri in 2011, with other big tech companies like Amazon, Google, and Microsoft following suit to create Alexa, Google Assistant, and Cortana, respectively. Nowadays, voice interfaces are built into millions of consumer devices across the globe, giving us the power to accomplish a myriad of tasks in our daily routines, from sending text messages, setting alarms, and searching Google to playing music and even turning on the lights at home, all with just a few utterances. As simple as it is to call on Siri, powering the voice assistant are the complex machine learning and natural language processing algorithms that my interviewee designs and codes.
He enhances Siri’s ability to detect more of the nuances and intricacies of human speech, like pauses, corrections, and disfluencies, allowing the agent to better attune to each user’s unique mode of speech. In addition, he works on the SiriKit Integration Framework, which allows iOS developers to create Siri-powered voice interfaces for their own apps, making them more conversation-friendly and better integrated with the iOS ecosystem as a whole. His work at Apple is truly intriguing and inspiring to the next generation of curious engineers, myself included, because it opens out onto an emerging technological movement: Voice User Interfaces (VUIs). VUI technology enables humans to communicate with a plethora of “Internet of Things” (IoT) devices, including mobile phones, cars, TVs, and home automation systems, simply by using their natural language and voice. My interviewee believes that VUI technology is indeed the future, not simply because it brings convenience to the average user’s seemingly mundane, everyday tasks, but ultimately because of its potential to inspire equity in society.
He emphasizes that one of Siri’s driving goals is inclusive design: making its voice interface accessible to as many people as possible.¹ “VUIs serve to empower people, of different ages, levels of literacy, and socio-economic backgrounds, and those with mobility or visual impairments, to become more included in everyday society.”

For instance, in developing countries like India, with the recent proliferation of affordable smartphones and some of the cheapest mobile data prices in the world, a new kind of internet user has emerged, one who relies more on voice and video than on text and typing.² As millions of people from all walks of life get online for the first time, voice technology plays an enormous role in opening up avenues of digital inclusion, allowing less-educated and illiterate communities to access and participate in digital media interactions like messaging, social media, and even financial transactions.

However, my interviewee opines that VUI technology still needs to mature in the area of speech recognition. In recent times, there has been discussion and debate surrounding the socio-economic and racial biases that can become unconsciously ingrained in speech recognition machine learning models. Often, the voice data fed into the models at training time is itself biased or insufficiently diverse. Web-mined training data may not capture the variations of all the world’s languages, dialects, and accents, causing trained speech recognition models to underperform for, or even exclude, certain under-represented populations.
For instance, due to the limitations of their training datasets, many large natural language processing models are not yet equipped to fully understand and cater to specialized modes of speech like African American Vernacular English.³ My interviewee suggests that overcoming the limitations of speech recognition is perhaps both a technical and an organizational problem: improving the representation of diverse populations in speech models, while also recognizing and mitigating the systemic injustices of society at large.
After all, he says, “technology reflects the society that builds it.”

Over the next decade, IoT is poised to become mainstream, with an estimated 43 billion connected devices by 2023 alone.⁴ With every device becoming “smart,” there will likely be a proliferation of these “conversations” between humans and their devices, enabled by a multitude of VUIs. My interviewee’s work resonates with me and has spurred a newfound interest in transforming Siri-like voice assistants into their future avatars, making way for the next wave of technological revolution: voice conversations between humans and ubiquitous IoT devices.

References
1. https://hbr.org/2019/05/using-voice-interfaces-to-make-products-more-inclusive
2. https://www.wsj.com/articles/the-end-of-typing-the-internets-next-billion-users-will-use-video-and-voice-1502116070
3. https://hai.stanford.edu/news/jazmia-henry-building-inclusive-nlp
4. https://www.mckinsey.com/industries/private-equity-and-principal-investors/our-insights/growing-opportunities-in-the-internet-of-things
Life in Tech: Is Voice AI Technology the Future? A Siri Engineer’s Perspective was originally published in Berkeley Master of Engineering on Medium.