Apple recently released a series of papers explaining key working mechanisms of its voice assistant, offering a public look into Siri and sharing several of its design ideas with the industry.

The first paper addresses the multi-task nature of wake-up processing. In Siri, waking up usually involves two steps: the system must first determine whether the phonetic content of the incoming audio matches the trigger phrase (voice trigger detection), and then determine whether the speaker's voice matches that of one or more enrolled users (speaker verification). The conventional approach handles the two tasks with separate models, but Apple argues that a single neural network can solve both at once, and reports that, after evaluation, the joint approach meets expectations on all fronts. In the paper, the researchers trained models built around two design ideas on a dataset of 16,000 hours of annotated audio, of which 5,000 hours carried phonetic labels and the rest carried only speaker labels. Rather than the usual approach of training one model on examples annotated with multiple labels, Apple trains for the related tasks by concatenating the training data of the different tasks. At equal performance, the newly proposed model proved more practical to deploy: it shares computation between the two tasks, which saves considerable memory on the device while also reducing computation time, latency, and power/battery consumption.

In another paper, Apple introduced a system design for multilingual speech scenarios in which contextual knowledge assists the dictation system's decision-making. The acoustic sub-model, for example, makes predictions from the speech signal itself, while a context-aware prediction component takes various interaction signals into account: the conditions under which the request was issued, the dictation locales installed on the device, the currently selected dictation locale, and whether the user switched dictation locales just before making the request. The results show that the advantage of this design is that the context signals help in situations where the speech sample is too short for the acoustic model to produce a reliable prediction on its own.

In addition, Apple presented a complementary study on mitigating false triggering, that is, getting the voice assistant (Siri) to ignore speech that is not directed at it. Building on the idea of designing models around graph structures, the researchers proposed a graph neural network (GNN) in which each node is connected to a label. The results showed that the model reduced false triggers by 87%.
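To make the first paper's idea concrete, below is a minimal sketch of a joint wake-up model: one shared encoder feeding a voice-trigger head and a speaker-verification head. The architecture, layer sizes, thresholds, and names (`JointWakeupModel`, `trigger_head`, `speaker_head`) are illustrative assumptions, not Apple's published model; the sketch only shows why sharing the encoder saves memory and computation compared with running two separate networks.

```python
# Sketch of a joint wake-up model (assumed architecture, not Apple's published one):
# a shared encoder feeds two heads, one for voice trigger detection and one for
# speaker verification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointWakeupModel(nn.Module):
    def __init__(self, n_mels=40, hidden=128, emb_dim=64):
        super().__init__()
        # Shared computation: one encoder over log-mel frames serves both tasks,
        # which is where the memory/latency savings come from.
        self.encoder = nn.GRU(input_size=n_mels, hidden_size=hidden, batch_first=True)
        # Head 1: does the audio contain the trigger phrase? (binary logit)
        self.trigger_head = nn.Linear(hidden, 1)
        # Head 2: speaker embedding, scored against enrolled-user profiles.
        self.speaker_head = nn.Linear(hidden, emb_dim)

    def forward(self, mel_frames):                      # (batch, time, n_mels)
        states, _ = self.encoder(mel_frames)
        pooled = states.mean(dim=1)                      # simple temporal pooling
        trigger_logit = self.trigger_head(pooled).squeeze(-1)
        speaker_emb = F.normalize(self.speaker_head(pooled), dim=-1)
        return trigger_logit, speaker_emb

# Usage: accept the wake-up only if both checks pass.
model = JointWakeupModel()
audio = torch.randn(1, 100, 40)                          # fake 1-second utterance
enrolled = F.normalize(torch.randn(64), dim=-1)          # stored user profile
logit, emb = model(audio)
is_trigger = torch.sigmoid(logit) > 0.5
is_user = torch.dot(emb[0], enrolled) > 0.7              # cosine threshold (assumed)
print(bool(is_trigger.item() and is_user.item()))
```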
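The second paper's combination of an acoustic sub-model with context signals can be sketched as re-weighting the acoustic posterior by a context-derived prior. The `combine` function, the specific signals it reads, and the weights below are assumptions for illustration, not Apple's published design; the point is that on a very short utterance the context prior dominates the nearly uniform acoustic probabilities.

```python
# Sketch of combining an acoustic language-ID posterior with contextual priors.
# Signal names, weights, and the combination rule are illustrative assumptions.

def combine(acoustic_probs, installed, selected, recently_switched_to=None):
    """acoustic_probs: dict locale -> probability from the acoustic sub-model."""
    combined = {}
    for locale, p in acoustic_probs.items():
        prior = 1.0
        if locale not in installed:
            prior *= 0.01          # locales the user never installed are unlikely
        if locale == selected:
            prior *= 3.0           # currently selected dictation locale is favored
        if locale == recently_switched_to:
            prior *= 2.0           # user toggled to this locale just before the request
        combined[locale] = p * prior
    total = sum(combined.values())
    return {loc: v / total for loc, v in combined.items()}

# On a very short utterance the acoustic model is nearly uniform,
# so the context prior decides the outcome.
acoustic = {"en_US": 0.36, "es_MX": 0.34, "fr_FR": 0.30}
print(combine(acoustic, installed={"en_US", "es_MX"}, selected="es_MX"))
```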
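Finally, a toy illustration of the graph-neural-network idea behind the false-trigger study. The article only states that each node is connected to a label and that the model cut false triggers by 87%; the graph construction and the mean-aggregation layer below are generic GNN assumptions, not the paper's actual design.

```python
# Toy GNN sketch for false-trigger mitigation: mean-neighbor message passing over
# a small graph of utterance segments, pooled into a single accept/reject score.
# All shapes, weights, and the chain-graph structure are illustrative assumptions.
import numpy as np

def gnn_layer(node_feats, adj, weight):
    # Mean aggregation over neighbors (with self-loops), then a linear map + ReLU.
    adj_hat = adj + np.eye(adj.shape[0])
    deg = adj_hat.sum(axis=1, keepdims=True)
    messages = (adj_hat / deg) @ node_feats
    return np.maximum(messages @ weight, 0.0)

rng = np.random.default_rng(0)
n_nodes, feat_dim, hidden = 6, 8, 4
node_feats = rng.normal(size=(n_nodes, feat_dim))    # e.g. per-segment acoustic features
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):                          # chain graph over the utterance
    adj[i, i + 1] = adj[i + 1, i] = 1

h = gnn_layer(node_feats, adj, rng.normal(size=(feat_dim, hidden)))
h = gnn_layer(h, adj, rng.normal(size=(hidden, hidden)))
graph_score = 1 / (1 + np.exp(-h.mean()))             # pooled sigmoid score
print("accept trigger" if graph_score > 0.5 else "reject as false trigger")
```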