Google starts testing voice payment. Can you really pay with your mouth?

As technology advances, mobile payment methods keep changing. The cumbersome password entry of the early days has given way to fingerprint and face recognition, which are both more convenient and more secure. Voiceprint recognition, however, is now common on smartphones yet rarely used for payment. According to recent reports, Google has begun building voice payment into its products, letting users pay simply by speaking.

According to media reports, Google is testing a new feature that lets consumers authorize and confirm payments with Voice Match. Google also confirmed that voice confirmation will not be offered for every purchase: at this stage it applies only to in-app purchases and restaurant orders, not to Google Shopping.

According to the reports, the feature was originally due to be announced at this year's I/O developer conference, but because of the epidemic Google skipped the announcement and went straight to testing. A "Confirm with Voice Match" option can now be seen in the payment interface of Google Assistant.

In fact, the technology behind voice payment is not new; it is even older than the natural language processing (NLP) that voice assistants rely on. Although voice payment and natural language processing both deal with voice, the two are worlds apart. Voice payment is essentially voiceprint recognition, and voiceprint recognition is not the same thing as speech recognition.

Sound can be treated as a data communication channel. Speaking is the process of encoding a message into sound, and listening is the process of decoding the audio signal back into language and text. In Chinese, the mapping between characters and their pinyin pronunciation effectively serves as the audio protocol.

Voiceprint recognition, by contrast, is speaker-specific: it extracts the voiceprint characteristics of an utterance to answer the question "who is speaking". Speech recognition is speaker-independent: it determines the content of an utterance and answers the question "what was said". For voice payment, what matters most is establishing who issued the payment command.

Because no two people's vocal organs are exactly the same size and shape, the airflow each person produces when speaking differs, and so does the resulting voiceprint. That is why we can recognize someone by voice before seeing them, judging identity from timbre, pitch and speaking habits. In the same way, algorithms can extract distinctive, abstract and high-dimensional voiceprint features from speech, deep learning can be used to train a model on them, and the resulting unique biometric signature can prove that "I am who I say I am".
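To make the idea of comparing voiceprint features concrete, here is a minimal sketch in Python. It assumes that some model has already turned each recording into a fixed-length feature vector; the three-number vectors, the function names and the 0.8 threshold are all made up for illustration and do not describe how Google's Voice Match actually works. The sketch scores two vectors with cosine similarity, one common way of comparing such features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled, probe, threshold=0.8):
    """Accept the probe only if it is close enough to the enrolled voiceprint.

    The threshold is an illustrative assumption; a real system would tune it
    against measured false-accept and false-reject rates.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Toy vectors standing in for the output of a hypothetical voiceprint model.
enrolled = [0.9, 0.1, 0.4]        # stored when the user registered
same_voice = [0.85, 0.15, 0.38]   # a later utterance by the same user
other_voice = [0.1, 0.9, 0.2]     # an utterance by someone else

print(is_same_speaker(enrolled, same_voice))
print(is_same_speaker(enrolled, other_voice))
```

Raising the threshold makes the check stricter and rejects more impostors, at the cost of rejecting the real user more often, for example when they have a cold.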

The process of completing a voice payment with voiceprint recognition is, in outline, simple. The user speaks a command, and the terminal device captures the sound, turns it into a session, and sends the product information and transaction number to the Google backend. Once the server has matched the voiceprint, the transaction can proceed, and the result of the completed transaction is finally pushed back to Google Assistant.
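The flow described above can be sketched as a toy simulation. Everything here is hypothetical: the `Backend` class, the `PaymentRequest` fields and the placeholder `voices_match` function are invented for illustration, and nothing in it reflects Google's real interfaces. It only shows the order of the steps: open a session, send the product and transaction number together with the voice features, match the voiceprint on the server, and then report the result.

```python
import secrets
import uuid
from dataclasses import dataclass


@dataclass
class PaymentRequest:
    session_id: str
    transaction_id: str
    product: str
    voice_features: list


def voices_match(enrolled, probe, tolerance=0.05):
    """Placeholder matcher: mean squared difference under a tolerance."""
    mse = sum((x - y) ** 2 for x, y in zip(enrolled, probe)) / len(enrolled)
    return mse <= tolerance


class Backend:
    """Hypothetical payment backend holding enrolled voiceprints."""

    def __init__(self, enrolled_voiceprints):
        self.enrolled = enrolled_voiceprints  # user_id -> feature vector
        self.sessions = {}                    # session_id -> user_id

    def open_session(self, user_id):
        session_id = secrets.token_urlsafe(16)
        self.sessions[session_id] = user_id
        return session_id

    def confirm(self, request):
        user_id = self.sessions.get(request.session_id)
        if user_id is None:
            return "rejected: unknown session"
        if not voices_match(self.enrolled[user_id], request.voice_features):
            return "rejected: voiceprint mismatch"
        # Only after the voiceprint matches does the transaction proceed.
        return f"completed: {request.product} ({request.transaction_id})"


def make_request(session_id, product, voice_features):
    """What the device would send: session, transaction number, product, voice."""
    return PaymentRequest(session_id, uuid.uuid4().hex, product, voice_features)


backend = Backend({"alice": [0.9, 0.1, 0.4]})
session_id = backend.open_session("alice")

ok = backend.confirm(make_request(session_id, "coffee", [0.85, 0.15, 0.38]))
bad = backend.confirm(make_request(session_id, "coffee", [0.1, 0.9, 0.2]))
print(ok)
print(bad)
```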

Before Google confirmed that it was testing voice payment, Amazon had already begun letting users pay bills through Alexa by voice command last fall. Once the user approves the transaction with a phrase such as "Alexa, pay my mobile bill," Alexa pays the bill through Amazon Pay and sends a confirmation to the user's registered mobile phone number. In the Chinese market, Tmall Genie has also long supported voice payment. According to data released by Alibaba, 1.05 million orders on Tmall Genie were paid for by voice during last year's Double 11 period alone.

What Google is after, however, is clearly more than voice payment on its own Google Home smart speaker: it is targeting the smart voice assistant, which is used in a much wider range of scenarios. But could Amazon and Alibaba really have failed to think of what Google has thought of? Fully integrating voice payment into voice assistants would undoubtedly improve the user experience a great deal. After all, compared with face and fingerprint recognition, voiceprint recognition is much more convenient.

Yet Amazon and Alibaba chose to limit the function to smart speakers, which usually sit at home, and with good reason. Compared with fingerprints or facial information, voice is harder to control. Users can decide whether to place a finger on the fingerprint sensor or hold their face in front of the camera, but they cannot control where their voice travels in the same way.

More importantly, fingerprints are hard to collect and facial recognition usually includes liveness detection, whereas a voiceprint is easy to collect and it is hard to determine the user's state at the moment the payment command is spoken. AI technology is also now widely available: with deep learning models and waveform editing tools, speech with arbitrary content can be spliced together, and a user's voiceprint spectrum can be reproduced almost perfectly.

The security problems of voice payment are not confined to the client side; the server side faces risks too. Voice payment can be regarded as a data exchange that has to keep state. The cookie mechanism keeps that state on the client side, while the Session mechanism keeps it on the server side. When a user first visits the server, the server creates a Session for the client and computes a Session ID with a special algorithm to identify it.

Because a voice payment is not a single request, the next time the user interacts with the server, the exchange has to be tied back to the same session through the Session ID. The way Session IDs are implemented, however, makes them open to hijacking through attacks such as classic cross-site scripting (XSS), network sniffing and proxy hijacking. If the Session ID is hijacked, the attacker gains the target user's legitimate session and can then empty the victim's wallet, much as in credit card fraud.
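Why a stolen Session ID is so damaging can be shown with a small Python sketch. It assumes a bare in-memory store, with the hypothetical `SessionStore` class standing in for whatever a real server uses, and generates IDs with the standard library's `secrets` module. The point is that the server recognizes the ID, not the person: a random ID is hard to guess, but anyone who obtains a copy of it is treated as the legitimate user.

```python
import secrets


class SessionStore:
    """Hypothetical server-side session store keyed by a random Session ID."""

    def __init__(self):
        self._sessions = {}

    def create(self, user_id):
        # A long random token makes the ID impractical to guess.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {"user_id": user_id}
        return session_id

    def resolve(self, session_id):
        """Return the user bound to this ID, or None if the ID is unknown."""
        session = self._sessions.get(session_id)
        return session["user_id"] if session else None


store = SessionStore()
victim_session = store.create("alice")

# The legitimate client presents the ID on a later request.
legit = store.resolve(victim_session)

# An attacker who obtained the same string (via XSS, sniffing, a proxy...)
# presents it too. The server has no way to tell the two apart.
stolen_copy = str(victim_session)
attacker = store.resolve(stolen_copy)

# Guessing an ID, by contrast, does not work.
guess = store.resolve("guessed-id")

print(legit, attacker, guess)
```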

This may be one of the main reasons Google itself has admitted that, if feedback and performance are too negative, the feature may never be released to the public. Until Google solves these critical security problems, shopping just by opening your mouth may remain possible only on smart speakers for the time being.
