One-shot: a simpler in-vehicle voice interaction experience

The upsurge of artificial intelligence has driven development across the entire industry, and intelligent voice, as the most natural means of interaction, naturally attracts much of the attention. In the interaction transformation brought about by voice, the smart car field has become a pioneer; in the aftermarket in particular, intelligent voice has all but become standard equipment for in-vehicle interaction scenarios.

Domestic enterprises focused on intelligent voice are increasingly competitive. At Alibaba, voice customer service has become a genuine need; Baidu's powerful search resources serve as the backing for Baidu voice; iFLYTEK, relying on government projects, also holds a market advantage in education, health care, smart cities, and other areas; LeEco launched Le Voice, changing how the LeTV Super TV is operated; iQiyi, Qihoo 360, and others are also involved in voice technology. Professional voice companies, each relying on its own distinctive solutions, are booming. Across nearly all of these speech interaction offerings, simple and convenient interaction has become a major selling point.

Speaking of the simplicity and convenience of voice interaction, the one-shot feature that AISpeech added to AIOS 3.1 on October 20, 2016 has attracted great attention from the industry. AIOS for Car is a dialogue operating system launched by AISpeech in October 2015 for the smart car aftermarket; it is mainly used in cars, smart rearview mirrors, HUDs, and Internet-connected cars. In June 2016, AISpeech upgraded it to AIOS 3.0, adding seven new features. On October 20, AISpeech pushed forward again with AIOS 3.1, and the long-incubated one-shot feature was finally unveiled.

One-shot

AISpeech describes the one-shot function as "one sentence gets it done", a description that is vivid and close to reality.

Figure 1: AISpeech's one-shot feature

According to reports, one-shot uses an integrated "wake word + speech recognition + semantic understanding" approach to achieve zero-latency, seamless docking between the wake word and the voice command. It abandons the traditional question-and-answer form, greatly reduces the steps of user voice control, delivers feedback immediately, and turns a complex flow into a simple operation. But such simplicity was anything but simple at the design stage.

One feature of one-shot is the integration of wake-up recognition and semantic understanding, ensuring that the voice interaction is unified and consistent from wake-up through to the completed action. To give a simple example: in the past, intelligent voice interaction was a question-and-answer exchange. The user issued a wake-up command and had to wait for the device to confirm it was listening before the interaction could begin. For example:

User: Hello Xiaochi (wake word command)

Device: What can I do for you? (feedback indicating the device is ready to listen)

User: I'm going to the airport

Device: Starting navigation to the airport

The one-shot function realizes the integration of "wake word + speech semantic recognition" in a single sentence, enabling interactions like this:

User: Hello Xiaochi, I'm going to the airport

Device: Starting navigation to the airport
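
As a rough illustration of the idea, and not AISpeech's actual implementation, a one-shot front end can treat anything spoken after the wake word in the same utterance as the command, falling back to the traditional prompt when the wake word arrives alone. The wake word string and function names below are assumptions made for this sketch:

    # Minimal sketch of one-shot dispatch. The recognizer front end is
    # assumed; handle_utterance() receives the full recognized text.
    WAKE_WORD = "hello xiaochi"  # assumed English rendering of the wake word

    def handle_utterance(text):
        text = text.strip().lower()
        if not text.startswith(WAKE_WORD):
            return ""  # not addressed to the assistant; ignore
        command = text[len(WAKE_WORD):].lstrip(" ,")
        if not command:
            # Traditional flow: wake word alone, so prompt and wait.
            return "What can I do for you?"
        # One-shot flow: wake word and command arrived in one utterance,
        # so the command goes straight to semantic understanding.
        return understand(command)

    def understand(command):
        # Stand-in for the real semantic understanding engine.
        if "airport" in command:
            return "Starting navigation to the airport"
        return "Sorry, I did not understand that"

    print(handle_utterance("Hello Xiaochi, I'm going to the airport"))
    # -> Starting navigation to the airport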

This experience is clearly more efficient than the traditional one. Perhaps in the future, by collecting data on users' behavioral habits, the machine will be able to track user intent and carry on dialogues like this:

A: There's a question I've always wanted to ask you

B: Loved ... (answering "have you ever loved me" before it is even asked)

System response speed and accuracy are always major concerns for users. The one-shot feature in AIOS 3.1 uses a local-plus-cloud hybrid engine: speech wake-up and recognition of common voice commands run locally, so the system responds quickly and accurately, while continuous speech recognition and semantic understanding are processed in the cloud, where scene-based user habit data is collected and deep learning is used to analyze and track user intent, ensuring the accuracy of semantic understanding. This hybrid local/cloud processing guarantees both response speed and interaction accuracy, and even without a network connection the basic voice interaction functions remain usable.
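
A minimal sketch of such a routing policy, assuming a small command set handled by an on-device recognizer and everything else forwarded to a cloud engine when the network is available; every name here is hypothetical:

    # Hypothetical local/cloud hybrid routing, sketched for illustration.
    LOCAL_COMMANDS = {"volume up", "volume down", "next song", "answer call"}

    def route(utterance, network_up):
        text = utterance.strip().lower()
        if text in LOCAL_COMMANDS:
            # Common commands resolve on-device with minimal latency.
            return "local: " + text
        if network_up:
            # Long-form speech and semantic understanding go to the cloud.
            return cloud_nlu(text)
        # Offline fallback: only the basic on-device functions remain.
        return "offline: command not in local vocabulary"

    def cloud_nlu(text):
        # Stand-in for the cloud recognition + semantic understanding service.
        return "cloud: parsed intent from '" + text + "'"

    print(route("volume up", network_up=False))              # handled locally
    print(route("navigate to the airport", network_up=True)) # handled in the cloud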

Graphical user interfaces (GUI) will inevitably continue to evolve, and the voice user interface (VUI) is a major trend. The release of AISpeech's one-shot feature demonstrates its deep thinking about VUI product interaction design. By continuously improving the voice interaction experience, VUI is bound to bring more and more change to human-machine interaction across the future IoT industry.

Letting the technology speak: R&D strength is the key

Many users of automotive aftermarket products report the same problem: when using in-car voice, saying "I'm going to Tiananmen" gets a response, but saying "Go to Tiananmen" gets nothing. Why? Because some voice solution providers hard-code phrases such as "I'm going to" as wake words, and phrases like "Go to" do not exactly match the hard-coded phrase, so the system naturally fails to recognize them. This kind of interaction is ostensibly promoted as "wake-word-free"; in fact, the opposite is true: the system relies on a large list of wake phrases to pull it off. The result is a very high false-alarm rate, increased system resource consumption, poor scalability, and users being forced to memorize phrasings, which creates hidden risks for safe driving.
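
The failure mode described above can be reproduced with a toy matcher; the phrase list is an assumption used only to show why hard-coded exact-match phrases are brittle:

    # Toy exact-prefix matcher illustrating the brittleness of
    # hard-coded wake phrases (the phrase list is hypothetical).
    HARDCODED_PHRASES = ["i'm going to", "navigate to"]

    def matches(utterance):
        text = utterance.strip().lower()
        return any(text.startswith(p) for p in HARDCODED_PHRASES)

    print(matches("I'm going to Tiananmen"))  # True: matches a hard-coded phrase
    print(matches("Go to Tiananmen"))         # False: no phrase matches exactly

Every new phrasing needs another entry in the list, which is exactly the scalability and false-alarm problem described above.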

On this issue, Lei Xiongguo, product director of AISpeech, stated: "AISpeech solves this problem with the one-shot function: however the user chooses to say it, the system can understand. Based on deep learning of the application scenario, the system collects back-end user data, analyzes users' behavioral habits, and accurately tracks user intent, overcoming rigid keyword matching and achieving smooth in-scene interaction through large-vocabulary data."

In fact, the core competitiveness of voice technology companies comes down to the strength of their voice technology R&D, productization, and market application; it has become a matter of survival. Some companies insist on fully independent research and development; others are adept at using international open-source tools, for example Google's open-source deep learning framework TensorFlow. TensorFlow supports popular deep neural network models such as CNNs, RNNs, and LSTMs, greatly reducing the difficulty of applying deep learning and speeding up development. However, general-purpose open-source tools have limitations: they can neither satisfy the requirements of specific frontier algorithms nor, in many cases, users' scenario-specific needs. Algorithms, data, and architectures in different professional fields must be built and optimized according to the actual application.
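
For context, here is a minimal sketch of the kind of model TensorFlow's high-level API makes easy to assemble; the layer sizes are arbitrary placeholders, and this is a generic example rather than anything from the systems discussed in this article:

    import tensorflow as tf

    # A small LSTM sequence classifier: the kind of model that TensorFlow's
    # high-level API lets developers assemble in a few lines.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=5000, output_dim=64),  # token embeddings
        tf.keras.layers.LSTM(128),                                 # sequence encoder
        tf.keras.layers.Dense(10, activation="softmax"),           # e.g. 10 intent classes
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")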

At present, not many companies in China's intelligent voice industry go deep into fundamental technology research. In deep learning, for example, Baidu Research launched the Deep Speech system, iFLYTEK launched the FSMN algorithm model, and AISpeech, through its joint lab with Shanghai Jiao Tong University, independently owns the VDCNN algorithm model and PSD decoding architecture. With independent R&D capability, a company can combine product features and application scenarios to deeply customize its interaction solutions.

Whether it is the novel one-shot function or traditional voice interaction, in the era of artificial intelligence only technology that can be turned into a good product experience will better sketch the future of smart life. We look forward to the release of more new technologies, and to the surprises they will bring.