Baidu recently unveiled China's first AI input method, the Baidu input method AI explorer. The new product uses voice as its default input mode and also accepts multisensory input such as facial expressions and body movements.
At the same time, Baidu announced a breakthrough in voice technology that it said is significant for both academia and industry. Streaming multi-layer truncated attention (SMLTA) modeling has improved online speech recognition accuracy by 15 percent, and it marks the first time in the world that an attention-based online speech recognition service has been deployed on a large scale.
Baidu input method AI explorer is officially unveiled. [Photo provided to chinadaily.com.cn]
Gao Liang, director of Baidu's voice technology department, presented the latest breakthrough in Baidu's speech technology, the Deep Peak 2 model. Its full name is "Context-independent phoneme combination modeling based on LSTM and CTC".
The new model departs from the traditional model that has been in use for more than ten years and takes full advantage of the parameters of the neural network, greatly improving recognition accuracy for Chinese, English and mixed input across a range of speaking styles and accents (such as reading aloud, chatting and speaking softly).
As a result, the model's accuracy in conversational scenarios is 20 percent higher, in relative terms, than the industry-leading level, making it better suited to users' everyday conversations.
Wang Haifeng, Baidu's senior vice president, delivers a speech at the product launch. [Photo provided to chinadaily.com.cn]
Cai Yuting, head of Baidu's input method, announced the official launch of Baidu input method v8.0. This version adds two new AI functions, "Voice Shorthand" and "AR Emoticon", to its existing AI functions.
"Voice Shorthand" introduces voiceprint recognition technology. Aimed at small meetings of two or three people, it automatically distinguishes speakers by their voiceprints so that each person's speech can be recognized and attributed correctly.
The "AR Emoticon" function is based on Baidu's face recognition and AR technology. Users can create emoticons by running face recognition on images from the camera or photo album, and they can also control virtual characters with their own facial expressions. The AR emoticons produced in this way can be called up directly through input method search, voice input and keyboard input.
Baidu's input method has been combined with Chinese intangible cultural heritage. [Photo provided to chinadaily.com.cn]
Baidu's input method has also drawn on Chinese intangible cultural heritage such as Taohuawu, bringing traditional folk art like New Year paintings into its emoticons. This brings many classical figures to life and helps carry forward Chinese historical culture.
Cai remarked at the product launch, "Baidu input method v8.0 not only upgrades its functions, but also embraces youth culture and meets users' individual needs. Applying AI technology such as voice and image recognition allows the input method to break through the limitations of text and adapt to users' different forms of expression."
In addition to human-computer interaction through text, sound and pictures, Baidu input method is expected to capture information from movement and eye contact as well, to provide users with a more natural and personalized experience.