Group2: Research and development on unconstrained spoken dialogue

Activities

Latest Activity

MS1 - Avatar Symbiotic Society (YouTube Channel)

Research and development for unconstrained remote spoken dialogue (Tatsuya KAWAHARA)

By enhancing automatic speech recognition and dialogue technologies, we will realize an autonomous spoken dialogue system with a sense of hospitality and empathy, and develop dialogue technology that allows avatars (CAs) to switch flexibly between remote control and autonomous dialogue depending on the operator's intentions and the situation. The system will enable a single operator to hold dialogues through multiple CAs: the CAs perform routine introductions and responses autonomously, while the human operator remotely builds human relationships and responds to queries that the autonomous system cannot handle. So far, we have developed and implemented a system that can interact with three CAs in parallel, for application scenarios such as attentive listening to the elderly, laboratory guidance and presentation, and job interviews. Through experiments and subjective evaluations, we aim to achieve the same performance and satisfaction as if all CAs were managed by human operators.

Lab website: http://www.sap.ist.i.kyoto-u.ac.jp/EN/

Research and development for acoustic information processing and voice conversion (Hiroshi SARUWATARI)

We mainly work on new signal processing and information processing systems, focusing on the understanding, processing, and control of sound media (speech, music, etc.). For example, we are interested in theories of new statistical modeling and in machine-learning-based algorithms for solving optimization problems under acoustic generative models and physical constraints. Through this work, we aim to extend human hearing, build new human-machine interface systems, and contribute to the creation of music. In this research project, we are studying acoustic signal processing, speech synthesis, and voice conversion for avatars.

Lab website: https://www.sp.ipc.i.u-tokyo.ac.jp/index-en

Research and development for dialogue knowledge processing (Ryuichiro HIGASHINAKA)

For cybernetic avatars (CAs) to take root, they need to be used for a variety of highly useful tasks. In our project, we research and develop dialogue knowledge processing to realize dialogue services for such tasks. A CA is constructed automatically from data related to the dialogue service, and an interface is built that lets the operator take over the dialogue, quickly grasp the dialogue situation, and perform appropriate operations. This will lead to a CA that requires no effort from the operator except in emergency situations. Through experiments in which dialogues are frequently taken over by operators, we have confirmed that the 'adjacency pair' format is useful for understanding dialogues. To clarify which type of dialogue summary helps the operator understand the dialogue situation, we also performed experiments in which various types of summaries were presented during a handover, and confirmed the characteristics of each type. In addition, we are working on a dialogue breakdown detection technique that requests the operator to take over the dialogue, and on a technique that classifies the causes of dialogue breakdowns so that operators can use this information for an effective handover.

Research and development of CG-CA specific dialogues (Akinobu LEE)

We are conducting research on speech recognition, spoken dialogue systems, natural language processing, speech interfaces, and spoken interaction for human-to-human and human-to-machine communication. In particular, we are engaged in practical research and development of speech recognition and spoken dialogue systems, from signal processing to dialogue control. For many years, the research results have been published and maintained as open source software such as Julius, a small general-purpose speech recognition engine, and MMDAgent, a toolkit for building spoken interaction systems. Currently, our main research theme is the study of interactive intelligent speech interfaces that connect all people with increasingly large-scale, sophisticated, and autonomous intelligent services. We believe that this requires not only technologies for smooth conversation but also a philosophy and design methodology for the ideal interface as a humanoid device. Based on this belief, we are promoting research and development of dialogue systems, from speech processing and natural language processing to character design and UI design theory.

Research and development for robust spoken dialogue processing (Kazunori KOMATANI)

Our group is studying basic technologies for systems that interact with humans using speech recognition and natural language processing, with a broad perspective on various layers ranging from acoustic signal processing to social interaction. We are also working on inference over knowledge graphs, because knowledge is essential for an intelligent dialogue system. In this research project, we detect, at various levels, the dialogue breakdowns that are inevitable in autonomous dialogue, and develop technologies that enable the system to continue the dialogue even in such cases. To use CAs for a variety of users in a variety of environments, the system must be able to deal with various types of errors and with input utterances that it does not expect. We will study a method to detect such cases of dialogue breakdown in autonomous dialogue and to identify their possible causes while continuing the dialogue until the remote operator takes over.

Lab website: http://www.ei.sanken.osaka-u.ac.jp/index-e.html