

Print: ISSN 0914-4935
Online: ISSN 2435-0869
Sensors and Materials
is an international peer-reviewed open-access journal that provides a forum for researchers working in the multidisciplinary fields of sensing technology.
Sensors and Materials
is covered by Science Citation Index Expanded (Clarivate Analytics), Scopus (Elsevier), and other databases.


Publisher
 MYU K.K.
 Sensors and Materials
 1-23-3-303 Sendagi,
 Bunkyo-ku, Tokyo 113-0022, Japan
 Tel: 81-3-3827-8549
 Fax: 81-3-3827-8547




Sensors and Materials, Volume 38, Number 3(3) (2026)
Copyright(C) MYU K.K.
pp. 1447-1461
S&M4387 Research paper
https://doi.org/10.18494/SAM5946
Published: March 23, 2026

Design of a Mandarin Spoken Dialogue System Using Tacotron2-based Speech Synthesis with Dialogist-aware System-speaking-style Switching

Ing-Jr Ding, Po-Jung Chen, Xin-Bau Li, and Yih-Her Yan

(Received September 24, 2025; Accepted February 20, 2026)

Keywords: spoken dialogue system, Tacotron2 speech synthesis, model fine tuning, synthetic speech evaluation, YOLO dialogist identification

As the global aging trend intensifies, the demand for long-term care systems will continue to rise, necessitating solutions to the shortage of manpower and the excessive burden placed on traditional human care. Among care systems using AI techniques, chatting systems that create close interaction between older adults and the system have become a necessary AI tool. However, for older adults, including those in Taiwanese society, text-typing-based AI chatting systems with a text-in, text-out interaction model are complicated and difficult to use. To tackle this issue, we develop a Mandarin spoken dialogue system in which chatting interactions take place in a simple, direct speech-to-speech mode. In addition, to provide emotionally connected voice interactions that offer psychological comfort and social companionship, the designed dialogue system specifically includes the functionality of dialogist-aware system-speaking-style switching: according to the identity of the system's dialogist, the system's synthesized response speech is rendered in the style of a target speaker matched to that dialogist. The Mandarin spoken dialogue system developed in this study comprises three computing modules: automatic speech recognition (ASR), semantic understanding by a large language model (LLM), and text-to-speech (TTS) synthesis. For the first two modules, the open-source Google ASR and Google Gemma LLM are employed and integrated into the dialogue system. For TTS, to additionally perform system-speaking-style switching, the well-known Tacotron2 speech synthesis approach is adopted. Tacotron2, presented by Google, is noted for its effectiveness in deep learning from available speech databases.
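The three-module pipeline described above (ASR, LLM-based semantic understanding, and style-switched TTS) can be sketched as follows. This is a minimal illustration, not the authors' implementation: every function body is a hypothetical placeholder standing in for Google ASR, Google Gemma, and the per-speaker Tacotron2 models, and the dialogist-to-style mapping is an invented example.

```python
# Sketch of a dialogist-aware speech-to-speech dialogue turn.
# All module internals are placeholders; the paper's system uses
# Google ASR, the Google Gemma LLM, and fine-tuned Tacotron2 models.

def asr(audio: bytes) -> str:
    """Placeholder ASR: stands in for the open-source Google ASR."""
    return audio.decode("utf-8")  # stand-in: treat the bytes as the transcript

def llm_respond(text: str) -> str:
    """Placeholder semantic-understanding module (Google Gemma in the paper)."""
    return f"Reply to: {text}"

def tts(text: str, speaker_style: str) -> bytes:
    """Placeholder TTS: stands in for a Tacotron2 model fine-tuned
    to the target speaker matched to the current dialogist."""
    return f"[{speaker_style}] {text}".encode("utf-8")

def dialogue_turn(audio_in: bytes, dialogist_id: str) -> bytes:
    """One speech-in, speech-out turn with system-speaking-style switching."""
    # Style switching: select the target-speaker style for this dialogist
    # (hypothetical mapping; the paper derives the identity via YOLO face detection).
    style = {"elder_A": "style_A", "elder_B": "style_B"}.get(dialogist_id, "default")
    text = asr(audio_in)
    reply = llm_respond(text)
    return tts(reply, speaker_style=style)
```

In a real deployment each placeholder would wrap the corresponding model's inference call; the point of the sketch is only the control flow, in particular that the dialogist identity selects which fine-tuned TTS voice renders the reply.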
In this study, an initial Tacotron2 TTS model is first established using the Mandarin speech database "Biaobei"; a model fine-tuning procedure is then designed that uses small amounts of speech data from a specific target speaker to adjust the initial model parameters. For dialogist recognition, You Only Look Once (YOLO)-based face detection is performed to classify the dialogist's identity. Once the dialogist is recognized, the fine-tuned Tacotron2 model matched to that dialogist is used to perform speech synthesis. To evaluate the naturalness of the synthetic speech, several signal-analysis evaluation metrics, including Mel-cepstral distortion (MCD), linear prediction code distortion (LPCD), and peak signal-to-noise ratio (PSNR), are computed, and their effectiveness is investigated by comparison with the human-judged mean opinion score (MOS).
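As one concrete example of the signal-analysis metrics named above, the standard mel-cepstral distortion (MCD) computation can be sketched as follows. This is a generic sketch, not the paper's code: it assumes the reference and synthetic mel-cepstral frame sequences have already been extracted and time-aligned (e.g., by dynamic time warping), and it follows the common convention of excluding the 0th (energy) coefficient.

```python
import math

def mcd(mcep_ref, mcep_syn):
    """Mean mel-cepstral distortion (in dB) between two aligned sequences
    of mel-cepstral coefficient frames (equal-length lists of lists).

    Per frame: MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2),
    summed over coefficients d >= 1 (the 0th coefficient is excluded).
    """
    const = 10.0 / math.log(10.0)
    total = 0.0
    for ref_frame, syn_frame in zip(mcep_ref, mcep_syn):
        sq_dist = sum((r - s) ** 2 for r, s in zip(ref_frame[1:], syn_frame[1:]))
        total += const * math.sqrt(2.0 * sq_dist)
    return total / len(mcep_ref)
```

A lower MCD indicates synthetic speech whose spectral envelope is closer to the reference speaker's, which is why the paper compares such objective scores against the subjective MOS ratings.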

Corresponding author: Ing-Jr Ding


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Cite this article
Ing-Jr Ding, Po-Jung Chen, Xin-Bau Li, and Yih-Her Yan, Design of a Mandarin Spoken Dialogue System Using Tacotron2-based Speech Synthesis with Dialogist-aware System-speaking-style Switching, Sens. Mater., Vol. 38, No. 3, 2026, pp. 1447-1461.



Forthcoming Regular Issues


Forthcoming Special Issues

Special Issue on Novel Sensors, Materials, and Related Technologies on Artificial Intelligence of Things Applications
Guest editors: Teen-Hang Meen (National Formosa University), Wenbing Zhao (Cleveland State University), and Cheng-Fu Yang (National University of Kaohsiung)
Call for papers


Special Issue on Advanced GeoAI for Smart Cities: Novel Data Modeling with Multi-source Sensor Data
Guest editor: Prof. Changfeng Jing (China University of Geosciences Beijing)
Call for papers


Special Issue on Advanced Sensor Application Development
Guest editors: Shih-Chen Shi (National Cheng Kung University) and Tao-Hsing Chen (National Kaohsiung University of Science and Technology)
Call for papers


Special Issue on Mobile Computing and Ubiquitous Networking for Smart Society
Guest editors: Akira Uchiyama (The University of Osaka) and Jaehoon Paul Jeong (Sungkyunkwan University)
Call for papers


Special Issue on Advanced Materials and Technologies for Sensor and Artificial-Intelligence-of-Things Applications (Selected Papers from ICASI 2026)
Guest editor: Sheng-Joue Young (National Yunlin University of Science and Technology)
Conference website
Call for papers


Special Issue on Biosensing Devices
Guest editor: Kiyotaka Sasagawa (Nara Institute of Science and Technology)
Call for papers


Copyright(C) MYU K.K. All Rights Reserved.