pp. 2195-2204
S&M 3316 Research Paper of Special Issue
https://doi.org/10.18494/SAM4410
Published: July 13, 2023

Image Caption Generation Using Scoring Based on Object Detection and Word2Vec

Tadanobu Misawa, Nozomi Morizumi, and Kazuya Yamashita
(Received March 30, 2023; Accepted June 6, 2023)

Keywords: image caption generation, deep learning, object detection, Word2Vec, scoring
Generating descriptive text from images, known as caption generation, is a noteworthy research field with potential applications such as aiding the visually impaired. Recently, numerous methods based on deep learning have been proposed. Previous methods learn the relationship between image features and captions from a large dataset of image–caption pairs. However, it is difficult to correctly learn all objects, object attributes, and relationships between objects. As a result, incorrect captions are occasionally generated; for instance, a caption may describe objects that are not present in the image. In this study, we propose a scoring method using object detection and Word2Vec to output a caption that correctly describes the objects in the image. First, multiple candidate captions are generated. Subsequently, object detection is performed, and a score is calculated for each caption using the labels obtained from object detection and the nouns extracted from that caption. Finally, the caption with the highest score is output. Experimental evaluation on the Microsoft Common Objects in Context (MSCOCO) dataset demonstrates that the proposed method is effective in improving the accuracy of caption generation.
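The abstract describes the scoring step only at a high level. The Python sketch below illustrates one plausible reading of it, assuming a caption's score is the average Word2Vec similarity between each detected object label and its closest noun in the caption; the pretrained model file, the helper names, and the aggregation rule are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of object-detection/Word2Vec-based caption scoring.
# Assumptions (not from the paper): GoogleNews word2vec vectors, NLTK for
# noun extraction, and max-similarity-per-label averaged over all labels.
from gensim.models import KeyedVectors
import nltk  # assumes 'punkt' and 'averaged_perceptron_tagger' data are installed

# Hypothetical pretrained embedding file; any word2vec-format model would do.
word2vec = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)


def extract_nouns(caption: str) -> list[str]:
    """Extract the nouns from a caption using NLTK POS tagging."""
    tokens = nltk.word_tokenize(caption.lower())
    return [word for word, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]


def caption_score(detected_labels: list[str], caption: str) -> float:
    """Score a caption by how well its nouns cover the detected object labels."""
    nouns = [n for n in extract_nouns(caption) if n in word2vec]
    labels = [l for l in detected_labels if l in word2vec]
    if not nouns or not labels:
        return 0.0
    # For each detected label, take the similarity to its closest caption noun,
    # then average over all labels (assumed aggregation).
    return sum(
        max(word2vec.similarity(label, noun) for noun in nouns) for label in labels
    ) / len(labels)


def select_caption(detected_labels: list[str], candidates: list[str]) -> str:
    """Return the candidate caption with the highest score."""
    return max(candidates, key=lambda c: caption_score(detected_labels, c))


# Example usage with hypothetical detector output and candidate captions.
if __name__ == "__main__":
    labels = ["dog", "frisbee"]
    candidates = [
        "a dog jumping to catch a frisbee in a park",
        "a cat sitting on a couch next to a laptop",
    ]
    print(select_caption(labels, candidates))
```

Under these assumptions, a candidate whose nouns ("dog", "frisbee") align with the detector's labels outranks one describing absent objects, which is the behavior the abstract attributes to the scoring step.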
Corresponding author: Tadanobu Misawa
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article: Tadanobu Misawa, Nozomi Morizumi, and Kazuya Yamashita, Image Caption Generation Using Scoring Based on Object Detection and Word2Vec, Sens. Mater., Vol. 35, No. 7, 2023, pp. 2195-2204.