Image Captioning Based on Deep Neural Networks

Shuang Liu; Liang Bai; Yanli Hu; Haoran Wang

doi:10.1051/matecconf/201823201052

Open Access

Issue		MATEC Web Conf., Volume 232, 2018: 2018 2nd International Conference on Electronic Information Technology and Computer Engineering (EITCE 2018)


Article Number		01052
Number of page(s)		7
Section		Network Security System, Neural Network and Data Information
DOI		https://doi.org/10.1051/matecconf/201823201052
Published online		19 November 2018

MATEC Web of Conferences 232, 01052 (2018)

Shuang Liu, Liang Bai^a, Yanli Hu and Haoran Wang

College of Systems Engineering, National University of Defense Technology, 410073 Changsha, China

^a Corresponding author

Abstract

With the development of deep learning, the combination of computer vision and natural language processing has attracted great attention in the past few years. Image captioning is a representative task in this field: the computer learns to express its understanding of the visual content of an image in one or more sentences. Generating a meaningful description of high-level image semantics requires not only recognizing the objects and the scene, but also analyzing the states and attributes of these objects and the relationships among them. Although image captioning is a complicated and difficult task, many researchers have achieved significant improvements. In this paper, we mainly describe three image captioning methods using deep neural networks: CNN-RNN based, CNN-CNN based, and reinforcement-based frameworks. We then introduce representative work for each of these three methods, describe the evaluation metrics, and summarize the benefits and major challenges.

This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
