Multimodal Learning Based Spatial Relation Identification

Dash, Sandeep Kumar; Sureshchandra, Y. V.; Mishra, Yatharth; Pakray, Partha; Das, Ranjita; Gelbukh, Alexander


Document title:	Multimodal Learning Based Spatial Relation Identification
Journal:	Computación y sistemas
Database:
System number:	000560533
ISSN:	1405-5546
Authors:	Dash, Sandeep Kumar¹; Sureshchandra, Y. V.¹; Mishra, Yatharth¹; Pakray, Partha²; Das, Ranjita¹; Gelbukh, Alexander³
Institutions:	¹National Institute of Technology, Mizoram, India; ²National Institute of Technology, Silchar, India; ³Instituto Politécnico Nacional, Mexico
Year:	2020
Period:	Jul-Sep
Volume:	24
Issue:	3
Pages:	1327-1335
Country:	Mexico
Language:	English
Abstract (English):	Spatial relation identification is an integral part of spatial information retrieval. It deals with identifying objects that are spatially related by virtue of their physical orientation or placement with respect to each other. The concept is widely used in fields such as robotics and image caption generation. This work focuses on gathering information from multiple modalities, namely an image and its corresponding text, to strengthen the learning process for identifying spatial relation pairs in a given text. Two multimodal approaches are proposed. In the first approach, the information is explored as a sequential learning process in which the individual spatial roles are identified as connected entities, which makes spatial relation retrieval easy and efficient. To cope with the small size of the dataset and the need to avoid overfitting, an efficient neural network trained with backpropagation was used to classify the candidate roles and relations, with a different feature selection for each classification task. Building on the features selected in the first approach, the second approach uses transfer learning: an existing image caption generation model retrieves vital topic-based information from the image, which is then used for the task. Both approaches therefore draw on information from two modalities to train their respective systems. The model achieves state-of-the-art precision for the identification of two of the spatial roles. This validates the advantage of multimodal learning over partially multimodal processes.
Disciplines:	Computer science
Keywords:	Spatial role labeling, Spatial relation identification, Multimodal learning, Multilayer perceptron, Artificial intelligence