Visual Question Answering for Response Synthesis Based on Spatial Actions

The paper addresses the problem of automatically analyzing a user’s natural-language query about an image. The proposed mechanism synthesizes a logically correct non-binary response by combining the outputs of convolutional and recurrent networks and projecting them onto a set of valid answers. A three-dimensional data set has been developed for searching for an answer in a complex environment using a robotic arm. Examples of similar systems are given and compared. The experimental results show that the method achieves performance comparable to that of known models. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
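
The abstract describes combining image and question features and projecting the result onto a set of valid answers. The following is only a minimal sketch of that general idea, not the authors' implementation: the answer list `ANSWERS`, the element-wise product used for fusion, the single linear projection, and the random stand-in feature vectors are all assumptions made for illustration. In a real system the two vectors would come from trained convolutional and recurrent networks.

```python
import math
import random

# Hypothetical answer set; the paper's actual set of valid answers is not given here.
ANSWERS = ["red cube", "blue sphere", "on the table", "left of the arm"]

def fuse(image_features, question_features):
    """Combine the two feature vectors by element-wise product (an assumed fusion rule)."""
    if len(image_features) != len(question_features):
        raise ValueError("feature vectors must have the same length")
    return [i * q for i, q in zip(image_features, question_features)]

def project(fused, weights, bias):
    """Linear projection of the fused vector onto one score per candidate answer."""
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Numerically stable softmax over the answer scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def answer(image_features, question_features, weights, bias):
    """Return the most probable answer and the full probability distribution."""
    fused = fuse(image_features, question_features)
    probs = softmax(project(fused, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return ANSWERS[best], probs

# Random stand-ins for CNN (image) and RNN (question) outputs.
rng = random.Random(0)
dim = 8
image_vec = [rng.uniform(-1, 1) for _ in range(dim)]
question_vec = [rng.uniform(-1, 1) for _ in range(dim)]
W = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in ANSWERS]
b = [0.0] * len(ANSWERS)

label, probs = answer(image_vec, question_vec, W, b)
print(label, [round(p, 3) for p in probs])
```

Because the output is a distribution over a fixed answer set rather than a yes/no decision, this kind of projection naturally yields non-binary responses; the choice of fusion rule and projection is left open here.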

Authors
Kiselev G., Weizenfeld D., Gorbunova Y.
Publisher
Springer Science and Business Media Deutschland GmbH
Language
English
Pages
52-62
Status
Published
Volume
1748 CCIS
Year
2023
Organizations
  • 1 Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Street, Moscow, 117198, Russian Federation
  • 2 Artificial Intelligence Research Institute, FRC CSC RAS, 44 Vavilova Street, Moscow, 119333, Russian Federation
Keywords
Computer science; Computer vision; Machine learning; Neural networks
