Visual Question Answering for Response Synthesis Based on Spatial Actions

The paper considers the problem of automatically analyzing a user’s natural-language query about an image. The proposed mechanism synthesizes a logically correct, non-binary response. The response is synthesized by combining the outputs of convolutional and recurrent networks and projecting the result onto a set of valid answers. A three-dimensional data set has been developed for searching for an answer in a complex environment using a robotic arm. Examples of similar systems are given and compared. The experimental results show that the method achieves metrics comparable to those of known models. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
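The abstract does not specify how the two network outputs are combined or how the projection is computed. As a minimal sketch only, assuming element-wise multiplication for the fusion and a softmax over dot-product scores for the projection (both are illustrative choices, not the authors' stated method), the idea can be written as follows. The feature values, the answer labels, and the names `fuse`, `project_on_answers`, and `answer` are all hypothetical.

```python
import math


def fuse(image_features, question_features):
    """Combine two equal-length feature vectors by element-wise product.

    Assumption: the record does not state the fusion operator; the product
    is used here purely for illustration.
    """
    if len(image_features) != len(question_features):
        raise ValueError("feature vectors must have the same length")
    return [i * q for i, q in zip(image_features, question_features)]


def project_on_answers(fused, answer_weights):
    """Score every valid answer by a dot product and normalise with softmax."""
    scores = {
        label: sum(w * f for w, f in zip(weights, fused))
        for label, weights in answer_weights.items()
    }
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def answer(image_features, question_features, answer_weights):
    """Return the most probable label from the fixed set of valid answers."""
    probs = project_on_answers(fuse(image_features, question_features), answer_weights)
    return max(probs, key=probs.get)


# Toy stand-ins for the CNN (image) and RNN (question) outputs.
image_features = [0.9, 0.1, 0.4]
question_features = [1.0, 0.2, 0.5]

# Hypothetical non-binary answer set with one weight vector per answer.
answer_weights = {
    "red cube": [1.0, 0.0, 0.0],
    "left of the arm": [0.0, 1.0, 0.0],
    "two": [0.0, 0.0, 1.0],
}

print(answer(image_features, question_features, answer_weights))
```

Restricting the output to a fixed answer set turns response synthesis into classification over that set, which is one plausible reading of "projection on a set of valid answers".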

Authors
Kiselev G., Weizenfeld D., Gorbunova Y.
Conference proceedings
Publisher
Springer Science and Business Media Deutschland GmbH
Language
English
Pages
52-62
Status
Published
Volume
1748 CCIS
Year
2023
Organizations
  • 1 Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Street, Moscow, 117198, Russian Federation
  • 2 Artificial Intelligence Research Institute, FRC CSC RAS, 44 Vavilova Street, Moscow, 119333, Russian Federation
Keywords
Computer science; Computer vision; Machine learning; Neural networks