上国际互联网加速软件

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

  • 265,016 images (COCO and abstract scenes)
  • At least 3 questions (5.4 questions on average) per image
  • 10 ground truth answers per question
  • 3 plausible (but likely incorrect) answers per question
  • Automatic evaluation metric

  • Subscribe to our group for updates!

    上国际互联网加速软件

    Details on downloading the latest dataset may be found on the download webpage.

  • April 2017: Full release (v2.0)

  • Balanced Real Images
  • 204,721 COCO images
    (all of current train/val/test)
  • 1,105,904 questions
  • 11,059,040 ground truth answers
  • March 2017: Beta v1.9 release

  • October 2015: Full release (v1.0)

  • July 2015: Beta v0.9 release

  • June 2015: Beta v0.1 release


  • 上国际互联网加速软件

    Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (CVPR 2017)




    Download the paper

    BibTeX

    上国际互联网加速软件

    Yin and Yang: Balancing and Answering Binary Visual Questions (CVPR 2016)




    Download the paper

    BibTeX

    上国际互联网加速软件

    VQA: Visual Question Answering (ICCV 2015)




    Download the paper

    BibTeX


    上国际互联网加速软件


    上国际互联网加速软件

    Any feedback is very welcome! Please send it to visualqa@gmail.com.