Description: Multimodal Large Language Models (MLLMs) have demonstrated state-of-the-art capabilities in various tasks involving both images and text, including visual question answering. However, it remains unclear whether these MLLMs can answer information-seeking questions about an image, such as "When was this church built?".
In this talk, I will first introduce InfoSeek, a dataset tailored for visual information-seeking questions that cannot be answered using common sense knowledge alone. I will then present insights into the generalization and instruction tuning of MLLMs using InfoSeek. Finally, I will discuss what the future holds for multimodal retrieval models and how MLLM-powered generative search engines could transform existing search experiences.
Project page: https://open-vision-language.github.io/infoseek/