WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

WebVoyager is a new vision-enabled web-browsing agent that works from browser screenshots annotated with “Set-of-Mark” prompting to conduct research, analyze images, and perform other tasks.

In this video, we will show you how to build WebVoyager using LangGraph, an open-source framework for building stateful, multi-actor AI applications.
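
To give a rough idea of what Set-of-Mark prompting involves, here is a minimal sketch in plain Python. It assumes the interactive elements of a page have already been extracted with their bounding boxes. The `Element` class, `mark_elements`, and `describe_marks` are hypothetical names made up for this illustration and are not taken from the linked notebook. In the actual approach the numbered marks are also drawn onto the screenshot; this sketch covers only the numbering and the text list that could accompany the image.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical interactive page element with a pixel bounding box."""
    tag: str
    text: str
    x: float
    y: float
    width: float
    height: float


def mark_elements(elements):
    """Give each visible element a numeric mark, in top-to-bottom, left-to-right order."""
    # Elements with no area cannot be seen in a screenshot, so they get no mark.
    visible = [e for e in elements if e.width > 0 and e.height > 0]
    ordered = sorted(visible, key=lambda e: (e.y, e.x))
    return {label: e for label, e in enumerate(ordered)}


def describe_marks(marks):
    """Render the marks as text so a model could refer to elements by number."""
    return "\n".join(
        f'[{label}] <{e.tag}> "{e.text}"' for label, e in marks.items()
    )


# Illustrative data only; a real agent would read these from the live page.
elements = [
    Element("button", "Search", 300, 40, 80, 30),
    Element("input", "Query", 20, 40, 260, 30),
    Element("a", "Hidden link", 0, 0, 0, 0),
    Element("a", "About", 20, 100, 60, 20),
]

marks = mark_elements(elements)
prompt_text = describe_marks(marks)
print(prompt_text)
```

With marks like these, the model can answer with an action such as "click 1" instead of guessing pixel coordinates, and the agent looks up the bounding box for that number.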

Links:

– Python Code: https://github.com/langchain-ai/langgraph/blob/main/examples/web-navigation/web_voyager.ipynb
– WebVoyager Paper: https://arxiv.org/abs/2401.13919
– Set-of-Mark Paper: https://arxiv.org/abs/2310.11441

Developing AI applications is easier with LangSmith. Create a free account at https://smith.langchain.com/.

New to LangGraph? Check out the intro video: https://www.youtube.com/watch?v=5h-JBkySK34
