Join the Regional Asia Group as they host Andy Zou to present:
“Universal and Transferable Adversarial Attacks on Aligned Language Models”
Project page: https://llm-attacks.org/
arXiv: https://arxiv.org/abs/2307.15043


Abstract: Because “out-of-the-box” large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures — so-called “jailbreaks” against LLMs — these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods.
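For readers curious about the mechanics, below is a minimal, illustrative sketch of the kind of greedy plus gradient-guided suffix search the abstract describes, written against a generic HuggingFace-style causal LM. The model name, placeholder prompt and target strings, and hyperparameters are assumptions made here for illustration only and are not the paper's actual implementation; see the project page linked above for the authors' code.

```python
# Minimal sketch of a greedy + gradient-guided adversarial suffix search,
# assuming a HuggingFace-style causal LM. Model, strings, and hyperparameters
# are illustrative placeholders, not the authors' setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration only
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Write the query here."            # placeholder query
target = " Sure, here is how to do that:"   # affirmative response to maximize
suffix_len, top_k, n_candidates, n_steps = 10, 64, 32, 20

prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
target_ids = tok(target, return_tensors="pt").input_ids[0]
suffix_ids = torch.full((suffix_len,), tok.encode("!")[0], dtype=torch.long)
emb_matrix = model.get_input_embeddings().weight  # (vocab_size, hidden_dim)

def target_loss(suffix):
    """Cross-entropy of the target completion given prompt + adversarial suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids]).unsqueeze(0)
    labels = ids.clone()
    labels[0, : len(prompt_ids) + len(suffix)] = -100  # score only the target span
    return model(input_ids=ids, labels=labels).loss

for step in range(n_steps):
    # 1. Gradient of the target loss w.r.t. a one-hot relaxation of the suffix.
    one_hot = torch.nn.functional.one_hot(suffix_ids, emb_matrix.shape[0]).float()
    one_hot.requires_grad_(True)
    embeds = torch.cat([
        model.get_input_embeddings()(prompt_ids).detach(),
        one_hot @ emb_matrix,  # differentiable suffix embeddings
        model.get_input_embeddings()(target_ids).detach(),
    ]).unsqueeze(0)
    labels = torch.cat([prompt_ids, suffix_ids, target_ids]).unsqueeze(0).clone()
    labels[0, : len(prompt_ids) + suffix_len] = -100
    model(inputs_embeds=embeds, labels=labels).loss.backward()

    # 2. For each suffix position, keep the top-k token swaps by negative gradient.
    candidates = (-one_hot.grad).topk(top_k, dim=1).indices  # (suffix_len, top_k)

    # 3. Greedily try random single-token swaps and keep the best-scoring suffix.
    best_loss, best_suffix = target_loss(suffix_ids).item(), suffix_ids
    for _ in range(n_candidates):
        pos = torch.randint(suffix_len, (1,)).item()
        cand = suffix_ids.clone()
        cand[pos] = candidates[pos, torch.randint(top_k, (1,)).item()]
        with torch.no_grad():
            loss = target_loss(cand).item()
        if loss < best_loss:
            best_loss, best_suffix = loss, cand
    suffix_ids = best_suffix
    print(f"step {step}: loss={best_loss:.3f} suffix={tok.decode(suffix_ids)!r}")
```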

Speaker Introduction: Andy Zou is a first-year PhD student in the Computer Science Department at CMU, advised by Zico Kolter and Matt Fredrikson, and a co-founder of safe.ai. He is interested in AI safety. He completed his MS and BS at UC Berkeley, where he was advised by Dawn Song and Jacob Steinhardt.
