On August 8, artificial intelligence company Anthropic announced the launch of an enhanced bug bounty program, offering rewards up to $15,000 for those who can successfully “jailbreak” its unreleased “next generation” AI model.
Anthropic’s leading AI model, Claude-3, is a generative AI system akin to OpenAI’s ChatGPT and Google’s Gemini. To ensure that Claude and its other models operate safely, the company employs a practice known as “red teaming.”
**Red Teaming**
Red teaming involves deliberately attempting to breach a system. In the context of Claude, the goal is to identify all the potential ways the model can be manipulated to produce undesired outputs. During these red teaming activities, engineers might rephrase inquiries or alter a question’s context to mislead the AI into revealing information it is designed to withhold.
For instance, an AI trained on internet-sourced data may inadvertently include personally identifiable information about individuals. To safeguard against this, Anthropic has established protocols that prevent Claude and its other models from disclosing such sensitive data.
As AI models grow increasingly sophisticated and adept at mimicking human conversation, the challenge of anticipating every potential unwanted output becomes vastly more complex.
**Bug Bounty**
Anthropic has introduced several innovative safety measures within its models, including a “Constitutional AI” framework, but gaining external perspectives on persistent challenges is always beneficial. According to a blog post from the company, this latest initiative will broaden its existing bug bounty programs to focus specifically on universal jailbreak attacks.
The program will accept a limited number of participants, prioritizing AI researchers with relevant experience, particularly those who have previously shown proficiency in identifying jailbreaks in language models. Applications are due by Friday, August 16.
Not every applicant will be chosen, but the company intends to widen this initiative in the future. Selected participants will gain early access to the unreleased “next generation” AI model for red teaming activities.
**Related:**
Tech companies send letter to the EU requesting additional time to comply with the AI Act.