Background

A year ago,

our research group published the first automated jailbreaking method for large language models [link to NYT], a result striking enough to be dubbed “the mother of all jailbreaks” [link to X]. In its wake, adversarial robustness drew renewed attention, sparking a gold rush of research into both jailbreaks and defenses. We are proud to have been at the forefront of accelerating awareness and research in AI safety and security.

Yet,

despite this progress, these vulnerabilities are far from resolved. New jailbreaks are reported almost daily against the latest commercial models from companies like OpenAI and Google. This raises a critical question: can these systems be deployed to a high standard of safety and reliability, especially when faced with determined adversaries intent on exploiting them?

Enter Gray Swan AI.

Over the past year, we’ve poured our efforts into developing what we believe is an unparalleled solution. We are proud to present what may be the world’s safest and most secure Large Language Model (LLM), a model that not only outshines industry competitors but also sets a new standard for security. Our internal evaluations show a strong balance between standard performance and security: the model outperforms models like LLaMA-3-8B on standard benchmarks while avoiding the over-refusal issues that plague other models [link to Cygnet].

Now,

we invite the community to test our claims in the ultimate jailbreaking challenge. This will be the first in a series of events aimed at identifying the world's most secure LLM. Along the way, we'll also be tracking the most ingenious hackers of the AI era.

Are you ready to take on the challenge?