A necessary a part of delivery software program securely is crimson teaming. It broadly refers back to the observe of emulating real-world adversaries and their instruments, techniques, and procedures to determine dangers, uncover blind spots, validate assumptions, and enhance the general safety posture of techniques. Microsoft has a wealthy historical past of crimson teaming rising know-how with a objective of proactively figuring out failures within the know-how. As AI techniques grew to become extra prevalent, in 2018, Microsoft established the AI Pink Workforce: a gaggle of interdisciplinary specialists devoted to considering like attackers and probing AI techniques for failures.
We’re sharing finest practices from our workforce so others can profit from Microsoft’s learnings. These finest practices may also help safety groups proactively hunt for failures in AI techniques, outline a defense-in-depth method, and create a plan to evolve and develop your safety posture as generative AI techniques evolve.
The observe of AI crimson teaming has developed to tackle a extra expanded which means: it not solely covers probing for safety vulnerabilities, but additionally contains probing for different system failures, such because the era of probably dangerous content material. AI techniques include new dangers, and crimson teaming is core to understanding these novel dangers, comparable to immediate injection and producing ungrounded content material. AI crimson teaming isn’t just a pleasant to have at Microsoft; it’s a cornerstone to accountable AI by design: as Microsoft President and Vice Chair, Brad Smith, introduced, Microsoft just lately dedicated that every one high-risk AI techniques will undergo impartial crimson teaming earlier than deployment.
The objective of this weblog is to contextualize for safety professionals how AI crimson teaming intersects with conventional crimson teaming, and the place it differs. This, we hope, will empower extra organizations to crimson workforce their very own AI techniques in addition to present insights into leveraging their present conventional crimson groups and AI groups higher.
Pink teaming helps make AI implementation safer
Over the past a number of years, Microsoft’s AI Pink Workforce has constantly created and shared content material to empower safety professionals to suppose comprehensively and proactively about implement AI securely. In October 2020, Microsoft collaborated with MITRE in addition to trade and educational companions to develop and launch the Adversarial Machine Studying Menace Matrix, a framework for empowering safety analysts to detect, reply, and remediate threats. Additionally in 2020, we created and open sourced Microsoft Counterfit, an automation device for safety testing AI techniques to assist the entire trade enhance the safety of AI options. Following that, we launched the AI safety threat evaluation framework in 2021 to assist organizations mature their safety practices across the safety of AI techniques, along with updating Counterfit. Earlier this yr, we introduced further collaborations with key companions to assist organizations perceive the dangers related to AI techniques in order that organizations can use them safely, together with the combination of Counterfit into MITRE tooling, and collaborations with Hugging Face on an AI-specific safety scanner that’s out there on GitHub.

Safety-related AI crimson teaming is an element of a bigger accountable AI (RAI) crimson teaming effort that focuses on Microsoft’s AI rules of equity, reliability and security, privateness and safety, inclusiveness, transparency, and accountability. The collective work has had a direct influence on the way in which we ship AI merchandise to our clients. As an example, earlier than the brand new Bing chat expertise was launched, a workforce of dozens of safety and accountable AI specialists throughout the corporate spent lots of of hours probing for novel safety and accountable AI dangers. This was in addition to the common, intensive software program safety practices adopted by the workforce, in addition to crimson teaming the bottom GPT-4 mannequin by RAI specialists prematurely of creating Bing Chat. Our crimson teaming findings knowledgeable the systematic measurement of those dangers and constructed scoped mitigations earlier than the product shipped.
Steerage and sources for crimson teaming
AI crimson teaming usually takes place at two ranges: on the base mannequin degree (e.g., GPT-4) or on the software degree (e.g., Safety Copilot, which makes use of GPT-4 within the again finish). Each ranges carry their very own benefits: for example, crimson teaming the mannequin helps to determine early within the course of how fashions might be misused, to scope capabilities of the mannequin, and to grasp the mannequin’s limitations. These insights might be fed into the mannequin growth course of to enhance future mannequin variations but additionally get a jump-start on which functions it’s most suited to. Software-level AI crimson teaming takes a system view, of which the bottom mannequin is one half. As an example, when AI crimson teaming Bing Chat, the complete search expertise powered by GPT-4 was in scope and was probed for failures. This helps to determine failures past simply the model-level security mechanisms, by together with the general software particular security triggers.

Collectively, probing for each safety and accountable AI dangers gives a single snapshot of how threats and even benign utilization of the system can compromise the integrity, confidentiality, availability, and accountability of AI techniques. This mixed view of safety and accountable AI gives useful insights not simply in proactively figuring out points, but additionally to grasp their prevalence within the system by means of measurement and inform methods for mitigation. Under are key learnings which have helped form Microsoft’s AI Pink Workforce program.
- AI crimson teaming is extra expansive. AI crimson teaming is now an umbrella time period for probing each safety and RAI outcomes. AI crimson teaming intersects with conventional crimson teaming objectives in that the safety part focuses on mannequin as a vector. So, among the objectives might embody, for example, to steal the underlying mannequin. However AI techniques additionally inherit new safety vulnerabilities, comparable to immediate injection and poisoning, which want particular consideration. Along with the safety objectives, AI crimson teaming additionally contains probing for outcomes comparable to equity points (e.g., stereotyping) and dangerous content material (e.g., glorification of violence). AI crimson teaming helps determine these points early so we will prioritize our protection investments appropriately.
- AI crimson teaming focuses on failures from each malicious and benign personas. Take the case of crimson teaming new Bing. Within the new Bing, AI crimson teaming not solely targeted on how a malicious adversary can subvert the AI system through security-focused methods and exploits, but additionally on how the system can generate problematic and dangerous content material when common customers work together with the system. So, not like conventional safety crimson teaming, which largely focuses on solely malicious adversaries, AI crimson teaming considers broader set of personas and failures.
- AI techniques are always evolving. AI functions routinely change. As an example, within the case of a big language mannequin software, builders might change the metaprompt (underlying directions to the ML mannequin) based mostly on suggestions. Whereas conventional software program techniques additionally change, in our expertise, AI techniques change at a quicker charge. Thus, you will need to pursue a number of rounds of crimson teaming of AI techniques and to ascertain systematic, automated measurement and monitor techniques over time.
- Pink teaming generative AI techniques requires a number of makes an attempt. In a standard crimson teaming engagement, utilizing a device or method at two completely different time factors on the identical enter, would all the time produce the identical output. In different phrases, usually, conventional crimson teaming is deterministic. Generative AI techniques, then again, are probabilistic. Because of this working the identical enter twice might present completely different outputs. That is by design as a result of the probabilistic nature of generative AI permits for a wider vary in artistic output. This additionally makes it tough to crimson teaming since a immediate might not result in failure within the first try, however achieve success (in surfacing safety threats or RAI harms) within the succeeding try. A method we’ve accounted for that is, as Brad Smith talked about in his weblog, to pursue a number of rounds of crimson teaming in the identical operation. Microsoft has additionally invested in automation that helps to scale our operations and a systemic measurement technique that quantifies the extent of the danger.
- Mitigating AI failures requires protection in depth. Similar to in conventional safety the place an issue like phishing requires a wide range of technical mitigations comparable to hardening the host to neatly figuring out malicious URIs, fixing failures discovered through AI crimson teaming requires a defense-in-depth method, too. This includes the usage of classifiers to flag probably dangerous content material to utilizing metaprompt to information conduct to limiting conversational drift in conversational situations.
Constructing know-how responsibly and securely is in Microsoft’s DNA. Final yr, Microsoft celebrated the 20-year anniversary of the Reliable Computing memo that requested Microsoft to ship merchandise “as out there, dependable and safe as customary providers comparable to electrical energy, water providers, and telephony.” AI is shaping as much as be probably the most transformational know-how of the twenty first century. And like several new know-how, AI is topic to novel threats. Incomes buyer belief by safeguarding our merchandise stays a tenet as we enter this new period – and the AI Pink Workforce is entrance and heart of this effort. We hope this weblog submit conjures up others to responsibly and safely combine AI through crimson teaming.
Sources
AI crimson teaming is a part of the broader Microsoft technique to ship AI techniques securely and responsibly. Listed here are another sources to offer insights into this course of:
- For purchasers who’re constructing functions utilizing Azure OpenAI fashions, we launched a information to assist them assemble an AI crimson workforce, outline scope and objectives, and execute on the deliverables.
- For safety incident responders, we launched a bug bar to systematically triage assaults on ML techniques.
- For ML engineers, we launched a guidelines to finish AI threat evaluation.
- For builders, we launched risk modeling steerage particularly for ML techniques.
- For anybody interested by studying extra about accountable AI, we’ve launched a model of our Accountable AI Normal and Influence Evaluation, amongst different sources.
- For engineers and policymakers, Microsoft, in collaboration with Berkman Klein Heart at Harvard College, launched a taxonomy documenting numerous machine studying failure modes.
- For the broader safety group, Microsoft hosted the annual Machine Studying Evasion Competitors.
- For Azure Machine Studying clients, we supplied steerage on enterprise safety and governance.
Contributions from Steph Ballard, Forough Poursabzi, Amanda Minnich, Gary Lopez Munoz, and Chang Kawaguchi.