garak — LLM Vulnerability Scanner — GitHub Trending Open Source Project | 2026-06-01
文章目录
- garak (Generative AI Red-teaming & Assessment Kit) is an open-source vulnerability scanner built by NVIDIA, designed specifically to find security weaknesses in Large Language Models. Written in Python, it has accumulated nearly 8,000 GitHub stars and is actively maintained with over 300 open issues and lively community discussions. Think of it as nmap or Metasploit for the LLM world — a systematic framework that automatically probes AI models for hallucination, data leakage, prompt injection, misinformation, toxicity, jailbreaks, and many other failure modes. What makes it stand out from typical benchmark suites is that garak focuses on adversarial testing: it deliberately tries to make an LLM misbehave in ways its developers did not intend. The project has been published with an arXiv paper and was presented at DEF CON. It runs entirely from the command line, supports dozens of LLM providers (OpenAI, Hugging Face, AWS Bedrock, local gguf models via llama.cpp, and more), and produces structured JSONL reports making it easy to integrate into CI/CD pipelines or continuous monitoring workflows.
- Here is the uncomfortable truth: most LLM deployments in production today have never been systematically security-tested. Teams rush to ship AI features, but security red-teaming is often treated as an afterthought. This is especially dangerous because LLMs have a much larger attack surface than traditional software. A model can be manipulated through carefully crafted prompts to leak sensitive system prompts, generate malware code, produce toxic content, or help users bypass safety guardrails. garak addresses this gap in an elegant, automated way. Rather than manually crafting adversarial prompts and hoping you find something interesting, garak provides a pluggable framework of probes — each representing a different attack vector — that can be run against any supported LLM in minutes. The results tell you exactly what percentage of attempts succeeded (i.e., the model misbehaved) and give you the specific prompts that caused the failures. This is invaluable whether you are an AI startup deploying a chat API, an enterprise integrating an LLM into internal tools, or a researcher evaluating a new model before publication. From a developer experience perspective, garak strikes a rare balance: it is powerful enough for security researchers doing advanced red-teaming, yet accessible enough for any Python developer to run their first scan in under five minutes. The architecture is clean and extensible, so if the built-in probes do not cover your specific use case, you can write your own plugin in a few dozen lines of code.
- At its core, garak is a command-line tool that takes two inputs: a generator (which LLM to test) and one or more probes (which attacks to run). Each probe sends a batch of crafted prompts to the model and feeds the responses to detectors that determine whether the output triggered a vulnerability. The tool then generates a structured report showing pass/fail rates for each probe-detector combination. The scope of built-in probes is impressive. It covers encoding-based prompt injection (where text is obfuscated to bypass safety filters), DAN-style jailbreaks (the famous "Do Anything Now" attacks), system prompt extraction (testing whether the model will reveal its system prompt under social engineering), hallucination triggers like package hallucination (generating code that references non-existent insecure packages), malware generation attempts, XSS vectors, and even multi-turn agentic attacks through the new Agent Breaker probe. The project maintains a comprehensive user guide and a Discord community for questions and collaboration.
- 1. Pre-deployment security auditing. Before shipping any LLM-powered feature to real users, run garak against your model to establish a vulnerability baseline. You will get a clear picture of which attack categories your model is most susceptible to, and you can repeat the scan after safety fine-tuning to measure improvement. 2. Evaluating third-party models. If you are considering integrating a new API provider or open-source model, garak lets you benchmark its safety properties in a standardized way. Compare results across models to make an informed decision rather than relying on the vendor's self-reported safety claims. 3. Continuous monitoring in CI/CD. Since garak outputs JSONL reports, you can integrate it into your deployment pipeline. Run a targeted probe suite on every model update; if the failure rate spikes beyond a threshold, fail the deployment and flag the security team for review. 4. Research into LLM safety. Academic researchers can use garak's modular architecture to study specific failure modes, compare attack success rates across model families, or develop and evaluate new defense mechanisms.
- Here is a hands-on quick-start guide based on the official documentation. The example below shows how to scan an OpenAI model for encoding-based prompt injection vulnerabilities — a common real-world attack vector where malicious content is hidden inside encoded text. We assume you have Python 3.10+ installed. # Step 1: Install garak via pip python -m pip install -U garak # Step 2: Set your OpenAI API key export OPENAI_API_KEY="sk-your-key-here" # Step 3: List available probes to find the right one python3 -m garak --list_probes | grep encoding # You will see: garak.probes.encoding (encoding-based injection attacks) # Step 4: Run the encoding probe against your model python3 -m garak \ --target_type openai \ --target_name gpt-4o \ --probes encoding # Step 5: Read the results # garak prints a progress bar during scanning, then shows a table: # Probe | Detector | PASS | FAIL | Total # encoding | av_scanner | 830 | 10 | 840 # ... # The JSONL report is saved to a file like garak_report_20250601.jsonl From personal experience, the first scan against a well-aligned model like GPT-4o usually yields a very low failure rate (often 0-5%) on most probes, which is reassuring. However, running against smaller or less-aligned open-source models often produces surprising results — models that appear safe in normal conversation can be surprisingly vulnerable to specific attack patterns. This is exactly why systematic testing matters. If you prefer testing locally without sending data to OpenAI, you can run garak against a local GGUF model via llama.cpp: # Install llama.cpp and download a model first, then: export GGML_MAIN_PATH="/path/to/llama.cpp/main" export GGML_MODEL_PATH="/path/to/models/llama-7b.q4_0.gguf" python3 -m garak \ --target_type ggml \ --target_name /path/to/model.gguf \ --probes dan
- 1. Pluggable architecture with 30+ built-in probes. From simple blank-prompt tests to sophisticated multi-turn agent attacks, every component (generator, probe, detector, evaluator) is a plugin. You can mix and match, or drop in your own implementation by inheriting from a base class. The system prompt extraction probe (added in recent releases) is particularly noteworthy — it loads real-world system prompts from HuggingFace datasets and tests 25+ extraction attack techniques from published research, including Riley Goodside-style attacks and advanced prompt exfiltration methods. 2. Multi-provider support without vendor lock-in. garak supports OpenAI, Hugging Face Hub, AWS Bedrock, Replicate, Cohere, Groq, NVIDIA NIM, and any REST endpoint — all through a unified generator plugin system. Switching models is a single command-line flag change. This means you can run the exact same security probe against five different models and get comparable, directly comparable results. 3. Real-time and WebSocket testing. A recent major addition (Issue #1379) added a WebSocket generator, enabling garak to test WebSocket-based real-time LLM services. This is crucial because many modern chat applications use WebSocket connections for streaming responses, a protocol that traditional HTTP-based testing tools cannot easily handle. The generator supports multiple authentication methods (Basic Auth, Bearer tokens, custom headers) and even handles typing indicators. 4. Structured JSONL output for pipeline integration. Every scan produces a detailed .jsonl file that can be fed into analysis scripts or loaded into a SIEM/dashboard. The project includes a built-in analysis script that identifies the probes and prompts with the highest hit rates, making it easy to triage findings.
- ⭐ 7,982 GitHub Stars | 📆 Pushed: 2026-05-29 | 👀 Active Issues: 315
- Compared to IBM Adversarial Robustness Toolbox (ART), garak is more focused on LLMs specifically, whereas ART covers a broader range of ML models including image classifiers. ART's LLM coverage is more limited and the tool is less actively developed for new attack techniques. Compared to Microsoft Azure AI Content Safety, garak is open-source, self-hosted, and does not require any cloud subscription — you run it entirely on your own infrastructure, which is essential for testing proprietary models behind a firewall. The trade-off is that garak requires more technical setup, while Azure AI Content Safety offers a managed API with pre-built dashboards and reporting. For teams already using PromptInject or LLM-Fuzzer, garak can complement rather than replace those tools. garak's strength is its breadth: it absorbs techniques from multiple research directions into a single coherent framework, so you do not need to stitch together a dozen separate scripts to get comprehensive coverage.
- Issue #1379 — WebSocket Generator for Real-Time LLM Testing (10 comments)The community has been requesting WebSocket support for months, since many commercial LLM APIs now use WebSocket connections for streaming chat responses. One contributor implemented a full WebSocket generator plugin that handles multiple authentication methods and template-based messaging. A reviewer praised the design for being consistent with the REST generator patterns, making it easy for users to switch between HTTP and WebSocket modes without changing their probe configurations. This feature alone significantly extends garak's applicability to real-world production environments. Issue #1538 — System Prompt Extraction Probe (7 comments)This PR sparked a lively discussion about the ethics and practicality of system prompt extraction. One commenter noted that some commercially deployed models use extremely detailed system prompts to enforce brand voice and safety policies — if those prompts get extracted, competitors could replicate the behavior for free. The implementation loads real-world prompts from HuggingFace datasets and tests over 25 extraction techniques. A reviewer suggested adding a "defense score" that measures not just whether extraction succeeded but how many characters of the system prompt were leaked — a nuanced metric that would help security teams prioritize which prompts to harden. Issue #1628 — Agent Breaker: Multi-Turn Agentic Attack Probe (5 comments)The most cutting-edge addition discussed in the repo. Agent Breaker is a new probe designed specifically for agentic LLM applications — systems that use tools like code execution, database queries, and API calls. The approach uses a red-team model that automatically discovers what tools the target agent has, generates targeted exploits, and adapts its strategy across multiple conversation turns based on what worked and what did not. One commenter observed that this is the kind of attack that will become increasingly relevant as more enterprises deploy AI agents in production — traditional single-turn safety benchmarks completely miss this attack surface.
- Q: garak returns a "generator not found" error even though I installed the package.A: garak uses dynamic module loading, so some optional generators (like the AWS Bedrock or WebSocket generators) require additional Python dependencies. Run python -m pip install garak[bedrock] or python -m pip install garak[websockets] for the extras you need. Check python3 -m garak --list_generators after installation to confirm your target type is available. Q: The scan seems to run forever. Is there a way to limit the number of attempts?A: Yes, use the --generations flag to control how many response generations are produced per prompt (the default is 10). For a quick smoke test, try --generations 2 to get results in seconds instead of minutes. For production benchmarks, use the default of 10 to get statistically meaningful failure rates. Q: My local GGUF model gets killed by the OS during garak scanning. What gives?A: Large models loaded into memory via llama.cpp can exhaust RAM, especially with high --generations values. Try using a smaller quantization (e.g., Q4_K_M instead of Q8_0), reduce --generations, or limit parallel requests with --parallel 1. Also make sure you have at least 8GB of free RAM for 7B models and proportionally more for larger ones.
- garak fills a critical gap in the AI security ecosystem: the systematic, automated red-teaming of LLM deployments. Its pluggable architecture, broad model support, and continuously expanding probe library make it the most comprehensive open-source tool for this purpose. Whether you are a security researcher, an AI startup deploying models at scale, or an enterprise IT team responsible for AI governance, garak gives you the visibility you need into how your models behave under adversarial conditions — before your users (or attackers) discover the vulnerabilities for you. The project is actively developed by NVIDIA's team with strong community contributions, and the Discord channel provides excellent support. If you are serious about LLM security, it belongs in your toolkit alongside your favorite static analyzer and your CI/CD pipeline.
- GitHub Repository — Star, fork, and contribute Official Documentation — Full user guide Discord Community — Join for support and discussion arXiv Paper 🔗 More GitHub Trending Open Source Security Projects
garak (Generative AI Red-teaming & Assessment Kit) is an open-source vulnerability scanner built by NVIDIA, designed specifically to find security weaknesses in Large Language Models. Written in Python, it has accumulated nearly 8,000 GitHub stars and is actively maintained with over 300 open issues and lively community discussions. Think of it as nmap or Metasploit for the LLM world — a systematic framework that automatically probes AI models for hallucination, data leakage, prompt injection, misinformation, toxicity, jailbreaks, and many other failure modes. What makes it stand out from typical benchmark suites is that garak focuses on adversarial testing: it deliberately tries to make an LLM misbehave in ways its developers did not intend.
The project has been published with an arXiv paper and was presented at DEF CON. It runs entirely from the command line, supports dozens of LLM providers (OpenAI, Hugging Face, AWS Bedrock, local gguf models via llama.cpp, and more), and produces structured JSONL reports making it easy to integrate into CI/CD pipelines or continuous monitoring workflows.
Here is the uncomfortable truth: most LLM deployments in production today have never been systematically security-tested. Teams rush to ship AI features, but security red-teaming is often treated as an afterthought. This is especially dangerous because LLMs have a much larger attack surface than traditional software. A model can be manipulated through carefully crafted prompts to leak sensitive system prompts, generate malware code, produce toxic content, or help users bypass safety guardrails.
garak addresses this gap in an elegant, automated way. Rather than manually crafting adversarial prompts and hoping you find something interesting, garak provides a pluggable framework of probes — each representing a different attack vector — that can be run against any supported LLM in minutes. The results tell you exactly what percentage of attempts succeeded (i.e., the model misbehaved) and give you the specific prompts that caused the failures. This is invaluable whether you are an AI startup deploying a chat API, an enterprise integrating an LLM into internal tools, or a researcher evaluating a new model before publication.
From a developer experience perspective, garak strikes a rare balance: it is powerful enough for security researchers doing advanced red-teaming, yet accessible enough for any Python developer to run their first scan in under five minutes. The architecture is clean and extensible, so if the built-in probes do not cover your specific use case, you can write your own plugin in a few dozen lines of code.
At its core, garak is a command-line tool that takes two inputs: a generator (which LLM to test) and one or more probes (which attacks to run). Each probe sends a batch of crafted prompts to the model and feeds the responses to detectors that determine whether the output triggered a vulnerability. The tool then generates a structured report showing pass/fail rates for each probe-detector combination.
The scope of built-in probes is impressive. It covers encoding-based prompt injection (where text is obfuscated to bypass safety filters), DAN-style jailbreaks (the famous "Do Anything Now" attacks), system prompt extraction (testing whether the model will reveal its system prompt under social engineering), hallucination triggers like package hallucination (generating code that references non-existent insecure packages), malware generation attempts, XSS vectors, and even multi-turn agentic attacks through the new Agent Breaker probe. The project maintains a comprehensive user guide and a Discord community for questions and collaboration.
1. Pre-deployment security auditing. Before shipping any LLM-powered feature to real users, run garak against your model to establish a vulnerability baseline. You will get a clear picture of which attack categories your model is most susceptible to, and you can repeat the scan after safety fine-tuning to measure improvement.
2. Evaluating third-party models. If you are considering integrating a new API provider or open-source model, garak lets you benchmark its safety properties in a standardized way. Compare results across models to make an informed decision rather than relying on the vendor's self-reported safety claims.
3. Continuous monitoring in CI/CD. Since garak outputs JSONL reports, you can integrate it into your deployment pipeline. Run a targeted probe suite on every model update; if the failure rate spikes beyond a threshold, fail the deployment and flag the security team for review.
4. Research into LLM safety. Academic researchers can use garak's modular architecture to study specific failure modes, compare attack success rates across model families, or develop and evaluate new defense mechanisms.
Here is a hands-on quick-start guide based on the official documentation. The example below shows how to scan an OpenAI model for encoding-based prompt injection vulnerabilities — a common real-world attack vector where malicious content is hidden inside encoded text. We assume you have Python 3.10+ installed.
# Step 1: Install garak via pip
python -m pip install -U garak
# Step 2: Set your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"
# Step 3: List available probes to find the right one
python3 -m garak --list_probes | grep encoding
# You will see: garak.probes.encoding (encoding-based injection attacks)
# Step 4: Run the encoding probe against your model
python3 -m garak \
--target_type openai \
--target_name gpt-4o \
--probes encoding
# Step 5: Read the results
# garak prints a progress bar during scanning, then shows a table:
# Probe | Detector | PASS | FAIL | Total
# encoding | av_scanner | 830 | 10 | 840
# ...
# The JSONL report is saved to a file like garak_report_20250601.jsonl
From personal experience, the first scan against a well-aligned model like GPT-4o usually yields a very low failure rate (often 0-5%) on most probes, which is reassuring. However, running against smaller or less-aligned open-source models often produces surprising results — models that appear safe in normal conversation can be surprisingly vulnerable to specific attack patterns. This is exactly why systematic testing matters.
If you prefer testing locally without sending data to OpenAI, you can run garak against a local GGUF model via llama.cpp:
# Install llama.cpp and download a model first, then:
export GGML_MAIN_PATH="/path/to/llama.cpp/main"
export GGML_MODEL_PATH="/path/to/models/llama-7b.q4_0.gguf"
python3 -m garak \
--target_type ggml \
--target_name /path/to/model.gguf \
--probes dan
1. Pluggable architecture with 30+ built-in probes. From simple blank-prompt tests to sophisticated multi-turn agent attacks, every component (generator, probe, detector, evaluator) is a plugin. You can mix and match, or drop in your own implementation by inheriting from a base class. The system prompt extraction probe (added in recent releases) is particularly noteworthy — it loads real-world system prompts from HuggingFace datasets and tests 25+ extraction attack techniques from published research, including Riley Goodside-style attacks and advanced prompt exfiltration methods.
2. Multi-provider support without vendor lock-in. garak supports OpenAI, Hugging Face Hub, AWS Bedrock, Replicate, Cohere, Groq, NVIDIA NIM, and any REST endpoint — all through a unified generator plugin system. Switching models is a single command-line flag change. This means you can run the exact same security probe against five different models and get comparable, directly comparable results.
3. Real-time and WebSocket testing. A recent major addition (Issue #1379) added a WebSocket generator, enabling garak to test WebSocket-based real-time LLM services. This is crucial because many modern chat applications use WebSocket connections for streaming responses, a protocol that traditional HTTP-based testing tools cannot easily handle. The generator supports multiple authentication methods (Basic Auth, Bearer tokens, custom headers) and even handles typing indicators.
4. Structured JSONL output for pipeline integration. Every scan produces a detailed .jsonl file that can be fed into analysis scripts or loaded into a SIEM/dashboard. The project includes a built-in analysis script that identifies the probes and prompts with the highest hit rates, making it easy to triage findings.
⭐ 7,982 GitHub Stars | 📆 Pushed: 2026-05-29 | 👀 Active Issues: 315
Compared to IBM Adversarial Robustness Toolbox (ART), garak is more focused on LLMs specifically, whereas ART covers a broader range of ML models including image classifiers. ART's LLM coverage is more limited and the tool is less actively developed for new attack techniques. Compared to Microsoft Azure AI Content Safety, garak is open-source, self-hosted, and does not require any cloud subscription — you run it entirely on your own infrastructure, which is essential for testing proprietary models behind a firewall. The trade-off is that garak requires more technical setup, while Azure AI Content Safety offers a managed API with pre-built dashboards and reporting.
For teams already using PromptInject or LLM-Fuzzer, garak can complement rather than replace those tools. garak's strength is its breadth: it absorbs techniques from multiple research directions into a single coherent framework, so you do not need to stitch together a dozen separate scripts to get comprehensive coverage.
Issue #1379 — WebSocket Generator for Real-Time LLM Testing (10 comments)
The community has been requesting WebSocket support for months, since many commercial LLM APIs now use WebSocket connections for streaming chat responses. One contributor implemented a full WebSocket generator plugin that handles multiple authentication methods and template-based messaging. A reviewer praised the design for being consistent with the REST generator patterns, making it easy for users to switch between HTTP and WebSocket modes without changing their probe configurations. This feature alone significantly extends garak's applicability to real-world production environments.
Issue #1538 — System Prompt Extraction Probe (7 comments)
This PR sparked a lively discussion about the ethics and practicality of system prompt extraction. One commenter noted that some commercially deployed models use extremely detailed system prompts to enforce brand voice and safety policies — if those prompts get extracted, competitors could replicate the behavior for free. The implementation loads real-world prompts from HuggingFace datasets and tests over 25 extraction techniques. A reviewer suggested adding a "defense score" that measures not just whether extraction succeeded but how many characters of the system prompt were leaked — a nuanced metric that would help security teams prioritize which prompts to harden.
Issue #1628 — Agent Breaker: Multi-Turn Agentic Attack Probe (5 comments)
The most cutting-edge addition discussed in the repo. Agent Breaker is a new probe designed specifically for agentic LLM applications — systems that use tools like code execution, database queries, and API calls. The approach uses a red-team model that automatically discovers what tools the target agent has, generates targeted exploits, and adapts its strategy across multiple conversation turns based on what worked and what did not. One commenter observed that this is the kind of attack that will become increasingly relevant as more enterprises deploy AI agents in production — traditional single-turn safety benchmarks completely miss this attack surface.
Q: garak returns a "generator not found" error even though I installed the package.
A: garak uses dynamic module loading, so some optional generators (like the AWS Bedrock or WebSocket generators) require additional Python dependencies. Run python -m pip install garak[bedrock] or python -m pip install garak[websockets] for the extras you need. Check python3 -m garak --list_generators after installation to confirm your target type is available.
Q: The scan seems to run forever. Is there a way to limit the number of attempts?
A: Yes, use the --generations flag to control how many response generations are produced per prompt (the default is 10). For a quick smoke test, try --generations 2 to get results in seconds instead of minutes. For production benchmarks, use the default of 10 to get statistically meaningful failure rates.
Q: My local GGUF model gets killed by the OS during garak scanning. What gives?
A: Large models loaded into memory via llama.cpp can exhaust RAM, especially with high --generations values. Try using a smaller quantization (e.g., Q4_K_M instead of Q8_0), reduce --generations, or limit parallel requests with --parallel 1. Also make sure you have at least 8GB of free RAM for 7B models and proportionally more for larger ones.
garak fills a critical gap in the AI security ecosystem: the systematic, automated red-teaming of LLM deployments. Its pluggable architecture, broad model support, and continuously expanding probe library make it the most comprehensive open-source tool for this purpose. Whether you are a security researcher, an AI startup deploying models at scale, or an enterprise IT team responsible for AI governance, garak gives you the visibility you need into how your models behave under adversarial conditions — before your users (or attackers) discover the vulnerabilities for you.
The project is actively developed by NVIDIA's team with strong community contributions, and the Discord channel provides excellent support. If you are serious about LLM security, it belongs in your toolkit alongside your favorite static analyzer and your CI/CD pipeline.