promptfoo

    promptfoo/promptfoo

    Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

    devops
    llm
    testing
    ci
    ci-cd
    cicd
    evaluation
    evaluation-framework
    llm-eval
    llm-evaluation
    llm-evaluation-framework
    llmops
    pentesting
    prompt-engineering
    prompt-testing
    prompts
    rag
    red-teaming
    vulnerability-scanners
    TypeScript
    MIT
    19.9K stars
    1.7K forks
    19.9K watching
    Updated 4/14/2026
    View on GitHub


    Health Score

    75

    Weekly Growth

    +0

    +0.0% this week

    Contributors

    1

    Total contributors

    Open Issues

    309

    Generated Insights

    About promptfoo

    Promptfoo: LLM evals & red teaming


    promptfoo is a developer-friendly local tool for testing LLM applications. Stop the trial-and-error approach - start shipping secure, reliable AI apps.

    Website · Getting Started · Red Teaming · Documentation · Discord

    Quick Start

    # Install and initialize project
    npx promptfoo@latest init
    
    # Run your first evaluation
    npx promptfoo eval
    

    See Getting Started (evals) or Red Teaming (vulnerability scanning) for more.
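The "simple declarative config" the project describes is a YAML file in the project root. A minimal sketch of one, assuming illustrative prompt text, provider/model IDs, and test values (not the exact file `init` scaffolds):

```yaml
# promptfooconfig.yaml - illustrative sketch; prompt, models, and values are assumptions
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini                              # assumed model ID
  - anthropic:messages:claude-3-5-haiku-20241022    # assumed model ID

tests:
  - vars:
      text: "Promptfoo is a local tool for testing LLM applications."
    assert:
      - type: contains
        value: "Promptfoo"
```

Running `npx promptfoo eval` against a config like this produces the side-by-side comparison matrix shown below.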

    What can you do with Promptfoo?

    • Test your prompts and models with automated evaluations
    • Secure your LLM apps with red teaming and vulnerability scanning
    • Compare models side-by-side (OpenAI, Anthropic, Azure, Bedrock, Ollama, and more)
    • Automate checks in CI/CD
    • Review pull requests for LLM-related security and compliance issues with code scanning
    • Share results with your team
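For the "automate checks in CI/CD" item above, one way to wire evals into a pipeline is a workflow that runs the same CLI command on each pull request. A hedged sketch as a GitHub Actions job (workflow name, secret name, and config path are assumptions; an eval with failing assertions is assumed to fail the job):

```yaml
# .github/workflows/promptfoo.yml - illustrative sketch, not an official workflow
name: LLM evals
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Run the evaluation; provider API keys come from repository secrets
      - run: npx promptfoo@latest eval --config promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```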

    Here's what it looks like in action:

    prompt evaluation matrix - web viewer

    It works on the command line too:

    prompt evaluation matrix - command line

It can also generate security vulnerability reports:

    gen ai red team

    Why Promptfoo?

    • 🚀 Developer-first: Fast, with features like live reload and caching
    • 🔒 Private: LLM evals run 100% locally - your prompts never leave your machine
    • 🔧 Flexible: Works with any LLM API or programming language
    • 💪 Battle-tested: Powers LLM apps serving 10M+ users in production
    • 📊 Data-driven: Make decisions based on metrics, not gut feel
    • 🤝 Open source: MIT licensed, with an active community

    Learn More

    Contributing

    We welcome contributions! Check out our contributing guide to get started.

    Join our Discord community for help and discussion.
