BerriAI

    BerriAI/litellm

    #213 this week

    Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

    llm
    ai
    ai-gateway
    anthropic
    azure-openai
    bedrock
    gateway
    langchain
    litellm
    llm-gateway
    llmops
    mcp-gateway
    openai
    openai-proxy
    vertex-ai
    Python
    NOASSERTION
    45.6K stars
    7.8K forks
    45.6K watching
    Updated 5/4/2026

    Health Score

    75

    Activity
    2
    Community
    25
    Maintenance
    40
    Last release 3d ago

    Weekly Growth

    +0

    +0.0% this week

    Contributors

    386

    Total contributors

    Open Issues

    2.8K

    About litellm

    🚅 LiteLLM

    Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]

    LiteLLM Proxy Server (LLM Gateway) | Hosted Proxy (Preview) | Enterprise Tier

    LiteLLM manages:

    • Translate inputs to provider's completion, embedding, and image_generation endpoints
    • Consistent output: text responses are always available at ['choices'][0]['message']['content']
    • Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
    • Set budgets and rate limits per project, API key, and model - LiteLLM Proxy Server (LLM Gateway)
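
The retry/fallback bullet above can be illustrated with a small, library-independent sketch. This is not LiteLLM's Router; `call_with_fallback` and the two stub deployments are hypothetical names used only to show the control flow of retrying one deployment and then falling back to the next.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    deployments: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_deployment: int = 2,
) -> str:
    """Try each deployment in order, retrying before moving to the next.

    Hypothetical helper for illustration; not LiteLLM's Router.
    """
    last_error: Optional[Exception] = None
    for deployment in deployments:
        for _ in range(retries_per_deployment):
            try:
                return deployment(prompt)
            except Exception as exc:  # a real router would catch narrower errors
                last_error = exc
    raise RuntimeError("all deployments failed") from last_error


# Stub deployments standing in for e.g. an Azure and an OpenAI endpoint.
def flaky_azure(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def healthy_openai(prompt: str) -> str:
    return f"echo: {prompt}"


result = call_with_fallback([flaky_azure, healthy_openai], "Hello")
print(result)
```

The first deployment fails on every retry, so the call is served by the second one. A production router would also add backoff and cooldowns, which this sketch leaves out.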

    Jump to LiteLLM Proxy (LLM Gateway) Docs
    Jump to Supported LLM Providers

    🚨 Stable Release: Use Docker images with the -stable tag. These undergo 12-hour load tests before being published. See the release cycle documentation for more information.

    Support for more providers: if a provider or LLM platform is missing, raise a feature request.

    Usage (Docs)

    [!IMPORTANT] LiteLLM v1.0.0 now requires openai>=1.0.0 (see the migration guide).
    LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.

    pip install litellm
    
    from litellm import completion
    import os
    
    ## set ENV variables
    os.environ["OPENAI_API_KEY"] = "your-openai-key"
    os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
    
    messages = [{ "content": "Hello, how are you?","role": "user"}]
    
    # openai call
    response = completion(model="openai/gpt-4o", messages=messages)
    
    # anthropic call
    response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages)
    print(response)
    

    Response (OpenAI Format)

    {
        "id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
        "created": 1751494488,
        "model": "claude-sonnet-4-20250514",
        "object": "chat.completion",
        "system_fingerprint": null,
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?",
                    "role": "assistant",
                    "tool_calls": null,
                    "function_call": null
                }
            }
        ],
        "usage": {
            "completion_tokens": 39,
            "prompt_tokens": 13,
            "total_tokens": 52,
            "completion_tokens_details": null,
            "prompt_tokens_details": {
                "audio_tokens": null,
                "cached_tokens": 0
            },
            "cache_creation_input_tokens": 0,
            "cache_read_input_tokens": 0
        }
    }
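
As a minimal sketch of reading that response, the snippet below parses a trimmed copy of the JSON above (the reply text is shortened and the null fields are dropped) with the standard library only. It pulls the reply from the fixed `['choices'][0]['message']['content']` path and checks that the token counts add up.

```python
import json

# Trimmed copy of the OpenAI-format response shown above.
raw = """
{
    "model": "claude-sonnet-4-20250514",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Hello! I'm doing well.", "role": "assistant"}
        }
    ],
    "usage": {"completion_tokens": 39, "prompt_tokens": 13, "total_tokens": 52}
}
"""

response = json.loads(raw)

# The reply text sits at the same path regardless of provider.
text = response["choices"][0]["message"]["content"]

# total_tokens should equal prompt_tokens + completion_tokens (13 + 39 = 52).
usage = response["usage"]
tokens_add_up = (
    usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
)

print(text)
print(tokens_add_up)
```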
    

    Call any model supported by a provider with model=<provider_name>/<model_name>. Some providers have their own details, so refer to the provider docs for more information.
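
To make the `<provider_name>/<model_name>` convention concrete, here is a hedged sketch that splits such a string at the first slash. `split_model_string` is a hypothetical helper written for this example; LiteLLM's own parsing may differ.

```python
from typing import Tuple


def split_model_string(model: str) -> Tuple[str, str]:
    """Split '<provider_name>/<model_name>' at the first slash.

    Hypothetical helper; LiteLLM's own parsing may differ.
    """
    provider, sep, model_name = model.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected '<provider_name>/<model_name>', got {model!r}")
    return provider, model_name


print(split_model_string("openai/gpt-4o"))
print(split_model_string("huggingface/bigcode/starcoder"))
```

Splitting only at the first slash keeps model names that contain slashes themselves, such as `bigcode/starcoder`, intact.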

    Async (Docs)

    from litellm import acompletion
    import asyncio
    
    async def test_get_response():
        user_message = "Hello, how are you?"
        messages = [{"content": user_message, "role": "user"}]
        response = await acompletion(model="openai/gpt-4o", messages=messages)
        return response
    
    response = asyncio.run(test_get_response())
    print(response)
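
One reason to use the async form is to fan out several requests at once. The sketch below shows that pattern with `asyncio.gather`. So that it runs offline, it calls a hypothetical stand-in, `fake_acompletion`, instead of `acompletion`; the model names are taken from the examples above.

```python
import asyncio
from typing import Dict, List


async def fake_acompletion(model: str, messages: List[Dict[str, str]]) -> str:
    """Stand-in for an async LLM call so the sketch runs offline (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{model} saw: {messages[-1]['content']}"


async def ask_many(models: List[str], question: str) -> List[str]:
    messages = [{"role": "user", "content": question}]
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_acompletion(m, messages) for m in models))


answers = asyncio.run(
    ask_many(["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"], "Hi")
)
for answer in answers:
    print(answer)
```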
    

    Streaming (Docs)

    LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in response. Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)

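    from litellm import completion
    
    # same message format as in the usage example above
    messages = [{"content": "Hello, how are you?", "role": "user"}]
    
    # openai call: print each text delta as it arrives
    response = completion(model="openai/gpt-4o", messages=messages, stream=True)
    for part in response:
        print(part.choices[0].delta.content or "")
    
    # claude sonnet 4: print the raw chunk objects
    response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages, stream=True)
    for part in response:
        print(part)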
    

    Response chunk (OpenAI Format)

    {
        "id": "chatcmpl-fe575c37-5004-4926-ae5e-bfbc31f356ca",
        "created": 1751494808,
        "model": "claude-sonnet-4-20250514",
        "object": "chat.completion.chunk",
        "system_fingerprint": null,
        "choices": [
            {
                "finish_reason": null,
                "index": 0,
                "delta": {
                    "provider_specific_fields": null,
                    "content": "Hello",
                    "role": "assistant",
                    "function_call": null,
                    "tool_calls": null,
                    "audio": null
                },
                "logprobs": null
            }
        ],
        "provider_specific_fields": null,
        "stream_options": null,
        "citations": null
    }
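
A common follow-up is to reassemble the full reply from the streamed deltas. The sketch below does this over hand-written chunks in the shape shown above. It uses plain dicts so it runs without any SDK, and `collect_stream_text` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable


def collect_stream_text(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate delta.content across chunks, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written chunks in the shape shown above (dicts rather than SDK objects).
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello"}}]},
    {"choices": [{"index": 0, "delta": {"content": ", world"}}]},
    {"choices": [{"index": 0, "delta": {"content": None}, "finish_reason": "stop"}]},
]

full_text = collect_stream_text(sample_chunks)
print(full_text)
```

With the real streaming iterator, the chunks are objects, so the same idea would use attribute access (`part.choices[0].delta.content`) as in the streaming loop above.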
    

    Logging Observability (Docs)

    LiteLLM exposes predefined callbacks to send data to Lunary, MLflow, Langfuse, DynamoDB, S3 buckets, Helicone, Promptlayer, Traceloop, Athina, and Slack.

    import os
    
    import litellm
    from litellm import completion
    
    ## set env variables for logging tools (when using MLflow, no API key set up is required)
    os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
    os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
    os.environ["LANGFUSE_PUBLIC_KEY"] = ""
    os.environ["LANGFUSE_SECRET_KEY"] = ""
    os.environ["ATHINA_API_KEY"] = "your-athina-api-key"
    
    os.environ["OPENAI_API_KEY"] = "your-openai-key"
    
    # set callbacks
    litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"] # log input/output to lunary, mlflow, langfuse, athina, helicone
    
    # openai call
    response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
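
Besides the named integrations, a plain Python function can act as a callback. The sketch below assumes the four-argument signature `(kwargs, completion_response, start_time, end_time)` described in LiteLLM's custom-callback docs; treat that signature as an assumption and check it against the docs for your version. `log_success` and `call_log` are hypothetical names, and the function is invoked directly here so the example runs without any API key.

```python
from datetime import datetime, timedelta

call_log = []


def log_success(kwargs, completion_response, start_time, end_time):
    """Record model name and latency for one successful call.

    The four-argument signature is assumed from LiteLLM's custom-callback docs.
    """
    latency_s = (end_time - start_time).total_seconds()
    call_log.append({"model": kwargs.get("model"), "latency_s": latency_s})


# Offline check: invoke the function directly with made-up values.
start = datetime(2025, 1, 1, 12, 0, 0)
log_success(
    {"model": "openai/gpt-4o"},
    {"choices": []},
    start,
    start + timedelta(seconds=1.5),
)
print(call_log)
```

If the assumption holds, registering it would look like `litellm.success_callback = [log_success]`.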
    

    LiteLLM Proxy Server (LLM Gateway) - (Docs)

    Track spend + Load Balance across multiple projects

    Hosted Proxy (Preview)

    The proxy provides:

    1. Hooks for auth
    2. Hooks for logging
    3. Cost tracking
    4. Rate Limiting

    📖 Proxy Endpoints - Swagger Docs

    Quick Start Proxy - CLI

    pip install 'litellm[proxy]'
    

    Step 1: Start litellm proxy

    $ litellm --model huggingface/bigcode/starcoder
    
    #INFO: Proxy running on http://0.0.0.0:4000
    

    Step 2: Make ChatCompletions Request to Proxy

    [!IMPORTANT] 💡 Use LiteLLM Proxy with Langchain (Python, JS), OpenAI SDK (Python, JS), Anthropic SDK, Mistral SDK, LlamaIndex, Instructor, Curl

    import openai # openai v1.0.0+
    client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:4000") # set proxy to base_url
    # request sent to model set on litellm proxy, `litellm --model`
    response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ])
    
    print(response)
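
For readers who want to see what goes over the wire, here is a standard-library sketch that builds the same request without sending it. It assumes the `/chat/completions` path, which is where the OpenAI SDK posts relative to `base_url`. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the proxy."""
    body = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("http://0.0.0.0:4000", "anything", "write a short poem")
print(request.get_method(), request.full_url)

# Sending requires a running proxy:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```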
    

    Proxy Key Management (Docs)

    Connect the proxy with a Postgres DB to create proxy keys

    # Get the code
    git clone https://github.com/BerriAI/litellm
    
    # Go to folder
    cd litellm
    
    # Add the master key - you can change this after setup
    echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
    
    # Add the litellm salt key - you cannot change this after adding a model
    # It is used to encrypt / decrypt your LLM API Key credentials
    # We recommend - https://1password.com/password-generator/
    # password generator to get a random hash for litellm salt key
    echo 'LITELLM_SALT_KEY="sk-1234"' >> .env
    
    source .env
    
    # Start
    docker-compose up
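
The `sk-1234` values above are placeholders. As an alternative to an online password generator, a random salt key can be produced locally with Python's `secrets` module. This is a sketch: `generate_salt_key` is a hypothetical helper, and the `sk-` prefix only mirrors the placeholder format (whether LiteLLM requires it is not stated here).

```python
import secrets


def generate_salt_key(num_bytes: int = 32) -> str:
    """Return a random key; the 'sk-' prefix simply mirrors the placeholder above."""
    return "sk-" + secrets.token_urlsafe(num_bytes)


salt_key = generate_salt_key()
env_line = f'LITELLM_SALT_KEY="{salt_key}"'
print(env_line)
```

The printed line can be appended to `.env` in place of the placeholder. Since the salt key cannot be changed after a model is added, store it somewhere safe.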
    

    The UI is available at /ui on your proxy server.

    Set budgets and rate limits across multiple projects: POST /key/generate

    Request

    curl 'http://0.0.0.0:4000/key/generate' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "[email protected]", "team": "core-infra"}}'
    

    Expected Response

    {
        "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
        "expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
    }
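
The same exchange can be sketched in Python with the standard library. The request is built but not sent, and the sample response above is parsed after removing its inline comments, which are not valid JSON. The `metadata.user` field is omitted because its value is obscured in the request above.

```python
import json
import urllib.request
from datetime import datetime

payload = {
    "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"],
    "duration": "20m",
    "metadata": {"team": "core-infra"},
}

# Built but not sent; sending needs a running proxy and a valid master key.
request = urllib.request.Request(
    url="http://0.0.0.0:4000/key/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": "Bearer sk-1234", "Content-Type": "application/json"},
    method="POST",
)

# Sample response from above, with the inline comments removed.
sample = (
    '{"key": "sk-kdEXbIqZRwEeEiHwdg7sFA", '
    '"expires": "2023-11-19T01:38:25.838000+00:00"}'
)
parsed = json.loads(sample)

# "expires" is an ISO 8601 timestamp with a UTC offset.
expires_at = datetime.fromisoformat(parsed["expires"])
print(parsed["key"], expires_at.isoformat())
```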
    

    Supported Providers (Docs)

    Provider | Completion | Streaming | Async Completion | Async Streaming | Async Embedding | Async Image Generation
    openai
    Meta - Llama API
    azure
    AI/ML API
    aws - sagemaker
    aws - bedrock
    google - vertex_ai
    google - palm
    google AI Studio - gemini
    mistral ai api
    cloudflare AI Workers
    cohere
    anthropic
    empower
    huggingface
    replicate
    together_ai
    openrouter
    ai21
    baseten
    vllm
    nlp_cloud
    aleph alpha
    petals
    ollama
    deepinfra
    perplexity-ai
    Groq AI
    Deepseek
    anyscale
    IBM - watsonx.ai
    voyage ai
    xinference [Xorbits Inference]
    FriendliAI
    Galadriel
    GradientAI
    Novita AI
    Featherless AI
    Nebius AI Studio

    Read the Docs

    Contributing

    Interested in contributing? Contributions to the LiteLLM Python SDK, Proxy Server, and LLM integrations are accepted and highly encouraged!

    Quick start: git clone, make install-dev, make format, make lint, make test-unit

    See our comprehensive Contributing Guide (CONTRIBUTING.md) for detailed instructions.

    Enterprise

    For companies that need better security, user management and professional support

    Talk to founders

    This covers:

    • Features under the LiteLLM Commercial License:
    • Feature Prioritization
    • Custom Integrations
    • Professional Support - Dedicated Discord + Slack
    • Custom SLAs
    • Secure access with Single Sign-On

    Contributing

    We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.

    Quick Start for Contributors

    This requires poetry to be installed.

    git clone https://github.com/BerriAI/litellm.git
    cd litellm
    make install-dev    # Install development dependencies
    make format         # Format your code
    make lint           # Run all linting checks
    make test-unit      # Run unit tests
    make format-check   # Check formatting only
    

    For detailed contributing guidelines, see CONTRIBUTING.md.

    Code Quality / Linting

    LiteLLM follows the Google Python Style Guide.

    Our automated checks include:

    • Black for code formatting
    • Ruff for linting and code quality
    • MyPy for type checking
    • Circular import detection
    • Import safety checks

    All these checks must pass before your PR can be merged.

    Support / talk with founders

    Why did we build this

    • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

    Run in Developer mode

    Services

    1. Set up a .env file in the root
    2. Run dependent services: docker-compose up db prometheus

    Backend

    1. (In root) Create a virtual environment: python -m venv .venv
    2. Activate the virtual environment: source .venv/bin/activate
    3. Install dependencies: pip install -e ".[all]"
    4. Start the proxy backend: python3 /path/to/litellm/proxy_cli.py

    Frontend

    1. Navigate to ui/litellm-dashboard
    2. Install dependencies: npm install
    3. Run npm run dev to start the dashboard
