    BerriAI/litellm

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]


🚅 LiteLLM

Deploy to Render | Deploy on Railway

    Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]

    LiteLLM Proxy Server (LLM Gateway) | Hosted Proxy (Preview) | Enterprise Tier

PyPI Version | Y Combinator W23 | WhatsApp | Discord | Slack

    LiteLLM manages:

• Translate inputs to the provider's completion, embedding, and image_generation endpoints
• Consistent output: text responses are always available at ['choices'][0]['message']['content'] (see the snippet below)
• Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
• Set budgets & rate limits per project, API key, and model with the LiteLLM Proxy Server (LLM Gateway)

    Jump to LiteLLM Proxy (LLM Gateway) Docs
    Jump to Supported LLM Providers
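
For example, a minimal sketch of the consistent-output bullet above, assuming OPENAI_API_KEY is set (litellm's return objects support both dict-style and attribute access):

from litellm import completion

response = completion(
    model="openai/gpt-4o",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)

# the text lives at the same path regardless of provider
print(response["choices"][0]["message"]["content"])
# attribute access works as well
print(response.choices[0].message.content)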

🚨 Stable Release: Use docker images with the -stable tag. These have undergone 12-hour load tests before being published. More information about the release cycle here

Support for more providers: missing a provider or LLM platform? Raise a feature request.

    Usage (Docs)

[!IMPORTANT] LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here. LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.

    Open In Colab
    pip install litellm
    
    from litellm import completion
    import os
    
    ## set ENV variables
    os.environ["OPENAI_API_KEY"] = "your-openai-key"
    os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
    
messages = [{"content": "Hello, how are you?", "role": "user"}]
    
    # openai call
    response = completion(model="openai/gpt-4o", messages=messages)
    
    # anthropic call
    response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages)
    print(response)
    

    Response (OpenAI Format)

    {
        "id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
        "created": 1751494488,
        "model": "claude-sonnet-4-20250514",
        "object": "chat.completion",
        "system_fingerprint": null,
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?",
                    "role": "assistant",
                    "tool_calls": null,
                    "function_call": null
                }
            }
        ],
        "usage": {
            "completion_tokens": 39,
            "prompt_tokens": 13,
            "total_tokens": 52,
            "completion_tokens_details": null,
            "prompt_tokens_details": {
                "audio_tokens": null,
                "cached_tokens": 0
            },
            "cache_creation_input_tokens": 0,
            "cache_read_input_tokens": 0
        }
    }
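
Because the usage block above is in OpenAI format, cost tracking can be computed directly from the response. A minimal sketch using litellm's completion_cost helper (pricing data ships with the library and may lag newly released models):

from litellm import completion, completion_cost

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)

# completion_cost reads token usage off the response and multiplies by
# the model's per-token prices; returns a float in USD
print(f"cost: ${completion_cost(completion_response=response):.6f}")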
    

Call any model supported by a provider, with model=<provider_name>/<model_name>. There may be provider-specific details here, so refer to the provider docs for more information.
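
A few illustrative calls using the provider prefix (the model IDs below are examples only and vary by account, region, and provider catalog):

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# AWS Bedrock - reads AWS credentials from the environment
response = completion(model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", messages=messages)

# Google Vertex AI - uses gcloud / service-account credentials
response = completion(model="vertex_ai/gemini-1.5-pro", messages=messages)

# Groq - reads GROQ_API_KEY from the environment
response = completion(model="groq/llama3-8b-8192", messages=messages)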

    Async (Docs)

    from litellm import acompletion
    import asyncio
    
    async def test_get_response():
        user_message = "Hello, how are you?"
        messages = [{"content": user_message, "role": "user"}]
        response = await acompletion(model="openai/gpt-4o", messages=messages)
        return response
    
    response = asyncio.run(test_get_response())
    print(response)
    

    Streaming (Docs)

LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response. Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude sonnet 4
response = completion("anthropic/claude-sonnet-4-20250514", messages, stream=True)
for part in response:
    print(part)
    

    Response chunk (OpenAI Format)

    {
        "id": "chatcmpl-fe575c37-5004-4926-ae5e-bfbc31f356ca",
        "created": 1751494808,
        "model": "claude-sonnet-4-20250514",
        "object": "chat.completion.chunk",
        "system_fingerprint": null,
        "choices": [
            {
                "finish_reason": null,
                "index": 0,
                "delta": {
                    "provider_specific_fields": null,
                    "content": "Hello",
                    "role": "assistant",
                    "function_call": null,
                    "tool_calls": null,
                    "audio": null
                },
                "logprobs": null
            }
        ],
        "provider_specific_fields": null,
        "stream_options": null,
        "citations": null
    }
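
Deltas can be reassembled client-side, and streaming composes with the async client as well; a minimal sketch (model name as in the examples above):

import asyncio
from litellm import acompletion, completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# reassemble the full text from the streamed deltas
full_text = ""
for part in completion(model="openai/gpt-4o", messages=messages, stream=True):
    full_text += part.choices[0].delta.content or ""
print(full_text)

# the same pattern with the async client
async def stream_response():
    response = await acompletion(model="openai/gpt-4o", messages=messages, stream=True)
    async for part in response:
        print(part.choices[0].delta.content or "", end="")

asyncio.run(stream_response())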
    

    Logging Observability (Docs)

LiteLLM exposes predefined callbacks to send data to Lunary, MLflow, Langfuse, DynamoDB, S3 buckets, Helicone, Promptlayer, Traceloop, Athina, and Slack

import os
import litellm
from litellm import completion

## set env variables for logging tools (when using MLflow, no API key set up is required)
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"] # log input/output to lunary, mlflow, langfuse, athina, helicone

# openai call
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
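
Callbacks can also be plain Python functions, following litellm's custom-callback signature (the function name and printed fields here are illustrative):

import litellm

def log_usage_callback(kwargs, completion_response, start_time, end_time):
    # kwargs carries the request metadata; completion_response is the
    # OpenAI-format response shown earlier
    print(kwargs.get("model"), completion_response["usage"]["total_tokens"])

litellm.success_callback = [log_usage_callback]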
    

    LiteLLM Proxy Server (LLM Gateway) - (Docs)

    Track spend + Load Balance across multiple projects

    Hosted Proxy (Preview)

    The proxy provides:

    1. Hooks for auth
    2. Hooks for logging
    3. Cost tracking
    4. Rate Limiting

📖 Proxy Endpoints - Swagger Docs

    Quick Start Proxy - CLI

    pip install 'litellm[proxy]'
    

    Step 1: Start litellm proxy

    $ litellm --model huggingface/bigcode/starcoder
    
    #INFO: Proxy running on http://0.0.0.0:4000
    

    Step 2: Make ChatCompletions Request to Proxy

[!IMPORTANT] 💡 Use LiteLLM Proxy with Langchain (Python, JS), OpenAI SDK (Python, JS), Anthropic SDK, Mistral SDK, LlamaIndex, Instructor, Curl

    import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000") # set proxy to base_url
    # request sent to model set on litellm proxy, `litellm --model`
    response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ])
    
    print(response)
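
The [!IMPORTANT] note above mentions Langchain; a minimal sketch pointing langchain-openai at the proxy (assumes the langchain-openai package is installed):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",           # resolved by the model set on the proxy
    api_key="anything",              # the proxy holds the real provider credentials
    base_url="http://0.0.0.0:4000",  # point the client at the LiteLLM proxy
)
print(llm.invoke("this is a test request, write a short poem").content)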
    

    Proxy Key Management (Docs)

    Connect the proxy with a Postgres DB to create proxy keys

    # Get the code
    git clone https://github.com/BerriAI/litellm
    
    # Go to folder
    cd litellm
    
    # Add the master key - you can change this after setup
    echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
    
    # Add the litellm salt key - you cannot change this after adding a model
    # It is used to encrypt / decrypt your LLM API Key credentials
    # We recommend - https://1password.com/password-generator/
    # password generator to get a random hash for litellm salt key
    echo 'LITELLM_SALT_KEY="sk-1234"' >> .env
    
    source .env
    
    # Start
    docker-compose up
    

The UI is available at /ui on your proxy server

Set budgets and rate limits across multiple projects with POST /key/generate

    Request

    curl 'http://0.0.0.0:4000/key/generate' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "[email protected]", "team": "core-infra"}}'
    

    Expected Response

    {
        "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
        "expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
    }
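
The generated key is then used as a standard Bearer token against the proxy; a minimal sketch with the OpenAI SDK, reusing the key from the expected response above:

import openai

client = openai.OpenAI(
    api_key="sk-kdEXbIqZRwEeEiHwdg7sFA",  # key returned by /key/generate
    base_url="http://0.0.0.0:4000",
)

# only the models allowed for this key ("gpt-3.5-turbo", "gpt-4", "claude-2") are served
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)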
    

    Supported Providers (Docs)

| Provider | Completion | Streaming | Async Completion | Async Streaming | Async Embedding | Async Image Generation |
|----------|------------|-----------|------------------|-----------------|-----------------|------------------------|
| openai | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Meta - Llama API | ✅ | ✅ | ✅ | ✅ | | |
| azure | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| AI/ML API | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| aws - sagemaker | ✅ | ✅ | ✅ | ✅ | ✅ | |
| aws - bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | |
| google - vertex_ai | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| google - palm | ✅ | ✅ | ✅ | ✅ | | |
| google AI Studio - gemini | ✅ | ✅ | ✅ | ✅ | | |
| mistral ai api | ✅ | ✅ | ✅ | ✅ | ✅ | |
| cloudflare AI Workers | ✅ | ✅ | ✅ | ✅ | | |
| cohere | ✅ | ✅ | ✅ | ✅ | ✅ | |
| anthropic | ✅ | ✅ | ✅ | ✅ | | |
| empower | ✅ | ✅ | ✅ | ✅ | | |
| huggingface | ✅ | ✅ | ✅ | ✅ | ✅ | |
| replicate | ✅ | ✅ | ✅ | ✅ | | |
| together_ai | ✅ | ✅ | ✅ | ✅ | | |
| openrouter | ✅ | ✅ | ✅ | ✅ | | |
| ai21 | ✅ | ✅ | ✅ | ✅ | | |
| baseten | ✅ | ✅ | ✅ | ✅ | | |
| vllm | ✅ | ✅ | ✅ | ✅ | | |
| nlp_cloud | ✅ | ✅ | ✅ | ✅ | | |
| aleph alpha | ✅ | ✅ | ✅ | ✅ | | |
| petals | ✅ | ✅ | ✅ | ✅ | | |
| ollama | ✅ | ✅ | ✅ | ✅ | ✅ | |
| deepinfra | ✅ | ✅ | ✅ | ✅ | | |
| perplexity-ai | ✅ | ✅ | ✅ | ✅ | | |
| Groq AI | ✅ | ✅ | ✅ | ✅ | | |
| Deepseek | ✅ | ✅ | ✅ | ✅ | | |
| anyscale | ✅ | ✅ | ✅ | ✅ | | |
| IBM - watsonx.ai | ✅ | ✅ | ✅ | ✅ | ✅ | |
| voyage ai | | | | | ✅ | |
| xinference [Xorbits Inference] | | | | | ✅ | |
| FriendliAI | ✅ | ✅ | ✅ | ✅ | | |
| Galadriel | ✅ | ✅ | ✅ | ✅ | | |
| GradientAI | ✅ | ✅ | | | | |
| Novita AI | ✅ | ✅ | ✅ | ✅ | | |
| Featherless AI | ✅ | ✅ | ✅ | ✅ | | |
| Nebius AI Studio | ✅ | ✅ | ✅ | ✅ | ✅ | |

    Read the Docs

    Contributing

Interested in contributing? Contributions to the LiteLLM Python SDK, Proxy Server, and LLM integrations are all accepted and highly encouraged!

Quick start: git clone → make install-dev → make format → make lint → make test-unit

    See our comprehensive Contributing Guide (CONTRIBUTING.md) for detailed instructions.

    Enterprise

    For companies that need better security, user management and professional support

    Talk to founders

    This covers:

• ✅ Features under the LiteLLM Commercial License
• ✅ Feature Prioritization
• ✅ Custom Integrations
• ✅ Professional Support - Dedicated discord + slack
• ✅ Custom SLAs
• ✅ Secure access with Single Sign-On

    Contributing

    We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.

    Quick Start for Contributors

    This requires poetry to be installed.

    git clone https://github.com/BerriAI/litellm.git
    cd litellm
    make install-dev    # Install development dependencies
    make format         # Format your code
    make lint           # Run all linting checks
    make test-unit      # Run unit tests
    make format-check   # Check formatting only
    

    For detailed contributing guidelines, see CONTRIBUTING.md.

    Code Quality / Linting

    LiteLLM follows the Google Python Style Guide.

    Our automated checks include:

    • Black for code formatting
    • Ruff for linting and code quality
    • MyPy for type checking
    • Circular import detection
    • Import safety checks

    All these checks must pass before your PR can be merged.

    Support / talk with founders

    Why did we build this

    • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

    Run in Developer mode

    Services

1. Set up the .env file in the root
2. Run dependent services: docker-compose up db prometheus

    Backend

1. (In root) create a virtual environment: python -m venv .venv
2. Activate the virtual environment: source .venv/bin/activate
3. Install dependencies: pip install -e ".[all]"
4. Start the proxy backend: python3 /path/to/litellm/proxy_cli.py

    Frontend

1. Navigate to ui/litellm-dashboard
2. Install dependencies: npm install
3. Run npm run dev to start the dashboard
