KubeAI deploys LLMs behind an OpenAI-compatible endpoint. Admins can configure ML models via `kind: Model` Kubernetes Custom Resources. KubeAI can be thought of as a Model Operator that manages vLLM and Ollama servers.
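To make that concrete, here is a minimal sketch of a Model manifest. The field names follow the KubeAI Model CRD, but the values are illustrative assumptions; this tutorial enables a catalog model through Helm values instead, so there is no need to apply this yourself.

```bash
# Hypothetical Model resource, for illustration only -- the tutorial below
# enables the same model via the chart's catalog rather than applying this.
kubectl apply -f - <<EOF
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: gemma2-2b-cpu
spec:
  features: [TextGeneration]
  owner: google
  url: ollama://gemma2:2b
  engine: OLlama
  resourceProfile: cpu:2
  minReplicas: 1
EOF
```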

In this tutorial you will learn how to deploy KubeAI and Langtrace end-to-end. Both KubeAI and Langtrace are installed in your Kubernetes cluster. No cloud services or external dependencies are required.

1. If you don’t have a K8s cluster yet, you can create one using kind or minikube:

```bash
kind create cluster # OR: minikube start
```
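As an optional sanity check before moving on, confirm that kubectl is pointed at the new cluster and the node is Ready:

```bash
# Confirm kubectl talks to the new cluster and its node is Ready
kubectl cluster-info
kubectl get nodes
```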
2. Install Langtrace:

```bash
helm repo add langtrace https://Scale3-Labs.github.io/langtrace-helm-chart
helm repo update
helm install langtrace langtrace/langtrace
```
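Langtrace may take a minute to pull images and start. A generic way to watch the rollout (pod names depend on the chart, so this simply watches everything in the namespace):

```bash
# Watch until the Langtrace pods report Running / Ready
kubectl get pods --watch
```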
3. Install KubeAI. The Helm values below enable the gemma2-2b-cpu model from KubeAI’s built-in catalog and keep one replica running:

```bash
helm repo add kubeai https://substratusai.github.io/kubeai/
helm repo update
cat <<EOF > helm-values.yaml
models:
  catalog:
    gemma2-2b-cpu:
      enabled: true
      minReplicas: 1
EOF

helm upgrade --install kubeai kubeai/kubeai \
    --wait --timeout 10m \
    -f ./helm-values.yaml
```
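Because models are Custom Resources, you can inspect what the chart created (exact pod names will vary):

```bash
# The catalog entry appears as a Model resource; a server pod runs per replica
kubectl get models
kubectl get pods
```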
4. Create a local Python environment and install dependencies:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install langtrace-python-sdk openai
```
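A quick import check catches installation problems early (the module name langtrace_python_sdk matches the import used in the script below):

```bash
python -c "import langtrace_python_sdk, openai; print('ok')"
```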
5. Expose the KubeAI service on a local port (port-forward blocks, so run it in its own terminal):

```bash
kubectl port-forward service/kubeai 8000:80
```
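To confirm the forward works end-to-end, you can list the models KubeAI exposes. This assumes KubeAI serves the standard OpenAI /v1/models route under its /openai prefix, matching the base_url used later:

```bash
curl http://localhost:8000/openai/v1/models
```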
6. Expose the Langtrace service on a local port (also in a separate terminal):

```bash
kubectl port-forward service/langtrace-app 3000:3000
```
7. A Langtrace API key is required to use the Langtrace SDK, so let’s get one from your self-hosted Langtrace UI. Open your browser to http://localhost:3000, create a project, and generate an API key for that project. In the Python script below, replace the value of langtrace_api_key with your API key.
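If you would rather not paste the key into the file, one alternative (a convenience, not something this tutorial requires) is to export it as an environment variable and read it in the script with os.environ:

```bash
export LANGTRACE_API_KEY="<your-key>"
```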

8. Create a file named langtrace-example.py with the following content:

```python
from langtrace_python_sdk import langtrace
from langtrace_python_sdk.utils.with_root_span import with_langtrace_root_span
from openai import OpenAI

# Replace this with your Langtrace API key from http://localhost:3000
langtrace_api_key = "f7e003de19b9a628258531c17c264002e985604ca9fa561debcc85c41f357b09"

# Initialize Langtrace before making any OpenAI calls so they are traced
langtrace.init(
    api_key=langtrace_api_key,
    api_host="http://localhost:3000/api/trace",
)

# KubeAI's OpenAI-compatible endpoint, exposed by the port-forward above
base_url = "http://localhost:8000/openai/v1"
model = "gemma2-2b-cpu"

@with_langtrace_root_span()
def example():
    # KubeAI does not check the OpenAI API key, but the client requires one
    client = OpenAI(base_url=base_url, api_key="ignored-by-kubeai")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "How many states of matter are there?"
            }
        ],
    )
    print(response.choices[0].message.content)

example()
```
9. Run the Python script:

```bash
python3 langtrace-example.py
```
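If the script fails, it can help to take the SDK out of the loop and send the same chat completion directly with curl; this is the raw OpenAI-style request against the endpoint used above:

```bash
curl http://localhost:8000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma2-2b-cpu",
    "messages": [{"role": "user", "content": "How many states of matter are there?"}]
  }'
```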
10. You should now see the trace in your Langtrace UI. Take a look by visiting http://localhost:3000.