DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline.
Langtrace has first-class support for DSPy, allowing you to automatically capture traces from your DSPy pipelines or agents and analyze them in Langtrace. If you are running DSPy experiments, you can also track the corresponding metrics and evaluations.
Install the Langtrace SDK and initialize it in your code (a minimal setup sketch follows these steps).
Create a project on Langtrace with type DSPy.
Run your DSPy pipeline or agent and view the traces in Langtrace.
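As a minimal sketch of the first two steps, assuming the Python SDK (the API key placeholder is illustrative):

# Install the SDK first, e.g.: pip install langtrace-python-sdk
from langtrace_python_sdk import langtrace

# Initialize Langtrace once, before running your DSPy pipeline or agent.
langtrace.init(api_key="<your-langtrace-api-key>")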
To run experiments, follow the conventions below and head over to the Experiments tab in Langtrace to view your experiments.
Important: follow the steps below when running experiments. For experiments to show up, pass the following additional attributes using inject_additional_attributes so that Langtrace knows you are running an experiment:
(Optional) description - A short description of the experiment.
(Optional) run_id - When you want to associate traces with a specific run, pass a unique run ID. This is useful when you run Evaluate() as part of your experiment, since the traces specific to that Evaluate() call will appear as an individual entry.
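A rough sketch of wrapping a DSPy call with inject_additional_attributes is shown below; the experiment attribute key and all attribute values are assumptions for illustration, so check the Langtrace documentation for the exact attribute names your SDK version expects.

import dspy
from langtrace_python_sdk import langtrace, inject_additional_attributes

langtrace.init(api_key="<your-langtrace-api-key>")
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=250))

def run_pipeline():
    # Your DSPy program; a bare ChainOfThought module is used here for illustration.
    cot = dspy.ChainOfThought("question -> answer")
    return cot(question="What is the sine of 0?")

# Wrap the call so Langtrace tags the resulting traces as part of an experiment.
result = inject_additional_attributes(
    run_pipeline,
    {
        "experiment": "cot-gsm8k-baseline",   # assumed key for the experiment name
        "description": "Baseline CoT run",    # optional
        "run_id": "run-001",                  # optional: ties traces to a specific run
    },
)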
Note: The Eval Chart appears when you run DSPy's Evaluate(). Currently, only scores between 0 and 100 are supported; scores outside this range may cause UI issues.
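For example, a hedged sketch of running Evaluate() with a dedicated run_id so its traces show up as an individual entry (the dataset, metric, and attribute values are placeholders, and optimized_cot stands in for your compiled DSPy program):

from dspy.evaluate import Evaluate
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from langtrace_python_sdk import inject_additional_attributes

gsm8k = GSM8K()
evaluate = Evaluate(devset=gsm8k.dev[:10], metric=gsm8k_metric, num_threads=4)

# Giving the Evaluate() call its own run_id keeps its traces grouped as one entry.
score = inject_additional_attributes(
    lambda: evaluate(optimized_cot),  # optimized_cot: your compiled DSPy program
    {"experiment": "cot-gsm8k-baseline", "run_id": "eval-run-001"},
)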
If you’re not seeing LLM calls in your Langtrace traces when using DSPy, consider the following:
DSPy implements caching of LLM calls by default. This means that repeated identical calls to the language model will not trigger new API requests, and consequently, won’t generate new traces in Langtrace.
To ensure you’re seeing all LLM calls:
Disable caching: If you need to trace every call for debugging purposes, you can disable DSPy's caching mechanism (a hedged sketch follows this list). Refer to the DSPy documentation for instructions on how to disable caching in your version.
Vary your inputs: If you’re testing, make sure to use different inputs for each run to avoid hitting the cache.
Clear the cache: If you need to re-run the same inputs, consider clearing DSPy’s cache between runs.
Check your DSPy configuration: Ensure that your DSPy setup is correctly configured to use the LLM provider you expect.
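As a rough sketch of disabling the cache, assuming a recent DSPy release where the LM constructor accepts a cache flag (older releases control caching differently, so treat this as an assumption and confirm against the DSPy docs for your version):

import dspy

# Assumed: recent DSPy versions expose a cache flag on the LM constructor.
lm = dspy.LM("openai/gpt-3.5-turbo", cache=False)
dspy.settings.configure(lm=lm)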
If you continue to experience issues after considering these points, please contact our support team for further assistance.
Remember to use caching judiciously in production environments to balance comprehensive tracing against performance.
Grouping of Spans in a Trace when using ThreadPoolExecutor in DSPy
If you’re using ThreadPoolExecutor in DSPy to parallelize your modules, you may notice that the spans in the trace are not grouped together. This happens because the spans are created in separate worker threads, and the current tracing context is not automatically propagated to those threads, so the child spans are not attached to the parent span. To resolve this, use the code snippet below as an example of propagating the current tracing context when executing tasks within a ThreadPoolExecutor. The contextvars module is used to copy the current context and run the optimized CoT module within that copied context: by passing contextvars.copy_context().run to executor.submit, the tracing context is propagated to the child spans, ensuring they are grouped together under the parent span in the trace.
import dspy
import contextvars
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot
from concurrent.futures import ThreadPoolExecutor

# flake8: noqa
from langtrace_python_sdk import langtrace, with_langtrace_root_span

langtrace.init(api_key="<your-langtrace-api-key>")

turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]


class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)


@with_langtrace_root_span(name="parallel_example")
def example():
    # Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 4-shot examples of our CoT program.
    config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

    # Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
    teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
    optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)

    questions = [
        "What is the sine of 0?",
        "What is the tangent of 100?",
    ]

    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [
            executor.submit(contextvars.copy_context().run, optimized_cot, question=q)
            for q in questions
        ]

        for future in futures:
            ans = future.result()
            print(ans)


if __name__ == "__main__":
    example()