DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline.
Langtrace has first-class support for DSPy, allowing you to automatically capture traces from your DSPy pipelines or agents and analyze them in Langtrace. If you are running DSPy experiments, you can also track the corresponding metrics and evaluations.
Install the Langtrace SDK and initialize it in your code (a minimal setup sketch follows these steps).
Create a project on Langtrace with type DSPy.
Run your DSPy pipeline or agent and view the traces in Langtrace.
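As a minimal sketch of the first two steps, assuming the Python SDK (the API key placeholder is illustrative):

# Install the SDK first, e.g.: pip install langtrace-python-sdk
from langtrace_python_sdk import langtrace

# Initialize Langtrace once, before running your DSPy pipeline or agent.
langtrace.init(api_key="<your-langtrace-api-key>")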
To run experiments, follow the conventions below and head over to the Experiments tab in Langtrace to view your experiments.
Important: follow the steps below when running experiments. For experiments to show up, pass the following additional attributes using inject_additional_attributes so that Langtrace knows you are running an experiment:
(Optional) description - A short description of the experiment.
(Optional) run_id - When you want to associate traces with a specific run, pass a unique run ID. This is useful when you run Evaluate() as part of your experiment, since the traces specific to that Evaluate() call will appear as an individual entry.
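A rough sketch of wrapping a DSPy call with inject_additional_attributes is shown below; the experiment attribute key and all attribute values are assumptions for illustration, so check the Langtrace documentation for the exact attribute names your SDK version expects.

import dspy
from langtrace_python_sdk import langtrace, inject_additional_attributes

langtrace.init(api_key="<your-langtrace-api-key>")
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=250))

def run_pipeline():
    # Your DSPy program; a bare ChainOfThought module is used here for illustration.
    cot = dspy.ChainOfThought("question -> answer")
    return cot(question="What is the sine of 0?")

# Wrap the call so Langtrace tags the resulting traces as part of an experiment.
result = inject_additional_attributes(
    run_pipeline,
    {
        "experiment": "cot-gsm8k-baseline",   # assumed key for the experiment name
        "description": "Baseline CoT run",    # optional
        "run_id": "run-001",                  # optional: ties traces to a specific run
    },
)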
Note: The Eval Chart appears when you run DSPy's Evaluate(). Currently, only scores between 0 and 100 are supported; scores outside this range may cause UI issues.
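For example, a hedged sketch of running Evaluate() with a dedicated run_id so its traces show up as an individual entry (the dataset, metric, and attribute values are placeholders, and optimized_cot stands in for your compiled DSPy program):

from dspy.evaluate import Evaluate
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from langtrace_python_sdk import inject_additional_attributes

gsm8k = GSM8K()
evaluate = Evaluate(devset=gsm8k.dev[:10], metric=gsm8k_metric, num_threads=4)

# Giving the Evaluate() call its own run_id keeps its traces grouped as one entry.
score = inject_additional_attributes(
    lambda: evaluate(optimized_cot),  # optimized_cot: your compiled DSPy program
    {"experiment": "cot-gsm8k-baseline", "run_id": "eval-run-001"},
)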
If you’re not seeing LLM calls in your Langtrace traces when using DSPy, consider the following:
DSPy implements caching of LLM calls by default. This means that repeated identical calls to the language model will not trigger new API requests, and consequently, won’t generate new traces in Langtrace.
To ensure you’re seeing all LLM calls:
Disable caching: If you need to trace every call for debugging purposes, you can disable DSPy's caching mechanism (a hedged sketch follows this list). Refer to the DSPy documentation for instructions on how to disable caching in your version.
Vary your inputs: If you’re testing, make sure to use different inputs for each run to avoid hitting the cache.
Clear the cache: If you need to re-run the same inputs, consider clearing DSPy’s cache between runs.
Check your DSPy configuration: Ensure that your DSPy setup is correctly configured to use the LLM provider you expect.
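As a rough sketch of disabling the cache, assuming a recent DSPy release where the LM constructor accepts a cache flag (older releases control caching differently, so treat this as an assumption and confirm against the DSPy docs for your version):

import dspy

# Assumed: recent DSPy versions expose a cache flag on the LM constructor.
lm = dspy.LM("openai/gpt-3.5-turbo", cache=False)
dspy.settings.configure(lm=lm)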
If you continue to experience issues after considering these points, please contact our support team for further assistance.
Remember to use caching judiciously in production environments to balance comprehensive tracing against performance.
Grouping of Spans in a Trace when using ThreadPoolExecutor in DSPy
If you’re using ThreadPoolExecutor in DSPy to parallelize your modules, you may notice that the spans in the trace are not grouped together. This happens because the spans are created in separate worker threads, and the current tracing context is not automatically propagated to those threads, so the child spans are not attached to the parent span. To resolve this, use the code snippet below as an example of propagating the current tracing context when executing tasks within a ThreadPoolExecutor. The contextvars module is used to copy the current context and run the optimized CoT module within that copied context: by passing contextvars.copy_context().run to executor.submit, the tracing context is propagated to the child spans, ensuring they are grouped together under the parent span in the trace.
import dspy
import contextvars
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot
from concurrent.futures import ThreadPoolExecutor

# flake8: noqa
from langtrace_python_sdk import langtrace, with_langtrace_root_span

langtrace.init(api_key="<your-langtrace-api-key>")

turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]


class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)


@with_langtrace_root_span(name="parallel_example")
def example():
    # Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 4-shot examples of our CoT program.
    config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

    # Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
    teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
    optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)

    questions = [
        "What is the sine of 0?",
        "What is the tangent of 100?",
    ]

    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [
            executor.submit(contextvars.copy_context().run, optimized_cot, question=q)
            for q in questions
        ]

        for future in futures:
            ans = future.result()
            print(ans)


if __name__ == "__main__":
    example()