[Bug] Signature using a Pydantic type is not properly converted to code #7934

Open
ldorigo opened this issue Mar 10, 2025 · 0 comments
Labels
bug Something isn't working

ldorigo commented Mar 10, 2025

What happened?

I have the following module/signature:

from typing import List
import dspy
from pydantic import BaseModel, Field


class NarrativeTimePeriod(BaseModel):
    """Time period information extracted from text."""

    source_text_quote: str = Field(
        description="The exact text (sentence or few words) indicating the time period"
    )
    contextualized_quote: str = Field(
        description="Text surrounding the quote for unique identification in the content"
    )
    explanation: str = Field(
        description="A brief explanation of how the start and end years can be determined from the source text"
    )
    contemporary: bool = Field(
        description="Whether the time period is contemporary to the movie's release date. If set to true, the start_year and end_year will be the release date of the movie.",
        default=False,
    )
    start_year: int = Field(
        description="The starting year of the time period (use best estimate if not exact)"
    )
    end_year: int = Field(
        description="The ending year of the time period (can be the same as start_year for single-year events)"
    )
    exact: bool = Field(
        description="Whether it's an exact period or rough period based on known events"
    )
    explicit: bool = Field(
        description="Whether the time period is explicitly stated or implied by the context"
    )


class NarrativeLocation(BaseModel):
    """Location information extracted from text."""

    source_text_quote: str = Field(
        description="The exact text (sentence or few words) indicating the location"
    )
    contextualized_quote: str = Field(
        description="Text surrounding the quote for unique identification in the content"
    )
    explanation: str = Field(
        description="A brief explanation of how the location can be determined from the source text"
    )
    location: str = Field(
        description="Modern-day name with hierarchy (e.g., 'New York City, New York, United States')"
    )
    exact: bool = Field(
        description="Whether it's an exact location or a general region"
    )
    explicit: bool = Field(
        description="Whether the location is explicitly stated or implied by the context"
    )
    real: bool = Field(
        description="Whether the location is real (true) or fictional (false)"
    )


class MinimalTimeLocationSignature(dspy.Signature):
    """Minimal signature for extracting time periods and locations from movie descriptions."""

    content = dspy.InputField(desc="Wikipedia article content")
    title = dspy.InputField(desc="Movie title")

    narrative_time_periods: List[NarrativeTimePeriod] = dspy.OutputField(
        desc="List of time periods mentioned in the narrative content"
    )
    narrative_locations: List[NarrativeLocation] = dspy.OutputField(
        desc="List of locations mentioned in the narrative content"
    )
    summary = dspy.OutputField(
        desc="Brief summary of when and where the movie takes place"
    )


class MinimalAnnotatorModule(dspy.Module):
    """Module for annotating Wikipedia articles with time and location information using minimal signature."""

    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought(MinimalTimeLocationSignature)

    def forward(self, content: str, id: str, title: str, url: str) -> dict:
        """Process a single example to extract time and location information.

        Args:
            content: The Wikipedia article content
            id: The article ID
            title: The article title
            url: The article URL

        Returns:
            A golden example with time and location annotations
        """
        # Extract time and location information
        result = self.predictor(content=content, title=title)

        return {
            "id": id,
            "title": title,
            "url": url,
            "content": content,
            "narrative_time_periods": result.narrative_time_periods,
            "narrative_locations": result.narrative_locations,
            "summary": result.summary,
        }

get_dspy_source_code (as used in grounded_proposer.py) converts this module to the following source code:

StringSignature(content, title -> reasoning, narrative_time_periods, narrative_locations, summary
    instructions='Minimal signature for extracting time periods and locations from movie descriptions.'
    content = Field(annotation=str required=True json_schema_extra={'desc': 'Wikipedia article content', '__dspy_field_type': 'input', 'prefix': 'Content:'})
    title = Field(annotation=str required=True json_schema_extra={'desc': 'Movie title', '__dspy_field_type': 'input', 'prefix': 'Title:'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    narrative_time_periods = Field(annotation=List[NarrativeTimePeriod] required=True json_schema_extra={'desc': 'List of time periods mentioned in the narrative content', '__dspy_field_type': 'output', 'prefix': 'Narrative Time Periods:'})
    narrative_locations = Field(annotation=List[NarrativeLocation] required=True json_schema_extra={'desc': 'List of locations mentioned in the narrative content', '__dspy_field_type': 'output', 'prefix': 'Narrative Locations:'})
    summary = Field(annotation=str required=True json_schema_extra={'desc': 'Brief summary of when and where the movie takes place', '__dspy_field_type': 'output', 'prefix': 'Summary:'})
)

class MinimalAnnotatorModule(dspy.Module):
    """Module for annotating Wikipedia articles with time and location information using minimal signature."""

    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought(MinimalTimeLocationSignature)

    def forward(self, content: str, id: str, title: str, url: str) -> dict:
        """Process a single example to extract time and location information.

        Args:
            content: The Wikipedia article content
            id: The article ID
            title: The article title
            url: The article URL

        Returns:
            A golden example with time and location annotations
        """
        # Extract time and location information
        result = self.predictor(content=content, title=title)

        # Create a golden example with the extracted information
        return {
            "id": id,
            "title": title,
            "url": url,
            "content": content,
            "narrative_time_periods": result.narrative_time_periods,
            "narrative_locations": result.narrative_locations,
            "summary": result.summary,
        }

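However, this output omits the definitions of the Pydantic models used as field types (NarrativeTimePeriod and NarrativeLocation). Only their names appear in the annotations, so the proposer never sees the models' fields or descriptions and misses crucial information on what to include in the response.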
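To illustrate what is missing, here is a minimal sketch of how the nested model definitions could be recovered from a field annotation. It assumes Pydantic v2 (model_fields, FieldInfo.annotation, FieldInfo.description). The helpers collect_models and describe_model are hypothetical names made up for this example and are not part of DSPy, and the model is a trimmed copy of the one above.

```python
from typing import Any, Dict, List, Optional, Type, get_args

from pydantic import BaseModel, Field


def collect_models(
    annotation: Any, found: Optional[Dict[str, Type[BaseModel]]] = None
) -> Dict[str, Type[BaseModel]]:
    """Hypothetical helper: find every Pydantic model referenced by an annotation."""
    found = {} if found is None else found
    args = get_args(annotation)
    if args:
        # Unwrap generics such as List[Model] or Optional[Model].
        for arg in args:
            collect_models(arg, found)
    elif isinstance(annotation, type) and issubclass(annotation, BaseModel):
        if annotation.__name__ not in found:
            found[annotation.__name__] = annotation
            # A model's own fields may reference further models.
            for field in annotation.model_fields.values():
                collect_models(field.annotation, found)
    return found


def describe_model(model: Type[BaseModel]) -> str:
    """Hypothetical helper: render a model's fields and descriptions as text."""
    lines = [f"class {model.__name__}(BaseModel):"]
    for name, field in model.model_fields.items():
        ann = field.annotation
        type_name = str(ann) if get_args(ann) else getattr(ann, "__name__", str(ann))
        lines.append(f"    {name}: {type_name}  # {field.description}")
    return "\n".join(lines)


# Trimmed copy of the model from this issue, for demonstration only.
class NarrativeTimePeriod(BaseModel):
    """Time period information extracted from text."""

    start_year: int = Field(
        description="The starting year of the time period (use best estimate if not exact)"
    )
    end_year: int = Field(
        description="The ending year of the time period (can be the same as start_year for single-year events)"
    )


# The annotation used by the narrative_time_periods output field.
for model in collect_models(List[NarrativeTimePeriod]).values():
    print(describe_model(model))
```

Something along these lines, applied to each signature field's annotation, would give the proposer the field names and descriptions that the current output drops. Emitting each model's model_json_schema() would be another option.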

Steps to reproduce

Take the example code above and call get_dspy_source_code on the MinimalAnnotatorModule.

DSPy version

2.6.5

ldorigo added the bug label on Mar 10, 2025