
Annotations abstraction for responses that are not just a stream of plain text #716

Open
simonw opened this issue Jan 24, 2025 · 39 comments


@simonw
Owner

simonw commented Jan 24, 2025

LLM currently assumes that all responses from a model come in the form of a stream of text.

This assumption no longer holds!

  • Anthropic's new Citations API (API docs) returns responses that add citation details to some spans of text, like this.
  • DeepSeek Reasoner streams back two types of text - reasoning text and regular text - as seen here.

And that's just variants of text - multi-modal models need consideration as well. OpenAI already have a model that can return snippets of audio, and models that return images (from OpenAI and Gemini) are becoming available very soon too.

@simonw simonw added the design label Jan 24, 2025
@simonw
Owner Author

simonw commented Jan 24, 2025

I had thought that attachments would be the way to handle this, but they only work for audio/image outputs - the thing where Claude and DeepSeek can return annotated spans of text feels different.

@simonw
Owner Author

simonw commented Jan 24, 2025

Here's an extract from that Claude citations example:

{
  "id": "msg_01P3zs4aYz2Baebumm4Fejoi",
  "content": [
    {
      "text": "Based on the document, here are the key trends in AI/LLMs from 2024:\n\n1. Breaking the GPT-4 Barrier:\n",
      "type": "text"
    },
    {
      "citations": [
        {
          "cited_text": "I’m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.\n\n",
          "document_index": 0,
          "document_title": "My Document",
          "end_char_index": 531,
          "start_char_index": 288,
          "type": "char_location"
        }
      ],
      "text": "The GPT-4 barrier was completely broken, with 18 organizations now having models that rank higher than the original GPT-4 from March 2023, with 70 models in total surpassing it.",
      "type": "text"
    },
    {
      "text": "\n\n2. Increased Context Lengths:\n",
      "type": "text"
    },
    {
      "citations": [
        {
          "cited_text": "Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google’s Gemini series accepts up to 2 million.\n\n",
          "document_index": 0,
          "document_title": "My Document",
          "end_char_index": 1680,
          "start_char_index": 1361,
          "type": "char_location"
        }
      ],
      "text": "A major theme was increased context lengths. While last year most models accepted 4,096 or 8,192 tokens (with Claude 2.1 accepting 200,000), today every serious provider has a 100,000+ token model, and Google's Gemini series accepts up to 2 million.",
      "type": "text"
    },

And from the DeepSeek reasoner streamed response (pretty-printed here). First a reasoning content chunk:

{
    "id": "2cf23b27-2ba6-41dd-b484-358c486a1405",
    "object": "chat.completion.chunk",
    "created": 1737480272,
    "model": "deepseek-reasoner",
    "system_fingerprint": "fp_1c5d8833bc",
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": null,
                "reasoning_content": "Okay"
            },
            "logprobs": null,
            "finish_reason": null
        }
    ]
}

Text content chunk:

{
    "id": "2cf23b27-2ba6-41dd-b484-358c486a1405",
    "object": "chat.completion.chunk",
    "created": 1737480272,
    "model": "deepseek-reasoner",
    "system_fingerprint": "fp_1c5d8833bc",
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": " waves",
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": null
        }
    ]
}

@simonw
Owner Author

simonw commented Jan 24, 2025

Meanwhile OpenAI audio responses look like this (truncated). I'm not sure if these can mix in text output as well, but in this case the audio does at least include a "transcript" key:

{
  "id": "chatcmpl-At42uKzhIMJfzGOwypiS9mMH3oaFG",
  "object": "chat.completion",
  "created": 1737686956,
  "model": "gpt-4o-audio-preview-2024-12-17",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "audio": {
          "id": "audio_6792ffad12f48190abab9d6b7d1a1bf7",
          "data": "UklGRkZLAABXQVZFZ...",
          "expires_at": 1737690557,
          "transcript": "Hi"
        }
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 13,
    "total_tokens": 35,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0,
      "text_tokens": 22,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 8,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0,
      "text_tokens": 5
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_58887f9c5a"
}

@simonw
Owner Author

simonw commented Jan 24, 2025

@ericfeunekes

I think a combination of a pydantic object with some sort of templating language would work. E.g. for the Claude example you'd have this object:

from pydantic import BaseModel, Field
from typing import List, Optional, Literal

class TextRange(BaseModel):
    start: int
    end: int

class Citation(BaseModel):
    sourceDocument: str = Field(alias="document_title")
    documentIndex: int = Field(alias="document_index")
    textRange: TextRange = Field(...)
    citedText: str = Field(alias="cited_text")
    type: Literal["char_location"]

class ContentBlock(BaseModel):
    blockType: Literal["text", "heading"] = Field(alias="type")
    content: str = Field(alias="text")
    hasCitation: bool = Field(default=False)
    citation: Optional[Citation] = None
    headingLevel: Optional[int] = None

class Message(BaseModel):
    messageId: str = Field(alias="id")
    contentBlocks: List[ContentBlock] = Field(alias="content")

and then you define a message template:

{% for block in contentBlocks if block.blockType == "text" %}
  {{ block.content }}
  {% if block.hasCitation %}
    > {{ block.citation.citedText }}
  {% endif %}
{% endfor %}

You could then create similar objects and templates for different model types. These could also be exposed to users to customize how data is shown for any model. Also - pydantic now supports partial validation, so to the extent any of the JSON responses are streamed, this model should still work.

@ericfeunekes

ericfeunekes commented Jan 24, 2025

> I think a combination of a pydantic object with some sort of templating language would work. E.g. for the Claude example you'd have this object: […]

Thinking about this a bit more, if you want to go down this road or something similar, it would be great to have it as a separate package. This would let it be a plugin to this library, but also usable in others. I could definitely use something like this in some of my projects where I use LiteLLM, which lets me switch models easily, so it would be great to have output templates that I could define like this.

Not sure how hard this would be but I could probably contribute.

@banahogg

Cohere is a bit outside the top-tier models, but their citation format is probably worth considering as well when designing this: https://docs.cohere.com/docs/documents-and-citations

@simonw
Owner Author

simonw commented Jan 25, 2025

That Cohere example is really interesting. It looks like they decided to have citations as a separate top-level key and then reference which bits of text the citations correspond to using start/end indexes:

# response.message.content
[AssistantMessageResponseContentItem_Text(text='The tallest penguins are the Emperor penguins. They only live in Antarctica.', type='text')]

# response.message.citations
[Citation(start=29, 
          end=46, 
          text='Emperor penguins.', 
          sources=[Source_Document(id='doc:0:0', 
                                   document={'id': 'doc:0:0', 
                                             'snippet': 'Emperor penguins are the tallest.', 
                                             'title': 'Tall penguins'}, 
                                   type='document')]), 
 Citation(start=65, 
          end=76, 
          text='Antarctica.', 
          sources=[Source_Document(id='doc:0:1', 
                                   document={'id': 'doc:0:1', 
                                             'snippet': 'Emperor penguins only live in Antarctica.', 
                                             'title': 'Penguin habitats'}, 
                                   type='document')])]

Note how that first citation is in a separate data structure and flags 29-46 - the text "Emperor penguins." - as the attachment point.

This might actually be a way to solve the general problem: I could take the Claude citations format and turn that into a separate stored piece of information, referring back to the original text using those indexes.

That way I could still store a string of text in the database / output that in the API, but additional annotations against that stream of text could be stored elsewhere.

For the DeepSeek reasoner case this would mean having a start-end indexed chunk of text that is labelled as coming from the <think> block.

I don't think this approach works for returning audio though - there's no text segment to attach that audio to, though I guess I could say "index 55:55 is where the audio chunk came in".
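
To make that more concrete, here's a minimal sketch of converting Claude's citation content blocks into that kind of index-based annotation against the concatenated text. The Annotation container is a placeholder for illustration, not an existing LLM class:

from dataclasses import dataclass, field

@dataclass
class Annotation:
    # Placeholder container: start/end are character offsets into the
    # concatenated response text, data holds the provider-specific payload.
    start: int
    end: int
    data: dict = field(default_factory=dict)

def claude_blocks_to_annotations(content_blocks):
    # Concatenate Claude content blocks into one plain string and record
    # any citation metadata as index-based annotations against it.
    text = ""
    annotations = []
    for block in content_blocks:
        start = len(text)
        text += block.get("text", "")
        if block.get("citations"):
            annotations.append(
                Annotation(start=start, end=len(text), data={"citations": block["citations"]})
            )
    return text, annotations

The plain text could then be stored and output as usual, with the annotations logged alongside it.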

@simonw
Owner Author

simonw commented Jan 25, 2025

I'm going to call this annotations for the moment - where an annotation is additional metadata attached to a portion of the text returned by an LLM.

The three things to consider are:

  • How are annotations represented in the LLM Python API? Presumably on the Response class?
  • How are they represented in the CLI tool (really a question about how they are rendered to a terminal)?
  • How are they stored in the SQLite database tables, such that they can be re-hydrated into Response objects from the database?

@simonw
Owner Author

simonw commented Jan 25, 2025

I think I'll treat audio/image responses separately from annotations - I'll use an expanded version of the existing attachments mechanism for that - including the existing attachments database table:

llm/docs/logging.md

Lines 181 to 194 in 656d8fa

CREATE TABLE [attachments] (
[id] TEXT PRIMARY KEY,
[type] TEXT,
[path] TEXT,
[url] TEXT,
[content] BLOB
);
CREATE TABLE [prompt_attachments] (
[response_id] TEXT REFERENCES [responses]([id]),
[attachment_id] TEXT REFERENCES [attachments]([id]),
[order] INTEGER,
PRIMARY KEY ([response_id],
[attachment_id])
);

I'll probably add a response_attachments many-to-many table to track attachments returned BY a response (as opposed to being attached to the prompt as input).
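
As a sketch, that new table could simply mirror the prompt_attachments schema above - the table and column names here are guesses, not a final design:

import sqlite3

conn = sqlite3.connect("logs.db")  # database path is illustrative
# Hypothetical many-to-many table for attachments returned BY a response,
# mirroring the existing prompt_attachments table.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS [response_attachments] (
        [response_id] TEXT REFERENCES [responses]([id]),
        [attachment_id] TEXT REFERENCES [attachments]([id]),
        [order] INTEGER,
        PRIMARY KEY ([response_id], [attachment_id])
    )
    """
)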

@simonw simonw changed the title Design an abstraction for responses that are not just a stream of text Design annotations abstraction for responses that are not just a stream of plain text Jan 25, 2025
@simonw
Owner Author

simonw commented Jan 25, 2025

After brainstorming with Claude I think a solution to the terminal representation challenge could be to add markers around the annotated spans of text and then display those annotations below.

One neat option here is corner brackets - 「 and 」- for example:

Based on the document, here are the key trends in AI/LLMs from 2024:

1. Breaking the GPT-4 Barrier: 「The GPT-4 barrier was completely broken, with 18 organizations now having models that rank higher than the original GPT-4 from March 2023, with 70 models in total surpassing it.」

2. Increased Context Lengths: 「A major theme was increased context lengths. While last year most models accepted 4,096 or 8,192 tokens (with Claude 2.1 accepting 200,000), today every serious provider has a 100,000+ token model, and Google's Gemini series accepts up to 2 million.」

Annotations:

「The GPT-4 barrier was completely broken...」:

  {
    "citations": [
      {
        "cited_text": "I’m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 531,
        "start_char_index": 288,
        "type": "char_location"
      }
    ]
  }

「A major theme was increased context lengths...」:

  {
    "citations": [
      {
        "cited_text": "Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google’s Gemini series accepts up to 2 million.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 1680,
        "start_char_index": 1361,
        "type": "char_location"
      }
    ]
  }

So the spans of text that have annotations are wrapped in「 and 」and the annotations themselves are then displayed below.

Here's what that looks like in a macOS terminal window:

Image
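
As a sketch, the rendering could be as simple as walking the annotations in order and splicing in the markers. This assumes non-overlapping spans and start/end character offsets; none of it is implemented yet:

import json

def render_with_corner_brackets(text, annotations):
    # annotations: list of (start, end, data) tuples with non-overlapping spans.
    out = []
    footnotes = []
    cursor = 0
    for start, end, data in sorted(annotations, key=lambda a: a[0]):
        out.append(text[cursor:start])
        span = text[start:end]
        out.append("「" + span + "」")
        footnotes.append((span, data))
        cursor = end
    out.append(text[cursor:])
    rendered = "".join(out)
    if footnotes:
        rendered += "\n\nAnnotations:\n"
        for span, data in footnotes:
            label = span if len(span) <= 40 else span[:40] + "..."
            rendered += "\n「" + label + "」:\n\n" + json.dumps(data, indent=2) + "\n"
    return rendered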

@simonw
Owner Author

simonw commented Jan 25, 2025

For DeepSeek reasoner that might look like this:

「Okay, so I need to come up with a joke about a pelican and a
walrus running a tea room together. Hmm, that's an
interesting combination. Let me think about how these two
characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are
known for their long beaks and Webbed feet, often seen near
the beach or water. Walruses have big teeth, thick fur, and
they're generally found in colder climates, like icebergs or
snowy areas. So, combining these two into a tea room setting
is already a funny image.」

**The Joke:**

A pelican and a walrus decide to open a quaint little tea
room together. The walrus, with its big size, struggles to
find comfortable chairs, so it sits on the table by accident,
knocking over the teapot. Meanwhile, the pelican, trying to
help, uses its beak to place saucers on the table, causing a
few spills.

After a series of comical mishaps, the walrus looks up and
says with a grin, "This isn't so fishy anymore." The pelican
smirks and remarks, "Maybe not, but we do have a lot of krill
in our tea!"

Annotations:

「Okay, so I need to come up with a joke... 」:

  {
    "thinking": true
  }

In this case I'd have to do some extra post-processing to combine all of those short token snippets into a single annotation, de-duping the "thinking": true annotation - otherwise I would end up with dozens of annotations for every word in the thinking section.
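
A sketch of that merge step, assuming each streamed snippet has already been recorded as a (start, end, data) annotation against the accumulated text:

def merge_adjacent_annotations(annotations):
    # Collapse runs of touching annotations that carry identical data
    # (e.g. dozens of per-token {"thinking": True} spans) into single
    # annotations covering the combined range.
    merged = []
    for start, end, data in sorted(annotations, key=lambda a: a[0]):
        if merged and merged[-1][1] == start and merged[-1][2] == data:
            merged[-1] = (merged[-1][0], end, data)
        else:
            merged.append((start, end, data))
    return merged

# e.g. [(0, 4, {"thinking": True}), (4, 9, {"thinking": True})]
# becomes [(0, 9, {"thinking": True})]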

@simonw
Owner Author

simonw commented Jan 25, 2025

For the Python layer this might look like so:

response = llm.prompt("prompt goes here")
print(response.text()) # outputs the plain text
print(response.annotations)
# Outputs annotations, see below
for annotated in response.text_with_annotations():
    print(annotated.text, annotated.annotations)

That text_with_annotations() method is a utility that uses the start/end indexes to break up the text and return each segment with its annotations.

The response.annotations list would look something like this:

[
  Annotation(start=0, end=5, data={"this": "is a dictionary of stuff"}),
  Annotation(start=55, end=58, data={"this": "is more stuff"}),
]

(data= is an ugly name for a property, but annotation= didn't look great either.)
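
A sketch of how that text_with_annotations() helper might slice things up using the start/end indexes - the AnnotatedSegment name is made up for illustration and nothing here is final API:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Annotation:
    start: int
    end: int
    data: dict = field(default_factory=dict)

@dataclass
class AnnotatedSegment:
    # Hypothetical item yielded by text_with_annotations()
    text: str
    annotations: List[Annotation]

def text_with_annotations(text: str, annotations: List[Annotation]) -> List[AnnotatedSegment]:
    # Split the text at every annotation boundary and pair each segment
    # with the annotations that cover it (assumes non-overlapping spans).
    boundaries = sorted({0, len(text)} | {a.start for a in annotations} | {a.end for a in annotations})
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        covering = [a for a in annotations if a.start <= start and end <= a.end]
        segments.append(AnnotatedSegment(text=text[start:end], annotations=covering))
    return segments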

@simonw
Owner Author

simonw commented Jan 25, 2025

Then the SQL table design is pretty simple:

 CREATE TABLE [response_annotations] (
   [id] INTEGER PRIMARY KEY,
   [response_id] TEXT REFERENCES [responses]([id]), 
   [start_index] INTEGER,
   [end_index] INTEGER,
   [annotation] TEXT -- JSON
 ); 
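
A rough sketch of logging to and re-hydrating from that table using the standard library - the real code would presumably go through sqlite-utils, and the annotation shape is the (start, end, data) one suggested above:

import json
import sqlite3

def log_annotations(conn: sqlite3.Connection, response_id: str, annotations):
    # annotations: iterable of (start, end, data) tuples
    conn.executemany(
        "INSERT INTO response_annotations (response_id, start_index, end_index, annotation) "
        "VALUES (?, ?, ?, ?)",
        [(response_id, start, end, json.dumps(data)) for start, end, data in annotations],
    )
    conn.commit()

def load_annotations(conn: sqlite3.Connection, response_id: str):
    rows = conn.execute(
        "SELECT start_index, end_index, annotation FROM response_annotations "
        "WHERE response_id = ? ORDER BY start_index",
        (response_id,),
    )
    return [(start, end, json.loads(annotation)) for start, end, annotation in rows]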

@simonw
Owner Author

simonw commented Jan 25, 2025

It bothers me very slightly that this design allows for exact positioning of annotations in a text stream response (with a start and end index) but doesn't support that for recording the position at which an image or audio clip was returned.

I think the fix for that is to have an optional single text_index integer on the response_attachments many-to-many table, to optionally record the exact point at which an image/audio-clip was included in the response.
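
In other words, something like this (the column name is a guess):

import sqlite3

conn = sqlite3.connect("logs.db")  # path is illustrative
# Optional position of a returned attachment within the response text;
# NULL when the model provided no positional information.
conn.execute("ALTER TABLE response_attachments ADD COLUMN text_index INTEGER")
conn.commit()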

@Quantisan

> After brainstorming with Claude I think a solution to the terminal representation challenge could be to add markers around the annotated spans of text and then display those annotations below, e.g. using corner brackets - 「 and 」. […]

asking the obvious question, why not use the academic paper style of using [<number>] to reference instead of quoting the beginning text as anchor? I guess one reason is that it would add even more chars to the text block.

@simonw
Owner Author

simonw commented Feb 14, 2025

> asking the obvious question, why not use the academic paper style of using [<number>] to reference instead of quoting the beginning text as anchor? I guess one reason is that it would add even more chars to the text block.

The problem with using [1] style citation references is that it marks just a single point, but the Claude citations API returns both a start and an end index.

That said, maybe this could work (I also added text wrapping):

Based on the document, here are the key trends in AI/LLMs from 2024:

1. Breaking the GPT-4 Barrier: 「The GPT-4 barrier was
  completely broken, with 18 organizations now having models
  that rank higher than the original GPT-4 from March 2023,
  with 70 models in total surpassing it.」[1]

2. Increased Context Lengths: 「A major theme was increased
  context lengths. While last year most models accepted 4,096
  or 8,192 tokens (with Claude 2.1 accepting 200,000), today
  every serious provider has a 100,000+ token model, and
  Google's Gemini series accepts up to 2 million.」[2]

Annotations:

  [1]「The GPT-4 barrier was completely broken...」:

  {
    "citations": [
      {
        "cited_text": "I’m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 531,
        "start_char_index": 288,
        "type": "char_location"
      }
    ]
  }

  [2]「A major theme was increased context lengths...」:

  {
    "citations": [
      {
        "cited_text": "Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google’s Gemini series accepts up to 2 million.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 1680,
        "start_char_index": 1361,
        "type": "char_location"
      }
    ]
  }

This would also mean I could omit the quoted truncated extract entirely:

Based on the document, here are the key trends in AI/LLMs from 2024:

1. Breaking the GPT-4 Barrier: 「The GPT-4 barrier was
  completely broken, with 18 organizations now having models
  that rank higher than the original GPT-4 from March 2023,
  with 70 models in total surpassing it.」[1]

2. Increased Context Lengths: 「A major theme was increased
  context lengths. While last year most models accepted 4,096
  or 8,192 tokens (with Claude 2.1 accepting 200,000), today
  every serious provider has a 100,000+ token model, and
  Google's Gemini series accepts up to 2 million.」[2]

Annotations:

  [1]

  {
    "citations": [
      {
        "cited_text": "I’m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 531,
        "start_char_index": 288,
        "type": "char_location"
      }
    ]
  }

  [2]

  {
    "citations": [
      {
        "cited_text": "Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google’s Gemini series accepts up to 2 million.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 1680,
        "start_char_index": 1361,
        "type": "char_location"
      }
    ]
  }

@simonw
Owner Author

simonw commented Feb 14, 2025

Or with numbers at the start:

Based on the document, here are the key trends in AI/LLMs from 2024:

1. Breaking the GPT-4 Barrier: [1]「The GPT-4 barrier was
  completely broken, with 18 organizations now having models
  that rank higher than the original GPT-4 from March 2023,
  with 70 models in total surpassing it.」

2. Increased Context Lengths: [2]「A major theme was increased
  context lengths. While last year most models accepted 4,096
  or 8,192 tokens (with Claude 2.1 accepting 200,000), today
  every serious provider has a 100,000+ token model, and
  Google's Gemini series accepts up to 2 million.」

Annotations:

  [1]

  {
    "citations": [
      {
        "cited_text": "I’m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 531,
        "start_char_index": 288,
        "type": "char_location"
      }
    ]
  }

  [2]

  {
    "citations": [
      {
        "cited_text": "Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google’s Gemini series accepts up to 2 million.\n\n",
        "document_index": 0,
        "document_title": "My Document",
        "end_char_index": 1680,
        "start_char_index": 1361,
        "type": "char_location"
      }
    ]
  }

@simonw
Owner Author

simonw commented Feb 24, 2025

This came up again today thanks to Claude 3.7 Sonnet exposing thinking tokens:

Based on that (and how it works in the Claude streaming API - see example at https://gist.github.com/simonw/c5e369753e8dbc9b045c514bb4fee987) I'm now thinking the for annotated in response.text_with_annotations() idea looks good - but I might simplify that to for chunk in response.chunks() where a chunk is text plus an optional type or optional annotations.

Maybe tool usage can benefit from this too?
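
A sketch of what that Chunk shape could look like, with DeepSeek-style deltas (shown earlier in this thread) as an example input. The names and fields are assumptions, not an implemented API:

from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Chunk:
    # A piece of response text plus an optional annotation attached to it.
    text: str
    annotation: Optional[dict] = None  # e.g. {"thinking": True} or citation details

def chunks_from_deepseek_deltas(deltas) -> Iterator[Chunk]:
    # Illustrative adapter: turn DeepSeek streaming deltas into Chunks,
    # labelling reasoning_content as thinking text.
    for delta in deltas:
        if delta.get("reasoning_content"):
            yield Chunk(text=delta["reasoning_content"], annotation={"thinking": True})
        elif delta.get("content"):
            yield Chunk(text=delta["content"])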

@simonw
Owner Author

simonw commented Mar 11, 2025

Lots of these in the new OpenAI Responses API https://platform.openai.com/docs/api-reference/responses/create

@simonw
Owner Author

simonw commented Mar 18, 2025

The OpenAI web search stuff needs this too:

Example from https://platform.openai.com/docs/guides/tools-web-search?api-mode=chat&lang=curl#output-and-citations

[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "the model response is here...",
      "refusal": null,
      "annotations": [
        {
          "type": "url_citation",
          "url_citation": {
            "end_index": 985,
            "start_index": 764,
            "title": "Page title...",
            "url": "https://..."
          }
        }
      ]
    },
    "finish_reason": "stop"
  }
]

@simonw
Owner Author

simonw commented Mar 18, 2025

@simonw
Owner Author

simonw commented Mar 19, 2025

Here's a challenge: in streaming mode OpenAI only returns the annotations at the very end - but I'll already have printed the text out to the screen by the time that arrives, so I won't be able to use the fancy inline [1] trick for those streaming responses. I'll just have to dump the annotations at the bottom without them being attached to the text.

But some APIs like DeepSeek or Claude Thinking CAN return inline annotations. So the design needs to handle both cases.

This will be particularly tricky at the Python API layer. If you call a method that's documented as streaming a sequence of chunks with optional annotations at you, what should that method do for the OpenAI case where the annotations only became visible at the end?

@simonw
Owner Author

simonw commented Mar 19, 2025

Let's look at what Anthropic does for streaming citations.

Without streaming:

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $(llm keys get anthropic)" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "document",
            "source": {
              "type": "text",
              "media_type": "text/plain",
              "data": "The grass is green. The sky is blue."
            },
            "title": "My Document",
            "context": "This is a trustworthy document.",
            "citations": {"enabled": true}
          },
          {
            "type": "text",
            "text": "What color is the grass and sky?"
          }
        ]
      }
    ]
  }' | jq

Returns:

{
  "id": "msg_016NSoAFZagmYi29wfZ72wN2",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-7-sonnet-20250219",
  "content": [
    {
      "type": "text",
      "text": "Based on the document you've provided:\n\n"
    },
    {
      "type": "text",
      "text": "The grass is green.",
      "citations": [
        {
          "type": "char_location",
          "cited_text": "The grass is green. ",
          "document_index": 0,
          "document_title": "My Document",
          "start_char_index": 0,
          "end_char_index": 20
        }
      ]
    },
    {
      "type": "text",
      "text": " "
    },
    {
      "type": "text",
      "text": "The sky is blue.",
      "citations": [
        {
          "type": "char_location",
          "cited_text": "The sky is blue.",
          "document_index": 0,
          "document_title": "My Document",
          "start_char_index": 20,
          "end_char_index": 36
        }
      ]
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 610,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 54
  }
}

But with streaming:

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $(llm keys get anthropic)" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "document",
            "source": {
              "type": "text",
              "media_type": "text/plain",
              "data": "The grass is green. The sky is blue."
            },
            "title": "My Document",
            "context": "This is a trustworthy document.",
            "citations": {"enabled": true}
          },
          {
            "type": "text",
            "text": "What color is the grass and sky?"
          }
        ]
      }
    ]
  }'

It returns this:

event: message_start
data: {"type":"message_start","message":{"id":"msg_01RT1aSmJeRk18N6LQ8kC4mG","type":"message","role":"assistant","model":"claude-3-7-sonnet-20250219","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":610,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":1}}            }

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}        }

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Base"}       }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"d on the provided document, I can answer your question about the colors of"}               }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the grass and sky.\n\n"}           }

event: content_block_stop
data: {"type":"content_block_stop","index":0           }

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":"","citations":[]}     }

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"citations_delta","citation":{"type":"char_location","cited_text":"The grass is green. ","document_index":0,"document_title":"My Document","start_char_index":0,"end_char_index":20}}       }

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"The grass is green."}  }

event: content_block_stop
data: {"type":"content_block_stop","index":1    }

event: content_block_start
data: {"type":"content_block_start","index":2,"content_block":{"type":"text","text":""}             }

event: content_block_delta
data: {"type":"content_block_delta","index":2,"delta":{"type":"text_delta","text":" "}   }

event: content_block_stop
data: {"type":"content_block_stop","index":2        }

event: content_block_start
data: {"type":"content_block_start","index":3,"content_block":{"type":"text","text":"","citations":[]} }

event: content_block_delta
data: {"type":"content_block_delta","index":3,"delta":{"type":"citations_delta","citation":{"type":"char_location","cited_text":"The sky is blue.","document_index":0,"document_title":"My Document","start_char_index":20,"end_char_index":36}}            }

event: content_block_delta
data: {"type":"content_block_delta","index":3,"delta":{"type":"text_delta","text":"The sky is blue."}         }

event: content_block_stop
data: {"type":"content_block_stop","index":3           }

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":66}     }

event: message_stop
data: {"type":"message_stop"             }

Claude DID interleave citations among regular text, with blocks that look like this:

data: {"type":"content_block_delta","index":1,"delta":{"type":"citations_delta","citation":

simonw added a commit that referenced this issue Mar 19, 2025
@simonw
Owner Author

simonw commented Mar 19, 2025

I pushed my prototype so far - the one dodgy part of it is that I got Claude to rewrite the logs_list command to use Response.from_row() in order to test out the new .chunks() method and I have NOT reviewed that rewrite thoroughly at all - so I should at least expect to rework logs_list() before landing this.

@simonw
Owner Author

simonw commented Mar 19, 2025

Current TODO list:

  • Need to think about how to handle that streaming case, both as a Python API and in terms of how plugins should handle it. Currently plugins yield strings from their .execute() methods - should they start optionally yielding Chunks instead? (See the sketch after this list.)
  • Review and clean up that logs_list() code
  • Add support for annotations to the llm prompt command - at the moment they are only visible from llm logs in the prototype
  • Tests, docs, examples across multiple plugins
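
Here's a sketch of what the plugin side of that first item could look like if execute() were allowed to mix plain strings with Chunk objects. Chunk is still the hypothetical class from earlier, and _stream_model_events() is made up:

import llm
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    # Hypothetical - not something LLM ships today
    text: str
    annotation: Optional[dict] = None

class MyModel(llm.Model):
    model_id = "my-model"  # hypothetical example plugin model

    def execute(self, prompt, stream, response, conversation):
        # Yield plain strings for ordinary text and Chunk objects when
        # there is metadata worth attaching; _stream_model_events is made up.
        for event in self._stream_model_events(prompt):
            if event.get("thinking"):
                yield Chunk(text=event["thinking"], annotation={"thinking": True})
            elif event.get("citation"):
                yield Chunk(text=event["text"], annotation={"citations": [event["citation"]]})
            else:
                yield event["text"]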

@simonw
Owner Author

simonw commented Mar 19, 2025

If we did start optionally yielding Chunk() from the execute() method (and its async variant) we could teach the Response.chunks() method to yield chunks as they become available.

In terms of display, we could teach llm prompt that any time a chunk has a .data annotation it should output those start and end markers and a [x] annotation marker, then still show the annotation data itself at the end.

What about for the case with OpenAI where the annotations only become available at the end of the stream? For that we cannot show [x] markers in the right places because we already printed them.

Instead, we could fall back on that earlier idea from #716 (comment) to show them like this:

Annotations:

「The GPT-4 barrier was completely broken...」:

  {
    "citations": [
      {
        "cited_text": "I’m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.\n\n",
        "document_index": 0,
        "document_title": "My Document",

That would work pretty well for this edge-case I think.

I guess the Python API then becomes something like this:

seen_annotations = False
for chunk in response.chunks():
    print(chunk, end='')
    if chunk.annotation:
        seen_annotations = True
        print(chunk.annotation)

if not seen_annotations and response.annotations:
    # Must have been some annotations at the end that we missed
    print(response.annotations)

Or encapsulate that logic into an if response.unseen_annotations: property.

@simonw
Owner Author

simonw commented Mar 19, 2025

I think I like chunk.annotation more than chunk.data for the optional dict of data attached to a chunk.

I'll leave it as annotation.data though because annotation.annotation is a bit weird.

simonw added a commit that referenced this issue Mar 19, 2025
@simonw
Owner Author

simonw commented Mar 19, 2025

This feature may be the point at which I need a llm prompt --json option which outputs JSON instead of text. This could use newline-delimited JSON for streaming mode and return a single JSON object for non-streaming mode.

Something like this:

llm -m gpt-4o-mini-search-preview 'what happened on march 1 2025' --json

Outputs:

{"text": "On March 1st 2025 "}
{"text": "several things "}
{"text": "happened "}
{"text": "including ", "annotation" { ... }}

Or for things where annotations come at the end maybe it ends with:

{"text": "."}
{"text": null, "annotation": {...}

This would effectively be the debug tool version of for chunk in response.chunks().
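
A sketch of producing that newline-delimited output from the hypothetical chunks() iterator - the flag name, chunk shape and trailing-annotations handling are all assumptions:

import json
import sys

def emit_ndjson(chunks, trailing_annotations=None):
    # One JSON object per line: streamed text chunks first, then any
    # annotations that only became available at the end of the response.
    for chunk in chunks:
        record = {"text": chunk.text}
        if chunk.annotation:
            record["annotation"] = chunk.annotation
        sys.stdout.write(json.dumps(record) + "\n")
    for annotation in trailing_annotations or []:
        sys.stdout.write(json.dumps({"text": None, "annotation": annotation}) + "\n")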

@simonw
Owner Author

simonw commented Mar 19, 2025

The various "thinking" blocks I want to support don't actually include start and end indexes in their APIs, so I'll need a utility mechanism to keep track of those automatically for logging to the database.

Noteworthy that the Claude citations streaming API does include start and end indices:

data: {"type":"content_block_delta","index":1,"delta":{"type":"citations_delta","citation":{"type":"char_location","cited_text":"The grass is green. ","document_index":0,"document_title":"My Document","start_char_index":0,"end_char_index":20}} }

But the thinking blocks from Claude do not:

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $(llm keys get anthropic)" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-3-7-sonnet-20250219",
    "stream": true,
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [
        {"role": "user", "content": "Think about poetry"}
    ]
}'

Outputs (truncated):

event: message_start
data: {"type":"message_start","message":{"id":"msg_017mGze44QHkjHWQLV1z5Kar","type":"message","role":"assistant","model":"claude-3-7-sonnet-20250219","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":38,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":5}} }

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":"","signature":""}     }

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"The person is asking"}             }

...

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" or technical."}            }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"signature_delta","signature":"ErUBCkYIARgCIkDmcsvcQNCb6L8ZH4/m1eZ9cO12z9ix4VSOs5uS34KwLSQewZChqR0iKSN5/M5JXSCnAszTMy31/Iqp5n51SC61EgxjmtR4JwVoJhgW6mkaDOhdQSJTZaQwGDsqqSIwXeNBM94s+CBN1ps4t7Has4N8eNR0xckndVEo5D5yGbFpZD1KTfdMi4P+jiXefb97Kh0KrW3O2MHZog2fCqg4YPI2S4JKLIZweYz5VGku4Q=="}    }

event: content_block_stop
data: {"type":"content_block_stop","index":0      }

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}             }

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"# Reflections on Poetry\n\nPoetry"}           }

...

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":" more deeply?"}            }

event: content_block_stop
data: {"type":"content_block_stop","index":1     }

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":345}             }

event: message_stop
data: {"type":"message_stop"   }

@simonw
Owner Author

simonw commented Mar 19, 2025

It turns out the new OpenAI responses API does stream annotations within the main stream of returned events. Here's an example: https://gist.github.com/simonw/47b043f0851c54eae85e0bd961d2e198#file-recent_john_gruber-py-L587-L612

I ran llm -m gpt-4o-mini 'recent john gruber articles' with hard-coded kwargs["tools"] = [{"type": "web_search_preview"}] to turn on the search tool. Relevant snippet from that linked example:

{'content_index': 0,
 'delta': ' ',
 'item_id': 'msg_67db4ae7a2908192b010032f387583890c691f65bff4763f',
 'output_index': 1,
 'type': 'response.output_text.delta'}
{'content_index': 0,
 'delta': '([macrumors.com](https://www.macrumors.com/2025/03/12/gruber-says-something-is-rotten-at-apple/?utm_source=openai))',
 'item_id': 'msg_67db4ae7a2908192b010032f387583890c691f65bff4763f',
 'output_index': 1,
 'type': 'response.output_text.delta'}
{'annotation': {'end_index': 596,
                'start_index': 481,
                'title': "John Gruber Says 'Something is Rotten' at Apple - "
                         'MacRumors',
                'type': 'url_citation',
                'url': 'https://www.macrumors.com/2025/03/12/gruber-says-something-is-rotten-at-apple/?utm_source=openai'},
 'annotation_index': 0,
 'content_index': 0,
 'item_id': 'msg_67db4ae7a2908192b010032f387583890c691f65bff4763f',
 'output_index': 1,
 'type': 'response.output_text.annotation.added'}
{'content_index': 0,
 'delta': '\n',
 'item_id': 'msg_67db4ae7a2908192b010032f387583890c691f65bff4763f',
 'output_index': 1,
 'type': 'response.output_text.delta'}

The annotations are also shown in full at the end of the streaming response:
https://gist.github.com/simonw/47b043f0851c54eae85e0bd961d2e198#file-recent_john_gruber-py-L1609
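
A sketch of consuming that event stream and turning the annotation.added events into index-based annotations against the accumulated output text - this assumes start_index/end_index are character offsets into that text, which still needs verifying:

def collect_openai_annotations(events):
    # events: dicts shaped like the Responses API streaming events above.
    text = ""
    annotations = []
    for event in events:
        if event["type"] == "response.output_text.delta":
            text += event["delta"]
        elif event["type"] == "response.output_text.annotation.added":
            ann = event["annotation"]
            # Fall back to the current position for annotation types (like
            # file_citation) that do not carry start/end indexes.
            start = ann.get("start_index", len(text))
            end = ann.get("end_index", len(text))
            annotations.append((start, end, ann))
    return text, annotations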

@simonw
Owner Author

simonw commented Mar 19, 2025

https://platform.openai.com/docs/api-reference/responses-streaming/response/output_text/annotation shows the event that is output in stream mode for an annotation:

{
  "type": "response.output_text.annotation.added",
  "item_id": "msg_abc123",
  "output_index": 1,
  "content_index": 0,
  "annotation_index": 0,
  "annotation": {
    "type": "file_citation",
    "index": 390,
    "file_id": "file-4wDz5b167pAf72nx1h9eiN",
    "filename": "dragons.pdf"
  }
}

That annotation is a different shape from the web search one.
It comes from the file search API: https://platform.openai.com/docs/guides/tools-file-search

@simonw
Owner Author

simonw commented Mar 19, 2025

I ran through their tutorial on this page and then did:

import json

response = client.responses.create(
    model="gpt-4o-mini",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id]
    }],
    stream=True,
)
for event in response:
    print(json.dumps(event.dict(), indent=2))

Here's the list of JSON events I got back: https://gist.github.com/simonw/7d93036f2a0a9b8b2bf20c452abe9f06

Again it included annotations returned during the stream:

https://gist.github.com/simonw/7d93036f2a0a9b8b2bf20c452abe9f06#file-events-txt-L1943-L1983

{
  "content_index": 0,
  "delta": " in",
  "item_id": "msg_67db50787ae08192b353fa40df9a76eb0cbdaafd19f5f92c",
  "output_index": 1,
  "type": "response.output_text.delta"
}
{
  "content_index": 0,
  "delta": " information",
  "item_id": "msg_67db50787ae08192b353fa40df9a76eb0cbdaafd19f5f92c",
  "output_index": 1,
  "type": "response.output_text.delta"
}
{
  "content_index": 0,
  "delta": " quality",
  "item_id": "msg_67db50787ae08192b353fa40df9a76eb0cbdaafd19f5f92c",
  "output_index": 1,
  "type": "response.output_text.delta"
}
{
  "annotation": {
    "file_id": "file-QDeY5qs4SjfyYarQ2onMK6",
    "index": 1437,
    "type": "file_citation",
    "filename": "deep_research_blog.pdf"
  },
  "annotation_index": 0,
  "content_index": 0,
  "item_id": "msg_67db50787ae08192b353fa40df9a76eb0cbdaafd19f5f92c",
  "output_index": 1,
  "type": "response.output_text.annotation.added"
}
{
  "content_index": 0,
  "delta": ".",
  "item_id": "msg_67db50787ae08192b353fa40df9a76eb0cbdaafd19f5f92c",
  "output_index": 1,
  "type": "response.output_text.delta"
}

It looks to me like that file annotation isn't attached to a range within the response - it's attached to a single index where it was output. I can't quite figure out what the annotation_index, content_index, output_index and annotation.index values mean, but none of them look like they refer to the output text being generated.

@simonw
Owner Author

simonw commented Mar 19, 2025

Side note: here's the text that OpenAI store in the vector store for that PDF https://cdn.openai.com/API/docs/deep_research_blog.pdf

https://gist.github.com/simonw/9f0e4385e42291fb743628f71e87c0b6?permalink_comment_id=5503171#gistcomment-5503171

The text portion is 10,007 tokens according to ttok.

I got that with:

data = client.vector_stores.files.content(
    file_id='file-QDeY5qs4SjfyYarQ2onMK6',
    vector_store_id=vector_store.id
).model_dump()

This:

client.vector_stores.files.retrieve(file_id='file-QDeY5qs4SjfyYarQ2onMK6', vector_store_id=vector_store.id)

Gave me this:

{
  "id": "file-QDeY5qs4SjfyYarQ2onMK6",
  "created_at": 1742426135,
  "last_error": null,
  "object": "vector_store.file",
  "status": "completed",
  "usage_bytes": 66539,
  "vector_store_id": "vs_67db4ffa373c81918dead92b7a593921",
  "attributes": {},
  "chunking_strategy": {
    "static": {
      "chunk_overlap_tokens": 400,
      "max_chunk_size_tokens": 800
    },
    "type": "static"
  }
}

@simonw
Owner Author

simonw commented Mar 19, 2025

Ooh, adding include = ["file_search_call.results"] is interesting:

import json

response = client.responses.create(
    model="gpt-4o-mini",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id]
    }],
    stream=True,
    include = ["file_search_call.results"],
)
for event in response:
    print(json.dumps(event.dict(), indent=2))

That added a huge JSON event part way through when it ran the search. That event started like this:

{
  "item": {
    "id": "fs_67db545e5e488192873a64ea54abcb480b079e72bb72a98e",
    "queries": [
      "deep research by OpenAI",
      "What is deep research?",
      "OpenAI deep research concept"
    ],
    "status": "completed",
    "type": "file_search_call",
    "results": [
      {
        "attributes": {},
        "file_id": "file-QDeY5qs4SjfyYarQ2onMK6",
        "filename": "deep_research_blog.pdf",
        "score": 0.981739387423489,
        "text": "Introducing deep research | OpenAI..."

Does this count as an annotation? Not entirely clear - the fact that it was output at the start of the response isn't really that interesting, and it's included a second time in that final response.completed event. Maybe stash this in the response_json if it was requested (perhaps add an option for it?) but otherwise don't worry about it, and don't stuff it in the new annotations mechanism.

simonw added a commit that referenced this issue Mar 20, 2025
Refs #716 - describes a yield llm.Chunk() mechanism that does not yet exist.
@simonw
Owner Author

simonw commented Mar 20, 2025

I wrote the plugin author documentation for the new feature, including a description of how yield Chunk(...) inside .execute() should work which isn't actually implemented yet: https://github.com/simonw/llm/blob/43ccbb7f92828550e48373e4c3840c26e111d144/docs/plugins/advanced-model-plugins.md#models-that-return-annotations

@simonw
Owner Author

simonw commented Mar 20, 2025

Idea: if annotations have a clear ID we could include that and then use it to deduplicate in the case where an annotation is both streamed and then repeated at the end.
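
A sketch of that dedup step, assuming annotations are (start, end, data) tuples and that data may carry a stable "id" key:

def dedupe_annotations(annotations):
    # Drop annotations whose data carries an ID we have already seen -
    # covers the case where a provider streams an annotation and then
    # repeats it in the final response. Annotations without an ID are kept.
    seen_ids = set()
    deduped = []
    for start, end, data in annotations:
        ann_id = data.get("id")
        if ann_id is not None and ann_id in seen_ids:
            continue
        if ann_id is not None:
            seen_ids.add(ann_id)
        deduped.append((start, end, data))
    return deduped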

@simonw simonw changed the title Design annotations abstraction for responses that are not just a stream of plain text Annotations abstraction for responses that are not just a stream of plain text Mar 23, 2025
@simonw simonw pinned this issue Mar 23, 2025
@simonw simonw mentioned this issue Mar 23, 2025
@simonw
Owner Author

simonw commented Mar 23, 2025

I'm going to implement Claude citations as part of this, to help test the new mechanism:
