Implement a concise schema DSL alternative to JSON schema #790

Closed
simonw opened this issue Feb 27, 2025 · 7 comments
Labels
enhancement New feature or request schemas

Comments

simonw commented Feb 27, 2025

Typing JSON schema manually is awful. I want to be able to run something like this instead:

llm 'invent a dog' --schema 'name,bio,fave_toys'

simonw added the enhancement label Feb 27, 2025
simonw added this to the 0.23 (schemas) milestone Feb 27, 2025
simonw added the schemas label Feb 27, 2025
simonw commented Feb 27, 2025

I came up with an alternative syntax which feels quite good. Here's a description from draft documentation:

Alternative schema syntax

JSON schemas can be time-consuming to construct by hand. LLM also supports a concise alternative syntax for specifying a schema.

A simple schema for an object with two string properties called name and bio looks like this:

name, bio

You can include type information by adding a type indicator after the property name, separated by a space.

name, bio, age int

Supported types are int for integers, float for floating point numbers, str for strings (the default) and bool for true/false booleans.

To include a description of the field to act as a hint to the model, add one after a colon:

name: the person's name, age int: their age, bio: a short bio

If your schema is getting long you can switch from comma-separated to newline-separated, which also allows you to use commas in those descriptions:

name: the person's name
age int: their age
bio: a short bio, no more than three sentences
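
The rules described so far can be sketched as a small parser. This is a minimal illustration under stated assumptions, not the implementation that shipped in LLM: the function name `parse_concise_schema` is hypothetical, the mapping of `float` to the JSON schema type `number` and `bool` to `boolean` is assumed, and every property is marked as required.

```python
# Type indicators from the draft documentation, mapped to JSON schema types.
# The "number" and "boolean" targets are assumptions made for this sketch.
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def parse_concise_schema(text):
    """Hypothetical helper: convert the concise syntax into a JSON schema dict."""
    text = text.strip()
    # The newline-separated form allows commas inside descriptions,
    # so only split on commas when the input is a single line.
    if "\n" in text:
        fields = [line.strip() for line in text.splitlines()]
    else:
        fields = [part.strip() for part in text.split(",")]

    properties = {}
    required = []
    for field in fields:
        if not field:
            continue  # skip blank lines and trailing separators
        # Everything after the first colon is the description hint.
        spec, _, description = field.partition(":")
        parts = spec.split()
        if not parts:
            raise ValueError(f"Missing property name in {field!r}")
        name = parts[0]
        type_name = parts[1] if len(parts) > 1 else "str"  # str is the default
        if type_name not in TYPE_MAP:
            raise ValueError(f"Unknown type {type_name!r} in {field!r}")
        prop = {"type": TYPE_MAP[type_name]}
        if description.strip():
            prop["description"] = description.strip()
        properties[name] = prop
        required.append(name)

    return {"type": "object", "properties": properties, "required": required}
```

Calling `parse_concise_schema("name, bio, age int")` with this sketch yields an object schema with two string properties and one integer property.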

This format is supported by the --schema option. The format will be detected any time you provide a string with at least one space that doesn't start with a { (indicating JSON):

llm --schema 'name,description,fave_toy' 'invent a dog'

To return multiple items matching your schema, use the --schema-multi option. This is equivalent to using --schema with a JSON schema that specifies an items key containing multiple objects.

llm --schema-multi 'name,description,fave_toy' 'invent 3 dogs'
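
One way to picture the equivalence described above is a wrapper that nests the single-item schema inside an array under an `items` key. This is a sketch of the idea only: the helper name `wrap_multi` is hypothetical and the exact shape LLM generates may differ.

```python
def wrap_multi(item_schema):
    """Hypothetical helper: wrap a single-object schema so the model
    returns an object whose "items" key holds a list of such objects."""
    return {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": item_schema,  # each array element matches the original schema
            }
        },
        "required": ["items"],
    }


dog_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}
multi_schema = wrap_multi(dog_schema)
```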

The Python utility function llm.utils.build_json_schema(schema) can be used to convert this syntax into the equivalent JSON schema dictionary.

simonw commented Feb 27, 2025

I dumped that documentation into Claude 3.7 Sonnet and got it to implement the parser plus tests: https://claude.ai/share/0c76d2e8-3702-4768-93da-6deb427dd09d

simonw commented Feb 27, 2025

Hah, got this working:

llm --schema 'name,age int,vibes: as a haiku' 'invent a dog'

Works with llm logs too:

llm logs --schema 'name,age int,vibes: as a haiku' --data | jq
{
  "name": "Barkley",
  "age": 5,
  "vibes": "Joyful playfulness,\nChasing dreams in fields of green,\nLoyal friend always."
}

simonw commented Feb 27, 2025

mypy is failing.

I pulled out just the new function into code.py and ran this:

(files-to-prompt code.py -n; mypy code.py) | llm -m claude-3.7-sonnet -o thinking 1 'fix these errors'

Result: https://gist.github.com/simonw/efd28bead811ca651171ebb822ffb3ed

simonw added a commit that referenced this issue Feb 27, 2025
simonw commented Feb 27, 2025

I made this a documented Python utility function called llm.schema_dsl(schema).
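
A minimal usage sketch, assuming the `llm` package is installed and exposes `schema_dsl` as described in this comment. The import guard is only there so the snippet does nothing when the package is missing.

```python
import json

try:
    import llm  # the LLM Python package; may not be installed
except ImportError:
    llm = None

schema = None
if llm is not None and hasattr(llm, "schema_dsl"):
    # Convert the concise syntax into a JSON schema dictionary.
    schema = llm.schema_dsl("name,age int")
    print(json.dumps(schema, indent=2))
```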

simonw commented Feb 27, 2025

I added a debug tool in #793:

llm schemas dsl 'name,age int'
{
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    }
  },
  "required": [
    "name",
    "age"
  ]
}
