Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDF support for OpenAI vision models #834

Closed
simonw opened this issue Mar 14, 2025 · 5 comments
Closed

PDF support for OpenAI vision models #834

simonw opened this issue Mar 14, 2025 · 5 comments
Labels
attachments documentation Improvements or additions to documentation enhancement New feature or request openai

Comments

@simonw
Copy link
Owner

simonw commented Mar 14, 2025

Recent addition to their API: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

OpenAI models with vision capabilities can also accept PDF files as input. PDFs can be provided either as Base64-encoded data or via file IDs [...]

@simonw simonw added attachments enhancement New feature or request openai labels Mar 14, 2025
@simonw
Copy link
Owner Author

simonw commented Mar 16, 2025

Here's why:

llm/llm/cli.py

Lines 390 to 404 in 1d552ae

if template:
params = dict(param)
# Cannot be used with system
if system:
raise click.ClickException("Cannot use -t/--template and --system together")
template_obj = load_template(template)
extract = template_obj.extract
extract_last = template_obj.extract_last
if template_obj.schema_object:
schema = template_obj.schema_object
prompt = read_prompt()
try:
prompt, system = template_obj.evaluate(prompt, params)
except Template.MissingVariables as ex:
raise click.ClickException(str(ex))

That read_prompt() function:

llm/llm/cli.py

Lines 318 to 342 in 1d552ae

def read_prompt():
nonlocal prompt, schema
# Is there extra prompt available on stdin?
stdin_prompt = None
if not sys.stdin.isatty():
stdin_prompt = sys.stdin.read()
if stdin_prompt:
bits = [stdin_prompt]
if prompt:
bits.append(prompt)
prompt = " ".join(bits)
if (
prompt is None
and not save
and sys.stdin.isatty()
and not attachments
and not attachment_types
and not schema
):
# Hang waiting for input to stdin (unless --save)
prompt = sys.stdin.read()
return prompt

So in the case of a template the prompt variable has not yet been populated so that function waits for stdin instead.

@simonw
Copy link
Owner Author

simonw commented Mar 16, 2025

If the template uses the $input variable anywhere it should require input (and hence pause waiting on stdin if no input has been provided) - but if it does NOT use that variable it should execute without waiting.

@simonw
Copy link
Owner Author

simonw commented Mar 18, 2025

Turns out this works in the older chat completions API too.

https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

simonw added a commit that referenced this issue Mar 18, 2025
@simonw
Copy link
Owner Author

simonw commented Mar 18, 2025

It's pretty good - it looks like it finally treats the pages as images in addition to extracting text from them:

llm -a 'https://static.simonwillison.net/static/2025/footmat.pdf' 'translate to spanish'

Output:

Here is the translation to Spanish:


Tapete de Desinfección para Pies con Bordes Negros Estándar de Wearwell
Grupo 222-ST

Como sabes, la suciedad, la mugre y los virus pueden vivir en nuestros zapatos durante días. ¿Te preguntas qué puedes hacer para ayudar a prevenir que entren en tu hogar o lugar de trabajo? Considera un Tapete de Desinfección para Pies.
⚬ Llena el tapete desinfectante con solución desinfectante para eliminar contaminantes y gérmenes
⚬ Los raspadores de goma de 1/2" flexibles desalojan contaminantes de las suelas de zapatos y botas
⚬ Disponible en tu elección de bordes amarillos o negros
⚬ Para una mejor descontaminación, las suelas de los zapatos deben sumergirse durante un minuto

Categorías de Wearwell: Control de Contaminación
Entorno Adecuado: Área Húmeda
Opciones de Personalización: Solo Tamaños Estándar
Sólido o Drenaje: Sólido
Material: Goma Natural Resistente a Grasas
** reciclado**: >33%
Garantía: 1 Año

Propiedad Prueba / Resultado
Inflamabilidad MVSS 302 / 'A'; Calificación
Abrasion Taber No Disponible
Rango de Temperatura 10 a 140 F
Coeficiente de Fricción ASTM F1677 / Seco: COF =>1.0 / Húmedo COF = .98
Deflexión por Compresión No Disponible
Resistencia al Desgarro No Disponible
Calificación ESD No Disponible
Resistencia Dielectrica No Disponible
SKU UPC Color Grosor Dimensiones Parte en Stock UOM
222.12x24x32BK 715411700738 Negro 0.5 24" x 32" Y EA
222.12x24x32BYL 715411700738 Negro con Bordes Amarillos 0.5 24" x 32" Y EA

Let me know if you need anything else!

simonw added a commit that referenced this issue Mar 18, 2025
@simonw simonw closed this as completed Mar 19, 2025
@simonw simonw added the documentation Improvements or additions to documentation label Mar 19, 2025
@simonw
Copy link
Owner Author

simonw commented Mar 19, 2025

Actually this still needs documentation.

@simonw simonw reopened this Mar 19, 2025
simonw added a commit that referenced this issue Mar 20, 2025
@simonw simonw closed this as completed in 2217541 Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
attachments documentation Improvements or additions to documentation enhancement New feature or request openai
Projects
None yet
Development

No branches or pull requests

1 participant