
[BUG] Video File Attachment Error in Big-AGI 2.0 RC1 #761

Open
powyncify opened this issue Feb 20, 2025 · 4 comments
Labels
type: bug Something isn't working

Comments


powyncify commented Feb 20, 2025

Environment

Big-AGI 2.0-rc1 deployed on Vercel

Description

Attempting to attach a video file (.mp4) results in an error message, and the file is ignored during processing. This issue occurs across several models but is particularly critical with the Gemini 2.0 models, which are designed to excel at video analysis. Notably, the same .mp4 file processes successfully when uploaded to Gemini AI Studio using the same models, either Gemini 2.0 Flash or Gemini 2.0 Pro Experimental.

Device and browser

Edge on Windows 11. Big-AGI 2.0.0-rc1 deployed on Vercel.

Screenshots and more

[screenshot attached]

Willingness to Contribute

  • 🙋‍♂️ Yes, I would like to contribute a fix.
powyncify added the type: bug label on Feb 20, 2025
enricoros (Owner) commented:

Thanks @powyncify. I confirm that videos are not supported as an input type, for any model.

Great request; I don't recall anyone else asking for this yet. There are various limitations (e.g. Vercel's max request size of 4.5 MB) and infrastructure constraints (Gemini has to upload to some temp storage), and given it's only supported by a single vendor (Gemini), this is probably not going to come anytime soon in Big-AGI.
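For reference, the Vercel request-size constraint could be guarded on the client before any upload is attempted. A minimal sketch in Python (the function and overhead value are illustrative, not Big-AGI's actual code):

```python
# Hypothetical pre-upload guard for Vercel's 4.5 MB request-body limit.
VERCEL_MAX_REQUEST_BYTES = int(4.5 * 1024 * 1024)  # 4.5 MB

def fits_vercel_limit(file_size_bytes: int, overhead_bytes: int = 16 * 1024) -> bool:
    """Return True if the attachment (plus assumed request overhead) fits the limit."""
    return file_size_bytes + overhead_bytes <= VERCEL_MAX_REQUEST_BYTES
```

A check like this would let the UI reject an oversized video with a clear message instead of a failed request.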

I believe Gemini takes videos and converts them to a sequence of images, 1 second apart. Doing that would make videos work with any Vision model. Would that be an option, and what's your full use case?
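That 1-frame-per-second conversion could be done with ffmpeg's `fps` filter. A sketch that builds the command (assumes ffmpeg is installed; the function name and output pattern are illustrative):

```python
# Sketch: sample frames from a video at a fixed rate with ffmpeg.
import subprocess

def frame_extraction_cmd(video_path: str, out_pattern: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that extracts `fps` frames per second as images."""
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern]

# e.g. subprocess.run(frame_extraction_cmd("clip.mp4", "frame_%04d.png"))
```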

powyncify (Author) commented:

Bonjour @enricoros

Thank you so much for your incredibly quick attention to this bug report! We truly appreciate it. And on a broader note, I want to commend you on your vision in creating Big-AGI. It's shaping up to be what I believe is the best interface for LLMs available (and we've tested many, many of them).

You're absolutely right in recognizing our use of video. We're essentially using it as a convenient way to force image recognition. To clarify our use case: we typically record long strings of text – things like chat messages or social media posts – using video. Then, we perform OCR on the video frames. We don't need the system to process video per se; image processing is sufficient.

Currently, we achieve this by recording the text using video at a low frame rate (around 5 frames per second), which results in relatively small and manageable file sizes. If Big-AGI could handle this workflow by processing the video as a sequence of images, as you suggested, that would be absolutely wonderful and perfectly address our needs.

Thanks again for your responsiveness and for building such an amazing tool!

enricoros (Owner) commented:

Thanks for describing the use case. I could implement "backward compatibility" of videos to text (as many LLMs don't even support images yet), with:

  • extraction of all video frames to images (5 per second in your case), possibly with sub-sampling (e.g. every 1, 5, 24, 30, or 60 frames)

  • OCR of images to text.

This would generate a frame-by-frame OCR transcript of the input video, and any LLM (e.g. DeepSeek R1) would be able to process it effectively.

Although I can't work on this right now, I like this solution (similar to the 1 frame/s conversion to images that Gemini performs on videos), as it enables a lossy fallback to images, or to text frames.
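The sub-sampling step above boils down to picking every Nth frame index. A tiny sketch (function name is illustrative):

```python
# Sketch of the frame sub-sampling step: keep every `stride`-th frame.
def subsample_frames(total_frames: int, stride: int) -> list[int]:
    """Return the indices of frames to keep when sampling every `stride` frames."""
    return list(range(0, total_frames, stride))

# A 10-second clip recorded at 5 fps has 50 frames; a stride of 5
# leaves 10 frames (one per second) to pass to OCR.
```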

Thanks for the idea, hopefully one day we can get to developing this, it would be fun.

(Also note that we have the option to record directly from the screen, but only a single frame right now.)

powyncify (Author) commented:

@enricoros thank you for considering this feature request! The proposed "backward compatibility" approach for video-to-text conversion is an excellent idea for a future enhancement. We understand that implementing it may not be immediately feasible.

In the meantime, a simpler way to address the initial bug report (#761) would be to allow uploading video files to models that natively support them, such as Gemini. This would provide a direct workaround for our use case and let us leverage the existing video processing capabilities of those models without requiring any additional conversion logic within Big-AGI.

This approach would resolve the error message and enable us to utilize video input with compatible models, while the more comprehensive video-to-text feature can be considered for future development.

Thanks for your consideration!
