[BUG] Video File Attachment Error in Big-AGI 2.0 RC1 #761
Comments
Thanks @powyncify, I confirm that videos are not supported as an input type for any model. Great request; I don't recall other people asking for this yet. There are various limitations (e.g. the Vercel maximum request size of 4.5 MB) and infrastructure constraints (Gemini requires uploading the video to temporary storage first), and given that only a single vendor (Gemini) supports it, this is unlikely to come to Big-AGI anytime soon. I believe Gemini takes videos and converts them to a sequence of images, 1 second apart. Doing that would make videos work with any vision model. Would that be an option, and what's your full use case?
Hello @enricoros, thank you so much for your incredibly quick attention to this bug report! We truly appreciate it. On a broader note, I want to commend you on your vision in creating Big-AGI. It's shaping up to be what I believe is the best interface for LLMs available (and we've tested many, many of them). You're absolutely right about our use of video: we're essentially using it as a convenient way to force image recognition. To clarify our use case: we typically record long strings of text, such as chat messages or social media posts, on video, and then perform OCR on the video frames. We don't need the system to process video per se; image processing is sufficient. Currently, we record the text at a low frame rate (around 5 frames per second), which keeps file sizes relatively small and manageable. If Big-AGI could handle this workflow by processing the video as a sequence of images, as you suggested, that would be wonderful and would address our needs perfectly. Thanks again for your responsiveness and for building such an amazing tool!
Thanks for describing the use case. I could implement "backward compatibility" of videos to text (as many LLMs don't even support images yet), with:
This would generate a frame-by-frame OCR of the input video as text, and any LLM (e.g. DeepSeek R1) would be able to process it effectively. Although I can't work on this right now, I like this solution (which is similar to the 1 frame/s conversion to images that Gemini applies to videos), as it enables a lossy fallback to either images or text frames. Thanks for the idea; hopefully one day we can get to developing this, it would be fun. (Also note that we already have an option to capture directly from the screen, but only a single frame right now.)
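The frame-sampling idea discussed in this thread (sample one frame per second, OCR each frame, and pass the result to any LLM as text) could be sketched roughly as follows. This is a hypothetical illustration, not Big-AGI code: the names `frameSampleTimes`, `framesToText`, and `FrameText` are made up. It assumes that the actual frame capture (for example, seeking an HTML video element to each timestamp and drawing it to a canvas) and the OCR step happen elsewhere.

```typescript
// Hypothetical per-frame OCR result; how the text is obtained is out of scope here.
interface FrameText {
  timeSec: number; // time in the video at which the frame was sampled
  text: string;    // OCR output for that frame
}

// Hypothetical helper: timestamps (in seconds) at which to sample frames,
// e.g. one frame per second as Gemini is believed to do.
function frameSampleTimes(durationSec: number, intervalSec = 1): number[] {
  if (!Number.isFinite(durationSec) || durationSec < 0 ||
      !Number.isFinite(intervalSec) || intervalSec <= 0) {
    throw new RangeError("durationSec must be >= 0 and intervalSec must be > 0");
  }
  const times: number[] = [];
  // Multiply an integer counter instead of accumulating, to avoid float drift.
  for (let i = 0; i * intervalSec < durationSec; i++) {
    times.push(i * intervalSec);
  }
  return times;
}

// Formats seconds as mm:ss to label each frame in the generated text.
function formatTime(sec: number): string {
  const m = Math.floor(sec / 60);
  const s = Math.floor(sec % 60);
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}`;
}

// Joins per-frame OCR text into one plain-text block that any LLM can read.
// Empty frames and frames identical to the last kept frame are skipped,
// which is common when recording slowly scrolling text.
function framesToText(frames: FrameText[]): string {
  const lines: string[] = [];
  let prev: string | undefined;
  for (const f of frames) {
    const t = f.text.trim();
    if (t === "" || t === prev) continue;
    lines.push(`[${formatTime(f.timeSec)}] ${t}`);
    prev = t;
  }
  return lines.join("\n");
}
```

Dropping duplicate consecutive frames is a design choice in this sketch: it keeps the resulting prompt small, which matters given request-size limits such as the 4.5 MB Vercel cap mentioned above.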
@enricoros, thank you for considering this feature request! The proposed "backward compatibility" approach for video-to-text conversion is an excellent idea for a future enhancement, and we understand that implementing it may not be immediately feasible. In the meantime, a simpler way to address the original bug report (#761) would be to allow uploading video files to models that natively support them, such as Gemini. This would provide a direct workaround for our use case and let us leverage those models' existing video processing capabilities without any additional conversion logic in Big-AGI. It would resolve the error message and enable video input with compatible models, while the more comprehensive video-to-text feature is considered for future development. Thanks for your consideration!
Environment
Big-AGI 2.0-rc1 deployed on Vercel
Description
Attempting to attach a video file (.mp4) results in an error message, and the file is ignored during processing. This issue occurs across several models but is particularly critical for the Gemini 2.0 models, which accept video input natively. Notably, the same .mp4 file is processed successfully when uploaded to Google AI Studio using the same models (Gemini 2.0 Flash or Gemini 2.0 Pro Experimental).
Device and browser
Edge on Windows 11. Big-AGI 2.0.0-rc1 deployed on Vercel.
Screenshots and more
Willingness to Contribute