The Document Extractor tool encountered several abnormal situations #14110
Comments
@yoke1990 Hello there! I'm here to help you with bug reports, answer questions, and even guide you on becoming a contributor. Let's squash those bugs together! 🤖
Both of these documents caused errors in the Document Extractor tool. You can test with them.
Hi, thanks for your reply.
1. The docx file that causes the error can be opened, edited, and saved correctly in Word and WPS. It was written by a user in regular Office, yet it still triggers the error. I attached the docx file above; it reproduces the error: 925523fea21073501b9765ffe0be96eb.docx
2 and 3. If the content exceeds the 80000-character limit, the Document Extractor tool does not allow customized error handling on v0.15.3.
4. It appears to force reading a 255-column list from the Excel file, which generates a large number of empty columns. You can test it with a6af9a7b61b8f03363aca1d45e378e23.xls
5. With the same configuration, the document upload option is displayed normally on iPhone, but it does not work on Android phones.
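For anyone trying to narrow down the docx failure: the message "There is no item named 'NULL' in the archive" has the same form as the KeyError that Python's zipfile module raises when a requested archive member is missing. One possible cause (an assumption, not confirmed in this thread) is a relationship entry inside the .docx whose Target is the literal string NULL, so the reader tries to open a part that does not exist. The sketch below uses only the standard library to list such dangling targets. `find_dangling_targets` is a hypothetical helper name and is not part of Dify.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the *.rels files inside an Office Open XML package.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_dangling_targets(docx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (rels_file, target) pairs whose internal target is not in the archive."""
    dangling = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        names = set(zf.namelist())
        for rels_name in sorted(names):
            if not rels_name.endswith(".rels"):
                continue
            # "word/_rels/document.xml.rels" describes parts relative to "word/".
            base_dir = posixpath.dirname(posixpath.dirname(rels_name))
            root = ET.fromstring(zf.read(rels_name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # hyperlinks and similar are not stored in the zip
                target = rel.get("Target", "")
                if target.startswith("/"):
                    resolved = target.lstrip("/")
                else:
                    resolved = posixpath.normpath(posixpath.join(base_dir, target))
                if resolved not in names:
                    dangling.append((rels_name, target))
    return dangling
```

Running it on the bytes of the attached file (for example `find_dangling_targets(open(path, "rb").read())`) would show whether a NULL target is present. If it is, that would also be consistent with Word and WPS opening the file without complaint, assuming they simply skip the broken reference.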
Of course, I would prefer that Dify improve its compatibility with Office documents.
Self Checks
Dify version
0.15.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
1. Upload a docx file; the Document Extractor reports the following (only a few docx documents trigger the error):
Failed to extract text from DOC/DOCX: "There is no item named 'NULL' in the archive"
2. Upload a file whose content is too large, exceeding 80000 characters; the Document Extractor reports an error.
3. When the Document Extractor tool raises an exception, there is no exception-handling option; the error can only be reported.
4. When the Document Extractor tool extracts Excel content, many useless "nan" values appear, pushing the output over 80000 characters.
It extracts every column of the sheet up to column 255, producing headers such as "Unnamed: 253 | Unnamed: 254 | Unnamed: 255". In fact, most of these columns are blank, so the output is full of 'nan | nan |'.
The tool then reports: “The length of output variable result must be less than 80000 characters”
5. On an Android phone in WeChat, there is no file upload option.
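As an illustration of item 4, here is a minimal sketch of dropping columns that contain no data before the table is rendered as text. It assumes the sheet has already been read into a header list and a list of rows; `is_blank`, `drop_empty_columns`, and `to_markdown` are hypothetical helper names, not Dify functions, and the pipe-separated output only imitates the format quoted above.

```python
import math


def is_blank(value) -> bool:
    """Treat None, float NaN, and empty or whitespace-only strings as blank."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def drop_empty_columns(header, rows):
    """Keep only the columns in which at least one row holds a real value."""
    keep = [
        i
        for i in range(len(header))
        if any(i < len(row) and not is_blank(row[i]) for row in rows)
    ]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] if i < len(row) else None for i in keep] for row in rows]
    return new_header, new_rows


def to_markdown(header, rows) -> str:
    """Render a pipe-separated table, writing blank cells as empty strings."""
    lines = [" | ".join(str(h) for h in header)]
    lines += [" | ".join("" if is_blank(c) else str(c) for c in row) for row in rows]
    return "\n".join(lines)
```

Placeholder headers such as "Unnamed: 253" do not count as data here, so those columns disappear along with their nan cells. If the extraction goes through pandas (suggested by the "Unnamed: N" headers, but an assumption), `DataFrame.dropna(axis=1, how="all")` performs the same column filter.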
✔️ Expected Behavior
No response
❌ Actual Behavior
No response