Hello, @yu154! I'm here to help you with any bugs, questions, or contributions you have for Dify. Let's work together to resolve the issue you're experiencing.
The issue with Dify misidentifying the segmentation of a Markdown file likely stems from an incorrect JSON payload structure. Specifically, the 'parent_mode' and 'subchunk_segmentation' parameters might be misplaced: they should be placed under the 'rules' object, not directly under 'process_rule'. Here's the correct structure:
Ensure that the segmentation rules are set according to the document format and context requirements. Proper configuration is crucial for accurate content classification and retrieval [1].
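The nesting described in that comment can be checked mechanically before a request is sent. This is a minimal sketch that only encodes the claim above (parent_mode and subchunk_segmentation belong under process_rule["rules"], not directly under process_rule). The helper names misplaced_keys and nest_under_rules, and the sample values ("paragraph", the "####" separator), are hypothetical; the full Dify request schema is not reproduced here.

```python
import copy

# Keys that, per the comment above, belong under process_rule["rules"].
NESTED_KEYS = ("parent_mode", "subchunk_segmentation")


def misplaced_keys(payload: dict) -> list[str]:
    """Keys that sit directly under process_rule instead of process_rule["rules"]."""
    process_rule = payload.get("process_rule", {})
    return [key for key in NESTED_KEYS if key in process_rule]


def nest_under_rules(payload: dict) -> dict:
    """Return a copy with the keys moved into process_rule["rules"].

    If a key already exists under rules, that value wins and the stray copy is dropped.
    """
    fixed = copy.deepcopy(payload)  # leave the caller's payload untouched
    process_rule = fixed.setdefault("process_rule", {})
    rules = process_rule.setdefault("rules", {})
    for key in NESTED_KEYS:
        if key in process_rule:
            value = process_rule.pop(key)
            rules.setdefault(key, value)
    return fixed


# Hypothetical payload with both keys misplaced.
bad = {
    "process_rule": {
        "parent_mode": "paragraph",
        "subchunk_segmentation": {"separator": "####"},
        "rules": {},
    }
}
good = nest_under_rules(bad)
print(misplaced_keys(bad))   # ['parent_mode', 'subchunk_segmentation']
print(misplaced_keys(good))  # []
```

Note that this only addresses API payloads; it does not explain the behavior reported below for the UI.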
I confirm the same behavior; I've also tried with regular expressions.
Since most tools convert scraped or extracted content to Markdown, this is a real problem.
Using \n as the chunk separator, as the UI suggests, is bad practice for RAG: each chunk vector ends up carrying very little information, and because retrieval in Dify is capped at a Top K of 10, the AI ends up generating useless responses.
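A rough illustration of that effect, assuming a toy document of 12 paragraphs, each made of five 40-character lines: splitting on \n yields 60 one-line chunks, so even the 10 longest cover only 400 characters, while splitting on blank lines (\n\n) keeps whole paragraphs. The helpers chunk and max_context_chars are hypothetical; max_context_chars is an upper bound that ignores relevance and simply sums the k longest chunks, with top_k=10 mirroring the cap mentioned above.

```python
def chunk(text: str, separator: str) -> list[str]:
    """Split on a literal separator and drop empty pieces."""
    return [c.strip() for c in text.split(separator) if c.strip()]


def max_context_chars(chunks: list[str], top_k: int = 10) -> int:
    """Upper bound on text handed to the model: the top_k longest chunks."""
    return sum(sorted((len(c) for c in chunks), reverse=True)[:top_k])


line = "x" * 40
paragraph = "\n".join([line] * 5)        # 5 lines + 4 newlines = 204 chars
document = "\n\n".join([paragraph] * 12)

for sep in ("\n", "\n\n"):
    pieces = chunk(document, sep)
    print(repr(sep), len(pieces), max_context_chars(pieces))
# '\n' 60 400
# '\n\n' 12 2040
```

Real chunk sizes and retrieval scores will differ; the point is only that the retrievable context shrinks with the chunk size while Top K stays fixed.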
Self Checks
Dify version
0.15.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Parent segmentation: ### content1
Child segmentation: #### content2
✔️ Expected Behavior
content1 (Parent)
content2 (Child)
❌ Actual Behavior
content1 (Parent)
#content2 (Parent)
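One plausible explanation for the stray '#' in the actual output (a hypothesis, not confirmed against Dify's source): if the parent separator "###" is matched as a plain substring, it also matches inside "####", so each child heading is cut as if it were a parent and keeps one leftover '#'. The sketch below reproduces that with a naive str.split and contrasts it with a line-anchored regex split that requires the heading marker to be followed by a space. The functions naive_labels and heading_labels are hypothetical illustrations, not Dify APIs.

```python
import re


def naive_labels(text: str, parent_sep: str = "###") -> list[tuple[str, str]]:
    """Plain substring split: every piece becomes a parent chunk.

    "###" also occurs inside "####", so child headings are cut too and keep one '#'.
    """
    return [("Parent", part.strip()) for part in text.split(parent_sep) if part.strip()]


def heading_labels(text: str, parent_level: int = 3, child_level: int = 4) -> list[tuple[str, str]]:
    """Split on Markdown headings anchored at line start and followed by a space."""
    parent_re = re.compile(rf"(?m)^{'#' * parent_level} ")
    child_re = re.compile(rf"(?m)^{'#' * child_level} ")
    labelled = []
    for section in parent_re.split(text):
        # Text before the first child heading belongs to the parent chunk.
        head, *children = child_re.split(section)
        if head.strip():
            labelled.append(("Parent", head.strip()))
        labelled.extend(("Child", c.strip()) for c in children if c.strip())
    return labelled


doc = "### content1\n#### content2\n"
print(naive_labels(doc))    # [('Parent', 'content1'), ('Parent', '# content2')]
print(heading_labels(doc))  # [('Parent', 'content1'), ('Child', 'content2')]
```

Anchoring at line start and requiring the trailing space follows how CommonMark defines ATX headings, so a "####" line can never be mistaken for a "###" one.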