Connect the Retriever from chromDB to the Retriever in RetrieverTool #3520
Comments
Hello, @fyakghoon2266! I'm here to help you with the issue you're experiencing. I can assist with bug resolution, answer questions, and guide you on becoming a contributor. The issue you're encountering with connecting the Retriever from ChromaDB to the Retriever in RetrieverTool might be due to specific connection requirements and configurations. Here are some details that might help you troubleshoot and resolve the issue:
If these steps do not resolve your issue, it might be worth checking for any recent updates or planned updates in the Langflow repository that address this integration. Additionally, reviewing the related issue you mentioned (issues#2527) might provide further insights or workarounds. |
Hey @fyakghoon2266, how are you? Thanks for your message. I will update the component so you can connect and use it in the next release version. For now, if you want to connect and use it, just open the code of the RetrieverTool as shown in the image below and add this line: "input_types": ["Retriever"] You can check the image to see how it should look to work. Don't forget the commas. Thanks. PR: #3601 |
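The workaround above amounts to one extra key in the retriever field's configuration. The following is only a minimal sketch, assuming the RetrieverTool component describes its fields as a plain dictionary. The function name `build_config` and the surrounding keys (`display_name`, `info`) are illustrative assumptions; only the `"input_types": ["Retriever"]` line comes from this thread.

```python
def build_config():
    """Hypothetical field config for a RetrieverTool-like component."""
    return {
        "retriever": {
            "display_name": "Retriever",  # assumed label
            "info": "Retriever to interact with",  # assumed help text
            # Line suggested in this thread: it allows a Retriever output
            # (for example from the Chroma DB component) to connect here.
            "input_types": ["Retriever"],
        },
        "name": {"display_name": "Name", "info": "Name of the tool"},
        "description": {
            "display_name": "Description",
            "info": "Description of the tool",
        },
    }


config = build_config()
print(config["retriever"]["input_types"])
```

Each entry in the dictionary must be separated by a comma, which is the detail the comment above warns about.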
Hi @Cristhianzl, thank you so much for your help. I have managed to connect Chroma DB with RetrieverTool, but strangely, RetrieverTool doesn't seem to query my Chroma DB when I use it. As a result, every response I get is generated by the OpenAI LLM alone rather than from the information stored in Chroma DB. However, when I use the same connection method in Flowise, it does find the data. I later checked the logs in LangSmith and found that my retriever did not produce any output. Could it be that I made a mistake, or is there another bug I might have overlooked? Thank you very much for your assistance, and I wish you all the best. |
Since this component is under the "experimental" tab, it might not be fully functional. I'll take a look and see what I can do. Thanks for bringing this to our attention! |
"Thank you @Cristhianzl , I will check the situation together. I will try to import the new data into ChromaDB and give it another try. If I discover anything new, I will update you! Thank you very much for your help, and I wish you all the best." |
Hi @fyakghoon2266, I've created a new flow to validate. Please ensure that you provide the correct input to ChromaDB; this will help the agent respond accurately. You can check the Tool's logs for details (click on the little eye icon). In my case, the response was correct when I queried information that I had fed into the DB. Let me know if it worked for you! Thanks! |
Hi @Cristhianzl, thanks again for your help. I later re-imported a CSV file using the File tool, and I found that I could connect to ChromaDB in my process. However, I'm a bit confused about something: previously, I created a similar ChromaDB in Flowise for another project, and I stored my previous data in it. But for some reason, if I run the process without adding the File tool, it doesn't retrieve the data stored in ChromaDB in Flowise, and instead, the LLM model generates an answer on its own, ignoring the information in ChromaDB. Could it be because Langflow and Flowise have different ways of importing and retrieving data from ChromaDB? This is quite puzzling to me. Thank you very much for your assistance. I will continue testing. My goal is to import data into ChromaDB so that I don't need to use the File tool to import data again. Since I have a large amount of data to embed, I plan to use multiple ChromaDBs and multiple Retriever tools, with each ChromaDB responsible for different content. Some will specifically store health-related questions, while others will store financial-related data. I want to use the Retriever to automatically search across different ChromaDBs to provide the best answers to users. Here is the related flow and log after I successfully added the File tool. This is the response log from ChromaDB. This is the content of my test CSV file. Thank you again for your help. Langflow has really helped me a lot, and I hope it continues to improve. I also hope to complete this project using Langflow. Thank you very much! |
Hi @fyakghoon2266, I believe the difference lies in the agents. Langflow and Flowise use different agents, so their responses may naturally vary. Your concept and flow seem fine, but you might want to consider merging the flows with a prompt. For instance, you could create one flow with a retriever focused on a specific topic and another flow for a different topic, then merge them using a prompt. The key is to ask about specific content to avoid confusing the agent. Thank you for choosing Langflow!! |
Hi @Cristhianzl , thank you for your assistance and suggestions; they have been very helpful to me. However, I am currently facing an issue. Every time I import data into ChromaDB, I can successfully retrieve the data using the retriever tool. So, I usually remove the File tool because I assume the data is already stored inside. But when I reopen LangFlow and restart the chatbox, I find that I cannot query the previously imported data from the original ChromaDB collection. Is it because the data is not automatically saved into ChromaDB? Or do I need to perform an operation similar to "upsert" in Flowise? Or are there any misunderstandings on my part that are causing this failure? Thank you very much for your help and support, and I wish you all the best! This is the backend log after I removed the File tool, restarted LangFlow, and asked the chatbox the same question. It did not retrieve any of the information that I believed was previously saved in my ChromaDB. |
I’ve been reviewing the process. Once you ingest data into ChromaDB, a new database should be created automatically. After the database is populated with embeddings, you no longer need to keep the ingestion step connected; you only need the retriever tool to access the data. Please make sure that your database is actually being created and populated: try to find the .sqlite3 file and take a look inside. For me it worked fine after I disconnected the data ingestion and restarted the backend, which means the agent is reading from the DB. No problem, feel free to contact us anytime :) |
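One way to follow the suggestion of looking at the .sqlite3 file is sketched below using only the Python standard library. It assumes the persist directory is a local folder that contains one or more `*.sqlite3` files; the helper names `find_sqlite_files` and `list_tables` are hypothetical, and the sketch deliberately makes no assumption about which tables Chroma creates.

```python
import sqlite3
from pathlib import Path


def find_sqlite_files(persist_dir):
    """Return every *.sqlite3 file found under the given directory."""
    root = Path(persist_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.sqlite3"))


def list_tables(db_path):
    """Return the table names stored in a SQLite file, opened read-only."""
    # mode=ro avoids creating or modifying the file by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `find_sqlite_files` on the folder configured as "Persist Directory" and then `list_tables` on each result shows whether a database file exists at all and whether it contains any tables. An empty result from the first call suggests the data was never written to that directory.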
Hi @Cristhianzl, hello again, and thank you very much for your assistance. I apologize for not explaining many details clearly. My ChromaDB is actually hosted on AWS using EC2, so I have a host that I can connect to. I used your method to let LangFlow create a collection for me, but after running it, I couldn't find the collection when I connected to ChromaDB using Python. The name of this collection is "langflow." I'm not sure if I'm using LangFlow incorrectly. Additionally, I noticed that LangFlow stores the data I want to upload to ChromaDB in the .cache folder on my Linux server (yes, my LangFlow is installed on my own Linux server). So, even if I restart LangFlow and connect to ChromaDB, it still says that the collection does not exist. Here are three images: The first one shows my flow process. Additionally, I tried creating a collection named "flowise" in Flowise. This time, I was able to find it when searching with the Python script. I suspect this is because Flowise requires an upsert operation to upload data to the database when using any DB, which makes the collection discoverable. In LangFlow I didn't do this, which may be why the collection could not be found, but I’m not sure how to achieve it in LangFlow. Here is a diagram of my Flowise flow. This image shows the results of running a Python script after creating the ChromaDB collection in Flowise. The last line shows that there is indeed a collection named "flowise." Thank you so much for your patience in helping me solve this problem. I sincerely appreciate it! Wishing you all the best! |
Hey @fyakghoon2266, how are you? I was looking at your architecture and something came to mind. When you start a new Chroma server, I think you are starting it with a command that takes a data path. I'm pretty sure that you have to set the path of your Chroma data (--path ./my_chroma_data) to the same location as the "Persist Directory" configured in your flow. Please try this on your setup (both paths must match) and let me know if it worked. Wishing you all the best too! :) |
Hi @Cristhianzl , hello! I had a very fulfilling weekend. I built a small hiding space for my cat using thick cardboard. Thank you so much for your help and for always assisting me with my issues. I apologize if I didn’t explain my problem clearly. In this project, I set up a chromadb on AWS cloud, and I hope to connect directly to this chromadb using langflow in the future to access the data stored inside. Therefore, it seems that I shouldn’t store data directly in any place within langflow. The “Persist Directory” seems to be intended for local directories, but I’m not sure if I have misunderstood this. I’m wondering if langflow's chromadb can currently connect to a chromadb set up on the cloud? If so, is there an error in my configuration, or is there something I might have overlooked? Because I hope that in the future, other members of my team can use langflow directly to connect to the chromadb we set up on the cloud and use different collections to store and retrieve data. Thank you once again for your help; you’ve really helped me solve many difficulties. My supervisor also hopes that in the future, the company will prioritize using langflow for any chatbox projects. Thank you very much! |
Hey @fyakghoon2266, haha that's nice! I think your cat enjoyed it :) I understand your architecture. I mean, the "Persist Directory" is not a required field—it's just if you really want to persist the data locally. You can find all the other ChromaDB configurations in the advanced tab of the component: If you scroll down on the dialog panel, you can see some DB configurations. I think it's just a detail that you're missing in these configurations. Maybe in ChromaDocs you can find something. Another doc that is very helpful is the Langchain docs because we use it for integration with Langflow: ChromaDocsLangchain. We're happy to hear that! Hope we can build something great together (Langflow + your company) :) |
Hi @Cristhianzl! Thank you so much for all your replies and the documents you provided. After studying them, I found that when I modified the code in chromadb to the version shown below (see the image), my data could be successfully saved to my cloud-based chromadb. A new collection was also created, and I verified on the cloud that the newly created collection does indeed exist and can be used by other Flows. However, I'm not sure whether this is due to a submodule version issue within chromadb in langflow, or perhaps a bug in chromadb itself. If I don't modify the code, it's completely impossible to create a new collection or save data to chromadb. I tried all the options in the Advanced Settings, but none of them resolved the issue, so after reviewing the documents you provided, I decided to modify the Python code inside the component. Additionally, I ran a test comparing the way the client is currently created for chromadb within langflow with the method I previously used in langchain, to see if there are any differences in reading chromadb collections. With the current method in langflow, the data read was an empty list, while creating the client with the Http method allowed me to access the data (as shown in the code and results in the image below). So, I suspect the issue lies within chromadb itself. However, since I modified the code to create the client using chromadb.HttpClient, this change doesn't offer great compatibility and only fits my own use case. I will need to spend more time researching how to modify the code for better compatibility. Thank you so much for your assistance during this time; you have helped me solve so many issues that I wouldn’t have been able to resolve without your support. Because of these solutions, my company has decided to start deploying Langflow on AWS cloud next week to allow other projects and colleagues to start using it. In the future, we hope that all our company’s chatbots can be generated through Langflow. |
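For readers who want to run the same kind of check, here is a minimal sketch of connecting to a remote Chroma server with `chromadb.HttpClient` and listing its collections. It is not the modification described in the comment above, which was shown only in a screenshot. The helper names `connect_remote` and `collection_names` are hypothetical, and the default port 8000 is an assumption that should be replaced with whatever the server actually uses.

```python
def connect_remote(host, port=8000):
    """Create a client for a Chroma server reachable over HTTP."""
    import chromadb  # third-party package: pip install chromadb

    return chromadb.HttpClient(host=host, port=port)


def collection_names(client):
    """Return the sorted names of all collections visible to the client."""
    names = []
    for item in client.list_collections():
        # Depending on the chromadb version, list_collections() yields
        # either Collection objects or plain collection names.
        names.append(item if isinstance(item, str) else item.name)
    return sorted(names)
```

With a reachable server, `collection_names(connect_remote("<your-ec2-host>"))` should include a collection such as "langflow" or "flowise" if it was really written to that server. If it is missing, the data most likely went to a local store instead.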
Hey @fyakghoon2266, I'm very happy that you were able to achieve what you wanted. I think this is the magic of Langflow—you can create, update, or edit anything you need. You're free to customize your components and make them work exactly as you want. No worries, I'm here for that; it's my job! Feel free to contact us anytime. We're always here to help you. It was a pleasure speaking with you. Wishing you all the best my friend! :) |
Hi @Cristhianzl, |
Bug Description
Hello, I am trying to connect the Retriever from ChromaDB to the Retriever in RetrieverTool, but it’s not connecting. Additionally, I noticed that the recommended connection options for the Retriever in RetrieverTool are MultiQueryRetriever and VectorStoreRetriever. However, neither of these options is suitable, and neither can connect to the Retriever in RetrieverTool.
I also found a related issue in issues#2527, but the responses there did not resolve the problem I’m currently facing. I’m not sure if this is a bug. Is there any way to handle this, or will these features be available in the next update? Thank you very much.
Reproduction
Expected behavior
I currently have data stored across multiple ChromaDBs. I would like to use the RetrieverTool to integrate data from these DBs, so that user queries can search across them to find the most suitable answer. At the end, I will use the Tool Agent to help handle the final response. However, the Retriever in my ChromaDB cannot connect to the Retriever in RetrieverTool. I hope there is a way to solve this issue. Thank you very much.
Who can help?
@carlosrcoelho
Operating System
Ubuntu 22.04.4 LTS
Langflow Version
1.0.15
Python Version
3.10
Screenshot
No response
Flow File
No response