Connect the Retriever from chromDB to the Retriever in RetrieverTool #3520

Closed
fyakghoon2266 opened this issue Aug 23, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@fyakghoon2266

Bug Description

Hello, I am trying to connect the Retriever from ChromaDB to the Retriever in RetrieverTool, but it won't connect. Additionally, I noticed that the recommended connection options for the Retriever in RetrieverTool are MultiQueryRetriever and VectorStoreRetriever. However, neither option is suitable, and neither can connect to the Retriever in RetrieverTool.

I also found a related issue in issues#2527, but the responses there did not resolve the problem I’m currently facing. I’m not sure if this is a bug. Is there any way to handle this, or will these features be available in the next update? Thank you very much.

Reproduction

Screenshot 2024-08-23 10:29:54 AM

Expected behavior

I currently have some data stored in ChromaDB, and I have multiple ChromaDB instances. I would like to use the RetrieverTool to integrate data from multiple DBs, so that user queries search across these DBs to find the most suitable answer. At the end, I will use the Tool Agent to handle the final response. However, the Retriever in my ChromaDB cannot connect to the Retriever in RetrieverTool. I hope there is a way to solve this issue. Thank you very much.

Who can help?

@carlosrcoelho

Operating System

Ubuntu 22.04.4 LTS

Langflow Version

1.0.15

Python Version

3.10

Screenshot

No response

Flow File

No response

@fyakghoon2266 fyakghoon2266 added the bug Something isn't working label Aug 23, 2024

dosubot bot commented Aug 23, 2024

Hello, @fyakghoon2266! I'm here to help you with the issue you're experiencing. I can assist with bug resolution, answer questions, and guide you on becoming a contributor.

The issue you're encountering with connecting the Retriever from ChromaDB to the Retriever in RetrieverTool might be due to specific connection requirements and configurations. Here are some details that might help you troubleshoot and resolve the issue:

  1. RetrieverTool Configuration:
    The RetrieverTool requires specific configurations for the retriever, name, and description. These configurations are defined in the build_config method of the RetrieverToolComponent class.

    def build_config(self):
        return {
            "retriever": {
                "display_name": "Retriever",
                "info": "Retriever to interact with",
                "type": BaseRetriever,
            },
            "name": {"display_name": "Name", "info": "Name of the tool"},
            "description": {"display_name": "Description", "info": "Description of the tool"},
        }
    
    def build(
        self,
        retriever: BaseRetriever,
        name: str,
        description: str,
    ) -> Tool:
        return create_retriever_tool(
            retriever=retriever,
            name=name,
            description=description,
        )

    Ensure that your retriever is correctly configured with these parameters [1].

  2. ChromaDB Configuration:
    The Chroma class in ChromaDB can be configured with various settings, including HTTP and gRPC protocols, with optional SSL support. Here is an example configuration:

    def build(
            self,
            collection_name: str,
            embedding: Embeddings,
            chroma_server_ssl_enabled: bool,
            index_directory: Optional[str] = None,
            inputs: Optional[List[Record]] = None,
            chroma_server_cors_allow_origins: Optional[str] = None,
            chroma_server_host: Optional[str] = None,
            chroma_server_port: Optional[int] = None,
            chroma_server_grpc_port: Optional[int] = None,
        ) -> Union[VectorStore, BaseRetriever]:
        # Chroma settings
        chroma_settings = None
    
        if chroma_server_host is not None:
            chroma_settings = chromadb.config.Settings(
                chroma_server_cors_allow_origins=chroma_server_cors_allow_origins or None,
                chroma_server_host=chroma_server_host,
                chroma_server_port=chroma_server_port or None,
                chroma_server_grpc_port=chroma_server_grpc_port or None,
                chroma_server_ssl_enabled=chroma_server_ssl_enabled,
            )

    Ensure that your ChromaDB instance is correctly configured with these settings [2].

  3. MultiQueryRetriever and VectorStoreRetriever:
    If the recommended options (MultiQueryRetriever and VectorStoreRetriever) are not suitable, you might need to look into custom configurations or other retriever options that can interface with both ChromaDB and RetrieverTool.

  4. Integration of Multiple ChromaDBs:
    Currently, there is no specific information available regarding the integration of multiple ChromaDBs with the RetrieverTool. You might need to implement a custom solution to handle multiple databases and aggregate the results.

If these steps do not resolve your issue, it might be worth checking for any recent updates or planned updates in the Langflow repository that address this integration. Additionally, reviewing the related issue you mentioned (issues#2527) might provide further insights or workarounds.


@Cristhianzl
Member

Hey @fyakghoon2266, how are you? Thanks for your message.

I will update the component so you can connect and use it in the next release version. For now, if you want to connect and use it, just open the code of the RetrieverTool as shown in the image below and add this line:

"input_types": ["Retriever"]

You can check the image to see how it should look to work. Don't forget the commas.
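For readers without the screenshot, the edit described above might look roughly like the sketch below. It assumes the new line belongs inside the "retriever" entry of the `build_config` dictionary quoted earlier in this thread; `BaseRetriever` is replaced by a stand-in class so the snippet runs without LangChain installed, and the component code in your Langflow version may be laid out differently.

```python
class BaseRetriever:
    """Stand-in for LangChain's BaseRetriever (assumption, so the sketch runs standalone)."""


def build_config() -> dict:
    """Field configuration for the RetrieverTool component, with the extra line added."""
    return {
        "retriever": {
            "display_name": "Retriever",
            "info": "Retriever to interact with",
            "type": BaseRetriever,
            # The added line: accept connections from outputs typed "Retriever".
            "input_types": ["Retriever"],
        },
        "name": {"display_name": "Name", "info": "Name of the tool"},
        "description": {"display_name": "Description", "info": "Description of the tool"},
    }
```

Note the comma after `BaseRetriever`: leaving it out is the most likely way to break the dictionary when adding the line by hand.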

Thanks.

image

image

PR: #3601

@fyakghoon2266
Author

Hi @Cristhianzl , thank you so much for your help. I have managed to connect Chroma DB with RetrieverTool, but strangely, when I use RetrieverTool, it doesn't seem to access my Chroma DB to retrieve data. As a result, every response I get is generated by the OpenAI LLM model rather than from the information I have in Chroma DB. However, when I use the same connection method in Flowise, it does find the data. I later checked the logs in LangSmith and found that my retriever did not produce any output. Could it be that I made some mistake, or is there another bug I might have overlooked?

Thank you very much for your assistance, and I wish you all the best.

Screenshot 2024-08-29 10:58:33 AM

messageImage_1724900543377

@Cristhianzl
Member

Cristhianzl commented Aug 29, 2024

@fyakghoon2266

Since this component is under the "experimental" tab, it might not be fully functional. I'll take a look and see what I can do.

Thanks for bringing this to our attention!

@Cristhianzl Cristhianzl reopened this Aug 29, 2024
@fyakghoon2266
Author

Thank you, @Cristhianzl. I will look into it on my side as well. I will try importing new data into ChromaDB and give it another try. If I discover anything new, I will update you! Thank you very much for your help, and I wish you all the best.

@Cristhianzl
Member

Cristhianzl commented Aug 29, 2024

Hi @fyakghoon2266,

I've created a new flow to validate. Please ensure that you provide the correct input to ChromaDB; this will help the agent respond accurately.

You can check the Tool's logs for details (click on the little eye icon). In my case, the response was correct when I queried information that I had fed into the DB.

image

image

Let me know if it worked for you!

Thanks!

@fyakghoon2266
Author

Hi @Cristhianzl, thanks again for your help. I later re-imported a CSV file using the File tool, and I found that I could connect to ChromaDB in my process. However, I'm a bit confused about something: previously, I created a similar ChromaDB in Flowise for another project, and I stored my previous data in it. But for some reason, if I run the process without adding the File tool, it doesn't retrieve the data stored in ChromaDB in Flowise, and instead, the LLM model generates an answer on its own, ignoring the information in ChromaDB. Could it be because Langflow and Flowise have different ways of importing and retrieving data from ChromaDB? This is quite puzzling to me.

Thank you very much for your assistance. I will continue testing. My goal is to import data into ChromaDB so that I don't need to use the File tool to import data again. Since I have a large amount of data to embed, I plan to use multiple ChromaDBs and multiple Retriever tools, with each ChromaDB responsible for different content. Some will specifically store health-related questions, while others will store financial-related data. I want to use the Retriever to automatically search across different ChromaDBs to provide the best answers to users.

Here is the related flow and log after I successfully added the File tool.
image
image

This is the response log from ChromaDB.
image

This is the content of my test CSV file.
image

Thank you again for your help. Langflow has really helped me a lot, and I hope it continues to improve. I also hope to complete this project using Langflow. Thank you very much!

@Cristhianzl
Member

Hi @fyakghoon2266,

I believe the difference lies in the agents. Langflow and Flowise use different agents, so their responses may naturally vary. Your concept and flow seem fine, but you might want to consider merging the flows with a prompt. For instance, you could create one flow with a retriever focused on a specific topic and another flow for a different topic, then merge them using a prompt.

The key is to ask about specific content to avoid confusing the agent.

Thank you for choosing Langflow!!
We're very happy to hear that you find our tool helpful! :)

@fyakghoon2266
Author

Hi @Cristhianzl , thank you for your assistance and suggestions; they have been very helpful to me. However, I am currently facing an issue. Every time I import data into ChromaDB, I can successfully retrieve the data using the retriever tool. So, I usually remove the File tool because I assume the data is already stored inside. But when I reopen LangFlow and restart the chatbox, I find that I cannot query the previously imported data from the original ChromaDB collection. Is it because the data is not automatically saved into ChromaDB? Or do I need to perform an operation similar to "upsert" in Flowise? Or are there any misunderstandings on my part that are causing this failure?

Thank you very much for your help and support, and I wish you all the best!

This is the backend log after I removed the File tool, restarted LangFlow, and asked the chatbox the same question. It did not retrieve any of the information that I believed was previously saved in my ChromaDB.

image

@Cristhianzl
Member

Cristhianzl commented Aug 30, 2024

@fyakghoon2266

I’ve been reviewing the process, and once you ingest data into ChromaDB, a new database should automatically be created.

image

After populating your database with embedding data, maintaining the connection is no longer necessary. However, you'll still need the retriever tool to access the data.

image

Please ensure that your database is being created and populated correctly. Try to locate the .sqlite3 file and inspect it.

It worked fine for me after disconnecting the data ingestion and restarting the backend, which means the agent is reading from the DB.
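One way to do that check from Python, using only the standard library, is sketched below. The function names are illustrative, and the search root is whatever directory you pass in (for example, the Persist Directory set on the Chroma component); which tables the file contains depends on the Chroma version, so the sketch only lists them.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def find_sqlite_files(root: str) -> list[Path]:
    """Return every *.sqlite3 file below root, sorted by path."""
    return sorted(Path(root).rglob("*.sqlite3"))


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # A file: URI with mode=ro guarantees the inspection cannot modify the DB.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `find_sqlite_files(".")` from the folder where Langflow runs and then `list_tables(...)` on each result shows whether a database file exists at all and whether it has any tables. An empty result suggests the data was never persisted locally.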

No problem, feel free to contact us anytime :)

@fyakghoon2266
Author

Hi @Cristhianzl , hello again, and thank you very much for your assistance. I apologize for not clearly explaining many details.

My ChromaDB is actually set up on AWS cloud using EC2, so I have a host that allows me to connect. I used your method to let LangFlow create a collection for me, but after running it, I couldn't find the collection when I tried to connect to ChromaDB using Python. The name of this collection is "langflow." I'm not sure if I'm using LangFlow incorrectly. Additionally, I noticed that LangFlow stores the data I want to upload to ChromaDB in the .cache folder on my Linux server (yes, my LangFlow is installed on my own Linux server). So, even if I restart LangFlow and connect to ChromaDB, it still says that the collection does not exist.
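A check like the one described above can be sketched as follows. This is not the script from the screenshots, which are not available here; the host and port are placeholders, and the helper names are illustrative. The name lookup accepts both return shapes of `list_collections()`, since depending on the chromadb release it yields either collection objects with a `.name` attribute or plain strings.

```python
from typing import Any


def collection_names(client: Any) -> list[str]:
    """Return the sorted collection names known to a Chroma client."""
    names = []
    for item in client.list_collections():
        # Some chromadb releases return strings, others Collection objects.
        names.append(item if isinstance(item, str) else item.name)
    return sorted(names)


def connect_remote(host: str, port: int = 8000) -> Any:
    """Open an HTTP client to a running Chroma server (host and port are placeholders)."""
    import chromadb  # imported lazily so collection_names() works without it

    return chromadb.HttpClient(host=host, port=port)
```

With a reachable server, `"langflow" in collection_names(connect_remote("your-ec2-host"))` tells you whether the collection Langflow was supposed to create actually exists on the remote instance.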

Here are four images:

1. The first one shows my flow process.
2. The second one shows the result of my Python script trying to search for the collection name "langflow," along with the log and code.
3. The third one is the log from LangFlow, where I realized that my data is stored on my Linux server.
4. The fourth one is the source of the third one.
image

image

image

image

Additionally, I tried creating a collection named "flowise" in Flowise. This time, I was able to find it when searching through the Python script. I suspect this is because Flowise requires an upsert operation to upload data to the database when using any DB, which makes the collection visible. In LangFlow I didn't do this, which may be why the collection could not be found. But I'm not sure how to achieve this in LangFlow.

Here is a diagram of my Flowise flow.
The dot in the top right corner of the image represents the upsert function. After I execute it, the Flowise collection appears in ChromaDB.

image

This image shows the results of running a Python script after creating the ChromaDB collection in Flowise. The last line shows that there is indeed a collection named "flowise."

image

Thank you so much for your patience in helping me solve this problem. I sincerely appreciate it! Wishing you all the best!

@Cristhianzl
Member

Cristhianzl commented Sep 2, 2024

Hey @fyakghoon2266 how are you?
Hope you had a good weekend.

So, I was looking at your architecture, and something came to mind.
The "Persist Directory" field is the local directory where your DB will be saved. That means a new .db file will be created and populated in a folder named "langflow" inside Langflow's current folder.

image

image

So when you start a new Chroma server, I think you are using a command like this:
chroma run --host localhost --port 8000 --path ./my_chroma_data

I'm pretty sure that you have to set the path of your Chroma data (--path ./my_chroma_data) to be the same as the "Persist Directory" set in your flow.

Please try this on your setup (both paths must match) and let me know if it worked.

Wishing you all the best too! :)
Have a lovely week.

@fyakghoon2266
Author

Hi @Cristhianzl , hello! I had a very fulfilling weekend. I built a small hiding space for my cat using thick cardboard. Thank you so much for your help and for always assisting me with my issues. I apologize if I didn’t explain my problem clearly.

In this project, I set up a chromadb on AWS cloud, and I hope to connect directly to this chromadb using langflow in the future to access the data stored inside. Therefore, it seems that I shouldn’t store data directly in any place within langflow. The “Persist Directory” seems to be intended for local directories, but I’m not sure if I have misunderstood this.

I’m wondering if langflow's chromadb can currently connect to a chromadb set up on the cloud? If so, is there an error in my configuration, or is there something I might have overlooked?

Because I hope that in the future, other members of my team can use langflow directly to connect to the chromadb we set up on the cloud and use different collections to store and retrieve data.

Thank you once again for your help; you’ve really helped me solve many difficulties. My supervisor also hopes that in the future, the company will prioritize using langflow for any chatbox projects. Thank you very much!

@Cristhianzl
Member

Hey @fyakghoon2266, haha that's nice! I think your cat enjoyed it :)

I understand your architecture. I mean, the "Persist Directory" is not a required field—it's just if you really want to persist the data locally. You can find all the other ChromaDB configurations in the advanced tab of the component:

image

If you scroll down on the dialog panel, you can see some DB configurations.

image

I think it's just a detail that you're missing in these configurations. Maybe in ChromaDocs you can find something. Another doc that is very helpful is the Langchain docs because we use it for integration with Langflow: ChromaDocsLangchain.

We're happy to hear that! Hope we can build something great together (Langflow + your company) :)

@fyakghoon2266
Author

Hi @Cristhianzl ! Thank you so much for all your replies and the documents you provided. After studying them, I found that when I modified the code in chromadb to the version shown below (see the image), my data could be successfully saved to my cloud-based chromadb. A new collection was also created, and I verified on the cloud that the newly created collection does indeed exist and can be used by other Flows.

image

However, I'm not sure whether this is due to a version issue in one of the chromadb submodules bundled with langflow or a bug in chromadb itself. Without modifying the code, I could not create a new collection or save data to chromadb at all. I tried every option in the Advanced Settings, but none of them resolved the issue, so after reviewing the documents you provided I decided to modify the component's Python code.

I also ran a test comparing the way langflow currently creates the chromadb client with the method I previously used in langchain, to see whether they read chromadb collections differently. With langflow's current method, the data read back was an empty list, while creating the client with chromadb.HttpClient let me access the data (the code and results are in the image below). So I suspect the issue lies within chromadb itself. However, because my change creates the client with chromadb.HttpClient, it is not very general and only fits my own use case. I will need to spend more time researching how to modify the code for better compatibility.
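The modified code itself is only in the screenshots, so the following is a hypothetical sketch of one way to keep such a change compatible with local setups: use `chromadb.HttpClient` when a server host is configured and fall back to a local client otherwise. The function names and the fallback order are assumptions, not Langflow's implementation; `HttpClient`, `PersistentClient`, and `EphemeralClient` are chromadb's client constructors.

```python
from typing import Any, Optional


def choose_client(
    host: Optional[str],
    port: Optional[int] = None,
    ssl: bool = False,
    persist_dir: Optional[str] = None,
) -> tuple[str, dict]:
    """Pick a chromadb client constructor name and its keyword arguments."""
    if host:
        # Remote server: talk to it over HTTP instead of using local settings.
        return "HttpClient", {"host": host, "port": port or 8000, "ssl": ssl}
    if persist_dir:
        # No server configured: persist to a local directory.
        return "PersistentClient", {"path": persist_dir}
    # Nothing configured: in-memory client, data is lost on exit.
    return "EphemeralClient", {}


def build_client(**options: Any) -> Any:
    """Create the chromadb client selected by choose_client()."""
    import chromadb  # imported lazily so choose_client() is usable without it

    factory, kwargs = choose_client(**options)
    return getattr(chromadb, factory)(**kwargs)
```

The resulting client can then be handed to LangChain's `Chroma` vector store through its `client` argument, together with the collection name and embedding function, so the rest of the component would not need to know which kind of client is in use.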

image

Thank you so much for your assistance during this time; you have helped me solve so many issues that I wouldn’t have been able to resolve without your support. Because of these solutions, my company has decided to start deploying Langflow on AWS cloud next week to allow other projects and colleagues to start using it. In the future, we hope that all our company’s chatbots can be generated through Langflow.
Thank you again for your help! Wishing you all the best!

@Cristhianzl
Member

Hey @fyakghoon2266,

I'm very happy that you were able to achieve what you wanted. I think this is the magic of Langflow—you can create, update, or edit anything you need. You're free to customize your components and make them work exactly as you want.

No worries, I'm here for that; it's my job! Feel free to contact us anytime. We're always here to help you.

It was a pleasure speaking with you. Wishing you all the best my friend! :)
I will close this issue. If you need more help in the future, just open another one, OK?

@fyakghoon2266
Author

Hi @Cristhianzl,
Thank you so much! I will continue using LangFlow on the cloud. I appreciate your help in resolving these issues, and I wish you all the best. I hope I can consult you again in the future if I encounter any problems with LangFlow. Thank you very much!
