refactor: ⚡️ Speed up function find_last_node by 29,891% #5261

misrasaurabh1
Contributor

📄 find_last_node in src/backend/base/langflow/graph/graph/utils.py

✨ Performance Summary:

  • Speed Increase: 📈 29,891% (298.91x faster)
  • Runtime Reduction: ⏱️ From 117 milliseconds down to 391 microseconds (best of 47 runs)

📝 Explanation and details

We can optimize the existing code by minimizing the checks inside the loop and speeding up the lookup operations. The optimized version is in this PR's diff.

Explanation.

  1. Set for Fast Lookup: We first create a set of all source IDs from the edges. This is efficient because checking for membership in a set is on average O(1) time complexity.
  2. Iterate Through Nodes: We loop through each node and check if its ID is not in the set of source IDs. If a node's ID is not found in the set, it means this node has no outgoing edges and is the "last node".

This approach iterates over the edges only once to build the set and then does a single average-case O(1) lookup per node, so the total work grows with the number of nodes plus the number of edges.
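
The two steps above can be sketched as follows. This is a minimal illustration, assuming nodes are dicts with an `"id"` key and edges are dicts with a `"source"` key, as in the generated tests. It is not necessarily the exact code merged in this PR, and the name `find_last_node_sketch` is hypothetical.

```python
from typing import Any, Optional


def find_last_node_sketch(
    nodes: list[dict[str, Any]], edges: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the first node that is not the source of any edge, or None."""
    # Step 1: collect every source ID once; set membership is O(1) on average.
    source_ids = {edge["source"] for edge in edges}
    # Step 2: the first node whose ID is not a source has no outgoing edges.
    for node in nodes:
        if node["id"] not in source_ids:
            return node
    return None


linear_nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
linear_edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_sketch(linear_nodes, linear_edges))  # {'id': 3}

# In a cycle every node is a source, so no "last node" exists.
cycle_edges = linear_edges + [{"source": 3, "target": 1}]
print(find_last_node_sketch(linear_nodes, cycle_edges))  # None
```

Under these assumptions, a missing `"source"` or `"id"` key raises `KeyError`, which matches what the invalid-input tests in the generated regression suite expect.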


Correctness verification

The new optimized code was tested for correctness. The results are listed below:

| Test | Status | Details |
|---|---|---|
| ⚙️ Existing Unit Tests | 10 Passed | See below |
| 🌀 Generated Regression Tests | 31 Passed | See below |
| ⏪ Replay Tests | 🔘 None Found | |
| 🔎 Concolic Coverage Tests | 🔘 None Found | |
| 📊 Coverage | 100.0% | |

⚙️ Existing Unit Tests Details

- graph/test_graph.py

🌀 Generated Regression Tests Details

import pytest  # used for our unit tests
from langflow.graph.graph.utils import find_last_node

# unit tests

# Basic Functionality
def test_single_node_no_edges():
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)

def test_multiple_nodes_no_edges():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)  # Any node is valid

def test_multiple_nodes_one_edge():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)

# Edge Cases
def test_empty_nodes_list():
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)

def test_empty_edges_list():
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)  # Any node is valid

def test_all_nodes_as_sources():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)

# Complex Graphs
def test_linear_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)

def test_branched_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}, {"source": 3, "target": 4}]
    codeflash_output = find_last_node(nodes, edges)

# Cyclic Graphs
def test_simple_cycle():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)

def test_complex_cycle():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}, {"source": 3, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)

# Large Scale Test Cases
def test_large_number_of_nodes_and_edges():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)

def test_sparse_connections():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+2} for i in range(0, 998, 2)]
    codeflash_output = find_last_node(nodes, edges)

# Invalid Inputs

def test_edges_with_missing_sources():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"target": 2}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)

def test_mixed_valid_and_invalid_nodes():
    nodes = [{"id": 1}, {"name": "node2"}, {"id": 3}]
    edges = [{"source": 1, "target": 3}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)

# Mixed Data Types
def test_non_integer_ids():
    nodes = [{"id": "a"}, {"id": "b"}]
    edges = [{"source": "a", "target": "b"}]
    codeflash_output = find_last_node(nodes, edges)

def test_mixed_integer_and_string_ids():
    nodes = [{"id": 1}, {"id": "2"}]
    edges = [{"source": 1, "target": "2"}]
    codeflash_output = find_last_node(nodes, edges)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest  # used for our unit tests
from langflow.graph.graph.utils import find_last_node

# unit tests

def test_single_node_no_edges():
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)

def test_multiple_nodes_no_edges():
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)

def test_single_edge():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)

def test_linear_chain():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)

def test_forked_path():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)

def test_converging_path():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 3}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)

def test_disconnected_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)

def test_cyclic_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}, {"source": 3, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)

def test_empty_nodes_and_edges():
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)

def test_nodes_with_no_corresponding_edges():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 4, "target": 5}]
    codeflash_output = find_last_node(nodes, edges)


def test_large_number_of_nodes_and_edges():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)

def test_sparse_graph():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(0, 1000, 10)]
    codeflash_output = find_last_node(nodes, edges)

def test_performance_with_maximum_nodes():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(1000)]
    codeflash_output = find_last_node(nodes, edges)

def test_performance_with_random_edges():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": (i*2) % 1000000} for i in range(1000)]
    codeflash_output = find_last_node(nodes, edges)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
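
The comparison that `codeflash_output` supports can be approximated locally. The sketch below is a stand-in built on assumptions, not Codeflash's actual harness. `find_last_node_naive` is a hypothetical nested-scan baseline (the original implementation is not shown in this conversation), `find_last_node_set` is a hypothetical set-based version following the explanation in the PR description, and `check_equivalence` is an illustrative helper that compares the two on small random graphs.

```python
import random
from typing import Any, Optional

Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_naive(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # Hypothetical baseline: rescans every edge for every node, O(N * E).
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)), None
    )


def find_last_node_set(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # Hypothetical optimized version: one pass over edges, then O(1) lookups.
    source_ids = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in source_ids), None)


def check_equivalence(trials: int = 200, seed: int = 0) -> bool:
    """Compare both versions on small random directed graphs."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 20)
        nodes = [{"id": i} for i in range(size)]
        # An empty graph gets no edges, so randrange(0) is never called.
        edges = [
            {"source": rng.randrange(size), "target": rng.randrange(size)}
            for _ in range(rng.randint(0, 40) if size else 0)
        ]
        if find_last_node_naive(nodes, edges) != find_last_node_set(nodes, edges):
            return False
    return True


print(check_equivalence())
```

Random graphs complement the hand-written cases above by exercising cycles, self-loops, and disconnected nodes without enumerating them one by one.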

📣 **Feedback**

If you have any feedback or need assistance, feel free to join our Discord community:

Discord

@dosubot (bot) added the size:XS label ("This PR changes 0-9 lines, ignoring generated files.") on Dec 13, 2024
@dosubot (bot) added the enhancement label ("New feature or request") on Dec 13, 2024

codspeed-hq bot commented Dec 13, 2024

CodSpeed Performance Report

Merging #5261 will not alter performance

Comparing codeflash-ai:codeflash/optimize-find_last_node-2024-12-11T13.39.44 (97c2182) with main (1ec6380)

Summary

✅ 15 untouched benchmarks

@dosubot (bot) added the lgtm label ("This PR has been approved by a maintainer") on Dec 14, 2024
@ogabrielluiz changed the title from "⚡️ Speed up function find_last_node by 29,891%" to "refactor: ⚡️ Speed up function find_last_node by 29,891%" on Dec 16, 2024
@ogabrielluiz changed the title from "refactor: ⚡️ Speed up function find_last_node by 29,891%" to "refactor: ⚡️ Speed up function find_last_node by 29,891%" on Dec 16, 2024
@github-actions (bot) added the refactor label ("Maintenance tasks and housekeeping") and removed the enhancement label ("New feature or request") on Dec 16, 2024
@ogabrielluiz enabled auto-merge (squash) on December 16, 2024 at 19:19
@github-actions (bot) added the refactor label ("Maintenance tasks and housekeeping") and removed the refactor label ("Maintenance tasks and housekeeping") on Dec 16, 2024
@ogabrielluiz ogabrielluiz merged commit e8d3714 into langflow-ai:main Dec 16, 2024
23 checks passed
Labels
- lgtm: This PR has been approved by a maintainer
- refactor: Maintenance tasks and housekeeping
- size:XS: This PR changes 0-9 lines, ignoring generated files.