refactor: ⚡️ Speed up function `find_last_node` by 29,891% #5261

misrasaurabh1 · 2024-12-13T22:16:17Z

📄 `find_last_node` in `src/backend/base/langflow/graph/graph/utils.py`

✨ Performance Summary:

Speed Increase: 📈 29,891% (298.91x faster)
Runtime Reduction: ⏱️ From 117 milliseconds down to 391 microseconds (best of 47 runs)
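As a rough cross-check, the rounded timings can be plugged in directly. This sketch assumes the percentage is defined as (old - new) / new, which fits the reported "298.91x" being the percentage divided by 100. The small gap from 29,891% presumably comes from the rounding of the displayed timings.

```python
# Reported figures from the summary (already rounded for display).
old_runtime_s = 117e-3   # 117 milliseconds
new_runtime_s = 391e-6   # 391 microseconds

# How many times longer the old code took than the new code.
speedup_ratio = old_runtime_s / new_runtime_s

# Relative gain in percent, assuming the definition (old - new) / new.
gain_percent = (old_runtime_s - new_runtime_s) / new_runtime_s * 100

print(f"{speedup_ratio:.1f}x, {gain_percent:,.0f}%")  # 299.2x, 29,823%
```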

📝 Explanation and details

We can optimize the existing code by reducing the work done inside the loop: rather than comparing each node against every edge, the new version looks node IDs up in a precomputed set of source IDs.

Explanation:

1. **Set for fast lookup**: We first build a set of all source IDs from the edges. Checking membership in a set takes O(1) time on average.
2. **Iterate through nodes**: We loop through the nodes and check whether each node's ID is absent from the set of source IDs. A node whose ID is not in the set has no outgoing edges; the first such node is returned as the "last node".

This approach iterates over the edges only once to build the set and then performs one fast lookup per node, so for N nodes and E edges the cost drops from O(N × E) to O(N + E) on average.
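A minimal sketch of the before/after logic described above, assuming nodes are dicts with an `"id"` key and edges are dicts with a `"source"` key (as in the generated tests). The function names `find_last_node_scan` and `find_last_node_set` are hypothetical; this illustrates the technique and is not necessarily the exact code merged in this PR.

```python
from typing import Any, Optional

Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_scan(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # Before (assumed): for each node, scan every edge to confirm the node is
    # never a source. Worst case does len(nodes) * len(edges) comparisons.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_set(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # After: collect all source IDs once, then do one average O(1) set lookup
    # per node. Returns the first node with no outgoing edges, or None.
    source_ids = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in source_ids), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_set(nodes, edges))  # {'id': 3}
```

One nuance of this sketch: the set version reads every edge's `"source"` key up front, so a malformed edge raises `KeyError` even when `nodes` is empty, whereas the scanning version never touches the edges in that case. The set version also requires IDs to be hashable.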

✅ Correctness verification

The new optimized code was tested for correctness. The results are listed below:

| Test | Status | Details |
|---|---|---|
| ⚙️ Existing Unit Tests | ✅ 10 Passed | See below |
| 🌀 Generated Regression Tests | ✅ 31 Passed | See below |
| ⏪ Replay Tests | 🔘 None Found | |
| 🔎 Concolic Coverage Tests | 🔘 None Found | |
| 📊 Coverage | 100.0% | |

⚙️ Existing Unit Tests Details


- graph/test_graph.py

🌀 Generated Regression Tests Details


```python
import pytest  # used for our unit tests
from langflow.graph.graph.utils import find_last_node

# unit tests

# Basic Functionality
def test_single_node_no_edges():
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 1}

def test_multiple_nodes_no_edges():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)  # every node qualifies; the first is returned
    assert codeflash_output == {"id": 1}

def test_multiple_nodes_one_edge():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 2}

# Edge Cases
def test_empty_nodes_list():
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None

def test_empty_edges_list():
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)  # every node qualifies; the first is returned
    assert codeflash_output == {"id": 1}

def test_all_nodes_as_sources():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None

# Complex Graphs
def test_linear_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 3}

def test_branched_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}, {"source": 3, "target": 4}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 2}  # node 2 is the first node without outgoing edges

# Cyclic Graphs
def test_simple_cycle():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None

def test_complex_cycle():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}, {"source": 3, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None

# Large Scale Test Cases
def test_large_number_of_nodes_and_edges():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 999}

def test_sparse_connections():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+2} for i in range(0, 998, 2)]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 1}  # odd IDs are never sources

# Invalid Inputs

def test_edges_with_missing_sources():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"target": 2}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)

def test_mixed_valid_and_invalid_nodes():
    nodes = [{"id": 1}, {"name": "node2"}, {"id": 3}]
    edges = [{"source": 1, "target": 3}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)

# Mixed Data Types
def test_non_integer_ids():
    nodes = [{"id": "a"}, {"id": "b"}]
    edges = [{"source": "a", "target": "b"}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": "b"}

def test_mixed_integer_and_string_ids():
    nodes = [{"id": 1}, {"id": "2"}]
    edges = [{"source": 1, "target": "2"}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": "2"}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
import pytest  # used for our unit tests
from langflow.graph.graph.utils import find_last_node

# unit tests

def test_single_node_no_edges():
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 1}

def test_multiple_nodes_no_edges():
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 1}

def test_single_edge():
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 2}

def test_linear_chain():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 3}

def test_forked_path():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 2}

def test_converging_path():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 3}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 3}

def test_disconnected_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 2}

def test_cyclic_graph():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}, {"source": 3, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None

def test_empty_nodes_and_edges():
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None

def test_nodes_with_no_corresponding_edges():
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 4, "target": 5}]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 1}

def test_large_number_of_nodes_and_edges():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 999}

def test_sparse_graph():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(0, 1000, 10)]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output == {"id": 1}

def test_performance_with_maximum_nodes():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i+1} for i in range(1000)]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None  # every node is a source

def test_performance_with_random_edges():
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": (i*2) % 1000000} for i in range(1000)]
    codeflash_output = find_last_node(nodes, edges)
    assert codeflash_output is None  # every node is a source
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
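The `codeflash_output` convention compares the original and optimized outputs on the same inputs. A self-contained sketch of that kind of differential check follows, assuming the pre-optimization function scanned all edges for each node. The names `find_last_node_original`, `find_last_node_optimized`, `random_graph`, and `check_equivalence` are hypothetical and are not part of Codeflash or Langflow.

```python
import random
from typing import Any, Optional

Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_original(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # Hypothetical "before" version: scan every edge for each node.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_optimized(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # Hypothetical "after" version: build the set of source IDs once.
    source_ids = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in source_ids), None)


def random_graph(rng: random.Random, n_nodes: int, n_edges: int) -> tuple[list[Node], list[Edge]]:
    # Node IDs are 0..n_nodes-1; each edge picks random endpoints among them.
    nodes = [{"id": i} for i in range(n_nodes)]
    if n_nodes == 0:
        return nodes, []  # randrange(0) would raise, so an empty graph gets no edges
    edges = [
        {"source": rng.randrange(n_nodes), "target": rng.randrange(n_nodes)}
        for _ in range(n_edges)
    ]
    return nodes, edges


def check_equivalence(trials: int = 500, seed: int = 0) -> int:
    # Feed identical random graphs to both versions and require identical outputs.
    rng = random.Random(seed)
    for _ in range(trials):
        nodes, edges = random_graph(rng, rng.randint(0, 20), rng.randint(0, 40))
        expected = find_last_node_original(nodes, edges)
        actual = find_last_node_optimized(nodes, edges)
        assert actual == expected, (nodes, edges, expected, actual)
    return trials


if __name__ == "__main__":
    print(f"{check_equivalence()} random graphs matched")
```

A fixed seed keeps any failure reproducible, and small graphs make it likely that cycles, empty inputs, and nodes without outgoing edges all show up.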

📣 **Feedback**

If you have any feedback or need assistance, feel free to join our Discord community:


codspeed-hq · 2024-12-13T22:28:24Z

CodSpeed Performance Report

Merging #5261 will not alter performance

Comparing codeflash-ai:codeflash/optimize-find_last_node-2024-12-11T13.39.44 (97c2182) with main (1ec6380)

Summary

✅ 15 untouched benchmarks


dosubot bot added the size:XS label ("This PR changes 0-9 lines, ignoring generated files.") Dec 13, 2024

Merge branch 'main' into codeflash/optimize-find_last_node-2024-12-11T13.39.44 (d09f16a)

dosubot bot added the enhancement label ("New feature or request") Dec 13, 2024

ogabrielluiz approved these changes Dec 14, 2024


dosubot bot added the lgtm label ("This PR has been approved by a maintainer") Dec 14, 2024

ogabrielluiz changed the title ~~⚡️ Speed up function find_last_node by 29,891%~~ refactor: ⚡️ Speed up function find_last_node by 29,891% Dec 16, 2024

Merge branch 'main' into codeflash/optimize-find_last_node-2024-12-11T13.39.44 (b8ca37e)

ogabrielluiz changed the title ~~refactor: ⚡️ Speed up function find_last_node by 29,891%~~ refactor: ⚡️ Speed up function find_last_node by 29,891% Dec 16, 2024

github-actions bot added the refactor label ("Maintenance tasks and housekeeping") and removed the enhancement label ("New feature or request") Dec 16, 2024

ogabrielluiz enabled auto-merge (squash) December 16, 2024 19:19

Merge branch 'main' into codeflash/optimize-find_last_node-2024-12-11T13.39.44 (97c2182)

github-actions bot added and removed the refactor label ("Maintenance tasks and housekeeping") Dec 16, 2024

ogabrielluiz merged commit e8d3714 into langflow-ai:main Dec 16, 2024
23 checks passed
