⚡️ Speed up mget() by 68% in libs/langchain/langchain/storage/file_system.py #29
📄 mget() in libs/langchain/langchain/storage/file_system.py
📈 Performance went up by 68% (0.68x faster)
⏱️ Runtime went down from 14075.68μs to 8401.84μs
Explanation and details
The existing Python code already validates file paths and checks that the files exist. A few improvements are still possible without changing function names or signatures, and while preserving the output exactly as before.
First, pre-compile the regular expression outside of the get_full_path function. Calling re.compile() once and reusing the resulting pattern object avoids the per-call overhead of re.match(r"^[a-zA-Z0-9.-/]+$", key), which has to resolve the pattern string to a regular expression object (through the re module's internal cache) on every iteration.
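A minimal sketch of the pre-compilation step. The pattern is reproduced as quoted in the text above (the actual pattern in the file may differ), and the names KEY_PATTERN and is_valid_key are hypothetical, chosen only for illustration; they are not taken from the PR.

```python
import re

# Compiled once at import time rather than on every call.
# Pattern copied as quoted in the description; treat it as an assumption.
KEY_PATTERN = re.compile(r"^[a-zA-Z0-9.-/]+$")


def is_valid_key(key: str) -> bool:
    """Return True if the key consists only of characters the pattern allows."""
    return KEY_PATTERN.match(key) is not None
```

The compiled object exposes the same match() method as the module-level function, so the validation logic itself does not change.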
Second, optimise the mget function to reduce the number of disk accesses. The existence check, os.path.exists(key), requires a disk access, and so does the subsequent read via Path.read_bytes(). Wrapping the read in a try/except block and catching FileNotFoundError removes the separate existence check, so each key costs one disk access instead of two.
Here's the optimized code.
This piece of code will have a faster runtime by avoiding redundant disk accesses and regex compilations.
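The try/except approach described above might look roughly like the following. This is an illustrative sketch, not the PR's actual diff: read_many and its root parameter are hypothetical stand-ins for the store's mget method and root path, and key validation is omitted for brevity.

```python
from pathlib import Path
from typing import List, Optional, Sequence


def read_many(root: Path, keys: Sequence[str]) -> List[Optional[bytes]]:
    """Read each key's file under root, returning None for keys with no file."""
    values: List[Optional[bytes]] = []
    for key in keys:
        try:
            # Attempt the read directly instead of calling exists() first,
            # so a present file costs one disk access rather than two.
            values.append((root / key).read_bytes())
        except FileNotFoundError:
            # Same result as the exists() check failing: a missing key maps to None.
            values.append(None)
    return values
```

Calling read_many(root, ["a", "missing"]) with only the file a present under root would return the contents of a followed by None, matching the output shape of the original exists-then-read loop.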
Correctness verification
The new optimized code was tested for correctness. The results are listed below.
✅ 0 Passed − ⚙️ Existing Unit Tests
✅ 0 Passed − 🎨 Inspired Regression Tests
✅ 7 Passed − 🌀 Generated Regression Tests