Move UUIDText, dsc, and TimeSync to HashMaps #66

Open
wants to merge 7 commits into
base: main

puffyCid
Collaborator

@puffyCid puffyCid commented Feb 27, 2025

This PR continues the work done by jrouaix and dgmcdona to reduce this library’s memory usage.

Prior to this PR, the library would read and cache all UUIDText and dsc files on a system or logarchive before parsing the actual logs.

The UUIDText and dsc files contain part of the log message.

This PR changes that behavior.
Instead, the library will parse the log file first and then read the corresponding UUIDText and/or dsc file.
The library will then cache the corresponding UUIDText and/or dsc file before moving on to the next log entry.

The library will keep a small cache of UUIDText and dsc files while reading the log entries. This is all handled via the FileProvider trait that dgmcdona added.

Currently it will cache 30 UUIDText files and 2 dsc files.
There are typically ~250 UUIDText files, with a total size of ~90MB to ~140MB.

Dsc files are larger but fewer in number.
Typically there are ~6 dsc files, with a total size of ~600MB.
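
To illustrate the idea of a small bounded cache, here is a minimal sketch. It is not the code in this PR: `BoundedCache` and `get_or_load` are hypothetical names, and the oldest-first eviction policy is an assumption, since the description above does not say how entries are evicted. A capacity of 30 would correspond to the UUIDText cache and 2 to the dsc cache.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded cache keyed by file UUID.
/// Assumption: the oldest inserted entry is evicted first (FIFO).
struct BoundedCache<V> {
    capacity: usize,
    entries: HashMap<String, V>,
    order: VecDeque<String>, // insertion order, oldest key at the front
}

impl<V> BoundedCache<V> {
    fn new(capacity: usize) -> Self {
        Self {
            capacity: capacity.max(1), // always keep room for at least one entry
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns the cached value for `key`, calling `load` only on a miss.
    fn get_or_load<F>(&mut self, key: &str, load: F) -> &V
    where
        F: FnOnce(&str) -> V,
    {
        if !self.entries.contains_key(key) {
            // Make room before loading so the cache never exceeds `capacity`.
            if self.entries.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            let value = load(key);
            self.entries.insert(key.to_string(), value);
            self.order.push_back(key.to_string());
        }
        &self.entries[key]
    }

    fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With this shape, a file is only read from disk the first time a log entry references it, and memory is bounded by the capacity rather than by the number of files on the system.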

By using a smaller cache, the memory usage of the library drops substantially.

Memory usage prior to this PR: ~300MB-1GB, depending on how many UUIDText and dsc files are present.
Now: ~100MB-500MB.
(Though this is still higher than Apple's built-in log command, which uses ~150MB - ~200MB.)

Switching to HashMap also makes the library a bit faster.

Basic hyperfine test on a few log files:

Benchmark 1: ./iteratorV2 -m log-archive -i /Users/android/Downloads/system_logs.logarchive -o out.csv -f csv
  Time (mean ± σ):      6.324 s ±  0.051 s    [User: 5.469 s, System: 0.638 s]
  Range (min … max):    6.264 s …  6.405 s    10 runs

Benchmark 2: ./iteratorV1 -m log-archive -i /Users/android/Downloads/system_logs.logarchive -o out2.csv -f csv
  Time (mean ± σ):      7.391 s ±  0.119 s    [User: 6.641 s, System: 0.595 s]
  Range (min … max):    7.278 s …  7.604 s    10 runs

Summary
  ./iteratorV2 -m log-archive -i /Users/android/Downloads/system_logs.logarchive -o out.csv -f csv ran
    1.17 ± 0.02 times faster than ./iteratorV1 -m log-archive -i /Users/android/Downloads/system_logs.logarchive -o out2.csv -f csv

Here iteratorV2 is the HashMap version and iteratorV1 is the current implementation.

Finally, this PR includes a fix to better handle <private> number entries.

Also, since we switched from Vec to HashMap, this is probably a breaking change. The example file has been updated.
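
As a rough illustration of what the Vec to HashMap change means for callers, here is a sketch of the two lookup styles. `StringsFile`, `find_in_vec`, and `index_by_uuid` are hypothetical names made up for this example and are not the library's actual types or functions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed UUIDText or dsc file.
#[derive(Debug, Clone, PartialEq)]
struct StringsFile {
    uuid: String,
    data: Vec<u8>,
}

/// Vec style: every lookup scans the list until the UUID matches.
fn find_in_vec<'a>(files: &'a [StringsFile], uuid: &str) -> Option<&'a StringsFile> {
    files.iter().find(|f| f.uuid == uuid)
}

/// HashMap style: index the files by UUID once, then look up by key.
fn index_by_uuid(files: Vec<StringsFile>) -> HashMap<String, StringsFile> {
    files.into_iter().map(|f| (f.uuid.clone(), f)).collect()
}
```

Code that previously passed or iterated over a Vec of these files would need to switch to keyed access such as `by_uuid.get("...")`, which is why the change is likely breaking.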

@puffyCid puffyCid marked this pull request as ready for review February 27, 2025 05:13
@puffyCid puffyCid linked an issue Feb 27, 2025 that may be closed by this pull request
@puffyCid
Collaborator Author

@mrguiman @jrouaix both of you discussed attempts to lower memory usage.
If either of you has a chance to try out this branch (hashmaps) and compare its memory usage to the current implementation, that would be great. If not, no worries.

I'm hopeful that memory usage will be a bit lower.

Development

Successfully merging this pull request may close these issues.

Lowering the memory footprint of parse results
1 participant