Long Processing Time in dpkg-db-cataloger with all-layers Option (Syft 1.20.0) #3683

Open
tallstory opened this issue Feb 24, 2025 · 3 comments · Fixed by anchore/stereoscope#382 · May be fixed by #3636
Labels: bug (Something isn't working)

Comments

tallstory commented Feb 24, 2025

What happened:
After upgrading Syft from 0.95.0 to 1.20.0, I noticed that analyzing the nvcr.io/nvidia/pytorch:24.08-py3 image with the all-layers option results in dpkg-db-cataloger taking significantly longer than before: around 50 minutes.

Initially, I suspected this might be a bug, but after further analysis, I’m unsure if this is expected behavior or if there’s something specific about this image that is causing the delay.

What you expected to happen:
I expected the analysis to complete in a timeframe similar to previous Syft versions (e.g., 0.95.0), where the entire all-layers scan of this image finished in about 8 minutes.

Steps to reproduce the issue:

  1. Run the following command with Syft 0.95.0

    $ time ./syft_0_95_0 packages --scope all-layers nvcr.io/nvidia/pytorch:24.08-py3 -v
    [0000]  INFO syft version: 0.95.0
    [0000]  INFO new version of syft is available: 1.20.0 (current version is 0.95.0)
    ...
    
    ./syft_0_95_0 packages --scope all-layers nvcr.io/nvidia/pytorch:24.08-py3   318.20s user 62.48s system 77% cpu 8:10.52 total
    
  2. Run the following command with Syft 1.20.0

    $ time ./syft_1_20_0 scan --scope all-layers nvcr.io/nvidia/pytorch:24.08-py3 -v
    [0000]  INFO syft version: 1.20.0
    ...
    [3451]  INFO task completed elapsed=49m35.771351958s task=dpkg-db-cataloger
    ...
    ./syft_1_20_0 scan --scope all-layers nvcr.io/nvidia/pytorch:24.08-py3 -v  3535.15s user 89.69s system 102% cpu 58:40.23 total
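For a rough sense of scale, the wall-clock totals reported by `time` above can be converted to seconds and compared. A minimal sketch using only POSIX shell and awk, with the numbers copied by hand from the two runs (the `to_seconds` helper is hypothetical, written just for this comparison):

```shell
#!/bin/sh
# Hypothetical helper: convert a `time`-style duration (M:SS.ss or H:MM:SS.ss) to seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

old=$(to_seconds 8:10.52)            # 0.95.0 total wall-clock time
new=$(to_seconds 58:40.23)           # 1.20.0 total wall-clock time
dpkg=$(to_seconds 49:35.771351958)   # dpkg-db-cataloger elapsed time in the 1.20.0 run

awk -v o="$old" -v n="$new" -v d="$dpkg" 'BEGIN {
  printf "0.95.0 total: %.2f s\n", o
  printf "1.20.0 total: %.2f s\n", n
  printf "slowdown: %.1fx\n", n / o
  printf "dpkg-db-cataloger share of 1.20.0 run: %.0f%%\n", 100 * d / n
}'
```

With these inputs the script should print totals of 490.52 s and 3520.23 s, a slowdown of about 7.2x, and dpkg-db-cataloger at roughly 85% of the 1.20.0 run.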
    

Anything else we need to know?:

  • Is this expected behavior for certain images, or could this indicate an optimization issue?
  • Are there any recommended workarounds or alternative configurations to reduce processing time in this case?
  • Let me know if additional debugging information would be helpful.

Environment:

  • Output of syft version: 0.95.0, 1.20.0
  • OS: macOS (Apple Silicon)
    $ sw_vers -productVersion
    15.1.1
    
tallstory added the bug label Feb 24, 2025
tallstory changed the title from "Long Processing Time in dpkg-db-cataloger with all-layers Option (Syft 1.20.1)" to "Long Processing Time in dpkg-db-cataloger with all-layers Option (Syft 1.20.0)" Feb 24, 2025
spiffcs self-assigned this Feb 25, 2025
spiffcs moved this to In Progress in OSS Feb 25, 2025
spiffcs (Contributor) commented Feb 27, 2025

Hey @tallstory, just a quick update: I've been trying to track down where the performance regression was introduced between 0.95.0 and the most recent release. Once I've narrowed down the version where it was introduced, I'll run a git bisect to find the change causing the issue.

Apologies for the regression here.

tallstory (Author) commented

Hey @spiffcs, just checking in: any updates on this issue?
Let me know if there's anything I can do to help. Thanks!

github-project-automation bot moved this from In Progress to Done in OSS Mar 14, 2025
kzantow reopened this Mar 14, 2025
kzantow moved this from Done to In Progress in OSS Mar 14, 2025
kzantow (Contributor) commented Mar 14, 2025

Hey @tallstory, @spiffcs is on PTO, so I thought I would take a quick look at this, since I have an experimental PR that parallelizes the dpkg cataloger, and once it gets merged it will muddy the waters. I think you noticed a change that improves this; I'm leaving this issue open until the parallelism work lands, as I think that will get performance close enough to be considered fixed. The overall cataloger time may still be a minute or more longer than before: as I noted in the PR, the cataloger is necessarily doing more work now, but this should bring performance into the ballpark of what you were seeing before.

kzantow linked a pull request Mar 14, 2025 that will close this issue
Project status: In Progress
Participants: 3