Windows-msvc is slower than Linux or macOS (no asm) #43
Comments
Oh dear, thanks for the report! I didn't know that zlib had asm files to build, though! Which files were you thinking we should build? |
Cool! The .asm files for each platform are hiding under "contrib" which made them a bit invisible to me at first ;) but there's a note in the win32 Makefile listing how to enable them there. Need to set defines on the C compiles:
Files for x64:
Files for x86:
The |
Ah ok, cool! Are we sure it's a good idea to use those? There's a relatively scary comment indicating that they're not necessarily recommended for use |
@Brion We'd also be happy to accept PRs in vcpkg to enable the ASM! |
Hmm, after more careful poking the Debian/Ubuntu package doesn't appear to be using the gcc asm either, unless it's triggering via some means I don't understand (Deb packages confuse me!) but it's sure faster... Not 100% sure what's the best course of action here. |
Perhaps an off-by-default feature could be included? That way users could determine if the asm works for them? |
Sounds good -- then can opt in easily. :) |
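For anyone wondering what an opt-in gate like this could look like, here is a minimal sketch, not the crate's actual build script. It assumes the feature is named 'asm' and relies on Cargo exporting `CARGO_FEATURE_<NAME>` and `TARGET` to build scripts; the helper names `want_asm` and `want_asm_from_env` are hypothetical.

```rust
use std::env;

/// Hypothetical helper: decide whether the optional assembly routines
/// should be built. `feature_set` says whether the opt-in feature is on;
/// `target` is the target triple, e.g. "x86_64-pc-windows-msvc".
fn want_asm(feature_set: bool, target: &str) -> bool {
    // Only MSVC targets are considered in this sketch.
    let is_msvc = target.ends_with("-msvc");
    // The asm discussed in this thread only covers x86 and x64.
    let is_x86_family = target.starts_with("x86_64-")
        || target.starts_with("i686-")
        || target.starts_with("i586-");
    feature_set && is_msvc && is_x86_family
}

/// Hypothetical wrapper for use inside a build script: Cargo sets
/// CARGO_FEATURE_ASM when a feature named "asm" is enabled, and TARGET
/// to the triple being built for.
fn want_asm_from_env() -> bool {
    let feature_set = env::var_os("CARGO_FEATURE_ASM").is_some();
    let target = env::var("TARGET").unwrap_or_default();
    want_asm(feature_set, &target)
}
```

With a gate like this, the default build stays on the plain C code path, and only users who enable the feature on a Windows x86/x64 target would get the extra assembly step.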
@Brion want to test out https://github.com/alexcrichton/libz-sys/tree/asm and see if it works for you? |
@alexcrichton works great, I see a definite improvement of several percent in total PNG encoding throughput when enabling the 'asm' feature on Windows, both x86_64 and x86. :) |
Ok thanks! I've pushed that to master now |
I'm using libz-sys in a multithreaded PNG encoder and found that my Windows builds run slower than Linux or macOS builds; much of the difference seems to be down to how zlib is built.
For my test compressing a very large screenshot on a single thread:
(Something like 2/3 of the total runtime is in deflate.)
Using the vcpkg version (with VCPKGRS_DYNAMIC=1) seems to give a small boost of a few percent, but a bigger win comes from manually rebuilding zlib1.dll with assembly optimizations and dropping that in over the vcpkg version. This gets my whole run going almost as fast as the Linux version.
Would it be worth special-casing the msvc builds to pull in the x86/x64 assembly bits? I'm not sure how hard that is with the cc crate build framework (they're extra .asm files that need to be run through the assembler into .obj files, not inline assembler in .c files). Or should there be an easier way to drop in a customized zlib build if you need it?
(And thanks for the crate -- it gave me exactly the low-level interface into zlib I needed to create a stream stitched together from work done on multiple threads!)