Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows-msvc is slower than Linux or macOS (no asm) #43

Closed
bvibber opened this issue Sep 30, 2018 · 10 comments
Closed

Windows-msvc is slower than Linux or macOS (no asm) #43

bvibber opened this issue Sep 30, 2018 · 10 comments

Comments

@bvibber
Copy link

bvibber commented Sep 30, 2018

I'm using libz-sys in a multithreaded PNG encoder and found that my Windows builds run slower than Linux or macOS builds; much of the difference seems to be down to how zlib is built.

For my test compressing a very large screenshot on a single thread:

  • static version: 671ms
  • vcpkg-built zlib1.dll: 646ms
  • manually-built zlib1.dll with asm: 596ms
  • Linux version on WSL: 555ms

(Something like 2/3 of the total runtime is in deflate.)

Using the vcpkg version (with VCPKGRS_DYNAMIC=1) seems to give a small boost of a few percent, but a bigger win comes from manually rebuilding zlib1.dll with assembly optimizations and dropping that in over the vcpkg version. This gets my whole run going almost as fast as the Linux version.

Would it be worth special-casing the msvc builds to pull in the x86/x64 assembly bits? I'm not sure how hard that is with the cc crate build framework (they're extra .asm files that need to be run through the assembler into .obj files, not inline assembler in .c files). Or should there be an easier way to drop in a customized zlib build if you need it?

(And thanks for the crate -- it gave me exactly the low-level interface into zlib I needed to create a stream stitched together from work done on multiple threads!)

@alexcrichton
Copy link
Member

Oh dear, thanks for the report! The cc crate does indeed have support for simply feeding it assembly files (and it takes care of compliation on platforms and making sure the objects make their way to the linker)

I didn't know though that zlib had asm files to build! What files were you thinking we should build?

@bvibber
Copy link
Author

bvibber commented Sep 30, 2018

Cool! The .asm files for each platform are hiding under "contrib" which made them a bit invisible to me at first ;) but there's a note in the win32 Makefile listing how to enable them there.

Need to set defines on the C compiles:

  • -DASMV
  • -DASMINF

Files for x64:

  • contrib\masmx64\inffasx64.asm
  • contrib\masmx64\gvmat64.asm
  • contrib\masmx64\inffas8664.c

Files for x86:

  • contrib\masmx86\inffas32.asm
  • contrib\masmx86\match686.asm

The inffas* bits should speed up inflate decompression, while the *mat* bits speed up deflate's longestMatch function during compression.

@alexcrichton
Copy link
Member

Ah ok, cool! Are we sure it's a good idea to use those? There's a relatively scary comment indicating that they're not necessarily recommended for use

@ras0219-msft
Copy link

@Brion We'd also be happy to accept PR's in vcpkg to enable the ASM!

@bvibber
Copy link
Author

bvibber commented Oct 1, 2018

Ah ok, cool! Are we sure it's a good idea to use those? There's a relatively scary comment indicating that they're not necessarily recommended for use

Hmm, after more careful poking the Debian/Ubuntu package doesn't appear to be using the gcc asm either, unless it's triggering via some means I don't understand (Deb packages confuse me!) but it's sure faster... Not 100% sure what's the best course of action here.

@alexcrichton
Copy link
Member

Perhaps an off-by-default feature could be included? That way users could determine if the asm works for them?

@bvibber
Copy link
Author

bvibber commented Oct 1, 2018

Sounds good -- then can opt in easily. :)

@alexcrichton
Copy link
Member

@Brion want to test out https://github.com/alexcrichton/libz-sys/tree/asm and see if it works for you?

@bvibber
Copy link
Author

bvibber commented Oct 1, 2018

@alexcrichton works great, I see a definite improvement of several percent in total PNG encoding throughput when enabling the 'asm' feature on Windows, both x86_64 and x86. :)

@alexcrichton
Copy link
Member

Ok thanks! I've pushed that to master now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants