Tensorflow: try various build options to skip using oneDNN. #9759

Open: @gartung wants to merge 2 commits into base IB/CMSSW_15_1_X/master from gartung-tf-no-mkl

gartung
Member

@gartung gartung commented Mar 25, 2025

No description provided.

@cmsbuild
Contributor

A new Pull Request was created by @gartung for branch IB/CMSSW_15_1_X/master.

@cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release managers for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Mar 25, 2025

cms-bot internal usage

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-118f2e/45196/summary.html
COMMIT: feb726f
CMSSW: CMSSW_15_1_X_2025-03-25-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9759/45196/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

+ bazel --batch --output_user_root ../build --host_jvm_args=--add-opens=java.base/java.nio=ALL-UNNAMED --host_jvm_args=--add-opens=java.base/java.lang=ALL-UNNAMED build -s --verbose_failures --distinct_host_configuration=false --copt=-march=x86-64-v2 --distinct_host_configuration=true --config=opt --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 -j 16 --config=noaws --config=nogcp --config=nohdfs --config=nonccl --config=nomkl //tensorflow/tools/pip_package:build_pip_package
$TEST_TMPDIR defined: output root default is '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tensorflow-sources_x86-64-v2/2.12.0-41504b13dd687788f80e65f41891c9c7/build' and max_idle_secs default is '15'.
Extracting Bazel installation...
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
ERROR: Config value 'nomkl' is not defined in any .rc file
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.qCwBj1 (%build)


RPM build errors:
line 42: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tensorflow-sources_x86-64-v2+2.12.0-41504b13dd687788f80e65f41891c9c7
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.qCwBj1 (%build)


@cmsbuild
Contributor

Pull request #9759 was updated.

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-118f2e/45197/summary.html
COMMIT: 26e61b1
CMSSW: CMSSW_15_1_X_2025-03-25-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9759/45197/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

+ bazel --batch --output_user_root ../build --host_jvm_args=--add-opens=java.base/java.nio=ALL-UNNAMED --host_jvm_args=--add-opens=java.base/java.lang=ALL-UNNAMED build -s --verbose_failures --distinct_host_configuration=false --copt=-march=x86-64-v3 --config=opt --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 -j 16 --config=noaws --config=nogcp --config=nohdfs --config=nonccl --config=mkl_open_source_only //tensorflow/tools/pip_package:build_pip_package
$TEST_TMPDIR defined: output root default is '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tensorflow-sources/2.12.0-0d4577df4c24c230513fceae920e9291/build' and max_idle_secs default is '15'.
Extracting Bazel installation...
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
ERROR: Config value 'mkl_open_source_only' is not defined in any .rc file
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.d8Poqt (%build)


RPM build errors:
line 42: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tensorflow-sources+2.12.0-0d4577df4c24c230513fceae920e9291
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.d8Poqt (%build)
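
Both failures above come from Bazel rejecting a `--config=` name ('nomkl', then 'mkl_open_source_only') that no rc file defines. A minimal sketch of a pre-flight check, assuming rc entries of the form `<command>:<config> <options>`: it lists the config names an rc text defines and reports which requested names are missing. The sample rc content and the helper names `defined_configs` and `missing_configs` are hypothetical, not taken from TensorFlow's `.bazelrc`, and `import`/`try-import` lines are not followed.

```python
import re

# Matches rc lines of the form "<command>:<config> <options...>",
# for example "build:opt --copt=-O2".
_CONFIG_LINE = re.compile(r"^\s*[A-Za-z_-]+:([A-Za-z0-9_.-]+)(?:\s|$)")


def defined_configs(rc_text):
    """Return the set of config names that appear as '<command>:<name>'."""
    names = set()
    for line in rc_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _CONFIG_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_configs(rc_text, requested):
    """Return the requested config names that rc_text does not define."""
    known = defined_configs(rc_text)
    return [name for name in requested if name not in known]


# Hypothetical rc content for illustration only.
SAMPLE_RC = """
# hypothetical rc content, not copied from TensorFlow
build:opt --copt=-O2
build:noaws --define=example_flag=true
test:opt --test_output=errors
"""

print(missing_configs(SAMPLE_RC, ["opt", "noaws", "nomkl"]))
```

Running such a check against the real rc file before invoking `bazel build` would surface an undefined config name without waiting for the CI build to fail.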


@cmsbuild
Contributor

Pull request #9759 was updated.

@gartung changed the title from "See if --config=nomkl works" to "Tensorflow: try various build options to skip using oneDNN." on Mar 25, 2025
@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-118f2e/45198/summary.html
COMMIT: bcac610
CMSSW: CMSSW_15_1_X_2025-03-25-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9759/45198/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found an error when building:

Trying to install the rpm package external+tensorflow+2.12.0-23f781ae88ca393898c9cd0beec6a793 just built.
Checking package dependencies: external+tensorflow+2.12.0-23f781ae88ca393898c9cd0beec6a793
Done checking package dependencies: external+tensorflow+2.12.0-23f781ae88ca393898c9cd0beec6a793
Checking local path dependency for rpm package external+tensorflow+2.12.0-23f781ae88ca393898c9cd0beec6a793 just build.
RPM installation stderr tensorflow:
error: Failed dependencies:
	libiomp5.so()(64bit) is needed by external+tensorflow+2.12.0-23f781ae88ca393898c9cd0beec6a793-1-1.x86_64

Failed to install RPM for tensorflow
Build logs cleanup py3-tensorflow
Build successful py3-tensorflow.
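
The log above shows the package building but failing to install because `libiomp5.so` (the Intel OpenMP runtime) is required and no package provides it. A minimal sketch, assuming the stderr format shown in that log, of pulling the missing shared-library names out of an RPM "Failed dependencies" report. The helper names `failed_dependencies` and `soname` are hypothetical, and versioned requirements such as `foo >= 1.0 is needed by ...` are not handled.

```python
import re

# One failed-dependency line: "<capability> is needed by <package>".
_NEEDED = re.compile(r"^\s*(\S+)\s+is needed by\s+(\S+)\s*$")


def failed_dependencies(stderr_text):
    """Return (capability, package) pairs from a 'Failed dependencies' report."""
    pairs = []
    in_block = False
    for line in stderr_text.splitlines():
        if "Failed dependencies:" in line:
            in_block = True
            continue
        if in_block:
            match = _NEEDED.match(line)
            if match:
                pairs.append((match.group(1), match.group(2)))
            else:
                in_block = False  # the report ends at the first other line
    return pairs


def soname(capability):
    """Strip the '()(64bit)' style suffix from a shared-library capability."""
    return capability.split("(", 1)[0]


# Lines copied from the CI log above.
LOG = """RPM installation stderr tensorflow:
error: Failed dependencies:
\tlibiomp5.so()(64bit) is needed by external+tensorflow+2.12.0-23f781ae88ca393898c9cd0beec6a793-1-1.x86_64

Failed to install RPM for tensorflow
"""

deps = failed_dependencies(LOG)
print([soname(cap) for cap, _ in deps])
```

A list like this makes it easier to see which runtime libraries a given set of build options still pulls in.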


@cmsbuild
Contributor

Pull request #9759 was updated.

@cmsbuild
Contributor

Pull request #9759 was updated.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-118f2e/45226/summary.html
COMMIT: c197106
CMSSW: CMSSW_15_1_X_2025-03-26-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9759/45226/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 18 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3909207
  • DQMHistoTests: Total failures: 64
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3909123
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@gartung
Member Author

gartung commented Mar 27, 2025

enable profiling

@gartung force-pushed the gartung-tf-no-mkl branch from c197106 to fe1bef2 on March 27, 2025 16:28
@gartung
Member Author

gartung commented Mar 27, 2025

please test

@cmsbuild
Contributor

Pull request #9759 was updated.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-118f2e/45248/summary.html
COMMIT: fe1bef2
CMSSW: CMSSW_15_1_X_2025-03-27-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9759/45248/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 12 lines to the logs
  • Reco comparison results: 11 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3909207
  • DQMHistoTests: Total failures: 51
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3909136
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found
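
The two comparison summaries above (tests 45226 and 45248) can be cross-checked with a little arithmetic. A minimal sketch, assuming that failures, nulls, successes, skipped and missing objects together account for every compared histogram; that relationship is an inference from the numbers, not something the bot states. The helper names are hypothetical and the counts are copied from the summaries.

```python
import re


def parse_dqm_counts(summary):
    """Map each 'DQMHistoTests: Total <name>: <n>' line to {name: n}."""
    counts = {}
    pattern = r"DQMHistoTests: Total ([A-Za-z ]+): (\d+)"
    for match in re.finditer(pattern, summary):
        counts[match.group(1).strip().lower()] = int(match.group(2))
    return counts


def outcomes_add_up(counts):
    """Check the assumed identity: outcome counts sum to histograms compared."""
    outcome_keys = ["failures", "nulls", "successes", "skipped", "missing objects"]
    return sum(counts[k] for k in outcome_keys) == counts["histograms compared"]


RUN_45226 = """
DQMHistoTests: Total histograms compared: 3909207
DQMHistoTests: Total failures: 64
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3909123
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
"""

RUN_45248 = """
DQMHistoTests: Total histograms compared: 3909207
DQMHistoTests: Total failures: 51
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3909136
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
"""

first = parse_dqm_counts(RUN_45226)
second = parse_dqm_counts(RUN_45248)
print(outcomes_add_up(first), outcomes_add_up(second))
print("change in failures:", second["failures"] - first["failures"])
```

Under that assumption both runs are internally consistent, and the failure count drops by 13 between them while the number of compared histograms stays the same.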

@cmsbuild
Contributor

Pull request #9759 was updated.

@gartung
Member Author

gartung commented Mar 29, 2025

Successfully merging this pull request may close these issues.

Build Tensorflow with options to disable OneDNN completely
2 participants