Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gather CPU info from lscpu and merge data into cpu attribute namespace #1454

Merged
merged 27 commits into from
Apr 20, 2021

Conversation

ramereth
Copy link
Contributor

@ramereth ramereth commented Apr 21, 2020

Description

This contains two changes:

Gather CPU info from lscpu and create merge data into cpu attribute namespace

This adds a cpu/lscpu node data name space merges data containing information from the lscpu command (if it exists) into the cpu namespace. This augments and standardizes the information you can gather from /proc/cpuinfo which can be an issue on non-x86 architectures such as ppc64le and aarch64. Namely, when trying to determine whether or not a node is a guest or not. In addition, this properly fixes cpu counts on s390x nodes and uses lscpu data for determining cpu/total, cpu/real and cpu/cores if lscpu exists.

Fixes for detecting KVM guests on certain platforms and architectures

On some architectures (i.e. ppc64le), dmidecode does not exist nor does the proper information in /proc/cpuinfo exist to detect KVM guests. To workaround that, we can use lscpu data to easily determine this as a fallback.

In addition on CentOS 6, /sys/devices/virtual/misc/kvm does not exist so this works around that problem as well.

I don't believe this is a breaking change however please let me know if it is.

TODO:

  • Additional tests for other architectures and platforms for lscpu

Related Issue

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change that does not add functionality or fix an issue)

Checklist:

  • I have read the CONTRIBUTING document.
  • I have run the pre-merge tests locally and they pass.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commits have been signed-off for the Developer Certificate of Origin.

@ramereth ramereth requested review from a team as code owners April 21, 2020 23:18
@jaymzh jaymzh added the Status: Waiting on Contributor A pull request that has unresolved requested actions from the author. label Jan 25, 2021
@ramereth
Copy link
Contributor Author

@tas50 I'm going to start working on this again with the hopes of getting it included in Chef 17. I'm trying to remember where we left off on this. I seem to recall that you suggested instead of making a new cpu/lscpu namespace, to just replace it with the current cpu name space. Is that still the case? If so, do I need to replicate the current structure that duplicates all of the cpu attributes for every single core?

@ramereth ramereth changed the title Gather CPU info from lscpu and create cpu/lscpu attribute namespace Gather CPU info from lscpu and merge data into cpu attribute namespace Mar 22, 2021
@ramereth
Copy link
Contributor Author

@jaymzh as requested, you can find my output gists here: https://gist.github.com/ramereth/6b9fdc88ba36af7aa3cc9f14b363e34a

@lamont-granquist @tas50 @jaymzh please have another look at this

ramereth added 16 commits April 19, 2021 09:53
This adds a cpu/lscpu node data name space containing information from the lscpu
command (if it exists). This augments and standardizes the information you can
gather from /proc/cpuinfo which can be an issue on non-x86 architectures such as
ppc64le and aarch64. Namely, when trying to determine whether or not a node is a
guest or not.

In addition, this properly fixes cpu counts on s390x nodes and uses lscpu data
for determining cpu/total, cpu/real and cpu/cores if lscpu exists.

Signed-off-by: Lance Albertson <[email protected]>
On some architectures (i.e. ppc64le), dmidecode does not exist nor does the
proper information in /proc/cpuinfo exist to detect KVM guests. To workaround
that, we can use lscpu data to easily determine this as a fallback.

In addition on CentOS 6, /sys/devices/virtual/misc/kvm does not exist so this
works around that problem as well.

Signed-off-by: Lance Albertson <[email protected]>
This also adds machine_type and fixes the typo with mhz_dynamic. In addition,
add missing test for sockets.

Signed-off-by: Lance Albertson <[email protected]>
Signed-off-by: Lance Albertson <[email protected]>
This adds tests for ppc64le CPUs including tests for lscpu.

Signed-off-by: Lance Albertson <[email protected]>
It's better to clearly state what each of these are as they are described in
lscpu. So this makes the following changes:

cpu/lscpu/threads -> cpu/lscpu/threads_per_core
cpu/lscpu/cores -> cpu/lscpu/cores_per_socket

On s390x, it will properly show the information for "Socket(s) per book" as:
cpu/lscpu/sockets_per_book

cpu/lscpu/books -> cpu/lscpu/books_per_drawer

In addition, properly include drawers when calculating cores.

Signed-off-by: Lance Albertson <[email protected]>
On ppc64le, /proc/cpuinfo is formatted in a weird way which currently makes ohai
output virtually no information for cpu. This properly fixes that and includes
additional attributes for ppc64le.

Signed-off-by: Lance Albertson <[email protected]>
If lscpu exists and returns data, the cpu data will be filled using that data
(with a few exceptions). If lscpu does not return data, it will fall back to the
old method of only reading data from /proc/cpuinfo.

Some data on a few architectures is only available from /proc/cpuinfo, so add it
when needed when using lscpu.

In addition, this changes the unit tests to pull data from files instead of
adding it directly to the unit test file.

Signed-off-by: Lance Albertson <[email protected]>
This should simplify testing multiple cpu configurations without a lot of work.

Signed-off-by: Lance Albertson <[email protected]>
Signed-off-by: Lance Albertson <[email protected]>
Signed-off-by: Lance Albertson <[email protected]>
Also apply a few fixes for ppc64le guests.

Signed-off-by: Lance Albertson <[email protected]>
Newer versions of lscpu (via coreutils) include CPU vulnerabilities in its
output. If this output exists, populate this data.

TODO: If this output doesn't exist, we should gather the data directly from
/sys/devices/system/cpu/vulnerabilities.

Signed-off-by: Lance Albertson <[email protected]>
We moved all the data from the cpu/lscpu namespace into cpu so this updates it
to reflect that.

Signed-off-by: Lance Albertson <[email protected]>
Signed-off-by: Lance Albertson <[email protected]>
Signed-off-by: Lance Albertson <[email protected]>
Also add a useful comment.

Signed-off-by: Lance Albertson <[email protected]>
Signed-off-by: Lance Albertson <[email protected]>
These should already be strings unless otherwise noted.

Signed-off-by: Lance Albertson <[email protected]>
The cache size is typically balanced between the cpus. The easiest way to do
this is getting the data from /proc/cpuinfo.  We *could* probably get the
information from newer versions of lscpu but let's not worry about it for now.

Signed-off-by: Lance Albertson <[email protected]>
This matches the old behaviour.

Signed-off-by: Lance Albertson <[email protected]>
Based on review comments, this is a better way to handle this. Also update the
comment to better reflect what this method does.

Signed-off-by: Lance Albertson <[email protected]>
Create numa_node_cpus and vulnerability mashes once we know we have lscpu and
then just return the empty lscpu_info object.

Signed-off-by: Lance Albertson <[email protected]>
@jaymzh jaymzh merged commit 9fb22a8 into chef:master Apr 20, 2021
maxlazio pushed a commit to gitlabhq/omnibus-gitlab that referenced this pull request Jul 7, 2022
The update to CINC 17.10 in
https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/6048
updated Ohai to v17.9.0. As a result of
chef/ohai#1454, `lscpu` is used to determine
the number of real cores, total CPUs, and threads per core. However,
on some older platforms, threads per core is 0, resulting in the total
number of virtual CPUs being 0. This would cause NGINX to spin up with
0 cores.

To prevent startup failures, we now attempt to fallback to the number
of cores. Failing that, for NGINX we set the minimum worker processes
to 1.

We also convert to using `dig` in case the CPU data is not available
for some reason.

Relates to:

* https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6895
* https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6896

Changelog: fixed
stanhu added a commit to stanhu/ohai that referenced this pull request Jul 27, 2022
chef#1454 made `lscpu` the primary way
Ohai gathers CPU data, but the parser doesn't handle the output from a
Raspberry Pi 4 Cortex-A72. For example, an ARM cluster has the output
`Core(s) per cluster` instead of `Core(s) per socket`.

We should fall back to `/proc/cpuinfo` if we don't get reasonable data
back. It's not enough just to have `lscpu` output.

Closes chef#1760
stanhu added a commit to stanhu/ohai that referenced this pull request Jul 27, 2022
chef#1454 made `lscpu` the primary way
Ohai gathers CPU data, but the parser doesn't handle the output from a
Raspberry Pi 4 Cortex-A72. For example, an ARM cluster has the output
`Core(s) per cluster` instead of `Core(s) per socket`.

We should fall back to `/proc/cpuinfo` if we don't get reasonable data
back. It's not enough just to have `lscpu` output.

Closes chef#1760
stanhu added a commit to stanhu/ohai that referenced this pull request Jul 27, 2022
chef#1454 made `lscpu` the primary way
Ohai gathers CPU data, but the parser doesn't handle the output from a
Raspberry Pi 4 Cortex-A72. For example, an ARM cluster has the output
`Core(s) per cluster` instead of `Core(s) per socket`.

We should fall back to `/proc/cpuinfo` if we don't get reasonable data
back. It's not enough just to have `lscpu` output.

Closes chef#1760

Signed-off-by: Stan Hu <[email protected]>
stanhu added a commit to stanhu/ohai that referenced this pull request Aug 2, 2022
chef#1454 made `lscpu` the primary way
Ohai gathers CPU data, but the parser doesn't handle the output from a
Raspberry Pi 4 Cortex-A72. For example, an ARM cluster has the output
`Core(s) per cluster` instead of `Core(s) per socket`.

We should fall back to `/proc/cpuinfo` if we don't get reasonable data
back. It's not enough just to have `lscpu` output.

Closes chef#1760

Signed-off-by: Stan Hu <[email protected]>
stanhu added a commit to stanhu/ohai that referenced this pull request Aug 24, 2022
chef#1454 made `lscpu` the primary way
Ohai gathers CPU data, but the parser doesn't handle the output from a
Raspberry Pi 4 Cortex-A72. For example, an ARM cluster has the output
`Core(s) per cluster` instead of `Core(s) per socket`.

We should fall back to `/proc/cpuinfo` if we don't get reasonable data
back. It's not enough just to have `lscpu` output.

Closes chef#1760

Signed-off-by: Stan Hu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Waiting on Contributor A pull request that has unresolved requested actions from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants