
[BUG] Git pillar lock files not being removed. #65816

Closed · dwoz opened this issue Jan 8, 2024 · 6 comments


dwoz commented Jan 8, 2024

Salt version: 3006.4

Description

It's been observed that git pillar lock files are not being removed properly.

Dec 21 08:26:23 salt-master[33771]: [WARNING ] git_pillar_global_lock is enabled and update lockfile /var/cache/salt/master/git_pillar/nonprod/_/.git/update.lk is>
Dec 21 08:26:23 salt-master[33771]: [WARNING ] Update lock file is present for git_pillar remote 'nonprod git@***', >
Dec 21 08:35:34 salt-master[60671]: [WARNING ] git_pillar_global_lock is enabled and update lockfile /var/cache/salt/master/git_pillar/nonprod/_/.git/update.lk is>
Dec 21 08:35:34 salt-master[60671]: [WARNING ] Update lock file is present for git_pillar remote 'nonprod git@***', >
Dec 21 08:46:40 salt-master[60671]: [WARNING ] git_pillar_global_lock is enabled and update lockfile /var/cache/salt/master/git_pillar/nonprod/_/.git/update.lk is>
Dec 21 08:46:40 salt-master[60671]: [WARNING ] Update lock file is present for git_pillar remote 'nonprod git@***', >

However, running fileserver.update from the command line did not indicate there was a lock.

$ sudo salt-run fileserver.update
True

$ sudo salt-run cache.clear_git_lock git_pillar type=update
cleared:
    Removed update lock for git_pillar remote 'nonprod git@***'
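For diagnosing this, a quick check is whether the PID recorded in an update.lk file still corresponds to a running process. The sketch below is a diagnostic helper, not part of Salt, and it assumes the lock file stores the owning PID as plain text (which is what the master's warning message implies).

import os

def lock_is_stale(lock_path):
    """Return True when the lock file names a PID that is no longer running.

    Assumption: update.lk contains the owning PID as plain text.
    """
    try:
        with open(lock_path) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # no lock file, or contents we don't understand
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return True  # owning process is gone, so the lock is stale
    except PermissionError:
        return False  # process exists but belongs to another user
    return False

# Example (path taken from the warnings above):
# lock_is_stale("/var/cache/salt/master/git_pillar/nonprod/_/.git/update.lk")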

Setup
(Please provide relevant configs and/or SLS files (be sure to remove sensitive info). There is no general set-up of Salt.)

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior
(Include debug logs if possible and relevant)

Expected behavior
Pillars should update without issue.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
    Salt: 3006.4

Python Version:
    Python: 3.10.13 (main, Oct 4 2023, 21:54:22) [GCC 11.2.0]

Dependency Versions:
    cffi: 1.14.6
    cherrypy: 18.6.1
    dateutil: 2.8.1
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 3.1.2
    libgit2: 1.3.0
    looseversion: 1.0.2
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack: 1.0.2
    msgpack-pure: Not Installed
    mysql-python: Not Installed
    packaging: 22.0
    pycparser: 2.21
    pycrypto: Not Installed
    pycryptodome: 3.9.8
    pygit2: 1.7.0
    python-gnupg: 0.4.8
    PyYAML: 6.0.1
    PyZMQ: 23.2.0
    relenv: 0.13.12
    smmap: Not Installed
    timelib: 0.2.4
    Tornado: 4.5.3
    ZMQ: 4.3.4

Salt Extensions:
    SSEAPE: 8.14.1.5

System Versions:
    dist: rhel 9.3 Plow
    locale: utf-8
    machine: x86_64
    release: 5.14.0-362.8.1.el9_3.x86_64
    system: Linux
    version: Red Hat Enterprise Linux 9.3 Plow

Additional context
Add any other context about the problem here.

@dwoz dwoz added Bug broken, incorrect, or confusing behavior needs-triage and removed needs-triage labels Jan 8, 2024
@dwoz dwoz added this to the Sulfur v3006.6 milestone Jan 8, 2024
@dmurphy18 dmurphy18 self-assigned this Jan 9, 2024

dmurphy18 commented Jan 12, 2024

The salt-master's configuration file only contains log_level_logfile: debug; all GitFS settings are in RaaS, and this appears to be SSEAPI-related. I will talk with the RaaS team to investigate further how GitFS is being used, both with regular open Salt and from RaaS.

Update: chatted with the RaaS team and they don't use GitFS. Found the GitFS settings in a separate file, masterXX_gitfs.conf, which uses Azure.

Q: how is Azure accessed? Are they using the Azure Salt extension?

dmurphy18 commented:

I have a new master log file with the lock errors.
I am seeing a large number of Traceback errors in the file caused by Jinja2 rendering errors. This may be because the pillar data is not accessible from GitFS, or due to some other issue. I am investigating whether error paths in the GitFS access fail to release the lock files, which would explain why they are still present (there may be an error path that was overlooked in the implementation).
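If an error path is indeed skipping the unlock step, the usual defence is to put the lock removal in a finally block so that rendering or fetch failures cannot leave update.lk behind. A minimal sketch of that pattern, with a hypothetical do_update callable; this is illustrative only, not Salt's actual implementation:

import os

def update_with_lock(lock_path, do_update):
    # Write our PID into the lock file, mirroring what the gitfs warnings report.
    with open(lock_path, "w") as fh:
        fh.write(str(os.getpid()))
    try:
        return do_update()  # may raise, e.g. on Jinja rendering or fetch errors
    finally:
        # Runs on success and on exceptions, so the lock file is always removed.
        try:
            os.remove(lock_path)
        except OSError:
            pass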

dmurphy18 commented:

I see a probable cause. For example: process 50973 obtains a lock but never releases it, then gets killed by SIGTERM; another process later attempts to obtain the lock and finds that process 50973 still holds it, even though it is dead. Note the long gap between obtaining the lock and the SIGTERM; I would have expected the lock to be released soon after it was obtained. There are probably two bugs here: locks should not be held that long, and on SIGTERM the process should check for and release resources it holds, such as lock files.

Obtain lock:

2024-01-11 09:20:26,218 [salt.utils.gitfs :999 ][DEBUG   ][50973] Set update lock for gitfs remote '[email protected]:v3/Vizientinc/Hosting/SaltStack'

SIGTERM approx. 16 hours later:

2024-01-12 01:15:41,521 [salt.utils.process:1070][DEBUG   ][50973] FileserverUpdate received a SIGTERM. Exiting

Finding lock held by dead process:

2024-01-12 01:16:25,647 [salt.utils.gitfs :961 ][WARNING ][3200] gitfs_global_lock is enabled and update lockfile /var/cache/salt/master/gitfs/work/NlJQs6Pss_07AugikCrmqfmqEFrfPbCDBqGLBiCd3oU=/_/update.lk is present for gitfs remote '[email protected]:v3/Vizientinc/Hosting/SaltStack'. Process 50973 obtained the lock but this process is not running. The update may have been interrupted. If using multi-master with shared gitfs cache, the lock may have been obtained by another master.
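One way to address the second bug would be for the process to clean up any lock files it still holds when it receives SIGTERM. A rough sketch of that idea, assuming the process tracks the lock paths it has written; this is not the fix that was ultimately merged, just an illustration:

import os
import signal
import sys

HELD_LOCK_FILES = set()  # paths of lock files this process has written

def _release_locks_and_exit(signum, frame):
    # Remove any lock files we still hold so a terminated FileserverUpdate
    # process does not leave stale update.lk files behind.
    for path in list(HELD_LOCK_FILES):
        try:
            os.remove(path)
        except OSError:
            pass
    sys.exit(0)

signal.signal(signal.SIGTERM, _release_locks_and_exit)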

dmurphy18 commented:

The current testing for gitfs relies only on _master_lock, exercising

_master_lock = multiprocessing.Lock()

directly via acquire and release. This doesn't really test the locking in the GitProvider class with the PyGit2 and GitPython lock and clear_lock mechanism; it goes straight to the underlying acquire and release. The tests should be exercising those methods (see the sketch below).
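A test along those lines might call lock() and clear_lock() on a provider instance and assert on the lock file itself, rather than exercising a bare multiprocessing.Lock. Sketch only: the git_provider and update_lock_path fixtures are hypothetical stand-ins for however the suite would construct a GitProvider and locate its update lock file.

import os

def test_lock_and_clear_lock(git_provider, update_lock_path):
    # lock() should create the update lock file for this remote.
    git_provider.lock()
    assert os.path.exists(update_lock_path)

    # clear_lock() should remove it again.
    git_provider.clear_lock()
    assert not os.path.exists(update_lock_path)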

dmurphy18 commented:

I have changes and tests in PR #65937 for handling GitFS lock files and for cleaning up the leftovers when the process is terminated.

@dwoz dwoz modified the milestones: Sulfur v3006.7, Sulfur v3006.9 May 1, 2024
dmurphy18 commented:

Closing since PR #65937 is merged.
