NixOps locks itself out when deploying to a node for the first time with --force-reboot and the SSH key for provisioning isn't in the nixops configuration #904

Open
grahamc opened this issue Mar 8, 2018 · 5 comments

Comments

@grahamc
Member

grahamc commented Mar 8, 2018

So this is a bit weird, but...

  1. Have an SSH key used only for provisioning, and make it the only key the server accepts (not sure this is significant).
  2. Run nixops deploy for the first time against that server, using that key and --force-reboot.
  3. After the reboot, the target server no longer accepts the provisioning key, and nixops doesn't even try the SSH key it generated and installed.
  4. nixops just prompts for a password, which doesn't exist.

Workaround: don't use --force-reboot on the first deploy, or add the provisioning SSH key to your ssh-agent and do one regular deploy.
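The lockout in steps 1 to 4 comes down to a set intersection: public-key login succeeds only if the client offers at least one key the server accepts; otherwise the client falls back to prompting for a password, as in step 4. Below is a hypothetical Python model of that, assuming (as reported in this thread) that after the forced reboot root accepts only the nixops-generated key while nixops still offers only the provisioning key. The key names and the login_method helper are placeholders, not nixops code.

```python
# Hypothetical model of the lockout in steps 1-4; key names are placeholders.
PROVISIONING_KEY = "provisioning-key"
NIXOPS_KEY = "nixops-client-key"


def login_method(offered: set, accepted: set) -> str:
    """Return how an SSH login proceeds in this simplified model."""
    if offered & accepted:
        # At least one offered key is authorized on the server.
        return "publickey"
    # No offered key matched, so the client falls back to a password prompt.
    return "password prompt"


# Before the first deploy: the provisioning key is the only accepted key.
before = login_method(offered={PROVISIONING_KEY}, accepted={PROVISIONING_KEY})

# After a first deploy with --force-reboot: the server accepts only the key
# nixops generated, but nixops still offers only the provisioning key.
after_reboot = login_method(offered={PROVISIONING_KEY}, accepted={NIXOPS_KEY})

print(before, "/", after_reboot)  # prints: publickey / password prompt
```

In this model, login would succeed again as soon as nixops offered NIXOPS_KEY, which is what the comments further down try to explain.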

@kimburgess
Member

I'm seeing the same behavior here.

Using a configuration with a secondary user that I can SSH in as to inspect the post-deploy state, I can see that the public key for the root user is definitely inserted:

[root@cotag-5-01:/]# cat /etc/ssh/authorized_keys.d/root
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILNnvPApjYRx/9glo5FEeJ5r2LviY72SvSh0MnqiB+7i NixOps client key for cotag-5-01

So it looks like this may be something on the nixops connection side.

As a side note: in my environment only the second workaround above worked. Even with a live switch (no --force-reboot), the public key generated by nixops is inserted into the target machine, but its private counterpart does not appear to be used when connecting.

@selaux

selaux commented May 27, 2019

I ran into the same issue and narrowed it down to this condition: https://github.com/NixOS/nixops/blob/master/nixops/backends/none.py#L82. self.cur_toplevel is None in this case because the configuration has not been applied yet. I think the same happens if the first deployment fails.

As far as I can see, the check for self.cur_toplevel is unnecessary here, since self._ssh_public_key_deployed already covers the case where the SSH key has not yet been deployed. Should I open a PR for this?
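A simplified, hypothetical sketch of the two conditions being compared, assuming the current check requires both self.cur_toplevel and self._ssh_public_key_deployed to be set before the nixops key is used. The real check in nixops/backends/none.py (linked above) may combine other terms; only those two attribute names come from this thread, and the MachineState class and both functions are stand-ins, not nixops code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineState:
    # Stand-ins for the attributes discussed in this thread.
    cur_toplevel: Optional[str]  # None until a configuration has been applied
    _ssh_public_key_deployed: bool


def uses_nixops_key_current(m: MachineState) -> bool:
    # Assumed current behaviour: both must be set.
    return bool(m.cur_toplevel) and m._ssh_public_key_deployed


def uses_nixops_key_proposed(m: MachineState) -> bool:
    # Proposed: rely on the deployed-key flag alone.
    return m._ssh_public_key_deployed


# First deploy with --force-reboot: the key is deployed
# (sshPublicKeyDeployed = 1), but no configuration is recorded as applied yet.
first_deploy = MachineState(cur_toplevel=None, _ssh_public_key_deployed=True)

print(uses_nixops_key_current(first_deploy), uses_nixops_key_proposed(first_deploy))
# prints: False True
```

Under this assumption, dropping the cur_toplevel term still keeps the nixops key unused until it has actually been deployed, which is the argument made above.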

@asymmetric
Contributor

@selaux how did you debug this?

@selaux

selaux commented May 27, 2019

I noticed that the SSH key was not written to the temporary directory during deployment, so it seemed nixops was not using it. I looked in the state file, where sshPublicKeyDeployed was set to 1, which seemed right to me. Then I basically ran a local copy of nixops and added a print statement...

Not sure how this change will affect other things, but since sshPublicKeyDeployed is set on the first deployment it should be fine.

@asymmetric
Contributor

I would definitely open a PR so it can be tested more easily :)


4 participants