JWS anti-replay nonce error #1

Kaaal · 2018-03-09T09:55:58Z

Hello,
I regularly get an error when generating certificates:

request challenge for XXX
error while requesting challenge for XXX
  {
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce a_und1PinIcRvGo9HQ6HaOzlrmIgum_AfiwnaLllAD8",
  "status": 400
} ({
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce a_und1PinIcRvGo9HQ6HaOzlrmIgum_AfiwnaLllAD8",
  "status": 400
})

The error usually disappears when relaunching the script.

Thanks for your help

bruncsak · 2018-03-09T15:08:38Z

Hello,

Are you behind a web proxy? The RFC says that the server should reply with the "Cache-Control: no-store" HTTP header field (as Let's Encrypt's production and staging servers do), but some proxies may be broken.
Is XXX always the first domain in the list of domains for which a challenge is requested?

You may try to comment out the line
sed -e '/Replay-Nonce: / ! d; s/^Replay-Nonce: //' "$RESP_HEADER" | tr -d '\r\n' > "$LAST_NONCE"
in the send_get_req() function. Please let me know how it behaves then.
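
For reference, the quoted line pulls the value of the Replay-Nonce header out of the saved response headers and strips the line ending. The sketch below reproduces that pipeline on a temporary file; `extract_nonce` is a hypothetical helper name (not a function of the script), and the sample headers are copied from a response posted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper wrapping the sed/tr pipeline quoted above:
# delete every line that is not a Replay-Nonce header, strip the header
# name, then drop the CR/LF so only the bare nonce remains.
extract_nonce() {
    sed -e '/Replay-Nonce: / ! d; s/^Replay-Nonce: //' "$1" | tr -d '\r\n'
}

# Sample response headers with CRLF line endings, as curl saves them.
RESP_HEADER=$(mktemp)
printf 'HTTP/1.1 400 Bad Request\r\nServer: nginx\r\nReplay-Nonce: qXpZgOyfCmPa6s5HMKgcyWqQqxUtzBTfLVS1sIVTz2k\r\nConnection: close\r\n\r\n' > "$RESP_HEADER"

NONCE=$(extract_nonce "$RESP_HEADER")
echo "$NONCE"
rm -f "$RESP_HEADER"
```

Commenting the line out in send_get_req() means nonces returned by GET requests are no longer stored for later use.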

Kaaal · 2018-03-09T15:25:40Z

No, I'm not behind a web proxy.
Every day I have several certificates to request. It's not always the first one that fails; sometimes many have the error, other times none.
I will try to comment out this line, thank you!

bruncsak · 2018-03-09T15:32:41Z

Is there a long delay between the different actions? There may be a timing issue on the server regarding how long a nonce stays valid.

If it is not always the first domain that gets the error, then please also comment out the same line in the send_req() function.

Kaaal · 2018-03-09T16:02:51Z

At most 1 or 2 seconds pass between two requests.
I have several hundred certificates. My daily cron job checks the expiration date (with openssl) of each certificate and makes a request if necessary. If all certificates are OK, the cron job takes 2 seconds to check them all.
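
A check of this kind could look roughly like the sketch below. It is not the actual cron script: `needs_renewal` is a hypothetical function name, and the 30-day threshold is an assumed default. It relies on `openssl x509 -checkend N`, which exits 0 when the certificate is still valid N seconds from now.

```shell
#!/bin/sh
# needs_renewal CERT_FILE [DAYS]
# Hypothetical helper: succeeds (exit 0) when the certificate expires
# within DAYS days (assumed default: 30) or cannot be read at all.
needs_renewal() {
    cert=$1
    days=${2:-30}
    # -checkend exits 0 if the certificate will NOT expire in the given
    # number of seconds, so a zero exit status means "no renewal needed".
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

A cron job would then loop over the certificate files and call the ACME client only for those where `needs_renewal "$file"` succeeds.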

bruncsak · 2018-03-09T20:21:18Z

Is only one job running at a time, renewing the certificates sequentially, or do the jobs run in parallel? Do all jobs use the same account key?

Kaaal · 2018-03-10T05:21:18Z

Only one job at a time, and all use the same account key. Domain names are not all configured on the same IP, but all IPs belong to the same server.

Kaaal · 2018-03-10T09:42:33Z

Today, with the two lines commented out, I got an error again. There were 2 certificates to renew, and both failed:

Generating RSA private key, 4096 bit long modulus
...............................................................++
....................................++
e is 65537 (0x10001)
generate certificate request
request challenge for XXX
error while requesting challenge for XXX
  {
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce yQ9e8_az0NQ_Fr8mj2MSWYCTI0z-LjxuVdZJ2t1fo3I",
  "status": 400
} ({
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce yQ9e8_az0NQ_Fr8mj2MSWYCTI0z-LjxuVdZJ2t1fo3I",
  "status": 400
})
Generating RSA private key, 4096 bit long modulus
..................................................................................++
..............................................................++
e is 65537 (0x10001)
generate certificate request
request challenge for YYY
error while requesting challenge for YYY
  {
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce K7VVkwRTeI75xBbANx5etZGUDX2WYClMKbItOY_4sA8",
  "status": 400
} ({
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce K7VVkwRTeI75xBbANx5etZGUDX2WYClMKbItOY_4sA8",
  "status": 400
})

I ran the script again; the first one failed again, the second one worked:

Generating RSA private key, 4096 bit long modulus
....................++
....................................................................................................++
e is 65537 (0x10001)
generate certificate request
request challenge for XXX
error while requesting challenge for XXX
  {
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce NRxuagOCP4a7uN5lfkX-IuHN8aV9bRSvx5Jx56DYq2s",
  "status": 400
} ({
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce NRxuagOCP4a7uN5lfkX-IuHN8aV9bRSvx5Jx56DYq2s",
  "status": 400
})
Generating RSA private key, 4096 bit long modulus
........................................++
.................................................................................++
e is 65537 (0x10001)
generate certificate request
request challenge for YYY
push response for YYY
request verification of YYY
check verification of YYY
YYY is valid
remove response for YYY
request certificate

The third time, the error was not exactly the same:

Generating RSA private key, 4096 bit long modulus
.......................++
.................++
e is 65537 (0x10001)
generate certificate request
request challenge for XXX
push response for XXX
request verification of XXX
check verification of XXX
XXX is valid
remove response for XXX
request certificate
unhandled response while requesting certificate

HTTP/1.1 100 Continue
Expires: Sat, 10 Mar 2018 09:23:16 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 400 Bad Request
Server: nginx
Content-Type: application/problem+json
Content-Length: 149
Boulder-Requester: 27228030
Replay-Nonce: qXpZgOyfCmPa6s5HMKgcyWqQqxUtzBTfLVS1sIVTz2k
Expires: Sat, 10 Mar 2018 09:23:16 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Sat, 10 Mar 2018 09:23:16 GMT
Connection: close

{
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce iTbp24xCF2YD8NxJVERnZovpTP1Mam2mvXMn9CdBpV8",
  "status": 400
}

The fourth time:

Generating RSA private key, 4096 bit long modulus
.....++
.................................................................................++
e is 65537 (0x10001)
generate certificate request
request challenge for XXX
push response for XXX
request verification of XXX
unhandled response while requesting verification of challenge of XXX

HTTP/1.1 100 Continue
Expires: Sat, 10 Mar 2018 09:24:40 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 400 Bad Request
Server: nginx
Content-Type: application/problem+json
Content-Length: 149
Boulder-Requester: 27228030
Replay-Nonce: PqJPXT7hUzAQAsqaPYcGpfRUzvYP5CBOo1DZZdmlSug
Expires: Sat, 10 Mar 2018 09:24:40 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Sat, 10 Mar 2018 09:24:40 GMT
Connection: close

{
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce z92hn5lKVcWqPq6Zpi3Zkk1iAOoKu4YDlB9wGGXOGMo",
  "status": 400
}

The fifth time, it worked!

bruncsak · 2018-03-10T10:40:26Z

Was that already with the modification I requested (two lines commented out in the script)? Which OS and version are you on, and is your curl the standard one shipped with the OS?

Kaaal · 2018-03-10T10:43:20Z

Yes, the two lines are commented out in the script. The script is running on Ubuntu 16.04 with standard curl.

bruncsak · 2018-03-12T13:54:39Z

I created a new debug version for you in the branch badNonce-debug. Kindly use this one and in case of failure please send me the output.

Kaaal · 2018-03-18T20:30:38Z

I set up the debug version, but for a week there were no errors. I'll let you know as soon as it happens again.

Kaaal · 2018-03-21T08:35:19Z

There was an error this morning, after a successful request:

Generating RSA private key, 4096 bit long modulus
................................++
....++
e is 65537 (0x10001)
-e 1521619207.108593240 Replay-Nonce: WF5izbGMEwrK-0lJ9JuPTaXliQ_pPrpaDPoh6Xm8stE^M$
generate certificate request
request challenge for XXX
-e 1521619214.011580466 Replay-Nonce: h6CMwHNkX1VIwi1wjLduaK98bqjsvmxMOooQENeL4HM^M$
push response for XXX
request verification of XXX
-e 1521619222.501540693 Replay-Nonce: 0Mmfaqd4ZaGVaPYnTkrvrn4wNc_NAFS_koLi_EtFQ70^M$
check verification of XXX
-e 1521619223.968245048 Replay-Nonce: mY5KJax5NfX_xwt4V48I6fS2RkszpydltoE05kiPtJs^M$
XXX is pending
check verification of XXX
-e 1521619225.458722309 Replay-Nonce: FT8utBITkRPV90lN73jk5hf9q4UMejZoFmecPVS--EU^M$
XXX is valid
remove response for XXX
request certificate
-e 1521619239.555913964 Replay-Nonce: USiMfNapPdK1PcgORwlpy6nzPPLm0hxO9RtmrRW95O0^M$
-e 1521619240.080922003 Replay-Nonce: LpU7xuQo9xpdC34dmDJ-DERj8QtYjvAoyScingGmmbA^M$

Generating RSA private key, 4096 bit long modulus
............................................................................++
.......................................................................................................................................................................................................................++
e is 65537 (0x10001)
-e 1521619243.788096334 Replay-Nonce: Y-vLQeG893Xv9qrYUHFMpkB7Mqn0EuBFcMCSViJZ4IA^M$
generate certificate request
request challenge for YYY
-e 1521619245.009965970 Replay-Nonce: nCd9wlYA22YTHLqWBvssYjSQKMaEoOLSlHkJuosnHVs^M$
error while requesting challenge for YYY
  {
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce Y-vLQeG893Xv9qrYUHFMpkB7Mqn0EuBFcMCSViJZ4IA",
  "status": 400
} ({
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce Y-vLQeG893Xv9qrYUHFMpkB7Mqn0EuBFcMCSViJZ4IA",
  "status": 400
})
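
In the log above, the rejected nonce (Y-vLQ...) is the one received just before the failing request, and the error response itself carries a fresh Replay-Nonce (nCd9...). RFC 8555, the final ACME specification published after this thread, says a client should retry the request with the nonce from the badNonce response. The following is a minimal sketch of such a retry loop, not the script's actual code: `is_bad_nonce`, `post_with_nonce_retry`, and the placeholder `do_signed_post` are hypothetical names, and the limit of 3 attempts is an assumption. A retry works around the symptom only; it does not explain why the nonce was rejected.

```shell
#!/bin/sh
# Sketch only. do_signed_post is a hypothetical placeholder that the caller
# must define: it signs and sends the request using the nonce given as $1,
# stores the response body in BODY and the new Replay-Nonce in NEXT_NONCE.

# True when the problem document carries the badNonce error type.
is_bad_nonce() {
    printf '%s' "$1" | grep -q '"type": *"urn:acme:error:badNonce"'
}

# post_with_nonce_retry NONCE [MAX_TRIES]
# Sends the request, and on badNonce retries with the fresh nonce from the
# error response, up to MAX_TRIES attempts in total (assumed default: 3).
post_with_nonce_retry() {
    nonce=$1
    max_tries=${2:-3}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        do_signed_post "$nonce"
        if ! is_bad_nonce "$BODY"; then
            return 0        # any other outcome is left to the caller
        fi
        nonce=$NEXT_NONCE   # the error response supplies a fresh nonce
        try=$((try + 1))
    done
    return 1                # still badNonce after max_tries attempts
}
```

Keeping the attempt limit small avoids hammering the server when the nonce keeps being rejected.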

bruncsak · 2018-03-22T15:18:12Z

I opened a topic on the Let's Encrypt community forum concerning your problem:

https://community.letsencrypt.org/t/regular-badnonce-errors/57332

May I ask you to answer the Boulder engineer's question:
"Is there any chance the user in question is accessing the API from multiple egress public IP addresses?"

The Boulder engineer also asked for more debugging output in the code, which I am ready to add, but that would expose the domain names in question publicly. Normally that is not a problem, since the domain names for issued certificates are all made public in Certificate Transparency logs (e.g. https://crt.sh/?q=example.com).

Kaaal · 2018-03-23T09:55:43Z

I use only one IP to access the API, but domains are set up on various IPs.

Domain names belong to our customers, it would be easier for me if I could send debug logs only to you and the boulder engineer. Is that possible?

cpu · 2018-03-23T12:29:29Z

> Domain names belong to our customers, it would be easier for me if I could send debug logs only to you and the boulder engineer. Is that possible?

@Kaaal 👋 I'm the Boulder engineer in question :-) You can email the unsanitized logs to cpu <at> letsencrypt.org. Thanks!

bruncsak · 2018-03-23T13:38:56Z

@Kaaal, I put more output into the debug version, please update your instance.

https://github.com/bruncsak/letsencrypt.sh/tree/badNonce-debug

If you get a badNonce failure again, there is no need to post the output here; please send it to the Boulder engineer's e-mail address.

Kaaal · 2018-03-23T14:12:32Z

Thank you @bruncsak, I updated the script. And thank you @cpu, I will send you the logs when the failure happens again.

bruncsak · 2018-03-23T14:26:20Z

@Kaaal Something else just came to mind. On the server where you run the script to get the certificate, do you have a dual IP stack running, both IPv4 and IPv6? There may be the possibility that one connection is using IPv4, the other one IPv6.

cpu · 2018-03-23T15:02:09Z

@Kaaal I know earlier you said you weren't behind a web proxy. Can you confirm that there's no chance your ISP or IT department might have your server behind a proxy you weren't aware of?

I ask because one of my coworkers points out that in your original log there is a strange HTTP/1.1 100 Continue response immediately before the bad request response caused by the nonce error.

As far as I'm aware there isn't any part of the Let's Encrypt API stack that would return an "HTTP/1.1 100 Continue" response, which makes me believe there might be a proxy meddling with requests. That would also explain the badNonce errors neatly.

HTTP/1.1 100 Continue
Expires: Sat, 10 Mar 2018 09:23:16 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 400 Bad Request
Server: nginx
Content-Type: application/problem+json
Content-Length: 149
Boulder-Requester: 27228030
Replay-Nonce: qXpZgOyfCmPa6s5HMKgcyWqQqxUtzBTfLVS1sIVTz2k
Expires: Sat, 10 Mar 2018 09:23:16 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Sat, 10 Mar 2018 09:23:16 GMT
Connection: close

bruncsak · 2018-03-23T15:30:15Z

@cpu,
I had a '100 Continue' failure as well. I am behind two Squid proxies, but I am not sure that the proxy produced that error:

request verification of XXX
unhandled response while requesting verification of challenge of XXX

HTTP/1.1 200 Connection established

HTTP/1.1 100 Continue
Expires: Fri, 01 Dec 2017 11:30:45 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 500 Internal Server Error
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 175
Expires: Fri, 01 Dec 2017 11:30:45 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Fri, 01 Dec 2017 11:30:45 GMT
Connection: close

Boulder has Akamai reverse proxies in front of it, which may be the other possible cause.

cpu · 2018-03-23T15:33:56Z

@bruncsak Interesting!

HTTP/1.1 500 Internal Server Error
Server: AkamaiGHost

What was the HTTP request you sent that received that response? What URL was it being sent to?

Can you share the results of running the following a few times:

curl -I <that url> -H "Pragma: akamai-x-cache-on, akamai-x-get-cache-key, akamai-x-get-true-cache-key, akamai-x-get-request-id"

bruncsak · 2018-03-23T15:48:54Z

@cpu Unfortunately I do not have the URL. I do not always run the client in full debug mode. That error happened near 4 months ago. The URL must have been the value of the "uri" JSON field of the returned challenge. Would you be able to provide a generic URL to test?

Anyhow, the local Squid proxy could not be responsible. It does not terminate the HTTPS session; it uses the CONNECT method to pass the connection through without interfering.

Oops, I forgot something! This was the error text returned by the Akamai server:

<TITLE>Error</TITLE> An error occurred while processing your request.

Reference #179.4df90a17.1512127845.3e011e

cpu · 2018-03-23T15:53:52Z

> That error happened near 4 months ago.

OK. Let's see what @Kaaal has to say. Errors from 4 months ago are too stale for me to do much with, given our log retention.

Kaaal · 2018-03-23T15:58:44Z

I didn't think of that; indeed, I have one IPv4 and one IPv6 address! Maybe we should log the IPs used in the debug output?
I just called the datacenter that hosts our servers; they confirmed that there is no "hidden" proxy.

cpu · 2018-03-23T16:05:46Z

@Kaaal If you share the IPv4 and IPv6 addresses your server uses I can check the logs and see if we're seeing a split between the two or if it's uniformly the IPv6 address.

Kaaal · 2018-03-23T16:07:35Z

@cpu No problem, I'll send them to you.

cpu · 2018-03-23T16:49:46Z

> There may be the possibility that one connection is using IPv4, the other one IPv6.

@Kaaal @bruncsak Using the IP addresses @Kaaal sent I was able to confirm this theory. I think we can conclusively say this is the root cause!

Over the past 7 days I saw 180 requests from @Kaaal's IPv6 address and 5 from @Kaaal's IPv4 address. 100% of the requests made by the IPv6 address went to one data centre. 100% of the requests made by the IPv4 address went to the other data centre. This would definitely cause a badNonce error if the ACME client used a nonce from an IPv4 request/response with an IPv6 request.

@Kaaal I admit I'm not sure what advice to give you on pinning your egress traffic to one address or the other but I believe doing so will resolve your problem.

Edit: A colleague smartly points out this might be caused by "Happy Eyeballs" behaviour. The IPv4 requests may have been done as retries when an initial IPv6 connection failed for some reason (flaky upstream routes, etc).

On the Let's Encrypt side I think presently our options for load-balancing are fairly restrictive and we likely won't be able to pin an IPv4 and an IPv6 address to the same data centre reliably.

Kaaal · 2018-03-25T07:37:51Z

Thank you both so much for solving my problem, @bruncsak and @cpu.
I can pin traffic to IPv4 or IPv6 (really easy with curl). But other people may have the same problem, whether with @bruncsak's client or another one. If you can't solve the problem on the Let's Encrypt side, I think you should at least document it somewhere.
Thank you again!

bruncsak · 2018-04-06T10:00:24Z

@Kaaal, I updated the code on the badNonce-debug branch. It is no longer really debug code, but rather quality-assurance-level code. The new command-line options "-4" and "-6" allow restricting the egress IP address family.

Kaaal · 2018-04-09T09:16:19Z

On my side, I added a "CURL_OPTS" variable and was waiting to see whether the problem was gone, which is the case. I was planning to open a pull request to send you this change, but it seems it is no longer needed.
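
A workaround along these lines might look like the sketch below. It is an illustration under assumptions, not the code from either side: `acme_curl` is a hypothetical wrapper name, and defaulting CURL_OPTS to "-4" is an arbitrary choice. curl's "-4" and "-6" options force name resolution to IPv4 or IPv6 addresses only, so every request leaves through the same address family.

```shell
#!/bin/sh
# Assumption: CURL_OPTS is a user-settable variable, as described above.
# Default to IPv4 only; set CURL_OPTS=-6 to force IPv6 instead.
CURL_OPTS=${CURL_OPTS:--4}

# Hypothetical wrapper: every request the client makes goes through here,
# so all connections use the same address family.
acme_curl() {
    # CURL_OPTS is left unquoted on purpose so that several options
    # (e.g. "-4 --retry 2") are split into separate words.
    # shellcheck disable=SC2086
    curl $CURL_OPTS --silent --show-error "$@"
}
```

The client would then call `acme_curl` wherever it previously called `curl` directly, for example `acme_curl -I "$URL"`.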

bruncsak closed this as completed May 24, 2019

felixfontein mentioned this issue Jan 4, 2025

Stop hammering LetsEncrypt 100 times for badNonce diafygi/acme-tiny#287

Open
