Build failure: postgresql on darwin #371242
Comments
Of course the simple thing to do would be to disable the tests for x86_64-darwin. But maybe somebody has a better idea why it happens in that specific case?
My guess is Rosetta 2 weirdness (Hydra emulates
Correct me if I'm wrong, but isn't x86_64-darwin dying out anyway? Does upstream run their tests on that platform, so that we might be able to steal potential fixes?
The last Mac models with Intel CPUs were released in 2020 (running on 10th-gen Intel CPUs), about half a year before the release of the M1. Edit: of course, it's possible Rosetta will die before that, in which case nixpkgs might have to sunset x86_64-darwin support sooner, as the builders rely on Rosetta.
We’re realigning the support window to match Apple’s. The plan after #352129 is to go to 11.3 for 25.05 and 14.4 for 25.11. If macOS 17 is the last version to support Intel hardware, then that would make 28.05 the last release to support x86_64-darwin on Intel hardware. However, if Rosetta 2 is dropped, it’s likely that x86_64-darwin won’t be cached. Personally, given past transitions, I expect Rosetta 2 to remain supported for one release after Intel hardware support is dropped. That’s going to suck for applications stuck on Intel, but it’s not like that stopped Apple from killing 32-bit support. (RIP my Steam library. Again.)
And it still just works for me; sad news that nixpkgs will drop support some day.
FWIW I daily drive Thankfully NixOS will most likely continue to run pretty well on Intel Macs even after that!
Hmm, installing NixOS on a Mac, that's a good idea :)
I tried and failed. The tests, too.
I'll probably just disable the tests for x86_64-darwin again. It's not as if we had them enabled for long anyway; until recently, the tests for darwin had not been enabled at all for a long time. It seems we only tested aarch64-darwin in #358248.
Yes, I think that’s fine in the absence of someone who feels like digging into the details. I’m guessing they might pass on native machines, but Hydra doesn’t have any native Sorry for the lengthy digression about the state of the platform 😅
A very similar error has now popped up in #371463 (comment) for a postgres extension built via buildPgrxExtension, which runs postgresql / initdb as part of the build process. This time it's on aarch64-darwin, so it doesn't seem to be related to Rosetta 2. The community builder also doesn't seem to be overloaded. What else could be the reason for this kind of failure, and is there a workaround or fix?
Other packages fail with the same error: https://hydra.nixos.org/build/281936070 It built fine as of https://hydra.nixos.org/build/280560529. This log has:
That shows that initdb / postgres was running fine back then, and it was already running with 16.5. Thus, this failure is related neither to enabling the tests on darwin nor to updating postgresql. What else changed?
@emilazy, is this something that changed recently (since November 25 last year)? If not, then something Darwin-related changed in nixpkgs that is causing these failures.
IIRC Hydra had one single native The builders have had some other changes though (macOS upgrades, using the Nix daemons to run builds, etc.), and Darwin has had a number of changes recently (the SDK rework – though that landed prior to November – but also the LLVM upgrade and so on). I’m not sure what it could be in this case. I’m also not sure who to ping for such an arcane failure; we have people who know Darwin and people who know Postgres, but I’m not sure if we have people who know both :) I don’t think it would be the end of the world to just mark it as broken, but I do imagine Postgres on Darwin gets some use for development environments. Have you tried reproducing the failure on the community builder?
Well, if it were only x86_64-darwin, yeah. But since it also appeared on aarch64-darwin at least once (see #371242 (comment)), I don't think that would be enough.
That will be my next step, but I have too many builds running right now to even think about it :D
So bisecting leads me to fc9c333, of course a merge of staging-next. Before that commit But when I try to build the commit where it was still passing on the community builder with a trivial change (changed order of build inputs), I get the same build failure again. So the problem does indeed seem to be introduced by some external factor, not by anything internal to nixpkgs.
Where can I find out what changes were made to the Hydra builders between 2024-11-25 (the last passing build) and 2024-12-23 (the merge of staging above)? I assume the community builders went through the same changes, but not necessarily at the same time. The community builders are currently on macOS 15.2 (build 24C101). And of course, this version was released on December 11th: https://support.apple.com/en-us/100100. So that fits right in. The changelog for this version also lists at least 4 items related to "Kernel", 3 of them mentioning "memory"... I found a comment about something very similar here: https://wsjtx.groups.io/g/main/message/52474. The timeline doesn't match up 100%, because that's from September, but:
This seems to be a problem with the configuration of both the Hydra and the community builders. I don't have a Mac myself, so I can't investigate / reproduce / fix it that way. If somebody could step up to investigate, that would be great.
I'm seeing postgresql_17 fail on x86_64-darwin (native, not using Rosetta) due to a similar error:
From postmaster.log:
I'm using the default values for the shared memory sysctls:
The tests still fail after increasing
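The sysctl values themselves are not shown here. To reason about them, here is a minimal Python sketch that does two things. First, it parses `sysctl kern.sysv`-style output, assuming the `name: value` line format. Second, it checks whether one more segment would fit under a simplified model of the SysV limits: `shmmin`/`shmmax` per segment, `shmmni` segments in total, and `shmall` pages in total. The helper names, the 4 KiB page size and the sample numbers used with it are assumptions for illustration, not the builders' actual configuration. The check is not the XNU kernel's real logic.

```python
"""Hypothetical helpers: parse `sysctl kern.sysv`-style output and check,
under a SIMPLIFIED model of the SysV limits (not the XNU implementation),
whether one more shared memory segment would fit.
Assumptions: 'name: value' lines, and shmall counted in pages of page_size."""


def parse_sysctl(text: str) -> dict[str, int]:
    """Map lines like 'kern.sysv.shmmax: 4194304' to {'shmmax': 4194304}."""
    limits: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not value.strip().isdigit():
            continue  # skip blank or non-numeric lines
        limits[key.strip().rsplit(".", 1)[-1]] = int(value.strip())
    return limits


def pages(size: int, page_size: int) -> int:
    """Round a segment size in bytes up to whole pages."""
    return -(-size // page_size)


def check_new_segment(limits, existing_sizes, size, page_size=4096):
    """Return a reason string if a new segment of `size` bytes would be
    rejected in this simplified model, or None if it would fit."""
    if not limits["shmmin"] <= size <= limits["shmmax"]:
        return "size outside [shmmin, shmmax]"
    if len(existing_sizes) + 1 > limits["shmmni"]:
        return "too many segments (shmmni)"
    total = sum(pages(s, page_size) for s in existing_sizes) + pages(size, page_size)
    if total > limits["shmall"]:
        return "too many pages in total (shmall)"
    return None
```

In this model, suppose `shmmni` is set to 32 (an illustrative value). Thirty-two leftover 56-byte segments would then already reject a new one, even though they use almost no memory. Raising `shmmax` or `shmall` would not change that outcome.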
It seems the same errors already appeared in 2022: #198495. IMHO this implies that there must be a way to fix this, very likely via some system configuration.
This is now happening for aarch64-darwin on Hydra as well: Edit: Rebuilding seems to have fixed it. The failing builds were
I'm looking into this again and here are some observations:
The current
One further note: The There is an interesting comment in the postgres source code about this topic here: After googling around for this specific error with size=56, I found this: https://discourse.nixos.org/t/nixbld-leaving-around-shared-memory-segments/30043/5. This seems to be right on point. The community builder currently has some leftover shared memory segments:
The timings show that they were created when I first started working on the community builder today. I tried to run some derivations with Here's what happens, I think:
Since I don't have root access to the community builder, I can't clear those memory segments for further testing. This seems to go wrong in a couple of places:
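Segments like the ones described above could be spotted with a minimal sketch that parses `ipcs -m` output. It flags segments that have no attached processes (`NATTCH` of 0) and are owned by a Nix build user. It then prints, but does not run, the matching `ipcrm -m` commands for someone with root access. The column names, the `_nixbld` owner prefix and the sample output are assumptions. Check them against the real `ipcs` output on the builder before relying on this.

```python
"""Hypothetical helper: find SysV shared memory segments with nothing
attached that are owned by Nix build users, and build ipcrm commands.
Assumptions: the `ipcs -m` column names below and the `_nixbld` prefix."""


def parse_ipcs(text: str) -> list[dict[str, str]]:
    """Turn `ipcs -m`-style output into one dict per segment, keyed by the
    header row (a line starting with 'T' that contains 'SEGSZ')."""
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "T" and "SEGSZ" in fields:
            header = fields
        elif header and fields[0] == "m" and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def leftover_segments(rows, owner_prefix="_nixbld"):
    """Segments with no attached processes (NATTCH == 0) owned by build users."""
    return [
        row for row in rows
        if row["NATTCH"] == "0" and row["OWNER"].startswith(owner_prefix)
    ]


def removal_commands(rows):
    """Commands for an administrator to review and run; nothing is executed."""
    return [f"ipcrm -m {row['ID']}" for row in rows]


# Made-up output for illustration only; real column layouts may differ.
SAMPLE = """\
T     ID     KEY        MODE        OWNER     GROUP   NATTCH  SEGSZ
Shared Memory:
m  65536 0x0052e2c1 --rw-------  _nixbld1  nixbld       0     56
m  65537 0x0052e2c2 --rw-------  _nixbld2  nixbld       1     56
m  65538 0x00000000 --rw-------  alice     staff        0   4096
"""
```

On `SAMPLE`, only the 56-byte segment with ID 65536 is reported. The other `_nixbld` segment still has an attached process, and the 4096-byte one belongs to a different user.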
Oof, thanks a lot for the investigation.
I'd argue that ultimately, yes, it should, for two reasons:
I was fooled by the output again. While those memory segments are blocking, they are not big: they are the same 56 bytes that can be seen in the SEGSZ column. This also means they can't be changed to mmap, because they are only allocated as a kind of lock on the data directory, and that will always use SysV. They are cleaned up automatically in the Linux sandbox, because Linux stores them on a tmpfs mounted at PostgreSQL removes them on INT, QUIT and TERM signals but, unsurprisingly, not on ABRT and KILL. So I guess we can't do anything on the nixpkgs side.
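The cleanup-on-signal behaviour described above follows a common pattern. This minimal Python sketch registers one handler for SIGINT, SIGQUIT and SIGTERM that releases a simulated resource. The segment ID and all names are hypothetical, and this is not PostgreSQL's actual code.

```python
"""Sketch of the cleanup-on-signal pattern (POSIX). The 'segment' is a
hypothetical stand-in; this is not PostgreSQL's implementation."""
import signal

held_segments = {65536}      # hypothetical stand-in for a SysV segment ID
cleanup_log: list[str] = []  # records which signal triggered the cleanup


def release_on_signal(signum, frame):
    # A real server would remove its segment here (e.g. shmctl with IPC_RMID).
    held_segments.clear()
    cleanup_log.append(signal.Signals(signum).name)
    # A real server would exit after cleaning up; this sketch returns so the
    # effect can be inspected.


for sig in (signal.SIGINT, signal.SIGQUIT, signal.SIGTERM):
    signal.signal(sig, release_on_signal)
```

Raising SIGTERM in-process, for example with `signal.raise_signal(signal.SIGTERM)`, empties `held_segments`. SIGKILL cannot be caught at all: `signal.signal(signal.SIGKILL, ...)` raises `OSError`. So no handler like this ever runs when a process is killed that way.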
I created NixOS/nix#12548 on the Nix side, and https://git.lix.systems/lix-project/lix/issues/691 for Lix.
Can't you trap the signal and send TERM to PostgreSQL before potentially repeating the ABRT/KILL?
IIUC, the whole point of KILL is that it cannot be trapped. I'm not exactly sure how Nix kills the builder, though; I assume it must be via KILL, given those observations. Consider this example:
This will print hello, then sleep. When you cancel the build, it does not show "trapped" for me. To double-check, run it directly in bash:
This will print "trapped" when you kill it. So I don't think we can trap this.
Steps To Reproduce
postgresql currently fails to build for all versions 15+ on x86_64-darwin. It fails in the installCheckPhase; versions 13 and 14 have the tests disabled anyway, so they pass.
Build log
Adding the following debug statement gives some insight into the failure:
Additional context
Hydra failures:
Notify maintainers
@NixOS/postgres @NixOS/darwin-maintainers
Note for maintainers: Please tag this issue in your PR.
Add a 👍 reaction to issues you find important.