Segmentation fault with aBSREL #1794

Open
jaredbernard opened this issue Jan 22, 2025 · 11 comments

Comments

@jaredbernard

I ran aBSREL on a large dataset (>1000 sequences) using MPI set to 8 cores. It ran fine for a week and then stopped with a segmentation fault. Is this a memory problem? Do you have suggestions?

@spond
Member

spond commented Jan 22, 2025

Dear @jaredbernard,

Was there an error message printed to stdout or stderr prior to the crash? Usually there's something that hyphy encounters prior to a fault.

Without any further information, I can't be very helpful.

Best,
Sergei

@jaredbernard
Author

No further information was given unfortunately. I just wondered if segmentation fault was generally connected to memory issues. I restarted the run, so I'll see in a week if there's a better result.

@spond
Member

spond commented Jan 22, 2025

Dear @jaredbernard,

What was the actual crash message, e.g. the code attached to the segfault? Also, which hyphy version do you have (hyphy --version)?

Just an FYI, absrel won't use MPI (it will use multiple threads if available). If you start an MPI process, all the work will happen on the "master" node.

As far as segmentation faults go, they need not be memory related. More often than not, hyphy encounters an unrecoverable error (and it will print a message out for you), and then the segfault occurs during shutdown.
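
On a week-long job, one way to make sure any message printed before the crash is not lost is to save stdout and stderr to files and record the exit status. The following is a minimal sketch, assuming a POSIX shell; `run_logged`, the log prefix, and the alignment path are hypothetical and not part of hyphy:

```shell
# Hypothetical wrapper: run a command, keep stdout and stderr in separate
# files, and append the exit status to the stderr log.
run_logged() {
    prefix=$1
    shift
    "$@" > "$prefix.out" 2> "$prefix.err"
    status=$?
    echo "exit status: $status" >> "$prefix.err"
    return "$status"
}

# Example usage (the alignment path is a placeholder):
#   run_logged absrel_run hyphy absrel --alignment my_alignment.fas
#   tail -n 20 absrel_run.err
```

An exit status of 139 (128 + 11) means the process was killed by SIGSEGV; the lines just before it in the `.err` file are where an earlier message would be expected to appear.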

Best,
Sergei

@jaredbernard
Author

jaredbernard commented Jan 31, 2025

Another job ran for a week (on version HYPHY 2.5.64(MP) for Linux on x86_64 x86 SSE4 SIMD zlib (v1.2.11)) and ended in the message

libgomp: Thread creation failed: Invalid argument
Segmentation fault (core dumped)

The program didn't appear to print any stdout or stderr.

@jaredbernard
Author

I'm rerunning the command using sudo, in case there are limits that apply when it is not executed as root.

@spond
Member

spond commented Feb 2, 2025

Dear @jaredbernard,

I have never seen the libgomp issue show up before, but libgomp is the standard GNU multithreading library that supports OpenMP. This error happens outside hyphy, and I don't have the foggiest idea of what could be causing it. It should in no way be affected by whether or not you are a superuser. Can you try adding CPU=1 to the command-line arguments for hyphy, like hyphy CPU=1 absrel ..., and see if the same issue recurs?
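
Because the error is raised by the OpenMP runtime rather than by hyphy, it may help to record the thread-related settings of the shell that launches the job and attach them to the report. This is a generic sketch and not a diagnosis of this particular failure; `omp_env_report` is a hypothetical helper, while `OMP_NUM_THREADS` and `OMP_STACKSIZE` are standard OpenMP environment variables:

```shell
# Hypothetical helper: print settings that influence OpenMP thread creation.
omp_env_report() {
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
    echo "OMP_STACKSIZE=${OMP_STACKSIZE:-unset}"
    echo "online CPUs: $(getconf _NPROCESSORS_ONLN)"
    echo "stack size limit (ulimit -s): $(ulimit -s)"
}

omp_env_report
```

Running it in the same session (or batch script) that starts hyphy shows whether a scheduler or login profile has set unusual values.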

Best,
Sergei

@jaredbernard
Author

jaredbernard commented Feb 2, 2025

Okay, I'll interrupt it and restart. I had wanted it to run faster than it would on 1 core.

@jaredbernard
Author

Just so you know, aBSREL didn't have this problem with other datasets I've analyzed. This one is somewhat larger, not in terms of number of sequences but length of alignment. Given that, could this error be caused by a different issue? You already indicated that aBSREL won't use multiple threads anyway, so perhaps setting the threads to 1 wouldn't matter. My job is still processing, and probably will for at least 5 more days, but if it's likely to end in another segmentation fault, I'd be willing to stop it again and try something else. If you have any further ideas, I'm open to them. Do you think installing something that enables libgomp would help?

@spond
Member

spond commented Feb 4, 2025

Dear @jaredbernard,

How long is your alignment? There's an experimental option (--blb N, where 0<N≤1), which implements a bag of little bootstraps approach (https://pubmed.ncbi.nlm.nih.gov/34734192/) for aBSREL. It should significantly speed up aBSREL runs for long (say 10,000 codons or more) alignments. The payoff is greater for longer alignments. There's stochasticity in the process, so the idea is to run absrel --blb N several times and take majority/consensus results.

For example, with v2.5.68 on a MacBook Pro M4 (20 sequences, 3331 codons):

hyphy absrel --alignment ~/Development/hyphy/tests/data/mammalian_mtDNA.mtnex --code "Vertebrate-mtDNA"

....

### Adaptive branch site random effects likelihood test 
Likelihood ratio test for episodic diversifying positive selection at Holm-Bonferroni corrected _p =   0.0500_ found **5** branches under selection among **37** tested.

* Node1, p-value =  0.00000
* Node34, p-value =  0.00000
* Node18, p-value =  0.00022
* Node2, p-value =  0.02826
* Node31, p-value =  0.03161
hyphy absrel --alignment ~/Development/hyphy/tests/data/mammalian_mtDNA.mtnex  12479.36s user 164.98s system 489% cpu 43:05.22 total


BLB run 1 (--blb 0.7; roughly 40x faster than the full run in wall-clock time; replicates can be farmed out to individual CPUs)

hyphy absrel --alignment ~/Development/hyphy/tests/data/mammalian_mtDNA.mtnex --code "Vertebrate-mtDNA" --blb 0.7

### Adaptive branch site random effects likelihood test 
Likelihood ratio test for episodic diversifying positive selection at Holm-Bonferroni corrected _p =   0.0500_ found **9** branches under selection among **37** tested.

* Node18, p-value =  0.00000
* Node34, p-value =  0.00000
* Node31, p-value =  0.00000
* Node3, p-value =  0.00000
* GIBBON, p-value =  0.00000
* Node1, p-value =  0.00000
* MOUSE, p-value =  0.00000
* Node28, p-value =  0.00000
* Node4, p-value =  0.00003
hyphy absrel --alignment ~/Development/hyphy/tests/data/mammalian_mtDNA.mtnex  190.51s user 3.63s system 280% cpu 1:09.32 total

BLB run 2

### Adaptive branch site random effects likelihood test 
Likelihood ratio test for episodic diversifying positive selection at Holm-Bonferroni corrected _p =   0.0500_ found **8** branches under selection among **37** tested.

* Node34, p-value =  0.00000
* Node1, p-value =  0.00000
* Node26, p-value =  0.00000
* COW, p-value =  0.00000
* Node5, p-value =  0.00000
* Node2, p-value =  0.00006
* Node18, p-value =  0.00015
* SUMATRAN_ORANGUTAN, p-value =  0.00105
hyphy absrel --alignment ~/Development/hyphy/tests/data/mammalian_mtDNA.mtnex  225.99s user 4.34s system 279% cpu 1:22.43 total

....
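
The "run several times and take the majority" step can be sketched in shell. This assumes each replicate's console report was saved to its own log file and that selected branches appear on lines of the form `* Name, p-value = ...`, as in the output above. `consensus_branches` and the log file names are hypothetical, and a strict majority is only one possible consensus rule:

```shell
# Hypothetical helper: list branches reported as selected in a strict
# majority of the given aBSREL log files (one log per --blb replicate).
consensus_branches() {
    total=$#
    for log in "$@"; do
        # Extract the branch name from each "* Name, p-value = ..." line;
        # sort -u so a branch counts at most once per replicate.
        sed -n 's/^\* \([^,]*\), p-value.*/\1/p' "$log" | LC_ALL=C sort -u
    done | LC_ALL=C sort | uniq -c |
        awk -v n="$total" '$1 * 2 > n { print $2 }'
}

# Example usage (file names are placeholders):
#   for i in 1 2 3 4 5; do
#       hyphy absrel --alignment aln.nex --blb 0.7 > "blb_$i.log"
#   done
#   consensus_branches blb_*.log
```

Applied to the two BLB runs shown above, this keeps Node1, Node18 and Node34, the branches reported by both runs.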

Best,
Sergei

@jaredbernard
Author

Thanks, I'm trying it now. My alignment is nearly 700 sequences and it was 288 codons long, although I just trimmed it a bit more to eliminate more gaps. I set blb to 0.8, and I'll let you know how it goes.

@spond
Member

spond commented Feb 5, 2025

Dear @jaredbernard,

Ah, thanks for the clarification. You have a lot of sequences (700), not sites, so blb is not gonna do much.

Best,
Sergei
