Add Fusion support to HTCondor #5865

Open
wants to merge 37 commits into
base: master

JosephLalli

Hi there,

Feel free to use this code or not, but I wanted to let you know that I have come back to this project (#3697) and now have Nextflow running successfully on UWisc CHTC's HTCondor implementation, using Fusion and a connection to local S3 storage. I have a separate Dockerfile and launch script [here] that allow Nextflow to run in an Apptainer container on a submit node, with the ancillary files needed to establish a connection to the scheduler.

I have a feeling that this code would require some changes to meet your house style. First, I do not know how Nextflow typically handles the kind of issues surrounding scheduler access that I described above. I know you don't have a separate Dockerfile for each grid computation software package. I'm willing to work to implement scheduler access in a Nextflow-y manner (even if it's just "make sure the user environment is set up correctly").

Second, I could not figure out how to submit both a server configuration request (called a submit file in Condor) and the script to run over stdin. Condor requires a submit file, which can be passed via stdin, and a separate executable file, which is named in the submit file and automatically transferred to the compute server. For now, I have created a generateFusionBashWrapperCommand function in the CondorTaskHandler class that returns the output of FusionHelper.runWithContainer prefixed with a bash shebang. fusionStdinWrapper writes the output of generateFusionBashWrapperCommand to FileHelper.getLocalTempPath().resolve(".condor.${task.id}.${task.hash}.sh"), and that file is then referenced in the submit file passed to condor_submit. I copied write0 from GridTaskHandler, since write0 is private and I couldn't figure out how to call it from CondorTaskHandler. I imagine there is a better file-writing function to use, as well as a file location and file name that better fit Nextflow style.
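To make the two-file arrangement concrete, here is a minimal Python sketch of the idea (the PR itself implements this in Groovy inside CondorTaskHandler). The function names write_wrapper and build_submit_description are hypothetical and exist only for illustration; the submit description uses only the standard universe, executable, log, and queue commands, and a real job would likely need more.

```python
from pathlib import Path


def write_wrapper(directory: Path, task_id: int, task_hash: str, command: str) -> Path:
    """Write the launcher command to an executable bash script (hypothetical helper)."""
    # File name follows the pattern described above: .condor.<id>.<hash>.sh
    path = directory / f".condor.{task_id}.{task_hash}.sh"
    path.write_text(f"#!/bin/bash\n{command}\n")
    path.chmod(0o755)  # Condor needs an executable file to transfer and run
    return path


def build_submit_description(wrapper: Path, log_file: Path) -> str:
    """Return a minimal submit description that names the wrapper as the executable."""
    lines = [
        "universe = vanilla",
        f"executable = {wrapper}",
        f"log = {log_file}",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The returned text is what would be piped to condor_submit over stdin, while the wrapper script stays on disk so that Condor can transfer it to the execution server.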

Finally, Condor's "condor_history" command can parse a local log file that is automatically updated by multiple jobs and report the status (among other attributes) of each job. The advantage of this method over "condor_q" is that it does not send a request to the scheduler server, so it can be checked repeatedly and cheaply. I give every job the same local log file, saved to $PWD with the run UUID as the file name. I imagine there is a logical place for Nextflow to store this file, but I do not know where.
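As an illustration of why a shared log file makes status checks cheap, the following Python sketch reads job states directly from the text of a job event log, without contacting the scheduler. It is not how the PR does it (the PR relies on condor_history); it assumes the usual event log layout in which each event starts with a three-digit event code followed by the job ID in parentheses, and the function name latest_status is hypothetical.

```python
import re

# Event codes in the job event log that are relevant for status tracking.
EVENT_NAMES = {
    "000": "submitted",
    "001": "executing",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
    "013": "released",
}

# Assumed event header layout: "<code> (<cluster>.<proc>.<subproc>) <timestamp> <message>"
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")


def latest_status(log_text: str) -> dict[str, str]:
    """Map each job ID ("cluster.proc") to the name of its most recent known event."""
    status: dict[str, str] = {}
    for line in log_text.splitlines():
        match = EVENT_HEADER.match(line)
        if match is None:
            continue  # detail lines and "..." separators are skipped
        code, cluster, proc = match.groups()
        if code in EVENT_NAMES:
            status[f"{int(cluster)}.{int(proc)}"] = EVENT_NAMES[code]
    return status
```

Because every job in the run appends to the same file, one pass over that file yields the state of all jobs at once.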

This has only been tested with Apptainer process containers, but both nextflow hello and nextflow rnaseq-nf run to completion. Root access is not an option on remote servers, so Docker cannot be used, but most other containerization software should work well.

Notably, Condor will handle the transfer of small files, including container image files. While I used a shared drive for my Apptainer cache location, it would be straightforward to tell Condor to transfer a locally stored container image to the execution server before running Fusion's Apptainer command, or potentially to download it from Tower.
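A rough sketch of what asking Condor to transfer a locally stored image could look like, written as a Python helper that emits the relevant submit description lines. This is not part of the PR; the helper name transfer_lines and the image path in the test are hypothetical, while should_transfer_files, when_to_transfer_output, and transfer_input_files are standard submit commands.

```python
def transfer_lines(image_path: str) -> list[str]:
    """Submit description lines that ask Condor to ship a container image with the job."""
    return [
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # image_path is a placeholder for wherever the image is stored locally
        f"transfer_input_files = {image_path}",
    ]
```

These lines would be added to the submit description before the queue statement, so the image lands in the job's scratch directory before the wrapper script starts.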

I'm happy to work with you all to integrate this into Nextflow as you see fit.

Signed-off-by: Joseph L Lalli [email protected]

bentsherman and others added 30 commits June 21, 2023 17:57
…custom CondorTaskHandler. Now both are defined.
@JosephLalli JosephLalli requested a review from a team as a code owner March 9, 2025 23:41

netlify bot commented Mar 9, 2025

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit ae98186
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67ce292899068d0008ecddbb
😎 Deploy Preview https://deploy-preview-5865--nextflow-docs-staging.netlify.app
