Automating Nextflow Launch from files on HPC Systems

The main use case I’ve heard from people is automating job launches as FASTQs/base files come off the sequencer: they want to kick off a Nextflow run of some pipeline with those files as input.

Ideas off the top of my head that I’ve heard discussed:

  1. Using watchPath with a constantly running Nextflow job.
  2. Using inotify.
  3. Running a cronjob.

We are running a cronjob for this purpose.

How difficult is it to set up cronjobs for this purpose, @hubin-keio? Do you have sudo access on your HPC, or can this be done without sudo access?

I had a (very old) implementation of this idea here: NYU-Molecular-Pathology/lyz-nf (lab monitoring program) on GitHub.

The main points are the use of a cron job (see the Makefile for details) and a lock file (detected inside the Nextflow script) to handle automated execution. You can put whatever Nextflow task you like inside it. I also had a separate demultiplexing pipeline repo in that same GitHub organization, but I did not automate it because of the difficulties (at the time) in passing the Illumina bcl2fastq / bcl-convert SampleSheet.csv from the wet lab to the HPC, among other things. You may already have your own solution for this that would make the process easier. It’s worth noting that there is a more up-to-date nf-core demultiplexing pipeline available for reference as well.
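As a rough illustration of the cron-plus-lock-file pattern described above (this is not the lyz-nf code, and here the lock is handled in a Python wrapper rather than inside the Nextflow script), a minimal sketch might look like the following. The marker file name `COMPLETE`, the `.launched` flag, the lock file name, the pipeline name, and the `--input` parameter are all hypothetical placeholders; substitute whatever your sequencer and pipeline actually use.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical names, chosen only for illustration.
MARKER = "COMPLETE"          # file signalling that a run directory is fully written
DONE_FLAG = ".launched"      # flag written once a run has been handed to Nextflow
LOCK_NAME = ".automation.lock"


def acquire_lock(lock_path: Path) -> bool:
    """Atomically create the lock file; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file if it is still present."""
    lock_path.unlink(missing_ok=True)


def find_new_runs(watch_dir: Path) -> list:
    """Return run directories that are complete but not yet launched."""
    return sorted(
        d for d in watch_dir.iterdir()
        if d.is_dir() and (d / MARKER).exists() and not (d / DONE_FLAG).exists()
    )


def launch_new_runs(watch_dir: Path, pipeline: str, launcher=subprocess.run) -> int:
    """Launch one Nextflow run per new directory; return how many succeeded."""
    lock = watch_dir / LOCK_NAME
    if not acquire_lock(lock):
        return 0  # a previous cron invocation is still working
    launched = 0
    try:
        for run_dir in find_new_runs(watch_dir):
            # "--input" is an assumed pipeline parameter, not a Nextflow option.
            cmd = ["nextflow", "run", pipeline, "--input", str(run_dir)]
            result = launcher(cmd, check=False)
            if result.returncode == 0:
                (run_dir / DONE_FLAG).touch()
                launched += 1
    finally:
        release_lock(lock)
    return launched
```

A small script that calls `launch_new_runs(Path("/data/runs"), "my-org/my-pipeline")` could then be scheduled from a per-user crontab (edited with `crontab -e`), for example every ten minutes with `*/10 * * * *`. User crontabs usually do not need sudo, although some sites restrict cron access, so it is worth checking with your HPC admins.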

Depending on how complex your automation needs are, you may or may not need a more involved infrastructure setup. These days, if I were to do it over again, I would consider using Seqera Platform / Nextflow Tower plus the Platform’s CLI or API to send jobs to Platform for management, instead of executing the jobs directly from crontab.

I have done this in the past using watchmedo from watchdog.
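For anyone who would rather use watchdog’s Python API than the `watchmedo` command line, a minimal sketch could look like this. The pipeline name and the `--input` parameter are hypothetical, and `on_created` fires when a file first appears, which may be before it has finished being written, so a real setup would probably need to wait for a completion marker. Also be aware that file-event watching may not see changes made from other nodes on network filesystems, which are common on HPC.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def is_fastq(path: str) -> bool:
    """Return True if the file name looks like a FASTQ file."""
    return Path(path).name.endswith(FASTQ_SUFFIXES)


def build_command(fastq_path: str, pipeline: str = "my-org/my-pipeline") -> list:
    """Build the Nextflow command line; pipeline and --input are placeholders."""
    return ["nextflow", "run", pipeline, "--input", str(fastq_path)]


def watch(directory: str) -> None:
    """Block and launch a Nextflow run for each new FASTQ under `directory`."""
    # Imported here so the helpers above work even without watchdog installed.
    import subprocess
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class FastqHandler(FileSystemEventHandler):
        def on_created(self, event):
            # Ignore directories and anything that is not a FASTQ file.
            if not event.is_directory and is_fastq(event.src_path):
                subprocess.Popen(build_command(event.src_path))

    observer = Observer()
    observer.schedule(FastqHandler(), directory, recursive=True)
    observer.start()
    try:
        while observer.is_alive():
            observer.join(1)
    finally:
        observer.stop()
        observer.join()
```

Calling `watch("/data/runs")` from a long-running process (for example inside a `screen` or `tmux` session on a login node, if your site allows that) would keep the watcher alive.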
