Pipeline getting frozen (nf-core)

Hi! I hope you can help me with a pipeline that keeps freezing. I'm running the rnaseq pipeline and hitting an issue that was previously reported (link below) but never properly resolved.

In the rnaseq GitHub repository they pointed out that this may not be a pipeline issue, so I should reach out to the Nextflow forums.

Here is a thread with a similar issue (but without any answer): Slack

The thing is, I'm running the rnaseq pipeline on a high-RAM CPU (standalone machine), and after a few hours (or even days) the process gets stuck without reporting any error (and makes no further progress). I have tried batching the samples to reduce the computational load, going from 110 samples down to 10, but the same thing happened (just earlier with the 10-sample batch).

The execution trace does not report any failure (neither does the shell); it just freezes and stops updating, while the Nextflow log repeats the same set of lines over and over (so it DOES update, but only by repeating itself). Please find attached the Nextflow logs and the execution trace (showing no errors).

If any additional info is required, just ask. Thank you so much.

Example of the repeating log output:

Aug-26 12:24:00.053 [Task submitter] DEBUG n.processor.TaskPollingMonitor - %% executor local > tasks in the submission queue: 23 -- tasks to be submitted are shown below
~> TaskHandler[id: 73; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (A189_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/59/f394ecc3e8372f1118ab25e37b882b]
~> TaskHandler[id: 69; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (A187_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/fb/3b0c6e6abb0cd30a5b2819761fb678]
~> TaskHandler[id: 71; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (FACE34_PRE); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/1f/4a4bf3c2a95d8ff11ec3ea1bc9af81]
~> TaskHandler[id: 65; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (A193_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/a9/e51ad5f840f0993b8c547b8b134e45]
~> TaskHandler[id: 74; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (FACE11_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/89/fcc28cf8ffd5b422bc1d5d14e73f8c]
~> TaskHandler[id: 63; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (FACE13_PREm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/55/3e86b42259706eea46cc4e8effc944]
~> TaskHandler[id: 68; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (A187_PRE); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/e7/4f68e2b7ea07112071514853a6a38f]
~> TaskHandler[id: 62; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (FACE34_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/a9/8c39378f80e5eb3ac879ad3a8e8af7]
~> TaskHandler[id: 70; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (FACE11_PREm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/4b/2c31d2f7d599920b395f8a085a2040]
~> TaskHandler[id: 66; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN_IGENOMES (FACE13_POST); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/34/2c3b625c1ef1cb5b65e96481945aca]
.. remaining tasks omitted.

nextflow.log (799,1 KB)
execution_trace_2024-08-26_02-16-03.txt (12,4 KB)
nextflow.log.1.txt (3,1 MB)

While you have plenty of memory, you only have 20 CPUs:

Aug-23 16:05:00.423 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=20; memory=251.4 GB; capacity=20; pollInterval=100ms; dumpInterval=5m

nf-core/rnaseq requests 16 CPUs for resource-intensive tasks, so only 1 concurrent run of a process like STAR_ALIGN is possible. You can cap these at 10 CPUs using the option --max_cpus 10, which allows 2 concurrent runs of these big processes.
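For reference, a sketch of how the cap might be passed on the command line; the profile, samplesheet, and output directory are placeholders for your own setup:

```bash
# Hypothetical invocation -- adjust profile/input/outdir to your setup.
# --max_cpus 10 caps every process at 10 CPUs, so two 10-CPU tasks like
# STAR_ALIGN can run side by side on a 20-CPU machine.
nextflow run nf-core/rnaseq \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome GRCh38 \
    --max_cpus 10
```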

I also notice you are downloading the genomes from the igenomes bucket on every run. This takes considerable time before any process can be submitted, so I would recommend downloading the references once and providing your own locally stored reference build, preferably on the fastest storage your machine has access to. This gives the lowest latency when running the pipeline. You can see the documentation here: rnaseq: Usage

Aug-23 16:05:16.778 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa to work dir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/stage-b383e9c7-a922-4e50-9271-c26b4e61aafa/5e/7a9c679b56550b9fa73d860ca781a0/genome.fa
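One way to stage the references locally is a one-off sync with the AWS CLI (assuming it is installed; the ngi-igenomes bucket allows anonymous reads, and the destination path below is a placeholder):

```bash
# Sketch: mirror the GRCh38 references once to fast local storage.
# --no-sign-request uses anonymous access to the open ngi-igenomes bucket.
# Pick your fastest disk as the destination.
aws s3 sync --no-sign-request \
    s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/ \
    /fast/storage/igenomes/Homo_sapiens/NCBI/GRCh38/
```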

Hi!

Thanks for the insights on the pipeline! I have implemented the --max_cpus restriction and downloaded the reference genomes (even though I think I'm still missing something).

I have also tried updating the pipeline to the latest version, but even though the log output now changes, the issue remains.

I'm uploading the latest log and trace; I'm quite desperate to understand why this is happening.
.nextflow (copy).log (293,8 KB)
execution_trace_2024-08-30_12-45-21.txt (11,7 KB)

According to your logs, you are still downloading igenomes every time. Set --igenomes_base to the local path you downloaded igenomes to:

Aug-30 15:15:41.607 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex to work dir: /home/bioinfo/nfcore/rnaseq_cocnhi/work/stage-0463113e-12b1-4a78-9695-785c6b4d0ca5/70/a4df0e698aad199b68a36d8ca83707/STARIndex

Check out the docs here: Docs: Reference Genomes
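The same parameter can also go in a small custom config passed with -c; the base path below is a placeholder for wherever you mirrored igenomes:

```groovy
// local.config -- sketch, pass with: nextflow run ... -c local.config
// igenomes_base must contain the same directory layout as the bucket
// (e.g. Homo_sapiens/NCBI/GRCh38/...), so no s3:// staging is needed.
params {
    igenomes_base = '/fast/storage/igenomes'
    max_cpus      = 10
}
```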