Increasing allocation for running on Google Cloud

Hello. I am trying to run the rnaseq pipeline on Google Cloud using the following command. However, the pipeline runs into various issues, sometimes with internal timeouts and sometimes with dispatching nodes. Is there a way to increase the timeout so that all samples run?

nextflow run nf-core/rnaseq -revision 3.12.0 -profile googlels -bg --project_id key-fabric-370115 --outdir xxxx --input yyy -w zzz --remove_ribo_rna --use_spot false --use_private_ip false --fasta gs://acc86e37d100400190e7d6-293357e0034f41bca80aad6cb82c405b/Genomes/GRCh38/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz --gtf gs://acc86e37d100400190e7d6-293357e0034f41bca80aad6cb82c405b/Genomes/GRCh38/Homo_sapiens.GRCh38.110.gtf.gz
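One common way to raise per-process limits is to attach a small custom config with -c on the nextflow run command line; a minimal sketch, with the file name and value purely illustrative:

  // more_time.config (illustrative name) -- attach with: nextflow run ... -c more_time.config
  process {
      time = '48h'   // raises the time limit for every process
  }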

Hi @Sreya_Mukherjee. Welcome to the community forum!

Can you share exactly which errors you’re getting? nf-core/rnaseq, by default, has dynamic retries, which means tasks that fail because they take too long will be retried with longer time limits (just like for CPUs and memory).
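For reference, that dynamic-retry behaviour is expressed with closures in the process configuration; a rough sketch only, since the base values and retry policy differ per process in the real nf-core/rnaseq config:

  process {
      errorStrategy = 'retry'
      maxRetries    = 2
      // each retry multiplies the request by the attempt number, e.g. 6 h, then 12 h, then 18 h
      time   = { 6.h * task.attempt }
      memory = { 36.GB * task.attempt }
  }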

Each time it has stopped at different processes. Sharing the latest one.

Command executed:

  unset DISPLAY
  mkdir -p tmp
  export _JAVA_OPTIONS=-Djava.io.tmpdir=./tmp
  qualimap \
      --java-mem-size=29491M \
      rnaseq \
       \
      -bam L8-21342_T1.markdup.sorted.bam \
      -gtf Homo_sapiens.GRCh38.110.gtf \
      -p non-strand-specific \
      -pe \
      -outdir L8-21342_T1
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:QUALIMAP_RNASEQ":
      qualimap: $(echo $(qualimap 2>&1) | sed 's/^.*QualiMap v.//; s/Built.*$//')
  END_VERSIONS

Command exit status:
  9

Command output:
  (empty)

Command error:
  The worker was unable to check in, possibly due to a misconfigured network

This googlels profile is a custom profile you created, right? Can you please share it here?

Also, can you share the .nextflow.log of a failed run? It may show some information related to quota limits that you might be hitting.
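For comparison, a minimal custom Google Life Sciences profile usually looks something like the sketch below; the project, region, and bucket are placeholders, and your real profile may set other options:

  profiles {
      googlels {
          process.executor = 'google-lifesciences'
          workDir          = 'gs://my-bucket/work'   // placeholder bucket
          google.project   = 'my-project-id'         // placeholder project
          google.region    = 'us-central1'           // placeholder region
          google.lifeSciences.preemptible = false
      }
  }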

runs_Jan262024_realignments_Results_run2_addon_pilot_pipeline_info_execution_trace_2024-02-16_16-39-48.txt (20.6 KB)

Only a few samples have failed. I reran the samples with a custom config file as follows:

  process {
      time = '96h'   // set a time limit of 96 hours for all processes
  }

  executor {
      queueSize       = 200
      exitReadTimeout = '48h'
  }

but the samples still failed.
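If only specific processes are hitting their limits, a process-scoped override is usually more targeted than a global one. A sketch, assuming the failing process is SortMeRNA (as the later posts suggest); the selector and values are illustrative and the full process name can be copied from the trace file:

  process {
      withName: '.*:SORTMERNA' {
          // applies only to the SortMeRNA tasks instead of every process
          time   = '96h'
          memory = '64 GB'
      }
  }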

Hi @Sreya_Mukherjee

I asked for the .nextflow.log file; for some reason, you shared the trace file instead. I also asked for the googlels profile (with sensitive information removed, please). I need your help in order to help you 🙂

So far, the evidence you’ve shared makes me believe it’s a misconfiguration on Google Cloud, unrelated to Nextflow.

I see. Actually, only 2 samples are failing, both in the SortMeRNA process. I resumed the pipeline after increasing the memory as I showed before. If this also doesn’t work, I don’t know what the issue is. How can I resolve this Google Cloud situation?

Also, all the other samples seem to have run, but I don’t get the final counts. Is there a Nextflow behaviour where, if not all samples run, the final counts output is not generated? Can I bypass that?

Does the pipeline finish with success, apart from these two failed samples?

It does. I see output up to featureCounts, but there is no MultiQC folder, and there is no final merged TSV or RDS.

This is the latest:

  TaskHandler[id: 3530; name: NFCORE_RNASEQ:RNASEQ:SORTMERNA (xxx); status: RUNNING; exit: -; error: -; workDir: gs://486512684c40242cbadedfc86044ba316c425facc9556836db702c/runs//work_dir/8a/96a773e97170c053c65b5a41b20e29]
  ~> TaskHandler[id: 3611; name: NFCORE_RNASEQ:RNASEQ:SORTMERNA (xxx); status: RUNNING; exit: -; error: -; workDir: gs://486512684c40242cbadedfc86044ba316c425facc9556836db702c/runs/work_dir/a1/3ba889b9161de22409580cd7e40cd0]
  ~> TaskHandler[id: 7883; name: NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_READDUPLICATION (xxx); status: RUNNING; exit: -; error: -; workDir: gs://486512684c40242cbadedfc86044ba316c425facc9556836db702c/runs/work_dir/fc/8540aa1ecc5cc2be796f4e01329197]

And there is nothing in those work directories other than .command.run, .command.begin, and .command.sh.

I have noticed that processing of the fastq files stops at the SortMeRNA step. Is there a way to find out what Nextflow is doing in this step and why it stops?