Renaming large files causing possible Fusion error

Adam_Talbot · July 4, 2024, 3:02pm

You don’t actually need to mv or rename the file here. It’s not adding anything to the process. Instead, just capture the original file, then rename it when you need to use it.

process initial_feature_count {
  tag "$sample_id"

  input:
  tuple val(sample_id), path(sorted_bam)
  path(gtf)

  output:
  tuple val(sample_id), path("${sample_id}.sortedByCoord.out.bam.featureCounts.bam"), emit: feature_count_bam

  script:
  """
  # Run feature counts on the sorted STAR bam including strandedness and annotation of multimappers
  featureCounts -a $gtf -o ${sample_id}.star.featureCounts.gene.txt -R BAM $sorted_bam -T 4 -t transcript -g gene_id --fracOverlap 0.5 --extraAttributes gene_name -s 1 -M
  """
}

We can rename it in two ways. If we use it in a subsequent process, we can just declare the new name in the input, and Nextflow will stage the file under that name at runtime:

process process_2 {
  tag "$sample_id"

  input:
  tuple val(sample_id), path("${sample_id}.sortedByCoord.featureCounts.bam")
  path(gtf)

  output:
  tuple val(sample_id), path("${sample_id}_files.txt"), emit: feature_count_bam

  script:
  """
  ls -lh ${sample_id}.sortedByCoord.featureCounts.bam > ${sample_id}_files.txt
  """
}

If you want to publish it, you can rename it with the saveAs option of publishDir:

publishDir "${params.outdir}", pattern: "*.sortedByCoord.out.bam.featureCounts.bam", saveAs: { filename -> "${sample_id}.sortedByCoord.featureCounts.bam" }
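
For context, here is a rough sketch of where that directive could sit in the first process. It assumes params.outdir is defined (for example via --outdir) and that the staged input BAM is named ${sample_id}.sortedByCoord.out.bam, so that featureCounts writes the file matched by the pattern. Treat it as an illustration rather than a drop-in replacement:

```nextflow
process initial_feature_count {
  tag "$sample_id"

  // Publish only the featureCounts BAM, under a shorter name.
  // Assumption: params.outdir is set, e.g. with --outdir on the command line.
  publishDir "${params.outdir}",
    pattern: "*.sortedByCoord.out.bam.featureCounts.bam",
    saveAs: { filename -> "${sample_id}.sortedByCoord.featureCounts.bam" }

  input:
  tuple val(sample_id), path(sorted_bam)
  path(gtf)

  output:
  // The task output keeps the name featureCounts gives it; no mv needed.
  tuple val(sample_id), path("${sample_id}.sortedByCoord.out.bam.featureCounts.bam"), emit: feature_count_bam

  script:
  """
  featureCounts -a $gtf -o ${sample_id}.star.featureCounts.gene.txt -R BAM $sorted_bam -T 4 -t transcript -g gene_id --fracOverlap 0.5 --extraAttributes gene_name -s 1 -M
  """
}
```

Note that saveAs only changes the name of the published copy. The file in the work directory, and therefore what downstream processes receive from feature_count_bam, keeps its original name.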

Doing this avoids the expensive I/O of renaming a large file inside the task.

Here’s a miniature example for demonstration purposes:

process ECHO_1 {
    input:
        val sample
    output:
        tuple val(sample), path("${sample}_echo_1.txt"), emit: output

    script:
    """
    touch ${sample}_echo_1.txt
    """
}

process ECHO_2 {
    input:
        tuple val(sample), path("${sample}.txt")

    output:
        tuple val(sample), path("${sample}_echo_2.txt"), emit: output

    script:
    """
    cat ${sample}.txt > ${sample}_echo_2.txt
    """
}

workflow {
    input_ch = Channel.of("A", "B", "C")
    ECHO_1(input_ch)
    ECHO_2(ECHO_1.out.output)
    ECHO_2.out.output.view()
}
