Why does Nextflow overwrite my input?

LChan · March 13, 2025, 4:59am

Dear seqera community,

I am new to Nextflow. Attached is my first Nextflow script, which tries to clean some weird characters from the headers of the attached sub.fq.gz. The first two processes ran without errors, but the script seems to have overwritten the input. The third step also failed. I would appreciate it if anyone could help me test it and suggest what I am doing wrong.

Thank you so much.
Best,
LC

GeraldineVdA · March 14, 2025, 4:58pm

Hi @LChan, have you had a chance to go through our newcomer training course, Hello Nextflow? It covers the basics of managing inputs and outputs so I’d recommend you check that out and see if it helps you understand what’s going wrong with your script.

Adam_Talbot · March 17, 2025, 5:25pm

Hi @LChan,

In your workflow, you are creating an output file with the same name as an input file, so the task writes over the existing file. For example:

```groovy
process clean_fq {
    publishDir "${params.output_dir}", mode: 'copy'

    input:
    path input_file1
    val input_filename

    output:
    path "${input_filename}", emit: cleaned_file

    script:
    """
    zcat ${input_file1} | awk '{
        if (NR % 4 == 1) {
            gsub(/\\x00/, "")
        }
        if (\$0 != "") {
            print
        }
    }' | gzip > ${input_filename}

    """
}

workflow {
    // Resolve the full path to the input file
    input_file1 = file(params.input_file1)

    // Extract the filename (excluding dir path)
    input_filename = input_file1.name

    clean_fq(input_file1, input_filename)
}
```

At runtime, the script that Nextflow executes becomes:

```shell
zcat sub.fq.gz | awk '{
    if (NR % 4 == 1) {
        gsub(/\x00/, "")
    }
    if ($0 != "") {
        print
    }
}' | gzip > sub.fq.gz
```
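Why does this reach your original file even though it lives in a different folder? By default, Nextflow stages each input into the task work directory as a symbolic link that points at the original file, and a shell `>` redirection onto a symlink truncates and rewrites the link's target. The minimal sketch below reproduces that with plain files in a temporary directory, assuming bash; it does not run Nextflow, and the directory and file names are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical demo: mimic Nextflow's default symlink staging with plain files.
set -euo pipefail

demo=$(mktemp -d)
mkdir -p "$demo/fastq" "$demo/work"

# The "real" input file, outside the work directory.
printf 'original data\n' > "$demo/fastq/sub.txt"

# Staging: the work directory gets a symlink with the same name.
ln -s "$demo/fastq/sub.txt" "$demo/work/sub.txt"

# The task writes its output under the input's name...
cd "$demo/work"
printf 'task output\n' > sub.txt

# ...and the write goes through the link, replacing the original's contents.
cat "$demo/fastq/sub.txt"   # prints: task output
```

A different output name creates a new regular file in the work directory and leaves the link's target alone.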

Instead, make sure the output file created at runtime has a different name from the input:

```groovy
#!/usr/bin/env nextflow

// Define the input parameters
params.input_files = "fastq/sub.fq.gz"
params.output_dir = "cleaned_fastq"

// Define the processes

// Process to clean the FASTQ file
process clean_fq {
    publishDir "${params.output_dir}", mode: 'copy'

    input:
    path input_file

    output:
    path "${output_filename}", emit: cleaned_file

    script:
    // baseName drops only the last extension: sub.fq.gz -> sub.fq.trim.fastq.gz
    output_filename = "${input_file.baseName}.trim.fastq.gz"
    """
    zcat ${input_file} | awk '{
        if (NR % 4 == 1) {
            gsub(/\\x00/, "")
        }
        if (\$0 != "") {
            print
        }
    }' | gzip > ${output_filename}
    """
}

workflow {
    // Create a channel that emits one item per file matching the pattern
    input_files = Channel.fromPath(params.input_files)

    clean_fq(input_files)
}
```

Differences:

- Replace `file` with `Channel.fromPath` to handle as many inputs as you like (`file` only handles one)
- Determine the output filename inside the process using the input file's methods (here `baseName`)
- Use that output filename in the script instead of passing the name in as a separate `val` input
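To check which name `clean_fq` will publish: Nextflow's `baseName` removes only the last extension, so `sub.fq.gz` becomes `sub.fq` and the output is `sub.fq.trim.fastq.gz`, while `simpleName` removes every extension and would give `sub`. The sketch below mimics both with bash parameter expansion; it is only an analogy for predicting names, not how Nextflow computes them, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Shell analogy for the Nextflow file properties used to build output names.
set -euo pipefail

input=fastq/sub.fq.gz
name=${input##*/}     # like .name       -> sub.fq.gz
base=${name%.*}       # like .baseName   -> sub.fq  (last extension removed)
simple=${name%%.*}    # like .simpleName -> sub     (all extensions removed)

echo "${base}.trim.fastq.gz"    # prints: sub.fq.trim.fastq.gz
echo "${simple}.trim.fastq.gz"  # prints: sub.trim.fastq.gz
```

Keeping `baseName` has a practical upside: inputs such as `sample.R1.fq.gz` and `sample.R2.fq.gz` keep distinct output names, whereas `simpleName` would map both to `sample` and make them collide.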

Benefits:

- It runs as many samples as you like using a glob (`--input_files "fastq/*.fq.gz"`)
- No file collisions within the process (i.e. the output can never overwrite the input)
- The output is easier to pass to the next process (e.g. as `clean_fq.out.cleaned_file`)

Hopefully this gets you started and helps you fix the other two.

LChan · March 19, 2025, 8:13pm

Hi Adam,

Thank you so much for pointing out my issues. I thought using the same file name with different input and output folders wouldn't be an issue, but my understanding was wrong. All three of my processes work now.

In my terminal, the pattern passed to `--input_files` must be in double quotes: `--input_files "fastq/*.fq.gz"`.

Really appreciate your help.

Best,
LC

LChan · March 19, 2025, 8:14pm

Hi @GeraldineVdA,

Thanks for the suggestion. I watched the tutorial, and it helped a lot.

Best,
LC

Adam_Talbot · March 21, 2025, 1:48pm

Ah, whoops! That’s my mistake. Yes, you want to pass the string to Nextflow rather than let your shell expand the glob.

In your terminal, this:

```shell
--input_files fastq/*.fq.gz
```

becomes:

```shell
--input_files fastq/1.fq.gz fastq/2.fq.gz fastq/3.fq.gz fastq/4.fq.gz ...
```

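which is not what Nextflow expects: the parameter should receive the pattern itself, so that `Channel.fromPath` can expand it. Quoting the pattern prevents the shell expansion.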
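You can see the difference without Nextflow by counting the arguments the shell would hand to a program. This is a minimal sketch assuming bash; the placeholder files are created in a temporary directory and stand in for real FASTQ inputs.

```shell
#!/usr/bin/env bash
# Placeholder files standing in for real FASTQ inputs.
set -euo pipefail

demo=$(mktemp -d)
mkdir -p "$demo/fastq"
touch "$demo/fastq/1.fq.gz" "$demo/fastq/2.fq.gz" "$demo/fastq/3.fq.gz"
cd "$demo"

# Unquoted: the shell expands the glob into one argument per matching file.
set -- --input_files fastq/*.fq.gz
unquoted_count=$#
echo "unquoted: $unquoted_count arguments"   # prints: unquoted: 4 arguments

# Quoted: the pattern reaches the program as one literal string.
set -- --input_files "fastq/*.fq.gz"
quoted_count=$#
quoted_value=$2
echo "quoted: $quoted_count arguments"       # prints: quoted: 2 arguments
echo "value passed: $quoted_value"           # prints: value passed: fastq/*.fq.gz
```

Single quotes (`'fastq/*.fq.gz'`) would prevent the expansion just as well.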

system · March 28, 2025, 1:48pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
