Using IRMA with nextflow

heksaani · August 28, 2025, 9:15am

I have a problem running a Nextflow workflow that I wrote myself.
I am trying to use IRMA (GitHub: CDCgov/irma), the Iterative Refinement Meta-Assembler, a highly configurable and adaptive tool for virus genome assembly.
For some reason, the result files in the folder /sample_name/amended_consensus are named incorrectly.
With this workflow they are named
sample_name_1, sample_name_2, sample_name_3, etc.
but they should be named
sample_name_A_PB2.fa, etc.
My process looks like this:

process run_irma {
    conda './envs/IRMA_env.yml'
    tag "$sample_id"

    publishDir "${params.output_dir}", mode: 'copy'

    input:
    tuple val(sample_id), path(fastqs), val(IRMA_mod), val(seq_type)


    script:
    """
    sample_dir=\$(pwd)/${sample_id}
    echo \$sample_dir
    
    if [ "$seq_type" == 'minion' ] ; then
        echo "using minion"
        conda run -n IRMA IRMA ${IRMA_mod} ${fastqs[0]} \$sample_dir
    elif [ "$seq_type" == 'nextseq' ] || [ "$seq_type" == 'miseq' ]; then
        echo "Not using minion "
        conda run -n IRMA IRMA ${IRMA_mod} ${fastqs[0]} ${fastqs[1]} \$sample_dir
    fi
    """

}

Is there any reason why Nextflow would cause this naming problem? And, more importantly, is there any way to fix it?

mribeirodantas · August 29, 2025, 12:53pm

Hi @heksaani,

Welcome to the community forum!

Could you please share a minimal reproducible example? With this snippet alone, it isn’t easy to debug what’s going on.

The publishDir directive publishes the output files declared by a process to a given location. However, since your process has no output block, nothing from this process should be published there. I believe these files are NOT being published by this process. I suspect they come from other processes, and you are mistaking them for the output of the process you shared.

heksaani · September 1, 2025, 12:30pm

Hey @mribeirodantas ,

I accidentally copied a version of the code that did not include the output block, since I had already tried several versions to resolve the problem described. Sorry about that. Here is the code with the output block, using workflow outputs.

#! /usr/bin/env nextflow

 
process run_irma {
    conda './envs/IRMA_env.yml'
    tag "$sample_id"

    input:
    tuple val(sample_id), path(fastqs), val(mode), val(seq_type)

    output:
    path("${sample_id}"), emit: out

    script:
    """
    sample_dir="${sample_id}"

    if [ $seq_type == 'minion' ] ; then
        conda run -n IRMA IRMA ${mode} ${fastqs[0]} \$sample_dir
    elif [ $seq_type == 'nextseq' ] || [ $seq_type == 'miseq' ]; then
        conda run -n IRMA IRMA ${mode} ${fastqs[0]} ${fastqs[1]} \$sample_dir
    fi
    """

}

And the main.nf:


nextflow.preview.output = true
params.now = new Date().format("yyyy-MM-dd_HH-mm")
params.input_dir = null
params.output_dir = "irma_results_${params.now}"
params.input_type = null // 'nextseq', 'miseq', or 'minion'

/*
Import modules
*/
include { run_irma } from './modules/runIrma.nf'

workflow {
    main:

    if (!params.input_dir) {
        error "Please provide a folder path using --input_dir"
    }

    if (params.input_type == 'nextseq') {
        println "Processing NextSeq data"
        def mode = 'FLU'
        println "Mode is set to $mode "
        sample_files = Channel
        .fromFilePairs("${params.input_dir}/*_{R1,R2}*.fastq.gz")
        .ifEmpty { error "No sample fastqs found in: ${params.input_dir}" }

        irma_inputs  = sample_files
            .map { sample_id, fastqs -> 
                sample_id = sample_id.tokenize('_')[0]
                tuple(sample_id, fastqs, mode, params.input_type)
            }

    } else {
        error "Unsupported input type: ${params.input_type}. Supported types are: nextseq"
    }
    run_irma(irma_inputs)
    publish:
    irma_results = run_irma.out
}


output {
  irma_results {
    path { _path -> "${params.output_dir}" }
  }
}

These produce the following folders for two test samples:

/results/irma_results_2025-09-01_13-54/test_sample1/

/results/irma_results_2025-09-01_13-54/test_sample2/

These folders contain IRMA’s results.

However, the FASTA files in the amended_consensus/ subfolder are misnamed.

Running IRMA directly on the command line, or with conda, produces segment-specific names:

/results/irma_results_2025-09-01_13-54/test_sample1/amended_consensus/test_sample1_HA.fa

/results/irma_results_2025-09-01_13-54/test_sample1/amended_consensus/test_sample1_NA.fa

Running IRMA via Nextflow produces generic numbered names:

amended_consensus/test_sample1_1.fa

amended_consensus/test_sample1_2.fa

The segment names (_HA, _NA) are lost in the Nextflow run. This naming is important for downstream analyses, because these suffixes indicate the viral segment in each FASTA file.

bentsherman · September 16, 2025, 1:38pm

I think this is happening because you staged the input files as a single collection rather than as separate paths.

When you declare the input files as path(fastqs), sometimes they are automatically staged with numbered suffixes like 1, 2, etc. You could try declaring separate inputs:

    input:
    tuple val(sample_id), path(fastq_1), path(fastq_2), val(mode), val(seq_type)

The second path appears to be optional, so you’ll need to pass an empty list [] instead of null when the second path is missing, due to the current limitations with path inputs.

I’m not 100% sure about this, but I think it’s worth trying.
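
One way to check the staging hypothesis is to look inside the task's work directory, where Nextflow links the staged input files and writes the rendered script as .command.sh. The sketch below is only a diagnostic aid: inspect_task_dir is a hypothetical helper name, and the task directory path has to be taken from your own run. It does not change how IRMA names its files.

```shell
# Hypothetical helper: show what a Nextflow task actually received and ran.
# Argument: path to a task work directory (taken from your own run).
inspect_task_dir() {
    local task_dir="$1"

    echo "== staged inputs =="
    # Nextflow stages path inputs as symlinks by default, so list every
    # symlink in the task directory together with its target.
    find "$task_dir" -maxdepth 1 -type l | sort | while read -r link; do
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done

    echo "== rendered script =="
    # .command.sh holds the script after Nextflow variable interpolation.
    if [ -f "$task_dir/.command.sh" ]; then
        cat "$task_dir/.command.sh"
    fi
}
```

If the links carry the original FASTQ names and the rendered IRMA command matches what you run by hand, staging is probably not what changes the output names.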

Muneeb · September 28, 2025, 5:06am

@heksaani

I’m not sure if this is the root cause, but I noticed an unusual setup!
IRMA is being run inside a Conda environment within the Nextflow process using conda run -n IRMA, while the process itself is also configured to activate a Conda environment via the conda './envs/IRMA_env.yml' directive.

This creates a nested Conda environment situation, which may interfere with IRMA’s normal behavior and might be leading to the generic numbered filenames instead of the expected segment-specific names.
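
To see whether the nested setup changes which IRMA executable actually runs, one option is to print the active environment and the resolved path, once in the shell where IRMA names the files as expected and once inside the process script. This is a minimal sketch: report_tool is a hypothetical helper name, and CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated.

```shell
# Hypothetical diagnostic helper: report which executable a command name
# resolves to in the current shell. Returns 1 if it is not on PATH.
report_tool() {
    local tool="$1" resolved
    if resolved=$(command -v "$tool"); then
        echo "$tool resolves to: $resolved"
    else
        echo "$tool not found on PATH"
        return 1
    fi
}

# Print the active conda environment, or "none" if nothing is activated.
echo "active conda env: ${CONDA_DEFAULT_ENV:-none}"

# Do not abort the surrounding script if IRMA is missing; just report it.
report_tool IRMA || true
```

Inside a Nextflow script block the shell variables need to be escaped as \$, as with \$sample_dir in the process above. If the two runs resolve to different IRMA installations, a difference in version or configuration might be worth ruling out before blaming Nextflow itself.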

heksaani · October 8, 2025, 1:10pm

I tried this, but it seems to have had no effect on the results.

heksaani · October 8, 2025, 1:11pm

Oh, good catch, thank you. I also took out the conda call, but it did not affect the file naming error.
