Using IRMA with Nextflow

I have a problem running a Nextflow workflow I wrote myself.
I am trying to use IRMA (GitHub: CDCgov/irma), the Iterative Refinement Meta-Assembler, a highly configurable and adaptive tool for virus genome assembly.
The problem is that, for some reason, the result files in the /sample_name/amended_consensus folder are named incorrectly.
With this workflow they are named
sample_name_1, sample_name_2, sample_name_3, etc.,
but they should be named like
sample_name_A_PB2.fa, etc.
My process looks like this:

process run_irma {
    conda './envs/IRMA_env.yml'
    tag "$sample_id"

    publishDir "${params.output_dir}", mode: 'copy'

    input:
    tuple val(sample_id), path(fastqs), val(IRMA_mod), val(seq_type)


    script:
    """
    sample_dir=\$(pwd)/${sample_id}
    echo \$sample_dir
    
    if [ "$seq_type" == 'minion' ] ; then
        echo "using minion"
        conda run -n IRMA IRMA ${IRMA_mod} ${fastqs[0]} \$sample_dir
    elif [ "$seq_type" == 'nextseq' ] || [ "$seq_type" == 'miseq' ]; then
        echo "Not using minion "
        conda run -n IRMA IRMA ${IRMA_mod} ${fastqs[0]} ${fastqs[1]} \$sample_dir
    fi
    """

}

Is there any reason why Nextflow would cause this naming problem? And, more importantly, is there any way to fix it?

Hi @heksaani,

Welcome to the community forum!

Could you please share a minimal reproducible example? With this snippet alone, it isn’t easy to debug what’s going on.

The publishDir directive publishes the files declared in a process’s output block to a given location (here by copying them, since you set mode: 'copy'). However, as your process has no output block, nothing from this process should be published there. I believe these files are NOT coming from this process: they are outputs of other processes, which you are taking for the output of the process you shared.

Hey @mribeirodantas,

I accidentally copied a version of the code without the output block; I had already tried several versions while trying to solve the problem described above. Sorry about that. Here is the code with the output block, using workflow outputs.

#! /usr/bin/env nextflow

 
process run_irma {
    conda './envs/IRMA_env.yml'
    tag "$sample_id"

    input:
    tuple val(sample_id), path(fastqs), val(mode), val(seq_type)

    output:
    path("${sample_id}"), emit: out

    script:
    """
    sample_dir="${sample_id}"

    if [ $seq_type == 'minion' ] ; then
        conda run -n IRMA IRMA ${mode} ${fastqs[0]} \$sample_dir
    elif [ $seq_type == 'nextseq' ] || [ $seq_type == 'miseq' ]; then
        conda run -n IRMA IRMA ${mode} ${fastqs[0]} ${fastqs[1]} \$sample_dir
    fi
    """

}

And here is main.nf:


nextflow.preview.output = true
params.now = new Date().format("yyyy-MM-dd_HH-mm")
params.input_dir = null
params.output_dir = "irma_results_${params.now}"
params.input_type = null // 'nextseq', 'miseq', or 'minion'

/*
Import modules
*/
include { run_irma } from './modules/runIrma.nf'

workflow {
    main:

    if (!params.input_dir) {
        error "Please provide a folder path using --input_dir"
    }

    if (params.input_type == 'nextseq') {
        println "Processing NextSeq data"
        def mode = 'FLU'
        println "Mode is set to $mode "
        sample_files = Channel
            .fromFilePairs("${params.input_dir}/*_{R1,R2}*.fastq.gz")
            .ifEmpty { error "No sample fastqs found in: ${params.input_dir}" }

        irma_inputs  = sample_files
            .map { sample_id, fastqs -> 
                sample_id = sample_id.tokenize('_')[0]
                tuple(sample_id, fastqs, mode, params.input_type)
            }

    } else {
        error "Unsupported input type: ${params.input_type}. Supported types are: nextseq"
    }
    run_irma(irma_inputs)
    publish:
    irma_results = run_irma.out
}


output {
  irma_results {
    path { _path -> "${params.output_dir}" }
  }
}

These produce the following folders for two test samples:

/results/irma_results_2025-09-01_13-54/test_sample1/

/results/irma_results_2025-09-01_13-54/test_sample2/

These folders contain IRMA’s results.

However, the FASTA files in the amended_consensus/ subfolder are misnamed.

Running IRMA directly on the command line, or with conda, produces segment-specific names:

/results/irma_results_2025-09-01_13-54/test_sample1/amended_consensus/test_sample1_HA.fa

/results/irma_results_2025-09-01_13-54/test_sample1/amended_consensus/test_sample1_NA.fa

Running IRMA via Nextflow produces generic numbered names:

amended_consensus/test_sample1_1.fa

amended_consensus/test_sample1_2.fa

The segment names (_HA, _NA) are lost in the Nextflow run. This naming is important for downstream analyses, because these suffixes indicate the viral segment in each FASTA file.
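To confirm quickly which naming scheme a given run produced, the files in amended_consensus/ can be classified by their suffix. This is a minimal sketch, assuming the <sample>_<suffix>.fa pattern shown above and a sample folder named after the sample (as test_sample1/ is); classify_consensus_names is a hypothetical helper, not part of IRMA or Nextflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IRMA): for each FASTA in
# <sample_dir>/amended_consensus/, report whether the part after
# "<sample>_" is purely numeric (e.g. test_sample1_1.fa -> numbered)
# or not (e.g. test_sample1_HA.fa -> named).
# Assumes the sample folder is named after the sample, as in test_sample1/.
classify_consensus_names() {
    local sample_dir="$1"
    local sample_id name suffix f
    sample_id="$(basename "$sample_dir")"
    for f in "$sample_dir"/amended_consensus/*.fa; do
        [ -e "$f" ] || continue              # glob matched nothing
        name="$(basename "$f" .fa)"          # e.g. test_sample1_HA
        suffix="${name#"${sample_id}"_}"     # strip "<sample>_", e.g. HA
        case "$suffix" in
            ''|*[!0-9]*) echo "named $name.fa" ;;
            *)           echo "numbered $name.fa" ;;
        esac
    done
}
```

For example, classify_consensus_names /results/irma_results_2025-09-01_13-54/test_sample1 prints one "numbered" or "named" line per FASTA file, which makes it easy to compare a Nextflow run against a direct command-line run.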

I think this is happening because you staged the input files as a single collection rather than as separate paths.

When you declare the input files as path(fastqs), sometimes they are automatically staged with numbered suffixes like 1, 2, etc. You could try declaring separate inputs:

    input:
    tuple val(sample_id), path(fastq_1), path(fastq_2), val(mode), val(seq_type)

The second path appears to be optional, so you’ll need to pass an empty list [] instead of null when the second path is missing, due to the current limitations with path inputs.
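On the script side, separate fastq_1/fastq_2 inputs would let you branch on whether R2 is present instead of on seq_type. A minimal sketch, assuming the empty list [] reaches the script as an empty string and following the argument order already used in this thread (IRMA <module> <R1> [R2] <sample_dir>); run_irma_sample is a hypothetical helper name. It calls plain IRMA, so keep the conda run -n IRMA prefix if that is how IRMA is reached in your setup.

```shell
#!/usr/bin/env bash
# Sketch: call IRMA with one or two FASTQ files, following the argument
# order used in this thread: IRMA <module> <R1> [R2] <sample_dir>.
# Assumption: a missing R2 (passed as [] in Nextflow) arrives here as an
# empty string, so checking it replaces the seq_type branches.
run_irma_sample() {
    local mode="$1" r1="$2" r2="$3" sample_dir="$4"
    set -- "$mode" "$r1"            # start the argument list
    if [ -n "$r2" ]; then
        set -- "$@" "$r2"           # paired-end (NextSeq/MiSeq): append R2
    fi
    set -- "$@" "$sample_dir"       # output directory always goes last
    IRMA "$@"
}
```

Building the arguments with set -- keeps each file name as a single word even if it contains spaces, which an echoed or concatenated command string would not.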

I’m not 100% sure about this, but I think it’s worth trying.

@heksaani

I’m not sure if this is the root cause, but I noticed an unusual setup!
Inside the Nextflow process, IRMA is run in a Conda environment via conda run -n IRMA, while the process itself is also configured to activate a Conda environment via the conda './envs/IRMA_env.yml' directive.

This creates a nested Conda environment, which may interfere with IRMA’s normal behavior and could be what produces the generic numbered filenames instead of the expected segment-specific names.
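One way to test this hypothesis is to log, from inside the task, which Conda environment is active and which IRMA executable PATH resolves to, then compare it with the one used by conda run -n IRMA. A sketch for the script: block, assuming bash (Nextflow’s default task shell) and that Conda activation sets CONDA_PREFIX, as it normally does; report_irma_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic for the process's script: block. Prints the
# Conda environment active for the task (CONDA_PREFIX is normally set by
# Conda activation) and the IRMA executable that PATH resolves to.
report_irma_env() {
    echo "CONDA_PREFIX=${CONDA_PREFIX:-<unset>}"
    local irma_bin
    irma_bin="$(command -v IRMA 2>/dev/null || true)"
    echo "IRMA on PATH: ${irma_bin:-not found}"
}
```

Calling report_irma_env next to conda run -n IRMA which IRMA in the same task shows whether the two point to different installations; if they do, the directive-activated environment and the nested one may not be running the same IRMA build or configuration.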