Amending workflow to work with docker

Hello all, I am trying to run a workflow that was written with Singularity containers in mind, but I now only have access to a system with Docker. The workflow can be found here: GitHub - stenglein-lab/read_preprocessing: A nextflow workflow / pipeline to perform common NGS read preprocessing. This can be used as stand-alone or can be used as subworkflow in another nextflow pipeline. I went through each module and it seems like they have Docker containers specified (?). I added the docker profile to the config file so I can select it in the run command.

When I run the pipeline I get this error:

Caused by:
Missing container image for process MAIN_WORKFLOW:PREPROCESS_READS:MARSHAL_FASTQ:CHECK_FASTQ_COMPRESSED

But this process doesn’t have a container because it is written entirely in Groovy. Is this error caused by that first process itself, or by another process elsewhere in the pipeline that is missing a Docker container?

If anyone could point me to some type of documentation on how to specify multiple profiles in each module that would also be helpful.

Thanks for your help!

Hi Lexi

The CHECK_FASTQ_COMPRESSED process you linked is more than just Groovy: it generates a bash command that is run inside the container. It’s a very unusual way of checking file extensions, but it is still generating bash, and that bash needs to run in a container. Most minimal bash containers will suffice, e.g. one from Quay.

In Nextflow configuration, you can add the container with:

process {
    withName: CHECK_FASTQ_COMPRESSED {
        container = 'quay.io/nextflow/bash'
    }
}
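In case it helps, here is a minimal sketch of how that selector could sit next to container-engine profiles in nextflow.config. The profile names docker and singularity are an assumption (they are common conventions, not taken from this pipeline's config), so adjust them to whatever your config already defines. Only the container name comes from the snippet above.

```groovy
// nextflow.config (sketch; profile names are assumptions, not from the pipeline)
profiles {
    // Selected with: -profile docker
    docker {
        docker.enabled      = true
        singularity.enabled = false
    }
    // Selected with: -profile singularity
    singularity {
        singularity.enabled = true
        docker.enabled      = false
    }
}

process {
    // Give the bash-generating process a minimal container to run in
    withName: CHECK_FASTQ_COMPRESSED {
        container = 'quay.io/nextflow/bash'
    }
}
```

With something like this in place, the engine is chosen at launch time, e.g. nextflow run main.nf -profile docker.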

I think that was it! Thank you!

Note that you should be able to make the process pure Groovy (no containers) if you want to. That said, a more standard way to do this would be in the workflow itself, with a map operator on your input channel.

Here’s a mini Nextflow pipeline to show the kind of thing I mean:

#!/usr/bin/env nextflow

params.input = '*.fastq.gz'
params.single_end = false

workflow {
    // Create input channel based on single_end parameter
    if (params.single_end) {
        input_ch = Channel.fromPath(params.input)
            .map { file -> [file.baseName, file] }
    } else {
        input_ch = Channel.fromFilePairs(params.input, size: 2)
    }
    
    // Check filename extensions using map operator
    checked_ch = input_ch.map { meta, files ->
        def fileList = files instanceof List ? files : [files]
        fileList.each { file ->
            if (!file.toString().endsWith('.gz')) {
                error("Error: Input file ${file.toString()} does not end with .gz")
            }
        }
        return [meta, files]
    }
    
    checked_ch.view()
}

Example run:

❯ nextflow run main.nf --input "sample*_R{1,2}.*"

 N E X T F L O W   ~  version 25.04.6

Launching `main.nf` [distracted_swirles] DSL2 - revision: 097973ef0d

Error: Input file /testing/sample2_R1.fastq does not end with .gz

This approach is better because it’ll run in the head Nextflow process almost instantaneously, without needing to submit a job to the compute nodes / cloud and without needing to provision a software container (all of which can take several minutes).

Hope that helps!

Thank you! I’ll take this into account in the future!
