When I run the pipeline I get this error:

```
Caused by:
  Missing container image for process MAIN_WORKFLOW:PREPROCESS_READS:MARSHAL_FASTQ:CHECK_FASTQ_COMPRESSED
```
But this process doesn’t use a container, because it is written entirely in Groovy. Is this error caused by a problem with that process itself, or by some other process in the pipeline that is missing a Docker container?
It would also help if anyone could point me to documentation on how to specify multiple profiles in each module.
The CHECK_FASTQ_COMPRESSED you linked is more than just Groovy: it generates a bash command that runs inside the container. It’s a very unusual way of checking file extensions, but it still produces a bash script, and that script needs a container to run in. Most minimal bash containers will suffice, e.g. `quay.io/nextflow/bash` on Quay.
In Nextflow configuration, you can add the container with:
```groovy
process {
    withName: CHECK_FASTQ_COMPRESSED {
        container = 'quay.io/nextflow/bash'
    }
}
```
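On the profiles question, here is a minimal `nextflow.config` sketch that defines two profiles and sets the container per process. The profile names `docker` and `singularity` are only conventions (assumed here, not required by Nextflow). Because Nextflow's Singularity support can pull Docker/Quay images, a single `container` setting can usually serve both profiles.

```groovy
// nextflow.config (sketch; profile names are assumptions)
profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}

process {
    // Applies whichever profile is active; the selector matches the
    // simple process name even when it is nested inside subworkflows
    withName: 'CHECK_FASTQ_COMPRESSED' {
        container = 'quay.io/nextflow/bash'
    }
}
```

Select a profile at run time with `nextflow run main.nf -profile docker` (or `-profile singularity`); several profiles can be combined with commas, e.g. `-profile docker,test`, where `test` would be a hypothetical extra profile of your own.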
Note that you could also make the process pure Groovy (no container) if you want to. A more standard approach, though, is to do this check in the workflow itself, with a map operator on your input channel.
Here’s a mini Nextflow pipeline to show the kind of thing I mean:
```groovy
#!/usr/bin/env nextflow

params.input = '*.fastq.gz'
params.single_end = false

workflow {
    // Create input channel based on single_end parameter
    if (params.single_end) {
        input_ch = Channel.fromPath(params.input)
            .map { file -> [file.baseName, file] }
    } else {
        input_ch = Channel.fromFilePairs(params.input, size: 2)
    }

    // Check filename extensions using map operator
    checked_ch = input_ch.map { meta, files ->
        def fileList = files instanceof List ? files : [files]
        fileList.each { file ->
            if (!file.toString().endsWith('.gz')) {
                error("Error: Input file ${file.toString()} does not end with .gz")
            }
        }
        return [meta, files]
    }

    checked_ch.view()
}
```
Example run:
```
❯ nextflow run main.nf --input "sample*_R{1,2}.*"
N E X T F L O W ~ version 25.04.6
Launching `main.nf` [distracted_swirles] DSL2 - revision: 097973ef0d
Error: Input file /testing/sample2_R1.fastq does not end with .gz
```
This approach is better because it runs in the head Nextflow process almost instantly, with no need to submit a job to the compute nodes or cloud, or to provision a software container (steps that can each take several minutes).
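If you do want to keep CHECK_FASTQ_COMPRESSED as a process, the pure-Groovy option mentioned earlier could look roughly like the untested sketch below. It uses an `exec:` block, which runs native Groovy code instead of a bash script, so no container is involved. The output shape (a `compressed` flag per sample) and the input glob are assumptions for illustration, not the original module's interface.

```groovy
#!/usr/bin/env nextflow

// Hypothetical input glob; adjust to your data
params.input = 'sample*_R{1,2}.*'

// Sketch: pure-Groovy check, no script block and therefore no container
process CHECK_FASTQ_COMPRESSED {
    input:
    // `val` rather than `path`: the file names are only inspected, not staged
    tuple val(meta), val(reads)

    output:
    tuple val(meta), val(reads), val(compressed)

    exec:
    // fromFilePairs gives a list; a single-end channel would give one file
    def fileList = reads instanceof List ? reads : [reads]
    // true only if every file name ends with .gz
    compressed = fileList.every { it.toString().endsWith('.gz') }
}

workflow {
    reads_ch = Channel.fromFilePairs(params.input, size: 2)
    CHECK_FASTQ_COMPRESSED(reads_ch)
    CHECK_FASTQ_COMPRESSED.out.view()
}
```

This avoids the container, but each sample still goes through Nextflow's task machinery (a task and work directory per sample), so the map operator version remains the lighter option.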