There’s an important concept here: you need to operate on the contents of a channel, not the channel itself. Let me explain…
When you write if(combined_fastq.check.ifEmpty(true)), you are saying the following: “if the channel combined_fastq.check is empty, view the reuse channel”.
But this doesn’t really make sense, because a channel is an object. It’s a way of chaining processes together, so when you check it you are asking “hey, does this channel contain anything?”, and of course the channel does contain stuff, because you just populated it from the process. If you check the documentation for the ifEmpty operator, you will see that it emits a default value when the source channel is empty, so your example code is creating a new channel that would emit the value true within the if statement, instead of filtering on a criterion. Darn!
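To make the behaviour of ifEmpty concrete, here is a minimal sketch. It is not taken from your pipeline; the two channels and the 'nothing here' default are made up purely for illustration.

```nextflow
workflow {
    // Source channel is empty, so ifEmpty emits the default value instead
    Channel.empty()
        .ifEmpty('nothing here')
        .view { v -> "empty source: ${v}" }

    // Source channel has items, so they pass through and the default is ignored
    Channel.of(1, 2, 3)
        .ifEmpty('nothing here')
        .view { v -> "populated source: ${v}" }
}
```

In both cases the result of ifEmpty is still a channel, not a true/false value, which is why it does not do what you expect inside an if statement.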
What you want to do is check the contents of the channel, to see if each item contains a value for frum_fastq. If we rephrase your question slightly…
“if items in the channel combined_fastq.reuse contain frum_fastqs, view them”
Now it becomes clearer what to do. We should look inside the items within the combined_fastq outputs and see if they include frum_fastqs.
Based on your samplesheet, we can actually do this without running a process at all! Let’s have a go. I’ve saved your example samplesheet into a file called input.csv.
workflow {
    samplesheet_ch = Channel.fromPath("input.csv", checkIfExists: true)
        .splitCsv(header: true) // Split the CSV file into individual items

    // Let's get only the samples that have a valid value for frum_fastq_dir
    samplesheet_ch
        .filter { it.frum_fastq_dir }
        .view()
}
This should write the following to your terminal:
> nextflow run .
N E X T F L O W ~ version 23.10.1
Launching `./main.nf` [chaotic_cray] DSL2 - revision: 62eb8c2da7
[sample:sample02_run1, samplename:sample02, orderid:ord_02, fastq_dir:/sample02/b03/*.fastq.gz, reference:, frum_fastq_dir:/fastq_pass/b01/*fastq.gz]
The first part is reading the samplesheet and parsing it using the splitCsv operator.
The second part uses filter to remove any samples from the channel that do not contain a value for frum_fastq_dir. This leaves us with a single sample, the one that has a frum_fastq_dir value.
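The filter works because of Groovy truth: an empty string or null counts as false, while a non-empty string counts as true. Here is a small sketch you can run without a samplesheet. The two maps are hypothetical stand-ins for rows of your CSV, and I am assuming an empty cell arrives as an empty string, which is what the output above suggests.

```nextflow
workflow {
    Channel.of(
            [sample: 'sampleA', frum_fastq_dir: ''],                     // hypothetical row, empty cell
            [sample: 'sampleB', frum_fastq_dir: '/some/dir/*.fastq.gz']  // hypothetical row, value present
        )
        .filter { row -> row.frum_fastq_dir }  // empty string is falsy, so sampleA is dropped
        .view()
}
```

Only the sampleB item should come out of the channel.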
So how can we use this? Well, it depends on exactly what you want to do, but let’s imagine you want to run PREPROCESS_FASTQ on samples that do not contain frum_fastqs. We could do this:
process PREPROCESS_FASTQ {
    input:
    tuple val(sample_id), val(order_id), path(fastqs)

    output:
    tuple val(sample_id), val(order_id), path("${sample_id}_a_all.fastq.gz")

    script:
    """
    echo "myfastqdatagoeshere" | gzip > ${sample_id}_a_all.fastq.gz
    """
}

workflow {
    samplesheet_ch = Channel.fromPath("input.csv", checkIfExists: true)
        .splitCsv(header: true) // Split the CSV file into individual items

    samplesheet_ch
        // Let's keep only the samples that do NOT have a value for frum_fastq_dir
        .filter { !it.frum_fastq_dir }
        // We use a map to make the channel fit the input tuple of the process.
        // The keys (sample, orderid) are the column names printed in the output above;
        // adjust them if your samplesheet header differs.
        .map { meta ->
            tuple(meta.sample, meta.orderid, file(meta.fastq_dir, checkIfExists: true))
        }
        .set { for_preprocessing_ch }

    for_preprocessing_ch.view() // I put this here for debugging. It can be removed.

    PREPROCESS_FASTQ(for_preprocessing_ch)
}
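If you need both groups at once (samples with and without frum_fastq_dir), one option is the branch operator, which splits a channel into named outputs in a single step. This is only a sketch: the output names reuse and check are borrowed from your question, and I am guessing at which group each name is meant to hold.

```nextflow
workflow {
    samplesheet_ch = Channel.fromPath("input.csv", checkIfExists: true)
        .splitCsv(header: true)

    samplesheet_ch
        .branch { row ->
            reuse: row.frum_fastq_dir  // assumed meaning: samples that already have frum fastqs
            check: true                // assumed meaning: every other sample
        }
        .set { split_ch }

    // Each named output is an ordinary channel you can view or pass to a process
    split_ch.reuse.view { row -> "reuse: ${row}" }
    split_ch.check.view { row -> "check: ${row}" }
}
```

Each item goes to the first branch whose condition is true, so the catch-all condition has to come last.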
In conclusion, it’s possible to react to an empty channel using ifEmpty; however, I’m not sure this is what you want to achieve. Instead, think about operating on the contents of your channels and using them to connect your processes together and build your pipeline. I hope this helps!