Input cardinality issue after collect

complexgenome · February 13, 2024, 1:50pm

I’m learning to use collect after a process has completed. I’d like to send the output from collect to the next process.

The process has multiple output channels, so I use set to put each one in a variable. The two variables are then sent on after collect.

The process for which collect is used:
feature.nf

process featurecounts {
    maxForks 10
    debug true
    errorStrategy 'retry'
    maxRetries 2
    publishDir path: "${params.outdir}/${batch}/${timepoint}/RNA/primary/featurecounts/", mode: 'copy'

    input:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path(markdupli_bam, stageAs: 'feature_temp/*')

    output:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("subreadout.fc.txt"), emit: foldchange
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("subreadout.fc.txt.summary"), emit: foldchange_summary

    script:
    """
    /data1/software/subread-2.0.6-Linux-x86_64/bin/featureCounts -a $params.gtf_annotation_file -T 24 -o "subreadout.fc.txt" -p ${markdupli_bam}
    """
}

How the collect output is then passed to the merge process:

process merge_feature {
    debug true
    errorStrategy 'retry'
    maxRetries 2
    publishDir path: "${params.outdir}/secondary_RNA/merged_featurecounts/", mode: 'copy'

    input:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path('*.txt')
    tuple val(batch), val(patient_id_tumor), val(timepoint), path('*.txt')

    output:
    path("*.{csv}")

    script:
    """
    Rscript /data1/software/Rscripts/Daphni2_scripts/RNA_Merge_NextFlow.R $params.outdir ./
    """
}

main.nf


include { featurecounts } from './feature.nf'
include { merge_feature } from './multimerge.nf'

featurecounts.out.foldchange | collect | set { out_foldchange }

featurecounts.out.foldchange_summary | collect | set { out_foldchange_summary }

merge_feature(out_foldchange,out_foldchange_summary)

Error:

WARN: Input tuple does not match input set cardinality declared by process rna:merge_feature – offending value: [SEMA-MM-002, MM-2692-T-01_T, MM-2692-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/36/cf52a6cdd41aca4d17b0093a87b5b7/subreadout.fc.txt, SEMA-MM-002, MM-3530-T-01_T, MM-3530-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/96/50ba577872a89b148cad5cc71757a1/subreadout.fc.txt, SEMA-MM-004, MM-4607-T-01_T, MM-4607-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/cd/e6fb64fa8fb22128bfb1e83776466d/subreadout.fc.txt, SEMA-MM-002, MM-0169-T-08_T, MM-0169-T-08, /mnt/data1/users/sanjeev/nextflow/batch/work/7d/8c4584ce0fff1b2793ef65065f7a72/subreadout.fc.txt, SEMA-MM-004, MM-0245-T-01_T, MM-0245-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/55/d06179893da6589657f6666a3c9d8d/subreadout.fc.txt, SEMA-MM-004, MM-2645-T-01_T, MM-2645-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/a9/bd7cddc7090b69732d3013243ebaa8/subreadout.fc.txt]

Caused by:
Process rna:merge_feature input file name collision – There are multiple input files for each of the following file names: .txt

I do not understand how to accept the list from collect in the multimerge process to avoid the cardinality warning.

mribeirodantas · February 13, 2024, 2:44pm

The flatten channel operator seems to be what you’re looking for. It will “un-collect” the channel.
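To make the effect of these two operators concrete, here is a minimal sketch. The channel contents are hypothetical stand-ins for the (batch, patient, timepoint, file) tuples in the question, with plain strings in place of real files. As far as I understand the operators, collect merges all tuples into one flat list by default, and flatten then emits every element of that list as a separate item.

```nextflow
workflow {
    // Hypothetical stand-in for featurecounts.out.foldchange
    Channel
        .of(
            ['SEMA-MM-002', 'MM-2692-T-01_T', 'MM-2692-T-01', 'a.fc.txt'],
            ['SEMA-MM-004', 'MM-4607-T-01_T', 'MM-4607-T-01', 'b.fc.txt']
        )
        .set { ch }

    // collect() flattens nested lists by default:
    // a single item holding all 8 values in one list
    ch.collect().view { "collect: ${it}" }

    // flatten() breaks that list apart again:
    // 8 separate items, each one a single value
    ch.collect().flatten().view { "flatten: ${it}" }
}
```

Note that after flatten a downstream process runs once per value, so a process that declares a four-element tuple input would receive a single value per task rather than a tuple.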

complexgenome · February 13, 2024, 2:57pm

@mribeirodantas
Thank you. It prints them nicely, but I still get an error:

WARN: Input tuple does not match input set cardinality declared by process rna:merge_feature – offending value: SEMA-MM-001
ERROR ~ Error executing process > ‘rna:merge_feature (48)’

featurecounts.out.foldchange | collect | flatten | set { out_foldchange }
featurecounts.out.foldchange_summary | collect | flatten | set { out_foldchange_summary }

If I change input to:

process merge_feature {
    publishDir path: "${params.outdir}/secondary_RNA/merged_featurecounts/", mode: 'copy' 

input:

 path ('*.txt')
 path ('*.txt')

I get error:

ERROR ~ Error executing process > ‘rna:merge_feature (20)’

Caused by:
Process rna:merge_feature input file name collision – There are multiple input files for each of the following file names: .txt

How do I proceed?

Thank you again for your inputs, always.

mribeirodantas · February 13, 2024, 4:05pm

You gotta check what your channels look like after collect and flatten. For collect, for example, I believe you need flat: false as an option.

...
| collect(flat: false)
...
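A small sketch of the difference, using hypothetical string values in place of the real tuples and files from the question:

```nextflow
workflow {
    // Hypothetical stand-in for a channel of (batch, patient, timepoint, file) tuples
    Channel
        .of(
            ['SEMA-MM-002', 'MM-2692-T-01_T', 'MM-2692-T-01', 'a.fc.txt'],
            ['SEMA-MM-004', 'MM-4607-T-01_T', 'MM-4607-T-01', 'b.fc.txt']
        )
        .set { ch }

    // Default (flat: true): the tuples are merged into one flat list of 8 values
    ch.collect().view { "default: ${it}" }

    // flat: false: each tuple is kept intact, giving one list of 2 nested tuples
    ch.collect(flat: false).view { "flat: false: ${it}" }
}
```

Comparing the two view outputs side by side should show whether the tuple boundaries survive the collect step.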

And based on the structure I’m trying to guess, I would say you’d rather need flatMap instead of flatten. Check the snippet below.

process FOO {
  input:
  val x
  val y

  output:
  tuple val(x), val(y)

  script:
  x = x + 1
  y = y + 1
  """
  """
}

process BAR {
  input:
  tuple val(x), val(y)

  output:
  stdout

  script:
  """
  echo ${x}+${y}
  """
}

workflow {
  Channel
    .of(1..10)
    .set { ch1 }
  Channel
    .of(11..20)
    .set { ch2 }

  FOO(ch1, ch2)
  | collect(flat: false)
  | flatMap()
  | BAR
  | view
}

Feel free to comment out the | BAR line, or other lines before it, to see what the output looks like at that point. The dump channel operator is useful for this type of debugging.
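For the merge step in the question specifically, another option might be to pass only the files to the merge process and give the two inputs different name patterns, so they cannot collide when staged. This is a sketch under assumptions, not something tested against the original pipeline: make_counts and merge_sketch are hypothetical stand-ins for featurecounts and merge_feature, the fc_*.txt and summary_*.txt patterns are made-up names, and it assumes the R script does not depend on the original file names or on the batch, patient, and timepoint values.

```nextflow
// Hypothetical stand-in for featurecounts: every task writes identically named files
process make_counts {
    input:
    tuple val(batch), val(patient_id_tumor), val(timepoint)

    output:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("subreadout.fc.txt"), emit: foldchange
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("subreadout.fc.txt.summary"), emit: foldchange_summary

    script:
    """
    echo ${patient_id_tumor} > subreadout.fc.txt
    echo ${patient_id_tumor} > subreadout.fc.txt.summary
    """
}

// Hypothetical stand-in for merge_feature: it only lists the staged files
process merge_sketch {
    input:
    // Two different patterns, so the two file sets get distinct staged names
    path('fc_*.txt')
    path('summary_*.txt')

    output:
    stdout

    script:
    """
    ls fc_*.txt summary_*.txt
    """
}

workflow {
    Channel
        .of(
            ['SEMA-MM-002', 'MM-2692-T-01_T', 'MM-2692-T-01'],
            ['SEMA-MM-004', 'MM-4607-T-01_T', 'MM-4607-T-01']
        )
        .set { samples }

    make_counts(samples)

    // Keep only the file (4th tuple element), then gather all files into one list
    make_counts.out.foldchange.map { it[3] }.collect().set { fc_files }
    make_counts.out.foldchange_summary.map { it[3] }.collect().set { summary_files }

    // One task receives all files at once
    merge_sketch(fc_files, summary_files) | view
}
```

Because each collect emits a single list, the merge process should run once with all files rather than once per sample. If the sample identifiers are needed downstream, keeping the tuples with collect(flat: false) as described above is probably the better route.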

complexgenome · February 15, 2024, 4:13pm

Thank you.

system · February 22, 2024, 4:14pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
