Input cardinality issue after collect

I’m learning to use collect after a process has completed, and I’d like to send the output from collect to the next process.

The process has multiple output channels, so I use set to assign each one to a variable. The two variables are then passed on after collect.

The process for which collect is used:
feature.nf

process featurecounts {
    maxForks 10
    debug true
    errorStrategy 'retry'
    maxRetries 2
    publishDir path: "${params.outdir}/${batch}/${timepoint}/RNA/primary/featurecounts/", mode: 'copy'

    input:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path(markdupli_bam, stageAs: 'feature_temp/*')

    output:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("subreadout.fc.txt"), emit: foldchange
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("subreadout.fc.txt.summary"), emit: foldchange_summary

    script:
    """
    /data1/software/subread-2.0.6-Linux-x86_64/bin/featureCounts -a $params.gtf_annotation_file -T 24 -o "subreadout.fc.txt" -p ${markdupli_bam}
    """
}

The process that receives the collected output (multimerge.nf):

process merge_feature {
    debug true
    errorStrategy 'retry'
    maxRetries 2
    publishDir path: "${params.outdir}/secondary_RNA/merged_featurecounts/", mode: 'copy'

    input:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path('*.txt')
    tuple val(batch), val(patient_id_tumor), val(timepoint), path('*.txt')

    output:
    path("*.{csv}")

    script:
    """
    Rscript /data1/software/Rscripts/Daphni2_scripts/RNA_Merge_NextFlow.R $params.outdir ./
    """
}

main.nf


include { featurecounts } from './feature.nf'
include { merge_feature } from './multimerge.nf'

featurecounts.out.foldchange | collect | set { out_foldchange }

featurecounts.out.foldchange_summary | collect | set { out_foldchange_summary }

merge_feature(out_foldchange,out_foldchange_summary)

Error:

WARN: Input tuple does not match input set cardinality declared by process rna:merge_feature – offending value: [SEMA-MM-002, MM-2692-T-01_T, MM-2692-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/36/cf52a6cdd41aca4d17b0093a87b5b7/subreadout.fc.txt, SEMA-MM-002, MM-3530-T-01_T, MM-3530-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/96/50ba577872a89b148cad5cc71757a1/subreadout.fc.txt, SEMA-MM-004, MM-4607-T-01_T, MM-4607-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/cd/e6fb64fa8fb22128bfb1e83776466d/subreadout.fc.txt, SEMA-MM-002, MM-0169-T-08_T, MM-0169-T-08, /mnt/data1/users/sanjeev/nextflow/batch/work/7d/8c4584ce0fff1b2793ef65065f7a72/subreadout.fc.txt, SEMA-MM-004, MM-0245-T-01_T, MM-0245-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/55/d06179893da6589657f6666a3c9d8d/subreadout.fc.txt, SEMA-MM-004, MM-2645-T-01_T, MM-2645-T-01, /mnt/data1/users/sanjeev/nextflow/batch/work/a9/bd7cddc7090b69732d3013243ebaa8/subreadout.fc.txt]

Caused by:
Process rna:merge_feature input file name collision – There are multiple input files for each of the following file names: .txt

I do not understand how to accept the list from collect in the multimerge process to avoid the cardinality warning.

The flatten channel operator seems to be what you’re looking for. It will “un-collect” the channel.

@mribeirodantas
Thank you. It prints them nicely, but I still get an error:

WARN: Input tuple does not match input set cardinality declared by process rna:merge_feature – offending value: SEMA-MM-001
ERROR ~ Error executing process > ‘rna:merge_feature (48)’

featurecounts.out.foldchange | collect | flatten | set { out_foldchange }
featurecounts.out.foldchange_summary | collect | flatten | set { out_foldchange_summary }

If I change input to:

process merge_feature {
    publishDir path: "${params.outdir}/secondary_RNA/merged_featurecounts/", mode: 'copy' 

input:

 path ('*.txt')
 path ('*.txt')

I get the error:

ERROR ~ Error executing process > ‘rna:merge_feature (20)’

Caused by:
Process rna:merge_feature input file name collision – There are multiple input files for each of the following file names: .txt

How do I proceed?

Thank you again for your inputs, always.
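
One possible reading of the collision error, offered as a sketch rather than a confirmed fix: when a single file arrives on a path input declared as '*.txt', the wildcard is replaced by nothing, so both inputs try to stage their file under the same name, .txt. Giving each input its own name pattern and passing only the collected files (without the batch, patient and timepoint values) should avoid both the cardinality warning and the collision. The workflow name MERGE_ALL and the patterns counts_*.txt and summary_*.txt are hypothetical, and the R script may need adjusting to look for files with those names.

```nextflow
process merge_feature {
    publishDir path: "${params.outdir}/secondary_RNA/merged_featurecounts/", mode: 'copy'

    input:
    // Distinct name patterns: files are staged as counts_1.txt, counts_2.txt, ...
    path 'counts_*.txt'
    // ... and summary_1.txt, summary_2.txt, ... so the two inputs cannot collide
    path 'summary_*.txt'

    output:
    path "*.csv"

    script:
    """
    Rscript /data1/software/Rscripts/Daphni2_scripts/RNA_Merge_NextFlow.R $params.outdir ./
    """
}

// Hypothetical wrapper workflow; the two inputs are expected to be
// featurecounts.out.foldchange and featurecounts.out.foldchange_summary
workflow MERGE_ALL {
    take:
    foldchange          // items: [batch, patient_id_tumor, timepoint, file]
    foldchange_summary  // items: [batch, patient_id_tumor, timepoint, file]

    main:
    // Keep only the file from each tuple, then gather all files into one list
    counts = foldchange
        .map { batch, patient_id_tumor, timepoint, counts_file -> counts_file }
        .collect()
    summaries = foldchange_summary
        .map { batch, patient_id_tumor, timepoint, summary_file -> summary_file }
        .collect()

    // merge_feature now runs once, with every file staged in its work directory
    merge_feature(counts, summaries)
}
```

With this shape the merge process runs a single time over all samples, which seems to be the intent of calling collect in the first place.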

You need to check what your channels look like after collect and flatten. For collect, for example, I believe you need flat: false as an option.

...
| collect(flat: false)
...
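
To make the difference concrete, here is a minimal sketch with made-up tuples (the values batchA, p1, a.txt and so on are placeholders, not taken from the pipeline above). By default collect flattens nested lists, which is why the warning in the question shows one long flat list. With flat: false the tuples are kept as items of the collected list.

```nextflow
workflow {
    // Placeholder tuples standing in for [batch, patient, timepoint, file]
    ch = Channel.of(
        ['batchA', 'p1', 't1', 'a.txt'],
        ['batchA', 'p2', 't1', 'b.txt']
    )

    // Default (flat: true): a single flat list, tuple boundaries are lost
    ch.collect().view { "flat:   $it" }

    // flat: false: a single list whose items are the original tuples
    ch.collect(flat: false).view { "nested: $it" }

    // flatMap emits the items of that list again, one tuple at a time
    ch.collect(flat: false).flatMap { it }.view { "tuple:  $it" }
}
```

Note that flatten applied to the flat list would emit every value on its own (each batch name, each file), which matches the single offending value reported in the second warning.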

Based on the structure I’m guessing at, I’d say you need flatMap rather than flatten. Check the snippet below.

process FOO {
  input:
  val x
  val y

  output:
  tuple val(x), val(y)

  script:
  x = x + 1
  y = y + 1
  """
  """
}

process BAR {
  input:
  tuple val(x), val(y)

  output:
  stdout

  script:
  """
  echo ${x}+${y}
  """
}

workflow {
  Channel
    .of(1..10)
    .set { ch1 }
  Channel
    .of(11..20)
    .set { ch2 }

  FOO(ch1, ch2)
  | collect(flat: false)
  | flatMap()
  | BAR
  | view
}

Feel free to comment out the | BAR line, or other lines before it, to see what the output looks like at that point. The dump channel operator is useful for this type of debugging :wink:
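
As a small illustration of dump, the sketch below tags two points in a channel chain. The tag names raw and collected and the tuple values are arbitrary placeholders. Unlike view, dump prints nothing unless the run is started with the -dump-channels option.

```nextflow
workflow {
    Channel
        .of(['batchA', 'p1', 't1'], ['batchA', 'p2', 't1'])
        // Printed only when the 'raw' tag is selected on the command line
        .dump(tag: 'raw')
        .collect(flat: false)
        // Printed only when the 'collected' tag is selected
        .dump(tag: 'collected')
}
```

Assuming the script is saved as main.nf, running nextflow run main.nf -dump-channels raw,collected shows the channel content at both points, so the dump calls can stay in the pipeline without cluttering normal runs.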

Thank you.
