Nextflow training week questions

Hello,

I have a couple of questions:

  1. What is the difference between naming each module’s Nextflow script main.nf and then importing it vs giving each module script its own name? In the genomics module, each module eventually ends up in its own folder with each script named “main.nf”. However, in the rnaseq module I see that each file is named after the process itself (e.g. fastqc.nf). Are both conventions equally used?

  2. In the rnaseq module, the reads are loaded through a channel, “read_ch = channel.fromPath(params.reads)”, but the index file is passed directly as a file: “HISAT2_ALIGN(TRIM_GALORE.out.trimmed_reads, file(params.hisat2_index_zip))”. Could you help me understand why both are not read in using a channel?

  3. Under rnaseq/data, I can’t find the index file for the genome. Can I download it from somewhere else?

Thank you,

Asma

Hi @asmariyaz23, sorry we missed your question!

  1. Naming module files ‘main.nf’ is mostly a convention of the nf-core project. Outside of that context, both are valid. I personally prefer giving distinct names to modules, especially in a training context, because it can get quite confusing when you end up with multiple tabs all called ‘main.nf’ in your code editor.

  2. The file() notation is an alternative, lighter way to pass individual files to a process. The primary use of a channel is to load an arbitrary number of items for the process to consume, such as input samples. The channel has built-in logic to ensure that the process runs once on each distinct element (sample) in the channel, in parallel if possible. In contrast, a reference index file is an accessory resource that you provide unchanged to every invocation of the process, so no such logic is needed. Does that make sense?
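
To make the contrast in point 2 concrete, here is a minimal sketch rather than the actual training code. ALIGN is a hypothetical stand-in process that only echoes its inputs; params.reads and params.hisat2_index_zip are the parameters from the question and are assumed to be supplied on the command line.

```nextflow
// Hypothetical stand-in for an aligner: it only reports which inputs it received
process ALIGN {
    input:
    path reads
    path index

    output:
    stdout

    script:
    """
    echo "aligning ${reads} against ${index}"
    """
}

workflow {
    // Queue channel: one element per matching file, so ALIGN runs once per element
    read_ch = channel.fromPath(params.reads)

    // file() returns a single path; passed directly to a process,
    // it is reused unchanged for every ALIGN task
    ALIGN(read_ch, file(params.hisat2_index_zip))

    // Print what each task reported
    ALIGN.out.view()
}
```

With several read files, this should produce one task per file, each receiving the same index. If the index were loaded with channel.fromPath instead, it would be a queue channel holding a single element, and since queue channels are consumed in step with each other, the process would run only once.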

Thank you! Yes, it does
