Learning more advanced nf-core and Nextflow

Hello, I have a very general question. I’ve been using Nextflow/nf-core for the last few months and have developed several pipelines. It’s amazing how useful this is, so thank you very much to all the developers. I’ve been learning from the boards, most recently the Nextflow Hackathon, this forum, and the Slack channel. I think at this stage I am no longer a beginner: I can get things to work up to a point, but I feel I am missing a deeper understanding. For example: how does Nextflow stage files in the task work directory? What Groovy classes can be used as inputs (can I use a LinkedHashMap)? How exactly is parallelization achieved? How have people learned the more advanced aspects of nf-core and Nextflow? I ask partly because I think that with a broader and deeper understanding I wouldn’t need to bother the list so much!

Hey Ramiro. Great to see your progress and excitement about the community :partying_face:

How does Nextflow stage files in the task work directory?

Nextflow uses the input block of a process to work out which input files each instance of that process (a task) needs. It creates a separate work directory for each task and, by default, symbolically links the required files into it. If a process has an empty input block, nothing is symlinked into the task work directory.
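
To make the idea concrete, here is a minimal Python sketch of the staging step under a simplified model: one directory per task, named from a hash of the task name and its inputs, with each declared input symlinked in under its own file name. The function `stage_task` and its hashing scheme are hypothetical and only loosely mimic the `work/<2 chars>/<rest of hash>` layout; Nextflow's real task hash and staging script are different.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def stage_task(work_root: Path, task_name: str, inputs: list) -> Path:
    """Create a per-task directory and symlink each declared input into it.

    Hypothetical model only: the directory name is an MD5 of the task name
    and input paths, split as work/<first 2 hex chars>/<remaining chars>.
    """
    key = task_name + "|" + "|".join(str(p.resolve()) for p in inputs)
    digest = hashlib.md5(key.encode()).hexdigest()
    task_dir = work_root / digest[:2] / digest[2:]
    task_dir.mkdir(parents=True, exist_ok=True)
    for src in inputs:
        link = task_dir / src.name
        if not link.is_symlink():
            # Link rather than copy: the task sees the file under its own name
            os.symlink(src.resolve(), link)
    return task_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        reads = root / "sample1.fastq"
        reads.write_text("@r1\nACGT\n+\nIIII\n")
        task_dir = stage_task(root / "work", "FASTQC (sample1)", [reads])
        print(sorted(p.name for p in task_dir.iterdir()))  # ['sample1.fastq']
        # An empty input list gives an empty task directory
        print(list(stage_task(root / "work", "NO_INPUTS", []).iterdir()))  # []
```

The point of the model is that the task only "sees" what was declared as input: anything not linked into its directory is simply not there.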

What Groovy classes can be used as inputs?

Process inputs are ALWAYS Nextflow channels. Inside these channels you can have anything: any value of any Groovy class, including a LinkedHashMap (Groovy map literals, such as nf-core’s `meta` maps, are LinkedHashMaps). I think what troubles you is that sometimes you have a list of values while your process expects a single value, but this is not about Groovy classes. It’s about matching what you actually have with what your Nextflow process declares that it receives.
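
As a toy model of that matching, the Python sketch below treats a channel as a plain list of items and a process input declaration as a tuple of slot names, e.g. `("meta", "reads")` for `tuple val(meta), path(reads)`. The helper `bind_inputs` is hypothetical, not Nextflow's actual binding logic, and it raises an error to make a mismatch visible; the real runtime may report it differently. A Python `dict` keeps insertion order, much like a Groovy `LinkedHashMap`.

```python
def bind_inputs(item, declared):
    """Bind one channel item to a process input declaration (toy model).

    A single slot receives the whole item, whatever its type; several slots
    expect a tuple/list item with exactly one element per slot.
    """
    if len(declared) == 1:
        return {declared[0]: item}
    if not isinstance(item, (tuple, list)) or len(item) != len(declared):
        raise ValueError(f"item {item!r} does not match declaration {declared}")
    return dict(zip(declared, item))


if __name__ == "__main__":
    # The item's type does not matter; its *shape* must match the declaration
    meta = {"id": "sample1", "single_end": False}
    item = (meta, ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"])

    print(bind_inputs(item, ("meta", "reads")))
    # A single slot silently takes the whole tuple, which is rarely what you meant
    print(bind_inputs(item, ("reads",)))
    try:
        bind_inputs(["a.fastq", "b.fastq", "c.fastq"], ("meta", "reads"))
    except ValueError as err:
        print("mismatch:", err)
```

The map itself is never the problem here; the failures come from a three-element list meeting a two-slot declaration, or a whole tuple landing in one slot.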

How exactly is parallelization achieved?

Parallelization is implicit: it follows from the way you structure your logic in Nextflow. If you want to run a command for each input, in Nextflow that means a channel with N files and a process that runs the command. Feeding this channel to the process generates N tasks, and Nextflow will try to run them in parallel, limited by the resources available to the executor.
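
The same idea can be sketched in Python with a thread pool: each channel item becomes one task, all tasks are submitted at once, and the pool runs as many concurrently as it has workers. This is only an analogy, not how Nextflow schedules work; the hypothetical `max_parallel` argument loosely plays the role of executor capacity or the `maxForks` process directive.

```python
from concurrent.futures import ThreadPoolExecutor


def run_process(channel, command, max_parallel=4):
    """Run `command` once per channel item, concurrently: N items -> N tasks.

    Results are collected in submission order for convenience; real Nextflow
    output channels do not guarantee that tasks emit in input order.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(command, item) for item in channel]
        return [f.result() for f in futures]


if __name__ == "__main__":
    files = ["s1.fastq", "s2.fastq", "s3.fastq"]
    # Three items -> three tasks
    print(run_process(files, str.upper))  # ['S1.FASTQ', 'S2.FASTQ', 'S3.FASTQ']
    # One item that happens to be a list -> a single task that sees all three files
    print(run_process([files], len))  # [3]
```

The second call is the parallelization version of the list-versus-single-value issue: wrapping everything in one list gives one task, not three.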

How have people learned more advanced things about nf-core and Nextflow?

Practice. The more Nextflow code you write and run, the more different situations you’ll run into, and each one will improve your knowledge. Participating in the community also helps, for example by writing nf-core modules and contributing to nf-core subworkflows, pipelines and so on. There is advanced training material too, such as here, but practice is the way :slight_smile:


To throw in a couple of extras:

  1. If you really want to understand the nitty-gritty of staging, I found it very helpful to study the .command.run file in a task’s work directory. This is the script Nextflow runs when the task starts; it prepares everything, including staging the inputs, before executing .command.sh.

  2. Point 1 can also help you understand parallelism (if you mean executing multiple processes simultaneously): you can see what information Nextflow embeds into an HPC batch script, for example, and what other information it records for tracking.

  3. What Marcel said: write lots of Nextflow code. The more workflows you write, the more edge cases you’ll come across that require a better understanding of some aspect of Nextflow, gained either by experimentation or by asking the community :slight_smile:
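
If you want to study those files across a whole run, a small script can help. The Python sketch below walks a work directory, assuming the usual `work/<xx>/<hash>/` layout, and reports each task's `.exitcode` together with any `ln -s` lines in its `.command.run`, which is where symlink-based input staging should show up under the default stage mode. The helper name `summarize_work_dir` is hypothetical, and the `ln -s` heuristic is an assumption about the generated script's content, not a documented format.

```python
from pathlib import Path


def summarize_work_dir(work_root):
    """Yield (task_dir, exit_code, staged_links) for every task under work_root.

    Assumes the work/<xx>/<hash>/ layout and that symlink staging appears in
    .command.run as lines starting with `ln -s` (a heuristic, not a spec).
    """
    for run_file in sorted(Path(work_root).glob("*/*/.command.run")):
        task_dir = run_file.parent
        exit_file = task_dir / ".exitcode"
        # No .exitcode yet usually means the task has not finished
        exit_code = exit_file.read_text().strip() if exit_file.exists() else None
        links = [line.strip() for line in run_file.read_text().splitlines()
                 if line.strip().startswith("ln -s")]
        yield task_dir, exit_code, links


if __name__ == "__main__":
    for task_dir, exit_code, links in summarize_work_dir("work"):
        print(f"{task_dir}  exit={exit_code}  staged={len(links)}")
        for link in links:
            print("    ", link)
```

Reading the full .command.run of one interesting task is still the best way to learn; a script like this just helps you pick which task to open.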


Agreed. Practice and application are the key to getting a better understanding. Use Nextflow whenever you can. Answer questions in the community spaces and try things out.

One skill I’ve developed over time is the ability to reduce a problem to the key things I’m trying to investigate: minimal reproducible examples. I use my Nextflow Sandbox nearly daily to explore some new aspect of Nextflow I’m trying to figure out.


Thank you very much for this!!! I see that continuing to develop workflows is the way. I’ll also take the Nextflow Sandbox and the advanced training as great resources!
