Nextflow 64 KB file size limit


I may need to convert a Nextflow DSL1 pipeline to DSL2. I read that there is a 64 KB file size limit for DSL2, and that the limit is imposed by the Groovy shell. When I searched for more information, I found that Java methods have a 64 KB size limit, but nothing about a limit in Groovy shells.

Are these the same limit?

If so, what constitutes a Java method in the context of Nextflow? Is it a file, a Groovy function, a Nextflow process, or something else?

If functions and processes are methods, is it reasonable to assume that each individual method in a file cannot exceed 64 KB, but that there is no limit on the combined size of all the methods in a file?

I ask because my DSL1 scripts are considerably larger than 64 KB, and I would prefer not to break them up because that creates other complications.

Thank you!

Nextflow uses the Groovy compiler to execute Nextflow scripts, so each script is subject to a 64 KB size limit. I don’t remember offhand exactly where the limit comes from or why DSL2 introduced it. We may find a way around it one day, but for now the best thing you can do is split your scripts into modules.

The hardest part of migrating to DSL2 is usually moving all of the channel logic into workflows; after that, splitting the code into modules is fairly straightforward. I’m curious: what complications do you have in mind?

Hi Ben,

Thank you for the background information.

I infer that you recommend making each module invokable as a workflow rather than as functions or processes.

The complication that concerns me is that the script uses functions inserted between processes to ‘condition’ the data for the downstream processes. The conditioning consists primarily of checking that the expected files are in the channel, combining the paths with the additional information required for processing (gathered from global Groovy objects, both maps and scalars, that are passed in as function arguments), and recombining everything into tuples suitable for injection into a channel.
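A minimal sketch of the kind of conditioning function described above, written as it might look in a DSL2 script. Everything here is a hypothetical stand-in, not taken from the original pipeline: the function name `conditionSample`, the `sampleInfo` map with a `readGroup` field, the `genomeBuild` string, and the `<id>_R1.fq.gz` / `<id>_R2.fq.gz` naming convention. It checks that the expected files arrived, pulls extra settings from the global objects passed in as arguments, and returns a list that a `map` operator emits as one tuple.

```nextflow
// Hypothetical conditioning function: check that the expected files are present,
// then combine them with settings taken from global objects passed as arguments.
def conditionSample(String sampleId, List reads, Map sampleInfo, String genomeBuild) {
    def names    = reads.collect { it.name }
    def expected = [ sampleId + '_R1.fq.gz', sampleId + '_R2.fq.gz' ]
    def missing  = expected.findAll { !names.contains(it) }
    if( missing )
        error "Sample ${sampleId} is missing: ${missing.join(', ')}"

    def info = sampleInfo[sampleId]
    if( info == null )
        error "No settings found for sample ${sampleId}"

    // Returned from a map closure, this list becomes one tuple in the channel
    return [ sampleId, reads, info.readGroup, genomeBuild ]
}

workflow {
    // Global state built once at startup (hypothetical values)
    def sampleInfo  = [ s1: [ readGroup: 'RG1' ] ]
    def genomeBuild = 'GRCh38'

    Channel.fromFilePairs('data/*_R{1,2}.fq.gz')
        .map { id, reads -> conditionSample(id, reads, sampleInfo, genomeBuild) }
        .view()
}
```

Because the function receives all of its global state as arguments rather than reading script-level variables, it can later be moved into a module file without changes.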

Based on your suggestions, I envision packaging one process per module, so that each module consists of one or more conditioning functions, a process, and a workflow. Each module workflow is invoked from the main NF file with the required channels as input and ‘returns’ one or more channels. The main NF file then contains a workflow made up of a cascade of these module workflows, along with the Groovy objects that are constructed at startup. This results in 37 module files plus the main file. Hmm.
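A sketch of one module under that plan, assuming a hypothetical file `modules/align.nf` containing a conditioning function, a process `ALIGN`, and a wrapper workflow `ALIGN_WF`. The names, the input tuple shape, and the stub command are placeholders. The `take:` block receives both the channel and the plain Groovy objects built in the main script, and `emit:` names the channel handed to the next module.

```nextflow
// modules/align.nf: one conditioning function, one process, one workflow (hypothetical)

def conditionForAlign(String sampleId, List reads, Map sampleInfo, String genomeBuild) {
    def info = sampleInfo[sampleId]
    if( info == null )
        error "No settings found for sample ${sampleId}"
    return [ sampleId, reads, info.readGroup, genomeBuild ]
}

process ALIGN {
    input:
    tuple val(sampleId), path(reads), val(readGroup), val(genomeBuild)

    output:
    tuple val(sampleId), path("${sampleId}.txt")

    script:
    // Stub command standing in for the real tool invocation
    """
    echo "${sampleId} ${readGroup} ${genomeBuild} ${reads}" > ${sampleId}.txt
    """
}

workflow ALIGN_WF {
    take:
    reads_ch     // channel of [ sampleId, [ R1, R2 ] ]
    sampleInfo   // plain Groovy map built in the main script
    genomeBuild  // plain string built in the main script

    main:
    ALIGN( reads_ch.map { id, reads -> conditionForAlign(id, reads, sampleInfo, genomeBuild) } )

    emit:
    result = ALIGN.out
}
```

From the main script, this module would be pulled in with `include { ALIGN_WF } from './modules/align'` and called as `ALIGN_WF(reads_ch, sampleInfo, genomeBuild)`, with `ALIGN_WF.out.result` feeding the next module.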

Do you see flaws in this approach?

I appreciate your feedback.

Thank you.

As a first iteration, I would simply move each process into a separate module and keep everything else in the main script. You would then have one large workflow that creates any global state, calls the processes, and calls the glue logic between them. That should at least get you under the 64 KB limit, and then you can decide how much further you want to refine it.
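A minimal sketch of that first iteration, assuming two hypothetical module files, `modules/align.nf` and `modules/call_variants.nf`, that each contain only their process (`ALIGN`, emitting `[ sampleId, bam ]`, and `CALL_VARIANTS`, taking `[ sampleId, bam, sex ]`). The glue function, the `sampleInfo` map, and `params.reads` are illustrative stand-ins for the pipeline's own global state; only the process definitions leave the main script.

```nextflow
// main.nf: only the process definitions have moved out to modules/
include { ALIGN }         from './modules/align'
include { CALL_VARIANTS } from './modules/call_variants'

params.reads = 'data/*_R{1,2}.fq.gz'

// Glue logic between processes can stay in the main script
def conditionForCalling(String sampleId, bam, Map sampleInfo) {
    return [ sampleId, bam, sampleInfo[sampleId].sex ]
}

workflow {
    // Global state created once at startup (hypothetical values)
    def sampleInfo  = [ s1: [ readGroup: 'RG1', sex: 'F' ] ]
    def genomeBuild = 'GRCh38'

    def reads_ch = Channel.fromFilePairs(params.reads)

    ALIGN( reads_ch.map { id, reads -> [ id, reads, sampleInfo[id].readGroup, genomeBuild ] } )
    CALL_VARIANTS( ALIGN.out.map { id, bam -> conditionForCalling(id, bam, sampleInfo) } )
}
```

Each module file then holds only a few dozen lines, while the channel wiring stays in one place where it is easy to compare against the original DSL1 script.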

You can always break things up into subworkflows as much as you want, but subworkflows are not strictly necessary. It is up to you whether factoring them out is worthwhile, i.e. whether the gain in readability outweighs the cost of managing more separate files.

Hi Ben,

I appreciate your guidance!

Thank you.