Looking for Guidance on Improving Seqera Workflow Performance

Hello Everyone :hugs:,

I’ve only recently started using Seqera, and right now I’m trying to streamline my process in preparation for a big genomics project. I was hoping the community could provide some insight since I’ve been having some problems with performance.

To put things in perspective, my workflow has several data processing stages, such as alignment, variant calling, and annotation. My server setup is rather powerful, but performance has been slower than expected, especially during the alignment and variant calling stages. As a result, processing has taken longer than planned and my project’s timeline has been disrupted.

Here are some details of my configuration:

  • Server Specifications: [Add pertinent information regarding RAM, CPU, and storage]
  • Workflow Specifics: [Give a summary of the workflow’s elements, including the tools and their respective versions.]
  • Data Volume: [Explain the quantity and nature of the processed data]

Although I’ve already experimented with a few optimisation techniques, such as adjusting resource allocation and parallelising jobs, the results have only improved slightly. If you have any further advice or best practices for maximising Seqera performance, please share them with me.

In particular, I’m curious about:

  • Resource Allocation: What configurations are suggested to get the best results on certain kinds of tasks? :thinking:
  • Workflow Optimisation: How can complicated workflows be made more efficient to cut down on processing time? :thinking:
  • Troubleshooting: What are some ways to find and fix bottlenecks in my workflow? :thinking:

I also followed this :point_right: https://www.researchgate.net/publication/325560176_Building_a_scientific_workflow_framework_to_enable_real-time_machine_learning_and_visualization_real-time_framework_for_scientific_workflows_power_apps

Any guidance or recommendations you could offer would be greatly appreciated, as would any helpful links or supporting materials.

Thank you :pray: in advance.

Hello @Kelly_Gloria,

Running your pipeline and then optimizing

If you run a pipeline at least once on Seqera Platform, a bulb will appear next to the pipeline name in the launchpad (in list view). See image below.

The empty bulbs indicate that the configuration of those pipelines (resource allocation requests) can be optimized automatically by Seqera’s AI. The bulb with a checkmark inside shows this has already been done.

Clicking on the empty bulb will open a dialog with multiple opportunities for optimization, as you can see below.

If you click on the Optimized configuration tab, you will see a Nextflow configuration file that will be added to your pipeline run with higher priority, i.e. overriding the original resource allocation requests. Example below:
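As a rough idea of what such a file can look like, here is a minimal sketch of a Nextflow configuration that overrides resource requests per process. The process names are borrowed from the charts discussed in this post and the values are made up for illustration; the configuration Seqera generates for your pipeline will differ.

```groovy
// Hypothetical optimized configuration: names and values are illustrative only.
process {
    // Lower the request for a process that used far less than it asked for
    withName: 'CAT_FASTQ' {
        cpus   = 1
        memory = 2.GB
    }
    withName: 'GUNZIP_GT' {
        cpus   = 1
        memory = 1.GB
    }
}
```

Outside the Platform, a file like this can be supplied to a run with the `-c` option of `nextflow run`, which layers it on top of the pipeline's own configuration.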

This is the easiest and most practical way to optimize your pipeline on Seqera Platform. Maybe you’re asking where this comes from! If you check a pipeline run, you will see multiple charts at the bottom. In the image below, each boxplot summarises all instances (tasks) of one process. Look at how some processes don’t use even half of the CPUs they requested (e.g. CAT_FASTQ and GUNZIP_GT).

QUALIMAP_RNASEQ, on the other hand, needs more CPUs. The same analysis can be done for other resources, such as memory in the image below. Make sure you click on the % Allocated and % RAM Allocated tabs if you want to do this analysis.

That’s part of how the AI works, and you’re free to manually tweak your resource requests based on these plots if you want.
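If you go the manual route, one possible shape for such a tweak is sketched below. It raises the request for a process that looked CPU-starved and lets memory grow on each retry. The numbers are assumptions for illustration, not values read from any real run, so take yours from your own plots.

```groovy
// Hypothetical manual tweak: the values below are assumptions, not measurements.
process {
    withName: 'QUALIMAP_RNASEQ' {
        cpus          = 8                          // more CPUs than originally requested
        memory        = { 16.GB * task.attempt }   // scales with the attempt number
        errorStrategy = 'retry'                    // resubmit the task if it fails
        maxRetries    = 2
    }
}
```

The closure around `memory` is evaluated per task, so a second attempt asks for twice the memory of the first. Note that `errorStrategy = 'retry'` retries on any failure, not only on out-of-memory errors.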

Optimizing before the first run on Seqera Platform

If you’re building your Nextflow pipeline from scratch, I encourage you to check the curated assets the nf-core community provides. There are over 1200 curated modules (processes that are easy to plug into your Nextflow pipeline). Even if you want to do something slightly different, you can make a good guess on the resources you need based on the resources that a similar module is requesting.
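For example, nf-core modules typically declare a resource label such as `process_low` or `process_medium`, and the pipeline configuration maps each label to concrete requests. The sketch below shows that pattern with illustrative values. Treat the numbers as a starting guess under that assumption, not as the defaults of any particular pipeline.

```groovy
// Label-based resource requests: illustrative values, adjust to your data.
process {
    withLabel: 'process_low' {
        cpus   = 2
        memory = 12.GB
        time   = 4.h
    }
    withLabel: 'process_medium' {
        cpus   = 6
        memory = 36.GB
        time   = 8.h
    }
}
```

A process of your own that resembles an existing module can reuse the same label, and you can refine the request once you have usage data from a first run.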

Workflow optimization and troubleshooting

These last two questions are more challenging to answer without looking at your specific pipeline. We have some resources teaching you how to do these things. I will link three pieces below: