How to deal with ambiguous outputs of a command-line process?

rebeccasenft · May 28, 2024, 1:18am

Hi,

I’m pretty new to Nextflow and nf-core and my ultimate goal is to contribute a simple module for a tool I use a lot that does not exist as a module yet (CellProfiler).

I’m trying to figure out how to deal with the fact that my tool has unpredictable outputs that depend on a user-supplied pipeline file. Depending on this file, many different extensions could be used for image and csv output, and the folder structure can vary a lot too, which ideally I would want to preserve. Is the only way to deal with this to include all possible file extensions as separate optional outputs?

Rebecca

FriederikeHanssen · May 28, 2024, 5:24am

Hi Rebecca,

Yes, this is generally what we do in nf-core. It also lets users of the module access specific outputs unambiguously in the next step. Here is an example: modules/modules/nf-core/antismash/antismashlite/main.nf at eabe5808d97ccacdd694b9ce90af4bca47ddc54e · nf-core/modules · GitHub. You can see that in some instances outputs are combined where it makes sense.
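As a rough illustration of the "one optional output per extension" pattern, here is a minimal sketch of what a process could look like. It is not taken from the antiSMASH module or from any existing CellProfiler module: the process name, the `input`/`output` folder names, the emit names, the database extensions, and the exact CellProfiler command line are all assumptions to be checked against the tool's documentation.

```nextflow
// Sketch only: folder names, glob patterns, emit names and the command line
// are illustrative assumptions, not an existing nf-core module.
process CELLPROFILER {
    tag "$meta.id"

    input:
    tuple val(meta), path(images, stageAs: 'input/*') // one image set per task
    path cppipe                                       // user-supplied pipeline file

    output:
    // Each extension gets its own optional channel, so a pipeline file that
    // produces no PNGs (for example) does not make the task fail.
    // `**` crosses directory boundaries, so nested files are matched too.
    tuple val(meta), path("output/**.csv")        , optional: true, emit: csv
    tuple val(meta), path("output/**.png")        , optional: true, emit: png
    tuple val(meta), path("output/**.{tif,tiff}") , optional: true, emit: tiff
    tuple val(meta), path("output/**.{db,sqlite}"), optional: true, emit: sqlite // extensions assumed
    tuple val(meta), path("output/**.txt")        , optional: true, emit: txt

    script:
    """
    cellprofiler -c -r -p ${cppipe} -i input -o output
    """
}
```

A downstream step can then pick exactly the channel it needs, for example `CELLPROFILER.out.csv`. If keeping the user-defined folder layout matters more than addressing individual file types, emitting the whole `output` directory as a single `path("output")` output is another option, at the cost of less explicit channels.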

Do you have a list of all possible outputs? I don’t know the tool well enough to give more detailed advice.

Hope this helps

rebeccasenft · May 28, 2024, 1:18pm

Thanks for the reply and for the example!

The outputs in this case would typically be csvs, pngs, tifs, tiffs, SQLite and possibly txt. I understand accounting for the possibility of each extension and I think this is what I’ve gone with so far (still testing). The tool allows users to customize a file structure as well (e.g., /<image_name>/cells.csv). Is there any way to preserve this structure? If not, is there a way to standardize a known file structure given metadata from the samplesheet? For instance, if I want nextflow to process in parallel for each image set but store results grouped by the name of the plate and well the image came from (and assume this is also a column in the sample sheet), is there a way to easily do that?

Thanks again!
Rebecca

FriederikeHanssen · June 5, 2024, 4:02pm

Hi Rebecca,

apologies for the delay.
The organization of the results directory is handled separately from how data from the output directive is passed through channels.

We specify how the output of a process is published in the modules.config. You can publish it all in one location, split it across several locations depending on some condition, or split it up by some meta information, for example.
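To make that concrete for the plate/well case, a minimal sketch of such a config entry could look like the following. The process name `CELLPROFILER` and the `plate` and `well` keys in `meta` are hypothetical: they assume the samplesheet columns were copied into the meta map when the input channel was built. `params.outdir` follows the usual nf-core convention.

```nextflow
// conf/modules.config (sketch; `plate` and `well` are assumed meta keys)
process {
    withName: 'CELLPROFILER' {
        publishDir = [
            // The closure is evaluated per task, so each image set still runs
            // in parallel but is published under its own plate/well folder.
            path: { "${params.outdir}/cellprofiler/${meta.plate}/${meta.well}" },
            mode: 'copy',
            // nf-core convention: do not publish versions.yml with the results
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
}
```

Because publishing is configured here rather than in the module itself, the layout of the results folder can be changed without touching the process or its output channels.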

Here are some examples:

For FastP here we publish the logs in one subdirectory of the results, and the trimmed FastQ files (if enabled) in another.

For Strelka here we publish in different subfolders based on the sample name, etc.

So you can tinker with the results directory and decide which files go where as needed.
