I have followed the guidance here to configure my tower.yml file to display the most relevant outputs of my custom pipeline in the reports section. After completion of the workflow, only the outputs from one process appear in the reports, and despite multiple attempts to reconfigure the tower.yml the files from the first process never appear.
Process 1
Runs a custom python script and generates multiple csv and png files in a data directory created in the present working directory. The output is specified as follows:
output:
path "data", emit: data_dir
Process 2
Runs another python script that consolidates all of the csv files from the data_dir in the previous process into a single xlsx file.
As of now, only compiled_data.xlsx appears in the reports section. Both processes have the publishDir set to the outdir param and process outputs are present in aws s3 in {params.outdir}/data after the run. I’ve tried writing the full path to the process 1 outputs as part of the output directive and in tower.yml, but cannot get those outputs to appear in the reports.
The glob patterns in tower.yml are matched against the files or directories listed as outputs in the process.
Your process #1 emits only the directory, not the csv files nested inside it, so the only path being tested against the glob pattern is “data/”, which does not match “*.csv”.
As Jordi suggested, switching the output pattern to “data/*.csv” emits all of the csv files rather than just their parent directory, and these will match the glob pattern in tower.yml.
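For illustration, a minimal sketch of what process #1 could look like with file-level outputs. The process name, the emit names, and the script invocation are hypothetical placeholders, not taken from the original pipeline; only the "data/*.csv" output pattern and the publishDir set to the outdir param come from this thread.

```nextflow
process GENERATE_DATA {
    // Publish to the outdir param, as described in the question
    publishDir params.outdir, mode: 'copy'

    output:
    // Emit the files themselves, not the parent directory, so that each
    // file path can be tested against the glob patterns in tower.yml
    path "data/*.csv", emit: csv_files
    path "data/*.png", emit: png_files

    script:
    """
    mkdir -p data
    # hypothetical script name and flag; substitute the real command
    generate_data.py --outdir data
    """
}
```

Splitting csv and png into separate outputs is only one option; a single `path "data/*"` output would also emit the individual files.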
Your downstream process might have to change the input pattern to something like the example shown below if your process expects the files to be written to a directory “data”:
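A minimal sketch of such an input declaration, assuming the upstream process now emits the csv files themselves. The process name, the input variable name, and the script command are hypothetical; the `stageAs` option of the `path` input qualifier is what places the incoming files under a local "data" directory in the task work directory.

```nextflow
process CONSOLIDATE {
    publishDir params.outdir, mode: 'copy'

    input:
    // Stage every incoming csv file under a local "data/" directory,
    // so a script that expects that directory keeps working
    path csv_files, stageAs: 'data/*'

    output:
    path "compiled_data.xlsx"

    script:
    """
    # hypothetical script name and flags; substitute the real command
    consolidate.py --input-dir data --output compiled_data.xlsx
    """
}
```

With this kind of declaration the script can keep reading from "data" even though the upstream process no longer emits the directory as an output.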
There is an open PR to recursively walk the publication directory, but this implementation has some drawbacks that would need to be addressed before merging into Nextflow proper.
This line in the documentation is what I think was throwing me off, as I was writing my glob patterns with respect to the contents of the publishDir rather than what was listed in the output directive by the process.
Only the published files (using the Nextflow publishDir directive) are candidate files for Seqera reports. The path pattern is used to match published files to a report entry.
I had previously tried changing the output pattern in process #1 to data/*.csv and also listing the filepaths explicitly, e.g. data/sample_n.csv to no avail.
While this works for recognizing reports:
output:
path "data/*"
This does not:
output:
path "data"
path "data/*"
And so all of my previous attempts to change the output paths that included emitting “data” as the first output failed.