Reports section not displaying certain outputs

I have followed the guidance here to configure my tower.yml file to display the most relevant outputs of my custom pipeline in the reports section. After the workflow completes, only the outputs from one process appear in the reports; despite multiple attempts to reconfigure tower.yml, the files from the first process never appear.

Process 1
Runs a custom Python script and generates multiple CSV and PNG files in a data directory created in the present working directory. The output is declared as follows:

output:
  path "data", emit: data_dir

Process 2
Runs another Python script that consolidates all of the CSV files from the data_dir emitted by the previous process into a single xlsx file.

input:
  path data_dir

output:
  path "data/compiled_data.xlsx"

tower.yml file

reports:
  "compiled_data.xlsx":
    display: "Compiled excel file"
  "*.csv":
    display: "CSV Output"
  "*.png":
    display: "Plots"

As of now, only compiled_data.xlsx appears in the reports section. Both processes have publishDir set to the outdir param, and the process outputs are present in AWS S3 under {params.outdir}/data after the run. I’ve tried writing the full path to the process 1 outputs both in the output directive and in tower.yml, but cannot get those outputs to appear in the reports.

Any advice would be greatly appreciated.

-Jack

Try the **/*.csv pattern instead.

No luck after changing it to:

reports:
  "compiled_data.xlsx":
    display: "Compiled excel file"
  "**/*.csv":
    display: "CSV Output"
  "**/*.png":
    display: "Plots"

I think there was a bug fix related to this in Nextflow at some point; try the latest Nextflow release.

Also, if you define the output like this, the result may be different:

output:
  path "data/*", emit: data_dir

Hi JT

The glob patterns in tower.yml are matched against the files or directories listed as outputs in the process.

Your process #1 emits only the directory, not the CSV files nested inside it, so the only path tested against the glob patterns is “data/”, which does not match “*.csv”.

Along the lines of Jordi's suggestion, switching the output pattern to “data/*.csv” emits each of the CSV files rather than just their parent directory, and these will match the glob pattern in tower.yml.
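To make the matching rule concrete, here is a toy Python model of it. It is only a sketch: `matched_reports` is a hypothetical helper, the file names `sample_1.csv` and `plot_1.png` are made up for illustration, and `fnmatchcase` is a stand-in for whatever matcher Nextflow and Seqera Platform actually use, whose glob semantics may differ. Comparing on the file name alone is also a simplification.

```python
from fnmatch import fnmatchcase

# Patterns copied from the tower.yml posted at the top of this thread.
REPORT_PATTERNS = ["compiled_data.xlsx", "*.csv", "*.png"]


def matched_reports(declared_outputs, patterns=REPORT_PATTERNS):
    """Return the declared output paths whose last component matches a report pattern.

    Hypothetical helper: it only illustrates that the paths a process declares
    as outputs are what get tested, not the files nested inside a directory.
    """
    hits = []
    for path in declared_outputs:
        # Compare on the last path component, e.g. "data/" -> "data".
        name = path.rstrip("/").split("/")[-1]
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            hits.append(path)
    return hits


# Declaring only the directory: "data" is the single candidate and matches nothing.
print(matched_reports(["data"]))

# Declaring the files themselves (hypothetical names): each one is a candidate.
print(matched_reports(["data/sample_1.csv", "data/plot_1.png"]))
```

In this model the first call returns an empty list and the second returns both file paths, which mirrors the behaviour described in this thread: the directory alone never reaches the “*.csv” or “*.png” entries, while individually declared files do.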

Your downstream process might have to change the input pattern to something like the example shown below if your process expects the files to be written to a directory “data”:

input:
  path "data/*"

output:
  path "data/compiled_data.xlsx"

There is an open PR to recursively walk the publication directory, but this implementation has some drawbacks that would need to be addressed before merging into Nextflow proper.

Hey Rob,

I think this line in the documentation is what threw me off: I was writing my glob patterns against the contents of the publishDir rather than against what the process lists in its output directive.

Only the published files (using the Nextflow publishDir directive) are candidate files for Seqera reports. The path pattern is used to match published files to a report entry.

I had previously tried changing the output pattern in process #1 to data/*.csv and also listing the filepaths explicitly, e.g. data/sample_n.csv to no avail.

While this works for recognizing reports:

output:
  path "data/*"

This does not:

output:
  path "data"
  path "data/*"

So every one of my previous attempts that still emitted “data” as the first output failed.