Publishing AWS Batch outputs to S3

Hello,

Sorry to ask; I’m sure what I’m trying to do is trivial, but I cannot find an example in the docs or on the forum.

I’ve got a Nextflow script using DRAGEN and AWS Batch, where the outputs are written directly to the F1 instance I launch. I stream the FASTQ input from S3.

I have a very simple config with an out_dir, like so:

params {
    out_dir = 's3://bucket/simpleDragen_out/'
}

And in my main process, it’s referenced like so:


process simpleDragen {

    publishDir "${params.out_dir}", mode: 'copy', overwrite: true

    script:
    """
    dragen ..... --output-directory ./ .....
    """
}

If I afterwards run the command below, I pick up all the inputs and outputs, so I can see the results exist. However, Nextflow is not pushing the results to S3 automatically. Am I missing a major step?


aws s3 sync ./ s3://bucket/simpleDragen_out/

Any guidance is appreciated, thanks 🙂

From the Nextflow official documentation on publishDir:

Have you checked how long it takes for the files to appear in the publish directory? It shouldn’t take too long, but some delay after the pipeline run finishes is expected.


Sorry, this is entirely my fault!

I forgot to define an output block; adding one, as below, fixed it. I had assumed Nextflow automatically picked up whatever the process produced!

Onwards and upwards

process simpleDragen {
    publishDir "${params.out_dir}", mode: 'copy', overwrite: true

    input:
    path fastq1
    path fastq2

    output:
    path 'sample*'

    script:
    .............................
}
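For anyone landing here later, this is a minimal DSL2 sketch of how the corrected process might be wired into a workflow. It assumes paired FASTQ files in S3 matching a hypothetical params.reads glob, paired up with Channel.fromFilePairs; the DRAGEN arguments stay elided exactly as in the post, and 'sample*' is the output glob from the post. Treat it as an illustration under those assumptions, not a drop-in pipeline.

```nextflow
// Hypothetical input location: adjust the glob to your own bucket layout
params.reads   = 's3://bucket/fastq/*_R{1,2}*.fastq.gz'
params.out_dir = 's3://bucket/simpleDragen_out/'

process simpleDragen {
    // Only files matched by the output block below are published here
    publishDir "${params.out_dir}", mode: 'copy', overwrite: true

    input:
    tuple val(sample_id), path(fastq1), path(fastq2)

    output:
    path 'sample*'

    script:
    """
    dragen ..... --output-directory ./ .....
    """
}

workflow {
    // fromFilePairs emits [sample_id, [read1, read2]]; split it into three elements
    reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)
                      .map { id, files -> tuple(id, files[0], files[1]) }
    simpleDragen(reads_ch)
}
```

Using a single tuple input keeps each R1/R2 pair together in the same task, which two independent path inputs fed from separate channels would not guarantee.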

Oh, exactly.

Imagine the terabytes of data that would be transferred and stored somewhere if Nextflow published everything a task produces 🫨 That’s why we need the output block to:

a) Throw an error if the task doesn’t deliver what it is supposed to deliver
b) Make sure the specific outputs required by the next task are available to that task
c) Guarantee that only meaningful output is considered for reports (e.g. with MultiQC) and published in our results folder.
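To make points a) to c) concrete, here is a hedged sketch. The file names sample*.bam and sample*.json, the params.reads glob, and the SUMMARIZE_METRICS process are hypothetical stand-ins (not necessarily what DRAGEN writes, and not MultiQC itself). It shows declared outputs making the task fail when missing, named outputs via emit feeding the next process, and the pattern option of publishDir limiting what is copied to the results folder.

```nextflow
// Sketch only: sample*.bam, sample*.json and params.reads are hypothetical
params.reads   = 's3://bucket/fastq/*_R{1,2}*.fastq.gz'
params.out_dir = 's3://bucket/simpleDragen_out/'

process simpleDragen {
    // pattern: only BAMs are copied to the results folder (point c)
    publishDir "${params.out_dir}", mode: 'copy', overwrite: true, pattern: '*.bam'

    input:
    tuple val(sample_id), path(fastq1), path(fastq2)

    output:
    // If either glob matches nothing, the task fails with a missing-output error (point a)
    path 'sample*.bam',  emit: bam
    path 'sample*.json', emit: metrics   // named channel consumed downstream (point b)

    script:
    """
    dragen ..... --output-directory ./ .....
    """
}

// Hypothetical stand-in for a reporting step such as MultiQC
process SUMMARIZE_METRICS {
    input:
    path metrics   // receives only the files declared in the 'metrics' output

    output:
    path 'metrics_summary.txt'

    script:
    """
    ls ${metrics} > metrics_summary.txt
    """
}

workflow {
    reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)
                      .map { id, files -> tuple(id, files[0], files[1]) }
    simpleDragen(reads_ch)
    // collect() gathers the metrics files from all samples into a single task
    SUMMARIZE_METRICS(simpleDragen.out.metrics.collect())
}
```

The JSON files are still declared outputs (so the downstream step can use them) but are not published, because the pattern filter only matches BAMs.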

