Publish AWS Batch -> S3

fishcakess · October 15, 2024, 4:06pm

Hello,

Sorry to ask. I’m sure what I’m trying to do is trivial, but I can’t find an example in the docs or on the forum.

I’ve got a Nextflow script using DRAGEN on AWS Batch, where the outputs are written directly to the F1 instance I launch. I stream the FASTQ input from S3.

I have a very simple config with an out_dir parameter, like so:

params {
    out_dir = 's3://bucket/simpleDragen_out/'
}

and my main process references it like this:


process simpleDragen {
    publishDir "${params.out_dir}", mode: 'copy', overwrite: true

    script:
    """
    dragen ..... --output-directory ./ .....
    """
}

If I run the command below afterwards, it picks up all the inputs and outputs, so I can see the results exist. However, Nextflow is not pushing the results to S3 automatically. Am I missing a major step?


aws s3 sync ./ s3://bucket/simpleDragen_out/

Any guidance is appreciated, thanks

mribeirodantas · October 15, 2024, 10:15pm

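The official Nextflow documentation on publishDir notes that files are copied into the publish directory asynchronously, so they may not be immediately available there when the process finishes.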

Have you checked how long it takes for the files to appear in the publish directory? It shouldn’t take long, but some delay after the pipeline run finishes is expected.

fishcakess · October 16, 2024, 10:16am

Sorry, this is entirely my fault!

I forgot to define an output; adding one, as below, fixed it. I had assumed Nextflow automatically picked up whatever the process produced!

Onwards and upwards

process simpleDragen {
    publishDir "${params.out_dir}", mode: 'copy', overwrite: true

    input:
    path fastq1
    path fastq2

    output:
    path 'sample*'   // only files matching this glob are captured and published

    script:
    .............................
}
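A quick way to see this behaviour without DRAGEN or AWS is a minimal, locally runnable sketch like the one below. The process name `demoPublish`, the `touch` commands standing in for DRAGEN, and the local `results` directory are all hypothetical; pointing `params.out_dir` at an `s3://` path should behave the same way, assuming AWS credentials and the Batch executor are already configured.

```nextflow
// Hypothetical demo: `touch` stands in for DRAGEN and writes several files,
// only some of which match the declared output glob.
params.out_dir = 'results'   // local stand-in for s3://bucket/simpleDragen_out/

process demoPublish {
    // Only files captured by the output block below are copied here
    publishDir params.out_dir, mode: 'copy', overwrite: true

    input:
    val sample_id

    output:
    path 'sample*'   // glob: only files whose names start with "sample"

    script:
    """
    touch ${sample_id}.bam       # matches 'sample*', so it is published
    touch ${sample_id}.vcf.gz    # matches 'sample*', so it is published
    touch scratch.tmp            # not declared, so it stays in the work dir
    """
}

workflow {
    demoPublish(Channel.of('sample1'))
}
```

Running it with `nextflow run demo.nf` (the file name is arbitrary) should leave `sample1.bam` and `sample1.vcf.gz` in `results/`, while `scratch.tmp` stays only in the task's work directory.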

mribeirodantas · October 16, 2024, 12:32pm

Oh, exactly.

Imagine the terabytes of data that would be transferred and stored somewhere if Nextflow published everything a task produces 🫨 That’s why we need the output block to:

a) Throw a warning/error if the task doesn’t produce what it’s supposed to produce
b) Make sure the specific set of outputs required by the next task is available to that task
c) Guarantee that only meaningful output is considered for reports (e.g. with MultiQC) and published to the results folder.
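The three points above can be sketched in a small two-process pipeline. Everything here is hypothetical and runnable locally: the process names `align` and `summarize`, the file names, and the stand-in shell commands are illustrative, not part of the DRAGEN pipeline in this thread. The `emit` option names an output so the next process can consume it from a channel, and the `pattern` option of `publishDir` restricts which declared outputs are published.

```nextflow
// Hypothetical sketch: process names, file names and the stand-in commands
// are illustrative only; they are not part of the pipeline in this thread.
params.out_dir = 'results'

process align {
    // (c) publish only the BAM; the log is a declared output but is not published
    publishDir params.out_dir, mode: 'copy', pattern: '*.bam'

    input:
    val sample_id

    output:
    // (a) the task fails if either declared file is missing
    path "${sample_id}.bam", emit: bam
    path "${sample_id}.log", emit: logfile

    script:
    """
    touch ${sample_id}.bam
    echo "aligned ${sample_id}" > ${sample_id}.log
    touch scratch.tmp
    """
}

process summarize {
    publishDir params.out_dir, mode: 'copy'

    input:
    // (b) staged from align's output channel, not read from the publish dir
    path align_log

    output:
    path 'summary.txt'

    script:
    """
    cat ${align_log} > summary.txt
    """
}

workflow {
    align(Channel.of('sample1'))
    summarize(align.out.logfile)
}
```

With this layout, `results/` should end up with `sample1.bam` and `summary.txt`. `sample1.log` is still passed to `summarize` through the channel but is not published, and `scratch.tmp` is never captured at all.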

system · October 23, 2024, 12:33pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
