Stream reference directories from S3

Hello, this is my second question this week, but I’m truly lost here.

I’m trying to run software on AWS Batch that requires a few reference directories (data packs). We’ve been advised to store these on S3 and have Nextflow stream them to the EC2 instances, instead of copying them manually inside the process with something like `aws s3 cp s3://ref_dir ./`.

I can’t work out the best way to feed these into the processes. For one of the tools, we did something like the following using a channel:

```groovy
Channel.fromPath('s3://bucket/samplesheets/samples.csv')
    .splitCsv(header: true)
    .map { row ->
        def meta = [
            sample_id: row.sample_id,
            condition: row.condition
        ]

        // emit this metadata plus the S3 paths
        def s3_R1_path = "s3://bucket/fastqs/${row.sample_id}_R1_001.fastq.gz"
        def s3_R2_path = "s3://bucket/fastqs/${row.sample_id}_R2_001.fastq.gz"
        def ref_dir    = "s3://bucket/hash/"
        def pcgr_dir   = "s3://bucket/pcgr/"   // hypothetical path: pcgr_dir must be defined like ref_dir

        tuple(meta, file(s3_R1_path), file(s3_R2_path), file(ref_dir), file(pcgr_dir))
    }
    .set { sample_channel }
```

I was aiming for something where I could use params, but I’ve read that this isn’t their intended use, and that Nextflow won’t stream data that isn’t declared as a process input.

Is the best approach to attach the reference directories to each sample, as in my first channel?

And if I’m combining a few data packs that are used across different samples, how can I select one version out of several? (I guess I could put it in the samplesheets.)


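```groovy
process ANNOTATE_VCF {

    input:
    tuple val(meta), path(raw_vcf)

    // params.ref_dir is just a string here, so the S3 directory is never staged into the task
    script:
    """
    software ... --ref_dir ${params.ref_dir}
    """
}
```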

I tried this too, with no luck:


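```groovy
process ANNOTATE_VCF {

    input:
    // path('...') only sets the name an incoming file is staged under; it does not fetch the S3 URI
    tuple val(meta), path(raw_vcf), path('s3://bucket/new_dir/ref_dir')

    script:
    """
    software ... --ref_dir ./new_dir/ref_dir
    """
}
```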

What you need is something along these lines:

```groovy
process ANNOTATE_VCF {
    input:
    tuple val(meta),  path(raw_vcf)
    tuple val(meta2), path(ref_dir)

    script:
    """
    software ... --ref_dir $ref_dir
    """
}
```

```groovy
Channel
    .fromPath('s3://bucket/new_dir/ref_dir', type: 'dir')
    .map { dir -> [ [id: 'my_reference'], dir ] }
    .first()
    .set { ch_reference }
```
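The `.first()` call matters here. It turns the single-item queue channel into a value channel, which can be read once for every VCF. Without it, a process with two queue-channel inputs runs only as many times as the shorter channel has items, so only one VCF would be annotated. If you would still like to drive the path from params, that works as long as the path reaches the process through an input. A minimal sketch, assuming `params.ref_dir` holds the S3 URI from the question:

```groovy
// Default path; can be overridden on the command line, e.g. --ref_dir s3://bucket/other_dir/ref_dir
params.ref_dir = 's3://bucket/new_dir/ref_dir'

// Channel.value() creates a value channel directly, so .first() is not needed.
// Because the path arrives through the path(ref_dir) input, Nextflow stages it into each task.
ch_reference = Channel.value( [ [id: 'my_reference'], file(params.ref_dir) ] )
```

This `ch_reference` plugs into the same `ANNOTATE_VCF(ch_vcf, ch_reference)` call.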

```groovy
ANNOTATE_VCF (
    ch_vcf,
    ch_reference
)
```
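For the follow-up about picking one data-pack version out of several, the samplesheet route can work: put the version in a column and build the reference path per row, so the directory travels in the same tuple as its sample. A sketch, assuming hypothetical `vcf` and `ref_version` samplesheet columns and a hypothetical layout with one directory per version under `s3://bucket/refs/`:

```groovy
// Variant of ANNOTATE_VCF where each sample carries its own reference directory
process ANNOTATE_VCF {
    input:
    tuple val(meta), path(raw_vcf), path(ref_dir)

    script:
    // "software ..." is the placeholder command from the question
    """
    software ... --ref_dir ${ref_dir}
    """
}

workflow {
    // Hypothetical samplesheet columns: sample_id, vcf, ref_version
    Channel.fromPath('s3://bucket/samplesheets/samples.csv')
        .splitCsv(header: true)
        .map { row ->
            def meta = [sample_id: row.sample_id, ref_version: row.ref_version]
            // Hypothetical layout: s3://bucket/refs/<ref_version>/
            tuple(meta, file(row.vcf), file("s3://bucket/refs/${row.ref_version}"))
        }
        .set { ch_vcf_with_ref }

    ANNOTATE_VCF(ch_vcf_with_ref)
}
```

Keeping the reference inside the sample tuple avoids pairing two queue channels by emission order, which would not guarantee that each VCF meets the right version.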