I’m encountering a strange error. I have two compressed archives that are pulled from a remote server (the same server, NCBI) by default, but either one can optionally be supplied with a command-line parameter. Each archive is then run through a process that extracts a specified list of files (presumably in parallel, since there is one task per file). If both archives are left to be pulled remotely, both processes fail with the error `Unable to stage foreign file ... Cause: Unable to access path` (this is the most informative message I’ve been able to find). However, if either one of the files is staged locally and passed as a command-line parameter, the workflow runs without any problem. I’ve reproduced this with both the latest release and the latest edge release.
Any ideas on what might be going on here?
I’ve included a short workflow at the end of this post that reproduces the problem. Running it with the default (remote) inputs fails:

```
$ nextflow run stage_fail.nf
```

But it succeeds if I supply either of the archives as a local file:

```
$ curl -LO https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
$ nextflow run stage_fail.nf --blast-taxdb taxdb.tar.gz
```

or

```
$ curl -LO https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip
$ nextflow run stage_fail.nf --ncbi-taxdump new_taxdump.zip
```
This looks like it might be related to concurrency, although I can’t quite figure out why. If I reduce the number of files each process is asked to extract, I get no errors (i.e., reduce `ntd_files` and/or `tdb_files` in the workflow to a single element).
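One way to narrow this down is to check whether the extraction step itself tolerates concurrency outside Nextflow. This is a minimal diagnostic sketch, not part of the original reprex: it assumes GNU-compatible `tar` and `gzip`, and it builds a small *dummy* archive that reuses the member names from `tdb_files` (the contents are placeholders, not the real NCBI data). It then extracts each member in its own background job and directory, all reading the same local archive at once, mirroring `extract_ncbi_taxdb`. It passes `-f -` so that `tar` reads from stdin explicitly. If it succeeds, that is consistent with the local-file runs working and points toward staging of the remote file, rather than the extraction commands, as the failing step.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: concurrent extractions from ONE local archive.
# The archive and its contents are dummy stand-ins (assumption),
# not the real taxdb.tar.gz from NCBI.
set -euo pipefail

members=(taxdb.bti taxdb.btd taxonomy4blast.sqlite3)

workdir=$(mktemp -d)
cd "$workdir"

# Build a small dummy archive containing the same member names as tdb_files.
mkdir src
for f in "${members[@]}"; do
  echo "dummy $f" > "src/$f"
done
tar -czf taxdb.tar.gz -C src "${members[@]}"

# Mirror extract_ncbi_taxdb: one background job per member, each in its
# own directory (like separate task work dirs), all reading the same archive.
pids=()
for f in "${members[@]}"; do
  mkdir "task_$f"
  ( cd "task_$f" && gunzip -c ../taxdb.tar.gz | tar xf - "$f" ) &
  pids+=("$!")
done

# Wait for every job; with set -e, a failed extraction aborts the script.
for pid in "${pids[@]}"; do
  wait "$pid"
done

# Confirm each job produced its member.
for f in "${members[@]}"; do
  test -f "task_$f/$f"
done
echo "all extractions succeeded in $workdir"
```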
Reprex (`stage_fail.nf`):

```
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Extract one member from the BLAST taxdb tarball
process extract_ncbi_taxdb {
    input:
    tuple path(archive), val(to_extract)

    output:
    path(to_extract), emit: file
    path(archive), emit: zip

    script:
    """
    gunzip -c ${archive} | tar x ${to_extract}
    """
}

// Extract one member from the NCBI new_taxdump zip file
process extract_ncbi_taxonomy {
    input:
    tuple path(zipfile), val(f)

    output:
    path(f), emit: file
    path(zipfile), emit: zip

    script:
    """
    unzip -p ${zipfile} ${f} > ${f}
    """
}

params.blastTaxdb = 'https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz'
params.ncbiTaxdump = 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip'

workflow {
    // combine() pairs the archive with each file name: one task per file
    def ntd_files = ['merged.dmp','nodes.dmp','taxidlineage.dmp','rankedlineage.dmp']
    Channel.fromPath(params.ncbiTaxdump, glob: false) |
        combine(Channel.of(ntd_files).flatten()) |
        extract_ncbi_taxonomy

    def tdb_files = ['taxdb.bti','taxdb.btd','taxonomy4blast.sqlite3']
    Channel.fromPath(params.blastTaxdb, glob: false) |
        combine(Channel.of(tdb_files).flatten()) |
        extract_ncbi_taxdb
}
```