Failure using remote file channels

I’m encountering a strange error. I have two compressed files that default to being pulled from a remote server (the same server, NCBI) but can optionally be specified with a command-line parameter. They are then run through processes that extract a specified list of files (in parallel, presumably). If both archive files are left to be pulled remotely, they both fail with the error Unable to stage foreign file ... Cause: Unable to access path (this is the most informative error I’ve been able to find). However, if either one of the files is staged locally and passed as a command-line parameter, the workflow proceeds with no problem. I’ve been able to reproduce this in both the latest release and the latest edge release.

Any ideas on what might be going on here?

I’ve included a short workflow (stage_fail.nf) at the end of this post that reproduces the problem. Running it with no arguments triggers the failure:

$ nextflow run stage_fail.nf

But it won’t fail if I give it either of the expected files locally:

$ curl -LO https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
$ nextflow run stage_fail.nf --blast-taxdb taxdb.tar.gz

or

$ curl -LO https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip
$ nextflow run stage_fail.nf --ncbi-taxdump new_taxdump.zip

This looks like it might have something to do with concurrency, although I can’t quite figure out why: if I reduce the number of files each process is asked to extract, I get no errors (in the workflow below, reduce ntd_files and/or tdb_files to a single element).
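In the meantime, the fact that a locally staged file avoids the error suggests a workaround: download each remote archive once inside a dedicated process, so the extraction tasks only ever see a local path. This is just a sketch under that assumption; the fetch_archive process name and the curl-based download are my own additions, not part of the failing workflow:

```nextflow
// Hypothetical workaround: fetch each remote archive exactly once in
// its own task, then feed the resulting local file to the extraction
// processes, so Nextflow never stages the foreign URL directly.
process fetch_archive {
  input:
    val(url)

  output:
    path('*')

  script:
  """
  curl -LO ${url}
  """
}
```

In the workflow block, fetch_archive(Channel.of(params.ncbiTaxdump)) would then take the place of Channel.fromPath(params.ncbiTaxdump, glob: false) upstream of the combine.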

reprex (stage_fail.nf):

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

process extract_ncbi_taxdb {
  input:
    tuple path(archive), val(to_extract)
  
  output:
    path(to_extract), emit: file
    path(archive), emit: zip

  script:
  """
  gunzip -c ${archive} | tar -xf - ${to_extract}
  """
}

process extract_ncbi_taxonomy {
  input:
    tuple path(zipfile), val(f)
  output:
    path(f), emit: file
    path(zipfile), emit: zip
  
  script:
  """
  unzip -p ${zipfile} ${f} > ${f}
  """
}

params.blastTaxdb = 'https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz'
params.ncbiTaxdump = 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip'

workflow {
  def ntd_files = ['merged.dmp','nodes.dmp','taxidlineage.dmp','rankedlineage.dmp']
  Channel.fromPath(params.ncbiTaxdump,glob:false) |
    combine(Channel.of(ntd_files).flatten()) |
    extract_ncbi_taxonomy  
  
  def tdb_files = ['taxdb.bti','taxdb.btd','taxonomy4blast.sqlite3']
  Channel.fromPath(params.blastTaxdb,glob:false) | 
    combine(Channel.of(tdb_files).flatten()) |
    extract_ncbi_taxdb
}

Can you post this as a Nextflow issue so that we can track it? You aren’t the only one to report issues with file staging, particularly with NCBI, but a reproducible error has proven elusive.

Sure thing!