Avoiding clobbers when a process's input and output files have the same name

I have a process that ingests a user-provided file and generates a new output named after a user-provided sample_id tag. However, this can lead to the input and output files having the same name, which causes clobbering. I’m leery of renaming the input file because I don’t know how that would interact with caching and resume.

Is there a convenient way to ensure that an input file won’t clobber the output, perhaps by giving it some sort of permanent temporary filename or something like that?

You can use `name` or `stageAs` to stage an input under a specific name in the task work directory. For example:

path query_file, name: 'query.fa'

or, using a shorter syntax:

path 'query.fa'

You can read more about this here.
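For context, a minimal process using `stageAs` might look like the sketch below (the tool name `some_tool` and the output filename are placeholders, not from your pipeline):

```nextflow
process ALIGN {
    input:
    // Whatever file the user provides is staged as 'query.fa' in the work dir
    path query_file, stageAs: 'query.fa'

    output:
    path 'result.fa'

    script:
    """
    some_tool query.fa > result.fa
    """
}
```

Because the staged name is fixed, it can never collide with a differently named output.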

That makes sense, thank you!

Actually, I just realized my need is slightly more complex. The program I’m calling parses the filename extension to determine if the input is gzipped. I therefore would need to stage the file in a way that takes the given input name into account. For example if the user supplies input.txt.gz, then I’d like to stage it as something like “__temp__input.txt.gz”. Is this possible?

Does the snippet below work as a solution for your case?

process FOO {
  input:
  path ifile

  output:
  path ifile

  script:
  """
  mv ${ifile} __tmp__${ifile}
  echo ./do_something_with __tmp__${ifile} --output ${ifile}
  touch ${ifile}
  """
}

workflow {
  channel.fromPath('/Users/mribeirodantas/foo.txt') | FOO
}

Task directory:

tree work/66/248471eedbbc030568510c47fcc7c5/
work/66/248471eedbbc030568510c47fcc7c5/
├── __tmp__foo.txt -> /Users/mribeirodantas/foo.txt
└── foo.txt

1 directory, 2 files

Bash has a -C (noclobber) option that prevents output redirections (`>`) from overwriting existing files.

You can set it for all processes in your `nextflow.config`:

process.shell = ['/bin/bash', '-Ceuo', 'pipefail']

or enable it directly in your bash code with:

set -C
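As a quick sanity check, here is how noclobber behaves in a plain bash session (the file path is just for illustration):

```shell
# Sketch: with 'set -C' (noclobber), '>' fails instead of overwriting.
rm -f /tmp/noclobber_demo.txt
set -C
echo "first" > /tmp/noclobber_demo.txt          # file doesn't exist yet: write succeeds
if ! echo "second" > /tmp/noclobber_demo.txt 2>/dev/null; then
  echo "clobber prevented"
fi
cat /tmp/noclobber_demo.txt                     # still contains "first"
```

Note that noclobber only guards shell redirections; it won't stop a tool (or `mv`/`cp`) from overwriting a file itself.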