Manage datasets that are not stored in working directory

I’m a beginner in Nextflow, and I have the following problem: Nextflow seems to be built for processing data stored alongside the code being run, and as a consequence path objects that are absolute are actually turned into relative (staged) paths when Nextflow runs.
However, on the HPC cluster I’m using, the rule is to store datasets in a shared dedicated directory and to run code from a personal directory elsewhere. As a consequence, I can’t use paths relative to the Nextflow working directory, unless I do something obviously not ideal like ../../../../../../dataset_name.
Could you please explain to me what is the recommended, clean way to access data in my case?
For additional context, I have 2 datasets with the same internal structure that need to be processed by the same code, and I need to access several folders within each dataset by name. For example, I need to access dataset1/wavs, dataset2/segments/*_file.txt and so on. In bash, I would just define the dataset directory as a variable and append the subdirectories to it. What is the Nextflow equivalent of that?
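To illustrate, the bash pattern I have in mind is simply this (the paths are placeholders, not my real layout):

```shell
#!/usr/bin/env bash
# Define the dataset root once, then derive every subdirectory from it.
DATASET_DIR=/shared/datasets/dataset1   # placeholder path
SEGMENTS_DIR="$DATASET_DIR/segments"
WAV_DIR="$DATASET_DIR/wavs"

echo "$SEGMENTS_DIR"
echo "$WAV_DIR"
```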

Here is a minimal example of what I’m trying to achieve (not functional):

params.dataset_dir = '/very/long/absolute/path/to/dataset'

process mfccs {
	publishDir "results/mfccs", mode: 'copy'
	input:
	tuple path(dataset_dir), file(segments)
	output:
	file "${segments.baseName}.mat"
	script:
	"""
	# problem: the absolute path is lost here, because dataset_dir gets staged
	my_python_script.py \\
	--segment-file "${dataset_dir}/segments_files/${segments}" \\
	--wav-dir "${dataset_dir}/wavs" \\
	--output-file "${segments.baseName}.mat"
	"""
}

workflow {
	segments_files = Channel.fromPath('*_segments.txt')
	segments_ch = segments_files.map { file -> tuple(params.dataset_dir, file) }.view()
	segments_ch | mfccs
}

Thanks in advance for your help!

Once you create the channel with

segments_files = Channel.fromPath("${params.dataset_dir}/*_segments.txt")

you don't need to reference ${params.dataset_dir} for those files again: Nextflow handles the file staging for you (symlinks by default, which keeps storage usage down).
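For example, something like this — just a sketch, assuming DSL2; the directory names and the assumption that my_python_script.py is on the PATH come from your post, so adjust to your real layout:

```groovy
params.dataset_dir = '/very/long/absolute/path/to/dataset'

process mfccs {
	publishDir "results/mfccs", mode: 'copy'

	input:
	path segments   // each segment file, staged into the task dir by Nextflow
	path wav_dir    // the wavs directory, staged as a symlink

	output:
	path "${segments.baseName}.mat"

	script:
	"""
	my_python_script.py \\
	--segment-file ${segments} \\
	--wav-dir ${wav_dir} \\
	--output-file ${segments.baseName}.mat
	"""
}

workflow {
	segments_files = Channel.fromPath("${params.dataset_dir}/segments/*_segments.txt")
	// a bare file() becomes a value channel, reused for every segment file
	mfccs(segments_files, file("${params.dataset_dir}/wavs"))
}
```

Because both inputs are declared as path, the script only ever sees local staged names and you never need to rebuild absolute paths inside the process.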

I would just ingest both locations as params:

params.wavs = '/very/long/absolute/path/to/dataset1/wavs'
params.dataset_dir = '/very/long/absolute/path/to/dataset2/segments'

workflow {
	segments_files = Channel.fromPath("${params.dataset_dir}/*_segments.txt")
	wavs = Channel.fromPath("${params.wavs}/*.wav")
}
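And since you have two datasets with the same internal structure, one option is to drive everything from a channel of dataset roots — again a sketch, assuming DSL2, with placeholder paths. Each emitted tuple could feed a process whose input is declared as tuple path(wav_dir), path(segments):

```groovy
workflow {
	// one entry per dataset root (placeholder paths)
	dataset_dirs = Channel.of('/shared/datasets/dataset1', '/shared/datasets/dataset2')

	// pair each dataset's wavs directory with each of its segment files;
	// file() with a glob returns a list of matching paths
	segments_ch = dataset_dirs.flatMap { dir ->
		file("${dir}/segments/*_segments.txt")
			.collect { seg -> tuple(file("${dir}/wavs"), seg) }
	}

	segments_ch.view()
}
```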

Be careful with single quotes versus double quotes: single quotes do not interpolate variables, so the $ is taken literally. The following will not work:

segments_files = Channel.fromPath('${params.dataset_dir}/*_segments.txt')

That helped me solve my problem, thank you!
