Help including a function from a separate "utils.nf" into main.nf with DSL2

Hello,

I have written a function (below) to help manage inputs for my nextflow DSL2 workflow. It works how I want, but I’d ideally like to put it in a separate utils.nf file and then include it in my main.nf.

I have tried numerous things, including “evaluate()” from this discussion, but my understanding is that it only works for “pure Groovy” code and won’t work with my function, which uses files() (a Nextflow function, not plain Groovy).

My question is whether it’s possible to move this “resolve” function into a utils-type file, and, more generally, whether there’s a more idiomatic Nextflow approach to collecting input files.

Thank you for your help,
-Rob

// Function to resolve a list of file paths with a {tag} placeholder for a {value}
// - example input args:
//   - globs = ["data/modern/modern_chr{chrom}.vcf.gz", "data/*/archaic_chr{chrom}.vcf.gz"]
//   - tag = "{chrom}"
//   - value = "1"
//
// - If every pattern, with the tag replaced by the value, resolves to exactly one existing file,
//   a list of the resolved file paths is returned, prefixed with the value
// - example output:
//   - ["1", "data/modern/modern_chr1.vcf.gz", "data/archaic_chr1.vcf.gz"]
//
// - If any glob pattern does not contain the tag, an error is logged and null is returned
//
// - NOTE: glob patterns may contain wildcards, e.g. "data/*/archaic_chr{chrom}.vcf.gz" and
//         if the path resolves to a single file, it will be used as normal. However, if it resolves to
//         multiple files, a warning is logged and null is returned.
def resolve = { globs, tag, value ->

    // Helper function to resolve a single glob pattern
    def resolve_glob = { glob ->
        if (!glob.contains(tag)) {
            log.error "Glob '${glob}' does not contain expected tag '${tag}', skipping"
            return null
        }

        def matches = files(glob.replace(tag, value.toString()))  // toString() so integer chromosome values also work
        if (matches.size() != 1) {
            log.warn "Expected 1 match for '${glob}' with ${tag}=${value}, found ${matches.size()} matches"
            return null
        }
        return matches.first()
    }

    def resolved_files = globs.collect { glob -> resolve_glob(glob) }

    if (resolved_files.any { it == null || !it.exists() }) {
        log.warn "Skipping ${tag}=${value} due to missing or ambiguous files"
        return null
    }
    return [value] + resolved_files
}


// Code using this function
workflow {


    // Create a channel for each autosomal chromosome with the modern and archaic VCFs with their TBI files
    // - if any file is missing for a chromosome, that chromosome will be skipped
    // - each tuple element in the channel will be:
    //   [val(chromosome), path(modern_vcf), path(modern_tbi), path(archaic_vcf), path(archaic_tbi)]
    per_chrom_vcfs = Channel
        .fromList(autosome_num_list)
        .map { chrom ->
            def vcf_paths = [
                params.modern_vcf_glob,
                params.modern_vcf_glob + ".tbi",
                params.arc_vcf_glob,
                params.arc_vcf_glob + ".tbi",
            ]
            resolve(vcf_paths, "{chrom}", chrom)
        }
        .filter { it != null }

}

I should say that I wrote this function because I wanted to support input from a directory structure like the one below, while staying flexible enough to let users specify inputs with glob patterns like these. I was able to get some of the functionality I wanted with .fromFilePairs(), but couldn’t figure out how to cleanly build a channel of tuples such as ["1", "data/modern/modern_chr1.vcf.gz", "data/archaic_chr1.vcf.gz"].

modern_vcf_glob:  "example_input/modern_vcfs/*_chr{chrom}.vcf.gz"
arc_vcf_glob:     "example_input/*/archaic_chr{chrom}.vcf.gz"
genetic_map_glob: "example_input/genetic_maps/chr{chrom}.map"
vcf_mask_glob:    "example_input/masks/chr{chrom}_mask.bed"
example_input/
├── archaic_vcfs
│   ├── archaic_chr1.vcf.gz
│   ├── archaic_chr1.vcf.gz.tbi
│   ├── archaic_chr2.vcf.gz
│   ├── archaic_chr2.vcf.gz.tbi
│   ├── archaic_chr3.vcf.gz
│   └── archaic_chr3.vcf.gz.tbi
├── genetic_maps
│   ├── chr1.map
│   ├── chr2.map
│   └── chr3.map
├── masks
│   ├── chr1_mask.bed
│   ├── chr2_mask.bed
│   └── chr3_mask.bed
├── modern_vcfs
│   ├── modern_chr1.vcf.gz
│   ├── modern_chr1.vcf.gz.tbi
│   ├── modern_chr2.vcf.gz
│   ├── modern_chr2.vcf.gz.tbi
│   ├── modern_chr3.vcf.gz
│   └── modern_chr3.vcf.gz.tbi

There’s already a resolve method used with file paths in Nextflow (Path.resolve()), so it wouldn’t be a good idea to reuse that name.

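You can include functions in the same way you include process definitions: put the function in its own file, for example under a functions/ folder, and then include it by name. Only named functions (def my_function(...) { ... }) can be included this way, so a closure assigned to a variable, like def resolve = { ... }, needs to be rewritten as a regular function first: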

include { my_function } from './functions/utils.nf'
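As a sketch of how the question’s code could be split, here is the resolve closure rewritten as a named function in functions/utils.nf, plus a main.nf that includes it. The name resolveTaggedFiles and the default params values are hypothetical (the defaults mirror the globs from the question and would normally come from -params-file). Built-ins such as files() and log are available inside module files too, so no evaluate() trick should be needed.

```nextflow
// ---- functions/utils.nf ----
// Named function (hypothetical name) so that it can be imported with `include`.
// Returns [value, path1, path2, ...] or null if any pattern is missing/ambiguous.
def resolveTaggedFiles(List globs, String tag, value) {
    def resolved = []
    for (glob in globs) {
        def pattern = glob.toString()
        if (!pattern.contains(tag)) {
            log.error "Glob '${pattern}' does not contain expected tag '${tag}', skipping ${tag}=${value}"
            return null
        }
        // files() always returns a list, even when there is a single candidate
        def matches = files(pattern.replace(tag, value.toString()))
        if (matches.size() != 1) {
            log.warn "Expected 1 match for '${pattern}' with ${tag}=${value}, found ${matches.size()}"
            return null
        }
        if (!matches[0].exists()) {
            log.warn "File for '${pattern}' with ${tag}=${value} does not exist"
            return null
        }
        resolved << matches[0]
    }
    return [value] + resolved
}

// ---- main.nf ----
include { resolveTaggedFiles } from './functions/utils.nf'

// Hypothetical defaults; normally supplied with -params-file params.yml
params.chr_list        = ['1', '2', '3']
params.modern_vcf_glob = 'example_input/modern_vcfs/*_chr{chrom}.vcf.gz'
params.arc_vcf_glob    = 'example_input/*/archaic_chr{chrom}.vcf.gz'

workflow {
    per_chrom_vcfs = Channel
        .fromList(params.chr_list)
        .map { chrom ->
            def vcf_paths = [
                params.modern_vcf_glob,
                params.modern_vcf_glob + '.tbi',
                params.arc_vcf_glob,
                params.arc_vcf_glob + '.tbi',
            ]
            resolveTaggedFiles(vcf_paths, '{chrom}', chrom)
        }
        .filter { it != null }

    per_chrom_vcfs.view()
}
```

Using a named function instead of a closure also lets the early return null exit the whole function directly from inside the loop.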

Regarding your approach, I would supply the values of {chrom} as a list parameter. Then, inside a map, you can replace the {chrom} part of each pattern string with the current value and pass the result to file():

    ch_chrs = Channel.fromList(params.chr_list)
        .map { chr ->
            tuple(
                chr,
                file(params.modern_vcf_glob.replace('{chrom}', chr), checkIfExists: true),
                file(params.arc_vcf_glob.replace('{chrom}', chr), checkIfExists: true)
            )
        }
        .view()

where your params.yml looks like this (each glob keeps the {chrom} placeholder, so the replace() call above has something to substitute):

chr_list:
  - "chr1"
  - "chr2"
  - "chr3"
modern_vcf_glob:  "example_input/modern_vcfs/*_{chrom}.vcf.gz"
arc_vcf_glob:     "example_input/*/archaic_{chrom}.vcf.gz"
genetic_map_glob: "example_input/genetic_maps/{chrom}.map"
vcf_mask_glob:    "example_input/masks/{chrom}_mask.bed"
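The question’s channel also carries the .tbi index for each VCF. A minimal sketch of extending the map above to produce the same [chromosome, modern_vcf, modern_tbi, archaic_vcf, archaic_tbi] shape, assuming the chr_list entries and the {chrom} placeholders in the globs line up so that the substituted strings point at real files:

```nextflow
// Sketch: emit each VCF together with its .tbi index, one tuple per chromosome.
// Assumes params.chr_list, params.modern_vcf_glob and params.arc_vcf_glob are set
// (e.g. via -params-file params.yml) and the globs contain '{chrom}'.
workflow {
    ch_chrs = Channel.fromList(params.chr_list)
        .map { chr ->
            def modern = params.modern_vcf_glob.replace('{chrom}', chr.toString())
            def arc    = params.arc_vcf_glob.replace('{chrom}', chr.toString())
            tuple(
                chr,
                file(modern, checkIfExists: true),
                file(modern + '.tbi', checkIfExists: true),
                file(arc, checkIfExists: true),
                file(arc + '.tbi', checkIfExists: true)
            )
        }

    ch_chrs.view()
}
```

Unlike the original resolve function, checkIfExists: true is intended to make a missing file an error that stops the run rather than silently skipping that chromosome. Note also that file() returns a list instead of a single path when a wildcard matches several files, so an ambiguous glob would put a list into that tuple slot.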