Method to generate a ReadGroup string

In a pipeline that aligns NGS data, I have a helper method that builds the string passed to the aligner as the read group (@RG) line.
I usually run the pipeline on several dozen (sometimes hundreds of) samples at once.
The problem is that, for some samples, the RG line contains duplicated fields.
The method I’m using is:

```groovy
def generate_rg_line(def meta) {
    rg_line = "@RG"
    rg_line = "${rg_line}\\tID:${meta.id}"
    rg_line = "${rg_line}\\tSM:${meta.Sample}"
    rg_line = "${rg_line}\\tLB:${meta.Library}"
    rg_line = "${rg_line}\\tPU:${meta.PlatformUnit}"
    rg_line = meta.Center        ? "${rg_line}\\tCN:${meta.Center}"        : rg_line
    rg_line = meta.Description   ? "${rg_line}\\tDS:${meta.Description}"   : rg_line
    rg_line = meta.Date          ? "${rg_line}\\tDT:${meta.Date}"          : rg_line
    rg_line = meta.FlowOrder     ? "${rg_line}\\tFO:${meta.FlowOrder}"     : rg_line
    rg_line = meta.Program       ? "${rg_line}\\tPG:${meta.Program}"       : rg_line
    rg_line = meta.Platform      ? "${rg_line}\\tPL:${meta.Platform}"      : rg_line
    rg_line = meta.PlatformModel ? "${rg_line}\\tPM:${meta.PlatformModel}" : rg_line
    rg_line = meta.Barcode       ? "${rg_line}\\tBC:${meta.Barcode}"       : rg_line
    rg_line = meta.KeySequence   ? "${rg_line}\\tKS:${meta.KeySequence}"   : rg_line
    rg_line = meta.InsertSize    ? "${rg_line}\\tPI:${meta.InsertSize}"    : rg_line

    return rg_line
}
```

In some of the generated files I found the following:

```
@RG     ID:555555       SM:ABCDEF       LB:GHIJK   PU:HHHHHDSXC_4_1 CN:CENTER DS:Homo sapiens DT:2024-10-11T14:15:17  PL:ILLUMINA     PM:NS6000       DS:Homo sapiens DT:2024-10-11T14:15:17  PL:ILLUMINA     PM:NS6000
```

The first part of the line is correct, but it is followed by a second copy of DS:Homo sapiens DT:2024-10-11T14:15:17 PL:ILLUMINA PM:NS6000. These are the same values as before, yet it is not a copy of the whole line, only of those four tags.

Any idea why I could be getting this weird behavior?

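I'm pretty sure this is caused by not declaring rg_line as a local variable of the function. Without def, the assignment goes to a variable that is global to the script, so every call to generate_rg_line writes to the same variable, and when several calls run at the same time they can pick up each other's partially built strings. Try declaring rg_line as a local variable: def rg_line = "@RG"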
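A minimal sketch of the fix, assuming meta is a plain Map as in the question. Every variable is declared with def, so each call builds its own string and nothing is shared between concurrent calls. Collecting the tags in a list and joining them is only a stylistic choice; adding def to rg_line in the original method is enough on its own. The literal backslash-t separators of the original ("\\t") are kept as-is.

```groovy
// Sketch: builds the @RG string using only local variables (declared with def),
// so concurrent calls cannot overwrite each other's intermediate state.
def generate_rg_line(Map meta) {
    // Required tags, always emitted (same order as the original method)
    def fields = [
        "ID:${meta.id}",
        "SM:${meta.Sample}",
        "LB:${meta.Library}",
        "PU:${meta.PlatformUnit}"
    ]
    // Optional tags as [SAM tag: meta key], in the original order
    def optional = [
        CN: 'Center', DS: 'Description', DT: 'Date', FO: 'FlowOrder',
        PG: 'Program', PL: 'Platform', PM: 'PlatformModel', BC: 'Barcode',
        KS: 'KeySequence', PI: 'InsertSize'
    ]
    optional.each { tag, key ->
        // Same truthiness check as the original ternaries: skip null/empty values
        if (meta[key]) {
            fields << "${tag}:${meta[key]}"
        }
    }
    // Join with a literal backslash + 't', as the original "\\t" produces
    return (['@RG'] + fields).join('\\t')
}

def meta = [id: '555555', Sample: 'ABCDEF', Library: 'GHIJK',
            PlatformUnit: 'HHHHHDSXC_4_1', Platform: 'ILLUMINA']
println generate_rg_line(meta)
// prints: @RG\tID:555555\tSM:ABCDEF\tLB:GHIJK\tPU:HHHHHDSXC_4_1\tPL:ILLUMINA
```

Because the function no longer touches any script-level variable, it can be called from many tasks in parallel without the output of one call leaking into another.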

Makes sense, but this isn't explained anywhere in the docs, is it? It looks like a Groovy thing, and I've been told I don't need to know Groovy to write good Nextflow pipelines :-/

BTW, this seems to have solved it: so far I haven't seen the problem happen again 🙂 Hopefully this was the root cause! Thanks!

It almost certainly was the root cause. It’s not that well documented, but there is some information here: Caching and resuming — Nextflow documentation
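To see the mechanism outside Nextflow, here is a small sketch in plain Groovy (assumed to behave analogously to the Nextflow script case discussed above). The method names withoutDef and withDef are hypothetical. An assignment to an undeclared name inside a script method is stored in the script's Binding, which every method call shares, while a def variable exists only for that call.

```groovy
// Plain Groovy (not Nextflow): an assignment to an undeclared name inside a
// script method is stored in the script's shared Binding.
def scriptText = '''
def withoutDef(x) {
    shared = x          // no def: stored in the script Binding
    return shared
}
def withDef(x) {
    def local = x       // def: local to this call only
    return local
}
withoutDef('first call')
withDef('another call')
'''

def binding = new Binding()
new GroovyShell(binding).evaluate(scriptText)

println binding.hasVariable('shared')   // true: visible to every call
println binding.getVariable('shared')   // first call
println binding.hasVariable('local')    // false: never leaves the method
```

In the original generate_rg_line, rg_line plays the role of shared here, which is why parallel calls could interfere with each other.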
