In a pipeline that aligns NGS data, I have a helper method that builds the read group (@RG) line passed to the aligner.
I usually run the pipeline on several dozen samples (sometimes hundreds) simultaneously.
The issue is that, for some samples, the RG line appears with duplicate parts.
The method I’m using is:
def generate_rg_line(def meta) {
    rg_line = "@RG"
    rg_line = "${rg_line}\\tID:${meta.id}"
    rg_line = "${rg_line}\\tSM:${meta.Sample}"
    rg_line = "${rg_line}\\tLB:${meta.Library}"
    rg_line = "${rg_line}\\tPU:${meta.PlatformUnit}"
    rg_line = meta.Center ? "${rg_line}\\tCN:${meta.Center}" : rg_line
    rg_line = meta.Description ? "${rg_line}\\tDS:${meta.Description}" : rg_line
    rg_line = meta.Date ? "${rg_line}\\tDT:${meta.Date}" : rg_line
    rg_line = meta.FlowOrder ? "${rg_line}\\tFO:${meta.FlowOrder}" : rg_line
    rg_line = meta.Program ? "${rg_line}\\tPG:${meta.Program}" : rg_line
    rg_line = meta.Platform ? "${rg_line}\\tPL:${meta.Platform}" : rg_line
    rg_line = meta.PlatformModel ? "${rg_line}\\tPM:${meta.PlatformModel}" : rg_line
    rg_line = meta.Barcode ? "${rg_line}\\tBC:${meta.Barcode}" : rg_line
    rg_line = meta.KeySequence ? "${rg_line}\\tKS:${meta.KeySequence}" : rg_line
    rg_line = meta.InsertSize ? "${rg_line}\\tPI:${meta.InsertSize}" : rg_line
    return rg_line
}
Some of the generated files contain a line like this:
@RG ID:555555 SM:ABCDEF LB:GHIJK PU:HHHHHDSXC_4_1 CN:CENTER DS:Homo sapiens DT:2024-10-11T14:15:17 PL:ILLUMINA PM:NS6000 DS:Homo sapiens DT:2024-10-11T14:15:17 PL:ILLUMINA PM:NS6000
The first half is fine, but it is followed by a second copy of DS:Homo sapiens DT:2024-10-11T14:15:17 PL:ILLUMINA PM:NS6000. The repeated values are identical to the earlier ones, and it is not a copy of the whole line, just those four tags.
Any idea why I might be getting this behavior?
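
A possible explanation, offered as a hypothesis rather than a confirmed diagnosis: rg_line is assigned without def, and in a Groovy script an undeclared variable is stored in the script binding, which all calls share. If the method runs concurrently for many samples (as it can inside a workflow engine such as Nextflow, which is an assumption here), one call may read a value that another call has already extended, which would explain a partial run of tags being appended twice. The smallest change would be to declare the variable with `def rg_line = "@RG"`. The sketch below goes one step further and builds the line from a local list; the Map parameter type and the names fields and optional are illustrative choices, not part of the original method.

```groovy
// Sketch only: same tags and order as the original method, but every
// variable is declared with 'def', so nothing is shared between calls.
def generate_rg_line(Map meta) {
    // Required tags. 'fields' is local to this invocation.
    def fields = ["@RG", "ID:${meta.id}", "SM:${meta.Sample}",
                  "LB:${meta.Library}", "PU:${meta.PlatformUnit}"]
    // Optional tags; a map literal keeps insertion order in Groovy.
    def optional = [CN: meta.Center, DS: meta.Description, DT: meta.Date,
                    FO: meta.FlowOrder, PG: meta.Program, PL: meta.Platform,
                    PM: meta.PlatformModel, BC: meta.Barcode,
                    KS: meta.KeySequence, PI: meta.InsertSize]
    // Append a tag only when its value is set (non-null, non-empty).
    optional.each { tag, value -> if (value) fields << "${tag}:${value}" }
    // Join with a literal backslash followed by 't', as in the original code.
    return fields.join('\\t')
}

// Example call with made-up metadata.
def example = [id: '555555', Sample: 'ABCDEF', Library: 'GHIJK',
               PlatformUnit: 'HHHHHDSXC_4_1', Platform: 'ILLUMINA']
println generate_rg_line(example)
```

Because each invocation owns its own list, concurrent calls cannot see each other's partial results, regardless of how many samples are processed at once.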