Hi there,
I work with patient data (whole exome, RNA), where there can be multiple samples from different time points.
Patient_ID | Sample name | DNA_N_R1 | DNA_N_R2 | DNA_T_R1 | DNA_T_R2 | RNA_T_R1 | RNA_T_R2 |
---|---|---|---|---|---|---|---|
patient1 | patient1-3 | DNA1_N_R1 | DNA1_N_R2 | DNA_T_T01_R1 | DNA_T_T01_R2 | RNA_T_T01_R1 | RNA_T_T01_R2 |
patient1 | patient1-4 | DNA1_N_R1 | DNA1_N_R2 | DNA_T_T02_R1 | DNA_T_T02_R2 | RNA_T_T02_R1 | RNA_T_T02_R2 |
patient1 | patient1-5 | DNA1_N_R1 | DNA1_N_R2 | DNA_T_T03_R1 | DNA_T_T03_R2 | RNA_T_T03_R1 | RNA_T_T03_R3 |
The normal DNA sample stays the same across all rows for a patient, but the tumor sample varies by time point.
There is no RNA normal, only tumor.
How do I create a channel or data structure to handle this layout?
Earlier I was working with one row per patient, under the impression that each patient had a single tumor DNA sample, a single normal DNA sample, and only one RNA time point. For that I had the following code:
Channel.fromPath( csv_file )
    .splitCsv()
    .multiMap { row ->
        // Columns: 0 = patient ID, 1-2 = normal DNA, 3-4 = tumor DNA, 5-6 = tumor RNA
        def rna_reads    = tuple( file(row[5]), file(row[6]) )
        def tumor_reads  = tuple( file(row[3]), file(row[4]) )
        def normal_reads = tuple( file(row[1]), file(row[2]) )
        tumor:  tuple( row[0], tumor_reads )
        normal: tuple( row[0], normal_reads )
        rna:    tuple( row[0], rna_reads )
    }
    .set { samples }
Then they can be sent for analysis:
align(both.out.reads_tumor)
ariba(align.out.aligned_star)
markduplicates(align.out.aligned_star)
featurecounts(markduplicates.out.markduplicate_bam)
This worked fine with the following input data:
Patient_ID | DNA_N_R1 | DNA_N_R2 | DNA_T_R1 | DNA_T_R2 | RNA_T_R1 | RNA_T_R2 |
---|---|---|---|---|---|---|
sample1 | DNA1_N_R1 | DNA1_N_R2 | DNA1_T_T01_R1 | DNA1_T_T01_R2 | RNA1_T_T01_R1 | RNA1_T_T01_R2 |
sample2 | DNA2_N_R1 | DNA2_N_R2 | DNA2_T_T02_R1 | DNA2_T_T02_R2 | RNA2_T_T02_R1 | RNA2_T_T02_R2 |
sample3 | DNA3_N_R1 | DNA3_N_R2 | DNA3_T_T03_R1 | DNA3_T_T03_R2 | RNA3_T_T03_R1 | RNA3_T_T03_R3 |
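For the multi-time-point sheet at the top, one possible direction is to carry the sample name alongside the patient ID and to pair every tumor time point with the patient's single normal. The following is only a sketch, assuming the sheet is a comma-separated file without a header row and with the eight columns shown in the first table. The channel names `normal_ch` and `pairs_ch` are hypothetical, and the output shape noted in the comments is what I would expect rather than something I have verified.

```groovy
Channel.fromPath( csv_file )
    .splitCsv()
    .multiMap { row ->
        // Columns: 0 = patient ID, 1 = sample name, 2-3 = normal DNA,
        //          4-5 = tumor DNA, 6-7 = tumor RNA
        def patient = row[0]
        def sample  = row[1]
        tumor:  tuple( patient, sample, tuple( file(row[4]), file(row[5]) ) )
        normal: tuple( patient, tuple( file(row[2]), file(row[3]) ) )
        rna:    tuple( patient, sample, tuple( file(row[6]), file(row[7]) ) )
    }
    .set { samples }

// The normal is repeated on every row of a patient, so keep one copy per patient.
normal_ch = samples.normal.unique()

// Pair each tumor time point with its patient's normal, matching on the patient ID.
// Expected item shape: [ patient, sample, [tumor_R1, tumor_R2], [normal_R1, normal_R2] ]
pairs_ch = samples.tumor.combine( normal_ch, by: 0 )
```

With this layout, `samples.rna` would emit one item per time point, so the RNA steps would run once per sample, while `pairs_ch` would feed any tumor/normal DNA step.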