I had an earlier thread (link at the end), but it has become messy, so I’m creating a new one.
The input data is a multi-column, comma-separated file:
batch | timepoint | normal_WES_R1 | normal_WES_R2 | tumor_WES_R1 | tumor_WES_R2 | RNA_R1 | RNA_R2 |
---|---|---|---|---|---|---|---|
batch1 | sample11 | /path/wes/wes_normal_sample11_R1.fastq.gz | /path/wes/wes_normal_sample11_R2.fastq.gz | /path/wes/wes_tumor_sample11_R1.fastq.gz | /path/wes/wes_tumor_sample11_R2.fastq.gz | /path/rna/rna_tumor_sample11_R1.fastq.gz | /path/rna/rna_tumor_sample11_R2.fastq.gz |
batch1 | sample1 | /path/wes/wes_normal_sample1_R1.fastq.gz | /path/wes/wes_normal_sample1_R2.fastq.gz | /path/wes/wes_tumor_sample1_R1.fastq.gz | /path/wes/wes_tumor_sample1_R2.fastq.gz | /path/rna/rna_tumor_sample1_R1.fastq.gz | /path/rna/rna_tumor_sample1_R2.fastq.gz |
batch2 | sample2 | /path/wes/wes_normal_sample2_R1.fastq.gz | /path/wes/wes_normal_sample2_R2.fastq.gz | /path/wes/wes_tumor_sample2_R1.fastq.gz | /path/wes/wes_tumor_sample2_R2.fastq.gz | /path/rna/rna_tumor_sample2_R1.fastq.gz | /path/rna/rna_tumor_sample2_R2.fastq.gz |
batch3 | sample3 | /path/wes/wes_normal_sample3_R1.fastq.gz | /path/wes/wes_normal_sample3_R2.fastq.gz | /path/wes/wes_tumor_sample3_R1.fastq.gz | /path/wes/wes_tumor_sample3_R2.fastq.gz | NA | NA |
batch3 | sample4 | /path/wes/wes_normal_sample4_R1.fastq.gz | /path/wes/wes_normal_sample4_R2.fastq.gz | /path/wes/wes_tumor_sample4_R1.fastq.gz | /path/wes/wes_tumor_sample4_R2.fastq.gz | /path/rna/rna_tumor_sample4_R1.fastq.gz | /path/rna/rna_tumor_sample4_R2.fastq.gz |
That is, the columns are: batch,timepoint,normal_WES_R1,normal_WES_R2,tumor_WES_R1,tumor_WES_R2,RNA_R1,RNA_R2
The seventh and eighth columns can be missing (given as NA), because not all samples have RNA sequenced.
There are two workflows, RNA and WES, chosen by data type.
There are two scenarios:
- Process WES (normal and tumor) regardless of whether RNA is present or not.
- Process RNA only when RNA data are available. In the given example, batch3,sample3 will be skipped for RNA, but I still need to process the WES (tumor and normal) files for this sample.
I’ve tried the following code, which uses map and branch:
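The two scenarios above could be expressed roughly as in the sketch below. It is only a sketch under assumptions that are not stated in this post: DSL2 is in use, the CSV starts with the header line shown above (so header: true gives rows keyed by column name), and a missing RNA file is always the literal string NA. The meta map and the channel names rows, wes_ch and rna_ch are hypothetical names used for illustration.

```nextflow
// Sketch only. Assumptions (not from the original post): DSL2, and the CSV
// begins with the header line batch,timepoint,normal_WES_R1,...,RNA_R2.
workflow {
    rows = Channel
        .fromPath("temp_timestamp_NA.csv")
        .splitCsv(sep: ',', header: true)   // each row becomes a map keyed by column name

    // Scenario 1: every row yields WES reads, whether or not RNA is present.
    wes_ch = rows.map { row ->
        def meta   = [batch: row.batch, timepoint: row.timepoint]
        def normal = [file(row.normal_WES_R1), file(row.normal_WES_R2)]
        def tumor  = [file(row.tumor_WES_R1), file(row.tumor_WES_R2)]
        [meta, normal, tumor]
    }

    // Scenario 2: only rows whose RNA columns are not "NA" reach the RNA channel.
    rna_ch = rows
        .filter { row -> row.RNA_R1 != 'NA' && row.RNA_R2 != 'NA' }
        .map { row ->
            def meta = [batch: row.batch, timepoint: row.timepoint]
            [meta, [file(row.RNA_R1), file(row.RNA_R2)]]
        }

    wes_ch.view { v -> "WES: $v" }
    rna_ch.view { v -> "RNA: $v" }
}
```

Keeping batch and timepoint in a map as the first element of each tuple means they travel with the reads into either workflow, and view shows them next to the file paths.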
def createTupleOrString(fileString) {
    if (fileString == "NA") {
        return "NA"
    } else {
        return file(fileString)
    }
}

Channel.fromPath(file("temp_timestamp_NA.csv"))
    .splitCsv(sep: ',')
    .map { row ->
        // Extract relevant information
        def batch_info = row[0]
        def time_point = row[1]
        def normal_reads = tuple((row[2]), (row[3]))
        def tumor_reads = tuple((row[4]), (row[5]))
        def rna_reads = tuple(createTupleOrString(row[6]), createTupleOrString(row[7]))
        [
            [type: "tumor", data: tumor_reads],
            [type: "normal", data: normal_reads],
            [type: "rna", data: rna_reads]
        ]
    }
    .branch { type, reads ->
        tumor: type == "tumor"
        normal: type == "normal"
        rna: type == "rna" && reads.data[0] != 'NA'
    }
    .set { hello }

// hello.tumor.view { "$it is a tumor" }
hello.rna | view { "Tumor: $it" }
I get this error:
ERROR ~ Invalid method invocation
doCall
with arguments:
I cannot view the hello variable that is created.
I also do not know how the batch_info and timepoint metadata are stored, collected, and passed along, since I’m unable to see the content of the hello variable.
Sorry for a redundant post.
Link to the older post: How to handle NA in file/path in map and process?