Hi there,
I have a multi-column CSV file like this:
MM-0245,T-01,/path/foo/DNA-N-01-01_joined_R1.fastq.gz,/path/foo/DNA-N-01-01_joined_R2.fastq.gz,DNA-T-01-01_L003_R1_001.fastq.gz,/path/foo/DNA-T-01-01_L003_R2_001.fastq.gz,/path/foo/RNA-T-01-01_L003_R1_001.fastq.gz,/path/foo/RNA-T-01-01_L003_R2_001.fastq.gz
MM-0245,T-02,/path/foo/DNA-N-01-01_joined_R1.fastq.gz,/path/foo/DNA-N-01-01_joined_R2.fastq.gz,DNA-T-02-01_joined_R1.fastq.gz,/path/foo/DNA-T-02-01_joined_R2.fastq.gz,/path/foo/RNA-T-02-01_L003_R1_001.fastq.gz,/path/foo/RNA-T-02-01_L003_R2_001.fastq.gz
MM-0245,T-03,/path/foo/DNA-N-01-01_joined_R1.fastq.gz,/path/foo/DNA-N-01-01_joined_R2.fastq.gz,DNA-T-02-01_joined_R1.fastq.gz,/path/foo/DNA-T-02-01_joined_R2.fastq.gz,NA,NA
Each row follows the structure: patient,timepoint,normal_WES_R1,normal_WES_R2,tumor_WES_R1,tumor_WES_R2,RNA_R1,RNA_R2
There can be multiple patients in the CSV file, and a patient can have multiple samples/timepoints, as in the example.
However, the full trio (DNA normal, DNA tumor and RNA) is not always present. In the example above, MM-0245,T-03 has NA for both the RNA forward and reverse reads.
How do I avoid errors when processing reaches an NA entry?
My main.nf is:
workflow {
    if (params.analysis == "both") {
        wes()
        rna()
    }
    if (params.analysis == "wes") {
        wes()
    }
    if (params.analysis == "rna") {
        rna()
    }
}
My rna.nf is:
include { arriba } from '../modules/rna/arriba.nf'
include { fastp_rna } from '../modules/rna/fastp_rna.nf'

workflow rna {
    def csvFile = params.input_csvFile
    Channel.fromPath(csvFile)
        .splitCsv()
        .map { row ->
            def patient_info = row[0]
            def sample_info = row[1]
            def normal_reads = tuple(row[2], row[3])
            def tumor_reads = tuple(row[4], row[5])
            def rna_reads = tuple(row[6], row[7])
            // Emit a tuple (not a map) so the element order matches the
            // tuple input declared by the fastp_rna process.
            return tuple(patient_info, sample_info, normal_reads, tumor_reads, rna_reads)
        }
        .set { samples }

    fastp_rna(samples)
}
My RNA fastp module (fastp_rna.nf) is:
process fastp_rna {
    conda '/data1/software/miniconda/envs/MMRADAR/'
    maxForks 3
    debug true
    errorStrategy 'retry'
    maxRetries 2
    label 'low_mem'
    publishDir path: "${params.outdir}/${patient_id}/${sample_id}/RNA/fastp/tumor/", mode: 'copy', pattern: '*_T*'

    input:
    tuple val(patient_id), val(sample_id),
          path(normal_reads, stageAs: 'fastp_normal_reads/*'),
          path(tumor_reads, stageAs: 'fastp_tumor_reads/*'),
          path(rna_reads, stageAs: 'rna_reads/*')

    output:
    tuple val(patient_id_tumor), val(sample_id), path("${patient_id_tumor}_trim_{1,2}.fq.gz"), emit: reads_tumor
    path("${patient_id_tumor}.fastp.json"), emit: json_tumor
    path("${patient_id_tumor}.fastp.html"), emit: html_tumor

    script:
    patient_id_tumor = patient_id + "_T"
    def (r1_tumor, r2_tumor) = rna_reads
    """
    /data1/software/miniconda/envs/MMRADAR/bin/fastp --in1 "${r1_tumor}" --in2 "${r2_tumor}" \
        -q 20 -u 20 -l 40 --detect_adapter_for_pe --out1 "${patient_id_tumor}_trim_1.fq.gz" \
        --out2 "${patient_id_tumor}_trim_2.fq.gz" --json "${patient_id_tumor}.fastp.json" \
        --html "${patient_id_tumor}.fastp.html" --thread 20
    """
}

workflow.onComplete {
    log.info(workflow.success ? "Done rna fastp!" : "Oops .. something went wrong in rna fastp")
}
Where should I put a check so that nothing is processed for samples whose RNA reads are NA?
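One place I can imagine putting the check is in the channel itself, before fastp_rna is called, so that NA rows never reach the process. Below is an untested sketch of that idea, not a confirmed solution. It assumes a missing file is always written as the literal string NA with no surrounding whitespace, that only the two RNA columns need checking in this workflow, and that the DNA columns in the remaining rows point at real files. Wrapping the paths in file() is also my assumption rather than something from my current code.

```groovy
// Sketch only: drop rows with missing RNA reads before calling the process.
include { fastp_rna } from '../modules/rna/fastp_rna.nf'

workflow rna {
    Channel.fromPath(params.input_csvFile)
        .splitCsv()
        // Assumption: a missing file is marked with the literal string 'NA'.
        // Keep a row only if both RNA reads (columns 7 and 8) are present.
        .filter { row -> row[6] != 'NA' && row[7] != 'NA' }
        .map { row ->
            // Element order matches the tuple input declared by fastp_rna:
            // patient, sample, normal reads, tumor reads, RNA reads.
            tuple(
                row[0],
                row[1],
                [file(row[2]), file(row[3])],
                [file(row[4]), file(row[5])],
                [file(row[6]), file(row[7])]
            )
        }
        .set { samples }

    fastp_rna(samples)
}
```

With the example CSV above, this should pass the T-01 and T-02 rows to fastp_rna and skip T-03. Would filtering in the channel like this be the right approach, or is it better to handle NA inside the process?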