Small discrepancies between results from a normal workflow and its optimized version

First, I must say it is truly amazing how https://tower.nf/ is able to orchestrate the Slurm jobs :pray: It took away a lot of pain from work :ok_hand: and it was kind of magical to see that it could optimize the run even further, from

to

Incredible! Maybe not much of a saving in terms of time and RAM, but the CPU utilization is amazing, to say the least!

Coming to the actual results, though, I am noticing some differences, for example:

diff /cluster/projects/nn9036k/scripts/tower_TK60/star_salmon/salmon.merged.gene_lengths.tsv /cluster/projects/nn9036k/scripts/tower_TK60optim/star_salmon/salmon.merged.gene_lengths.tsv | wc
125510 4138758 55790914

That looks like a lot in a plain diff, but a scatter plot shows the differences are relatively minor in log2 space, for example:

image

The overall picture looks like the following, where the diagonal panels compare the same samples from the two runs, and the one circled in red is the one shown above.
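As a rough way to put a number on "minor in log2 space" instead of relying on a textual diff, here is a minimal Python sketch. It assumes the merged TSV has two identifier columns (gene ID and gene name) followed by one numeric column per sample; the `id_cols` parameter, the pseudocount, and the tiny in-memory tables are illustrative assumptions, not values taken from the actual runs.

```python
import csv
import io
import math


def load_table(handle, id_cols=2):
    """Read a tab-separated table keyed by the first column.

    Assumption: the first `id_cols` columns are identifiers and the
    remaining columns hold one numeric value per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[id_cols:]
    table = {row[0]: [float(v) for v in row[id_cols:]] for row in reader}
    return samples, table


def log2_discrepancy(table_a, table_b, pseudocount=1.0):
    """Return (number of differing values, largest absolute log2 difference)."""
    n_changed = 0
    worst = 0.0
    for key, a_vals in table_a.items():
        b_vals = table_b.get(key)
        if b_vals is None:
            continue  # row present in only one run; ignored in this sketch
        for a, b in zip(a_vals, b_vals):
            if a != b:
                n_changed += 1
                delta = abs(math.log2(a + pseudocount) - math.log2(b + pseudocount))
                worst = max(worst, delta)
    return n_changed, worst


# Made-up stand-ins for the two runs' tables (not real results).
run_a = "gene_id\tgene_name\tS1\tS2\nG1\tA\t1000.0\t2000.0\nG2\tB\t500.0\t500.0\n"
run_b = "gene_id\tgene_name\tS1\tS2\nG1\tA\t1000.5\t2000.0\nG2\tB\t500.0\t500.0\n"

_, table_a = load_table(io.StringIO(run_a))
_, table_b = load_table(io.StringIO(run_b))
n_changed, worst = log2_discrepancy(table_a, table_b)
print(f"{n_changed} differing value(s), max |log2 difference| = {worst:.5f}")
```

For the real files, the two `io.StringIO` objects would be replaced by `open(...)` calls on the two TSV paths. A line-based diff flags a row if any digit differs, so a summary like this is usually a fairer picture of how far apart the runs are.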

I am wondering if full reproducibility is even possible?

The following is the diff of the parameters between the two runs:

diff TK60.param TK60optim.param

1,5c1,5
< nextflow run 'https://github.com/nf-core/rnaseq'
<                -name awesome_gilbert
<                -params-file 'https://api.tower.nf/ephemeral/6NJIJTOHyD3bvvODdbfOOw.json'
<                -with-tower
<                -r b89fac32650aacc86fcda9ee77e00612a1d77066
---
> nextflow run 'https://github.com/nf-core/rnaseq' \
>                -name cheesy_raman \
>                -params-file 'https://api.tower.nf/ephemeral/uAQRv6xxE3zGXg4y_VmYwg.json' \
>                -with-tower \
>                -r b89fac32650aacc86fcda9ee77e00612a1d77066 \
18c18
<     "ribo_database_manifest": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/rrna-db-defaults.txt",
---
>     "ribo_database_manifest": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/rrna-db-defaults.txt",
89c89
<             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/GRCh37-blacklist.bed"
---
>             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/GRCh37-blacklist.bed"
101c101
<             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed"
---
>             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed"
122c122
<             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/GRCm38-blacklist.bed"
---
>             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/GRCm38-blacklist.bed"
356c356
<             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed"
---
>             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed"
369c369
<             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/hg19-blacklist.bed"
---
>             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/hg19-blacklist.bed"
382c382
<             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/mm10-blacklist.bed"
---
>             "blacklist": "/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/mm10-blacklist.bed"
512c512
<     "outdir": "/cluster/projects/nn9036k/scripts/tower_TK60",
---
>     "outdir": "/cluster/projects/nn9036k/scripts/tower_TK60optim",
522,523c522,523
< /cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/nextflow.config
< /cluster/projects/nn9036k/scripts/nf-1Bpr9p1bhYVOBt.config         
---
> /cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/nextflow.config
> /cluster/projects/nn9036k/scripts/nf-OPLchP8holdFu.config
559c559
<    ribo_database_manifest = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/rrna-db-defaults.txt'
---
>    ribo_database_manifest = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/rrna-db-defaults.txt'
599c599
<    outdir = '/cluster/projects/nn9036k/scripts/tower_TK60'
---
>    outdir = '/cluster/projects/nn9036k/scripts/tower_TK60optim'
637c637
<          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/GRCh37-blacklist.bed'
---
>          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/GRCh37-blacklist.bed'
649c649
<          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed'
---
>          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed'
670c670
<          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/GRCm38-blacklist.bed'
---
>          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/GRCm38-blacklist.bed'
904c904
<          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed'
---
>          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/hg38-blacklist.bed'
917c917
<          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/hg19-blacklist.bed'
---
>          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/hg19-blacklist.bed'
930c930
<          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/e32e93a6/nf-core/rnaseq/assets/blacklists/mm10-blacklist.bed'
---
>          blacklist = '/cluster/projects/nn9036k/scripts/.nextflow/pipelines/ce3c27ec/nf-core/rnaseq/assets/blacklists/mm10-blacklist.bed'
1058,1059c1058,1059
<    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
<    maxRetries = 1
---
>    errorStrategy = 'retry'
>    maxRetries = 2
1373a1374,1565
>    withName:'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT' {
>       cpus = { 6 * task.attempt }
>       memory = { 6.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN' {
>       cpus = { 12 * task.attempt }
>       memory = { 37.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:PICARD_MARKDUPLICATES' {
>       cpus = { 2 * task.attempt }
>       memory = { 27.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_BAMSTAT' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_INFEREXPERIMENT' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_INNERDISTANCE' {
>       cpus = { 2 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_JUNCTIONANNOTATION' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_JUNCTIONSATURATION' {
>       cpus = { 1 * task.attempt }
>       memory = { 3.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_READDISTRIBUTION' {
>       cpus = { 1 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_READDUPLICATION' {
>       cpus = { 1 * task.attempt }
>       memory = { 22.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_FORWARD:UCSC_BEDCLIP' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_FORWARD:UCSC_BEDGRAPHTOBIGWIG' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_REVERSE:UCSC_BEDCLIP' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_REVERSE:UCSC_BEDGRAPHTOBIGWIG' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:BEDTOOLS_GENOMECOV' {
>       cpus = { 2 * task.attempt }
>       memory = { 11.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:DESEQ2_QC_STAR_SALMON' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:DUPRADAR' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC' {
>       cpus = { 2 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE' {
>       cpus = { 8 * task.attempt }
>       memory = { 6.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX' {
>       cpus = { 3 * task.attempt }
>       memory = { 19.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT' {
>       cpus = { 2 * task.attempt }
>       memory = { 18.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:MULTIQC' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:MULTIQC_CUSTOM_BIOTYPE' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CUSTOM_GETCHROMSIZES' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF2BED' {
>       cpus = { 1 * task.attempt }
>       memory = { 4.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF_FILTER' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA' {
>       cpus = { 1 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:STAR_GENOMEGENERATE' {
>       cpus = { 5 * task.attempt }
>       memory = { 65.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUALIMAP_RNASEQ' {
>       cpus = { 1 * task.attempt }
>       memory = { 16.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_QUANT' {
>       cpus = { 5 * task.attempt }
>       memory = { 12.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_GENE' {
>       cpus = { 2 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_GENE_LENGTH_SCALED' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_GENE_SCALED' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_TRANSCRIPT' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:TX2GENE' {
>       cpus = { 1 * task.attempt }
>       memory = { 1.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:TXIMPORT' {
>       cpus = { 1 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:STRINGTIE_STRINGTIE' {
>       cpus = { 2 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
>    withName:'NFCORE_RNASEQ:RNASEQ:SUBREAD_FEATURECOUNTS' {
>       cpus = { 3 * task.attempt }
>       memory = { 2.GB * task.attempt }
>    }
1414c1606
<    file = 'timeline-1Bpr9p1bhYVOBt.html'
---
>    file = 'timeline-OPLchP8holdFu.html'
1419c1611
<    file = '/cluster/projects/nn9036k/scripts/tower_TK60/pipeline_info/execution_report_2024-01-28_18-10-32.html'
---
>    file = '/cluster/projects/nn9036k/scripts/tower_TK60optim/pipeline_info/execution_report_2024-01-29_10-05-01.html'
1424c1616
<    file = '/cluster/projects/nn9036k/scripts/tower_TK60/pipeline_info/execution_trace_2024-01-28_18-10-32.txt'
---
>    file = '/cluster/projects/nn9036k/scripts/tower_TK60optim/pipeline_info/execution_trace_2024-01-29_10-05-01.txt'
1429c1621
<    file = '/cluster/projects/nn9036k/scripts/tower_TK60/pipeline_info/pipeline_dag_2024-01-28_18-10-32.html'
---
>    file = '/cluster/projects/nn9036k/scripts/tower_TK60optim/pipeline_info/pipeline_dag_2024-01-29_10-05-01.html'
1456c1648
< runName = 'awesome_gilbert'
---
> runName = 'cheesy_raman'

This doesn’t seem to be a Nextflow problem. Pipeline optimization only changes resource requests, so I can’t see how it would affect a process that is deterministic by nature. The issue seems to be that some quantification approaches have a certain amount of uncertainty in them, so sometimes, there’s not much one can do. Transcript quantification is one example. There’s a question on Biostars with an answer to how you can quantify this uncertainty in Salmon.
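To check that the two runs really differ only in resource and retry settings (plus run-specific paths and names), one could classify the changed lines of a normal-format diff such as the one posted above. The following Python sketch is only an illustration: the regular expression and the choice of which directives count as "resource or retry" settings are assumptions made here, and the sample text is a short excerpt of the posted diff.

```python
import re

# Assumption: these directives are treated as resource/retry settings.
RESOURCE = re.compile(
    r"^\s*(cpus|memory|time|errorStrategy|maxRetries)\s*=|^\s*withName:|^\s*\}\s*$"
)


def classify_diff(diff_text):
    """Split changed lines of a normal-format diff into resource vs. other."""
    counts = {"resource": 0, "other": 0}
    others = []
    for line in diff_text.splitlines():
        if not line.startswith(("< ", "> ")):
            continue  # skip hunk headers such as "18c18" and "---" separators
        body = line[2:]
        if RESOURCE.search(body):
            counts["resource"] += 1
        else:
            counts["other"] += 1
            others.append(body.strip())
    return counts, others


# Short excerpt of the diff from the question.
sample = """1058,1059c1058,1059
<    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
<    maxRetries = 1
---
>    errorStrategy = 'retry'
>    maxRetries = 2
512c512
<     "outdir": "/cluster/projects/nn9036k/scripts/tower_TK60",
---
>     "outdir": "/cluster/projects/nn9036k/scripts/tower_TK60optim",
"""

counts, others = classify_diff(sample)
print(counts)
for line in others:
    print("review manually:", line)
```

Anything listed under "review manually" (output directories, cached pipeline paths, report file names, run names) still has to be judged by eye; the sketch only narrows down where to look.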
