Questions about nextflow cutandrun

Hi everyone — I’m running the nf-core/cutandrun pipeline on a RAD21 CUT&RUN dataset and comparing the results with peaks generated from our lab’s internal pipeline. Overall the signal-to-noise looks good, but we noticed that peaks expected at two known 3′CBE sites were not retained in the nf-core results. These loci have highly similar sequences, so I’m wondering whether the default alignment or filtering steps (e.g., bowtie2 handling of multi-mapping reads, duplicate filtering, or MACS2 thresholds) could cause reads at these locations to be discarded. Are there recommended parameter adjustments for CUT&RUN datasets where true binding sites occur at duplicated or highly similar genomic regions?

We also noticed that the bigWig tracks appear to have different total mapped read depths across samples, which required manual rescaling in IGV to compare signal between replicates. Could you clarify how the pipeline normalizes bigWig tracks by default and whether there is a recommended normalization approach for comparing samples with different sequencing depths?

Finally, when troubleshooting some failed tasks I found that logs were spread across many .command.sh files in different work directories. Is there a recommended way to generate a consolidated or easier-to-follow command log to help track what commands were executed and debug issues more efficiently?

best,

Taimor


For q3, this is expected Nextflow behavior. It might seem strange at first, but the design actually offers many benefits for debugging and error tracing. You are doing the right thing by going into the work directory to debug the error: each work directory is an isolated environment, which helps reproducibility and portability.

If you could share a control sample or the regions you are interested in, I could try to help with q1 and q2.
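For anyone who wants a single file to read through, here is a minimal sketch that gathers every `.command.sh` under a work directory into one log. `collect_commands` is a hypothetical helper written for this thread, not part of Nextflow or nf-core, and the paths in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: concatenate all .command.sh scripts found under a Nextflow work
# directory into one file, each preceded by the task directory it came from.
# collect_commands is a hypothetical helper, not a Nextflow command.
collect_commands() {
    local work_dir="$1" out_file="$2"
    : > "$out_file"   # start with an empty log
    # Every task directory stores the script Nextflow executed as .command.sh
    find "$work_dir" -type f -name '.command.sh' | sort | while IFS= read -r script; do
        printf '### %s\n' "$(dirname "$script")" >> "$out_file"
        cat "$script" >> "$out_file"
        printf '\n' >> "$out_file"
    done
}

# Usage (placeholder paths):
#   collect_commands work all_commands.log
```

Nextflow's own `nextflow log <run_name> -f name,status,exit,workdir,script` prints similar information per task and also shows which tasks failed, so it may be the better first stop; the helper above is only useful when you want the raw scripts in one place.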

Hi,

Thank you so much for your reply. I have talked with my collaborators and they are happy to share the data, and I can share my analysis work as well. Is there a preferred method for doing so? I can also meet over Zoom anytime to discuss.

best,

Taimor


@Taimor_Williams ,

Regarding your q2:

There is a normalization flag. By default it uses spike-in normalization; if you did not use a spike-in, then no normalization is being applied.
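To illustrate why unnormalized tracks from samples of different depth cannot be compared directly, here is a small sketch of a counts-per-million (CPM) scale factor, which is 1,000,000 divided by the number of mapped reads. `cpm_scale_factor` is a hypothetical helper and the read counts are made up for illustration; how the pipeline itself computes its scaling may differ.

```bash
#!/usr/bin/env bash
# Sketch: per-sample CPM scale factor = 1,000,000 / mapped reads.
# cpm_scale_factor is a hypothetical helper; the read counts below are invented.
cpm_scale_factor() {
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 1000000 / n }'
}

# Two illustrative samples sequenced to different depths
for reads in 5000000 20000000; do
    printf '%s mapped reads -> scale factor %s\n' "$reads" "$(cpm_scale_factor "$reads")"
done
```

With these made-up numbers the deeper sample gets a factor four times smaller, which is the kind of correction that otherwise has to be done by hand when rescaling tracks in IGV.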

Hi,

So I ran:

”””
nextflow run nf-core/cutandrun \
  -profile singularity \
  --input sample_sheet.csv \
  --peakcaller MACS2 \
  --outdir xi_out_all_fastq \
  --igg_control false \
  --use_control false \
  --fasta /storage/genomes/hg38_AID-VB1_8/hg38_AID-VB1_8.fa \
  --gtf /storage/genomes/hg38/annotation/refGene.gtf \
  --save_reference
”””

Also, as stated, I am more than happy to share my workflow and the data I used. We would like to get this pipeline working for our researchers.

best,

Taimor

Try adding:

--normalisation_mode BPM

Hi,

I ran it with normalization and with different peak-caller settings:

”””
nextflow run nf-core/cutandrun \
  -profile singularity \
  --input different_group_samplesheet.csv \
  --peakcaller seacr,macs2 \
  --outdir xi_out_all_fastq_nfcore_tweaks \
  --igg_control false \
  --use_control false \
  --fasta /storage/genomes/hg38_AID-VB1_8/hg38_AID-VB1_8.fa \
  --gtf /storage/genomes/hg38/annotation/refGene.gtf \
  --normalisation_mode BPM \
  --save_reference
”””

Yet I still got the same result. The tracks also do not look normalized: several of my samples are significantly different from the rest. As you can see, the non-Nextflow result is worse overall but it has the peaks, whereas the Nextflow result has no peaks at CBE1 and CBE2.

not nextflow:

nextflow:
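One way to check whether the missing CBE1/CBE2 peaks come from mapping-quality filtering is to count reads in each region at different MAPQ thresholds. bowtie2 gives reads that align equally well to several places a MAPQ of 0 or 1, so any MAPQ filter removes them. The sketch below assumes SAM text on stdin; `count_region_reads` is a hypothetical helper, and the coordinates and file names in the usage example are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: count SAM records whose start position falls in a region and whose
# MAPQ is at least a given threshold. Reads SAM text from stdin.
# count_region_reads is a hypothetical helper written for this thread.
count_region_reads() {
    local chrom="$1" start="$2" end="$3" min_q="$4"
    awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" -v q="$min_q" '
        /^@/ { next }                                       # skip header lines
        $3 == c && $4 >= s && $4 <= e && $5 >= q { n++ }    # RNAME, POS, MAPQ
        END { print n + 0 }
    '
}

# Usage (placeholder file and coordinates; needs a BAM from before filtering):
#   samtools view -h sample.bam | count_region_reads chr1 1000 2000 0
#   samtools view -h sample.bam | count_region_reads chr1 1000 2000 20
```

If the region has many reads at threshold 0 and almost none at a higher threshold, the reads are most likely multi-mappers being filtered out. I believe the pipeline exposes a `--minimum_alignment_q_score` parameter for this filter, but please confirm the name and default in the parameter documentation before relying on it.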