Hi everyone — I’m running the nf-core/cutandrun pipeline on a RAD21 CUT&RUN dataset and comparing the results with peaks generated from our lab’s internal pipeline. Overall the signal-to-noise looks good, but we noticed that peaks expected at two known 3′CBE sites were not retained in the nf-core results. These loci have highly similar sequences, so I’m wondering whether the default alignment or filtering steps (e.g., bowtie2 handling of multi-mapping reads, duplicate filtering, or MACS2 thresholds) could cause reads at these locations to be discarded. Are there recommended parameter adjustments for CUT&RUN datasets where true binding sites occur at duplicated or highly similar genomic regions?
We also noticed that the bigWig tracks appear to have different total mapped read depths across samples, which required manual rescaling in IGV to compare signal between replicates. Could you clarify how the pipeline normalizes bigWig tracks by default and whether there is a recommended normalization approach for comparing samples with different sequencing depths?
Finally, when troubleshooting some failed tasks I found that logs were spread across many .command.sh files in different work directories. Is there a recommended way to generate a consolidated or easier-to-follow command log to help track what commands were executed and debug issues more efficiently?
Q3 is expected Nextflow behavior. It might seem strange at first, but the Nextflow design actually offers many benefits for debugging and error tracing. You are doing the right thing by going into the work directory to debug the error. Each work directory is an isolated environment, which is what gives you reproducibility and portability.
If there is a control sample you could share, or the regions you are interested in, I could try to help with Q1 and Q2.
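If you still want everything in one place, here is a minimal sketch (not part of the pipeline) that walks the work directory and concatenates each task's `.command.sh` into a single file, oldest first. The function name and the output file name are made up for illustration. It assumes the usual layout where each task directory holds `.command.sh` and, once the task has finished, `.exitcode`. The built-in `nextflow log` command may also cover much of this, so treat this as a fallback.

```python
from pathlib import Path


def collect_command_scripts(work_dir, out_file):
    """Concatenate every .command.sh under work_dir into out_file.

    Hypothetical helper for illustration; returns the number of task
    scripts that were found.
    """
    work_dir = Path(work_dir)
    # One .command.sh per task directory; order by modification time so the
    # combined file roughly follows execution order.
    scripts = sorted(work_dir.rglob(".command.sh"), key=lambda p: p.stat().st_mtime)
    with open(out_file, "w") as out:
        for script in scripts:
            task_dir = script.parent
            exit_file = task_dir / ".exitcode"
            # .exitcode is only present once a task has finished
            exit_code = exit_file.read_text().strip() if exit_file.exists() else "n/a"
            out.write(f"##### {task_dir} (exit code: {exit_code})\n")
            out.write(script.read_text())
            out.write("\n\n")
    return len(scripts)
```

Calling `collect_command_scripts("work", "all_commands.txt")` from the launch directory would write one combined file. Searching it for a non-zero exit code is a quick way to find the task directory to inspect.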
Thank you so much for your reply. I have talked with my collaborators and they are happy for me to share the data, along with my analysis work. Is there a preferred method for doing so? I can also meet over Zoom anytime to discuss.
yet still got the same result. Also, it’s not normalized: several of my samples are significantly different from the rest. As you can see, the non-Nextflow result is worse overall, but it has the peaks, whereas the Nextflow result doesn’t have peaks at CBE1 and CBE2.
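For reference, this is the kind of depth normalization I mean: a rough sketch of counts-per-million (CPM) scaling, where each sample's coverage is multiplied by 1,000,000 divided by its mapped read count. The sample names and read counts below are placeholders, not my real data, and this is not necessarily what the pipeline does internally.

```python
def cpm_scale_factors(mapped_reads):
    """Return per-sample factors that rescale coverage to counts per million."""
    factors = {}
    for sample, n_reads in mapped_reads.items():
        if n_reads <= 0:
            raise ValueError(f"{sample}: mapped read count must be positive")
        # Multiplying raw coverage by this factor gives reads per million mapped
        factors[sample] = 1_000_000 / n_reads
    return factors


# Placeholder read counts, for illustration only
example_counts = {"rep1": 20_000_000, "rep2": 5_000_000}
factors = cpm_scale_factors(example_counts)
```

With these placeholder numbers the deeper sample gets the smaller factor, so both tracks end up on the same per-million scale. As far as I know, deepTools `bamCoverage --normalizeUsing CPM` applies the same idea when generating bigWigs.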