Running Nextflow Chipseq

Hi everyone! My name is Taimor, and I am a bioinformatics RA at BCH. I am trying to run the nf-core/chipseq Nextflow pipeline for some of the researchers in my lab and am running into some issues. The error I’m currently getting is related to the sample sheet; I don’t understand how to make it. I’m currently using:

```
(env_nf) [~/Nextflow]
taimor@RDT01154 $ cat chengxin_samplesheet.csv
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,/storage2/researchers/chengxian/ChIP_seq/AltN468/trimmed_fastq/CXL1239_R1.trimmed.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,/storage2/researchers/chengxian/ChIP_seq/AltN468/trimmed_fastq/CXL1239_R2.trimmed.fastq.gz,,2,BCATENIN,WT_INPUT,2
```

and the error I get is:

```
Command output:
  ERROR: Please check samplesheet -> Control identifier and replicate has to match a provided sample identifier and replicate!
  Control: 'WT_INPUT_REP1'

Command error:
  WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  ERROR: Please check samplesheet -> Control identifier and replicate has to match a provided sample identifier and replicate!
  Control: 'WT_INPUT_REP1'
```

The exact command I ran is:

```
nextflow run nf-core/chipseq \
    -profile singularity \
    --input chengxin_samplesheet.csv \
    --outdir chengxin_out \
    --fasta /storage/genomes/mm9_AJ851868ins/mm9_AJ851868ins.fa \
    --gtf /storage/genomes/mm9_AJ851868ins/annotation/genes.mm9_AJ851868ins.gtf \
    --read_length 100 \
    --save_reference
```

I am new to this, as it is my first time running it, so any help would be appreciated!

best,

Taimor

Hi @Taimor_Williams,

Firstly, welcome to Nextflow and welcome to the community!

The key is this line:

```
ERROR: Please check samplesheet -> Control identifier and replicate has to match a provided sample identifier and replicate!
```

If you check the sample sheet header, you can see that the sample identifier is the first column, and the control identifier and control replicate are the last two:

```
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
```

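Your sample ID is WT_BCATENIN_IP, and both rows point to the control WT_INPUT, control replicates 1 and 2. No row in your sheet has the sample ID WT_INPUT, so WT_INPUT replicate 1 and replicate 2 don’t match any provided sample identifier and replicate, and the pipeline throws an error.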
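To make the mismatch concrete, here is a simplified sketch of the kind of check behind that error. It is not the pipeline’s actual code: the `<sample>_REP<replicate>` key format is inferred from the `'WT_INPUT_REP1'` in your error message, and `find_missing_controls` is a hypothetical helper name.

```python
import csv
import io


def find_missing_controls(samplesheet_text):
    """Return control keys such as 'WT_INPUT_REP1' that no row provides.

    Simplified sketch, not nf-core's own script: every row is keyed as
    '<sample>_REP<replicate>' and every non-empty control reference as
    '<control>_REP<control_replicate>'.
    """
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    # Every (sample, replicate) pair that actually has a row in the sheet
    provided = {f"{row['sample']}_REP{row['replicate']}" for row in rows}
    missing = []
    for row in rows:
        if not row["control"]:
            continue  # control (input) rows leave this column empty
        key = f"{row['control']}_REP{row['control_replicate']}"
        if key not in provided and key not in missing:
            missing.append(key)
    return missing


# Your sheet, with the file paths shortened (they don't affect the check)
truncated = (
    "sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate\n"
    "WT_BCATENIN_IP,CXL1239_R1.trimmed.fastq.gz,,1,BCATENIN,WT_INPUT,1\n"
    "WT_BCATENIN_IP,CXL1239_R2.trimmed.fastq.gz,,2,BCATENIN,WT_INPUT,2\n"
)
print(find_missing_controls(truncated))  # ['WT_INPUT_REP1', 'WT_INPUT_REP2']

# Adding WT_INPUT rows (hypothetical file names) satisfies both references
complete = truncated + (
    "WT_INPUT,input_rep1.fastq.gz,,1,,,\n"
    "WT_INPUT,input_rep2.fastq.gz,,2,,,\n"
)
print(find_missing_controls(complete))  # []
```

Once rows with sample ID WT_INPUT and replicates 1 and 2 exist, every control reference has something to point at and the list comes back empty.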

I guess you’ve taken this from the nf-core/chipseq docs; however, the example samplesheet there is longer:

```
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A25_S16_L002_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A49_S40_L001_R1_001.fastq.gz,,3,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L002_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```

Note that the final 3 rows here have sample ID WT_INPUT, which matches the control identifier in the top 3 rows. That’s why the example works. Yours throws an error because those control rows were left out.
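If it helps with building the sheet, here is a minimal Python sketch that writes rows in the same layout as the docs example: each ChIP row names its control and control replicate, and each control row leaves `antibody`, `control` and `control_replicate` empty. The helper names (`chip_row`, `control_row`, `samplesheet_text`) and all the `/path/to/` file paths are hypothetical placeholders; you would swap in the real FASTQ files, including ones for the WT input samples. Pairing IP replicate N with control replicate N is an assumption copied from the docs example.

```python
import csv
import io

HEADER = ["sample", "fastq_1", "fastq_2", "replicate",
          "antibody", "control", "control_replicate"]


def chip_row(sample, replicate, fastq_1, antibody, control, fastq_2=""):
    # ChIP row: pairs IP replicate N with control replicate N (as in the docs example)
    return [sample, fastq_1, fastq_2, replicate, antibody, control, replicate]


def control_row(sample, replicate, fastq_1, fastq_2=""):
    # Control (input) row: antibody, control and control_replicate stay empty
    return [sample, fastq_1, fastq_2, replicate, "", "", ""]


def samplesheet_text(rows):
    # Render the header plus rows as CSV text with plain "\n" line endings
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical placeholder paths: replace with the real FASTQ files
rows = [
    chip_row("WT_BCATENIN_IP", 1, "/path/to/ip_rep1.fastq.gz", "BCATENIN", "WT_INPUT"),
    chip_row("WT_BCATENIN_IP", 2, "/path/to/ip_rep2.fastq.gz", "BCATENIN", "WT_INPUT"),
    control_row("WT_INPUT", 1, "/path/to/input_rep1.fastq.gz"),
    control_row("WT_INPUT", 2, "/path/to/input_rep2.fastq.gz"),
]
text = samplesheet_text(rows)
print(text, end="")
```

Saving `text` to a `.csv` file gives you something to pass to `--input`. One more thing worth checking: if `CXL1239_R1` and `CXL1239_R2` are read 1 and read 2 of a single paired-end library (a guess based only on the file names), they would normally go in the `fastq_1` and `fastq_2` columns of one row (the `fastq_2` argument above) rather than being listed as replicates 1 and 2.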

I hope that makes sense! If you have more questions, I’d recommend joining the nf-core Slack and finding the #chipseq channel, which is specific to this pipeline.

Cheers,

Phil

Hi Phil,

Thanks so much for the reply. I will join the Slack. I am still a bit confused about how to make a sample sheet, but I think it’s clear what’s wrong. If it helps, I know both are WT; otherwise, I’m not sure about the labels.

best,

Taimor
