Hi All
I am having a bit of trouble analyzing my PacBio Kinnex full-length 16S rRNA data. The data I received from the sequencing center were already demultiplexed and had the primers removed. I tried pivoting to the PacBio HiFi 16S workflow (pb-16S-nf), which uses Nextflow; the benefit is that you can skip the cutadapt step, and the pipeline is recommended for this type of data. However, running it on the cluster has been problematic. The cluster uses SGE, and my job script might be the problem, as this is my first time working on a cluster. Below is the script in question after about a month of troubleshooting (I have also attached my nextflow.config file):
```bash
#!/bin/bash
#$ -N HiFi16SJob
#$ -cwd
#$ -pe smp 64
#$ -l h_vmem=128G
#$ -q bigmem
#$ -j y

# Initialize conda in the current shell environment
__conda_setup="$('/home/ICE/jbeer/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/ICE/jbeer/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/ICE/jbeer/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/ICE/jbeer/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup

# Activate the conda environment that provides Nextflow
conda activate nextflow

# Change to the directory containing the Nextflow pipeline
cd /home/ICE/jbeer/pb-16S-nf

# Run main.nf with the input data, metadata, and the skip_primer_trim parameter
nextflow run main.nf \
    --input /home/ICE/jbeer/pb-16S-nf/test_data/testing.tsv \
    --metadata /home/ICE/jbeer/pb-16S-nf/test_data/test_metadata.tsv \
    --skip_primer_trim true \
    --VSEARCH_threads 30 \
    --DADA2_threads 30 \
    --cutadapt_threads 4 \
    -profile conda
```
Just to provide some more detail on the issues I have encountered:
Failed to submit process to grid scheduler for execution
Command executed:
qsub -terse .command.run
Command exit status:
1
Command output:
Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly
Exiting.
Work dir:
/home/ICE/jbeer/pb-16S-nf/work/8d/a94282f9449ce56b8e96a0532739b2
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
====== begin epilog ======
Job finished at 2024-08-27_11:46:07.236063476 (UTC+02)
Exit status = 1
====== end epilog ======
After adding `process.penv = 'smp'` to the nextflow.config file (a sketch of the relevant config section is included after the error output below), I got the error:
Command wrapper:
====== begin prolog ======
Job name = nf-pb16S_QC_fastq_(1), Job-ID = 57199, owner = jbeer
Workdir = /home/ICE/jbeer/pb-16S-nf/work/f6/aab9e7089e31df3261805845c57d0f
PE = smp, slots = 64, queue = bigmem
Running on clu-blade14.ice.mpg.de, started at 2024-09-03_13:42:29.519821037 (UTC+02)
====== end prolog ======
/opt/sge/default/spool/clu-blade14/job_scripts/57199: 21: /opt/sge/default/spool/clu-blade14/job_scripts/57199: [[: not found
/opt/sge/default/spool/clu-blade14/job_scripts/57199: 30: /opt/sge/default/spool/clu-blade14/job_scripts/57199: Syntax error: redirection unexpected
====== begin epilog ======
Job finished at 2024-09-03_13:42:29.713545426 (UTC+02)
Exit status = 2
====== end epilog ======
Work dir:
/home/ICE/jbeer/pb-16S-nf/work/f6/aab9e7089e31df3261805845c57d0f
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
====== begin epilog ======
Job finished at 2024-09-03_13:49:35.383213442 (UTC+02)
Exit status = 1
====== end epilog ======
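For reference, this is roughly what the scheduler-related part of my nextflow.config looks like after that change. This is only a simplified sketch: the executor is implied by the qsub submissions above, the queue and parallel environment names are taken from the job prolog, and the full file is attached at the bottom of this post.

```groovy
// Simplified sketch of the scheduler settings in my nextflow.config after the change;
// the attached file is the authoritative version and may differ slightly
process.executor = 'sge'     // hand each pipeline task to the grid scheduler via qsub
process.penv     = 'smp'     // the line I added so slots are requested through a parallel environment
process.queue    = 'bigmem'  // same queue as the parent job script above
```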
Any input on what I am doing wrong with the job script or the cluster setup would be greatly appreciated! I have worked with 16S amplicon data before, but I am fairly inexperienced with PacBio/long-read amplicon data analysis.
Kind regards,
Johann
nextflow.config (1.5 KB)