Hello,
First, I’m new to Nextflow (and to programming). I learned to write a pipeline during a course where we worked on a cluster.
Now I’m trying to turn the scripts from a project into a Nextflow pipeline and run it locally (or at least trying to).
The first process uses Jellyfish, for which I found a container on quay.io (version 2.3.1--py310h184ae93_5). When I look at the manifest layers, bash seems to be available.
For my pipeline, the workflow calls processes defined in modules (with the channels as inputs), and they in turn call a script located in the repo’s bin directory. I’m used to scripts starting with “#!/usr/bin/env bash”, but here I get:
“env: can’t execute 'bash
': No such file or directory”
More information:
I’m on Windows and use WSL2 with Ubuntu 24.04.1.
I use “-profile singularity”, defined in nextflow.config with singularity.enabled = true and singularity.autoMounts = true.
I don’t know what I’m missing. I guessed it could be because env doesn’t exist in my repo, but “#!/usr/bin/env nextflow” works without any issue. I tried “#!/usr/bin/bash” instead, but then I get the error “cannot execute: required file not found”.
Sorry if this post is messy; if you need more information, I’ll provide it.
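One quick way to test whether Windows line endings are involved is to check whether a script’s shebang line ends in a carriage return (\r). The line break inside the quoted 'bash in the error message above is consistent with that: env would then be looking for an interpreter literally named “bash\r”. A minimal bash sketch; the function name shebang_has_cr is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: succeed (exit status 0) if the first line of a script ends in a
# carriage return, the usual sign of Windows (CRLF) line endings.

shebang_has_cr() {
    local script="$1"
    local first_line
    # IFS= and -r keep the line exactly as stored, including a trailing \r.
    IFS= read -r first_line < "$script"
    [[ "$first_line" == *$'\r' ]]
}
```

For example, for f in bin/*.sh; do shebang_has_cr "$f" && echo "CRLF: $f"; done lists the affected scripts, assuming they live in the repo’s bin directory.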
include { JELLYFISH_COUNT } from './modules/local/jellyfish/count/main.nf'
[...]
pool_male_r1_ch = Channel.fromPath("${params.pool_m_r1_path}")
pool_male_r2_ch = Channel.fromPath("${params.pool_m_r2_path}")
suf_m = Channel.of("male")
[...]
workflow {
    JELLYFISH_COUNT(pool_male_r1_ch, pool_male_r2_ch, suf_m)
}
The two input files each hold a single sequence: a randomly generated 1000 bp sequence in one, and its reverse complement in the other.
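Test inputs like these can be generated from the shell. Because the wrapper script reads its inputs through zcat, the files need to be gzip-compressed. A minimal sketch, assuming bash and gzip are available; the function names, output file names and FASTA headers are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: write a random DNA sequence and its reverse complement to two
# gzipped FASTA files.

random_dna() {
    local length="$1" bases=(A C G T) out="" i
    # Pick one of the four bases per position using bash's $RANDOM.
    for (( i = 0; i < length; i++ )); do
        out+="${bases[RANDOM % 4]}"
    done
    printf '%s' "$out"
}

reverse_complement() {
    local seq="$1" reversed="" i
    # Walk the sequence from the last base to the first.
    for (( i = ${#seq} - 1; i >= 0; i-- )); do
        reversed+="${seq:i:1}"
    done
    # Swap each base for its complement (A<->T, C<->G).
    printf '%s' "$reversed" | tr 'ACGT' 'TGCA'
}

write_test_pair() {
    local outdir="$1" seq rc
    seq=$(random_dna 1000)
    rc=$(reverse_complement "$seq")
    printf '>random_1000bp\n%s\n' "$seq" | gzip > "$outdir/test_r1.fa.gz"
    printf '>random_1000bp_revcomp\n%s\n' "$rc" | gzip > "$outdir/test_r2.fa.gz"
}
```

For example, write_test_pair data writes data/test_r1.fa.gz and data/test_r2.fa.gz, which could then be passed as the two pool paths.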
The module being called is:
#!/usr/bin/env nextflow
/*
 * Count kmers in a fasta file
 */
process JELLYFISH_COUNT {
    label 'lowmem'
    tag 'JELLYFISH_COUNT'
    publishDir "${params.resultdir}/00_01_jellyfish_count", mode: 'copy', pattern: '00_01_jellyfish_count/*.jf'

    input:
    path(fasta_1)
    path(fasta_2)
    val(suffix)

    output:
    tuple path("00_01_jellyfish_count/*.jf"), val(suffix), emit: count_tuple
    path("00_01_jellyfish_count/*.log")
    path("00_01_jellyfish_count*.log")
    path("00_01_jellyfish_count*.cmd")

    script:
    """
    jellyfish_count.sh ${fasta_1} ${fasta_2} ${suffix} ${task.cpus} jellyfish_count.cmd >& jellyfish_count.log 2>&1
    """
}
And the script it calls is:
#!/usr/bin/bash
##################################################################################
## ##
## Count the kmers ##
## ##
##################################################################################
# Get script arguments coming from modules/00_01_jellyfish_count.nf process
args=("$@")
# Bash arrays are zero-indexed: ${args[0]} holds the first argument
DATA1=${args[0]}
DATA2=${args[1]}
SUFFIX=${args[2]}
NCPUS=${args[3]}
LOGCMD=${args[4]}
# Commands to execute
CMD="jellyfish count -C -m 30 -s 1000000000 -t 16 <(zcat ${DATA1}) <(zcat ${DATA2}) -o reads_${SUFFIX}.jf"
# Save commands in log
echo ${CMD} > ${LOGCMD}
# Execution
eval ${CMD}
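If the wrapper is called with the wrong number of arguments, or the variables are read from the wrong array positions, every variable silently shifts by one. A minimal sketch of stricter argument parsing, assuming the same five arguments in the same order as in the process script block (the function name parse_wrapper_args is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch: map the wrapper's five arguments to named variables and fail
# early if the count is wrong.

parse_wrapper_args() {
    local args=("$@")
    if (( ${#args[@]} != 5 )); then
        echo "expected 5 arguments, got ${#args[@]}" >&2
        return 1
    fi
    # Arrays start at index 0, positional parameters at $1.
    DATA1=${args[0]}    # same as "$1"
    DATA2=${args[1]}    # same as "$2"
    SUFFIX=${args[2]}
    NCPUS=${args[3]}
    LOGCMD=${args[4]}
}
```

In the wrapper, this would be called as parse_wrapper_args "$@" || exit 1 before building the command.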
I looked at your link, but I’m not quite sure how they resolved the issue. I’ve heard of something similar involving tab spacing differences between Windows and Linux, but I don’t know how to fix it (if that’s the cause).
I converted my files from CRLF to LF line endings, and that seems to have done the trick. I also had some other issues (such as the counting of the arguments), and I still get an error ("Missing output file(s) jellyfish_count/*.jf expected by process JELLYFISH_COUNT (JELLYFISH_COUNT)"), but I should be able to resolve that one.
Thank you for your help!
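The CRLF-to-LF conversion can be done from the WSL shell by deleting the carriage return at the end of every line. A minimal sketch, assuming GNU sed (the version shipped with Ubuntu), whose -i option edits files in place; the function name to_lf is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: convert Windows (CRLF) line endings to Unix (LF) in place.

to_lf() {
    local f
    for f in "$@"; do
        # Remove a carriage return immediately before each line end.
        sed -i 's/\r$//' "$f"
    done
}
```

For example, to_lf bin/*.sh main.nf nextflow.config from the repository root (the file list is illustrative). The dos2unix utility, where installed, performs the same conversion.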
As for the missing output file(s): your Nextflow process seems to expect the output files inside a folder named 00_01_jellyfish_count/, but I don’t see where or how this folder is created. Maybe the output files are being written to the root of the task folder instead, and you just need to remove 00_01_jellyfish_count/ from the output block of the process.
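One way to check this is to look inside the failed task’s work directory, whose path Nextflow prints under “Work dir:” in the error report. A minimal sketch, assuming GNU find (for -printf); the function name where_are_jf_files is hypothetical. It lists every .jf file relative to that directory so you can compare it with the output glob:

```bash
#!/usr/bin/env bash
# Sketch: list .jf files directly in a task directory (depth 1) and one
# subfolder down (depth 2), printed relative to the task directory.

where_are_jf_files() {
    local workdir="$1"
    find "$workdir" -maxdepth 2 -name '*.jf' -printf '%P\n' | LC_ALL=C sort
}
```

For example, where_are_jf_files <work dir>: an entry such as reads_male.jf with no folder prefix means the file sits at the task root, while 00_01_jellyfish_count/reads_male.jf would match the current output declaration.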