Hi everyone,
I am writing a script that aims to process each FASTA file in ${params.assemblies} separately and completes all associated tasks before moving to the next file.
This is the current script that I have, which I anticipate should work:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
println "Assemblies: ${params.assemblies}"
println "Query: ${params.query}"
// Define input channels
assembly_ch = Channel.fromPath(params.assemblies)
.ifEmpty { error "Cannot find any assemblies matching: ${params.assemblies}" }
query_ch = Channel.value(params.query)
// Function to run the entire workflow for a given assembly
process runWorkflowForAssembly {
input:
path fasta_file
path query_file
output:
path "blast_output/${query_file.baseName}_vs_${fasta_file.baseName}.out",
path "blast_output_filtered/${query_file.baseName}_vs_${fasta_file.baseName}_filtered.out",
path "extracted_sequences/${fasta_file.baseName}_extracted.fasta",
path "extracted_sequences/${query_file.baseName}_coords_with_strand.txt"
script:
"""
# Create a BLAST database for the FASTA file
db_prefix="blastdb/${fasta_file.baseName}_blastdb"
mkdir -p ${db_prefix}
makeblastdb -in ${fasta_file} -dbtype nucl -out ${db_prefix}/${fasta_file.baseName}
# Perform BLAST search
blastn -query ${query_file} -db ${db_prefix}/${fasta_file.baseName} -out blast_output/${query_file.baseName}_vs_${fasta_file.baseName}.out -outfmt 6
awk '\$4 >= 50' blast_output/${query_file.baseName}_vs_${fasta_file.baseName}.out > blast_output_filtered/${query_file.baseName}_vs_${fasta_file.baseName}_filtered.out
# Extract sequences based on filtered BLAST output
coords_file="extracted_sequences/${query_file.baseName}_coords_with_strand.txt"
forward_coords_file="forward_coords_${fasta_file.baseName}.txt"
reverse_coords_file="reverse_coords_${fasta_file.baseName}.txt"
temp_reverse_file="temp_reverse_${fasta_file.baseName}.fasta"
mkdir -p extracted_sequences
# Extract coordinates and store in a file with forward and reverse strand info
awk '{
if (\$9 < \$10) {
start = (\$9-50 < 1) ? 1 : \$9-50;
print \$2":"start"-"(\$10+50)"\\tforward";
} else {
start = (\$10-50 < 1) ? 1 : \$10-50;
print \$2":"start"-"(\$9+50)"\\treverse";
}
}' blast_output_filtered/${query_file.baseName}_vs_${fasta_file.baseName}_filtered.out > \$coords_file
# Separate into forward and reverse coordinate files
grep forward \$coords_file | cut -f1 > \$forward_coords_file
grep reverse \$coords_file | cut -f1 > \$reverse_coords_file
samtools faidx ${fasta_file}
# Ensure the output file for extracted sequences is created or appended properly
touch extracted_sequences/${fasta_file.baseName}_extracted.fasta
# Check if forward coordinates exist and extract forward sequences
if [ -s \$forward_coords_file ]; then
samtools faidx ${fasta_file} -r \$forward_coords_file >> extracted_sequences/${fasta_file.baseName}_extracted.fasta || {
echo "Error extracting forward sequences from ${fasta_file}" >> extracted_sequences/errors.log
}
fi
# Check if reverse coordinates exist and extract reverse complement sequences
if [ -s \$reverse_coords_file ]; then
samtools faidx ${fasta_file} -r \$reverse_coords_file > \$temp_reverse_file || {
echo "Error extracting reverse sequences from ${fasta_file}" >> extracted_sequences/errors.log
}
# Use Biopython to reverse complement the sequences and append them to the same file
python3 -c "
import sys
from Bio import SeqIO
with open('\$temp_reverse_file') as fasta_in, open('extracted_sequences/${fasta_file.baseName}_extracted.fasta', 'a') as fasta_out:
for record in SeqIO.parse(fasta_in, 'fasta'):
record.seq = record.seq.reverse_complement()
SeqIO.write(record, fasta_out, 'fasta')
" || {
echo "Error in Biopython reverse complement" >> extracted_sequences/errors.log
}
rm \$temp_reverse_file
fi
"""
}
// Workflow definition
workflow {
assembly_ch
.combine(query_ch) // Combine each assembly with the query
.set { pairs ->
pairs.map { fasta_file, query_file ->
runWorkflowForAssembly(fasta_file, query_file)
}
}
}
But I keep on getting what seems to be a syntax error:
N E X T F L O W ~ version 24.04.2
Launching test.nf
[dreamy_tesla] DSL2 - revision: 75d4027402
ERROR ~ Script compilation error
- file : /rds/projects/m/mcdonamc-sept-protect-wp4/avrstb6_alignment/test.nf
- cause: Unexpected input: ‘{’ @ line 15, column 32.
process runWorkflowForAssembly {
^
1 error
NOTE: If this is the beginning of a process or workflow, there may be a syntax error in the body, such as a missing or extra comma, for which a more specific error message could not be produced.
– Check ‘.nextflow.log’ file for details
What am I missing here?
Thanks very much in advance!