Hi, I am new to Nextflow and AWS Batch. I was able to run the example nf-core pipeline on AWS Batch with no problems, but now I am trying to run my own pipeline, which requires the UCSC utility bigWigToBedGraph.
I made a Docker image that installs it in /usr/local/bin:
```dockerfile
# Install UCSC utilities including bigWigToBedGraph
RUN wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64.v385/bigWigToBedGraph -O /usr/local/bin/bigWigToBedGraph \
    && chmod +x /usr/local/bin/bigWigToBedGraph
```
Then I specify the Docker container to use in my Nextflow config:
I even specify the full path to the executable in the process, which I double-checked by running the container interactively on EC2:
```groovy
// Define the process to convert BigWig to BedGraph
process bigWigToBedGraph {
    tag "${bw.baseName}"
    publishDir "${params.output_dir}", mode: 'copy'

    input:
    path bw

    output:
    path "${bw.baseName}.bedGraph"

    script:
    """
    echo "running bigWigToBedGraph for file: ${bw.baseName}.bedGraph"
    /usr/local/bin/bigWigToBedGraph $bw ${bw.baseName}.bedGraph
    """
}
```
But I still get an error:
```text
running bigWigToBedGraph for file: HG002_rep1.hg38.pbmm2.combined.bedGraph
Error: bigWigToBedGraph not found. Please make sure it's installed and in your PATH.
```
I am not sure what is going on and would appreciate any pointers. I would have expected specifying the full path in the Nextflow script to work.
Have you tried running your container (with the same URI you’re passing to Nextflow) in interactive mode and checking whether the binary is there and has the correct permissions?
That’s unusual. Could you try running `which bigWigToBedGraph` in the Nextflow process? Or perhaps `ls -lha /usr/local/bin`, which might show whether something is wrong with the binary. I’m assuming this is all on x86 and not ARM (e.g., Apple silicon).
My only other thought is that it could be a user-permissions issue: the user on AWS Batch may differ from the one in your local Docker environment.
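A minimal sketch of the checks suggested above, written as a throwaway Nextflow process. The process name `checkBinary` is hypothetical, and it assumes the image is set through `process.container` in the same config the real pipeline uses, so the task sees the same filesystem and user that `bigWigToBedGraph` would see on AWS Batch. The `debug true` directive forwards the task's stdout to the console.

```groovy
// Hypothetical debugging process: run it with the same config (container,
// executor, queue) as the real pipeline so it sees the same environment.
process checkBinary {
    debug true  // print the task's stdout to the console

    // \$ stops Groovy from interpolating; the shell expands these instead
    script:
    """
    echo "user: \$(id)"
    echo "arch: \$(uname -m)"
    echo "PATH: \$PATH"
    which bigWigToBedGraph || echo "bigWigToBedGraph is not on PATH"
    ls -lha /usr/local/bin
    """
}

workflow {
    checkBinary()
}
```

Running it from the pipeline's launch directory (e.g. `nextflow run debug.nf`, where the file name is a placeholder) picks up the same `nextflow.config` automatically. If the `ls` listing shows files from the host AMI rather than from the image, something on the host is being mounted over that directory.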
This suggests something more fundamentally wrong with your setup. What is your AWS Batch configuration? How did you set up the compute environment and queue? What AWS machine type is the process running on?
I followed the steps described here to set up AWS Batch. The instances run Amazon Linux, and the compute environment has the AWSServiceRoleForBatch and ecsInstanceRole roles attached.
Anyway, I solved my original problem with help from @robsyme on a call. In my Nextflow config, the path to my AWS CLI pointed into /usr/local/bin. This caused /usr/local/bin from the AMI to be mounted into the container, hiding everything in the Docker image's own /usr/local/bin (including bigWigToBedGraph).
I moved the AWS CLI to a more innocuous location, pointed my config at that path, and the pipeline ran just fine.
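For anyone hitting the same problem, here is a sketch of the relevant part of a Nextflow config for AWS Batch. The S3 bucket, queue, image URI and region are placeholders, and the CLI location `/home/ec2-user/miniconda/bin/aws` mirrors the Miniconda-based install shown in Nextflow's AWS Batch documentation; any path outside the directories your image depends on should do. `aws.batch.cliPath` tells Nextflow where the AWS CLI lives on the host AMI, and, as found in this thread, the host directory holding it ends up mounted into each task container.

```groovy
// nextflow.config (sketch): bucket, queue, image and region are placeholders
workDir = 's3://my-bucket/work'   // AWS Batch needs an S3 work directory

process {
    executor  = 'awsbatch'
    queue     = 'my-batch-queue'
    container = '123456789012.dkr.ecr.us-east-1.amazonaws.com/ucsc-tools:latest'
}

aws {
    region = 'us-east-1'
    batch {
        // Location of the AWS CLI on the host AMI. Keep it out of
        // /usr/local/bin (and similar) so the host directory is not
        // mounted over paths that the container image relies on.
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}
```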