I am trying to run a Nextflow workflow that works fine on my EC2 instance. When I run it on AWS Batch, though, I get an error saying the AWS CLI (awscli) cannot find the libz library:
/usr/local/aws-cli/v2/current/bin/aws: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory
My .command.log:
#!/bin/bash -ue
# Run VEP
vep --input_file hifi_10k.aligned.vcf.gz --output_file hifi_10k_annotated.vcf.gz --format vcf --vcf --compress_output bgzip --cache_version 107 --dir_cache vep_cache --fasta Homo_sapiens-GCA_009914755.4-softmasked.fa --offline --symbol --species homo_sapiens_gca009914755v4 --force_overwrite
# Index the annotated VCF
tabix -p vcf hifi_10k_annotated.vcf.gz
# Create versions.yml file
cat <<-END_VERSIONS > versions.yml
"vep":
    vep: $(vep --help 2>&1 | grep "Versions:" | sed 's/^.*Versions: //; s/ .*$//')
END_VERSIONS
As far as I know, the Docker image used by my AWS Batch compute environment installs the libz library (zlib1g):
RUN apt-get update && apt-get install -y \
    wget \
    build-essential \
    gcc \
    g++ \
    git \
    cpanminus \
    zlib1g \
    zlib1g-dev \
    default-libmysqlclient-dev \
    libdbd-mysql-perl \
    bedtools \
    libkrb5-3 \
    samtools \
    && wget $UCSC_URL -O /usr/local/bin/bigWigToBedGraph \
    && chmod +x /usr/local/bin/bigWigToBedGraph \
    && wget $MOSDEPTH_URL -O /usr/local/bin/mosdepth \
    && chmod +x /usr/local/bin/mosdepth \
    && wget $PBCPG_URL -O /usr/local/bin/pb-CpG-tools.tar.gz \
    && tar -xzf /usr/local/bin/pb-CpG-tools.tar.gz -C /usr/local/bin/ \
    && rm /usr/local/bin/pb-CpG-tools.tar.gz \
    && ln -s /usr/local/bin/pb-CpG-tools-v2.3.2-x86_64-unknown-linux-gnu/bin/aligned_bam_to_cpg_scores /usr/local/bin/aligned_bam_to_cpg_scores \
    && rm -rf /var/lib/apt/lists/*
The confusing part is that VEP runs from its own container image. Because of that, I can't tell where the problem lies: in the Docker image used by my AWS Batch compute environment, or in the VEP container that runs the process:
process vep {
    container 'quay.io/biocontainers/ensembl-vep:112.0--pl5321h2a3209d_0'
    publishDir "${params.output_dir}", mode: 'copy'

    input:
    path vcf
    path reference
    path vep_cache

    output:
    path "${vcf.simpleName}_annotated.vcf.gz", emit: annotated_vcf
    path "${vcf.simpleName}_annotated.vcf.gz.tbi", emit: annotated_vcf_tbi
    path "versions.yml", emit: versions

    script:
    """
    # Run VEP
    vep --input_file ${vcf} \
        --output_file ${vcf.simpleName}_annotated.vcf.gz \
        --format vcf \
        --vcf \
        --compress_output bgzip \
        --cache_version 107 \
        --dir_cache ${vep_cache} \
        --fasta ${reference} \
        --offline \
        --symbol \
        --species homo_sapiens_gca009914755v4 \
        --force_overwrite

    # Index the annotated VCF
    tabix -p vcf ${vcf.simpleName}_annotated.vcf.gz

    # Create versions.yml file
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        vep: \$(vep --help 2>&1 | grep "Versions:" | sed 's/^.*Versions: //; s/ .*\$//')
    END_VERSIONS
    """
}
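One quick way to narrow down which image is missing the library is to look for libz.so.1 inside each container directly. The function below is a minimal POSIX sh sketch. `has_shared_lib` is a hypothetical helper, not part of Nextflow, VEP, or the AWS CLI. It only checks a few common library directories and does not consult ld.so.cache or LD_LIBRARY_PATH, so a "missing" result is a strong hint rather than proof that the dynamic loader cannot find the library.

```shell
#!/bin/sh
# Hypothetical helper: report whether a shared library file is present in
# any of the given directories (defaults to common Linux library paths).
# It does not consult ld.so.cache or LD_LIBRARY_PATH, so it only
# approximates what the dynamic loader would find.
has_shared_lib() {
    lib="$1"
    shift
    # No directories passed: fall back to typical glibc library locations.
    if [ "$#" -eq 0 ]; then
        set -- /lib /lib64 /usr/lib /usr/lib64 \
            /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
    fi
    for dir in "$@"; do
        # -e follows symlinks, so a dangling libz.so.1 link counts as missing.
        if [ -e "$dir/$lib" ]; then
            echo "found: $dir/$lib"
            return 0
        fi
    done
    echo "missing: $lib"
    return 1
}
```

For example, suppose the function is saved in a hypothetical check_lib.sh that ends with the call `has_shared_lib libz.so.1`. Assuming Docker is available on the EC2 instance and the image ships a POSIX sh, you could run `docker run --rm -v "$PWD/check_lib.sh:/check_lib.sh:ro" quay.io/biocontainers/ensembl-vep:112.0--pl5321h2a3209d_0 sh /check_lib.sh`. Running the same command against the compute-environment image would show whether the two images differ.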