Our pipeline normally operates on paired FASTQ files with read lengths of 42bp and 26bp in R1 and R2 respectively.
Recently, we’ve used an alternative sequencing provider and now our R1 and R2 are 150bp in length. This greatly increases the size of our FASTQ files and downstream files like BAM files.
For larger samples, I am now consistently getting an error on our Platform-launched, Wave/Fusion, Batch / Amazon Compute Environment pipeline for any process that renames a large file, i.e. one that uses a mv command. E.g.:
process initial_feature_count {
    tag "$sample_id"

    input:
    tuple val(sample_id), path(sorted_bam)
    path(gtf)

    output:
    tuple val(sample_id), path("${sample_id}.sortedByCoord.featureCounts.bam"), emit: feature_count_bam

    script:
    """
    # Run feature counts on the sorted STAR bam including strandedness and annotation of multimappers
    featureCounts -a $gtf -o ${sample_id}.star.featureCounts.gene.txt -R BAM $sorted_bam -T 4 -t transcript -g gene_id --fracOverlap 0.5 --extraAttributes gene_name -s 1 -M
    mv ${sample_id}.sortedByCoord.out.bam.featureCounts.bam ${sample_id}.sortedByCoord.featureCounts.bam
    """
}
In these cases, the file being renamed is larger than about 5GB. The tasks are suitably provisioned for RAM and CPU.
An excerpt from the fusion logs looks like this:
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 5YGM83ZNH2TGCS9C, HostID: gZhCR02LS3k1Ub663gYRtiu4T9NAjtKAOFBPuSL/Kq8XIab0D1ve+W6szfAVBWAtgS7QHLDxBEk=, api error InvalidArgument: Range specified is not valid for source object of size: 251658240","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.0.chk","bucket":"csg-tower-bucket","time":1720040365224810836,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 5YGN5H8XFJTKV06B, HostID: LFHRtuYPzQGrcVqRBCpczQY9+zsLrjNAGPDrTKZ15HZt0s29T7mPscS8B6CCzMG4L2x3PA4UkiY=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.F000000.chk","bucket":"csg-tower-bucket","time":1720040365368268383,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 5YGNZEHS3KCAQKFF, HostID: +uzQR4wDUF/4+VTwKKD+aY7uUFUQm8pwDiKZOhWOrl15M42ix2GSkyut8qVoAVHll4QKWbiVbGs=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.1E000000.chk","bucket":"csg-tower-bucket","time":1720040365512161407,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 5YGRQQ7JDGKRYEAA, HostID: 9Hl1n6wpWT9D7DWAVt2sdGYEOuz2BdMSK6b8E6januC4eDWAm0JJroZh0OvPtr95+bwLQ6T72V8=, api error InvalidArgument: Range specified is not valid for source object of size: 251658240","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.2D000000.chk","bucket":"csg-tower-bucket","time":1720040365654444895,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 5YGJP4N6ATBFY5K3, HostID: 4Qd6Kt8VhNyQ0D/T05TaFBglt/VsclmpK6NU/LDLYotJ4cs5lJAfawRcaldx3Ggn0vTuFZesI9E=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.3C000000.chk","bucket":"csg-tower-bucket","time":1720040365796822564,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 5YGYDMHM4VCBEV7Q, HostID: py8YSd0hbz9X/1AxGaJiCeHEzm0fnYQc9+h/vWjGW0XOMuSi6vfbPfpwdgb8W6xrjU9bIzrCHHk=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.4B000000.chk","bucket":"csg-tower-bucket","time":1720040365950307733,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TMFR6ADJT5STWCC, HostID: Gg8uZvZkauU4xlph4UrRAnJ1Ewfq7ENhSqBaUPfpPPPGv5/zX21MNjEXWGUZFLjbYN74Ow2TqhU=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.5A000000.chk","bucket":"csg-tower-bucket","time":1720040366114178281,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TM6987B5R4N19HQ, HostID: +MGwLqDrYcwaBcNbHjREuc7ccuwbJJvQ1SOc/jB3V0V3bg62D/oTGmC9fusrGmtGIsorCVll7RQ=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.69000000.chk","bucket":"csg-tower-bucket","time":1720040366235555129,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TMBJ6WSSPPC3RR6, HostID: hDkUDgqulU9/JOLLsF+u1NUc8BEko8oVFn+2fzN/LUT7IgURCby8GQdt0pr0KrOVJhOGV1Lp2yo=, api error InvalidArgument: Range specified is not valid for source object of size: 251658240","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.78000000.chk","bucket":"csg-tower-bucket","time":1720040366384261492,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TMCMZEDMPSHE1CE, HostID: bqI1QT2IkDGfY5SwS+C1+ELk1yOtJKMW0pkk7i41VVpv7hhnRKBXfuZ6uDGHOep+KSJYFGgl/wU=, api error InvalidArgument: Range specified is not valid for source object of size: 251658240","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.87000000.chk","bucket":"csg-tower-bucket","time":1720040366552076383,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TM28GTG4ZT5G8DX, HostID: pbUO9UKB43zn7+oUtJ4CpJeDsqAU42VZ6t3pg977J4cD1YjkjDc7/lIHIgR3BtNXpSCY2aEzskU=, api error InvalidArgument: Range specified is not valid for source object of size: 251658240","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.96000000.chk","bucket":"csg-tower-bucket","time":1720040366688332934,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TM1W0JV2P9KW1A1, HostID: QrGv4OsGxHJ3UjLzTq0vWow5kRB8g8kp9fas+QCR5Nhn7NXhVVkUBhrmZmmxRyPffYHCx/EbiKY=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.A5000000.chk","bucket":"csg-tower-bucket","time":1720040366827712574,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: 6TMEJ6KG7J7DGW8K, HostID: WtZfEwCaBEDkpk3b2koXIJXgI3yNyPawyFiFa2HAJJLDc6krmUng20vdj0+sRFBLNUJsedBeHbg=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.B4000000.chk","bucket":"csg-tower-bucket","time":1720040366979928113,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: RP1624QCYZ4C85AB, HostID: roEs8AZaJFo3wWxAuNMVxk8qVKhEwhrxA12g8jvUZpva5yFKl1rz4pXDyfPWK9CW9rc7mSI3shY=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.C3000000.chk","bucket":"csg-tower-bucket","time":1720040367107262578,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: RP17KQWKK97XQVMQ, HostID: TAf5Z2C4EO4s0TxHcOF+emJHgi627Ry5OMl0PVyiPAfcD2G9F0kpqTuFba5XkujWzqP2wposgUA=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.D2000000.chk","bucket":"csg-tower-bucket","time":1720040367237245797,"caller":"utils.go:112","message":"at entries complete"}
{"level":"error","error":"operation error S3: UploadPartCopy, https response error StatusCode: 400, RequestID: RP15CVRR81PEDQ71, HostID: BMO89kHHdvyfs1KxZx5NOkfBDGaJm7vWOCGA3X7mIR5wizdANSvFYT+94HhkZDH+gVfIzqkrvlQ=, api error InvalidRequest: The specified copy range is invalid for the source object size","key":"scratch/16TqtvmXjo26kk/7b/9a79c2a440742c65d8a1a781d66974/Sample129_S129.sortedByCoord.featureCounts.bam.E1000000.chk","bucket":"csg-tower-bucket","time":1720040367502392537,"caller":"utils.go:112","message":"at entries complete"}
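One thing I notice in the excerpt: the suffix of each failing .chk key looks like a hexadecimal byte offset, and consecutive suffixes differ by 0xF000000, which is exactly the 251658240 bytes (240 MiB) that S3 reports as the source object size. Here is a quick Python check of that arithmetic. It assumes the suffixes really are hex offsets into the renamed file, which is my reading of the logs and not something I have confirmed from the Fusion documentation:

```python
# Suffixes copied from the failing ".chk" keys in the log excerpt above.
# Assumption: each suffix is a hexadecimal byte offset into the renamed file.
suffixes = ["0", "F000000", "1E000000", "2D000000", "3C000000", "4B000000",
            "5A000000", "69000000", "78000000", "87000000", "96000000",
            "A5000000", "B4000000", "C3000000", "D2000000", "E1000000"]

# "source object of size" value taken from the S3 InvalidArgument message.
REPORTED_SIZE = 251658240

offsets = [int(s, 16) for s in suffixes]

# Distinct gaps between consecutive offsets; a single value means fixed-size chunks.
steps = {b - a for a, b in zip(offsets, offsets[1:])}

print("distinct step(s) between offsets:", steps)
print("reported source size in MiB:", REPORTED_SIZE / 2**20)
print("step equals reported source size:", steps == {REPORTED_SIZE})
```

If that reading is right, every failing UploadPartCopy targets a 240 MiB chunk object, and the requested copy range does not fit within it.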
In some of the processes, I’ve been able to rewrite them so that the mv is not necessary. But in others, like the one above, the mv is the most sensible approach.
Does anybody know why this is happening and how I can fix it?
Here’s the fusion log from one of the tasks in question.
fusion.txt (664.8 KB)
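For anyone who wants to dig through the attached log, here is a small Python sketch for tallying the error entries. It assumes fusion.txt contains one JSON object per line with the same "level" and "error" fields as the excerpt above. The function name summarize_errors is just something I made up for illustration:

```python
import json
import re
from collections import Counter


def summarize_errors(lines):
    """Count error-level log entries by S3 API error code (e.g. InvalidArgument)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        # Pull the code that follows "api error" in the message, if present.
        match = re.search(r"api error (\w+)", str(entry.get("error", "")))
        counts[match.group(1) if match else "other"] += 1
    return counts


# Usage, assuming the attachment is saved locally as fusion.txt:
# with open("fusion.txt") as fh:
#     print(summarize_errors(fh))
```

This only groups entries by error code; it does not try to interpret the keys or the byte ranges involved.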