Hi all,
I have a Nextflow pipeline that I run on Google Cloud. Initially I used the Google Cloud Life Sciences API, but I recently migrated to Google Batch because Life Sciences is being deprecated. The pipeline worked fine after the migration; however, when running it with new samples it fails on Google Batch, while it still works with Google Cloud Life Sciences.
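Config-wise, the migration essentially came down to switching the executor (the full config I am using is attached at the end); a minimal sketch of the change:

process {
    // previously: executor = 'google-lifesciences'
    executor = 'google-batch'
}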
With Google Batch, I get the following error message from Nextflow:
[de/8cd1ba] NOTE: Process `preprocess (1)` terminated for an unknown reason -- Likely it has been terminated by the external system -- Error is ignored
Checking the Google Batch job logs, I found this error message:
severity: "ERROR"
textPayload: "Error: daemonize.Run: readFromProcess: sub-process: Error while mounting gcsfuse: mountWithArgs: mountWithStorageHandle: fs.NewServer: create file system: SetUpBucket: BucketHandle: storageLayout call failed: rpc error: code = InvalidArgument desc = Bucket is a requester pays bucket but no user project provided."
To troubleshoot, I tried enabling Fusion and Wave, but then ran into a permission issue:
"level":"error","error":"googleapi: Error 403: miparrama@project.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist)
I guess these permissions are not needed with Google Life Sciences. The samples are stored in a requester-pays bucket owned by a third party, so I won't be able to obtain the required permissions.
I hope someone can help me find a solution here.
Thanks in advance.
Miguel
Here is the nextflow.config I have been using:
// Process parameters
process {
    errorStrategy = { task.exitStatus in [14] ? 'retry' : 'ignore' }
    maxRetries = 5

    withName: preprocess {
        executor = 'google-batch'
        container = 'atlas-filter:latest'
        machineType = 'n1-highcpu-8'
        maxForks = 15
        disk = { 375.GB }
    }
    withName: pathseq {
        executor = 'google-batch'
        container = 'gatk4'
        machineType = 'n1-highmem-16'
        maxForks = 40
    }
    withName: kraken2 {
        executor = 'google-batch'
        container = 'kraken2'
        machineType = 'n1-highmem-16'
        maxForks = 30
    }
}
// Google Cloud / Google Batch settings
google {
    project = 'project'
    location = 'europe-west4'
    enableRequesterPaysBuckets = true
    batch.bootDiskSize = 50.GB
    batch.serviceAccountEmail = 'miparrama@project.iam.gserviceaccount.com'
    batch.spot = false
}
fusion.enabled = true
wave.enabled = true
// Capture report with NF-Tower
tower {
    accessToken = 'token'
    enabled = true
}