Hello fellow community members,
I’ve been encountering an error while running a Seurat R script within a Nextflow pipeline, and despite trying several approaches, I haven’t been able to resolve it. The error message states ‘Missing output file(s) seurat_analysis.rds expected by process runSeurat (1)’. However, I’ve manually verified that the RDS file is generated in the defined outdir and works correctly.
Here’s an overview of my scripts and the approaches I’ve tried:
- Calling the R script from Nextflow:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

params.h5_file = "/res.h5"
params.outdir = "/results"
params.r_script = "/run_seurat_doubletfinder.R"

// Define the input channel: (sample_name, file), where sample_name is the parent directory name
Channel.fromPath(params.h5_file).map { file ->
    def sample_name = file.getParent().getName()
    return tuple(sample_name, file)
}.set { h5_ch }

// Define the process
process runSeurat {
    publishDir "${params.outdir}", mode: 'copy', overwrite: true

    input:
    tuple val(sample_name), path(h5_file)

    output:
    tuple val(sample_name), path("seurat_analysis_${sample_name}.rds"), emit: rds_file

    script:
    """
    source /home/anaconda/etc/profile.d/conda.sh
    conda activate ir413
    Rscript ${params.r_script} ${h5_file} ${params.outdir}/seurat_analysis_${sample_name}.rds
    """
}

// Define the workflow
workflow {
    h5_ch | runSeurat
}
My R script:
# Load necessary libraries
args <- commandArgs(trailingOnly = TRUE)
input_path <- args[1]
output_path <- args[2]
# Read the CellBender output data
data.file <- Read_CellBender_h5_Mat(file_name = input_path)
object <- CreateSeuratObject(counts = data.file, project = "seurat_project", min.cells = 3, min.features = 200)
# Some R code...
# Save the Seurat object
saveRDS(object, file = output_path)
- Including the R code inside the pipeline:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

params.h5_file = "path/to/input_file.h5"
params.outdir = "path/to/output_directory"

// Define the input channel: (sample_name, file), where sample_name is the parent directory name
Channel.fromPath(params.h5_file).map { file ->
    def sample_name = file.getParent().getName()
    return tuple(sample_name, file)
}.set { h5_ch }

// Define the process
process runSeurat {
    publishDir "${params.outdir}", mode: 'copy', overwrite: true

    input:
    tuple val(sample_name), path(h5_file)

    output:
    tuple val(sample_name), path("seurat_analysis_${sample_name}.rds"), emit: rds_file

    script:
    """
    source /path/to/anaconda3/etc/profile.d/conda.sh
    conda activate ir413
    Rscript -e \"
    # Load necessary libraries
    process_sample <- function(sample_name, h5_file, outdir) {
        cat('Processing sample:', sample_name, '\\n')
        # Read CellBender output
        data.file <- Read_CellBender_h5_Mat(file_name = h5_file)
        # Create Seurat object
        obj <- CreateSeuratObject(counts = data.file, project = sample_name, min.cells = 3, min.features = 200)
        # More processing code...
        # Save the Seurat object
        rds_file_path <- file.path(outdir, paste0('seurat_analysis_', sample_name, '.rds'))
        saveRDS(obj, rds_file_path)
        if (file.exists(rds_file_path)) {
            cat('RDS file successfully saved at:', rds_file_path, '\\n')
        } else {
            cat('Failed to save RDS file at:', rds_file_path, '\\n')
        }
    }
    \"
    """
}

// Workflow definition
workflow {
    h5_ch | runSeurat
}
I do get the message ‘RDS file successfully saved at’ followed by the expected location.
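A possible lead, not confirmed here: Nextflow looks for declared `path` outputs inside the task's own work directory, while both scripts above write the RDS file to an absolute path under `params.outdir`. The shell sketch below only imitates that check with two temporary directories. `work_dir`, `out_dir`, `sample1`, and `check_output` are made-up stand-ins for illustration, not anything Nextflow provides.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-ins for the task work directory and params.outdir.
work_dir=$(mktemp -d)
out_dir=$(mktemp -d)
sample_name="sample1"   # hypothetical sample name
rds="seurat_analysis_${sample_name}.rds"

# Imitates the post-task check: is the declared output present
# in the current (work) directory?
check_output() {
    if [ -e "$1" ]; then echo "found"; else echo "missing"; fi
}

cd "$work_dir"

# Case 1: the command writes to an absolute path under out_dir,
# as both versions of the runSeurat script do.
touch "${out_dir}/${rds}"
result_outdir=$(check_output "$rds")

# Case 2: the command writes to a relative path, i.e. into the work directory.
touch "$rds"
result_workdir=$(check_output "$rds")

echo "written to outdir:   ${result_outdir}"
echo "written to work dir: ${result_workdir}"
```

If this is indeed the cause, the thing to try would be passing just `seurat_analysis_${sample_name}.rds` (without the `${params.outdir}/` prefix) as the output argument and leaving the copy into the results folder to `publishDir`.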
I would really appreciate your help. I am unable to spot the cause of the error.
Thanks,
Sonal