Hi, I have this process in Nextflow DSL2:
process permuting_scores {
    input:
    tuple val(PERMS), path(BINS), path(SCORES), path(HHOTNET)
    output:
    path "scores_${PERMS}.tsv", emit: scores
    script:
    """
    python ${HHOTNET}/src/permute_scores.py \
        -i ${SCORES} \
        -bf ${BINS} \
        -s "${PERMS}" \
        -o scores_${PERMS}.tsv
    """
}
This works fine and the pipeline runs in a reasonable time (~6 hours). However, I need to increase the number of permutations (PERMS) from 5 to 100. I launched the pipeline with 100 permutations, and after 10 days it was still running. One idea is to use GNU parallel, but I haven't gotten it to work. I tried this:
process permuting_scores {
    input:
    tuple val(PERMS), path(BINS), path(SCORES), path(HHOTNET)
    output:
    path "scores_*.tsv", emit: scores
    script:
    """
    parallel -j ${task.cpus} --bar \
        python ${HHOTNET}/src/permute_scores.py \
        -i ${SCORES} \
        -bf ${BINS} \
        -s {} \
        -o scores_{}.tsv \
        ::: \$(seq ${PERMS})
    """
}
But I got this error message multiple times:
executor > local (40)
[bf/f66d92] data_formatting | 1 of 1 ✔
[25/d11c16] similarity_matrix | 1 of 1 ✔
[db/c4aa4d] permuting_network (2) | 0 of 4
[e6/d2825e] find_permutation_bins | 1 of 1 ✔
[22/32a6bc] permuting_scores (36) | 21 of 100
[- ] construct_hierarchies | 0 of 56
[- ] processing_hierarchies -
[- ] performing_consensus -
ERROR ~ Invalid method invocation `call` with arguments: [work/a0/c50bf04ce2a2e75d6faf42a07064b2/scores_1.tsv, work/a0/c50bf04ce2a2e75d6faf42a07064b2/scores_2.tsv, work/a0/c50bf04ce2a2e75d6faf42a07064b2/scores_3.tsv,
work/25/d11c16a64dc28763b949e97548e1f9/similarity_matrix.h5, work/bf/f66d9224caca980a3068d57a4f7882/index_gene.tsv, 0] (java.util.LinkedList) on _closure30 type
Likewise, I have this code, which works perfectly after the previous permuting_scores process when parallel is not used:
process construct_hierarchies {
    input:
    tuple val(PERMS), path(SIMMATRIX), path(IDXGENE), path(SCORES), path(HHOTNET)
    output:
    tuple path("hierarchy_edge_list_${PERMS}.tsv"), path("hierarchy_index_gene_${PERMS}.tsv")
    script:
    """
    python ${HHOTNET}/src/construct_hierarchy.py \
        -smf ${SIMMATRIX} \
        -igf ${IDXGENE} \
        -gsf ${SCORES} \
        -helf hierarchy_edge_list_${PERMS}.tsv \
        -higf hierarchy_index_gene_${PERMS}.tsv
    """
}
And its parallel version is:
process construct_hierarchies {
    input:
    tuple val(PERMS), path(SIMMATRIX), path(IDXGENE), path(SCORES), path(HHOTNET)
    output:
    tuple path("hierarchy_edge_list_*.tsv"), path("hierarchy_index_gene_*.tsv")
    script:
    """
    parallel -j ${task.cpus} --bar \
        python ${HHOTNET}/src/construct_hierarchy.py \
        -smf ${SIMMATRIX} \
        -igf ${IDXGENE} \
        -gsf ${SCORES}_{} \
        -helf hierarchy_edge_list_{}.tsv \
        -higf hierarchy_index_gene_{}.tsv \
        ::: \$(seq 0 ${PERMS})
    """
}
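To make the index ranges in the two parallel attempts concrete, here is a minimal shell sketch of what the two seq calls and the -gsf argument expand to. PERMS=3 and the file name scores_3.tsv are hypothetical values chosen only for illustration, and the sketch assumes SCORES is staged as a single file; none of this is taken from the actual run.

```shell
# Hypothetical values, for illustration only.
PERMS=3
SCORES="scores_3.tsv"

# Indices fed to parallel in the permuting_scores attempt: seq PERMS starts at 1.
score_ids=$(seq "$PERMS" | xargs)

# Indices fed to parallel in the construct_hierarchies attempt: seq 0 PERMS starts at 0.
hierarchy_ids=$(seq 0 "$PERMS" | xargs)

# What "-gsf ${SCORES}_{}" becomes for the first index (0):
# the index is appended after the full file name.
first_gsf="${SCORES}_0"

echo "permuting_scores indices:      $score_ids"
echo "construct_hierarchies indices: $hierarchy_ids"
echo "first -gsf argument:           $first_gsf"
```

Under these assumptions, the first attempt writes files for indices 1 to PERMS, while the second iterates over 0 to PERMS and passes a -gsf name with the index appended after the .tsv extension, so the two attempts may not refer to the same files.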
How can I incorporate parallel into my code?