I have data that looks like this:
[[id:test2], [replicate:replicate1, full:test2_replicate1.idr.normed.bed.full, entropy:test2_replicate1.entropy.full], [replicate:replicate2, full:test2_replicate2.idr.normed.bed.full, entropy:test2_replicate2.entropy.full], test2.idr]
but then when I try to put that into my process:
process GET_REPRODUCING_PEAKS {
    publishDir "${params.outdir}/reproducingPeaks", mode: 'copy'
    container 'docker://brianyee/merge_peaks:0.1.0'

    input:
    tuple val(meta), path(replicate1), path(replicate2), path(idr)

    output:
    tuple val(meta), path("*.full"), emit: full
    tuple val(meta), path("*.bed"), emit: bed

    script:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    get_reproducing_peaks.pl ${replicate1.full} ${replicate2.full} ${prefix}.${replicate1.replicate}.final.full ${prefix}.${replicate1.replicate}.final.full ${prefix}.${replicate1.replicate}.final.bed ${prefix}.${replicate1.replicate}.final.bed ${replicate1.entropy} ${replicate2.entropy} ${idr}
    """
}
I get:
ERROR ~ Error executing process > 'NFCORE_ECLIPSEQ:ECLIPSEQ:GET_REPRODUCING_PEAKS (2)'
Caused by:
Not a valid path value type: java.util.LinkedHashMap ([replicate:replicate1, full:test2_replicate1.idr.normed.bed.full, entropy:test2_replicate1.entropy.full])
why is this?
Thank you very much in advance!!!
Hey Ramiro.
Indenting a channel element that consists of a tuple is a great way to get a clear view of its structure. Here is the indented version of your channel element:
[
    [id:test2],
    [
        replicate:replicate1,
        full:test2_replicate1.idr.normed.bed.full,
        entropy:test2_replicate1.entropy.full
    ],
    [
        replicate:replicate2,
        full:test2_replicate2.idr.normed.bed.full,
        entropy:test2_replicate2.entropy.full
    ],
    test2.idr
]
The input block in your GET_REPRODUCING_PEAKS process says the channel element has four items (a val meta, a path replicate1, another path replicate2, and another path idr), but as you can see in the indented version above, that is not the case. replicate1 is a LinkedHashMap, [replicate:replicate1, full:test2_replicate1.idr.normed.bed.full, entropy:test2_replicate1.entropy.full], not a path, and that's exactly what the error message is saying:

Not a valid path value type: java.util.LinkedHashMap ([replicate:replicate1, full:test2_replicate1.idr.normed.bed.full, entropy:test2_replicate1.entropy.full])
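If you want to confirm the types yourself, here's a quick sketch (the channel name is hypothetical, replace it with yours):

// prints the runtime class of each item in the tuple,
// e.g. [LinkedHashMap, LinkedHashMap, LinkedHashMap, ...]
your_channel.view { element -> element.collect { it.getClass().simpleName } }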
Thank you Marcel! Actually, I hadn’t really realized that this was a different class. The solution would just be to use val as the input type instead of path.
Not really. You could get rid of the error message simply by adding the val qualifier to the two middle items, but then Nextflow wouldn't stage the files into the task work directory. The real solution is to change the structure of the channel element. See the use of the map operator below:
Channel
    .of(
        [
            [id:'test2'],
            [
                replicate:'replicate1',
                full:file('full_rep1.txt'),
                entropy:file('entropy_rep1.txt')
            ],
            [
                replicate:'replicate2',
                full:file('full_rep2.txt'),
                entropy:file('entropy_rep2.txt')
            ],
            file('test2.idr')
        ]
    )
    .map { tuple(it[0]['id'], it[1]['replicate'], it[1]['full'], it[1]['entropy'], it[2]['replicate'], it[2]['full'], it[2]['entropy'], it[3]) }
    .view()
Output (file paths are printed in full by view; abbreviated here):
[test2, replicate1, /path/to/full_rep1.txt, /path/to/entropy_rep1.txt, replicate2, /path/to/full_rep2.txt, /path/to/entropy_rep2.txt, /path/to/test2.idr]
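If you prefer, the same mapping reads a bit more clearly with the tuple destructured into named closure parameters (equivalent result; just a stylistic sketch):

.map { meta, rep1, rep2, idr ->
    tuple(meta.id, rep1.replicate, rep1.full, rep1.entropy,
          rep2.replicate, rep2.full, rep2.entropy, idr)
}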
With that, you can have your input block like:
...
input:
tuple val(meta), val(rep1), path(full_rep1), path(entropy_rep1), val(rep2), path(full_rep2), path(entropy_rep2), path(idr)
...
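And for completeness, a sketch of how the script block could look with the new names, assuming get_reproducing_peaks.pl keeps the same argument order as your original call (with the duplicated replicate1 outputs corrected, see the PS below):

script:
def prefix = task.ext.prefix ?: "${meta}"   // meta is now the plain id string, not a map
"""
get_reproducing_peaks.pl ${full_rep1} ${full_rep2} \\
    ${prefix}.${rep1}.final.full ${prefix}.${rep2}.final.full \\
    ${prefix}.${rep1}.final.bed ${prefix}.${rep2}.final.bed \\
    ${entropy_rep1} ${entropy_rep2} ${idr}
"""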
PS: In your script block, you seem to be passing ${prefix}.${replicate1.replicate}.final.full twice instead of ${prefix}.${replicate1.replicate}.final.full ${prefix}.${replicate2.replicate}.final.full (and likewise for the .final.bed pair).
Thank you Marcel. But I am not clear: meta, for example, is usually a LinkedHashMap (e.g. [id:id1, sample:sample1, replicate:rep1]) and is declared as val in the input section:
input:
tuple val(meta)
and then accessed through the various tags, for example in my case with the prefix:
script:
def prefix = "${meta.id}"
and hence if I have
input:
tuple val(meta), val(replicate1)
and replicate1 is declared as a LinkedHashMap:
[
    replicate:replicate1,
    full:test2_replicate1.idr.normed.bed.full,
    entropy:test2_replicate1.entropy.full
]
I could then access it like this in the script area:
"""
myProgram ${replicate1.full} ${replicate1.entropy}
"""
Wouldn’t this work?
You're correct about meta.id. If you still want to access fields that way, the item has to remain a map, but with the code I provided you can simply use meta instead.
About the val: no, you shouldn't do it this way. It appears to work in your case because everything is local, but in a more complex environment (HPC, cloud, etc.) it won't, as the files are not being staged into the task work directory.
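To make the staging point concrete, here is a minimal sketch (process and file names are hypothetical):

process SHOW_STAGING {
    input:
    path staged     // Nextflow links the file into the task work dir; safe on HPC/cloud
    val  unstaged   // passed through as plain text; nothing is staged

    script:
    """
    ls -l ${staged}     # resolves: the file sits next to the command
    ls -l ${unstaged}   # may fail on a remote node: it is just a string
    """
}

workflow {
    ch = Channel.fromPath('data.txt')            // hypothetical local file
    SHOW_STAGING(ch, ch.map { it.toString() })
}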
Thank you, this is really helpful. It does sound like I should rather not try to pass things such as:
[
    replicate:'replicate1',
    full:file('full_rep1.txt'),
    entropy:file('entropy_rep1.txt')
]
(i.e. a LinkedHashMap) as an input: there is no qualifier that will stage it properly unless it holds only plain values (as the meta map does). Instead I should pass something like:
['replicate1', 'full_rep1.txt','entropy_rep1.txt']
and hence the input would be val, path, path
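For that shape, the matching declaration would presumably be (a sketch, names illustrative):

input:
tuple val(replicate), path(full), path(entropy)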
Looking forward to learning more! Barcelona here I come!