Need syntax help declaring a closure or comparator for collectFile sort option

sort: true works.

sort: { a, b -> a <=> b }, which looks like simply the explicit version of true, fails with ERROR ~ Invalid method invocation. So I’m missing some syntax that makes the closure understandable to collectFile(sort: ).

The documentation for the sort option says: A custom sorting criteria can be specified with a Closure or a Comparator object.

/*
  Version: 24.04.4 build 5917
  Created: 01-08-2024 07:05 UTC 
  System: Linux 5.14.0-362.24.1.el9_3.x86_64
  Runtime: Groovy 4.0.21 on OpenJDK 64-Bit Server VM 11.0.22+7-LTS
  Encoding: UTF-8 (UTF-8)
*/

workflow {
    channel.of(
        "A1,0000,c0",
        "A1,0000,c1",
        "A1,0000,c2",
        "A1,0000,c3",
        "A1,0000,c4",
        "B1,0000,c0",
        "B1,0000,c1",
        "B1,0000,c2",
        "B1,0000,c3",
        "B1,0000,c4",
        "A1,0001,c0",
        "A1,0001,c1",
        "A1,0001,c2",
        "B1,0001,c3",
        "B1,0001,c4",
    )
    .toSortedList { a, b -> a <=> b } .flatten().view()
    .collectFile(seed: "WELL,TILE,CYCLE", name: 'tile-well-list.csv', newLine: true,
        sort: 
        // true // works
        { a, b -> a <=> b } // similar to https://www.nextflow.io/docs/latest/operator.html#tosortedlist
                            // and demonstrated above
                            // ERROR ~ Invalid method invocation `doCall` with arguments: A1,0001,c0 (java.lang.String) on _closure5 type
    )
    .view().splitCsv().view()
}

Welcome to the forum, @Bill_Welch!

Can you share a minimal reproducible example? I can’t access these files in order to reproduce your code here. I’d also ask you not to describe your problem as comments within the code. It makes it harder to read. What do you mean exactly by “not the right order”?

sort: true works

sort: { a, b -> a <=> b }, which looks to be simply the explicit version of true, doesn’t. So, I’m missing some syntax that makes the closure understandable to collectFile(sort: )

The example in the Nextflow docs for toSortedList is comparing integers. You’re comparing strings here, a whole different thing.

How should it be sorted based on your goal? If you make this clear, I can try to think of a way to get the items sorted for you.

The Groovy spaceship operator is defined for strings as well as integers, and it works exactly as expected for toSortedList in the example code I’ve provided, but it throws an error in collectFile:

        "B1,0001,c3",
        "B1,0001,c4",
    )
    .toSortedList { a, b -> a <=> b } .flatten().view()
 // this closure works ^^^^^^^^^ just fine with strings in toSortedList
    .collectFile(s

Your documentation for collectFile(sort: ) says:

A custom sorting criteria can be specified with a Closure or a Comparator object.

What is the exact syntax for specifying a closure or comparator to sort:?
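
One reading of the error message, offered as a sketch rather than a confirmed description of Nextflow internals: doCall was invoked with a single argument (A1,0001,c0), which suggests collectFile calls the sort closure with one item at a time, whereas toSortedList calls it with two. The plain Groovy below involves no Nextflow at all, and the names pairwise and keyOnly are made up for illustration; it only shows how a two-parameter closure behaves when it is called with one argument.

```groovy
// Two closures with different arities. Both names are hypothetical.
def pairwise = { a, b -> a <=> b }            // comparator style: two items in, an int out
def keyOnly  = { it -> it.tokenize(',')[2] }  // key style: one item in, a sort key out

assert pairwise.maximumNumberOfParameters == 2
assert keyOnly.maximumNumberOfParameters == 1

// A key-style closure accepts a single item.
assert keyOnly('A1,0001,c0') == 'c0'

// A comparator-style closure does not: there is no doCall(String) to dispatch to.
def failed = false
try {
    pairwise('A1,0001,c0')
} catch (MissingMethodException e) {
    failed = true
    println "call failed: ${e.method}"
}
assert failed
```

If that reading is right, a closure given to sort: would need to take one parameter and return a sort key, and a pairwise comparison would have to be supplied as a Comparator instead.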

First, I need to understand what you’re trying to do.

Second, the syntax is not necessarily incorrect. It just depends on what you want to do, which is not clear to me.

After explaining how you expect the strings to be sorted (with some examples), I’d like to understand why you’re sorting twice.

For example, this Groovy code works in the Nextflow console:

csv = [        "A1,0000,c0",
        "A1,0000,c1",
        "A1,0000,c2",
        "A1,0000,c3",
        "A1,0000,c4",
        "B1,0000,c0",
        "B1,0000,c1",
        "B1,0000,c2",
        "B1,0000,c3",
        "B1,0000,c4",
        "A1,0001,c0",
        "A1,0001,c1",
        "A1,0001,c2",
        "B1,0001,c3",
        "B1,0001,c4",]
        
csv.each { println it }

println '=============== now sort strings with closure ================='

csv.sort {t1, t2 -> tt1 = t1.tokenize(','); tt2 = t2.tokenize(',')
          tt1[2] <=> tt2[2] ?: tt1[0] <=> tt2[0] ?: tt1[1] <=> tt2[1] } .each { println it }

Output:

A1,0000,c0
A1,0000,c1
A1,0000,c2
A1,0000,c3
A1,0000,c4
B1,0000,c0
B1,0000,c1
B1,0000,c2
B1,0000,c3
B1,0000,c4
A1,0001,c0
A1,0001,c1
A1,0001,c2
B1,0001,c3
B1,0001,c4
=============== now sort strings with closure =================
A1,0000,c0
A1,0001,c0
B1,0000,c0
A1,0000,c1
A1,0001,c1
B1,0000,c1
A1,0000,c2
A1,0001,c2
B1,0000,c2
A1,0000,c3
B1,0000,c3
B1,0001,c3
A1,0000,c4
B1,0000,c4
B1,0001,c4
Result: [A1,0000,c0, A1,0001,c0, B1,0000,c0, A1,0000,c1, A1,0001,c1, B1,0000,c1, A1,0000,c2, A1,0001,c2, B1,0000,c2, A1,0000,c3, B1,0000,c3, B1,0001,c3, A1,0000,c4, B1,0000,c4, B1,0001,c4]
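
The same three-key ordering can also be written as a Comparator by coercing a two-parameter closure with as Comparator, which is standard Groovy. This is a sketch in plain Groovy on a four-line subset of the data. The name byCycleWellTile is made up, and whether collectFile accepts such an object through sort: is an assumption based on the documentation sentence quoted earlier, not something tested here.

```groovy
// Hypothetical name. Orders by cycle, then well, then tile.
def byCycleWellTile = { String x, String y ->
    def a = x.tokenize(',')
    def b = y.tokenize(',')
    a[2] <=> b[2] ?: a[0] <=> b[0] ?: a[1] <=> b[1]
} as Comparator

def csv = ['B1,0000,c0', 'A1,0001,c0', 'A1,0000,c1', 'A1,0000,c0']

// sort(false, ...) returns a sorted copy and leaves csv untouched.
def sorted = csv.sort(false, byCycleWellTile)

assert sorted == ['A1,0000,c0', 'A1,0001,c0', 'B1,0000,c0', 'A1,0000,c1']
assert csv.first() == 'B1,0000,c0'
```

In a workflow this would presumably be passed as sort: byCycleWellTile, but I have not confirmed that against a real pipeline run.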

I think that’s what you’re looking for:

workflow {
    channel.of(
        "A1,0000,c0",
        "A1,0000,c1",
        "A1,0000,c2",
        "A1,0000,c3",
        "A1,0000,c4",
        "B1,0000,c0",
        "B1,0000,c1",
        "B1,0000,c2",
        "B1,0000,c3",
        "B1,0000,c4",
        "A1,0001,c0",
        "A1,0001,c1",
        "A1,0001,c2",
        "B1,0001,c3",
        "B1,0001,c4",
    )
    .collectFile(seed: "WELL,TILE,CYCLE", name: 'tile-well-list.csv', newLine: true,
      sort: { it -> it.tokenize(',')[2] }
    )
}

Output file:

WELL,TILE,CYCLE
A1,0000,c0
B1,0000,c0
A1,0001,c0
A1,0000,c1
B1,0000,c1
A1,0001,c1
A1,0000,c2
B1,0000,c2
A1,0001,c2
A1,0000,c3
B1,0000,c3
B1,0001,c3
A1,0000,c4
B1,0000,c4
B1,0001,c4
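
Note that this output is grouped by cycle only: within each cycle the lines keep their input order (A1,0000 then B1,0000 then A1,0001), whereas the three-key closure from the console example puts well before tile. The plain Groovy sketch below reproduces both orders on a subset of the data. It uses List.sort rather than collectFile, so treat it as an illustration of the two orderings and not of Nextflow's behavior.

```groovy
def csv = ['A1,0000,c0', 'A1,0000,c1', 'B1,0000,c0',
           'B1,0000,c1', 'A1,0001,c0', 'A1,0001,c1']

// Key on the cycle column only. The sort is stable, so ties keep input order.
def byCycle = csv.sort(false) { it.tokenize(',')[2] }
assert byCycle == ['A1,0000,c0', 'B1,0000,c0', 'A1,0001,c0',
                   'A1,0000,c1', 'B1,0000,c1', 'A1,0001,c1']

// Compare on cycle, then well, then tile.
def byAll = csv.sort(false) { x, y ->
    def a = x.tokenize(',')
    def b = y.tokenize(',')
    a[2] <=> b[2] ?: a[0] <=> b[0] ?: a[1] <=> b[1]
}
assert byAll == ['A1,0000,c0', 'A1,0001,c0', 'B1,0000,c0',
                 'A1,0000,c1', 'A1,0001,c1', 'B1,0000,c1']
```

If the order within a cycle matters, the single-key closure on its own may not be enough.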
