How to print file names after collect in a process to a file

Hi there,
I’m interested in writing the names of all the files generated after collect to a file.

process print_value {

    publishDir path: "/data1/users/nextflow/learn_nextflow/files_created/", mode: 'copy'
    input:
    val (temp_file)

    output:
    stdout

    script:
    """
    echo "$temp_file"  
    """
}

process file_create {

    publishDir path: "/data1/users/nextflow/learn_nextflow/files_created/", mode: 'copy'
    input:
    val (temp_number)

    output:
    path("${temp_number}.txt")

    script:
    """
    echo "value is $temp_number" > $temp_number".txt"
    """

}

workflow {

numbers=Channel.of(1, 2, 3, 40, 50)

ch_temp_path=file_create(numbers).collect()
}

I’d like to write the paths of the files created in the work directory to a file.
For example:

[/mnt/data1/users/nextflow/learn_nextflow/work/f9/085a49b37b3b75cd1cc5b031d68f8c/2.txt, /mnt/data1/users/nextflow/learn_nextflow/work/2d/0cfcebbee64cfc8563760ce62c7fe6/3.txt, /mnt/data1/users/nextflow/learn_nextflow/work/cd/236ea42f7246aea4b5b6f5e136f3e6/50.txt, /mnt/data1/users/nextflow/learn_nextflow/work/18/0a9d8f8131b5011a9c9a6fb0f84ef2/1.txt, /mnt/data1/users/nextflow/learn_nextflow/work/43/d5dfbf7e6e9b91af13afc831b98a8e/40.txt]

I want to write all these names to a list.txt file.

I’ll eventually use this list.txt in a bash script.

How do I code it?

You can use the collectFile channel operator. It writes the content of a channel to an output file. Read more about it here.
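
As a rough illustration of what collectFile does, here is a minimal sketch in the style of the operator's documented usage. The values 'alpha', 'beta' and 'gamma' are placeholders and not part of the pipeline in question. Each item emitted by the channel is appended to sample.txt, and the operator then emits the resulting file.

```nextflow
workflow {
    Channel
        .of('alpha', 'beta', 'gamma')                      // placeholder values
        .collectFile(name: 'sample.txt', newLine: true)    // one item per line
        .view { f -> "Saved to: ${f}\n${f.text}" }         // print the file location and its content
}
```

When the items are strings they are written as-is; when the items are files, their contents are concatenated instead.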

@mribeirodantas
Thanks.
How do I do that here?
Sorry, I didn’t understand how to use it after collect, or how to use it after passing the collected files.

Hey @complexgenome

There are examples with collectFile in the link I shared. Did you try running those examples? Did you try writing your version based on those examples? What does your adapted code look like, and what error is it giving you?

@mribeirodantas
Yes, I’m unable to write code for it, which is why I asked how to do that here.
Nevertheless, I tried to write some code, but I don’t understand how to proceed.


workflow {

numbers=Channel.of(1, 2, 3, 40, 50)

file_create(numbers).collect(flat: false).collectFile(name: 'sample.txt', newLine: true)
    .subscribe {
        println "\n Entries are saved to file: $it \n"
        //println "File content is: ${it.text}"
    }

}


I tried to find “sample.txt” but couldn’t find it in the work directory. How do I access it in Nextflow?

Second, the print overwrites the output. Please see attached screenshot.

@mribeirodantas
Can you please help?

Hi @complexgenome. I’m sorry you’re having a difficult time writing Nextflow code. Unfortunately, there’s only so much time our team can dedicate to individual users. It’s important for the good of the community that we prioritize work that helps everyone, like developing new training materials and documentation. Because of that, I’m not going to be able to answer all of your questions in as much detail as you want. It sounds like maybe you need to spend a bit more time going through the educational resources, or maybe find a collaborator in your local network who could help you.

You can find a lot of resources that will help you at https://docs.seqera.io (check the bottom of the page).

@mribeirodantas
I understand. I’m on the last leg of the pipeline, which is why I’m seeking help.

I tried what I could and shared the code. Unfortunately, it is counterintuitive to me how to proceed further.

I’m sorry that I find Nextflow complicated to work through. Since the pipeline is an in-house one, it’s difficult to adapt the documentation and examples, as the use case is usually unique.
I’ve come a long way in Nextflow, starting from scratch, to get here.

Thank you for your help and replies, as always.

collectFile allows you to concatenate a list of text files into a single text file. But if you want to save the file paths to a file, like an index file, you’ll need to do that manually.

I suggest you pass the list of paths into an exec process and write the paths to an output file. Here are some relevant docs to help you along:
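
A minimal sketch of that suggestion, assuming a simplified copy of the file_create process from the original post. The process name write_list is hypothetical, and list.txt is the file name asked for above. The collected paths are declared as a val input so nothing is staged, and the exec block writes them, one per line, into the task directory.

```nextflow
// Simplified copy of the process from the original post (publishDir omitted).
process file_create {
    input:
    val temp_number

    output:
    path "${temp_number}.txt"

    script:
    """
    echo "value is ${temp_number}" > ${temp_number}.txt
    """
}

// Hypothetical process: writes the received paths to list.txt, one per line.
process write_list {
    input:
    val paths                 // list of Path objects, passed as plain values

    output:
    path 'list.txt'

    exec:
    // exec blocks run Groovy code rather than a shell script, so the file is
    // written explicitly into the task work directory for the output to be found
    def out = task.workDir.resolve('list.txt')
    out.text = paths.collect { p -> p.toString() }.join('\n') + '\n'
}

workflow {
    numbers = Channel.of(1, 2, 3, 40, 50)
    ch_paths = file_create(numbers).collect()   // single list with all output paths
    write_list(ch_paths)
}
```

Since list.txt is an ordinary process output here, it can be published with publishDir or passed to a later process like any other file.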

@bentsherman
Thank you for your attention to my post.

I’m sorry, I’m struggling with Nextflow’s more advanced features. I have to create a pipeline in Nextflow, so I use this platform to troubleshoot.
The tutorials and links are fine for small examples; however, they do not help with a unique or specific pipeline.

For example:

process file_create {

    publishDir path: "/data1/users/nextflow/learn_nextflow/files_created/", mode: 'copy'
    input:
    val (temp_number)

    output:
    tuple val(temp_number), path("${temp_number}.txt"), emit: temp_file

    script:
    """
    echo "value is $temp_number" > $temp_number".txt"
    """

}

workflow {

numbers=Channel.of(1, 2, 3, 40, 50)
file_create(numbers).collectFile(name: 'sample.txt', newLine: true)

}

In this case the sample.txt isn’t generated.

However, if I remove “val(temp_number)” from the output of the file_create process, the sample.txt file is generated.

Sadly, I’ve been stuck at this particular step for over a week and none of the examples cover such situations, or rather, I’m not able to build on top of the existing examples.

Further, is there a way to pass this “sample.txt” around different processes?

Hey @complexgenome.

You need to extract the path from the tuple in your output channel before passing it to collectFile. See the snippet below:

process FOO {
  input:
  val x

  output:
  tuple val("${x}"), path("${x}.txt")

  script:
  """
  echo "Oi" > ${x}.txt
  """
}

workflow {
  Channel
    .of(1..5)
    | FOO
  FOO
    .out
    .map { it[1].name }
    .collectFile(name: "sample.txt", newLine: true)
}

As the operation occurs outside a process, you won’t find the file in a task directory, but in the temporary directory. Example below:

If you want another process to work on this file, you don’t need to know where it is located. Simply assign the output of collectFile to a channel and provide this channel as input to the other process. If you want to save it to a different location, you can set this with the collectFile option storeDir.
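
A sketch of both points, building on the FOO process above. COUNT_LINES is a hypothetical downstream process and 'results' is an assumed directory name; neither comes from the original pipeline.

```nextflow
process FOO {
  input:
  val x

  output:
  tuple val("${x}"), path("${x}.txt")

  script:
  """
  echo "Oi" > ${x}.txt
  """
}

// Hypothetical consumer: receives the collected file like any other path input.
process COUNT_LINES {
  input:
  path list_file

  output:
  stdout

  script:
  """
  wc -l < ${list_file}
  """
}

workflow {
  ch_list = FOO(Channel.of(1..5))
    .map { id, txt -> txt.name }        // keep only the file name from each tuple
    .collectFile(name: 'sample.txt', newLine: true, storeDir: 'results')

  // the channel emits sample.txt, which is staged into the task as list_file
  COUNT_LINES(ch_list).view()
}
```

If full paths are wanted rather than file names, mapping to txt.toString() instead of txt.name should give the absolute path of each file in its work directory.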