Getting the file baseName depending on the file extension

A common need in scripting is to retrieve the base name of a file, i.e. the name of the file without its extension. This is often used, for example, as a prefix when naming other files you produce. In Nextflow this can be done with the file object's getBaseName() function.
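As a rough illustration of the "prefix" use case, here is a minimal sketch of a process that names its output after the input's base name. The process name SORT_LINES, the `.sorted.txt` suffix, and the input file `sampleA.txt` are all made up for the example.

```groovy
// Hypothetical process: sorts the lines of a text file and names the
// result after the input's base name (sampleA.txt -> sampleA.sorted.txt).
process SORT_LINES {
    input:
    path infile

    output:
    path "${infile.getBaseName()}.sorted.txt"

    script:
    """
    sort ${infile} > ${infile.getBaseName()}.sorted.txt
    """
}

workflow {
    // 'sampleA.txt' is a placeholder; point this at a real file.
    SORT_LINES(channel.fromPath('sampleA.txt'))
}
```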

However, many tools can read either plain text or compressed inputs by auto-detecting the file extension. This makes scripting more cumbersome, because getBaseName() only removes the last file extension.
For example:

file("sampleA.ext").getBaseName()    -> "sampleA"
file("sampleA.ext.gz").getBaseName() -> "sampleA.ext"

The getBaseName() function can, however, also take an integer argument, which is the number of extensions to remove.

file("sampleA.ext").getBaseName(1)    -> "sampleA"
file("sampleA.ext.gz").getBaseName(2) -> "sampleA"

By testing the file name, we can choose that argument dynamically, so the correct base name is returned whether or not the input is compressed.

myfile.getBaseName(myfile.name.endsWith('.gz') ? 2 : 1)

This uses a ternary operator to pass 2 if the file name ends with ‘.gz’, and 1 otherwise.
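If you need this in several places, the same test can be wrapped in a small function. The sketch below is only one way to do it: the name stemOf is hypothetical (it is not part of Nextflow), and the list of compression suffixes is an assumption you should adjust to the inputs you actually expect.

```groovy
// Hypothetical helper, not a Nextflow built-in.
// Removes two extensions when the file name ends with a known
// compression suffix, and one extension otherwise.
def stemOf(f) {
    // Assumed suffix list; extend or trim it for your own inputs.
    def compressed = ['.gz', '.bz2', '.xz']
    def n = compressed.any { suffix -> f.name.endsWith(suffix) } ? 2 : 1
    return f.getBaseName(n)
}

workflow {
    println stemOf(file('sampleA.ext'))      // sampleA
    println stemOf(file('sampleA.ext.gz'))   // sampleA
}
```

Keeping the suffix list inside the function avoids relying on script-level variables, which are not visible from within functions in a Groovy script.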

There is also the very useful getSimpleName(), which removes all file extensions.

Good point. Which one to use really depends on how many periods users put in their input file names.
getSimpleName() strips everything from the first period onward.

file("sampleA.deduped.normalised.fastq.gz").getSimpleName() -> "sampleA"
file("sampleA.deduped.normalised.fastq.gz").getBaseName(2) -> "sampleA.deduped.normalised"