Escaping strings to prevent unexpected interpolation or code injection

Nextflow does not do anything special with strings when it interpolates them into process scripts. In the worst case this makes interpolation a vector for code injection; at the least, it can produce unexpected output when strings contain shell special characters such as quotes, semicolons or #.

For shell scripts, at least, Apache Commons Text provides escaping functionality (StringEscapeUtils.escapeXSI). This can be leveraged in Nextflow like so:

@Grab(group='org.apache.commons', module='commons-text', version='1.11.0')
import org.apache.commons.text.StringEscapeUtils

process escapeInput {
    input: val(unescaped)
    output: stdout

    script:
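        // escapeXSI backslash-escapes shell metacharacters; `def` makes the
        // variable local to the task, as the Nextflow docs recommend
        def escaped = StringEscapeUtils.escapeXSI(unescaped as String)

        // Both values are printed inside double quotes below. escapeXSI output
        // is designed to be used unquoted, so some backslashes remain visible.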
        """
        printf "Unescaped: [%s]\n" "${unescaped}"
        printf "Escaped:   [%s]\n" "${escaped}"
        """
}

workflow {
    // WARNING Code injection vector; use responsibly!
    Channel.of('"; echo INJECTION; #')
    | escapeInput
    | view
}

Running this demonstrates both the problem (INJECTION is written to stdout) and the fix (the escaped string is printed as data rather than executed):

N E X T F L O W  ~  version 22.10.6
Launching `./main.nf` [goofy_ekeblad] DSL2 - revision: 4243c4502c
executor >  local (1)
[72/405e25] process > escapeInput (1) [100%] 1 of 1 ✔
Unescaped: []
INJECTION
Escaped:   ["\;\ echo\ INJECTION\;\ \#]
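To see the mechanics without Nextflow in the loop, here is a small illustrative sketch (not part of the original post) that replays the same printf lines in plain bash, with bash -c standing in for the script the process renders. It assumes escapeXSI turns the input into \"\;\ echo\ INJECTION\;\ \# (a backslash before each shell metacharacter); the log above shows that string after the shell has consumed the backslash before the quote. The last lines swap in bash's own printf %q for comparison.

```bash
#!/usr/bin/env bash
# Replays the printf lines from the log above in plain bash, without Nextflow.
# Each `bash -c` string stands in for the script that the process renders.

# The malicious input, single-quoted so this demo does not interpret it.
unescaped='"; echo INJECTION; #'
# Assumed escapeXSI output for that input: a backslash before each shell
# metacharacter. The log shows it after the shell has consumed the \ before ".
escaped='\"\;\ echo\ INJECTION\;\ \#'

# Raw value inside double quotes: the embedded " closes the string, so
# `echo INJECTION` runs as a separate command and `#` comments out the rest.
bash -c "printf 'Unescaped: [%s]\n' \"${unescaped}\""

# Escaped value inside double quotes, as in the process above: nothing runs,
# but the shell keeps most of the backslashes, hence the odd-looking log line.
bash -c "printf 'Escaped:   [%s]\n' \"${escaped}\""

# Escaped value unquoted, which is how XSI-style escaping is meant to be used:
# the shell strips the backslashes and sees one literal argument.
bash -c "printf 'Unquoted:  [%s]\n' ${escaped}"

# For comparison, bash's builtin printf %q also quotes a value for reuse as
# shell input (its exact escaping differs slightly from escapeXSI).
quoted=$(printf '%q' "$unescaped")
bash -c "printf 'Via %%q:    [%s]\n' ${quoted}"
```

Run with bash, it should print:

Unescaped: []
INJECTION
Escaped:   ["\;\ echo\ INJECTION\;\ \#]
Unquoted:  ["; echo INJECTION; #]
Via %q:    ["; echo INJECTION; #]

Only the unquoted form round-trips the original value. The double-quoted form used in the process appears to be safe as well, because escapeXSI also escapes the characters that stay special inside double quotes ($, backtick, backslash and the double quote itself), but the value the script sees is not the value that went in.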

Hi @Xophmeister! Thanks for bringing this up. Could you please open an issue in the GitHub repository here?

Of course, the challenge here is that Nextflow is, by design, doing code injection: we want to run arbitrary code within the process to complete the workflow! Nextflow manages this by deferring to your infrastructure, so your process only has the permissions you grant it, but I'm sure we can refine this and reduce the scope.

That said, for most use cases you'd want to sanitise the value, so adding a safe directive or operator makes sense, though it would need a way of being turned off.