Catching matlab output

I am having trouble getting the output of a matlab script to be found by my pipeline.

My process.nf:

process simpletest {
  input:
    path fileName
  output:
    path '*.txt'
"""
#!/usr/bin/bash
matlab -nodisplay -nosplash -nodesktop -r "content='filler'; run('/scratch/allocation/kd/bin/matlabtest.m'); exit;"
"""
}

workflow {
  files = Channel.fromPath(
    "$baseDir/datafolder/**/*_epo.mat",
    relative:false
  )
  files | simpletest
}

Which calls matlabtest.m:

content
save('testsave.txt', 'content', '-ascii')
%sends 0 exit code back to nextflow
quit(0)

I want to run simpletest() on all the files in the files Channel. This works: matlab is called and the script runs, but matlab changes its working directory to the location of matlabtest.m, and that’s where testsave.txt is saved. It must be because the working directory has been changed out of work/… that the output file isn’t found.
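A minimal shell sketch of the behavior described above, with no MATLAB involved. It assumes, as the MATLAB documentation for run describes, that run() changes to the script's folder, executes the script, and then changes back. The sandbox layout and the run_like_matlab function are hypothetical stand-ins, not part of the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins: "bin" plays the folder holding matlabtest.m,
# "work" plays the Nextflow task directory.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/bin" "$sandbox/work"

# Mimics run('/path/to/script.m'): change to the script's folder, write a
# relative-path file there, then change back to the starting folder.
run_like_matlab() {
  local script_dir="$1"
  local start_dir="$PWD"
  cd "$script_dir"
  echo "filler" > testsave.txt
  cd "$start_dir"
}

cd "$sandbox/work"
run_like_matlab "$sandbox/bin"

# The file lands beside the script, so a '*.txt' glob in work/ matches nothing.
bin_files="$(ls "$sandbox/bin")"
work_files="$(ls "$sandbox/work")"
echo "bin:  ${bin_files:-<empty>}"
echo "work: ${work_files:-<empty>}"
```

If this matches what MATLAB is doing, any relative path passed to save() inside the script resolves against the script's folder rather than the task directory, which would explain the missing output.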

[oberg@login02 kd]$ nextflow run processdb.nf -resume
Nextflow 24.10.2 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 24.10.1

Launching `processdb.nf` [distracted_visvesvaraya] DSL2 - revision: bf32395646

[-        ] simpletest -
executor >  local (1)
[5a/9f7820] simpletest (1) | 0 of 1

executor >  local (1)
[5a/9f7820] simpletest (1) | 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'simpletest (1)'

Caused by:
  Missing output file(s) `*.txt` expected by process `simpletest (1)`


Command executed:

  #!/usr/bin/bash
  matlab -nodisplay -nosplash -nodesktop -r "content='filler'; run('/scratch/allocationcode/kd/bin/matlabtest.m'); exit;"

Command exit status:
  0

Command output:

                              < M A T L A B (R) >
                    Copyright 1984-2023 The MathWorks, Inc.
               R2023b Update 4 (23.2.0.2428915) 64-bit (glnxa64)
                                October 23, 2023


  To get started, type doc.
  For product information, visit www.mathworks.com.


  content =

      'filler'

Work dir:
  /scratch/allocationcode/kd/work/5a/9f78207bdf2365a3655f7ca04fcc91

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Is there a way to give matlab the path to the work/ directory?

Hello @Martin_Oberg. Welcome to the community forum :slight_smile:

Ideally, you’ll use a package manager (such as conda) or a container technology (such as docker) to handle task dependencies.

If you want to stick with a custom script, you should place matlabtest.m (with the appropriate shebang) within the bin folder in your project folder (read more about it here). Nextflow will automatically stage this file in the task compute environment and everything else should work just fine :wink:.

Thank you for the reply!

Unfortunately, I’ve been down the “execute a matlab script with a shebang” rabbit hole and moved on since I was getting some results with the above posted method.

But here are the results of your suggestion:

process shebangtest {
  input:
  path fileName
  output:
  path '*.txt'
  """
#!/usr/bin/env matlab
content=123
% saving ascii code numbers, okay for a test
save('testsave.txt', 'content', "-ascii")
quit(1)
  """
}

outputs:

[oberg@login02 kd]$ nextflow run processdb.nf -resume
Nextflow 24.10.2 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 24.10.1

Launching `processdb.nf` [kickass_cajal] DSL2 - revision: 2ec86ac572

[-        ] shebangtest -
[-        ] shebangtest | 0 of 1
executor >  local (1)
[6d/e75f13] she…ngtest (1) | 0 of 1
executor >  local (1)
[6d/e75f13] she…ngtest (1) | 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'shebangtest (1)'

Caused by:
  Missing output file(s) `*.txt` expected by process `shebangtest (1)`


Command executed:

  #!/usr/bin/env matlab
  content=123
  % saving ascii code numbers, okay for a test
  save('testsave.txt', 'content', "-ascii")
  quit(1)

Command exit status:
  0

Command output:
  MATLAB is selecting SOFTWARE OPENGL rendering.

                              < M A T L A B (R) >
                    Copyright 1984-2023 The MathWorks, Inc.
               R2023b Update 4 (23.2.0.2428915) 64-bit (glnxa64)
                                October 23, 2023


  To get started, type doc.
  For product information, visit www.mathworks.com.


Work dir:
  /scratch/st-akb2g-1/kd/work/6d/e75f139195df41caa8d5ff31e027a5

No file is written, and content=123 is not logged.

Apparently (in 2011), matlab didn’t have shebang support. See: Matlab Daemon - File Exchange - MATLAB Central. I haven’t tried this yet; it seemed outdated and lacking positive affirmations, which is why I’ve been running matlab from a shell command.

I haven’t used containers before, but I can learn if that will solve this problem. It’s bizarre that with all my searching online I haven’t seen a single example of someone using nextflow with matlab. Do they just not work together due to an internal of the matlab interpreter? If that above Daemon solution seems like a good workaround I can try that too.

@mribeirodantas So I came up with a workaround. This has bash call matlab and pass $(pwd) in the command so the script knows where to save. The absolute path to the script is necessary, although h…/…/…/bin/matlabtest.m works too.

process simpletest {
  input:
    path fileName
  output:
    path '*.txt'
"""
#!/usr/bin/bash
matlab -batch "folder='\$(pwd)'; content='$fileName'; run('/scratch/st-akb2g-1/kd/bin/matlabtest.m'); exit;"
"""
}

with matlabtest.m:

content
% saving ascii code numbers, okay for a test
save([folder '/testsave.txt'], 'content', "-ascii")
quit(0)

Things are working as desired now with testsave.txt being saved in the cached work/ directory.

Are there any unintended consequences lurking around with this approach? It feels weird having to use $(pwd) like this.

(edit: the matlabtest.m file has always been in the bin/ directory. I think it doesn’t get found because the Process script doesn’t use a matlab shebang.)
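A hedged sketch of how the shell side of this workaround can be assembled, using $PWD instead of the $(pwd) command substitution. The script path and input file name below are hypothetical placeholders. The block only builds and prints the command, so it runs without MATLAB installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders, not paths from the real project.
script_path="/path/to/project/bin/matlabtest.m"
input_file="subject01_epo.mat"

# $PWD is maintained by the shell and matches the output of $(pwd) here,
# so no command substitution is needed.
task_dir="$PWD"

# MATLAB statements that hand the task directory to the script.
# Caveat: a single quote inside any of these values would end the MATLAB
# string early, so this assumes quote-free paths.
matlab_cmd="folder='${task_dir}'; content='${input_file}'; run('${script_path}');"

# Print rather than execute, so the sketch is safe to run anywhere.
echo "matlab -batch \"${matlab_cmd}\""
```

Inside a Nextflow script block the shell variable still has to be escaped (\$PWD), just as \$(pwd) is escaped above, so that bash expands it rather than Nextflow. As far as I know, matlab -batch exits on its own once the statements finish, so a trailing exit; should be unnecessary with that flag.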

Are there any unintended consequences lurking around with this approach?

Yes. I think this prevents portability. The reason we should let Nextflow handle paths is that Nextflow knows when and how to change them depending on where the task is being run. HPC managed by SLURM? Torque? Cloud with AWS? Azure? Locally? And so on.

It’s not clear to me whether the way you’re doing it, passing the current working directory to MATLAB, will work across different platforms.

Thank you for your insights, @mribeirodantas.

For the time being, I’m going to stick with this method. We are a University research lab and portability is not the number one concern - just being able to replicate past studies. We will be running the jobs on our HPC with Slurm, and don’t have plans to move the analysis anywhere else. Nextflow will simplify keeping track of parameter inputs our analysis uses, and for that it will be a huge help.

I found this resource: patterns/docs/process-get-workdir.md at 8341e94a61feaa2263fdd83ddf6287b54aebde20 · nextflow-io/patterns · GitHub which uses the solution I came up with. So it’s nice to see it suggested elsewhere.
