Failure to find a TSV file in a MultiQC plugin

I am trying to develop a multiqc plugin to process a TSV file containing data from a PCA analysis. I want to retrieve the sample and score data from the file to create a scatterplot.

However, when I configure custom_code.py to search for the *.tsv pattern, the plugin is recognized and custom_code.py runs, but the test.tsv file is not found by the search.

    log.info("Running PCAplots MultiQC Plugin v{}".format(config.PCAplots_version))

    search_patterns = {
        'PCAplots/tsv_file': { 'fn': '*.tsv' }
    }

    # Add to the search patterns used by modules
    for pattern_name, pattern in search_patterns.items():
        if pattern_name not in config.sp:
            config.update_dict( config.sp, { pattern_name: pattern } )
            log.debug("Added {} to the search patterns".format(pattern_name))
            log.info("Added {} to the search patterns".format(pattern_name))

The snippet above is part of the code I used in custom_code.py.

If you have any suggestions to solve this, I would appreciate it.

Many thanks.

Hi @FabianAndradeLozano,

Before we get into the weeds, is there a reason that you can’t use Custom Content? For a scatter plot like this, it’s usually enough to just rename the files to *_mqc.tsv
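
As a rough illustration of the Custom Content route, here is a minimal sketch that turns a PCA table into text suitable for a `*_mqc.tsv` file. It assumes Custom Content reads `#`-prefixed header lines as configuration and, for `plot_type: 'scatter'`, expects one row per sample with the sample name followed by an x and a y value. The function name `pca_to_mqc_scatter`, the `id` and `section_name` values, and the column choices are all hypothetical.

```python
import csv
import io


def pca_to_mqc_scatter(tsv_text, x_col="PC1", y_col="PC2", name_col="name"):
    """Convert PCA TSV text into Custom Content scatter text (hypothetical helper)."""
    # Header lines: assumed Custom Content configuration, one key per line.
    header = [
        "# id: 'pca_scatter'",
        "# section_name: 'PCA'",
        "# plot_type: 'scatter'",
    ]
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # One row per sample: name, x value, y value.
    body = ["{}\t{}\t{}".format(r[name_col], r[x_col], r[y_col]) for r in rows]
    return "\n".join(header + body) + "\n"


# Two shortened rows in the same column layout as the PCA table in this thread.
example = (
    "PC1\tPC2\tgroup\tcondition\tname\n"
    "-0.58\t0.32\tK\tK\tS1\n"
    "1.48\t-9.51\tW\tW\tS10\n"
)
mqc_text = pca_to_mqc_scatter(example)
```

Writing `mqc_text` to a file whose name ends in `_mqc.tsv` (for example a hypothetical `pca_scatter_mqc.tsv`) should be enough for MultiQC to pick it up without a plugin, if the assumptions above hold.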

If you need a script though, would it be possible for you to create a minimal example with some test files for us to try to replicate? It sounds like a simple enough setup, so it should be entirely possible.

Thanks,

Phil

Hi Phil,

Thank you for your recommendation and answer.
I decided to use a plugin because I want to create several tabs on the same scatter plot and could not do it with custom content.
I finally got the TSV file detected and processed by adding the following to the multiqc_config.yaml file:

sp:
  PCAplots/PCA_data:
  - fn: "*data.tsv"

However, I suspect I haven’t configured custom_code.py properly: the search patterns only work when this config file is present, and no files are processed if I omit it. Eventually, I want to add more search patterns to create a histogram of the PCA variance.

Here is the code for custom_code.py, where I define the search patterns and update the config.sp dictionary to include them:

def PCAplots_plugin_execution_start():
    """ Code to execute after the config files and
    command line flags have been parsed.
    This setuptools hook is the earliest that will be able
    to use custom command line flags.
    """

    log.info("Running PCAplots MultiQC Plugin v{}".format(config.PCAplots_version))

    # Add to the main MultiQC config object.
    # User config files have already been loaded at this point
    #   so we check whether the value is already set. This is to avoid
    #   clobbering values that have been customised by users.

    search_patterns = {
        'PCAplots/PCA_data': { 'fn': '*data.tsv' }
    }

    # Add to the search patterns used by modules
    for pattern_name, pattern in search_patterns.items():
        if pattern_name not in config.sp:
            config.update_dict( config.sp, { pattern_name: pattern } )
            log.debug("Added {} to the search patterns".format(pattern_name))
        else:
            log.debug("Not adding {} to the search patterns as it is already set".format(pattern_name))
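
The registration logic above can be exercised outside MultiQC to check two things: which keys the hook would actually add, and which file names a given `fn` glob matches. The sketch below uses a plain dict as a stand-in for `config.sp` and Python's `fnmatch` for the glob check. The helper names and the `PCAplots/PCA_variance` pattern with `*variance.tsv` are hypothetical, and the sketch does not reproduce how MultiQC itself searches for files.

```python
from fnmatch import fnmatch


def register_patterns(sp, new_patterns):
    """Add each pattern unless its key is already set; return the added keys."""
    added = []
    for name, pattern in new_patterns.items():
        if name not in sp:
            sp[name] = pattern
            added.append(name)
    return added


def matching_patterns(sp, filename):
    """Return the sorted pattern names whose 'fn' glob matches filename."""
    hits = []
    for name, pattern in sp.items():
        # A value may be one dict (as in the hook) or a list of dicts (as in the YAML).
        candidates = pattern if isinstance(pattern, list) else [pattern]
        if any("fn" in c and fnmatch(filename, c["fn"]) for c in candidates):
            hits.append(name)
    return sorted(hits)


# Plain dict standing in for config.sp, with one key already set by a user config.
sp = {"PCAplots/PCA_data": [{"fn": "*data.tsv"}]}
added = register_patterns(sp, {
    "PCAplots/PCA_data": {"fn": "*data.tsv"},
    "PCAplots/PCA_variance": {"fn": "*variance.tsv"},  # hypothetical second pattern
})
```

Here `added` contains only the variance key, because the data key was already present. A file named `test.tsv` matches neither glob, while `pca_data.tsv` matches the first, so it is worth checking test file names against the pattern actually registered.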

        

Then, once the TSV file is recognized, the PCAplots.py module collects the matching files and parses each table. The parsed data is plotted with a scatter plot function.

class PCAplots(BaseMultiqcModule):

    def __init__(self):

        # Initialise the parent object
        super(PCAplots, self).__init__(
            name='PCAplots',
            anchor='PCAplots',
            href="",
            info="is a plugin to plot several PCA data from a TSV file",
        )

        # Find and load any PCAplots reports
        #self.PCAplots_data = dict()
        self.list_files=dict()
        for f in self.find_log_files('PCAplots/PCA_data'):
            log.info(f"Found PCAplots file: {f['s_name']}")
            # self.list_files.append(f)
            parsed_data=self.parse_pca_file(f)
            self.list_files[f['s_name']] = parsed_data
            self.pca_scatter_plot(parsed_data,f)

          
        if len(self.list_files) == 0:
            raise ModuleNoSamplesFound
        log.info(f"Found {len(self.list_files)} reports")
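
The `parse_pca_file` and `pca_scatter_plot` methods are not shown in the thread, so the following is only a sketch of what the parsing step might look like for the TSV layout posted here. It works on the file contents as a string (the `f['f']` entry of the dict yielded by `find_log_files`), and the function names, the `name` column default, and the choice of axis pairs are assumptions.

```python
import csv
import io


def parse_pca_text(tsv_text, name_col="name"):
    """Parse PCA TSV text into {sample: {column: value}} (hypothetical helper)."""
    samples = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        s_name = row.pop(name_col)
        # Columns named PC* hold scores; everything else is kept as text.
        samples[s_name] = {
            key: (float(value) if key.startswith("PC") else value)
            for key, value in row.items()
        }
    return samples


def scatter_datasets(samples, axis_pairs=(("PC1", "PC2"), ("PC3", "PC4"))):
    """Build one {sample: {'x': ..., 'y': ...}} dict per pair of components."""
    return [
        {s_name: {"x": vals[x], "y": vals[y]} for s_name, vals in samples.items()}
        for x, y in axis_pairs
    ]


# Two shortened rows in the same column layout as the example TSV.
example = (
    "PC1\tPC2\tPC3\tPC4\tgroup\tcondition\tname\n"
    "-0.58\t0.32\t8.25\t0.93\tK\tK\tS1\n"
    "1.48\t-9.51\t-0.71\t1.67\tW\tW\tS10\n"
)
samples = parse_pca_text(example)
datasets = scatter_datasets(samples)
```

Each element of `datasets` is one view of the same samples. As far as I know, a list like this can be passed to `multiqc.plots.scatter.plot` together with a `data_labels` entry in the plot config to get switchable tabs on one plot, but treat that as something to verify against the MultiQC version in use.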

Here is the code for the setup.py:

setup(
    name='PCAplots',
    version='0.1.0',
    packages=find_packages(),
    include_package_data=True,
    install_requires=requirements,
    keywords="multiqc PCA plots plugin",
    url="",
    license="",
    entry_points={
        "multiqc.modules.v1": [
            # Register the module so that MultiQC can discover and load PCAplots.py
            "PCAplots = PCAplots.modules.PCAplots:PCAplots"
        ],
        # Register the hook, which runs the code in custom_code.py
        'multiqc.hooks.v1': [
            'execution_start = PCAplots.custom_code:PCAplots_plugin_execution_start'
        ]
    },
    classifiers=[
        "Programming Language :: Python :: 3.6",
        "Topic :: Scientific/Engineering :: Bio-Informatics",
    ]
)

Here is an example of the TSV data:

PC1	PC2	PC3	PC4	group	condition	name
-0.579834073778605	0.32307138866649	8.2511740847973	0.928007574791617	K	K	S1
-30.4125668808053	1.8180569889519	-3.46115245049907	0.493115736971639	K	K	S2
3.59760666014257	3.87347391126008	-4.13400633594466	-7.66434347450737	K	K	S3
1.60397260920033	3.61563079268798	7.08380009771478	-1.99534270750999	K	K	S4
-0.424863477592924	1.10360944842521	7.62007227015581	0.433476725890417	K	K	S5
3.11592055206072	-3.91575267212451	-0.942363201098642	-3.0669774205849	K	K	S6
7.23834382284686	0.0831357557011024	-5.81240468510949	3.07692266059875	K	K	S7
6.15571111277734	13.2951302333	-1.91565311382379	4.2022364572263	K	K	S8
6.20814800688657	0.381373964657366	-5.68221472941897	0.371812903802094	K	K	S9
1.48462582230259	-9.51304056965226	-0.707657319115273	1.67003765319915	W	W	S10
2.01293584595985	-11.0646892418734	-0.299594617657986	1.5510538901223	W	W	S11

Thank you very much for your time and help!

Kind regards,

Hi @FabianAndradeLozano,

Thanks for the detailed response. I’ll add it to our board to take a look at this when we can. In the meantime, you might find the example plugin repo useful if you haven’t already seen it:

I’ve just merged a couple of PRs to update it, as it was quite old. It might still be missing one or two bits but it should give a rough template at least.

Phil