Need help finding and downloading a suitable dataset

Hello everyone, I went through the foundational training and want to build a project for practice. I am kind of lost on how to download a dataset that is suitable for this. Ideally, I would like to build a pipeline that processes cancer data. I would also love to get a truth dataset so I can check whether my pipeline works correctly.

Hi @Obscure_byteX ! Welcome to the Seqera Community Forum :slight_smile:

The nf-core project provides plenty of test data for making sure your pipeline works before you move on to a real, full-size dataset. You can find instructions here. If you want real, complete, public data, you will need to be more specific so that I can try to point you in the right direction.

What type of data are we talking about? Whole genome sequencing (WGS)? RNA-seq? ATAC-seq? Other?

Hi! Thanks for the welcome :slightly_smiling_face:

I’m specifically looking for cell-free DNA (cfDNA) sequencing data, ideally from whole genome sequencing (WGS). My goal is to get one cfDNA dataset along with its corresponding truth set (VCF + BED of confident regions) if possible, so I can test variant calling pipelines in a controlled way.

Do you know of any public datasets that fit this description?