How to Analyse Datasets from GEO
@andrejstollde-23490

Hello,

I am not a bioinformatician, but I have done a lot of reading and experimenting with RNA-seq over the last 2 years, and I think I have developed a fair understanding of the necessary steps and possible pitfalls. Recently, I have several times found interesting datasets on the Gene Expression Omnibus (GEO), only to realize after downloading that the supplied expression metric was TPM. This seems to be the case for a lot of datasets on GEO. As far as I understand, TPM is not a good metric for differential expression analysis. DESeq2 also won't accept TPM as input, because the values are not integers. The only truly clean way I can think of to perform the analysis would be to download the raw files from SRA and do the whole QC, alignment and counting from scratch.

So my question is: what would be an elegant, simple and clean way of analysing such GEO datasets?

geo rnaseq deseq2 tpm • 1.5k views
Kevin Blighe ★ 3.9k
@kevin
Republic of Ireland

The tutorial by my Biostars colleague, ATpoint, is quite useful for downloading FASTQ data from ENA. I noticed recently, however, that even SRA is now hosting FASTQ files, although they can be difficult to obtain in an automated fashion. Note that a lot of studies have a record on each of ENA, GEO, and SRA. To find the ENA record, search by the 'BioProject ID'.
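
As a rough sketch of what an automated ENA lookup by BioProject ID could look like, the Python below builds a query URL for ENA's file report service and parses the returned table into FASTQ links per run. This is not taken from the tutorial mentioned above. The endpoint and field names (`read_run`, `run_accession`, `fastq_ftp`) are written from memory of the ENA Portal API and should be checked against the current ENA documentation. The accession `PRJNA000000` and the function names are placeholders made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report service.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a file report query URL for a BioProject/study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def fastq_links(tsv_text):
    """Map each run accession to its FASTQ URLs.

    Paired-end runs are assumed to list their files separated by ';',
    and the paths are assumed to come without a scheme prefix.
    """
    links = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        files = [f for f in row["fastq_ftp"].split(";") if f]
        links[row["run_accession"]] = ["ftp://" + f for f in files]
    return links


# Hypothetical accession; replace with the BioProject ID of the study.
print(filereport_url("PRJNA000000"))
```

The text returned from that URL (fetched with a browser, `curl`, or `urllib.request`) can be passed to `fastq_links` to get one list of download links per run.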

I have come across others who are content to work with TPM after transforming them via log10(TPM + 1), i.e., adding a pseudo-count of 1 before taking the log; however, as you imply, obtaining the FASTQs will provide the ultimate flexibility in your analysis.
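
For illustration, the log10(TPM + 1) transformation is a one-liner in most languages. The minimal Python sketch below applies it to a small made-up table. The gene names, TPM values and the function name `log10_tpm` are invented for the example and do not come from any real dataset.

```python
import math


def log10_tpm(tpm_by_gene):
    """Apply log10(TPM + 1) to every value in a {gene: [TPM per sample]} table."""
    return {
        gene: [math.log10(value + 1.0) for value in values]
        for gene, values in tpm_by_gene.items()
    }


# Made-up TPM values for two genes across three samples.
tpm = {
    "GENE_A": [0.0, 9.0, 99.0],
    "GENE_B": [999.0, 0.0, 9.0],
}

logged = log10_tpm(tpm)
for gene, values in logged.items():
    print(gene, [round(v, 3) for v in values])
```

The pseudo-count of 1 keeps a TPM of zero defined, since it maps to log10(1) = 0. The transformed values are still not integer counts, so they are not suitable as input for DESeq2.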

Kevin


During my search I also came across the suggestion of using log10(TPM + 1). Maybe one could use this approach to get a first glimpse of the data and, depending on that, decide whether it's worthwhile doing the analysis from scratch.

As I'm not a bioinformatician, I don't have access to much computing power, mainly simple office hardware. This is why I am trying to avoid doing the whole alignment, as it takes me about 1-2 h per 10 million reads.

Thanks for your answer!
