Question

How to analyze RNA-Seq data from GEO that were pre-processed with the DESeq R package, for filtering and DE analysis

svlachavas ▴ 840

@svlachavas-7225

Last seen 14 months ago

Germany/Heidelberg/German Cancer Resear…

Dear Community,

As part of a validation project, I have downloaded some processed RNA-Seq data from GEO, because I would like to test quickly whether the genes of a specific gene signature are:

1) Expressed above a minimal threshold

2) Differentially expressed between cancer and normal samples

The relevant link for the processed dataset is the following:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60052

A minimal import into R:

dataset <- read.csv("GSE60052_79tumor.7normal.normalized.log2.data.Rda.tsv", sep = "\t", header = TRUE, row.names = 1, check.names = FALSE)

 head(dataset)
                  11A       12A      13A       14A      15A       16A       17A       18A
    5S_rRNA  0.000000  0.000000 0.000000  0.000000 0.000000  0.000000  0.000000  5.402481
    7SK     10.828115 11.803973 9.608837 10.419233 9.859020 11.048513 10.533401 10.255479
    A1BG     5.615586  7.337469 3.867909  1.772974 5.634852  5.493924  5.964345  0.000000
    A1CF     0.000000  0.000000 0.000000  0.000000 0.000000  0.000000  0.000000  3.080553
    A2LD1    1.952621  0.000000 3.924493  0.000000 0.000000  0.000000  4.716418  5.080553
    A2M      6.827090  5.587447 7.393978  5.473414 6.300433  5.567925  5.964345  9.124947

The only description of the RNA-Seq analysis that I found in the related paper was the following:

"For RNASeq data, the average read count per mate was 50 million. RNA reads were mapped to the human genome (UCSC hg19; Feb 2009 release; Genome Reference Consortium GRCh37) using TopHat2 (v2.0.9) and the human reference gtf annotation file (GRCh37.68). Transcript counts were calculated and normalized using htseq-count and DESeq (v1.12.1). The DESeq negative binomial distribution was used to calculate the p-value and fold changes between 48 lung tumor and 6 normal adjacent lung samples using adjusted p<0.05 and |fold change|>2 as a threshold"

My questions are the following:

1) Given that the data above were processed with DESeq, can I first perform a simple expression filtering based on a log2 expression cutoff, similar to what is done for microarrays? Is this different with the newer DESeq2 versions?

2) Can I directly use the processed data for DE analysis? Or would it be more appropriate to start the analysis from the fastq files?

Best,

Efstathios

deseq rna-seq filtering DE deseq2 • 2.5k views

ADD COMMENT • link updated 4.7 years ago by Michael Love 43k • written 4.7 years ago by svlachavas ▴ 840

score 1 · Answer 1 · 2020-04-01

ATpoint ★ 4.6k

@atpoint-13662

Last seen 22 hours ago

Germany

You could use the signed -log10(nominal p-values) as the ranking metric and perform GSEA with your gene signature as the gene set. From what I understand, you have the statistics from the original DESeq output? On Bioconductor, the fgsea package is helpful for this (among many others). This does not require any custom filtering, as GSEA takes the full expression profile as input (represented by the ranking metric).
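A minimal sketch of this ranking approach, assuming a per-gene table with log2 fold changes and nominal p-values is available. The data below are simulated, and the names `res`, `log2FoldChange`, `pval` and `signature` are hypothetical placeholders rather than anything from GSE60052. The fgsea call only runs if the package is installed.

```r
set.seed(1)

# Simulated stand-in for a per-gene DE result table (hypothetical values)
n_genes <- 500
res <- data.frame(
  gene = paste0("gene", seq_len(n_genes)),
  log2FoldChange = rnorm(n_genes),
  pval = runif(n_genes)
)

# Ranking metric: -log10(p) carrying the sign of the fold change.
# Real results with p-values of exactly 0 would give infinite values
# and would need handling before ranking.
rank_metric <- sign(res$log2FoldChange) * -log10(res$pval)
names(rank_metric) <- res$gene
rank_metric <- sort(rank_metric, decreasing = TRUE)

# Hypothetical gene signature used as the single gene set
signature <- paste0("gene", 1:25)

# Run GSEA only when fgsea is available
if (requireNamespace("fgsea", quietly = TRUE)) {
  gsea_res <- fgsea::fgsea(pathways = list(signature = signature),
                           stats = rank_metric)
  print(gsea_res)
}
```

With real results, `res` would be replaced by the actual DE output, and `signature` by the gene symbols of interest, using the same identifiers as the names of `rank_metric`.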

ADD COMMENT • link 4.7 years ago ATpoint ★ 4.6k

Dear ATpoint, thanks for your suggestion. I will check fgsea, but my initial aim is to narrow down this gene signature by checking which of its genes are expressed and/or differentially expressed in this dataset, and then to apply functional analysis.
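The "which genes are expressed" check could be sketched roughly as follows. The matrix holds the first four samples from the `head(dataset)` output in the question; the signature, the cutoff and the minimum number of samples are hypothetical placeholders, and whether a log2 cutoff is appropriate for DESeq-normalized values is exactly the open question of this thread.

```r
# Toy log2 matrix copied from the head(dataset) output (first 4 samples)
expr <- matrix(
  c( 0.000000,  0.000000, 0.000000,  0.000000,
    10.828115, 11.803973, 9.608837, 10.419233,
     5.615586,  7.337469, 3.867909,  1.772974,
     0.000000,  0.000000, 0.000000,  0.000000,
     1.952621,  0.000000, 3.924493,  0.000000,
     6.827090,  5.587447, 7.393978,  5.473414),
  nrow = 6, byrow = TRUE,
  dimnames = list(c("5S_rRNA", "7SK", "A1BG", "A1CF", "A2LD1", "A2M"),
                  c("11A", "12A", "13A", "14A"))
)

signature   <- c("A1BG", "A1CF", "A2M")  # hypothetical gene signature
cutoff      <- 1                         # hypothetical log2 threshold
min_samples <- 2                         # hypothetical minimum number of samples

# Keep only signature genes that are present in the matrix
present <- intersect(signature, rownames(expr))

# Count, per gene, the samples whose value exceeds the cutoff
n_above <- rowSums(expr[present, , drop = FALSE] > cutoff)

# Signature genes passing the filter
expressed <- names(n_above)[n_above >= min_samples]
expressed
```

On the full dataset, `expr` would be `as.matrix(dataset)`, and the thresholds would need to be chosen from the distribution of the values.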

ADD REPLY • link 4.7 years ago svlachavas ▴ 840

score 1 · Answer 2 · 2020-04-01

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

Use of DESeq2 would be on original counts, where the column sum equals the number of fragments aligned to the genes.
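A quick sanity check along these lines, as a sketch: the helper `looks_like_raw_counts` is a hypothetical name, the log2 values are taken from the `head(dataset)` output in the question, and the integer counts are made up for illustration.

```r
# TRUE only if every value is a finite, non-negative whole number
looks_like_raw_counts <- function(m) {
  m <- as.matrix(m)
  all(is.finite(m)) && all(m >= 0) && all(m == round(m))
}

# Normalized log2 values from the question (genes 7SK and A1BG, samples 11A and 12A)
log2_values <- matrix(
  c(10.828115, 11.803973,
     5.615586,  7.337469),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("7SK", "A1BG"), c("11A", "12A"))
)

# Made-up integer counts, for contrast only
toy_counts <- matrix(
  c(1820L, 3575L,
      49L,  162L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("7SK", "A1BG"), c("11A", "12A"))
)

looks_like_raw_counts(log2_values)  # FALSE
looks_like_raw_counts(toy_counts)   # TRUE

# For raw counts, column sums are the per-sample totals of fragments assigned to genes
colSums(toy_counts)
```

A matrix that passes this check is the kind of input that `DESeqDataSetFromMatrix()` in DESeq2 expects; the normalized log2 table from GEO does not pass it.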

ADD COMMENT • link 4.7 years ago Michael Love 43k

Dear Michael,

Thanks for pointing this out! Regarding the transformation mentioned above, that is, log2 transformation and normalization by an older version of the original DESeq algorithm: do you think that any filtering or DE analysis could still be applied? For example, even some z-scores?
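For reference, the z-scores asked about here would be per-gene (row-wise) standardized values. A base R sketch on a few rows from the `head(dataset)` output (first four samples only), with no claim that this is a suitable basis for DE testing:

```r
# Toy log2 matrix copied from the head(dataset) output (first 4 samples)
expr <- matrix(
  c(10.828115, 11.803973, 9.608837, 10.419233,
     5.615586,  7.337469, 3.867909,  1.772974,
     0.000000,  0.000000, 0.000000,  0.000000,
     6.827090,  5.587447, 7.393978,  5.473414),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("7SK", "A1BG", "A1CF", "A2M"),
                  c("11A", "12A", "13A", "14A"))
)

# Drop genes with zero variance, which would otherwise give NaN z-scores
keep <- apply(expr, 1, sd) > 0
expr_var <- expr[keep, , drop = FALSE]

# scale() standardizes columns, so transpose to standardize each gene
z <- t(scale(t(expr_var)))

round(rowMeans(z), 10)  # each gene is centered at 0
apply(z, 1, sd)         # each gene has unit standard deviation
```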

ADD REPLY • link 4.7 years ago svlachavas ▴ 840

This input data isn’t appropriate for DESeq2, which is what I provide user support for.

ADD REPLY • link 4.7 years ago Michael Love 43k