Dear Community,
as part of a validation project, I have downloaded processed RNA-Seq data from GEO. I would like to quickly test whether the genes of a specific signature are:
1) Expressed above a minimal threshold
2) Differentially expressed between cancer and normal samples
The link to the processed dataset is:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60052
A quick import into R:
dataset <- read.csv("GSE60052_79tumor.7normal.normalized.log2.data.Rda.tsv", sep = "\t", header = TRUE, row.names = 1, check.names = FALSE)
head(dataset)
11A 12A 13A 14A 15A 16A 17A 18A
5S_rRNA 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.402481
7SK 10.828115 11.803973 9.608837 10.419233 9.859020 11.048513 10.533401 10.255479
A1BG 5.615586 7.337469 3.867909 1.772974 5.634852 5.493924 5.964345 0.000000
A1CF 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 3.080553
A2LD1 1.952621 0.000000 3.924493 0.000000 0.000000 0.000000 4.716418 5.080553
A2M 6.827090 5.587447 7.393978 5.473414 6.300433 5.567925 5.964345 9.124947
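To make the first goal concrete, here is a minimal sketch of the kind of expression filter I have in mind. It rebuilds the six rows shown above as a toy matrix so that it runs on its own. The signature genes, the log2 cutoff of 1 and the 75% sample fraction are placeholders I picked for illustration, not values taken from the paper.

```r
# Toy matrix copied from the head() output above (6 genes x 8 samples, log2 scale).
toy <- matrix(
  c( 0.000000,  0.000000, 0.000000,  0.000000, 0.000000,  0.000000,  0.000000,  5.402481,
    10.828115, 11.803973, 9.608837, 10.419233, 9.859020, 11.048513, 10.533401, 10.255479,
     5.615586,  7.337469, 3.867909,  1.772974, 5.634852,  5.493924,  5.964345,  0.000000,
     0.000000,  0.000000, 0.000000,  0.000000, 0.000000,  0.000000,  0.000000,  3.080553,
     1.952621,  0.000000, 3.924493,  0.000000, 0.000000,  0.000000,  4.716418,  5.080553,
     6.827090,  5.587447, 7.393978,  5.473414, 6.300433,  5.567925,  5.964345,  9.124947),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("5S_rRNA", "7SK", "A1BG", "A1CF", "A2LD1", "A2M"),
    c("11A", "12A", "13A", "14A", "15A", "16A", "17A", "18A")
  )
)

# Hypothetical signature and thresholds (assumptions, for illustration only).
signature    <- c("A1BG", "A2LD1", "A2M")
log2_cutoff  <- 1     # arbitrary log2 expression cutoff
min_fraction <- 0.75  # gene must exceed the cutoff in at least 75% of samples

# Keep only signature genes that are present in the matrix.
sig <- toy[intersect(signature, rownames(toy)), , drop = FALSE]

# Fraction of samples in which each gene exceeds the cutoff.
frac_expressed  <- rowMeans(sig > log2_cutoff)
expressed_genes <- names(frac_expressed)[frac_expressed >= min_fraction]
```

On the full dataset the same lines would be applied to `dataset` instead of `toy`.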
The only description of the RNA-Seq analysis that I could find in the associated paper is the following:
"For RNASeq data, the average read count per mate was 50 million. RNA reads were mapped to the human genome (UCSC hg19; Feb 2009 release; Genome Reference Consortium GRCh37) using TopHat2 (v2.0.9) and the human reference gtf annotation file (GRCh37.68). Transcript counts were calculated and normalized using htseq-count and DESeq (v1.12.1). The DESeq negative binomial distribution was used to calculate the p-value and fold changes between 48 lung tumor and 6 normal adjacent lung samples using adjusted p<0.05 and |fold change|>2 as a threshold"
My questions are the following:
1) Given that the data above were processed with DESeq, can I start with a simple expression filter based on a log2 expression cutoff, as is commonly done for microarrays? Would this be any different for data processed with the newer DESeq2?
2) Can I use the processed data directly for DE analysis, or would it be more appropriate to start from the fastq files?
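To clarify what I mean by using the processed data directly in question 2, a rough sketch is below. It runs on simulated log2 values, not the real dataset, and uses a per-gene Welch t-test with BH correction as a simple stand-in. This is not the DESeq negative binomial test described in the paper. The group sizes (79 tumor, 7 normal) come from the file name; the gene names, sample names and the shift added to gene1 are made up for illustration.

```r
set.seed(1)

# Group sizes taken from the file name (79 tumor, 7 normal); sample order is assumed.
n_tumor  <- 79
n_normal <- 7
group <- factor(c(rep("tumor", n_tumor), rep("normal", n_normal)),
                levels = c("normal", "tumor"))

# Simulated log2 expression matrix (hypothetical genes and samples).
genes <- paste0("gene", 1:5)
expr <- matrix(rnorm(length(genes) * length(group), mean = 6, sd = 1),
               nrow = length(genes),
               dimnames = list(genes, paste0("S", seq_along(group))))

# Shift gene1 upward in tumors so there is one truly different gene.
expr["gene1", group == "tumor"] <- expr["gene1", group == "tumor"] + 3

# Per-gene Welch t-test on the log2 values: tumor vs normal.
res <- t(apply(expr, 1, function(x) {
  tt <- t.test(x[group == "tumor"], x[group == "normal"])
  c(log2FC = unname(tt$estimate[1] - tt$estimate[2]), p.value = tt$p.value)
}))
res <- as.data.frame(res)

# Benjamini-Hochberg adjustment across genes.
res$padj <- p.adjust(res$p.value, method = "BH")
```

With the real data, `expr` would be replaced by `dataset` and `group` by the true tumor/normal labels from the GEO sample annotation.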
Best,
Efstathios
Dear ATpoint, thanks for your suggestion, I will check fgsea. My initial goal, however, is to narrow down this gene signature by checking which genes are expressed and/or DE in this dataset, and then apply functional analysis.