Limma - RNA-Seq DE genes - Quantile normalized log transformed RPKM data
1
0
Entering edit mode
Avinash S ▴ 40
@avinash-s-4979
Last seen 9.6 years ago
Hello All, I'm trying to understand the method of differential expression analysis described in : * * *Stem cell transcriptome profiling via massive-scale mRNA sequencing* *Nicole Cloonan et al* *NATURE METhODS | VOL.5 NO.7 | JULY 2008 | 613* http://www.nature.com/nmeth/journal/v5/n7/abs/nmeth.1223.html *Section: Statistical Analysis * To calculate differential expression of SQRL tag data we analyzed the normalized gene signals (tags per Refseq transcript, length- normalized) for each library using the Limma package in R. After Quantile normalization, we used Limma to fit a linear model to the log2-transformed data using an empirical Bayes method32 to moderate standard errors. Differentially expressed genes were defined as those with a B statistic > zero. I have quantile normalized log2-transformed RPKM data and I wanted to find DE genes based on B statistic and log2foldchange. *Sample Rawdata:* GeneModel CON1 CON2 TR1 TR2 1s00200.1 2.723945276 3.775939811 3.623211245 3.717795434 1s00210.1 4.999354495 3.738129253 3.268468778 3.822220668 1s00220.1 1.450861588 1.265013193 0.942706046 0.744551693 I'm using the following R-script: library(limma) raw.data <- read.delim("INPUT-QuantileNormalizedLog2Transformed-RPKM-Data.txt") attach(raw.data) names(raw.data) d <- raw.data[, 2:5] rownames(d) <- raw.data[, 1] Group <- factor(c("CON","CON","MYC","MYC"), levels=c("CON","MYC")) design <- model.matrix(~0+Group) colnames(design) <- c("CON","MYC") fit <- lmFit(d, design) fit <- eBayes(fit) *Warning message:* *Zero sample variances detected, have been offset* topTable(fit,coef=2,number=1000,genelist=fit$genes) - THIS LISTS THE GENES ALL POSITIVE LOGFC VALUES - NON -NEGATIVE. R version 2.14.0 (2011-10-31) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.10.0 Can someone please let me know what I'm doing wrong here. Is it in the array design or input data? Thank you, Avinash [[alternative HTML version deleted]]
limma limma • 2.3k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 16 minutes ago
WEHI, Melbourne, Australia

Dear Avinash,

Briefly, you have used

design <- model.matrix(~0+Group)

when you needed

design <- model.matrix(~Group)

However, I do not recommend a limma analysis of RPKM values, because it does not respect the mean-variance relationship inherent in the count data. Nicole Cloonan's excellent Nature Methods paper was written four years ago, and our understanding of the statistical analysis RNA-Seq data has moved on considerably since then. Please see the last case study in the limma User's Guide, which analyses the Nigerian HapMap data, for how I recommend limma be used to analyse RNA-Seq data.

Best wishes
Gordon

PS. This answer was originally posted 4 December 2011: Limma - RNA-Seq DE genes - Quantile normalized log transformed RPKM data

ADD COMMENT

Login before adding your answer.

Traffic: 555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6