Limma - RNA-Seq DE genes - Quantile normalized log transformed RPKM data
0
0
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia
Dear Avinash, Briefly, you have used design <- model.matrix(~0+Group) when you needed design <- model.matrix(~Group) However, I do not recommend a limma analysis of RPKM values, because it does not respect the mean-variance relationship inherent in the count data. Nicole Cloonan's excellent Nature Methods paper was written four years ago, and our understanding of the statistical analysis RNA-Seq data has moved on considerably since then. Please see the last case study in the limma User's Guide, which analyses the Nigerian HapMap data, for how I recommend limma be used to analyse RNA-Seq data. Best wishes Gordon > Date: Fri, 2 Dec 2011 10:31:26 -0600 > From: Avinash S <avins.s at="" googlemail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] Limma - RNA-Seq DE genes - Quantile normalized log > transformed RPKM data > > Hello All, > > I'm trying to understand the method of differential expression analysis > described in : > * > * > *Stem cell transcriptome profiling via massive-scale mRNA sequencing* > *Nicole Cloonan et al* > *NATURE METhODS | VOL.5 NO.7 | JULY 2008 | 613* > http://www.nature.com/nmeth/journal/v5/n7/abs/nmeth.1223.html > *Section: Statistical Analysis * > To calculate differential expression of SQRL tag data we analyzed the > normalized gene signals (tags per Refseq transcript, length- normalized) for > each library using the Limma package in R. After Quantile normalization, we > used Limma to fit a linear model to the log2-transformed data using an > empirical Bayes method32 to moderate standard errors. Differentially > expressed genes were defined as those with a B statistic > zero. > > I have quantile normalized log2-transformed RPKM data and I wanted to find > DE genes based on B statistic and log2foldchange. > *Sample Rawdata:* > > GeneModel CON1 CON2 TR1 TR2 > 1s00200.1 2.723945276 3.775939811 3.623211245 3.717795434 > 1s00210.1 4.999354495 3.738129253 3.268468778 3.822220668 > 1s00220.1 1.450861588 1.265013193 0.942706046 0.744551693 > > I'm using the following R-script: > > library(limma) > raw.data <- > read.delim("INPUT-QuantileNormalizedLog2Transformed-RPKM-Data.txt") > attach(raw.data) > names(raw.data) > d <- raw.data[, 2:5] > rownames(d) <- raw.data[, 1] > Group <- factor(c("CON","CON","MYC","MYC"), levels=c("CON","MYC")) > design <- model.matrix(~0+Group) > colnames(design) <- c("CON","MYC") > fit <- lmFit(d, design) > fit <- eBayes(fit) > *Warning message:* > *Zero sample variances detected, have been offset* > topTable(fit,coef=2,number=1000,genelist=fit$genes) > - THIS LISTS THE GENES ALL POSITIVE LOGFC VALUES - NON -NEGATIVE. > > R version 2.14.0 (2011-10-31) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.10.0 > > > Can someone please let me know what I'm doing wrong here. Is it in the > array design or input data? > > Thank you, > Avinash ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
HapMap limma HapMap limma • 3.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 941 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6