Question

Limma - RNA-Seq DE genes - Quantile normalized log transformed RPKM data

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Dear Avinash, Briefly, you have used design <- model.matrix(~0+Group) when you needed design <- model.matrix(~Group) However, I do not recommend a limma analysis of RPKM values, because it does not respect the mean-variance relationship inherent in the count data. Nicole Cloonan's excellent Nature Methods paper was written four years ago, and our understanding of the statistical analysis RNA-Seq data has moved on considerably since then. Please see the last case study in the limma User's Guide, which analyses the Nigerian HapMap data, for how I recommend limma be used to analyse RNA-Seq data. Best wishes Gordon > Date: Fri, 2 Dec 2011 10:31:26 -0600 > From: Avinash S <avins.s at="" googlemail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] Limma - RNA-Seq DE genes - Quantile normalized log > transformed RPKM data > > Hello All, > > I'm trying to understand the method of differential expression analysis > described in : > * > * > *Stem cell transcriptome profiling via massive-scale mRNA sequencing* > *Nicole Cloonan et al* > *NATURE METhODS | VOL.5 NO.7 | JULY 2008 | 613* > http://www.nature.com/nmeth/journal/v5/n7/abs/nmeth.1223.html > *Section: Statistical Analysis * > To calculate differential expression of SQRL tag data we analyzed the > normalized gene signals (tags per Refseq transcript, length- normalized) for > each library using the Limma package in R. After Quantile normalization, we > used Limma to fit a linear model to the log2-transformed data using an > empirical Bayes method32 to moderate standard errors. Differentially > expressed genes were defined as those with a B statistic > zero. > > I have quantile normalized log2-transformed RPKM data and I wanted to find > DE genes based on B statistic and log2foldchange. > *Sample Rawdata:* > > GeneModel CON1 CON2 TR1 TR2 > 1s00200.1 2.723945276 3.775939811 3.623211245 3.717795434 > 1s00210.1 4.999354495 3.738129253 3.268468778 3.822220668 > 1s00220.1 1.450861588 1.265013193 0.942706046 0.744551693 > > I'm using the following R-script: > > library(limma) > raw.data <- > read.delim("INPUT-QuantileNormalizedLog2Transformed-RPKM-Data.txt") > attach(raw.data) > names(raw.data) > d <- raw.data[, 2:5] > rownames(d) <- raw.data[, 1] > Group <- factor(c("CON","CON","MYC","MYC"), levels=c("CON","MYC")) > design <- model.matrix(~0+Group) > colnames(design) <- c("CON","MYC") > fit <- lmFit(d, design) > fit <- eBayes(fit) > *Warning message:* > *Zero sample variances detected, have been offset* > topTable(fit,coef=2,number=1000,genelist=fit$genes) > - THIS LISTS THE GENES ALL POSITIVE LOGFC VALUES - NON -NEGATIVE. > > R version 2.14.0 (2011-10-31) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.10.0 > > > Can someone please let me know what I'm doing wrong here. Is it in the > array design or input data? > > Thank you, > Avinash ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

HapMap limma HapMap limma • 2.7k views

ADD COMMENT • link 12.4 years ago Gordon Smyth 50k