Question: Differential expression of RNA-seq data using limma and voom()
Jon Bråte (Norway) wrote, 5.4 years ago:

Hi everyone,

I have a count matrix of FPKM values and I want to estimate differentially expressed genes between two conditions. First I used DESeq2, but I realized that this is not good for FPKM values. I then transformed the counts using voom() in the limma package and then used:

fit <- lmFit(myVoomData,design)
fit <- eBayes(fit)
options(digits=3)
writefile = topTable(fit,n=Inf,sort="none", p.value=0.01)
write.csv(writefile, file="file.csv")

My problem is that all of the 6156 genes are differentially expressed (p-value < 0.01). Only a few hundred were differentially expressed using DESeq2, but I guess that can't be trusted.

I am new to this type of analysis, and to R, but is it OK to simply transform the data with voom()? Can I use the transformed data in DESeq2? Are there any other ways I can use FPKM counts to estimate differentially expressed genes?

Thank you,

Jon Bråte

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils  datasets  methods   base

other attached packages:
[1] limma_3.18.3         cummeRbund_2.4.0     Gviz_1.6.0 rtracklayer_1.22.0   GenomicRanges_1.14.3 XVector_0.2.0
[7] IRanges_1.20.6       fastcluster_1.1.11   reshape2_1.2.2 ggplot2_0.9.3.1      RSQLite_0.11.4       DBI_0.2-7
[13] BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0   BSgenome_1.30.0        Biobase_2.22.0 Biostrings_2.30.1      Formula_1.1-1
[6] GenomicFeatures_1.14.2 Hmisc_3.13-0           MASS_7.3-29 RColorBrewer_1.0-5        RCurl_1.95-4.1
[11] Rsamtools_1.14.2       XML_3.95-0.2           biomaRt_2.18.0 biovizBase_1.10.4      bitops_1.0-6
[16] cluster_1.14.4         colorspace_1.2-4       dichromat_2.0-0 digest_0.6.3          gtable_0.1.2
[21] labeling_0.2           lattice_0.20-24        latticeExtra_0.6-26 munsell_0.4.2     plyr_1.8
[26] proto_0.3-10           scales_0.2.3           splines_3.0.2 stats4_3.0.2            stringr_0.6.2
[31] survival_2.37-4        tools_3.0.2            zlibbioc_1.8.0 

----------------------------------------------------------------
Jon Bråte

Microbial Evolution Research Group (MERG)
Department of Biosciences
University of Oslo
P.B. 1066 Blindern
N-0316, Norway
Email: jon.brate@ibv.uio.no
Phone: 922 44 582
Web: mn.uio.no/ibv/english/people/aca/jonbra/index.html

Modified 2.7 years ago by Gordon Smyth; written 5.4 years ago by Jon Bråte.
Answer: Differential expression of RNA-seq data using limma and voom()
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 5.4 years ago:

Dear Jon,

No, it is absolutely not ok to input FPKM values to voom. The results will be nonsense. Same goes for edgeR and DESeq, for the reasons explained in the documentation for those packages.
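For contrast, here is a minimal sketch of the kind of input voom is meant to receive: a matrix of raw integer read counts. The simulated counts, the two-group design and all object names are assumptions made for illustration. With real data, the counts would come from a read-counting step.

```r
library(limma)

set.seed(42)
# Hypothetical raw integer counts: 500 genes x 6 samples (simulated, not real data)
counts <- matrix(rnbinom(500 * 6, mu = 100, size = 10), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("sample", 1:6)))
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

# voom converts the counts to log2-CPM and estimates a precision weight
# for every observation from the mean-variance relationship of the counts
v <- voom(counts, design)

fit <- lmFit(v, design)
fit <- eBayes(fit)
# Coefficient 2 is the treated-vs-control log2 fold change
res <- topTable(fit, coef = 2, number = Inf, sort.by = "none")
```

Here voom() is given a plain count matrix and, by default, uses the column sums as library sizes. In practice one would usually also filter out lowly expressed genes first, which this sketch leaves out.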

Note that a matrix of FPKM is not a matrix of counts.
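The following small worked example shows why FPKM values are not counts. It uses the usual definition FPKM = count / (gene length in kb) / (library size in millions). The three genes, their lengths and the library sizes are made-up numbers.

```r
# Hypothetical raw integer counts: 3 genes x 2 samples
counts <- matrix(c(10L, 250L, 0L,
                   20L, 300L, 5L), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_length_bp <- c(geneA = 1500, geneB = 3000, geneC = 800)  # assumed gene lengths
lib_size <- c(s1 = 2e6, s2 = 4e6)                              # assumed mapped fragments

# FPKM = count / (gene length in kb) / (library size in millions)
# Dividing by the length vector recycles it down each column (one length per gene)
fpkm <- sweep(counts / (gene_length_bp / 1e3), 2, lib_size / 1e6, "/")
fpkm
```

In this example, geneA has 10 reads in s1 and 20 reads in s2, yet it gets the same FPKM (about 3.33) in both samples. The number of reads actually observed, which count-based methods such as voom, edgeR and DESeq2 rely on, is no longer visible in the FPKM matrix.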

It seems to me that you have three options in decreasing order of desirability:

1. Get the actual integer counts from which the FPKM were computed and do a proper analysis, for example using voom or edgeR.
2. Get the gene lengths and library sizes used to compute the FPKM and convert the FPKM back to counts.
3. If FPKM is really all you have, then convert the values to a log2 scale and do an ordinary limma analysis as you would for microarray data, using eBayes() with trend=TRUE. Do not use voom, do not use edgeR, do not use DESeq. (Do not pass go and do not collect $200.) This isn't 100% ideal, but is probably the best analysis available.
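Here is a minimal sketch of option 2. It assumes you have recovered the exact gene lengths (in bp) and library sizes (total mapped fragments) that were used to compute the FPKM. The function name fpkm_to_counts, the example numbers and the 0.01 tolerance are hypothetical choices for this illustration.

```r
# Hypothetical FPKM values as they might be stored in a file (rounded),
# together with the gene lengths and library sizes assumed to have produced them
fpkm <- matrix(c(3.33333, 41.66667, 0,
                 3.33333, 25, 1.5625), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_length_bp <- c(geneA = 1500, geneB = 3000, geneC = 800)
lib_size <- c(s1 = 2e6, s2 = 4e6)

# Invert FPKM = count / (length in kb) / (library size in millions)
fpkm_to_counts <- function(fpkm, gene_length_bp, lib_size, tol = 0.01) {
  stopifnot(nrow(fpkm) == length(gene_length_bp), ncol(fpkm) == length(lib_size))
  # Multiply each row by its gene length in kb
  x <- fpkm * (gene_length_bp / 1e3)
  # Multiply each column by its library size in millions
  x <- sweep(x, 2, lib_size / 1e6, "*")
  # With the right lengths and library sizes, x should be within rounding error of integers
  dev <- max(abs(x - round(x)))
  if (dev > tol) {
    warning("back-converted values deviate from integers by up to ", signif(dev, 3),
            "; check the gene lengths and library sizes")
  }
  round(x)
}

counts <- fpkm_to_counts(fpkm, gene_length_bp, lib_size)
```

The rounded counts could then go into voom or edgeR as in option 1. If the warning fires, the lengths or library sizes are probably not the ones that were used to compute the FPKM, and the back-conversion should not be trusted.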

The third option is similar to the "limma-trend" analysis described in the limma preprint, except that it is applied to the logFPKM instead of logCPM. Statistically this will not perform as well as it would applied to the logCPM.
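Here is a minimal sketch of the third option (limma-trend on log2 FPKM). It assumes a genes x samples FPKM matrix and a two-group design. The data are simulated. The +1 offset before taking logs is an assumption used here to avoid log2(0), because the answer does not say how zeros should be handled.

```r
library(limma)

set.seed(1)
# Hypothetical FPKM matrix (simulated): 1000 genes x 6 samples, two groups of 3
fpkm <- matrix(rgamma(1000 * 6, shape = 2, rate = 0.1), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

# Move to a log2 scale; the +1 offset is an assumption, not part of the answer
logFPKM <- log2(fpkm + 1)

# Ordinary limma analysis as for microarray data: no voom, and eBayes with
# trend = TRUE so the prior variance may depend on average log-expression
fit <- lmFit(logFPKM, design)
fit <- eBayes(fit, trend = TRUE)
res <- topTable(fit, coef = 2, number = Inf)
```

Compared with the code in the question, the two changes are that voom() is dropped and that eBayes() is called with trend = TRUE.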

Best wishes
Gordon

Hello Gordon:

Are there any results describing what happens when one switches from CPM to RPKM, TMM, etc., when using limma?

Regards,

Nik

Answer: Differential expression of RNA-seq data using limma and voom()
1
5.4 years ago by
Michael Love23k
United States
Michael Love23k wrote:
Hi Jon,

If you are new to R and Bioconductor, please take some time to read over the vignettes that accompany every software package on Bioconductor. We try to pack them full of useful information! In addition, there is a lot more technical information available in the reference manuals. From the command line you can type, e.g.

browseVignettes(package="DESeq2")

On the first page of the DESeq2 vignette, we discuss why you should only use raw counts as input to our software, not rounded normalized values or FPKM values.

Also, to help with discussion: by 'count matrix', we refer to a matrix of non-negative integers 0, 1, 2, ..., which were produced by counting reads or fragments, the units of evidence of expression in RNA-Seq. So we would avoid referring to a 'count matrix of FPKM values', because these counts have been divided by gene length and library size.

Mike

In reply to: Jon Bråte <jon.brate@ibv.uio.no>, Tue, Nov 26, 2013 at 4:37 PM
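Here is a small sketch of the check implied by Mike's definition of a count matrix, meaning non-negative whole numbers. The helper name is_count_matrix and its tolerance are hypothetical.

```r
# Hypothetical helper: does a matrix contain only non-negative whole numbers?
is_count_matrix <- function(x, tol = 1e-8) {
  x <- as.matrix(x)
  is.numeric(x) && !anyNA(x) && all(x >= 0) && all(abs(x - round(x)) < tol)
}

raw <- matrix(c(0L, 3L, 12L, 7L), nrow = 2)
fpkm_like <- matrix(c(0, 1.7, 25.3, 4.25), nrow = 2)

is_count_matrix(raw)        # TRUE
is_count_matrix(fpkm_like)  # FALSE
```

Passing this check is necessary but not sufficient. Rounded normalized values, which Mike also warns against, would pass it too.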