Question: How can I get the normalized read counts from TMM?
Reanne Bowlby wrote, 5.1 years ago:
Hello,

I have a question about the TMM normalization used in edgeR: how can I get the normalized read counts from TMM?

I understand that calcNormFactors() produces two columns of information: lib.size and norm.factors. From what I have read, multiplying these two columns together gives an effective library size. Should I then divide the raw read counts by the effective library size to get normalized read counts?

I am trying to do differential expression analysis on paired data using Fisher's exact test. By paired I mean that I have two sets of data, sequenced on different platforms but from the same patient, so I am looking for differentially expressed genes (DEG) caused by the platform difference. Originally I was using RPKM values, but I am wondering whether TMM would be better.

The article "Normalization methods for Illumina high-throughput RNA sequencing data analysis" describes normalized read counts in the following way:

"To obtain normalized read counts, these normalization factors are re-scaled by the mean of the normalized library sizes. Normalized read counts are obtained by dividing raw read counts by these re-scaled normalization factors."

I would love some clarification of TMM, as well as any opinions on my use of Fisher's exact test. Thanks in advance for the help.

Reanne Bowlby
Mark Robinson wrote, 5.1 years ago:
Hi Reanne,

I replied offline, but I am replying again on-list.

> I have a question about TMM normalization used in EdgeR.
>
> Q. How can I get the normalized read counts from TMM?

See the cpm() function. For example, if 'd' is a DGEList object:

    counts.per.m <- cpm(d, normalized.lib.sizes=TRUE)

Also, see ?cpm

> I understand that calcNormFactors() produces two columns of information. The first is lib.size and the second is norm.factors. From what I have read, multiplying these two columns together gives an effective library size. Should I then be dividing the raw read counts by the effective library size to get normalized read counts? I am trying to do differential expression analysis on paired data using Fisher's exact test. By paired I mean I have two sets of data, sequenced on different platforms, but from the same patient. So I am looking for DEG caused by platform difference. Originally I was using RPKM values, but I am wondering if TMM would be better.

Your pairing is not the usual "biological" pairing, but you might consider reading Section 3.4.1, "Paired Samples", of the edgeR user's guide:

http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Do you have multiple pairs, and do you want to look for consistent platform differences? If so, the option above may be what you want.

Alternatively, if you have just a single pair, you might consider using binomTest(), which is vaguely similar to Fisher's exact test, and manually setting the n1 and n2 arguments to the effective library sizes. This is how "normalization" is achieved: by modifying offsets, not the data. See ?binomTest

Note that fisher.test(), binomTest() and all the edgeR testing methods should take counts as input, not TMM-normalized values or RPKMs.

Best,
Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
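As an illustration of the arithmetic discussed in this thread, here is a minimal base-R sketch. It assumes made-up counts and made-up normalization factors (in practice the factors would come from calcNormFactors()), and it uses base R's binom.test() as a stand-in for edgeR's binomTest(), so the p-value may not match edgeR exactly. It shows that the effective library size is lib.size multiplied by norm.factors, that normalized counts per million divide by that effective size, and that the single-pair test takes raw counts while the effective library sizes only set the null proportion.

```r
# Toy data: raw counts for 4 genes (rows) in 2 libraries (columns).
# All numbers are invented for illustration.
counts <- matrix(c(10, 200, 35,  755,
                   30, 380, 90, 1500),
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:4),
                                 c("platformA", "platformB")))

lib.size     <- colSums(counts)      # total reads per library
norm.factors <- c(1.05, 1 / 1.05)    # hypothetical TMM factors, NOT computed here

# Effective library size = library size * normalization factor
eff.lib.size <- lib.size * norm.factors

# Normalized counts per million: each column divided by its effective library size
cpm.norm <- t(t(counts) / eff.lib.size) * 1e6

# Single-pair test for one gene. Raw counts are the input; the effective
# library sizes only determine the expected proportion under the null.
p.null <- eff.lib.size[["platformA"]] / sum(eff.lib.size)
res <- binom.test(x = counts["gene1", "platformA"],
                  n = sum(counts["gene1", ]),
                  p = p.null)
res$p.value
```

With edgeR loaded, the corresponding calls would be cpm(d, normalized.lib.sizes=TRUE) on a DGEList and binomTest(y1, y2, n1, n2) with n1 and n2 set to the two effective library sizes.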