How can I get the normalized read counts from TMM?
@reanne-bowlby-5546
Hello,

I have a question about TMM normalization as used in edgeR.

Q. How can I get the normalized read counts from TMM?

I understand that calcNormFactors() produces two columns of information: the first is lib.size and the second is norm.factors. From what I have read, multiplying these two columns together gives an effective library size. Should I then divide the raw read counts by the effective library size to get normalized read counts?

I am trying to do differential expression analysis on paired data using Fisher's exact test. By paired I mean I have two sets of data, sequenced on different platforms, but from the same patient. So I am looking for differentially expressed genes (DEGs) caused by platform difference. Originally I was using RPKM values, but I am wondering if TMM would be better.

The article "Normalization methods for Illumina high-throughput RNA sequencing data analysis" describes normalized read counts in the following way:

"To obtain normalized read counts, these normalization factors are re-scaled by the mean of the normalized library sizes. Normalized read counts are obtained by dividing raw read counts by these re-scaled normalization factors."

I would love some clarification of TMM as well as any opinions on my use of Fisher's exact test. Thanks in advance for the help.

Reanne Bowlby
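The arithmetic described in the question (effective library size = lib.size × norm.factors, then counts divided by that size) can be sketched in base R as follows. This is only an illustration: the count matrix and the `norm.factors` values are made up, not the output of calcNormFactors(), and the result is scaled to counts per million so the numbers are readable.

```r
# Toy count matrix: 4 genes x 2 samples (made-up numbers, for illustration only).
counts <- matrix(c(10, 200, 30, 760,
                   25, 380, 70, 1525),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("A", "B")))

# Raw library sizes: total counts per sample.
lib.size <- colSums(counts)

# Hypothetical normalization factors (assumed values, NOT computed by TMM here).
norm.factors <- c(A = 1.05, B = 1 / 1.05)

# Effective library size, as described in the question.
eff.lib.size <- lib.size * norm.factors

# Divide each sample's counts by its effective library size,
# then scale to counts per million.
norm.cpm <- t(t(counts) / eff.lib.size) * 1e6

print(round(norm.cpm, 1))
```

Values like `norm.cpm` are convenient for plots and descriptive summaries; they are not a substitute for raw counts when a count-based test is applied.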
Normalization edgeR
Mark Robinson ▴ 880
@mark-robinson-4908
Hi Reanne,

I replied offline, but I am replying again on-list.

> I have a question about TMM normalization used in EdgeR.
>
> Q. How can I get the normalized read counts from TMM?

See the cpm() function. For example, if 'd' is a DGEList object:

    counts.per.m <- cpm(d, normalized.lib.sizes=TRUE)

Also, see ?cpm

> I understand that calcNormFactors() produces two columns of information. The first is lib.size and the second is norm.factors. From what I have read, multiplying these two columns together gives an effective library size. Should I then be dividing the raw read counts by the effective library size to get normalized read counts? I am trying to do differential expression analysis on paired data using Fisher's exact test. By paired I mean I have two sets of data, sequenced on different platforms, but from the same patient. So I am looking for DEG caused by platform difference. Originally I was using RPKM values, but I am wondering if TMM would be better.

Your pairing is not the usual "biological" pairing, but you might consider reading Section 3.4.1, "Paired Samples", of the edgeR user's guide:

http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Do you have multiple pairs and want to look for consistent platform differences? If so, the option above may be what you want.

Alternatively, if you have just a single pair, you might consider using binomTest(), which is vaguely similar to Fisher's exact test, and manually setting the n1 and n2 arguments to the effective library sizes. This is how "normalization" is achieved: by modifying offsets, not the data. See ?binomTest

Note that fisher.test(), binomTest() and all the edgeR testing methods should take counts as input, not TMM-normalized values or RPKMs.

Best,
Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
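The single-pair suggestion in the answer (leave the counts raw and put the normalization into the library sizes) can be illustrated with a small base-R sketch. It uses `binom.test()` from the stats package as a stand-in for edgeR's binomTest(), so the p-values will not match edgeR's exactly. The counts, library sizes and normalization factors are made-up values.

```r
# Raw counts for one gene in the two libraries (made-up values).
y1 <- 30
y2 <- 70

# Effective library sizes = lib.size * norm.factors
# (hypothetical factors, not computed by calcNormFactors() here).
n1 <- 1000 * 1.05
n2 <- 2000 / 1.05

# Under the null of no difference, y1 given (y1 + y2) is binomial with
# success probability equal to library 1's share of the effective total.
p0 <- n1 / (n1 + n2)

# Base-R exact binomial test on the RAW counts; normalization enters only via p0.
res <- binom.test(y1, y1 + y2, p = p0)
print(res$p.value)
```

With edgeR loaded, the corresponding call would take the form `binomTest(y1, y2, n1 = n1, n2 = n2)`; see ?binomTest for the details of how its p-value is defined.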