Question about normalization of RNA-seq counts by tweeDEseq using TMM from edgeR


Sermsawat Tunlaya-anukit ▴ 90

@sermsawat-tunlaya-anukit-4848

Last seen 9.6 years ago

I have a question about normalization in the tweeDEseq package, which uses the TMM method from edgeR to normalize count data. I ran the normalization as described in the manual and found something unusual: the read counts of gene 4 in samples X1 and X2 are 0 before normalization, but 4 and 3 after it. Why does normalization add counts to a zero count? Is this an effect of the tagwise dispersions? My code is below for more information. Thank you in advance.

Sermsawat Tunlaya-anukit

> library(tweeDEseq)
> y <- read.table("rawcount.txt", header=T )
> group <- c(1,1,1,2,2,2,2,3,3,3,4,4)
> yN <- normalizeCounts(y, group)
Using edgeR normalization methods.
Calculating library sizes from column totals.
Calculating normalization factors with the TMM method.
Estimating common dispersion.
Estimating tagwise dispersions.
Calculating effective library sizes.
Adjusting counts to effective library sizes using tagwise dispersions.
> head(y)
   X1  X2   X3   X4  X5  X6   X7   X8  X9  X10 X11 X12
1   0   0    0    1  11  18   16   12   9   12  25  19
2  14  28   84   56  54  40  114   86  43   91 150  83
3  12   8   18   15  12  10   32   19  27   31  44  21
4   0   0    0    0   0   0    0    0   0    0   0   0
5   4   6    8    3   7  12   22   44  14    1   1   2
6 899 725 1563 1342 173 129 1072 1607 172 1184 720 524
> head(yN)
       X1   X2   X3   X4  X5  X6  X7   X8  X9 X10 X11 X12
[1,]    1    1    0    1  13  22   7    7  13   8  13  17
[2,]   39   64   81   56  63  51  49   53  65  58  77  76
[3,]   29   18   17   15  14  13  13   11  39  20  22  19
[4,]    4    3    0    0   0   1   0    0   1   0   0   0
[5,]   10   13    8    3   8  15  10   28  21   0   0   2
[6,] 2306 1652 1497 1342 201 164 468 1001 261 752 363 476
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] tweeDEseq_1.0.11

loaded via a namespace (and not attached):
[1] MASS_7.3-16  edgeR_2.4.3  limma_3.10.2 tools_2.14.1

Normalization edgeR tweeDEseq • 1.7k views

updated 12.2 years ago by Robert Castelo ★ 3.3k • written 12.2 years ago by Sermsawat Tunlaya-anukit ▴ 90


Robert Castelo ★ 3.3k

@rcastelo

Last seen 4 days ago

Barcelona/Universitat Pompeu Fabra

Dear Sermsawat,

The way in which normalizeCounts() uses edgeR-TMM normalization is analogous to the edgeR function exactTest(), which equalizes library sizes using equalizeLibSizes(); this is what produces the changes you see in the table of counts. Let me warn you, however, that you should *not* use normalizeCounts() from the tweeDEseq package to produce a table that you later feed into some other package for differential expression analysis, such as edgeR or DESeq. If you are going to use another package for DE analysis, consult its documentation to see how to input and normalize your data.

Cheers,
Robert.

On Mon, 2012-02-13 at 00:54 -0500, Sermsawat Tunlaya-Anukit wrote:
> The read count before normalization of gene 4
> sample X1 and X2 is 0, but after normalization it turn to 4 and 3. Why
> normalization add count into 0 count? Did it effect from tagwise
> dispersions?

12.2 years ago • Robert Castelo ★ 3.3k
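One rough way to see which samples the library-size equalization scaled up is to compare an abundant gene before and after. The sketch below uses the gene 6 values from the head(y) and head(yN) output in the question; the variable names are illustrative, and the ratio for a single gene is only an informal indicator, not the adjustment edgeR actually applies.

```r
# Gene 6 values copied from head(y) and head(yN) in the question.
samples <- paste0("X", 1:12)
raw  <- c(899, 725, 1563, 1342, 173, 129, 1072, 1607, 172, 1184, 720, 524)
adj  <- c(2306, 1652, 1497, 1342, 201, 164, 468, 1001, 261, 752, 363, 476)
names(raw) <- names(adj) <- samples

# Ratio of adjusted to raw count: values above 1 mean the sample was scaled up.
ratio <- round(adj / raw, 2)
print(ratio)

# Samples whose gene 6 count more than doubled.
scaled_up <- names(ratio)[ratio > 2]
print(scaled_up)
```

X1 and X2 are the only samples whose gene 6 count more than doubles (ratios of roughly 2.57 and 2.28), and they are the same two samples in which the zero count of gene 4 became 4 and 3.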


Thank you for your answer. I just want to obtain normalized counts for another analysis, such as partial correlation. I used edgeR to compute differential gene expression. I calculated normalized counts by dividing the raw counts by the effective library size (the normalization factor multiplied by the library size) and multiplying by 1,000,000. Then I saw the tweeDEseq approach and tried it. The normalized result differs from my own calculation, so I just want to know what is happening.

Best regards,
Sermsawat T.

On Mon, Feb 13, 2012 at 12:52 PM, Robert Castelo <robert.castelo@upf.edu> wrote:
> the way in which "normalizeCounts()" uses edgeR-TMM normalization is
> analogous to the edgeR function "exactTest()" which equalizes library
> sizes using "equalizeLibSizes()" resulting in these changes in the table
> of counts.

12.2 years ago • Sermsawat Tunlaya-anukit ▴ 90
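The calculation described in the reply above (raw count divided by effective library size, times one million) can be sketched in base R as follows. The count matrix and the normalization factors are made-up illustrative values, not output from edgeR; on real data the factors would come from edgeR's TMM step, and edgeR also offers a cpm() function for this kind of scaling.

```r
# Toy count matrix (genes in rows, samples in columns); values are illustrative.
counts <- matrix(c( 0,  5, 10,
                   20, 40, 80,
                   80, 55, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), c("S1", "S2", "S3")))

lib_size     <- colSums(counts)                    # library size = column total
norm_factors <- c(S1 = 0.8, S2 = 1.0, S3 = 1.25)   # hypothetical TMM-style factors

# Effective library size = library size * normalization factor.
eff_lib_size <- lib_size * norm_factors

# Divide each column by its effective library size, then scale to one million.
cpm <- sweep(counts, 2, eff_lib_size, "/") * 1e6
print(round(cpm))
```

Because this is a pure per-sample rescaling, a raw count of zero always stays zero. The adjusted counts returned by normalizeCounts() are not a simple multiple of the raw counts, which is one reason the two results differ.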


Hi,

On Mon, 2012-02-13 at 15:09 -0500, Sermsawat Tunlaya-Anukit wrote:
> Thank you for your answer. I just want to find normalization count for
> another analysis such as partial correlation.

As far as I know, partial correlations are defined for continuous data only, so I'm afraid you cannot calculate them directly from RNA-seq count data.

> I used edgeR for calculate differential gene expression. I calculate
> normalize count by using raw count divide by effective library size
> (normalize factor multiple with library size) and multiple by 1000000.

Again, I'd recommend checking the edgeR documentation on how to use edgeR for normalization and differential gene expression analysis. What you describe sounds like feeding RPMs (reads per million) into edgeR, which I believe is wrong.

> I saw tweeDEseq approach and try to use it.

tweeDEseq takes only a table of counts and a two-sample group indicator variable as input. The package does not provide its own normalization approach; at the moment it relies on edgeR for this purpose through the tweeDEseq function normalizeCounts(), or on any other package that can produce a normalized table of counts, such as the BioC packages 'cqn' or 'EDASeq'. Therefore, to feed normalized RNA-seq count data into tweeDEseq, one first needs a normalized table of counts, such as the one returned by normalizeCounts(), whose "normalized" counts may differ from the raw counts.

> After i see result of normalize is different from my calculation, so i
> just want to know what happen?

When raw counts are transformed into normalized counts, they may become larger or smaller. However, their interpretation should be restricted to the one made by the corresponding differential expression analysis technique. In the case of tweeDEseq, normalized counts help make more accurate calls of differential expression, but I do not know whether normalized (transformed) counts are useful for other inferences on RNA-seq data. I do see a danger in making an isolated biological interpretation of a gene having a positive normalized count when its raw value was zero.

If you are interested in the issue of normalizing RNA-seq data, I'd recommend taking a look at these papers and their corresponding BioC packages ('cqn' and 'EDASeq'):

http://biostatistics.oxfordjournals.org/content/early/2012/01/24/biostatistics.kxr054.long
http://www.biomedcentral.com/1471-2105/12/480/abstract

Cheers,
Robert.

12.2 years ago • Robert Castelo ★ 3.3k
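The point above about partial correlations needing continuous data can be illustrated with a small sketch of the mechanics. It is not something the thread recommends: it assumes a crude log2(count + 1) transform to obtain a continuous scale, ignores library-size differences (only six genes are shown in the question, so the true library sizes are unknown), and uses three genes from head(y) purely as example input.

```r
# Genes 2, 3 and 6 from head(y) in the question (12 samples each).
y <- rbind(
  gene2 = c(14, 28, 84, 56, 54, 40, 114, 86, 43, 91, 150, 83),
  gene3 = c(12, 8, 18, 15, 12, 10, 32, 19, 27, 31, 44, 21),
  gene6 = c(899, 725, 1563, 1342, 173, 129, 1072, 1607, 172, 1184, 720, 524)
)

# Assumption: log2(count + 1) as a rough continuous scale for the counts.
x <- t(log2(y + 1))   # samples in rows, genes in columns

# Partial correlations from the inverse covariance (precision) matrix P:
# pcor[i, j] = -P[i, j] / sqrt(P[i, i] * P[j, j])
P <- solve(cov(x))
pcor <- -cov2cor(P)
diag(pcor) <- 1
print(round(pcor, 2))
```

Each off-diagonal entry is the correlation between two genes after removing the linear effect of the third. Whether such values are meaningful for transformed RNA-seq counts is exactly the open question raised in the reply.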
