edgeR: handling missing values with Quantile normalisation
Sonika Tyagi
Hi there,

I am analysing RNA-seq counts with the edgeR package, but I am running into problems because of zero counts for certain tags in my data. The code I am using is:

> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> targets
                    files   group description
1 Sample_xx_count.txt.raw control   something
2 Sample_xx_count.txt.raw control   something
3 Sample_xx_count.txt.raw  Hi_Pos   something
4 Sample_xx_count.txt.raw  Hi_Pos   something
5 Sample_xx_count.txt.raw control   something
6 Sample_xx_count.txt.raw control   something
7 ................

> d <- readDGE(targets, skip = 0, comment.char = "#")
> d
An object of class "DGEList"
$samples
                    files   group description  lib.size norm.factors
1 Sample_xx_count.txt.raw control   something 498180513            1
2 Sample_xx_count.txt.raw control   something 483775405            1
3 Sample_xx_count.txt.raw  Hi_Pos   something 368609647            1
4 Sample_xx_count.txt.raw  Hi_Pos   something 617334315            1
5 Sample_xx_count.txt.raw control   something 678060765            1
13 more rows ...

$counts
         1     2     3     4     5     6      7     8     9    10     11    12    13    14     15 16    17    18
Tag1 15923 20323 14867 23098 32484 17223  51579 29578 17408 24097  34470 31964 17583 17583  39460  0 30359 25416
Tag2   700   600   200   695   500  1300   1425  1775   700  1974   1300  2371   900   900   1689  0   898  1690
Tag3     0     0   100     0     0     0      0     0     0     0      0     0     0     0    100  0   100     0
Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000 70124 121393 86106 46197 46197 127290  0 98369 79673
Tag5 19868 19385 25500 31215 56684 24096  51265 37492 27420 24496  32729 24722 24913 24913  50448  0 39755 55829
21887 more rows ...

> d <- calcNormFactors(d)
Error in quantile.default(x, p = q) :
  missing values and NaN's not allowed if 'na.rm' is FALSE

Could someone please suggest how to handle the missing values with the edgeR normalisation methods?
Thank you
Sonika

-------------------
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252  LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C  LC_TIME=English_Australia.1252

attached base packages:
[1] stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] edgeR_2.0.5  svIDE_0.9-50

loaded via a namespace (and not attached):
[1] limma_3.6.9  svMisc_0.9-61  tcltk_2.12.2  tools_2.12.2  XML_3.2-0.2
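A minimal base-R sketch of how this error can arise even when no count is literally NA. It assumes (the thread does not confirm this) that the normalisation step scales each sample's counts by its library size and then takes a quantile of each column. A sample whose counts are all zero then gives 0/0 = NaN, and quantile() stops with the same message. The matrix `toy` and the helper `upper_q` are hypothetical. In the excerpt above, column 16 is 0 for every tag shown, so that sample's total may be worth checking.

```r
# Hypothetical toy count matrix: 3 tags x 3 samples; sample "s3" has no counts at all.
toy <- matrix(c(15923, 700, 0,
                20323, 600, 100,
                0,     0,   0),
              nrow = 3,
              dimnames = list(c("Tag1", "Tag2", "Tag3"), c("s1", "s2", "s3")))

lib_size <- colSums(toy)                 # s3 has a library size of 0
scaled <- sweep(toy, 2, lib_size, "/")   # 0 / 0 gives NaN for every entry of s3

# Upper quartile of each scaled column (assumption: this mirrors the failing step,
# it is not edgeR's own code).
upper_q <- function(x) quantile(x, probs = 0.75)
res <- tryCatch(apply(scaled, 2, upper_q),
                error = function(e) conditionMessage(e))
print(res)   # the message of the quantile() error, since s3 is all NaN

# Samples with zero total counts can be spotted directly from the column sums.
empty_samples <- names(lib_size)[lib_size == 0]
print(empty_samples)
```

On the DGEList from the question, colSums(d$counts) gives the same per-sample totals.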
Paul Leo
Hi Sonika,

It is probably not the zeros that are causing the problem but NAs. Check whether the counts matrix contains any NAs, with something like:

apply(d$counts, 2, function(x) sum(is.na(x)))

You should get back all zeros; if not, setting the NAs to 0 is probably appropriate.

Cheers
Paul

-----Original Message-----
From: Sonika Tyagi <sonika.tyagi@agrf.org.au>
To: 'bioconductor at r-project.org' <bioconductor at r-project.org>
Subject: [BioC] edgeR: handling missing values with Quantile normalisation
Date: Wed, 31 Aug 2011 10:02:26 +1000

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
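Paul's NA check, run on a small hypothetical matrix `counts` standing in for `d$counts` (the values are made up for illustration). The last step follows his suggestion of setting NAs to 0; whether 0 is the right value depends on why a count is missing in the first place.

```r
# Hypothetical stand-in for d$counts, with one missing value in sample "s2".
counts <- matrix(c(15923, 700, 0,
                   20323, NA,  100),
                 nrow = 3,
                 dimnames = list(c("Tag1", "Tag2", "Tag3"), c("s1", "s2")))

# Number of NAs per sample (column); all zeros means there are no NAs.
na_per_sample <- apply(counts, 2, function(x) sum(is.na(x)))
print(na_per_sample)
# colSums(is.na(counts)) gives the same counts without apply().

# Following the suggestion above: replace missing counts with 0.
counts[is.na(counts)] <- 0
stopifnot(!anyNA(counts))
```

On the DGEList itself, the same replacement would be d$counts[is.na(d$counts)] <- 0.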
Alok Kumar Srivastava
Hi Sonika,

You can compute the quantile independently with quantile(x, p = q, na.rm = TRUE), which drops the NA/NaN values before the quantile is taken, and pass this value on to the main function. Alternatively, you can try the other methods for calcNormFactors ("TMM", "RLE", "upperquartile"), if one of them fits your requirements.

Cheers
Alok

--
Alok Kumar Srivastava
Ph.D. scholar
Centre of Computational Biology and Bioinformatics
School of Computational and Integrative Sciences
JNU, New Delhi
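A small base-R sketch of what na.rm = TRUE does in Alok's suggestion: it drops NA and NaN values before the quantile is computed, while zeros are kept and still count towards the result. The vector `x` is hypothetical. If a sample consists only of NaN (as it would, under the assumption that counts are scaled by library size, for an all-zero library), nothing is left after removal and quantile() returns NA instead of an error.

```r
# Hypothetical scaled values for one sample, including zeros and a NaN.
x <- c(0, 0, 100, 700, 15923, NaN)

# Default na.rm = FALSE: the NaN triggers the error seen in the thread.
err <- tryCatch(quantile(x, probs = 0.75),
                error = function(e) conditionMessage(e))
print(err)

# na.rm = TRUE drops the NaN only; the two zeros remain part of the data.
q75 <- quantile(x, probs = 0.75, na.rm = TRUE)
print(q75)   # 700 (named "75%"), the upper quartile of c(0, 0, 100, 700, 15923)

# A sample with nothing but NaN leaves an empty vector, so the result is NA.
q_empty <- quantile(c(NaN, NaN), probs = 0.75, na.rm = TRUE)
print(q_empty)
```

Within edgeR itself, the normalisation method is chosen through the method argument of calcNormFactors(), for example calcNormFactors(d, method = "upperquartile").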