Bioconductor Digest, Vol 102, Issue 29
@davis-mccarthy-4794
Sonika and Alok,

Just to confirm: zeros will not cause the problem that you have reported (I have tested this on dozens of datasets with zero counts). Like Paul, I suspect that you have some NAs in your count matrix. This is unusual; I haven't seen RNA-Seq results with NAs before. I suggest you follow Paul's suggestion. If you find NAs, you can then decide whether to remove the affected tags or set the NAs to zero. If you don't find NAs, we can dig deeper.

As an aside, I also note that you are using an older version of R and edgeR. I strongly recommend updating to R 2.13 and the corresponding version of edgeR using biocLite(), which will give you edgeR 2.2.5. We have done a lot of development and improvement of the package in the last year.

Best wishes
Davis

> To: "'bioconductor at r-project.org'" <bioconductor at r-project.org>
> Date: Wed, 31 Aug 2011 10:02:26 +1000
> Subject: [BioC] edgeR: handling missing values with Quantile normalisation
>
> Hi there,
>
> I am analysing RNAseq counts using the edgeR package, but I am running into problems because of 'zero' counts for certain tags in my data.
>
> The code syntax I am using is here:
>
> > targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> > targets
>                     files   group description
> 1 Sample_xx_count.txt.raw control   something
> 2 Sample_xx_count.txt.raw control   something
> 3 Sample_xx_count.txt.raw  Hi_Pos   something
> 4 Sample_xx_count.txt.raw  Hi_Pos   something
> 5 Sample_xx_count.txt.raw control   something
> 6 Sample_xx_count.txt.raw control   something
> 7   ................
>
> d <- readDGE(targets, skip = 0, comment.char = "#")
> d
>
> An object of class "DGEList"
> $samples
>                     files   group description  lib.size norm.factors
> 1 Sample_xx_count.txt.raw control   something 498180513            1
> 2 Sample_xx_count.txt.raw control   something 483775405            1
> 3 Sample_xx_count.txt.raw  Hi_Pos   something 368609647            1
> 4 Sample_xx_count.txt.raw  Hi_Pos   something 617334315            1
> 5 Sample_xx_count.txt.raw control   something 678060765            1
> 13 more rows ...
>
> $counts
>          1     2     3     4     5     6      7     8     9    10     11    12    13    14     15 16    17    18
> Tag1 15923 20323 14867 23098 32484 17223  51579 29578 17408 24097  34470 31964 17583 17583  39460  0 30359 25416
> Tag2   700   600   200   695   500  1300   1425  1775   700  1974   1300  2371   900   900   1689  0   898  1690
> Tag3     0     0   100     0     0     0      0     0     0     0      0     0     0     0    100  0   100     0
> Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000 70124 121393 86106 46197 46197 127290  0 98369 79673
> Tag5 19868 19385 25500 31215 56684 24096  51265 37492 27420 24496  32729 24722 24913 24913  50448  0 39755 55829
> 21887 more rows ...
>
> d <- calcNormFactors(d)
> Error in quantile.default(x, p = q) :
>   missing values and NaN's not allowed if 'na.rm' is FALSE
>
> Could someone please suggest how to handle the missing values with edgeR normalisation methods?
>
> Thank you
> Sonika
> -------------------
>
> > sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
> [4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_2.0.5  svIDE_0.9-50
>
> loaded via a namespace (and not attached):
> [1] limma_3.6.9   svMisc_0.9-61 tcltk_2.12.2  tools_2.12.2  XML_3.2-0.2
>
>
> ---------- Forwarded message ----------
> From: Paul Leo <p.leo at uq.edu.au>
> To: Sonika Tyagi <sonika.tyagi at agrf.org.au>
> Date: Wed, 31 Aug 2011 11:07:47 +1000
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
>
> Hi Sonika
> It is probably not zeros that are causing the problem but NAs.
>
> Check through the counts array to see if it contains NAs, with something like:
>
> apply(d$counts, 2, function(x) sum(is.na(x)))
>
> You should get back all zeros.
>
> Setting any NAs to 0 is probably appropriate.
>
> Cheers
> Paul
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> ---------- Forwarded message ----------
> From: ALok <foralok at gmail.com>
> To: Paul Leo <p.leo at uq.edu.au>
> Date: Wed, 31 Aug 2011 10:47:22 +0530
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
>
> Hi Sonika
>
> You can calculate quantile.default independently with the argument
> quantile(x, p = q, na.rm = TRUE)
> and pass this value to the main function;
> this will automatically take care of zeros.
>
> Alternatively, you can try other methods ("TMM", "RLE", "quantile") for
> calcNormFactors, if that fits your requirements.
>
> cheers
> Alok
>
> --
> ************************************************************
> Alok Kumar Srivastava
> Ph.D scholar
> Centre of Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> JNU, New Delhi
> ************************************************************

--
-----------------------------------------------------------
Davis J McCarthy
DPhil Candidate
University of Oxford
E: davis.mccarthy at balliol.ox.ac.uk
W: sites.google.com/site/davismcc
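For anyone landing on this thread later, Paul's per-sample NA check and the two options Davis mentions (removing the affected tags, or setting NAs to zero) could be sketched roughly as below. This is only an illustration in base R: the small `counts` matrix and its values are made up and stand in for `d$counts`, and the names `na_per_sample`, `tags_with_na`, `counts_dropped` and `counts_zeroed` are hypothetical.

```r
# Toy count matrix standing in for d$counts (values are invented for illustration).
counts <- matrix(
  c(15923, 20323, 14867,
      700,    NA,   200,
        0,     0,   100),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Tag1", "Tag2", "Tag3"), c("1", "2", "3"))
)

# Number of NAs per sample (column); all zeros would mean no missing values.
na_per_sample <- colSums(is.na(counts))
print(na_per_sample)

# Which tags (rows) contain at least one NA?
tags_with_na <- rownames(counts)[rowSums(is.na(counts)) > 0]
print(tags_with_na)

# Option 1: remove every tag that has an NA in any sample.
counts_dropped <- counts[rowSums(is.na(counts)) == 0, , drop = FALSE]

# Option 2: keep all tags and set the NAs to zero.
counts_zeroed <- counts
counts_zeroed[is.na(counts_zeroed)] <- 0
```

Which option is appropriate depends on why the values are missing; the thread leaves that decision to the analyst.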
RNASeq edgeR • 1.2k views
Sonika Tyagi
@sonika-tyagi-4829
Thanks Paul, Alok and Davis for your responses!

Yes, I had another look at the data and I didn't find any NAs in there. I am upgrading R and edgeR and shall run the code again to see how it goes.

Kind regards
Sonika

-----Original Message-----
From: davismcc@googlemail.com [mailto:davismcc@googlemail.com] On Behalf Of Davis McCarthy
Sent: Thursday, 1 September 2011 9:11 AM
To: bioconductor at r-project.org; Sonika Tyagi; foralok at gmail.com
Subject: Re: Bioconductor Digest, Vol 102, Issue 29
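Since no NAs turned up, one further check may be worth a try. It is not confirmed anywhere in this thread and is offered only as a guess: the error message mentions NaN as well as missing values, and in R dividing a zero count by a zero library size gives NaN. The five rows printed in the original post all show 0 for sample 16, although whether the whole column is zero cannot be seen from the post. The sketch below uses base R only, with a made-up `counts` matrix standing in for `d$counts`; `lib_size`, `zero_libs`, `props` and `res` are hypothetical names, and it does not claim to reproduce what calcNormFactors does internally.

```r
# Toy count matrix standing in for d$counts (values are invented for illustration).
# Sample "16" has zero counts for every tag.
counts <- matrix(
  c(15923, 20323, 0,
      700,   600, 0,
        0,     0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Tag1", "Tag2", "Tag3"), c("1", "2", "16"))
)

# There are no NAs in the raw counts ...
print(any(is.na(counts)))

# ... but one sample has a library size of zero.
lib_size <- colSums(counts)
zero_libs <- colnames(counts)[lib_size == 0]
print(zero_libs)

# Scaling counts by library size turns 0/0 into NaN for that sample.
props <- sweep(counts, 2, lib_size, "/")
print(any(is.nan(props)))

# quantile() refuses NaN input unless na.rm = TRUE; capture the error message.
res <- tryCatch(
  quantile(props[, "16"], probs = 0.75),
  error = function(e) conditionMessage(e)
)
print(res)
```

If a sample like this shows up, checking the input file for that sample (or dropping the sample) would be the natural next step before normalising.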