Davis McCarthy
Sonika and Alok
Just to confirm: zeros will not cause the problem that you have
reported (tested on dozens of datasets with zero counts). Like Paul, I
suspect that you have some NAs in your count matrix. This is unusual.
I haven't seen RNA-Seq results with NAs before.
I suggest you follow Paul's suggestion. If you find NAs then you can
make a decision about removing the tag or setting NAs to zero. If you
don't find NAs then we can dig deeper.
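
As a rough sketch of that check-then-decide step, the snippet below uses a small made-up count matrix (the `counts` object and its values are purely illustrative; in your session the matrix would be `d$counts`). It counts NAs per sample, lists the affected tags, and then shows the two options: removing tags that contain an NA, or setting NAs to zero. It relies on base R only.

```r
# Toy count matrix standing in for d$counts (values invented for illustration)
counts <- matrix(c(15923, 700, NA, 74008,
                   20323, 600, 0, NA,
                   14867, 200, 100, 51648),
                 nrow = 4,
                 dimnames = list(paste0("Tag", 1:4), c("1", "2", "3")))

# Number of NAs in each sample (column); all zeros would mean no NAs
na_per_sample <- colSums(is.na(counts))

# Tags (rows) that contain at least one NA
tags_with_na <- rownames(counts)[rowSums(is.na(counts)) > 0]

# Option 1: remove any tag that has an NA in any sample
counts_dropped <- counts[rowSums(is.na(counts)) == 0, , drop = FALSE]

# Option 2: keep every tag and set the NAs to zero
counts_zeroed <- counts
counts_zeroed[is.na(counts_zeroed)] <- 0
```

Whichever matrix you keep would then need to go back into the DGEList before normalisation; how best to do that may depend on your edgeR version.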
As an aside I also note that you are using an older version of R and
edgeR. I strongly recommend updating to R 2.13 and the corresponding
version of edgeR using biocLite(), which will give you edgeR 2.2.5. We
have done a lot of development and improvement of the package in the
last year.
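
Before updating, it may help to confirm what is currently installed. The following is a minimal sketch in base R; the two target version strings are simply the ones mentioned above, and the variable names are my own. It does not install anything.

```r
# Versions named in this message, used here only as comparison targets
target_r <- "2.13.0"
target_edger <- "2.2.5"

# Is the running R at least the recommended version?
r_ok <- getRversion() >= target_r

# system.file() returns "" when a package is not installed
edger_installed <- nzchar(system.file(package = "edgeR"))

# TRUE only when edgeR is present and at least the recommended version
edger_ok <- edger_installed && packageVersion("edgeR") >= target_edger
```

If either `r_ok` or `edger_ok` is FALSE, updating as described above is the next step.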
Best wishes
Davis
> To: "'bioconductor at r-project.org'" <bioconductor at r-project.org>
> Date: Wed, 31 Aug 2011 10:02:26 +1000
> Subject: [BioC] edgeR: handling missing values with Quantile normalisation
> Hi there,
>
> I am analysing RNAseq counts using the edgeR package, but I am running
> into problems because of 'zero' counts for certain tags in my data.
>
> The code syntax I am using is here:
>
> > targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> > targets
>                     files   group description
> 1 Sample_xx_count.txt.raw control   something
> 2 Sample_xx_count.txt.raw control   something
> 3 Sample_xx_count.txt.raw  Hi_Pos   something
> 4 Sample_xx_count.txt.raw  Hi_Pos   something
> 5 Sample_xx_count.txt.raw control   something
> 6 Sample_xx_count.txt.raw control   something
> 7   ................
>
> d <- readDGE(targets, skip = 0, comment.char = "#")
> d
>
> An object of class "DGEList"
> $samples
>                     files   group description  lib.size norm.factors
> 1 Sample_xx_count.txt.raw control   something 498180513            1
> 2 Sample_xx_count.txt.raw control   something 483775405            1
> 3 Sample_xx_count.txt.raw  Hi_Pos   something 368609647            1
> 4 Sample_xx_count.txt.raw  Hi_Pos   something 617334315            1
> 5 Sample_xx_count.txt.raw control   something 678060765            1
> 13 more rows ...
>
> $counts
>          1     2     3     4     5     6      7     8     9
> Tag1 15923 20323 14867 23098 32484 17223  51579 29578 17408
> Tag2   700   600   200   695   500  1300   1425  1775   700
> Tag3     0     0   100     0     0     0      0     0     0
> Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000
> Tag5 19868 19385 25500 31215 56684 24096  51265 37492 27420
>         10     11    12    13    14     15 16    17    18
> Tag1 24097  34470 31964 17583 17583  39460  0 30359 25416
> Tag2  1974   1300  2371   900   900   1689  0   898  1690
> Tag3     0      0     0     0     0    100  0   100     0
> Tag4 70124 121393 86106 46197 46197 127290  0 98369 79673
> Tag5 24496  32729 24722 24913 24913  50448  0 39755 55829
> 21887 more rows ...
>
>
> > d <- calcNormFactors(d)
> Error in quantile.default(x, p = q) :
>   missing values and NaN's not allowed if 'na.rm' is FALSE
>
> Could someone please suggest how to handle the missing values with
> edgeR normalisation methods?
>
> Thank you
> Sonika
> -------------------
>
> > sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
> [4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_2.0.5  svIDE_0.9-50
>
> loaded via a namespace (and not attached):
> [1] limma_3.6.9   svMisc_0.9-61 tcltk_2.12.2  tools_2.12.2  XML_3.2-0.2
>
>
>
>
>
> ---------- Forwarded message ----------
> From: Paul Leo <p.leo at uq.edu.au>
> To: Sonika Tyagi <sonika.tyagi at agrf.org.au>
> Date: Wed, 31 Aug 2011 11:07:47 +1000
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
>
> Hi Sonika
> It is probably not zeros that are causing the problem but NAs.
>
> Check through the counts array
> to see if it contains NAs, with something like:
>
> apply(d$counts, 2, function(x) sum(is.na(x)))
>
> This should return all zeros if there are no NAs.
>
> If there are NAs, setting them to 0 is probably appropriate.
>
>
> Cheers
> Paul
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
> ---------- Forwarded message ----------
> From: ALok <foralok at gmail.com>
> To: Paul Leo <p.leo at uq.edu.au>
> Date: Wed, 31 Aug 2011 10:47:22 +0530
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
> Hi Sonika
>
> You can calculate quantile.default independently with the argument
> quantile(x, p = q, na.rm = TRUE)
> and pass this value to the main function;
> this will automatically take care of zeros.
>
> Alternatively, you can try the other methods ("TMM", "RLE", "quantile")
> for calcNormFactors, if they fit your requirements.
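
To make the `na.rm` behaviour concrete, here is a minimal sketch using base R's `quantile()` on an invented vector (`x`, `q` and the values are illustrative only). It shows that an NA triggers the error reported above, that `na.rm = TRUE` avoids it, and that zeros on their own are not a problem. It only demonstrates `quantile()` itself; whether the result can be handed to calcNormFactors is not established in this thread.

```r
# Toy vector of counts with one missing value (invented for illustration)
x <- c(0, 100, 700, NA, 15923)
q <- 0.75

# Default behaviour: quantile() stops with an error when NAs are present,
# so capture the error message instead of halting
default_result <- tryCatch(quantile(x, probs = q),
                           error = function(e) conditionMessage(e))

# With na.rm = TRUE the NA is dropped before the quantile is computed
upper_quartile <- quantile(x, probs = q, na.rm = TRUE)

# Zeros are ordinary values and do not cause the error
zeros_only <- quantile(c(0, 0, 100, 700), probs = q)
```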
>
> cheers
> Alok
>
>
>
>
>
> --
> ************************************************************
> Alok Kumar Srivastava
> Ph.D scholar
> Centre of Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> JNU, New Delhi
> ************************************************************
>
--
-----------------------------------------------------------
Davis J McCarthy
DPhil Candidate
University of Oxford
E: davis.mccarthy at balliol.ox.ac.uk
W: sites.google.com/site/davismcc