edgeR norm.factors NaN


@davenportcolinmh-hannoverde-5408


Dear Bioconductors,

I have an issue with calculating normalisation factors in edgeR. This has always worked fine before (i.e. on three other datasets), which leaves me baffled here.

To summarise:
- NaNs occur independently of the calcNormFactors method
- the counts appear ok
- no NaNs are present in the counts

    virusDGE = calcNormFactors(virusDGE, method="TMM")
    virusDGE = calcNormFactors(virusDGE, method="RLE")
    virusDGE = calcNormFactors(virusDGE, method="upperquartile")

    > virusDGE
    An object of class "DGEList"
    $samples
            group lib.size norm.factors
    counts1   all       17          NaN
    counts2   all        8          NaN
    counts3   all       14          NaN
    counts4   all        4          NaN
    counts5   all    18218          NaN
    counts6   all    37146          NaN
    counts7   all     2579          NaN
    counts8   all     1027          NaN

    $counts
                counts1 counts2 counts3 counts4 counts5 counts6 counts7 counts8
    MuHV1_gp001       0       0       0       0       0       1       0       0
    MuHV1_gp002       0       0       0       0       4       5       0       0
    MuHV1_gp003       0       0       0       0      13      18       3       0
    MuHV1_gp004       0       0       0       0      11       2       3       0
    MuHV1_gp005       0       0       0       0       4       6       2       0

    > is.integer(virusDGE$counts)
    #TRUE
    > is.na(virusDGE$counts)
    #(all are FALSE)
    > sum(is.na(virusDGE$counts))
    #[1] 0

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] edgeR_2.4.6           limma_3.10.3       GenomicFeatures_1.6.9
    [4] AnnotationDbi_1.16.19 Biobase_2.14.0     GenomicRanges_1.6.7
    [7] IRanges_1.12.6

    loaded via a namespace (and not attached):
    [1] biomaRt_2.10.0 Biostrings_2.22.0 BSgenome_1.22.0    DBI_0.2-5
    [5] RCurl_1.91-1   RSQLite_0.11.1    rtracklayer_1.14.4 tools_2.14.1
    [9] XML_3.9-4      zlibbioc_1.0.1

I am using a custom-built annotation, i.e.

    virustxdb=makeTranscriptDb(transcripts, splicings, genes, chrominfo)

It seems to have worked fine so far and counted reads per feature reliably, but could this be the problem?

Thanks for your time,
Colin Davenport

Dr. Colin Davenport
Bioinformatician
Tümmler Group
PFZ S0-7440
Hannover Medical School
Germany
davenport [dot] colin <at> mh-hannover.de
0049 511532-8733

Genomics software available at http://genomics1.mh-hannover.de
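The checks in the question only rule out missing values; they do not show how sparse each library is. A minimal base-R sketch of two extra per-library summaries follows. It assumes only the five genes printed above (the real table is larger, so the numbers are illustrative only), and the names `zero_fraction` and `upper_quartile` are my own, not anything from edgeR.

```r
# First five genes exactly as printed in the post; the full count table has
# more rows, so treat these summaries as illustrative only.
counts <- cbind(
  counts1 = c(0, 0, 0, 0, 0),
  counts2 = c(0, 0, 0, 0, 0),
  counts3 = c(0, 0, 0, 0, 0),
  counts4 = c(0, 0, 0, 0, 0),
  counts5 = c(0, 4, 13, 11, 4),
  counts6 = c(1, 5, 18, 2, 6),
  counts7 = c(0, 0, 3, 3, 2),
  counts8 = c(0, 0, 0, 0, 0)
)
rownames(counts) <- paste0("MuHV1_gp00", 1:5)

# Fraction of genes with a zero count in each library.
zero_fraction <- colMeans(counts == 0)

# Upper quartile (75% quantile) of the raw counts in each library.
upper_quartile <- apply(counts, 2, quantile, probs = 0.75)

print(round(zero_fraction, 2))
print(upper_quartile)
```

On the full table, a library whose upper quartile is zero or whose zero fraction is close to 1 is a likely candidate for the trouble with the normalisation factors.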

Annotation edgeR • 1.7k views

updated 11.8 years ago by Mark Robinson ▴ 880 • written 11.8 years ago by Davenport.Colin@mh-hannover.de ▴ 10


Mark Robinson ▴ 880

@mark-robinson-4908


Hi Colin,

I believe it's too many zeros. Basically, the docs say:

-----
If 'refColumn' is unspecified, the library whose upper quartile is closest to the mean upper quartile is used.
-----

I think this breaks down with your data.

But the major issue you'll need to deal with is that for the first 4 columns of counts, you barely have any! In 'counts4', you have 4 total reads mapped. I've seen early experiments with tens of thousands of total mapped reads, but <20 is surely a mistake. Are you sure this experiment worked, or that your custom annotation has captured the mappings correctly?

Best,
Mark

On 19.07.2012, at 11:02, <davenport.colin at mh-hannover.de> wrote:
> Dear Bioconductors,
> I have an issue with calculating normalisation factors in edgeR. [...]

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland
v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin
----------
http://www.fgcz.ch/Bioconductor2012
http://www.eccb12.org/t5
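To illustrate why every library ends up with NaN rather than just the near-empty ones, here is a minimal base-R sketch. It assumes, as I understand the edgeR documentation, that the factors are rescaled so that they multiply to 1; I have not checked that this is the exact code path in edgeR 2.4.6. The raw factor values and the 1000-read cutoff are made up for illustration; only the library sizes are taken from the post.

```r
# Hypothetical raw factors: suppose one near-empty library gives an
# undefined (NaN) factor while the others are fine.
raw_factors <- c(1.1, 0.9, NaN, 1.0)

# Rescale so the factors multiply to 1 (divide by the geometric mean).
# A single NaN makes the mean NaN, so every rescaled factor becomes NaN.
scaled <- raw_factors / exp(mean(log(raw_factors)))
print(scaled)

# Library sizes as printed in the post.
lib_size <- c(counts1 = 17, counts2 = 8, counts3 = 14, counts4 = 4,
              counts5 = 18218, counts6 = 37146, counts7 = 2579,
              counts8 = 1027)

# Flag libraries above an arbitrary, hypothetical depth cutoff.
min_reads <- 1000
keep <- lib_size >= min_reads
print(names(lib_size)[keep])
```

Whether dropping the flagged libraries is appropriate depends on why they are so shallow, which is the question raised above about the experiment and the custom annotation.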

11.8 years ago Mark Robinson ▴ 880
