re incomplete analysis in Deseq

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

I'm using deseq with 454 data and it worked for one set of data but the same script is failing me the second time around with a different set of experimental data. The input data is a matrix of counts of 454 seqs per sample, I have 36 pre and 36 post samples. When I run the estmateSizeFactors I get all my samples as NA. Any ideas why? -- output of sessionInfo(): > cds <- estimateSizeFactors(cds) # Estimates size factors based on the count data > sizeFactors( cds ) X01_MA_1 X02_MA_10 X03_MA_100 X04_MA_102 X05_MA_11 X06_MA_13 X07_MA_14 X08_MA_15 X09_MA_17 X10_MA_18 X11_MA_19 NA NA NA NA NA NA NA NA NA NA NA X12_MA_2 X13_MA_20 X14_MA_22 X15_MA_23 X16_MA_24 X17_MA_25 X18_MA_4 X19_MA_47 X20_MA_5 X21_MA_69 X22_MA_7 NA NA NA NA NA NA NA NA NA NA NA X23_MA_71 X24_MA_73 X25_MA_75 X26_MA_77 X27_MA_79 X28_MA_8 X29_MA_81 X30_MA_83 X31_MA_86 X32_MA_88 X33_MA_9 NA NA NA NA NA NA NA NA NA NA NA X34_MA_90 X35_MA_92 X36_MA_94 X37_MA_96 X38_MA_98 X39_MA_101 X40_MA_103 X41_MA_26 X42_MA_27 X43_MA_29 X44_MA_30 NA NA NA NA NA NA NA NA NA NA NA X45_MA_31 X46_MA_33 X47_MA_34 X48_MA_36 X49_MA_37 X50_MA_40 X51_MA_41 X52_MA_42 X53_MA_43 X54_MA_44 X55_MA_45 NA NA NA NA NA NA NA NA NA NA NA X56_MA_46 X57_MA_49 X58_MA_50 X59_MA_52 X60_MA_54 X61_MA_55 X62_MA_70 X63_MA_72 X64_MA_74 X65_MA_76 X66_MA_78 NA NA NA NA NA NA NA NA NA NA NA X67_MA_80 X68_MA_82 X69_MA_84 X70_MA_87 X71_MA_89 X72_MA_91 X73_MA_93 X74_MA_95 X75_MA_97 X76_MA_99 NA NA NA NA NA NA NA NA NA NA -- Sent via the guest posting facility at bioconductor.org.

DESeq DESeq • 1.5k views

ADD COMMENT • link updated 12.1 years ago by Wolfgang Huber ★ 13k • written 12.1 years ago by Guest User ★ 13k

0

Entering edit mode

Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 9 months ago

United States

My guess is that you have NA's in your count matrix. Kasper On Mon, Mar 5, 2012 at 5:50 AM, Julian [guest] <guest at="" bioconductor.org=""> wrote: > > I'm using deseq with 454 data and it worked for one set of data but the same script is failing me the second time around with a different set of experimental data. > > The input data is a matrix of counts of 454 seqs per sample, I have 36 pre and 36 post samples. > > When I run the estmateSizeFactors I get all my samples as NA. > > Any ideas why? > > ?-- output of sessionInfo(): > > >> cds <- estimateSizeFactors(cds) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Estimates size factors based on the count data >> sizeFactors( cds ) > ?X01_MA_1 ?X02_MA_10 X03_MA_100 X04_MA_102 ?X05_MA_11 ?X06_MA_13 ?X07_MA_14 ?X08_MA_15 ?X09_MA_17 ?X10_MA_18 ?X11_MA_19 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > ?X12_MA_2 ?X13_MA_20 ?X14_MA_22 ?X15_MA_23 ?X16_MA_24 ?X17_MA_25 ? X18_MA_4 ?X19_MA_47 ? X20_MA_5 ?X21_MA_69 ? X22_MA_7 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > ?X23_MA_71 ?X24_MA_73 ?X25_MA_75 ?X26_MA_77 ?X27_MA_79 ? X28_MA_8 ?X29_MA_81 ?X30_MA_83 ?X31_MA_86 ?X32_MA_88 ? X33_MA_9 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > ?X34_MA_90 ?X35_MA_92 ?X36_MA_94 ?X37_MA_96 ?X38_MA_98 X39_MA_101 X40_MA_103 ?X41_MA_26 ?X42_MA_27 ?X43_MA_29 ?X44_MA_30 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > ?X45_MA_31 ?X46_MA_33 ?X47_MA_34 ?X48_MA_36 ?X49_MA_37 ?X50_MA_40 ?X51_MA_41 ?X52_MA_42 ?X53_MA_43 ?X54_MA_44 ?X55_MA_45 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > ?X56_MA_46 ?X57_MA_49 ?X58_MA_50 ?X59_MA_52 ?X60_MA_54 ?X61_MA_55 ?X62_MA_70 ?X63_MA_72 ?X64_MA_74 ?X65_MA_76 ?X66_MA_78 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > ?X67_MA_80 ?X68_MA_82 ?X69_MA_84 ?X70_MA_87 ?X71_MA_89 ?X72_MA_91 ?X73_MA_93 ?X74_MA_95 ?X75_MA_97 ?X76_MA_99 > ? ? ? ?NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.1 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 11 days ago

EMBL European Molecular Biology Laborat…

Dear Julian do your data contain NA? What is the output of sessionInfo()? Can you provide a code example (incl. input data, potentially subset) that reproduces your error? Best wishes Wolfgang Mar/5/12 11:50 AM, Julian [guest] scripsit:: > > I'm using deseq with 454 data and it worked for one set of data but the same script is failing me the second time around with a different set of experimental data. > > The input data is a matrix of counts of 454 seqs per sample, I have 36 pre and 36 post samples. > > When I run the estmateSizeFactors I get all my samples as NA. > > Any ideas why? > > -- output of sessionInfo(): > > >> cds<- estimateSizeFactors(cds) # Estimates size factors based on the count data >> sizeFactors( cds ) > X01_MA_1 X02_MA_10 X03_MA_100 X04_MA_102 X05_MA_11 X06_MA_13 X07_MA_14 X08_MA_15 X09_MA_17 X10_MA_18 X11_MA_19 > NA NA NA NA NA NA NA NA NA NA NA > X12_MA_2 X13_MA_20 X14_MA_22 X15_MA_23 X16_MA_24 X17_MA_25 X18_MA_4 X19_MA_47 X20_MA_5 X21_MA_69 X22_MA_7 > NA NA NA NA NA NA NA NA NA NA NA > X23_MA_71 X24_MA_73 X25_MA_75 X26_MA_77 X27_MA_79 X28_MA_8 X29_MA_81 X30_MA_83 X31_MA_86 X32_MA_88 X33_MA_9 > NA NA NA NA NA NA NA NA NA NA NA > X34_MA_90 X35_MA_92 X36_MA_94 X37_MA_96 X38_MA_98 X39_MA_101 X40_MA_103 X41_MA_26 X42_MA_27 X43_MA_29 X44_MA_30 > NA NA NA NA NA NA NA NA NA NA NA > X45_MA_31 X46_MA_33 X47_MA_34 X48_MA_36 X49_MA_37 X50_MA_40 X51_MA_41 X52_MA_42 X53_MA_43 X54_MA_44 X55_MA_45 > NA NA NA NA NA NA NA NA NA NA NA > X56_MA_46 X57_MA_49 X58_MA_50 X59_MA_52 X60_MA_54 X61_MA_55 X62_MA_70 X63_MA_72 X64_MA_74 X65_MA_76 X66_MA_78 > NA NA NA NA NA NA NA NA NA NA NA > X67_MA_80 X68_MA_82 X69_MA_84 X70_MA_87 X71_MA_89 X72_MA_91 X73_MA_93 X74_MA_95 X75_MA_97 X76_MA_99 > NA NA NA NA NA NA NA NA NA NA > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Best wishes Wolfgang Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber

ADD COMMENT • link 12.1 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

I got the data from Julian. It is not a standard RNA-seq experiment. He only has 150 "genes" (rows), but 76 samples (columns). There are no NA's. The problem arises from the fact that estimateSizeFactorsForMatrix only uses rows (genes) where all samples have counts > 0. In Julian's case, all rows have at least one column with zero counts, implying that loggeomeans is -Inf for all rows. Kasper On Mon, Mar 5, 2012 at 10:59 AM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > Dear Julian > > do your data contain NA? > What is the output of sessionInfo()? > > Can you provide a code example (incl. input data, potentially subset) that > reproduces your error? > > ? ? ? ?Best wishes > ? ? ? ?Wolfgang > > > Mar/5/12 11:50 AM, Julian [guest] scripsit:: > >> >> I'm using deseq with 454 data and it worked for one set of data but the >> same script is failing me the second time around with a different set of >> experimental data. >> >> The input data is a matrix of counts of 454 seqs per sample, I have 36 pre >> and 36 post samples. >> >> When I run the estmateSizeFactors I get all my samples as NA. >> >> Any ideas why? >> >> ?-- output of sessionInfo(): >> >> >>> cds<- estimateSizeFactors(cds) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # Estimates >>> size factors based on the count data >>> sizeFactors( cds ) >> >> ? X01_MA_1 ?X02_MA_10 X03_MA_100 X04_MA_102 ?X05_MA_11 ?X06_MA_13 >> ?X07_MA_14 ?X08_MA_15 ?X09_MA_17 ?X10_MA_18 ?X11_MA_19 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> ? X12_MA_2 ?X13_MA_20 ?X14_MA_22 ?X15_MA_23 ?X16_MA_24 ?X17_MA_25 >> X18_MA_4 ?X19_MA_47 ? X20_MA_5 ?X21_MA_69 ? X22_MA_7 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> ?X23_MA_71 ?X24_MA_73 ?X25_MA_75 ?X26_MA_77 ?X27_MA_79 ? X28_MA_8 >> ?X29_MA_81 ?X30_MA_83 ?X31_MA_86 ?X32_MA_88 ? X33_MA_9 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> ?X34_MA_90 ?X35_MA_92 ?X36_MA_94 ?X37_MA_96 ?X38_MA_98 X39_MA_101 >> X40_MA_103 ?X41_MA_26 ?X42_MA_27 ?X43_MA_29 ?X44_MA_30 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> ?X45_MA_31 ?X46_MA_33 ?X47_MA_34 ?X48_MA_36 ?X49_MA_37 ?X50_MA_40 >> ?X51_MA_41 ?X52_MA_42 ?X53_MA_43 ?X54_MA_44 ?X55_MA_45 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> ?X56_MA_46 ?X57_MA_49 ?X58_MA_50 ?X59_MA_52 ?X60_MA_54 ?X61_MA_55 >> ?X62_MA_70 ?X63_MA_72 ?X64_MA_74 ?X65_MA_76 ?X66_MA_78 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> ?X67_MA_80 ?X68_MA_82 ?X69_MA_84 ?X70_MA_87 ?X71_MA_89 ?X72_MA_91 >> ?X73_MA_93 ?X74_MA_95 ?X75_MA_97 ?X76_MA_99 >> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > Best wishes > ? ? ? ?Wolfgang > > Wolfgang Huber > EMBL > http://www.embl.de/research/units/genome_biology/huber > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.1 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Hi Julian, Kasper in this case you will need to (and as I am sure, be able to) come up with another method for estimating size factors that works on these data. Why not try with the 'colSums'? Best wishes Wolfgang Mar/6/12 3:25 PM, Kasper Daniel Hansen scripsit:: > I got the data from Julian. It is not a standard RNA-seq experiment. > He only has 150 "genes" (rows), but 76 samples (columns). There are > no NA's. The problem arises from the fact that > estimateSizeFactorsForMatrix only uses rows (genes) where all samples > have counts> 0. In Julian's case, all rows have at least one column > with zero counts, implying that loggeomeans is -Inf for all rows. > > Kasper > > On Mon, Mar 5, 2012 at 10:59 AM, Wolfgang Huber<whuber at="" embl.de=""> wrote: >> Dear Julian >> >> do your data contain NA? >> What is the output of sessionInfo()? >> >> Can you provide a code example (incl. input data, potentially subset) that >> reproduces your error? >> >> Best wishes >> Wolfgang >> >> >> Mar/5/12 11:50 AM, Julian [guest] scripsit:: >> >>> >>> I'm using deseq with 454 data and it worked for one set of data but the >>> same script is failing me the second time around with a different set of >>> experimental data. >>> >>> The input data is a matrix of counts of 454 seqs per sample, I have 36 pre >>> and 36 post samples. >>> >>> When I run the estmateSizeFactors I get all my samples as NA. >>> >>> Any ideas why? >>> >>> -- output of sessionInfo(): >>> >>> >>>> cds<- estimateSizeFactors(cds) # Estimates >>>> size factors based on the count data >>>> sizeFactors( cds ) >>> >>> X01_MA_1 X02_MA_10 X03_MA_100 X04_MA_102 X05_MA_11 X06_MA_13 >>> X07_MA_14 X08_MA_15 X09_MA_17 X10_MA_18 X11_MA_19 >>> NA NA NA NA NA NA >>> NA NA NA NA NA >>> X12_MA_2 X13_MA_20 X14_MA_22 X15_MA_23 X16_MA_24 X17_MA_25 >>> X18_MA_4 X19_MA_47 X20_MA_5 X21_MA_69 X22_MA_7 >>> NA NA NA NA NA NA >>> NA NA NA NA NA >>> X23_MA_71 X24_MA_73 X25_MA_75 X26_MA_77 X27_MA_79 X28_MA_8 >>> X29_MA_81 X30_MA_83 X31_MA_86 X32_MA_88 X33_MA_9 >>> NA NA NA NA NA NA >>> NA NA NA NA NA >>> X34_MA_90 X35_MA_92 X36_MA_94 X37_MA_96 X38_MA_98 X39_MA_101 >>> X40_MA_103 X41_MA_26 X42_MA_27 X43_MA_29 X44_MA_30 >>> NA NA NA NA NA NA >>> NA NA NA NA NA >>> X45_MA_31 X46_MA_33 X47_MA_34 X48_MA_36 X49_MA_37 X50_MA_40 >>> X51_MA_41 X52_MA_42 X53_MA_43 X54_MA_44 X55_MA_45 >>> NA NA NA NA NA NA >>> NA NA NA NA NA >>> X56_MA_46 X57_MA_49 X58_MA_50 X59_MA_52 X60_MA_54 X61_MA_55 >>> X62_MA_70 X63_MA_72 X64_MA_74 X65_MA_76 X66_MA_78 >>> NA NA NA NA NA NA >>> NA NA NA NA NA >>> X67_MA_80 X68_MA_82 X69_MA_84 X70_MA_87 X71_MA_89 X72_MA_91 >>> X73_MA_93 X74_MA_95 X75_MA_97 X76_MA_99 >>> NA NA NA NA NA NA >>> NA NA NA NA >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> -- >> Best wishes >> Wolfgang >> >> Wolfgang Huber >> EMBL >> http://www.embl.de/research/units/genome_biology/huber >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor -- Best wishes Wolfgang Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber

ADD REPLY • link 12.1 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

On Tue, Mar 6, 2012 at 3:28 PM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > > Hi Julian, Kasper > > in this case you will need to (and as I am sure, be able to) come up with > another method for estimating size factors that works on these data. Why not > try with the 'colSums'? Hey, I am just reporting. It might be worthwhile adding some check in estimateSizeFactorsForMatrix to at least warn about the case where the number of non-infinity rows is small, say less than 100. This could be an issue with data types where the number of features is small, eg. small RNAs or other types. I am sure people are using DESeq for these types as well. Kasper > > ? ? ? ?Best wishes > ? ? ? ?Wolfgang > > Mar/6/12 3:25 PM, Kasper Daniel Hansen scripsit:: > >> I got the data from Julian. ?It is not a standard RNA-seq experiment. >> He only has 150 "genes" (rows), but 76 samples (columns). ?There are >> no NA's. ?The problem arises from the fact that >> estimateSizeFactorsForMatrix only uses rows (genes) where all samples >> have counts> ?0. ?In Julian's case, all rows have at least one column >> with zero counts, implying that loggeomeans is -Inf for all rows. >> >> Kasper >> >> On Mon, Mar 5, 2012 at 10:59 AM, Wolfgang Huber<whuber at="" embl.de=""> ?wrote: >>> >>> Dear Julian >>> >>> do your data contain NA? >>> What is the output of sessionInfo()? >>> >>> Can you provide a code example (incl. input data, potentially subset) >>> that >>> reproduces your error? >>> >>> ? ? ? ?Best wishes >>> ? ? ? ?Wolfgang >>> >>> >>> Mar/5/12 11:50 AM, Julian [guest] scripsit:: >>> >>>> >>>> I'm using deseq with 454 data and it worked for one set of data but the >>>> same script is failing me the second time around with a different set of >>>> experimental data. >>>> >>>> The input data is a matrix of counts of 454 seqs per sample, I have 36 >>>> pre >>>> and 36 post samples. >>>> >>>> When I run the estmateSizeFactors I get all my samples as NA. >>>> >>>> Any ideas why? >>>> >>>> ?-- output of sessionInfo(): >>>> >>>> >>>>> cds<- estimateSizeFactors(cds) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # >>>>> Estimates >>>>> size factors based on the count data >>>>> sizeFactors( cds ) >>>> >>>> >>>> ? X01_MA_1 ?X02_MA_10 X03_MA_100 X04_MA_102 ?X05_MA_11 ?X06_MA_13 >>>> ?X07_MA_14 ?X08_MA_15 ?X09_MA_17 ?X10_MA_18 ?X11_MA_19 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> ? X12_MA_2 ?X13_MA_20 ?X14_MA_22 ?X15_MA_23 ?X16_MA_24 ?X17_MA_25 >>>> X18_MA_4 ?X19_MA_47 ? X20_MA_5 ?X21_MA_69 ? X22_MA_7 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> ?X23_MA_71 ?X24_MA_73 ?X25_MA_75 ?X26_MA_77 ?X27_MA_79 ? X28_MA_8 >>>> ?X29_MA_81 ?X30_MA_83 ?X31_MA_86 ?X32_MA_88 ? X33_MA_9 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> ?X34_MA_90 ?X35_MA_92 ?X36_MA_94 ?X37_MA_96 ?X38_MA_98 X39_MA_101 >>>> X40_MA_103 ?X41_MA_26 ?X42_MA_27 ?X43_MA_29 ?X44_MA_30 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> ?X45_MA_31 ?X46_MA_33 ?X47_MA_34 ?X48_MA_36 ?X49_MA_37 ?X50_MA_40 >>>> ?X51_MA_41 ?X52_MA_42 ?X53_MA_43 ?X54_MA_44 ?X55_MA_45 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> ?X56_MA_46 ?X57_MA_49 ?X58_MA_50 ?X59_MA_52 ?X60_MA_54 ?X61_MA_55 >>>> ?X62_MA_70 ?X63_MA_72 ?X64_MA_74 ?X65_MA_76 ?X66_MA_78 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> ?X67_MA_80 ?X68_MA_82 ?X69_MA_84 ?X70_MA_87 ?X71_MA_89 ?X72_MA_91 >>>> ?X73_MA_93 ?X74_MA_95 ?X75_MA_97 ?X76_MA_99 >>>> ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> NA ? ? ? ? NA ? ? ? ? NA ? ? ? ? NA >>>> >>>> >>>> -- >>>> Sent via the guest posting facility at bioconductor.org. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> >>> >>> -- >>> Best wishes >>> ? ? ? ?Wolfgang >>> >>> Wolfgang Huber >>> EMBL >>> http://www.embl.de/research/units/genome_biology/huber >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > Best wishes > ? ? ? ?Wolfgang > > Wolfgang Huber > EMBL > http://www.embl.de/research/units/genome_biology/huber > >

ADD REPLY • link 12.1 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Hi, On Tue, Mar 6, 2012 at 3:28 PM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > > Hi Julian, Kasper > > in this case you will need to (and as I am sure, be able to) come up with > another method for estimating size factors that works on these data. Why not > try with the 'colSums'? Couldn't we just-as-well adapt the estimateSizesFactorsForMatrix function to step over the (row,col) bins that have the 0 counts instead of skipping over rows that only have 1 0 element? Something like so: ######### esfm <- function(counts, locfunc=median) { ## Can't use info from rows that are all 0, so axe them counts <- counts[rowSums(counts) > 0,] logc <- log(counts) loggeomeans <- apply(logc, 1, function(x) { use <- is.finite(x) sum(x[use]) / sum(use) }) apply(logc, 2, function(lc) { use <- is.finite(lc) exp(locfunc(lc[use] - loggeomeans[use])) }) } ########## ?mas o menos? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 12.1 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Or, perhaps a bit better. The first check for which rows to keep would be something like: esfm <- function(counts, locfunc=median, min.obs=3) { counts <- counts[rowSums(counts > 0) > min.obs,] ## Add checks to bail on situation when nrow(counts) ## is too small ## ... ## code from below } > ######### > esfm <- function(counts, locfunc=median) { > ?## Can't use info from rows that are all 0, so axe them > ?counts <- counts[rowSums(counts) > 0,] > > ?logc <- log(counts) > ?loggeomeans <- apply(logc, 1, function(x) { > ? ?use <- is.finite(x) > ? ?sum(x[use]) / sum(use) > ?}) > > ?apply(logc, 2, function(lc) { > ? ?use <- is.finite(lc) > ? ?exp(locfunc(lc[use] - loggeomeans[use])) > ?}) > } > ########## > > ?mas o menos? > > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > ?| Memorial Sloan-Kettering Cancer Center > ?| Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 12.1 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi On 2012-03-06 22:21, Steve Lianoglou wrote: > Couldn't we just-as-well adapt the estimateSizesFactorsForMatrix > function to step over the (row,col) bins that have the 0 counts > instead of skipping over rows that only have 1 0 element? This might cause a bit of a bias, as we censor data which might pull down data. As a compromise, I am tempted to suggest adding a pseudocount, e.g., to add something like 0.1 to all values to avoid the zeroes. This also causes some bias but a less severe one. Of course, I now pulled the value 0.1 out of thin air, and cannot quite come up with a good justification for it. Simon

ADD REPLY • link 12.1 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Hi Simon, On Tue, Mar 6, 2012 at 5:38 PM, Simon Anders <anders at="" embl.de=""> wrote: > Hi > > > On 2012-03-06 22:21, Steve Lianoglou wrote: >> >> Couldn't we just-as-well adapt the estimateSizesFactorsForMatrix >> function to step over the (row,col) bins that have the 0 counts >> instead of skipping over rows that only have 1 0 element? > > > This might cause a bit of a bias, as we censor data which might pull down > data. I'm curious if you could elaborate a bit -- maybe give a toy example of what you mean? I'm having a hard time parsing that sentence :-) I guess (maybe) you're talking about a scenario where bin A has a super-high read count in expt1 but bin A has a 0 read count in expt2, then this number will essentially be skipped when calculating the size factors? And I guess there might be some pathological datasets where there are many such events? Or, am I jumping down the wrong rabbit hole, here? -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 12.1 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Simon Anders scripsit 03/06/2012 11:38 PM: > Hi > > On 2012-03-06 22:21, Steve Lianoglou wrote: >> Couldn't we just-as-well adapt the estimateSizesFactorsForMatrix >> function to step over the (row,col) bins that have the 0 counts >> instead of skipping over rows that only have 1 0 element? > > This might cause a bit of a bias, as we censor data which might pull > down data. As a compromise, I am tempted to suggest adding a > pseudocount, e.g., to add something like 0.1 to all values to avoid the > zeroes. This also causes some bias but a less severe one. Of course, I > now pulled the value 0.1 out of thin air, and cannot quite come up with > a good justification for it. That reminds me of the development of 'vsn' many years ago, where similar considerations suggested that one should do the estimation of the normalisation parameters and of the across-array error model (i.e. variance stabilising transformation) simultaneously; whereas currently in DESeq, this is done as a two-step process, and somewhat more heuristically. This could be a nice master thesis for someone... Best wishes Wolfgang Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber

ADD REPLY • link 12.1 years ago Wolfgang Huber ★ 13k

Login before adding your answer.