Question

edgeR: what to do with no replicates

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Bogdan,

The advice in the edgeR User's Guide about using estimateGLMCommonDisp() with method="deviance", robust="TRUE", subset=NULL is intended to be used without a design matrix, or in conjunction with the immediately preceding advice about removing terms from the design matrix.

I have now rewritten the advice in the User's Guide a little to make this more explicit.

Best wishes
Gordon
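
To illustrate why the design matrix matters here, the following is a minimal base-R sketch, not taken from the User's Guide. It assumes the two-sample layout from the quoted session below and counts the residual degrees of freedom as the number of samples minus the rank of the design matrix. The helper `residual_df` and the object `design0` are hypothetical names introduced only for this illustration.

```r
# Two unreplicated samples, as in the quoted session.
group <- factor(c("Y", "S"))

# Full design: intercept plus group effect, i.e. one coefficient per sample.
design <- model.matrix(~group)

# Reduced design: intercept only (the group term has been removed).
design0 <- matrix(1, nrow = length(group), ncol = 1,
                  dimnames = list(NULL, "(Intercept)"))

# Residual df = number of samples minus rank of the design matrix.
# residual_df is a hypothetical helper written for this illustration.
residual_df <- function(X) nrow(X) - qr(X)$rank

df_full <- residual_df(design)      # nothing left to estimate dispersion from
df_reduced <- residual_df(design0)  # one df becomes available

print(c(full = df_full, reduced = df_reduced))
```

With the full design the two coefficients use up both samples, which is consistent with the "No residual df" warning in the session below. Under the advice above, the call would presumably be made without the design argument, for example `estimateGLMCommonDisp(d, method="deviance", robust=TRUE, subset=NULL)`, or with a reduced design such as `design0`. Any dispersion obtained this way treats the group difference as noise, so it should be read as a rough working value rather than a real estimate of biological variation.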

> Date: Mon, 21 Nov 2011 12:41:05 -0800
> From: Bogdan Tanasa <tanasa at gmail.com>
> To: Mark Robinson <mark.robinson at imls.uzh.ch>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] edgeR
>
> Hi Mark,
>
> I have two samples "Y" and "S" (with no replicates), and following edgeR
> manual on "what to do if you do not have replicates", I am getting the
> following results below. The end message is "No residual df: cannot
> estimate dispersion". Please could you let me know what is going wrong ..
> Thanks a lot !
>
> > raw.data <- read.delim("file")
> > d <- raw.data[,2:3]
> > rownames(d) <- raw.data[,1]
>
> > group <- factor(c("Y","S"))
> > design <- model.matrix(~group)
> > d <- DGEList(counts = d, group=group)
> Calculating library sizes from column totals.
> > dim(d)
> [1] 4013 2
> > d <- calcNormFactors(d)
> > d
> An object of class "DGEList"
> $samples
>   group lib.size norm.factors
> Y     Y   628062     1.049011
> S     S   422542     0.953279
>
> $counts
>                          Y   S
> chr8:53670099-53691880 132 109
> chr8:71033673-71122126 221 107
> chr8:74069636-74074658   7   1
> chr6:72172478-72187779 203 114
> chr6:72096548-72115900  36  15
> 4008 more rows ...
>
> $all.zeros
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
>                  FALSE                  FALSE                  FALSE
> chr6:72172478-72187779 chr6:72096548-72115900
>                  FALSE                  FALSE
> 4008 more elements ...
>
> > d<-estimateGLMCommonDisp(d,design,method="deviance",robust="TRUE",subset=NULL)
> Warning message:
> In estimateGLMCommonDisp.default(y = y$counts, design = design, :
>   No residual df: cannot estimate dispersion
>
> For Common Dispersion (no GLM modeling):
>
> > d<-estimateCommonDisp(d,design)
> Warning message:
> In estimateCommonDisp(d, design) :
>   There is no replication. Setting common dispersion to 0.
> > d
> An object of class "DGEList"
> $samples
>   group lib.size norm.factors
> Y     Y   628062     1.049011
> S     S   422542     0.953279
>
> $counts
>                          Y   S
> chr8:53670099-53691880 132 109
> chr8:71033673-71122126 221 107
> chr8:74069636-74074658   7   1
> chr6:72172478-72187779 203 114
> chr6:72096548-72115900  36  15
> 4008 more rows ...
>
> $all.zeros
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
>                  FALSE                  FALSE                  FALSE
> chr6:72172478-72187779 chr6:72096548-72115900
>                  FALSE                  FALSE
> 4008 more elements ...
>
> $common.dispersion
> [1] 1e-16
>
> $pseudo.alt
>                                 Y         S
> chr8:53670099-53691880 103.192083 139.42506
> chr8:71033673-71122126 172.781586 136.86720
> chr8:74069636-74074658   5.453794   1.30187
> chr6:72172478-72187779 158.707305 145.81970
> chr6:72096548-72115900  28.129216  19.20588
> 4008 more rows ...
>
> $conc
> $conc.common
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
>           2.270073e-04           3.089534e-04           7.535477e-06
> chr6:72172478-72187779 chr6:72096548-72115900
>           2.985930e-04           4.803864e-05
> 4008 more elements ...
>
> $conc.group
>                                   S            Y
> chr8:53670099-53691880 2.706055e-04 2.003510e-04
> chr8:71033673-71122126 2.656403e-04 3.354361e-04
> chr8:74069636-74074658 2.482619e-06 1.062467e-05
> chr6:72172478-72187779 2.830186e-04 3.081155e-04
> chr6:72096548-72115900 3.723929e-05 5.464117e-05
> 4008 more rows ...
>
>
> $common.lib.size
> $pseudo
>                                 Y         S
> chr8:53670099-53691880 103.192083 139.42506
> chr8:71033673-71122126 172.781586 136.86720
> chr8:74069636-74074658   5.453794   1.30187
> chr6:72172478-72187779 158.707305 145.81970
> chr6:72096548-72115900  28.129216  19.20588
> 4008 more rows ...
>
> $conc
> $conc.common
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
>           2.270073e-04           3.089534e-04           7.535477e-06
> chr6:72172478-72187779 chr6:72096548-72115900
>           2.985930e-04           4.803864e-05
> 4008 more elements ...
>
> $conc.group
>                                   S            Y
> chr8:53670099-53691880 2.706055e-04 2.003510e-04
> chr8:71033673-71122126 2.656403e-04 3.354361e-04
> chr8:74069636-74074658 2.482619e-06 1.062467e-05
> chr6:72172478-72187779 2.830186e-04 3.081155e-04
> chr6:72096548-72115900 3.723929e-05 5.464117e-05
> 4008 more rows ...
>
>
> $N
> [1] 515153
>
>
> On Fri, Nov 18, 2011 at 12:39 PM, Mark Robinson
> <mark.robinson at imls.uzh.ch> wrote:
>
>> On 18.11.2011, at 21:27, Bogdan Tanasa wrote:
>>
>>> Dear all,
>>>
>>> are the functions "estimateGLMCommonDisp" and "estimateGLMTagwiseDisp"
>>> available in edgeR on any platform ? I am using it on Linux/Ubuntu and
>>> apparently, these functions are not available.
>>
>> What version of R/edgeR are you using? Perhaps you have an old version?
>> These functions have been around for a while. What does sessionInfo() give?
>>
>> Mark

edgeR • 9.8k views

14.1 years ago • updated 3.6 years ago • Gordon Smyth 53k

score 0 · Answer 1 · 2011-11-22

Dear Gordon,

thanks a lot, it would really help us. Just one small additional question: does an edgeR analysis of samples with no replicates have the same statistical validity for ChIP-seq, GRO-seq, and RNA-seq data?

Also, suppose we are dealing with, for instance, 20,000 locations in the genome (e.g. Alu repeats) that have a very small number of tags, or very small differences in tag counts between samples (say 1 tag in the "-" sample and 2-3 tags in the "+" sample, or 5-6 tags in the "-" sample and 4-5 tags in the "+" sample). Is there an appropriate statistical model we can use for assessing differential expression? What would you suggest?

thanks,

Bogdan

On Tue, Nov 22, 2011 at 4:20 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote:

> Dear Bogdan,
>
> The advice in the edgeR User's Guide about using estimateGLMCommonDisp()
> with method="deviance",robust="TRUE",subset=NULL is intended to be used
> without a design matrix, or in conjunction with the immediately preceding
> advice about removing terms from the design matrix.
>
> I have now rewritten the advice in the User's Guide a little to make this
> more explicit.
>
> Best wishes
> Gordon