Entering edit mode
Dear Bogdan,
The advice in the edgeR User's Guide about using estimateGLMCommonDisp()
with method="deviance"
, robust="TRUE"
, subset=NULL
is intended to be used
without a design matrix, or in conjunction with the immediately preceding
advice about removing terms from the design matrix.
I have now rewritten the advice in the User's Guide a little to make this more explicit.
Best wishes
Gordon
> Date: Mon, 21 Nov 2011 12:41:05 -0800
> From: Bogdan Tanasa <tanasa at gmail.com>
> To: Mark Robinson <mark.robinson at imls.uzh.ch>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] edgeR
>
> Hi Mark,
>
> I have two samples "Y" and "S" (with no replicates), and following edgeR
> manual on "what to do if you do not have replicates", I am getting the
> following results below. The end message is "No residual df: cannot
> estimate dispersion". Please could you let me know what is going wrong ..
> Thanks a lot !
>
>> raw.data <- read.delim("file")
>> d <- raw.data[,2:3]
>> rownames(d) <- raw.data[,1]
>
>> group <- factor(c("Y","S"))
>> design <- model.matrix(~group)
>> d <- DGEList(counts = d, group=group)
> Calculating library sizes from column totals.
>> dim(d)
> [1] 4013 2
>> d <- calcNormFactors(d)
>> d
> An object of class "DGEList"
> $samples
> group lib.size norm.factors
> Y Y 628062 1.049011
> S S 422542 0.953279
>
> $counts
> Y S
> chr8:53670099-53691880 132 109
> chr8:71033673-71122126 221 107
> chr8:74069636-74074658 7 1
> chr6:72172478-72187779 203 114
> chr6:72096548-72115900 36 15
> 4008 more rows ...
>
> $all.zeros
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
> FALSE FALSE FALSE
> chr6:72172478-72187779 chr6:72096548-72115900
> FALSE FALSE
> 4008 more elements ...
>
>>
> d<-estimateGLMCommonDisp(d,design,method="deviance",robust="TRUE",subset=NULL)
> Warning message:
> In estimateGLMCommonDisp.default(y = y$counts, design = design, :
> No residual df: cannot estimate dispersion
>
> For Common Dispersion (no GLM modeling):
>
>> d<-estimateCommonDisp(d,design)
> Warning message:
> In estimateCommonDisp(d, design) :
> There is no replication. Setting common dispersion to 0.
>> d
> An object of class "DGEList"
> $samples
> group lib.size norm.factors
> Y Y 628062 1.049011
> S S 422542 0.953279
>
> $counts
> Y S
> chr8:53670099-53691880 132 109
> chr8:71033673-71122126 221 107
> chr8:74069636-74074658 7 1
> chr6:72172478-72187779 203 114
> chr6:72096548-72115900 36 15
> 4008 more rows ...
>
> $all.zeros
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
> FALSE FALSE FALSE
> chr6:72172478-72187779 chr6:72096548-72115900
> FALSE FALSE
> 4008 more elements ...
>
> $common.dispersion
> [1] 1e-16
>
> $pseudo.alt
> Y S
> chr8:53670099-53691880 103.192083 139.42506
> chr8:71033673-71122126 172.781586 136.86720
> chr8:74069636-74074658 5.453794 1.30187
> chr6:72172478-72187779 158.707305 145.81970
> chr6:72096548-72115900 28.129216 19.20588
> 4008 more rows ...
>
> $conc
> $conc.common
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
> 2.270073e-04 3.089534e-04 7.535477e-06
> chr6:72172478-72187779 chr6:72096548-72115900
> 2.985930e-04 4.803864e-05
> 4008 more elements ...
>
> $conc.group
> S Y
> chr8:53670099-53691880 2.706055e-04 2.003510e-04
> chr8:71033673-71122126 2.656403e-04 3.354361e-04
> chr8:74069636-74074658 2.482619e-06 1.062467e-05
> chr6:72172478-72187779 2.830186e-04 3.081155e-04
> chr6:72096548-72115900 3.723929e-05 5.464117e-05
> 4008 more rows ...
>
>
> $common.lib.size
> $pseudo
> Y S
> chr8:53670099-53691880 103.192083 139.42506
> chr8:71033673-71122126 172.781586 136.86720
> chr8:74069636-74074658 5.453794 1.30187
> chr6:72172478-72187779 158.707305 145.81970
> chr6:72096548-72115900 28.129216 19.20588
> 4008 more rows ...
>
> $conc
> $conc.common
> chr8:53670099-53691880 chr8:71033673-71122126 chr8:74069636-74074658
> 2.270073e-04 3.089534e-04 7.535477e-06
> chr6:72172478-72187779 chr6:72096548-72115900
> 2.985930e-04 4.803864e-05
> 4008 more elements ...
>
> $conc.group
> S Y
> chr8:53670099-53691880 2.706055e-04 2.003510e-04
> chr8:71033673-71122126 2.656403e-04 3.354361e-04
> chr8:74069636-74074658 2.482619e-06 1.062467e-05
> chr6:72172478-72187779 2.830186e-04 3.081155e-04
> chr6:72096548-72115900 3.723929e-05 5.464117e-05
> 4008 more rows ...
>
>
> $N
> [1] 515153
>
>
> On Fri, Nov 18, 2011 at 12:39 PM, Mark Robinson
> <mark.robinson at imls.uzh.ch>wrote:
>
>>
>>
>> On 18.11.2011, at 21:27, Bogdan Tanasa wrote:
>>
>>> Dear all,
>>>
>>> are the functions "estimateGLMCommonDisp" and
"estimateGLMTagwiseDisp"
>>> available in edgeR on any platform ? I am using it on Linux/Ubuntu
and
>>> apparently, these functiosn are not available.
>>
>>
>> What version of R/edgeR are you using? Perhaps you have an old
version?
>> These functions have been around for awhile. What does
sessionInfo() give?
>>
>> Mark
Dear Bogdan,
What I suggest is biological replicates!
The edgeR analysis without replicates (with
robust=TRUE
) is just our attempt to offer something better than nothing in a bad situation. We haven't published this methodology in a refereed journal and we don't use it in our own biological research. It hasn't been tested extensively, so you should use at your own risk. How it will go on different technologies remains to be seen.Regarding your question about low count tags, I can tell you for certain that no valid statistical test will ever be able to find a statistically significant difference for counts of 1 in one group vs 3 in another, or 4 in one group vs 6 in another. Theoretically, the smallest possible count difference that could possibly yield a two-sided p-value less than 0.05 is 0 vs 6, even if the dispersion was zero and even if no adjustment is made for multiple testing. This is because the variability in the data can never be less than the Poisson variability inherence in the sequencing technology. When the inter-library dispersion is nonzero, as it always is, much larger differences may be required. When adjustment is made for genome-wide testing, much larger differences are required.
If you have a large number of genome locations that consistently show a small but non statistically significant difference, then you might obtain a statistically significant result by pooling locations, or else by doing a gene set test using this locations. (For the gene set test possibility, see the last case study of the limma package.) I don't understand enough about your scientific problem to know whether these approaches would be appropriate for your particular study.
Best wishes
Gordon