Hello,
I'm trying to use edgeR to analyze some RNA-seq time series data. I
have
four time points. The first and last time points have three
replicates,
while the middle two have two replicates. The following gives the
error I
get:
require(edgeR)
counts <-
read.csv("http://www.duke.edu/~csm29/counts.csv",row.names=1)
d <- DGEList(counts)
d <- calcNormFactors(d)
d <- d[rowSums(d$counts)>9,]
times <- rep(c("zero","one","three","six"),c(3,2,2,3))
design <- model.matrix(~factor(times))
d <- estimateCRDisp(d, design)
Error in beta[k, ] <- betaj[decr, ] :
NAs are not allowed in subscripted assignments
traceback()
3: mglmLS(y, design, dispersion, offset = offset)
2: adjustedProfileLik(spline.disp[i], y.filt, design = design, offset
=
offset.mat.filt +
lib.size.mat.filt)
1: estimateCRDisp(d, design)
sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_2.0.3
loaded via a namespace (and not attached):
[1] limma_3.6.9
The function seems to be getting hung up on one or two genes when I
recover
the error. However, when I remove those genes from the data, the
problem is
still there. Any help would be much appreciated!
[[alternative HTML version deleted]]
Hi Colin,
Thanks for the nice reproducible data example.
The problem that you see is caused by lack of convergence of the
interative glm algorithm for some of your transcripts. Here are two
possible solutions.
Firstly, your design has the form of a one-way layout, so you could
analyse your data using classic exact edgeR instead of the glm code.
In
this style of analysis, you would make pairwise comparisons between
the
time points. I have tried this one your data, and it works fine:
> times <- factor(times,levels=c("zero","one","three","six"))
> d$samples$group <- times
> d2 <- estimateCommonDisp(d)
> d2$common.disp
[1] 0.2359880
> d2 <- estimateTagwiseDisp(d2)
Using grid search to estimate tagwise dispersion.
> topTags(exactTest(d2,c("zero","one"),common=FALSE))
Comparison of groups: one-zero
logConc logFC PValue FDR
19847 -29.88350 40.265106 7.849699e-50 1.458945e-45
6370 -31.78645 36.459199 8.076247e-37 7.505257e-33
20014 -31.57373 36.884641 1.569212e-36 7.655628e-33
13189 -16.39203 -17.244752 1.647612e-36 7.655628e-33
3387 -31.89198 36.248152 6.213699e-36 2.309756e-32
6146 -19.46744 11.313569 4.111695e-31 1.273666e-27
17831 -30.38242 -39.267266 9.207825e-31 2.444809e-27
791 -12.50480 7.443195 8.829879e-25 2.051402e-21
11698 -16.87556 7.510317 1.779005e-24 3.673842e-21
17562 -31.98794 -36.056237 2.586138e-23 4.806596e-20
If you really do need the glm functionality, because you want to do
something other than pairwise comparisons, then we will have to make
the
glm code work for you. We will probably never succeed in writing glm
fitting code that converges for every possible data set, yet for your
data
we can make the functions work if (i) we recognise that the design is
a
oneway layout or (ii) use better starting values. If you need this,
let
us know and we will send you some newer code.
Best wishes
Gordon
> Date: Sat, 26 Feb 2011 15:08:28 -0500
> From: Colin Maxwell <csm29 at="" duke.edu="">
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR glm fit error
>
> Hello,
> I'm trying to use edgeR to analyze some RNA-seq time series data. I
have
> four time points. The first and last time points have three
replicates,
> while the middle two have two replicates. The following gives the
error I
> get:
>
> require(edgeR)
> counts <-
read.csv("http://www.duke.edu/~csm29/counts.csv",row.names=1)
> d <- DGEList(counts)
> d <- calcNormFactors(d)
> d <- d[rowSums(d$counts)>9,]
> times <- rep(c("zero","one","three","six"),c(3,2,2,3))
> design <- model.matrix(~factor(times))
> d <- estimateCRDisp(d, design)
>
> Error in beta[k, ] <- betaj[decr, ] :
> NAs are not allowed in subscripted assignments
>
> traceback()
> 3: mglmLS(y, design, dispersion, offset = offset)
> 2: adjustedProfileLik(spline.disp[i], y.filt, design = design,
offset =
> offset.mat.filt +
> lib.size.mat.filt)
> 1: estimateCRDisp(d, design)
>
> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_2.0.3
>
> loaded via a namespace (and not attached):
> [1] limma_3.6.9
>
> The function seems to be getting hung up on one or two genes when I
recover
> the error. However, when I remove those genes from the data, the
problem is
> still there. Any help would be much appreciated!
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}