Entering edit mode
Davis McCarthy
▴
260
@davis-mccarthy-4138
Last seen 10.2 years ago
Dear Josquin
Just to follow up here: yesterday I committed an update to the release
and devel versions of edgeR to fix a big related to the use of offsets
in the GLMs. I strongly recommend that you and other edgeR users
update to the latest version of the package, especially if you are
using the GLM functions.
Best regards
Davis
On Nov 28, 2010, at 4:34 PM, Gordon K Smyth wrote:
> Dear Josquin,
>
> We have committed a major revision of edgeR, version 2.0.0, to
Bioconductor that runs successfully on all your data without errors.
>
> Previously we simply relied on the glm code in the stats package of
R to fit the negative binomial distribution to one gene at a time.
Now we have written new code designed to fit thousands of genes
simultaneously. The new code implements a line search to prevent
divergence, hence it now works on all your data. An added bonus is
that the new code is a couple of orders of magnitude faster than the
old.
>
> Best wishes
> Gordon
>
> On Fri, 5 Nov 2010, Gordon K Smyth wrote:
>
>> Dear Josquin,
>>
>> Thanks for providing some data. We've confirmed the problem is
that the glm functions provided with R fail to converge, when used to
fit a negative binomial glm, to some of the rows in your data,
especially rows with very large counts. The problem affects glmFit
and glmLRT, as well as estimateCRDisp, because they all make calls to
glm.fit.
>>
>> We're writing some new glm code, implementing convergence checking
etc, that should solve the problem. This will be incorporated into
the edgeR package during the next week.
>>
>> Regards
>> Gordon
>>
>> On Sun, 24 Oct 2010, Gordon K Smyth wrote:
>>
>>> Dear Josquin,
>>> estimateCDDisp() has worked on all the datasets we have tested it
out on so far (although that's admitedly a very limited number), so we
are not able to reproduce your problems. The function is liable to
convergence issues however because it needs to fit a very large number
of generalized linear models, one or more of which could conceivably
fail to converge. That might be the problem with your data. I don't
see any problems with the code example that you give, you seem to be
using the functions correctly, so the issue must be with the count
data itself. Would you be willing to share your data with us offline
so that we diagnose and perhaps solve the problem?
>>> To see estimateCDDisp() working as intended, you can run the
example:
>>>
>>> library(edgeR)
>>> example(glmFit)
>>> CR <- estimateCRDisp(d, design)
>>> fit <- glmFit(d, design, dispersion=CR$CR.common.dispersion)
>>> results <- glmLRT(d, fit, coef=2)
>>> topTags(results)
>>> Best wishes
>>> Gordon
>>> ----- ORIGINAL MESSAGE -------------
>>> [BioC] EdgeR - problems running estimateCRDisp
>>> josquin.tibbits at dpi.vic.gov.au josquin.tibbits at
dpi.vic.gov.au
>>> Fri Oct 22 07:44:00 CEST 2010
>>> I am running the new EdgeR package 1.8 and have updated R to
1.12.0 and
>>> updated all the packages I have installed.
>>> I have an RNAseq experiment with 24 samples and am wanting to run
a glm
>>> analysis using the Cox-Reid common dispersion (and tagwise)
paramaters. I
>>> have created a DGEList object using a targets file and this
successfully
>>> ran in the calcNormFactors() and in the plotMDS.dge() functions. I
have
>>> also created what appears to be a sensible and ok design matrix
using
>>> model.matrix(). These then are being input to estimateCRDisp() to
>>> estimate the common dispersion. The code I have run is as follows
(in
>>> red) with output (blue) which is pretty much exactly the same as
the
>>> worked example in the manual on pp 51-- and does not seem abnormal
at
>>> all......
>>> targets <- read.delim("Lp_NUE_DGEbymSEQ_TARGETS_EdgeRIN.txt",
>>> stringsAsFactors = FALSE, row.names = "label")
>>> targets$Nitrogen <- factor(targets$Nitrogen)
>>> targets$Endophyte <- factor(targets$Endophyte)
>>> targets$Tissue <- factor(targets$Tissue)
>>> targets
>>> ## CREATE THE DGEList Object
>>> DGEList.object <- readDGE(targets, skip = 1, comment.char = '#')
>>> colnames(DGEList.object) <- row.names(targets)
>>> ## CALCULATE THE NORMALISATION (f) FACTORS
>>> DGEList.object <- calcNormFactors(DGEList.object)
>>> ## Exclude data where rowsums are < 30 YOU CAN CHANGE THIS (there
being 24
>>> samples in the experiment)
>>> DGEList.object <- DGEList.object[rowSums(DGEList.object$counts) >
30, ]
>>> DGEList.object
>>> An object of class "DGEList"
>>> $samples
>>> files Endophyte Tissue
Nitrogen
>>> lib.size norm.factors
>>> Lp_NUE_LFFULL Lp_NUE_LFFULL_ER_RAWREADS.txt Free Leaf
Full
>>> 3786614 0.9244799
>>> Lp_NUE_LFNO3 Lp_NUE_LFNO3_ER_RAWREADS.txt Free Leaf
Partial
>>> 5311856 0.8120410
>>> Lp_NUE_LFK Lp_NUE_LFK_ER_RAWREADS.txt Free Leaf
Full
>>> 3482772 0.7224960
>>> Lp_NUE_LFPO4 Lp_NUE_LFPO4_ER_RAWREADS.txt Free Leaf
Full
>>> 6208397 0.7367309
>>> Lp_NUE_LFCa Lp_NUE_LFCa_ER_RAWREADS.txt Free Leaf
Full
>>> 3597004 0.8147619
>>> 19 more rows ...
>>> $counts
>>> Lp_NUE_LFFULL Lp_NUE_LFNO3 Lp_NUE_LFK Lp_NUE_LFPO4
>>> Lp_NUE_LFCa Lp_NUE_LFNH4 Lp_NUE_RFFULL Lp_NUE_RFNO3 Lp_NUE_RFK
>>> Lp_NUE_RFPO4
>>>> prg_cds_44 345 382 120
261 141
>>> 215 888 717 698 1112
>>>> prg_cds_141 290 451 126
422 235
>>> 459 676 559 474 879
>>>> prg_cds_2 233 277 131
333 239
>>> 376 246 185 124 282
>>>> prg_cds_16 433 608 184
614 346
>>> 423 924 756 595 913
>>>> prg_cds_84 287 353 90
241 248
>>> 394 823 612 989 1061
>>> Lp_NUE_RFCa Lp_NUE_RFNH4 Lp_NUE_LSFULL Lp_NUE_LSNO3
>>> Lp_NUE_LSK Lp_NUE_LSPO4 Lp_NUE_LSCa Lp_NUE_LSNH4 Lp_NUE_RSFULL
>>> Lp_NUE_RSNO3
>>>> prg_cds_44 1107 439 954
257 205
>>> 107 868 106 316 1147
>>>> prg_cds_141 583 1052 773
325 230
>>> 193 600 264 414 1046
>>>> prg_cds_2 235 454 177
232 177
>>> 203 138 239 334 228
>>>> prg_cds_16 817 1040 2373
464 220
>>> 276 895 291 714 2655
>>>> prg_cds_84 890 1653 894
268 169
>>> 123 714 209 311 1078
>>> Lp_NUE_RSK Lp_NUE_RSPO4 Lp_NUE_RSCa Lp_NUE_RSNH4
>>>> prg_cds_44 622 903 114 474
>>>> prg_cds_141 505 711 205 1166
>>>> prg_cds_2 93 136 157 340
>>>> prg_cds_16 1398 1660 311 1258
>>>> prg_cds_84 893 771 154 1269
>>> 63905 more rows ...
>>> ## Produce an MDS plot of the data
>>> #plotMDS.dge(DGEList.object, main = "MDS plot for
Lp_NUE_DGEbymSEQ")
>>> ## DEFINE THE DESIGN MATRIX
>>> design <- model.matrix(~ Endophyte + Tissue + Nitrogen , data =
targets)
>>> print("This is the design matrix")
>>> design
>>>
>>> (Intercept) EndophyteSTWT TissueRoot NitrogenPartial
>>> Lp_NUE_LFFULL 1 0 0
0
>>> Lp_NUE_LFNO3 1 0 0
1
>>> Lp_NUE_LFK 1 0 0
0
>>> Lp_NUE_LFPO4 1 0 0
0
>>> Lp_NUE_LFCa 1 0 0
0
>>> Lp_NUE_LFNH4 1 0 0
1
>>> Lp_NUE_RFFULL 1 0 1
0
>>> Lp_NUE_RFNO3 1 0 1
1
>>> Lp_NUE_RFK 1 0 1
0
>>> Lp_NUE_RFPO4 1 0 1
0
>>> Lp_NUE_RFCa 1 0 1
0
>>> Lp_NUE_RFNH4 1 0 1
1
>>> Lp_NUE_LSFULL 1 1 0
0
>>> Lp_NUE_LSNO3 1 1 0
1
>>> Lp_NUE_LSK 1 1 0
0
>>> Lp_NUE_LSPO4 1 1 0
0
>>> Lp_NUE_LSCa 1 1 0
0
>>> Lp_NUE_LSNH4 1 1 0
1
>>> Lp_NUE_RSFULL 1 1 1
0
>>> Lp_NUE_RSNO3 1 1 1
1
>>> Lp_NUE_RSK 1 1 1
0
>>> Lp_NUE_RSPO4 1 1 1
0
>>> Lp_NUE_RSCa 1 1 1
0
>>> Lp_NUE_RSNH4 1 1 1
1
>>> attr(,"assign")
>>> [1] 0 1 2 3
>>> attr(,"contrasts")
>>> attr(,"contrasts")$Endophyte
>>> [1] "contr.treatment"
>>> attr(,"contrasts")$Tissue
>>> [1] "contr.treatment"
>>> attr(,"contrasts")$Nitrogen
>>> [1] "contr.treatment"
>>> ## ESTIMATE THE Cox-Reid common dispersion
>>> DGEList.object <- estimateCRDisp(DGEList.object, design)
>>> names(DGEList.object)
>>> This step is giving the following error:::
>>> Loading required package: MASS
>>> Error: NA/NaN/Inf in foreign function call (arg 1)
>>> In addition: Warning messages:
>>> 1: glm.fit: algorithm did not converge
>>> 2: step size truncated due to divergence
>>>> names(DGEList.object)
>>> [1] "samples" "counts"
>>> Clearly the function is not working. I have tried two different
computers
>>> and a run it in 64 bit but always with the same result. I have
also tried
>>> using a balanced design and a simple design all with the same
results. I
>>> have used the DGEList object to analyse the data using the
original
>>> estimateCommonDisp() and exactTest() which worked fine.
>>> I was hoping someone might be able to help resolve this for me????
>>> Thanks in advance....................... Josquin
>>> The sessionInfo() and traceback() are below:
>>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>> locale:
>>> [1] LC_COLLATE=English_Australia.1252
LC_CTYPE=English_Australia.1252
>>> LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_Australia.1252
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods
base
>>> other attached packages:
>>> [1] MASS_7.3-8 edgeR_1.8.1
>>> loaded via a namespace (and not attached):
>>> [1] limma_3.6.0
>>>> traceback()
>>> 4: .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,
>>> y = w * z, ny = 1L, tol = min(1e-07, control$epsilon/1000),
>>> coefficients = double(nvars), residuals = double(ngoodobs),
>>> effects = double(ngoodobs), rank = integer(1L), pivot =
1L:nvars,
>>> qraux = double(nvars), work = double(2 * nvars), PACKAGE =
"base")
>>> 3: glm.fit(design, y[i, ], offset = offset[i, ], family = f)
>>> 2: adjustedProfileLik(spline.disp[i], y.select, design = design,
>>> offset = offset.mat.select + lib.size.mat.select)
>>> 1: estimateCRDisp(DGEList.object, design)
>>> Notice:
>>> This email and any attachments may contain information
t...{{dropped:14}}
----------------------------------------------------------------------
--
Davis J McCarthy
Research Technician
Bioinformatics Division
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia
dmccarthy at wehi.edu.au
http://www.wehi.edu.au
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:6}}