edgeR
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Hi, I am using edgeR to analyse counts data from an experiment consisting of two genotypes sampled at five time points. Three biological replicates. I have used the 'Classic approach' and the 'GLM approach', and expected to get the same results either way. At least, I expected to find the same topTags irrespectively of approach. (But maybe that is wrong?) Here is what I've done: CLASSIC APPROACH: > # Classic part > > y<-DGEList(counts=x,group=Group) > y$samples group lib.size norm.factors E00R1 E00 57811138 1 E00R2 E00 51440945 1 E00R3 E00 48826680 1 E01R1 E01 45632509 1 E01R2 E01 58231864 1 E01R3 E01 52760716 1 E05R1 E05 56470353 1 E05R2 E05 49744068 1 E05R3 E05 46002201 1 E48R2 E48 40331851 1 E48R3 E48 52134771 1 E96R1 E96 49449978 1 E96R2 E96 44233768 1 E96R3 E96 57043515 1 J00R1 J00 58919765 1 J00R2 J00 40388557 1 J00R3 J00 54126457 1 J01R1 J01 57302722 1 J01R2 J01 56813843 1 J01R3 J01 46711117 1 J05R1 J05 64733652 1 J05R2 J05 48922602 1 J05R3 J05 28696513 1 J48R2 J48 49544429 1 J48R3 J48 42298884 1 J96R1 J96 51538248 1 J96R2 J96 66995708 1 J96R3 J96 38325422 1 > > levels(y$samples$group) [1] "E00" "E01" "E05" "E48" "E96" "J00" "J01" "J05" "J48" "J96" > y<-calcNormFactors(y) > # estimating dispersion > # over all genes > y<-estimateCommonDisp(y) > > # genewise > > y<-estimateTagwiseDisp(y) > > plotBCV(y) > > et<-exactTest(y, pair=c("E00","J00")) > topTags(et) Comparison of groups: J00-E00 logFC logCPM PValue FDR comp394477_c0 -15.422419 5.602823 0.000000e+00 0.000000e+00 comp413538_c1 -11.685657 4.694669 0.000000e+00 0.000000e+00 comp431566_c0 -11.615584 4.405613 0.000000e+00 0.000000e+00 comp404433_c0 8.790634 4.710831 0.000000e+00 0.000000e+00 comp412635_c0 6.325601 4.519383 0.000000e+00 0.000000e+00 comp431005_c0 4.235474 6.494423 0.000000e+00 0.000000e+00 comp373625_c1 6.893877 3.728574 3.134846e-288 9.760118e-285 comp421285_c2 -4.744062 5.016100 6.226410e-278 1.696230e-274 comp419056_c0 13.453985 2.801532 6.846340e-239 1.657879e-235 comp401947_c0 7.987398 2.858410 1.078798e-217 2.351132e-214 > > et<-exactTest(y, pair=c("E01","J01")) > topTags(et) Comparison of groups: J01-E01 logFC logCPM PValue FDR comp413538_c1 -14.570734 4.694669 0.000000e+00 0.000000e+00 comp431566_c0 -14.185398 4.405613 0.000000e+00 0.000000e+00 comp394477_c0 -14.137519 5.602823 0.000000e+00 0.000000e+00 comp404433_c0 10.181217 4.710831 0.000000e+00 0.000000e+00 comp373625_c1 9.738900 3.728574 0.000000e+00 0.000000e+00 comp412635_c0 7.077064 4.519383 0.000000e+00 0.000000e+00 comp431005_c0 4.393042 6.494423 0.000000e+00 0.000000e+00 comp421285_c2 -4.656943 5.016100 2.038437e-267 5.553212e-264 comp404433_c1 13.310478 3.132245 1.777238e-232 4.303681e-229 comp419056_c0 11.369324 2.801532 2.968126e-222 6.468734e-219 > > et<-exactTest(y, pair=c("E05","J05")) > topTags(et) Comparison of groups: J05-E05 logFC logCPM PValue FDR comp404433_c0 12.392403 4.710831 0.000000e+00 0.000000e+00 comp394477_c0 -11.251416 5.602823 0.000000e+00 0.000000e+00 comp413538_c1 -11.075882 4.694669 0.000000e+00 0.000000e+00 comp412635_c0 7.071984 4.519383 0.000000e+00 0.000000e+00 comp431005_c0 3.866607 6.494423 5.618431e-314 2.448962e-310 comp373625_c1 9.542258 3.728574 4.235325e-310 1.538411e-306 comp431566_c0 -11.206496 4.405613 1.889575e-289 5.883056e-286 comp421285_c2 -4.448176 5.016100 4.956948e-245 1.350396e-241 comp379820_c0 -6.741883 4.283112 4.759103e-221 1.152443e-217 comp401947_c0 12.325326 2.858410 2.865886e-201 6.245911e-198 > > et<-exactTest(y, pair=c("E48","J48")) > topTags(et) Comparison of groups: J48-E48 logFC logCPM PValue FDR comp413538_c1 -11.577416 4.694669 9.806810e-268 2.137296e-263 comp412635_c0 7.534573 4.519383 4.519394e-256 4.924784e-252 comp373625_c1 11.099259 3.728574 4.211513e-231 3.059524e-227 comp421285_c2 -5.694947 5.016100 8.898521e-225 4.848359e-221 comp369941_c0 19.008105 9.936255 1.581987e-216 6.895566e-213 comp431566_c0 -13.422262 4.405613 1.567587e-181 5.693999e-178 comp404433_c0 12.908316 4.710831 2.647072e-159 8.241469e-156 comp394477_c0 -11.329087 5.602823 8.649066e-159 2.356222e-155 comp423647_c1 -3.235352 5.607951 1.216586e-156 2.946030e-153 comp379820_c0 -6.545352 4.283112 1.001698e-152 2.183100e-149 > > et<-exactTest(y, pair=c("E96","J96")) > topTags(et) Comparison of groups: J96-E96 logFC logCPM PValue FDR comp413538_c1 -14.558952 4.694669 0.000000e+00 0.000000e+00 comp431566_c0 -14.309061 4.405613 0.000000e+00 0.000000e+00 comp373625_c1 10.996613 3.728574 0.000000e+00 0.000000e+00 comp412635_c0 6.482673 4.519383 0.000000e+00 0.000000e+00 comp421285_c2 -6.022731 5.016100 0.000000e+00 0.000000e+00 comp404433_c0 12.027340 4.710831 3.172936e-312 1.152516e-308 comp369941_c0 17.982161 9.936255 2.109743e-292 6.568535e-289 comp423647_c1 -3.434699 5.607951 7.129247e-256 1.942185e-252 comp431005_c0 3.384003 6.494423 7.465632e-247 1.807844e-243 comp394477_c0 -13.722197 5.602823 9.993279e-244 2.177935e-240 > MY INTENTION SO FAR HAS BEEN TO COMPARE THE GENOTYPES, E AND J, AT THE DIFFERENT TIME POINTS. SO FAR, SO GOOD (?) MOVING OVER TO THE GLM APPROACH: > z<-x > z<-DGEList(counts=x,group=Group) > z$samples group lib.size norm.factors E00R1 E 57811138 1 E00R2 E 51440945 1 E00R3 E 48826680 1 E01R1 E 45632509 1 E01R2 E 58231864 1 E01R3 E 52760716 1 E05R1 E 56470353 1 E05R2 E 49744068 1 E05R3 E 46002201 1 E48R2 E 40331851 1 E48R3 E 52134771 1 E96R1 E 49449978 1 E96R2 E 44233768 1 E96R3 E 57043515 1 J00R1 J 58919765 1 J00R2 J 40388557 1 J00R3 J 54126457 1 J01R1 J 57302722 1 J01R2 J 56813843 1 J01R3 J 46711117 1 J05R1 J 64733652 1 J05R2 J 48922602 1 J05R3 J 28696513 1 J48R2 J 49544429 1 J48R3 J 42298884 1 J96R1 J 51538248 1 J96R2 J 66995708 1 J96R3 J 38325422 1 > > z<-calcNormFactors(z) > z<-x > z<-DGEList(counts=z,group=Group) > z$samples group lib.size norm.factors E00R1 E 57811138 1 E00R2 E 51440945 1 E00R3 E 48826680 1 E01R1 E 45632509 1 E01R2 E 58231864 1 E01R3 E 52760716 1 E05R1 E 56470353 1 E05R2 E 49744068 1 E05R3 E 46002201 1 E48R2 E 40331851 1 E48R3 E 52134771 1 E96R1 E 49449978 1 E96R2 E 44233768 1 E96R3 E 57043515 1 J00R1 J 58919765 1 J00R2 J 40388557 1 J00R3 J 54126457 1 J01R1 J 57302722 1 J01R2 J 56813843 1 J01R3 J 46711117 1 J05R1 J 64733652 1 J05R2 J 48922602 1 J05R3 J 28696513 1 J48R2 J 49544429 1 J48R3 J 42298884 1 J96R1 J 51538248 1 J96R2 J 66995708 1 J96R3 J 38325422 1 > > z<-calcNormFactors(z) > Group<-factor(substring(colnames(z),1,1)) > Group [1] E E E E E E E E E E E E E E J J J J J J J J J J J J J J Levels: E J > > Time<-factor(substring(colnames(z),2,3)) > Time [1] 00 00 00 01 01 01 05 05 05 48 48 96 96 96 00 00 00 01 01 01 05 05 05 48 48 96 96 96 Levels: 00 01 05 48 96 > > design<-model.matrix(~Group+Group:Time) > rownames(design)<-colnames(z) > design (Intercept) GroupJ GroupE:Time01 GroupJ:Time01 GroupE:Time05 GroupJ:Time05 GroupE:Time48 GroupJ:Time48 GroupE:Time96 GroupJ:Time96 E00R1 1 0 0 0 0 0 0 0 0 0 E00R2 1 0 0 0 0 0 0 0 0 0 E00R3 1 0 0 0 0 0 0 0 0 0 E01R1 1 0 1 0 0 0 0 0 0 0 E01R2 1 0 1 0 0 0 0 0 0 0 E01R3 1 0 1 0 0 0 0 0 0 0 E05R1 1 0 0 0 1 0 0 0 0 0 E05R2 1 0 0 0 1 0 0 0 0 0 E05R3 1 0 0 0 1 0 0 0 0 0 E48R2 1 0 0 0 0 0 1 0 0 0 E48R3 1 0 0 0 0 0 1 0 0 0 E96R1 1 0 0 0 0 0 0 0 1 0 E96R2 1 0 0 0 0 0 0 0 1 0 E96R3 1 0 0 0 0 0 0 0 1 0 J00R1 1 1 0 0 0 0 0 0 0 0 J00R2 1 1 0 0 0 0 0 0 0 0 J00R3 1 1 0 0 0 0 0 0 0 0 J01R1 1 1 0 1 0 0 0 0 0 0 J01R2 1 1 0 1 0 0 0 0 0 0 J01R3 1 1 0 1 0 0 0 0 0 0 J05R1 1 1 0 0 0 1 0 0 0 0 J05R2 1 1 0 0 0 1 0 0 0 0 J05R3 1 1 0 0 0 1 0 0 0 0 J48R2 1 1 0 0 0 0 0 1 0 0 J48R3 1 1 0 0 0 0 0 1 0 0 J96R1 1 1 0 0 0 0 0 0 0 1 J96R2 1 1 0 0 0 0 0 0 0 1 J96R3 1 1 0 0 0 0 0 0 0 1 attr(,"assign") [1] 0 1 2 2 2 2 2 2 2 2 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment" attr(,"contrasts")$Time [1] "contr.treatment" > z<-estimateGLMCommonDisp(z,design) > > # genewise > z<-estimateGLMTrendedDisp(z,design) > z<-estimateGLMTagwiseDisp(z,design) > > fit<-glmFit(z,design) > lrt<-glmLRT(fit,coef=2) > topTags(lrt) Coefficient: GroupJ logFC logCPM LR PValue FDR comp394477_c0 -15.320108 5.602823 1599.742 0.000000e+00 0.000000e+00 comp413538_c1 -11.668893 4.694669 2063.236 0.000000e+00 0.000000e+00 comp431566_c0 -11.598820 4.405613 1505.514 0.000000e+00 0.000000e+00 comp404433_c0 8.789010 4.710831 1614.360 0.000000e+00 0.000000e+00 comp412635_c0 6.325157 4.519383 1638.631 0.000000e+00 0.000000e+00 comp431005_c0 4.235436 6.494423 1572.862 0.000000e+00 0.000000e+00 comp421285_c2 -4.743959 5.016100 1273.102 7.917312e-279 2.464999e-275 comp373625_c1 6.892714 3.728574 1255.435 5.468549e-275 1.489769e-271 comp419056_c0 13.351686 2.801532 1103.131 6.893184e-242 1.669223e-238 comp401947_c0 7.983599 2.858410 1005.328 1.247665e-220 2.719160e-217 > > # testing the Group DE at 01h > lrt<-glmLRT(fit,contrast=c(0,0,-1,1,0,0,0,0,0,0)) > topTags(lrt) Coefficient: -1*GroupE:Time01 1*GroupJ:Time01 logFC logCPM LR PValue FDR comp401050_c0 3.493540 4.889617 169.62289 8.943982e-39 1.949251e-34 comp427887_c0 2.494577 6.359053 151.68090 7.439934e-35 8.107296e-31 comp429669_c1 3.461429 7.202470 139.75242 3.015465e-32 2.190635e-28 comp425652_c1 -3.747270 5.578305 121.36647 3.176789e-28 1.467673e-24 comp431260_c0 -1.510660 6.227693 121.25102 3.367149e-28 1.467673e-24 comp428540_c0 2.458811 4.676311 95.33553 1.607026e-22 5.837254e-19 comp437641_c0 1.729965 5.735428 89.12381 3.708599e-21 1.010451e-17 comp415376_c0 -3.768543 1.566064 89.12355 3.709096e-21 1.010451e-17 comp419207_c0 -2.096465 7.000344 88.31376 5.585358e-21 1.352525e-17 comp428866_c0 1.503459 4.857282 85.41288 2.421386e-20 5.277168e-17 > > # testing the Group DE at 05h > lrt<-glmLRT(fit,contrast=c(0,0,0,0,-1,1,0,0,0,0)) > topTags(lrt) Coefficient: -1*GroupE:Time05 1*GroupJ:Time05 logFC logCPM LR PValue FDR comp436562_c0 3.559768 8.677505 245.0805 3.068810e-55 6.688165e-51 comp419735_c0 7.941315 6.207378 200.3481 1.753344e-45 1.910619e-41 comp401050_c0 3.464120 4.889617 166.9922 3.358441e-38 2.439795e-34 comp427887_c0 2.392173 6.359053 139.7174 3.069165e-32 1.672235e-28 comp429669_c1 3.377087 7.202470 133.3528 7.569234e-31 2.972487e-27 comp415376_c0 -4.542037 1.566064 133.1979 8.183411e-31 2.972487e-27 comp425652_c1 -3.682664 5.578305 117.3181 2.445272e-27 7.613181e-24 comp430561_c0 -2.003697 5.119557 108.4644 2.126367e-25 5.531729e-22 comp433200_c0 2.289307 4.954039 108.3223 2.284370e-25 5.531729e-22 comp429162_c0 1.995171 3.402911 107.9965 2.692502e-25 5.868040e-22 > > # testing the Group DE at 48h > lrt<-glmLRT(fit,contrast=c(0,0,0,0,0,0,-1,1,0,0)) > topTags(lrt) Coefficient: -1*GroupE:Time48 1*GroupJ:Time48 logFC logCPM LR PValue FDR comp369941_c0 18.541426 9.936255 823.6359 3.919895e-181 8.543020e-177 comp417702_c0 -3.622411 3.990389 207.6927 4.377967e-47 4.770670e-43 comp417690_c0 -3.347656 5.239097 176.9797 2.212611e-40 1.607388e-36 comp436283_c0 -3.851908 1.834438 164.1105 1.430948e-37 7.796518e-34 comp417686_c0 -3.156963 3.568423 141.8250 1.062032e-32 4.629184e-29 comp417690_c1 -3.248463 4.347931 129.4769 5.333285e-30 1.937227e-26 comp423176_c0 -2.346243 5.066883 125.6650 3.640327e-29 1.133390e-25 comp431260_c0 -1.695085 6.227693 121.4132 3.102817e-28 8.452849e-25 comp419207_c0 -2.760753 7.000344 119.4983 8.146313e-28 1.972675e-24 comp435553_c0 1.785259 5.622214 110.4265 7.902139e-26 1.722192e-22 > > # testing the Group DE at 96h > lrt<-glmLRT(fit,contrast=c(0,0,0,0,0,0,0,0,-1,1)) > topTags(lrt) Coefficient: -1*GroupE:Time96 1*GroupJ:Time96 logFC logCPM LR PValue FDR comp369941_c0 17.174042 9.936255 978.8875 6.972860e-215 1.519665e-210 comp431260_c0 -2.005239 6.227693 211.6925 5.869844e-48 6.396369e-44 comp417702_c0 -2.822779 3.990389 155.0349 1.375824e-35 9.994901e-32 comp423176_c0 -2.224759 5.066883 141.4204 1.302007e-32 7.093983e-29 comp419207_c0 -2.629442 7.000344 135.4855 2.585533e-31 1.126982e-27 comp435199_c0 -1.941443 6.089785 134.1832 4.981873e-31 1.809582e-27 comp428866_c0 1.871781 4.857282 132.4752 1.177677e-30 3.666612e-27 comp437434_c0 -1.928556 5.025404 122.7183 1.607256e-28 4.378567e-25 comp437540_c1 -1.719987 7.634490 122.0333 2.269992e-28 5.496911e-25 comp424275_c1 2.829585 2.058209 113.6028 1.592057e-26 3.469729e-23 SO, EXCEPT FOR THE FIRST TIME POINT (00) THE COMPARISONS ARE NOT SIMILAR. I WOULD VERY MUCH RECEIVE COMMENTS ON THIS AND IN PARTICULAR INDICATION(S) ON WHERE I ERR. THANK YOU. JAHN -- output of sessionInfo(): > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 LC_NUMERIC=C [5] LC_TIME=Norwegian (Bokm??l)_Norway.1252 attached base packages: [1] splines parallel stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 IRanges_1.20.6 lme4_1.0-5 [9] MASS_7.3-29 Matrix_1.1-1.1 minqa_1.2.2 nlme_3.1-113 RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2 survival_2.37-4 [17] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 -- Sent via the guest posting facility at bioconductor.org.
edgeR edgeR • 1.5k views
ADD COMMENT
0
Entering edit mode

Moderator: same question posted a second time.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 56 minutes ago
WEHI, Melbourne, Australia

Dear Jahn,

The classic and glm approaches do give similar results, but if they were identical then we wouldn't need to keep both in package.  No two methods will give thousands of genes in exactly the same order however.

In this case, though, it doesn't seem to me that you are testing the same comparisons for the classic and glm approaches. All the comparisons that you make in the classic approach are between groups at a fixed time (Group nested within Time) but the factorial model used

   design<-model.matrix(~Group+Group:Time)

has Time nested within Group.  Except at time zero, it doesn't seem to me that you are testing the same comparisons for the classic and glm approaches.

Best wishes
Gordon

ADD COMMENT
0
Entering edit mode

Thank you for your reply, Gordon.
Could I bother you to advice me how to do that - do the same tests under the glm approach as in the classic approach? I would be much obliged and I might learn something. I come from the SAS world and R thinking is still a little cryptic to me.

Best regards,
jahn

ADD REPLY
0
Entering edit mode

Gordon,
Don't worry about my last request. Looking closer into your User's Guide (e.g., section 3.3 ), I believe you actually have addressed these issues. Sorry for the hassle.

Best
jahn

ADD REPLY

Login before adding your answer.

Traffic: 669 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6