edgeR glmFit -- how to define and test reduced, additive and interaction fitted models

0

Entering edit mode

Miguel Gallach ▴ 40

@miguel-gallach-5016

Last seen 11.4 years ago

Dear list, I am using edgeR to analyze my RNA-Seq data. My data consist in two factors (Adaptation and Temperature) and two levels per factor, and I want fit full and reduced glm like the next ones: Reduced model fit0 = counts ~ Adaptation Aditive model fit1 = counts ~ Adaptation + Temperature Interaction model fit2 = counts ~ block + treatment + block:treatment That is what I did until now: d = raw.data[,1:8] head (d) R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold FBgn0000003 0 0 0 0 2 8 3 1 FBgn0000008 92 123 80 41 76 135 188 150 FBgn0000014 0 1 2 4 5 8 10 7 FBgn0000015 2 1 0 0 2 3 9 5 FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 FBgn0000018 36 43 10 17 6 16 20 30 Adaptation = factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","HotAda pted","HotAdapted","ColdAdapted","ColdAdapted")) Temperature = factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdCage" ,"ColdCage","ColdCage")) design = model.matrix(~Adaptation + Temperature + Adaptation:Temperature) design (Intercept) AdaptationHotAdapted TemperatureHot 1 1 1 1 2 1 1 1 3 1 0 1 4 1 0 1 5 1 1 0 6 1 1 0 7 1 0 0 8 1 0 0 AdaptationHotAdapted:TemperatureHot 1 1 2 1 3 0 4 0 5 0 6 0 7 0 8 0 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Adaptation [1] "contr.treatment" attr(,"contrasts")$Temperature [1] "contr.treatment" d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage", "ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage", "HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage")) Calculating library sizes from column totals. d.GLM = calcNormFactors(d.GLM) d.GLM An object of class "DGEList" $samples group lib.size norm.factors R4.Hot HotAdaptedHot 17409289 0.9881635 R5.Hot HotAdaptedHot 17642552 1.0818144 R9.Hot ColdAdaptedHot 20010974 0.8621807 R10.Hot ColdAdaptedHot 14064143 0.8932791 R4.Cold HotAdaptedCold 11968317 1.0061084 R5.Cold HotAdaptedCold 11072832 1.0523857 R9.Cold ColdAdaptedCold 22386103 1.0520949 R10.Cold ColdAdaptedCold 17408532 1.0903311 $counts R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold FBgn0000003 0 0 0 0 2 8 3 1 FBgn0000008 92 123 80 41 76 135 188 150 FBgn0000014 0 1 2 4 5 8 10 7 FBgn0000015 2 1 0 0 2 3 9 5 FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 14864 more rows ... $all.zeros FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017 FALSE FALSE FALSE FALSE FALSE 14864 more elements ... #According to the manual, DGEList is now ready to be passed to the functions that do the calculations to determine differential expression levels for the genes. However is not possible to achieve statistical significance with fewer than ten counts in total for a tag/gene and we also do not want to waste effort finding spurious DE (such as when a gene is only expressed in one library), so we filter out tags/genes with fewer than 1 count per million in SIX or more libraries. cpm.d.GLM = cpm(d.GLM) d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ] nrow(d.GLM) [1] 9418 #Now the dataset is ready to be analysed for differential expression #The design matrix design (Intercept) AdaptationHotAdapted TemperatureHotCage 1 1 1 1 2 1 1 1 3 1 0 1 4 1 0 1 5 1 1 0 6 1 1 0 7 1 0 0 8 1 0 0 AdaptationHotAdapted:TemperatureHotCage 1 1 2 1 3 0 4 0 5 0 6 0 7 0 8 0 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Adaptation [1] "contr.treatment" attr(,"contrasts")$Temperature [1] "contr.treatment" #Estimate CR common/trended/tagwise dispersion d.GLM = estimateGLMCommonDisp(d.GLM, design) names(d.GLM) [1] "samples" "counts" "all.zeros" [4] "common.dispersion" d.GLM$common.dispersion [1] 0.04634741 sqrt(d.GLM$common.dispersion) [1] 0.2152845 d.GLM = estimateGLMTrendedDisp(d.GLM, design) Loading required package: splines summary(d.GLM$trended.dispersion) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.03672 0.03843 0.04326 0.05267 0.06134 0.11870 d.GLM = estimateGLMTagwiseDisp(d.GLM, design) d.GLM$prior.n NULL d$prior.n NULL ls() [1] "Adaptation" "cpm.d" "cpm.d.GLM" "d" "d.GLM" [6] "design" "raw.data" "Temperature" d.GLM An object of class "DGEList" $samples group lib.size norm.factors R4.Hot HotAdaptedHotCage 17409289 0.9881635 R5.Hot HotAdaptedHotCage 17642552 1.0818144 R9.Hot ColdAdaptedHotCage 20010974 0.8621807 R10.Hot ColdAdaptedHotCage 14064143 0.8932791 R4.Cold HotAdaptedColdCage 11968317 1.0061084 R5.Cold HotAdaptedColdCage 11072832 1.0523857 R9.Cold ColdAdaptedColdCage 22386103 1.0520949 R10.Cold ColdAdaptedColdCage 17408532 1.0903311 $counts R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold FBgn0000008 92 123 80 41 76 135 188 150 FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 FBgn0000018 36 43 10 17 6 16 20 30 FBgn0000024 51 68 73 42 118 118 242 129 FBgn0000032 1640 1942 2328 1342 755 674 2110 1307 9413 more rows ... $all.zeros FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032 FALSE FALSE FALSE FALSE FALSE 9413 more elements ... $common.dispersion [1] 0.04634741 $trended.dispersion [1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067 9413 more elements ... $abundance FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032 -11.915246 -9.183744 -13.519050 -11.966242 -9.300038 9413 more elements ... $bin.dispersion [1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388 [7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138 [13] 0.03724316 0.03980808 0.04726670 $bin.abundance [1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957 [7] -10.769579 -10.430591 -10.119638 -9.821485 -9.532052 -9.228034 [13] -8.853866 -8.376654 -7.322181 $tagwise.dispersion [1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193 9413 more elements ... getPriorN(d.GLM, design=design) [1] 5 #Fit model with tagwise dispersion glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion) lrt.tgw = glmLRT(d.GLM, glmfit.tgw) topTags(lrt.tgw) Coefficient: AdaptationHotAdapted:TemperatureHotCage logConc logFC LR P.Value FDR FBgn0032285 -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11 FBgn0026089 -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07 FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06 FBgn0051641 -10.291549 2.976952 37.46208 9.320768e-10 2.194575e-06 FBgn0058042 -9.665534 2.267317 33.71276 6.388032e-09 1.203250e-05 FBgn0010333 -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05 FBgn0030094 -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05 FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05 FBgn0028863 -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05 FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05 If I understand, glmfit.tgw would be the interaction model (f2) I was talking about, wright? My problem is that I do not know now how to define the additive and reduced models to test Additive vs. Reduced and Interaction vs. Additive. Any suggestion? Many thanks, Miguel Gallach [[alternative HTML version deleted]]

edgeR edgeR • 3.4k views

ADD COMMENT • link updated 14.1 years ago by Gordon Smyth 53k • written 14.1 years ago by Miguel Gallach ▴ 40

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Dear Miguel, You want to do two tests. You want to test for Adapation:Temperature interaction (fit2 vs fit1), and you want to test for Temperature adjusted for Adaption (fit1 vs fit0). In edgeR, you conduct these test by fitting the alternative model, the specifying the term to be removed. design2 <- model.matrix(~Adaption*Temperature) design1 <- model.matrix(~Adaption+Temperature) To do genewise tests for the interaction: fit2 <- glmFit(d.GLM,design2) lrt2 <- glmLRT(d.GLM,fit2) topTags(lrt2) To test for Temperature adjusted for Adaption: fit1 <- glmFit(d.GLM,design1) lrt1 <- glmLRT(d.GLM,fit1) topTags(lrt1) In both cases, the last term in the model is the one removed unless you specify otherwise. You do not have to explicitly fit reduced models, because edgeR does this for you automatically. Best wishes Gordon > Date: Fri, 16 Dec 2011 14:48:39 +0100 > From: Miguel Gallach <miguel.gallach at="" vetmeduni.ac.at=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR glmFit -- how to define and test reduced, > additive and interaction fitted models > > Dear list, > > I am using edgeR to analyze my RNA-Seq data. My data consist in two factors > (Adaptation and Temperature) and two levels per factor, and I want fit full > and reduced glm like the next ones: > > Reduced model > fit0 = counts ~ Adaptation > > Aditive model > fit1 = counts ~ Adaptation + Temperature > > Interaction model > fit2 = counts ~ block + treatment + block:treatment > > > That is what I did until now: > d = raw.data[,1:8] > head (d) > R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold > FBgn0000003 0 0 0 0 2 8 3 1 > FBgn0000008 92 123 80 41 76 135 188 150 > FBgn0000014 0 1 2 4 5 8 10 7 > FBgn0000015 2 1 0 0 2 3 9 5 > FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 > FBgn0000018 36 43 10 17 6 16 20 30 > > > Adaptation = > factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","HotA dapted","HotAdapted","ColdAdapted","ColdAdapted")) > Temperature = > factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdCag e","ColdCage","ColdCage")) > design = model.matrix(~Adaptation + Temperature + Adaptation:Temperature) > design > (Intercept) AdaptationHotAdapted TemperatureHot > 1 1 1 1 > 2 1 1 1 > 3 1 0 1 > 4 1 0 1 > 5 1 1 0 > 6 1 1 0 > 7 1 0 0 > 8 1 0 0 > AdaptationHotAdapted:TemperatureHot > 1 1 > 2 1 > 3 0 > 4 0 > 5 0 > 6 0 > 7 0 > 8 0 > attr(,"assign") > [1] 0 1 2 3 > attr(,"contrasts") > attr(,"contrasts")$Adaptation > [1] "contr.treatment" > > attr(,"contrasts")$Temperature > [1] "contr.treatment" > > > d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage", > "ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage", > "HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage")) > Calculating library sizes from column totals. > d.GLM = calcNormFactors(d.GLM) > d.GLM > An object of class "DGEList" > $samples > group lib.size norm.factors > R4.Hot HotAdaptedHot 17409289 0.9881635 > R5.Hot HotAdaptedHot 17642552 1.0818144 > R9.Hot ColdAdaptedHot 20010974 0.8621807 > R10.Hot ColdAdaptedHot 14064143 0.8932791 > R4.Cold HotAdaptedCold 11968317 1.0061084 > R5.Cold HotAdaptedCold 11072832 1.0523857 > R9.Cold ColdAdaptedCold 22386103 1.0520949 > R10.Cold ColdAdaptedCold 17408532 1.0903311 > > $counts > R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold > FBgn0000003 0 0 0 0 2 8 3 1 > FBgn0000008 92 123 80 41 76 135 188 150 > FBgn0000014 0 1 2 4 5 8 10 7 > FBgn0000015 2 1 0 0 2 3 9 5 > FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 > 14864 more rows ... > > $all.zeros > FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017 > FALSE FALSE FALSE FALSE FALSE > 14864 more elements ... > > > #According to the manual, DGEList is now ready to be passed to the > functions that do the calculations to determine differential expression > levels for the genes. However is not possible to achieve statistical > significance with fewer than ten counts in total for a tag/gene and we also > do not want to waste effort finding spurious DE (such as when a gene is > only expressed in one library), so we filter out tags/genes with fewer than > 1 count per million in SIX or more libraries. > > cpm.d.GLM = cpm(d.GLM) > d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ] > nrow(d.GLM) > [1] 9418 > > > #Now the dataset is ready to be analysed for differential expression > #The design matrix > design > (Intercept) AdaptationHotAdapted TemperatureHotCage > 1 1 1 1 > 2 1 1 1 > 3 1 0 1 > 4 1 0 1 > 5 1 1 0 > 6 1 1 0 > 7 1 0 0 > 8 1 0 0 > AdaptationHotAdapted:TemperatureHotCage > 1 1 > 2 1 > 3 0 > 4 0 > 5 0 > 6 0 > 7 0 > 8 0 > attr(,"assign") > [1] 0 1 2 3 > attr(,"contrasts") > attr(,"contrasts")$Adaptation > [1] "contr.treatment" > > attr(,"contrasts")$Temperature > [1] "contr.treatment" > > > #Estimate CR common/trended/tagwise dispersion > d.GLM = estimateGLMCommonDisp(d.GLM, design) > names(d.GLM) > [1] "samples" "counts" "all.zeros" > [4] "common.dispersion" > > d.GLM$common.dispersion > [1] 0.04634741 > > sqrt(d.GLM$common.dispersion) > [1] 0.2152845 > > d.GLM = estimateGLMTrendedDisp(d.GLM, design) > Loading required package: splines > > summary(d.GLM$trended.dispersion) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 0.03672 0.03843 0.04326 0.05267 0.06134 0.11870 > > d.GLM = estimateGLMTagwiseDisp(d.GLM, design) > > d.GLM$prior.n > NULL > > d$prior.n > NULL > > ls() > [1] "Adaptation" "cpm.d" "cpm.d.GLM" "d" "d.GLM" > [6] "design" "raw.data" "Temperature" > > > d.GLM > An object of class "DGEList" > $samples > group lib.size norm.factors > R4.Hot HotAdaptedHotCage 17409289 0.9881635 > R5.Hot HotAdaptedHotCage 17642552 1.0818144 > R9.Hot ColdAdaptedHotCage 20010974 0.8621807 > R10.Hot ColdAdaptedHotCage 14064143 0.8932791 > R4.Cold HotAdaptedColdCage 11968317 1.0061084 > R5.Cold HotAdaptedColdCage 11072832 1.0523857 > R9.Cold ColdAdaptedColdCage 22386103 1.0520949 > R10.Cold ColdAdaptedColdCage 17408532 1.0903311 > > $counts > R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold > FBgn0000008 92 123 80 41 76 135 188 150 > FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 > FBgn0000018 36 43 10 17 6 16 20 30 > FBgn0000024 51 68 73 42 118 118 242 129 > FBgn0000032 1640 1942 2328 1342 755 674 2110 1307 > 9413 more rows ... > > $all.zeros > FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032 > FALSE FALSE FALSE FALSE FALSE > 9413 more elements ... > > $common.dispersion > [1] 0.04634741 > > $trended.dispersion > [1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067 > 9413 more elements ... > > $abundance > FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032 > -11.915246 -9.183744 -13.519050 -11.966242 -9.300038 > 9413 more elements ... > > $bin.dispersion > [1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388 > [7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138 > [13] 0.03724316 0.03980808 0.04726670 > > $bin.abundance > [1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957 > [7] -10.769579 -10.430591 -10.119638 -9.821485 -9.532052 -9.228034 > [13] -8.853866 -8.376654 -7.322181 > > $tagwise.dispersion > [1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193 > 9413 more elements ... > > getPriorN(d.GLM, design=design) > [1] 5 > > #Fit model with tagwise dispersion > glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion) > lrt.tgw = glmLRT(d.GLM, glmfit.tgw) > topTags(lrt.tgw) > Coefficient: AdaptationHotAdapted:TemperatureHotCage > logConc logFC LR P.Value FDR > FBgn0032285 -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11 > FBgn0026089 -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07 > FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06 > FBgn0051641 -10.291549 2.976952 37.46208 9.320768e-10 2.194575e-06 > FBgn0058042 -9.665534 2.267317 33.71276 6.388032e-09 1.203250e-05 > FBgn0010333 -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05 > FBgn0030094 -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05 > FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05 > FBgn0028863 -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05 > FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05 > > > If I understand, glmfit.tgw would be the interaction model (f2) I was > talking about, wright? My problem is that I do not know now how to define > the additive and reduced models to test Additive vs. Reduced and > Interaction vs. Additive. > > Any suggestion? > > Many thanks, > > Miguel Gallach ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 14.1 years ago Gordon Smyth 53k

0

Entering edit mode

Is there a way to extract simple stats like sd, cv, mean, variance for each probeset from lmfit objects? Is there any other way to extract the info for a large experiment with multiple factors and levels? Any suggestions would be greatly appreciated. Thanks, Som. > Date: Mon, 19 Dec 2011 09:41:15 +1100 > From: smyth@wehi.EDU.AU > To: miguel.gallach@vetmeduni.ac.at > CC: bioconductor@r-project.org > Subject: [BioC] edgeR glmFit -- how to define and test reduced, additive and interaction fitted models > > Dear Miguel, > > You want to do two tests. You want to test for Adapation:Temperature > interaction (fit2 vs fit1), and you want to test for Temperature adjusted > for Adaption (fit1 vs fit0). In edgeR, you conduct these test by fitting > the alternative model, the specifying the term to be removed. > > design2 <- model.matrix(~Adaption*Temperature) > design1 <- model.matrix(~Adaption+Temperature) > > To do genewise tests for the interaction: > > fit2 <- glmFit(d.GLM,design2) > lrt2 <- glmLRT(d.GLM,fit2) > topTags(lrt2) > > To test for Temperature adjusted for Adaption: > > fit1 <- glmFit(d.GLM,design1) > lrt1 <- glmLRT(d.GLM,fit1) > topTags(lrt1) > > In both cases, the last term in the model is the one removed unless you > specify otherwise. You do not have to explicitly fit reduced models, > because edgeR does this for you automatically. > > Best wishes > Gordon > > > Date: Fri, 16 Dec 2011 14:48:39 +0100 > > From: Miguel Gallach <miguel.gallach@vetmeduni.ac.at> > > To: bioconductor@r-project.org > > Subject: [BioC] edgeR glmFit -- how to define and test reduced, > > additive and interaction fitted models > > > > Dear list, > > > > I am using edgeR to analyze my RNA-Seq data. My data consist in two factors > > (Adaptation and Temperature) and two levels per factor, and I want fit full > > and reduced glm like the next ones: > > > > Reduced model > > fit0 = counts ~ Adaptation > > > > Aditive model > > fit1 = counts ~ Adaptation + Temperature > > > > Interaction model > > fit2 = counts ~ block + treatment + block:treatment > > > > > > That is what I did until now: > > d = raw.data[,1:8] > > head (d) > > R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold > > FBgn0000003 0 0 0 0 2 8 3 1 > > FBgn0000008 92 123 80 41 76 135 188 150 > > FBgn0000014 0 1 2 4 5 8 10 7 > > FBgn0000015 2 1 0 0 2 3 9 5 > > FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 > > FBgn0000018 36 43 10 17 6 16 20 30 > > > > > > Adaptation = > > factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","Ho tAdapted","HotAdapted","ColdAdapted","ColdAdapted")) > > Temperature = > > factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdC age","ColdCage","ColdCage")) > > design = model.matrix(~Adaptation + Temperature + Adaptation:Temperature) > > design > > (Intercept) AdaptationHotAdapted TemperatureHot > > 1 1 1 1 > > 2 1 1 1 > > 3 1 0 1 > > 4 1 0 1 > > 5 1 1 0 > > 6 1 1 0 > > 7 1 0 0 > > 8 1 0 0 > > AdaptationHotAdapted:TemperatureHot > > 1 1 > > 2 1 > > 3 0 > > 4 0 > > 5 0 > > 6 0 > > 7 0 > > 8 0 > > attr(,"assign") > > [1] 0 1 2 3 > > attr(,"contrasts") > > attr(,"contrasts")$Adaptation > > [1] "contr.treatment" > > > > attr(,"contrasts")$Temperature > > [1] "contr.treatment" > > > > > > d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage", > > "ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage", > > "HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage")) > > Calculating library sizes from column totals. > > d.GLM = calcNormFactors(d.GLM) > > d.GLM > > An object of class "DGEList" > > $samples > > group lib.size norm.factors > > R4.Hot HotAdaptedHot 17409289 0.9881635 > > R5.Hot HotAdaptedHot 17642552 1.0818144 > > R9.Hot ColdAdaptedHot 20010974 0.8621807 > > R10.Hot ColdAdaptedHot 14064143 0.8932791 > > R4.Cold HotAdaptedCold 11968317 1.0061084 > > R5.Cold HotAdaptedCold 11072832 1.0523857 > > R9.Cold ColdAdaptedCold 22386103 1.0520949 > > R10.Cold ColdAdaptedCold 17408532 1.0903311 > > > > $counts > > R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold > > FBgn0000003 0 0 0 0 2 8 3 1 > > FBgn0000008 92 123 80 41 76 135 188 150 > > FBgn0000014 0 1 2 4 5 8 10 7 > > FBgn0000015 2 1 0 0 2 3 9 5 > > FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 > > 14864 more rows ... > > > > $all.zeros > > FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017 > > FALSE FALSE FALSE FALSE FALSE > > 14864 more elements ... > > > > > > #According to the manual, DGEList is now ready to be passed to the > > functions that do the calculations to determine differential expression > > levels for the genes. However is not possible to achieve statistical > > significance with fewer than ten counts in total for a tag/gene and we also > > do not want to waste effort finding spurious DE (such as when a gene is > > only expressed in one library), so we filter out tags/genes with fewer than > > 1 count per million in SIX or more libraries. > > > > cpm.d.GLM = cpm(d.GLM) > > d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ] > > nrow(d.GLM) > > [1] 9418 > > > > > > #Now the dataset is ready to be analysed for differential expression > > #The design matrix > > design > > (Intercept) AdaptationHotAdapted TemperatureHotCage > > 1 1 1 1 > > 2 1 1 1 > > 3 1 0 1 > > 4 1 0 1 > > 5 1 1 0 > > 6 1 1 0 > > 7 1 0 0 > > 8 1 0 0 > > AdaptationHotAdapted:TemperatureHotCage > > 1 1 > > 2 1 > > 3 0 > > 4 0 > > 5 0 > > 6 0 > > 7 0 > > 8 0 > > attr(,"assign") > > [1] 0 1 2 3 > > attr(,"contrasts") > > attr(,"contrasts")$Adaptation > > [1] "contr.treatment" > > > > attr(,"contrasts")$Temperature > > [1] "contr.treatment" > > > > > > #Estimate CR common/trended/tagwise dispersion > > d.GLM = estimateGLMCommonDisp(d.GLM, design) > > names(d.GLM) > > [1] "samples" "counts" "all.zeros" > > [4] "common.dispersion" > > > > d.GLM$common.dispersion > > [1] 0.04634741 > > > > sqrt(d.GLM$common.dispersion) > > [1] 0.2152845 > > > > d.GLM = estimateGLMTrendedDisp(d.GLM, design) > > Loading required package: splines > > > > summary(d.GLM$trended.dispersion) > > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 0.03672 0.03843 0.04326 0.05267 0.06134 0.11870 > > > > d.GLM = estimateGLMTagwiseDisp(d.GLM, design) > > > > d.GLM$prior.n > > NULL > > > > d$prior.n > > NULL > > > > ls() > > [1] "Adaptation" "cpm.d" "cpm.d.GLM" "d" "d.GLM" > > [6] "design" "raw.data" "Temperature" > > > > > > d.GLM > > An object of class "DGEList" > > $samples > > group lib.size norm.factors > > R4.Hot HotAdaptedHotCage 17409289 0.9881635 > > R5.Hot HotAdaptedHotCage 17642552 1.0818144 > > R9.Hot ColdAdaptedHotCage 20010974 0.8621807 > > R10.Hot ColdAdaptedHotCage 14064143 0.8932791 > > R4.Cold HotAdaptedColdCage 11968317 1.0061084 > > R5.Cold HotAdaptedColdCage 11072832 1.0523857 > > R9.Cold ColdAdaptedColdCage 22386103 1.0520949 > > R10.Cold ColdAdaptedColdCage 17408532 1.0903311 > > > > $counts > > R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold > > FBgn0000008 92 123 80 41 76 135 188 150 > > FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665 > > FBgn0000018 36 43 10 17 6 16 20 30 > > FBgn0000024 51 68 73 42 118 118 242 129 > > FBgn0000032 1640 1942 2328 1342 755 674 2110 1307 > > 9413 more rows ... > > > > $all.zeros > > FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032 > > FALSE FALSE FALSE FALSE FALSE > > 9413 more elements ... > > > > $common.dispersion > > [1] 0.04634741 > > > > $trended.dispersion > > [1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067 > > 9413 more elements ... > > > > $abundance > > FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032 > > -11.915246 -9.183744 -13.519050 -11.966242 -9.300038 > > 9413 more elements ... > > > > $bin.dispersion > > [1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388 > > [7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138 > > [13] 0.03724316 0.03980808 0.04726670 > > > > $bin.abundance > > [1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957 > > [7] -10.769579 -10.430591 -10.119638 -9.821485 -9.532052 -9.228034 > > [13] -8.853866 -8.376654 -7.322181 > > > > $tagwise.dispersion > > [1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193 > > 9413 more elements ... > > > > getPriorN(d.GLM, design=design) > > [1] 5 > > > > #Fit model with tagwise dispersion > > glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion) > > lrt.tgw = glmLRT(d.GLM, glmfit.tgw) > > topTags(lrt.tgw) > > Coefficient: AdaptationHotAdapted:TemperatureHotCage > > logConc logFC LR P.Value FDR > > FBgn0032285 -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11 > > FBgn0026089 -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07 > > FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06 > > FBgn0051641 -10.291549 2.976952 37.46208 9.320768e-10 2.194575e-06 > > FBgn0058042 -9.665534 2.267317 33.71276 6.388032e-09 1.203250e-05 > > FBgn0010333 -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05 > > FBgn0030094 -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05 > > FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05 > > FBgn0028863 -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05 > > FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05 > > > > > > If I understand, glmfit.tgw would be the interaction model (f2) I was > > talking about, wright? My problem is that I do not know now how to define > > the additive and reduced models to test Additive vs. Reduced and > > Interaction vs. Additive. > > > > Any suggestion? > > > > Many thanks, > > > > Miguel Gallach > > ______________________________________________________________________ > The information in this email is confidential and intend...{{dropped:4}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD REPLY • link 14.1 years ago fire1976 wyoming ▴ 380

0

Entering edit mode

Please see Sections 10.1 and 10.2 of the limma User's Guide. You can compute the statistics you want from the provided components. Gordon On Mon, 19 Dec 2011, somnath bandyopadhyay wrote: > Is there a way to extract simple stats like sd, cv, mean, variance for each probeset from lmfit objects? Is there any other way to extract the info for a large experiment with multiple factors and levels? Any suggestions would be greatly appreciated. > Thanks, > Som. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 14.1 years ago Gordon Smyth 53k

0

Entering edit mode

Hi Gordon, Thanks for your prompt reply. I have looked at the sections you mentioned in your email. What I am looking for is more of standard statistcis for individual factors even before doing a contrast. If I go with the linear model fit (the first fit before the contrast matrix), is there a way to pull the mean, sd, cv, variance etc. out of the fitted object for each factor? I am looking at a large gene expression study using afymetrix plus 2 chips. I have 4 factors (blood compartment, gender, time, subject) and I want to look at the variability of each of the factors on the basis of the entire chip: e.g. by boxplots of cvs of all probesets for each factor. Can I make use of limma to get this information? Any suggestions would be greatly appreciated. Thanks, Som. > Date: Tue, 20 Dec 2011 09:39:24 +1100 > From: smyth@wehi.EDU.AU > To: genome1976@hotmail.com > CC: bioconductor@r-project.org > Subject: Re: LIMMA: extracting simple stats for factors and levels from lmfit > > Please see Sections 10.1 and 10.2 of the limma User's Guide. You can > compute the statistics you want from the provided components. > > Gordon > > On Mon, 19 Dec 2011, somnath bandyopadhyay wrote: > > > Is there a way to extract simple stats like sd, cv, mean, variance for each probeset from lmfit objects? Is there any other way to extract the info for a large experiment with multiple factors and levels? Any suggestions would be greatly appreciated. > > Thanks, > > Som. > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:9}}

ADD REPLY • link 14.1 years ago fire1976 wyoming ▴ 380

Login before adding your answer.