creating contrast matrix (limma) for two factorial in R

0

Entering edit mode

Ryan M Ghan ▴ 20

@ryan-m-ghan-6448

Last seen 9.6 years ago

Hello limma community- I am attempting to construct a contrast matrix that I can run in R, using the limma bioconductor package, but I am not sure that I have coded the contrast matrix correctly. A previous post<https: stats.stackexchange.com="" questions="" 64249="" creating-="" contrast-matrix-for-linear-regression-="" in-r?newreg="add2674ca9d04b7eb85fad255b45b7f5"> on stats.stackexchange.com, the bioconductor mailing list, and the limma guide were helpful, but my two factorial design is more complicated than what is illustrated there. The first factor is the treatment, with two levels (control=c and stress=s), and the second factor is the genotype, with five levels (g1, g2, g3, g4, g5). Each genotype/treatment consists of 3-biological replicates (30xsamples total). My dataset has already been normalized and log2 transformed (Should I even start with log transformed data?). It consists of 1208 proteins (based upon spectral counting for those that care) that measures protein abundance differences in the five genotypes and two treatments. The dataset is complete, meaning each sample/condition has a datapoint, and appears to be normally distributed (histogram and qqplots). ## Session information > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.18.13 loaded via a namespace (and not attached): [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4 ggplot2_0.9.3.1 grid_3.0.2 gtable_0.1.2 [7] labeling_0.2 MASS_7.3-30 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 [13] Rcpp_0.11.0 reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.2 ## Subset of the data: proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3 g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2 g4.s3 g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3 prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821 -9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585 -10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896 -10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795 -10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373 -9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101 -9.957684691 prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294 -9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795 -10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166 -10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978 -9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592 -9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886 -9.95430439 prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998 -11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486 -12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031 -12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243 -11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512 -12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369 -12.437377 prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257 -11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186 -11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753 -11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929 -10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598 -11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703 -11.09493115 prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973 -9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651 -9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382 -10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376 -9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894 -9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792 -9.65324573 prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023 -10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798 -10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025 -10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785 -10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634 -10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552 -10.13061368 prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129 -9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462 -10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285 -10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449 -10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166 -10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766 -10.0695915 prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092 -10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007 -10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728 -10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937 -10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115 -10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989 -10.62675183 prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675 -8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958 -8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294 -8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159 -8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278 -8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701 -8.925717389 prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791 -10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196 -11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369 -10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945 -10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216 -11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792 -10.86048168 ## Code that I am attempting to utilize: proteins.mat <- as.matrix(proteins.df) treat = c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4.c", "g5.s","g5.c") factors = gl(10,3,labels=treat) design <- model.matrix(~0+factors) colnames(design) <- treat ## Here is the design for my model: > design g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c 1 1 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 3 1 0 0 0 0 0 0 0 0 0 4 0 1 0 0 0 0 0 0 0 0 5 0 1 0 0 0 0 0 0 0 0 6 0 1 0 0 0 0 0 0 0 0 7 0 0 1 0 0 0 0 0 0 0 8 0 0 1 0 0 0 0 0 0 0 9 0 0 1 0 0 0 0 0 0 0 10 0 0 0 1 0 0 0 0 0 0 11 0 0 0 1 0 0 0 0 0 0 12 0 0 0 1 0 0 0 0 0 0 13 0 0 0 0 1 0 0 0 0 0 14 0 0 0 0 1 0 0 0 0 0 15 0 0 0 0 1 0 0 0 0 0 16 0 0 0 0 0 1 0 0 0 0 17 0 0 0 0 0 1 0 0 0 0 18 0 0 0 0 0 1 0 0 0 0 19 0 0 0 0 0 0 1 0 0 0 20 0 0 0 0 0 0 1 0 0 0 21 0 0 0 0 0 0 1 0 0 0 22 0 0 0 0 0 0 0 1 0 0 23 0 0 0 0 0 0 0 1 0 0 24 0 0 0 0 0 0 0 1 0 0 25 0 0 0 0 0 0 0 0 1 0 26 0 0 0 0 0 0 0 0 1 0 27 0 0 0 0 0 0 0 0 1 0 28 0 0 0 0 0 0 0 0 0 1 29 0 0 0 0 0 0 0 0 0 1 30 0 0 0 0 0 0 0 0 0 1 attr(,"assign") [1] 1 1 1 1 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$factors [1] "contr.treatment" ## My contrast model. I want to test for interaction, differences between genotypes, and to see if specific genotypes respond differently to the treatment from one another: cmtx <- makeContrasts( GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s-g4 .c)-(g5.s-g5.c), genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c), Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c), levels=design) ## What my contrast model looks like, but I don't think this is correct: > cmtx Contrasts Levels GenotypevsTreatment Genotype Treatment g1.s 1 1 1 g1.c -1 1 -1 g2.s -1 -1 1 g2.c 1 -1 -1 g3.s -1 -1 1 g3.c 1 -1 -1 g4.s -1 -1 1 g4.c 1 -1 -1 g5.s -1 -1 1 g5.c 1 -1 -1 ## Fitting the linear model by empirical bayes statistics for differential expression: fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx)) topTable(fit, adjust.method="BH") ## The below topTable proteins are the same as the subset of data from above: > topTable(fit, adjust.method="BH") GenotypevsTreatment Genotype Treatment AveExpr F P.Value adj.P.Val prot1 -0.40786338 60.30918 0.073054723 -9.918822 17308.55 1.124646e-39 1.232079e-36 prot2 -0.09255219 59.60864 0.061701713 -9.897968 15801.43 3.304533e-39 1.232079e-36 prot3 -0.23880357 73.48557 0.536672827 -12.090016 15650.65 3.701463e-39 1.232079e-36 prot4 -0.11834000 66.76931 0.305471823 -11.122034 15522.46 4.079731e-39 1.232079e-36 prot5 -0.15210172 59.21509 -0.183849274 -9.876144 14734.51 7.556112e-39 1.423908e-36 prot6 -0.15761118 62.87467 0.155340561 -10.389362 14565.87 8.658504e-39 1.423908e-36 prot7 -0.03886438 61.15652 -0.166795475 -10.182834 14551.88 8.757515e-39 1.423908e-36 prot8 -0.10425341 64.63523 -0.186904167 -10.780359 14461.18 9.429854e-39 1.423908e-36 prot9 -0.03426380 53.48057 0.007403722 -8.854471 13713.49 1.767090e-38 2.021378e-36 prot10 -0.75250251 66.62646 0.327497120 -10.894506 13480.51 2.164184e-38 2.021378e-36 I think I am naively constructing the contrast matrix. The result for Genotype (above) looks incorrect to me. Ideally, I would like to figure this out because I wish to apply similar modeling techniques to an RNAseq dataset from the same experimental samples that I used for my protein work. Also, in order to make the correct comparisons, do I also need to employ the duplicateCorrelation function (pg. 50, lima guide 9.7 Multi-level Experiments)? Something like this: corfit <- duplicateCorrelation(proteins.mat, design) > corfit$consensus [1] -0.7888584 fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design, corrleation=corfit$consensus), cmtx)) Any insight/corrections would be greatly appreciated, and if I have forgotten some information that would aid in helping me with this problem I am happy to cooperate. Cheers, Ryan Ryan M. Ghan Graduate Research Assistant Department of Biochemistry and Molecular Biology, MS 330 University of Nevada, Reno Reno, NV 89557 Howard 205 Lab: (775) 784-4225 rghan@unr.edu [[alternative HTML version deleted]]

RNASeq limma RNASeq limma • 2.4k views

ADD COMMENT • link updated 10.1 years ago by Ryan C. Thompson ★ 7.9k • written 10.1 years ago by Ryan M Ghan ▴ 20

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

Hi Ryan, Your design matrix is fine. However, your mistake is trying to stuff each one of your tests into a single contrast. As an example, for the interaction test, you are testing the interaction between a 5-level factor and a 2-level factor, so the test has 4 degrees of freedom. This means that you'll need 4 contrasts to represent this test: interaction.contrast.exprs <- sprintf("(g%s.s-g%s.c) - (g1.s-g1.c)", 2:5, 2:5) interaction.contrast.matrix <- makeContrasts(contrasts=interaction.contrast.exprs, levels=design) The resulting contrast matrix should have 4 columns, one for each contrast in your test. For your other two tests, you should be aware that the expression "(g1.s + g1.c)/2" means "an equal mix of control and stressed", which is probably not what you want. I would say that it makes more sense to just compare the genotypes within a single condition. To test for genotype differences in the control condition, this is again a 4-degree-of-freedom test, so you'll need a similar approach to the above: control.genotype.contrast.exprs <- sprintf("g%s.c-g1.c", 2:5, 2:5) control.genotype.contrast.matrix <- makeContrasts(contrasts=control.genotype.contrast.exprs, levels=design) You could also to the same with the stressed condition as a separate test. The same caveat applies to your treatment contrast, which is only correct if you want to know the expression differences for control vs stressed in an equal mix of all 5 genotypes (i.e. a hypothetical population with 20% of each genotype). Again, the best approach is probably to test control vs stressed in each genotype separately, i.e. 5 tests of one contrast each. Lastly, since you said you are working with spectral count data, you might want to consider using the edgeR or DESeq2 packages, which are specifically designed for analyzing count data, or else to use limma's voom function to perform the log2-transform while accounting for heteroskedasticity. Anyway, that's my understanding. I hope it helps you. Anyone else can feel free to correct me if I've made any errors in my explanation. -Ryan Thompson On Thu Mar 13 15:14:20 2014, Ryan M Ghan wrote: > Hello limma community- > > I am attempting to construct a contrast matrix that I can run in R, using the limma bioconductor package, but I am not sure that I have coded the contrast matrix correctly. A previous post<https: stats.stackexchange.com="" questions="" 64249="" creating-="" contrast-matrix-for-linear-regression-="" in-r?newreg="add2674ca9d04b7eb85fad255b45b7f5"> on stats.stackexchange.com, the bioconductor mailing list, and the limma guide were helpful, but my two factorial design is more complicated than what is illustrated there. > > The first factor is the treatment, with two levels (control=c and stress=s), and the second factor is the genotype, with five levels (g1, g2, g3, g4, g5). Each genotype/treatment consists of 3-biological replicates (30xsamples total). My dataset has already been normalized and log2 transformed (Should I even start with log transformed data?). It consists of 1208 proteins (based upon spectral counting for those that care) that measures protein abundance differences in the five genotypes and two treatments. The dataset is complete, meaning each sample/condition has a datapoint, and appears to be normally distributed (histogram and qqplots). > > ## Session information >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.18.13 > > loaded via a namespace (and not attached): > [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4 ggplot2_0.9.3.1 grid_3.0.2 gtable_0.1.2 > [7] labeling_0.2 MASS_7.3-30 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 > [13] Rcpp_0.11.0 reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.2 > > ## Subset of the data: > proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3 g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2 g4.s3 g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3 > prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821 -9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585 -10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896 -10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795 -10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373 -9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101 -9.957684691 > prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294 -9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795 -10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166 -10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978 -9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592 -9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886 -9.95430439 > prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998 -11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486 -12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031 -12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243 -11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512 -12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369 -12.437377 > prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257 -11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186 -11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753 -11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929 -10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598 -11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703 -11.09493115 > prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973 -9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651 -9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382 -10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376 -9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894 -9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792 -9.65324573 > prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023 -10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798 -10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025 -10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785 -10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634 -10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552 -10.13061368 > prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129 -9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462 -10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285 -10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449 -10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166 -10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766 -10.0695915 > prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092 -10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007 -10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728 -10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937 -10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115 -10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989 -10.62675183 > prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675 -8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958 -8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294 -8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159 -8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278 -8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701 -8.925717389 > prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791 -10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196 -11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369 -10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945 -10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216 -11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792 -10.86048168 > > ## Code that I am attempting to utilize: > proteins.mat <- as.matrix(proteins.df) > treat = c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4. c","g5.s","g5.c") > factors = gl(10,3,labels=treat) > design <- model.matrix(~0+factors) > colnames(design) <- treat > > ## Here is the design for my model: > > design > g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c > 1 1 0 0 0 0 0 0 0 0 0 > 2 1 0 0 0 0 0 0 0 0 0 > 3 1 0 0 0 0 0 0 0 0 0 > 4 0 1 0 0 0 0 0 0 0 0 > 5 0 1 0 0 0 0 0 0 0 0 > 6 0 1 0 0 0 0 0 0 0 0 > 7 0 0 1 0 0 0 0 0 0 0 > 8 0 0 1 0 0 0 0 0 0 0 > 9 0 0 1 0 0 0 0 0 0 0 > 10 0 0 0 1 0 0 0 0 0 0 > 11 0 0 0 1 0 0 0 0 0 0 > 12 0 0 0 1 0 0 0 0 0 0 > 13 0 0 0 0 1 0 0 0 0 0 > 14 0 0 0 0 1 0 0 0 0 0 > 15 0 0 0 0 1 0 0 0 0 0 > 16 0 0 0 0 0 1 0 0 0 0 > 17 0 0 0 0 0 1 0 0 0 0 > 18 0 0 0 0 0 1 0 0 0 0 > 19 0 0 0 0 0 0 1 0 0 0 > 20 0 0 0 0 0 0 1 0 0 0 > 21 0 0 0 0 0 0 1 0 0 0 > 22 0 0 0 0 0 0 0 1 0 0 > 23 0 0 0 0 0 0 0 1 0 0 > 24 0 0 0 0 0 0 0 1 0 0 > 25 0 0 0 0 0 0 0 0 1 0 > 26 0 0 0 0 0 0 0 0 1 0 > 27 0 0 0 0 0 0 0 0 1 0 > 28 0 0 0 0 0 0 0 0 0 1 > 29 0 0 0 0 0 0 0 0 0 1 > 30 0 0 0 0 0 0 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 1 1 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$factors > [1] "contr.treatment" > > ## My contrast model. I want to test for interaction, differences between genotypes, and to see if specific genotypes respond differently to the treatment from one another: > cmtx <- makeContrasts( > GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s -g4.c)-(g5.s-g5.c), > genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c), > Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c), > levels=design) > > ## What my contrast model looks like, but I don't think this is correct: > > cmtx > Contrasts > Levels GenotypevsTreatment Genotype Treatment > g1.s 1 1 1 > g1.c -1 1 -1 > g2.s -1 -1 1 > g2.c 1 -1 -1 > g3.s -1 -1 1 > g3.c 1 -1 -1 > g4.s -1 -1 1 > g4.c 1 -1 -1 > g5.s -1 -1 1 > g5.c 1 -1 -1 > > ## Fitting the linear model by empirical bayes statistics for differential expression: > fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx)) > topTable(fit, adjust.method="BH") > > ## The below topTable proteins are the same as the subset of data from above: > > topTable(fit, adjust.method="BH") > GenotypevsTreatment Genotype Treatment AveExpr F P.Value adj.P.Val > prot1 -0.40786338 60.30918 0.073054723 -9.918822 17308.55 1.124646e-39 1.232079e-36 > prot2 -0.09255219 59.60864 0.061701713 -9.897968 15801.43 3.304533e-39 1.232079e-36 > prot3 -0.23880357 73.48557 0.536672827 -12.090016 15650.65 3.701463e-39 1.232079e-36 > prot4 -0.11834000 66.76931 0.305471823 -11.122034 15522.46 4.079731e-39 1.232079e-36 > prot5 -0.15210172 59.21509 -0.183849274 -9.876144 14734.51 7.556112e-39 1.423908e-36 > prot6 -0.15761118 62.87467 0.155340561 -10.389362 14565.87 8.658504e-39 1.423908e-36 > prot7 -0.03886438 61.15652 -0.166795475 -10.182834 14551.88 8.757515e-39 1.423908e-36 > prot8 -0.10425341 64.63523 -0.186904167 -10.780359 14461.18 9.429854e-39 1.423908e-36 > prot9 -0.03426380 53.48057 0.007403722 -8.854471 13713.49 1.767090e-38 2.021378e-36 > prot10 -0.75250251 66.62646 0.327497120 -10.894506 13480.51 2.164184e-38 2.021378e-36 > > I think I am naively constructing the contrast matrix. The result for Genotype (above) looks incorrect to me. Ideally, I would like to figure this out because I wish to apply similar modeling techniques to an RNAseq dataset from the same experimental samples that I used for my protein work. > > Also, in order to make the correct comparisons, do I also need to employ the ?duplicateCorrelation? function (pg. 50, lima guide 9.7 Multi-level Experiments)? Something like this: > > corfit <- duplicateCorrelation(proteins.mat, design) >> corfit$consensus > [1] -0.7888584 > fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design, corrleation=corfit$consensus), cmtx)) > > Any insight/corrections would be greatly appreciated, and if I have forgotten some information that would aid in helping me with this problem I am happy to cooperate. > > Cheers, > Ryan > > Ryan M. Ghan > Graduate Research Assistant > Department of Biochemistry and Molecular Biology, MS 330 > University of Nevada, Reno > Reno, NV 89557 > Howard 205 > Lab: (775) 784-4225 > rghan at unr.edu > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Hi Ryan- Thank you very much for your comments and code contributions. Applying your suggestions seems to produce the results that I was looking for. I think the hardest thing has been to conceptualize the specific tests that I was after. As you mentioned, testing genotype differences for one condition makes more sense than what I had previously attempted. Likewise, testing treatment differences within one genotype at a time. Now I?ll need to spend some additional time with the data beyond a quick overview. Regarding other count specific packages, I was aware of both DESeq2 and limma?s voom, as alternatives to deal with spectral count data. I am actually in the process of utilizing the DESeq2 package for my RNAseq datasets, and I have thought about applying them towards my proteomic work once I am comfortable with it. My initial hesitation has been the criteria I have used for normalizing the spectral counts (Normalized Spectral Abundance Factor), which accounts for sample-to-sample variation of replicates and the fact that longer proteins can theoretically produce more peptide/spectral hits than shorter proteins. This criteria includes excluding any samples from quantification that have missing counts within >=1 biological replicate, and enforcing a minimum threshold of 5-counts >across all biological replicates for a protein to be included for >downstream analysis. I will have to try one of the packages out with the >same filtered dataset that exludes any of the aforementioned ?missing? >protein counts. Thanks again! Ryan Ryan M. Ghan Graduate Research Assistant Department of Biochemistry and Molecular Biology, MS 330 University of Nevada, Reno Reno, NV 89557 Howard 205 Lab: (775) 784-4225 rghan at unr.edu On 3/13/14, 4:31 PM, "Ryan" <rct at="" thompsonclan.org=""> wrote: >Hi Ryan, > >Your design matrix is fine. However, your mistake is trying to stuff >each one of your tests into a single contrast. As an example, for the >interaction test, you are testing the interaction between a 5-level >factor and a 2-level factor, so the test has 4 degrees of freedom. This >means that you'll need 4 contrasts to represent this test: > >interaction.contrast.exprs <- sprintf("(g%s.s-g%s.c) - (g1.s-g1.c)", >2:5, 2:5) >interaction.contrast.matrix <- >makeContrasts(contrasts=interaction.contrast.exprs, levels=design) > >The resulting contrast matrix should have 4 columns, one for each >contrast in your test. > >For your other two tests, you should be aware that the expression >"(g1.s + g1.c)/2" means "an equal mix of control and stressed", which >is probably not what you want. I would say that it makes more sense to >just compare the genotypes within a single condition. To test for >genotype differences in the control condition, this is again a >4-degree-of-freedom test, so you'll need a similar approach to the >above: > >control.genotype.contrast.exprs <- sprintf("g%s.c-g1.c", 2:5, 2:5) >control.genotype.contrast.matrix <- >makeContrasts(contrasts=control.genotype.contrast.exprs, levels=design) > >You could also to the same with the stressed condition as a separate >test. The same caveat applies to your treatment contrast, which is only >correct if you want to know the expression differences for control vs >stressed in an equal mix of all 5 genotypes (i.e. a hypothetical >population with 20% of each genotype). Again, the best approach is >probably to test control vs stressed in each genotype separately, i.e. >5 tests of one contrast each. > >Lastly, since you said you are working with spectral count data, you >might want to consider using the edgeR or DESeq2 packages, which are >specifically designed for analyzing count data, or else to use limma's >voom function to perform the log2-transform while accounting for >heteroskedasticity. > >Anyway, that's my understanding. I hope it helps you. Anyone else can >feel free to correct me if I've made any errors in my explanation. > >-Ryan Thompson > >On Thu Mar 13 15:14:20 2014, Ryan M Ghan wrote: >> Hello limma community- >> >> I am attempting to construct a contrast matrix that I can run in R, >>using the limma bioconductor package, but I am not sure that I have >>coded the contrast matrix correctly. A previous >>post<https: stats.stackexchange.com="" questions="" 64249="" creating-="" contrast-ma="">>trix-for-linear-regression- in-r?newreg=add2674ca9d04b7eb85fad255b45b7f5> >>on stats.stackexchange.com, the bioconductor mailing list, and the limma >>guide were helpful, but my two factorial design is more complicated than >>what is illustrated there. >> >> The first factor is the treatment, with two levels (control=c and >>stress=s), and the second factor is the genotype, with five levels (g1, >>g2, g3, g4, g5). Each genotype/treatment consists of 3-biological >>replicates (30xsamples total). My dataset has already been normalized >>and log2 transformed (Should I even start with log transformed data?). >>It consists of 1208 proteins (based upon spectral counting for those >>that care) that measures protein abundance differences in the five >>genotypes and two treatments. The dataset is complete, meaning each >>sample/condition has a datapoint, and appears to be normally distributed >>(histogram and qqplots). >> >> ## Session information >>> sessionInfo() >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-apple-darwin10.8.0 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] limma_3.18.13 >> >> loaded via a namespace (and not attached): >> [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4 >>ggplot2_0.9.3.1 grid_3.0.2 gtable_0.1.2 >> [7] labeling_0.2 MASS_7.3-30 munsell_0.4.2 >>plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 >> [13] Rcpp_0.11.0 reshape2_1.2.2 scales_0.2.3 >>stringr_0.6.2 tools_3.0.2 >> >> ## Subset of the data: >> proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3 >>g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2 g4.s3 >>g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3 >> prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821 >>-9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585 >>-10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896 >>-10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795 >>-10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373 >>-9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101 >>-9.957684691 >> prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294 >>-9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795 >>-10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166 >>-10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978 >>-9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592 >>-9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886 >>-9.95430439 >> prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998 >>-11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486 >>-12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031 >>-12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243 >>-11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512 >>-12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369 >>-12.437377 >> prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257 >>-11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186 >>-11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753 >>-11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929 >>-10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598 >>-11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703 >>-11.09493115 >> prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973 >>-9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651 >>-9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382 >>-10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376 >>-9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894 >>-9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792 >>-9.65324573 >> prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023 >>-10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798 >>-10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025 >>-10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785 >>-10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634 >>-10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552 >>-10.13061368 >> prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129 >>-9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462 >>-10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285 >>-10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449 >>-10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166 >>-10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766 >>-10.0695915 >> prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092 >>-10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007 >>-10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728 >>-10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937 >>-10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115 >>-10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989 >>-10.62675183 >> prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675 >>-8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958 >>-8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294 >>-8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159 >>-8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278 >>-8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701 >>-8.925717389 >> prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791 >>-10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196 >>-11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369 >>-10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945 >>-10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216 >>-11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792 >>-10.86048168 >> >> ## Code that I am attempting to utilize: >> proteins.mat <- as.matrix(proteins.df) >> treat = >>c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4.c","g5.s","g5 .c") >> factors = gl(10,3,labels=treat) >> design <- model.matrix(~0+factors) >> colnames(design) <- treat >> >> ## Here is the design for my model: >> > design >> g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c >> 1 1 0 0 0 0 0 0 0 0 0 >> 2 1 0 0 0 0 0 0 0 0 0 >> 3 1 0 0 0 0 0 0 0 0 0 >> 4 0 1 0 0 0 0 0 0 0 0 >> 5 0 1 0 0 0 0 0 0 0 0 >> 6 0 1 0 0 0 0 0 0 0 0 >> 7 0 0 1 0 0 0 0 0 0 0 >> 8 0 0 1 0 0 0 0 0 0 0 >> 9 0 0 1 0 0 0 0 0 0 0 >> 10 0 0 0 1 0 0 0 0 0 0 >> 11 0 0 0 1 0 0 0 0 0 0 >> 12 0 0 0 1 0 0 0 0 0 0 >> 13 0 0 0 0 1 0 0 0 0 0 >> 14 0 0 0 0 1 0 0 0 0 0 >> 15 0 0 0 0 1 0 0 0 0 0 >> 16 0 0 0 0 0 1 0 0 0 0 >> 17 0 0 0 0 0 1 0 0 0 0 >> 18 0 0 0 0 0 1 0 0 0 0 >> 19 0 0 0 0 0 0 1 0 0 0 >> 20 0 0 0 0 0 0 1 0 0 0 >> 21 0 0 0 0 0 0 1 0 0 0 >> 22 0 0 0 0 0 0 0 1 0 0 >> 23 0 0 0 0 0 0 0 1 0 0 >> 24 0 0 0 0 0 0 0 1 0 0 >> 25 0 0 0 0 0 0 0 0 1 0 >> 26 0 0 0 0 0 0 0 0 1 0 >> 27 0 0 0 0 0 0 0 0 1 0 >> 28 0 0 0 0 0 0 0 0 0 1 >> 29 0 0 0 0 0 0 0 0 0 1 >> 30 0 0 0 0 0 0 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 1 1 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$factors >> [1] "contr.treatment" >> >> ## My contrast model. I want to test for interaction, differences >>between genotypes, and to see if specific genotypes respond differently >>to the treatment from one another: >> cmtx <- makeContrasts( >> >>GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s-g4.c)- (g5.s >>-g5.c), >> >>genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c) , >> Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c), >> levels=design) >> >> ## What my contrast model looks like, but I don't think this is correct: >> > cmtx >> Contrasts >> Levels GenotypevsTreatment Genotype Treatment >> g1.s 1 1 1 >> g1.c -1 1 -1 >> g2.s -1 -1 1 >> g2.c 1 -1 -1 >> g3.s -1 -1 1 >> g3.c 1 -1 -1 >> g4.s -1 -1 1 >> g4.c 1 -1 -1 >> g5.s -1 -1 1 >> g5.c 1 -1 -1 >> >> ## Fitting the linear model by empirical bayes statistics for >>differential expression: >> fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx)) >> topTable(fit, adjust.method="BH") >> >> ## The below topTable proteins are the same as the subset of data from >>above: >> > topTable(fit, adjust.method="BH") >> GenotypevsTreatment Genotype Treatment AveExpr >>F P.Value adj.P.Val >> prot1 -0.40786338 60.30918 0.073054723 -9.918822 17308.55 >>1.124646e-39 1.232079e-36 >> prot2 -0.09255219 59.60864 0.061701713 -9.897968 15801.43 >>3.304533e-39 1.232079e-36 >> prot3 -0.23880357 73.48557 0.536672827 -12.090016 15650.65 >>3.701463e-39 1.232079e-36 >> prot4 -0.11834000 66.76931 0.305471823 -11.122034 15522.46 >>4.079731e-39 1.232079e-36 >> prot5 -0.15210172 59.21509 -0.183849274 -9.876144 14734.51 >>7.556112e-39 1.423908e-36 >> prot6 -0.15761118 62.87467 0.155340561 -10.389362 14565.87 >>8.658504e-39 1.423908e-36 >> prot7 -0.03886438 61.15652 -0.166795475 -10.182834 14551.88 >>8.757515e-39 1.423908e-36 >> prot8 -0.10425341 64.63523 -0.186904167 -10.780359 14461.18 >>9.429854e-39 1.423908e-36 >> prot9 -0.03426380 53.48057 0.007403722 -8.854471 13713.49 >>1.767090e-38 2.021378e-36 >> prot10 -0.75250251 66.62646 0.327497120 -10.894506 13480.51 >>2.164184e-38 2.021378e-36 >> >> I think I am naively constructing the contrast matrix. The result for >>Genotype (above) looks incorrect to me. Ideally, I would like to figure >>this out because I wish to apply similar modeling techniques to an >>RNAseq dataset from the same experimental samples that I used for my >>protein work. >> >> Also, in order to make the correct comparisons, do I also need to >>employ the ?duplicateCorrelation? function (pg. 50, lima guide 9.7 >>Multi-level Experiments)? Something like this: >> >> corfit <- duplicateCorrelation(proteins.mat, design) >>> corfit$consensus >> [1] -0.7888584 >> fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design, >>corrleation=corfit$consensus), cmtx)) >> >> Any insight/corrections would be greatly appreciated, and if I have >>forgotten some information that would aid in helping me with this >>problem I am happy to cooperate. >> >> Cheers, >> Ryan >> >> Ryan M. Ghan >> Graduate Research Assistant >> Department of Biochemistry and Molecular Biology, MS 330 >> University of Nevada, Reno >> Reno, NV 89557 >> Howard 205 >> Lab: (775) 784-4225 >> rghan at unr.edu >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >>http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 10.1 years ago Ryan M Ghan ▴ 20

Login before adding your answer.