design matrix in limma
2
0
Entering edit mode
Guest User ★ 12k
@guest-user-4897
Last seen 6.8 years ago
Hello BioC forum, I am making a design matrix for gene expression data analysis with the adjustment of Sex variable. The R code I am using is below. I want to make sure if I am doing correctly. Thanks for your review and feedback. > library(limma) > Sample<-factor(pd$Sample_Group, levels=c("Control","Case")) ### treatment > Sex<-factor(pd$Sex, levels=c("M","F")) ### Sex > design<-model.matrix(~Sample+Sex) > design (Intercept) SampleCase SexF 1 1 0 0 2 1 0 0 3 1 0 0 4 1 0 1 5 1 0 1 6 1 0 1 7 1 1 1 8 1 0 0 9 1 0 1 10 1 1 0 11 1 1 1 12 1 0 1 13 1 0 1 14 1 1 0 15 1 0 0 16 1 1 1 17 1 0 0 18 1 0 0 19 1 1 0 20 1 0 0 21 1 0 1 22 1 1 1 23 1 0 1 24 1 0 0 25 1 1 0 26 1 0 1 27 1 0 1 28 1 0 1 29 1 0 0 30 1 0 0 31 1 1 1 32 1 0 1 33 1 0 1 34 1 1 0 35 1 0 0 attr(,"assign") [1] 0 1 2 attr(,"contrasts") attr(,"contrasts")$Sample [1] "contr.treatment" attr(,"contrasts")$Sex [1] "contr.treatment" > colnames(design) [1] "(Intercept)" "SampleCase" "SexF" > Lmfit<-lmFit(autosome.Mvalues.noSNPs, design) ## using M values > fit<-eBayes(Lmfit) > top.50<-topTable(fit,coef=2,adjust.method="fdr",number=50,sort.by="B") > top.50 -- output of sessionInfo(): > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] FlowSorted.Blood.450k_1.2.0 quadprog_1.5-5 [3] DMRcate_1.0.1 DMRcatedata_1.0.0 [5] limma_3.20.8 ks_1.9.2 [7] rgl_0.93.1098 mvtnorm_1.0-0 [9] misc3d_0.8-4 KernSmooth_2.23-12 [11] minfiData_0.6.0 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 [13] IlluminaHumanMethylation450kmanifest_0.4.0 minfi_1.10.2 [15] bumphunter_1.4.2 locfit_1.5-9.1 [17] iterators_1.0.7 foreach_1.4.2 [19] Biostrings_2.32.1 XVector_0.4.0 [21] GenomicRanges_1.16.4 GenomeInfoDb_1.0.2 [23] IRanges_1.22.10 lattice_0.20-29 [25] Biobase_2.24.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] annotate_1.42.1 AnnotationDbi_1.26.0 base64_1.1 beanplot_1.1 codetools_0.2-8 [6] DBI_0.2-7 digest_0.6.4 doRNG_1.6 genefilter_1.46.1 grid_3.1.0 [11] illuminaio_0.6.0 MASS_7.3-33 matrixStats_0.10.0 mclust_4.3 multtest_2.20.0 [16] nlme_3.1-117 nor1mix_1.1-4 pkgmaker_0.22 plyr_1.8.1 preprocessCore_1.26.1 [21] R.methodsS3_1.6.1 RColorBrewer_1.0-5 Rcpp_0.11.2 registry_0.2 reshape_0.8.5 [26] rngtools_1.2.4 RSQLite_0.11.4 siggenes_1.38.0 splines_3.1.0 stats4_3.1.0 [31] stringr_0.6.2 survival_2.37-7 tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 [36] zlibbioc_1.10.0 -- Sent via the guest posting facility at bioconductor.org. _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
• 1.9k views
3
Entering edit mode
@gordon-smyth
Last seen 1 minute ago
WEHI, Melbourne, Australia

Looks like you are using a standard limma pipeline. So in principle, yes, it's fine.

0
Entering edit mode

Thank you, Dr. Smyth.

0
Entering edit mode
Bernd Klaus ▴ 590
@bernd-klaus-6281
Last seen 2.7 years ago
Germany

Dear KC,

Just as a an additional side note: commonly, the blocking factor is the second

coefficient, and the the coefficient representing the fold change of interest
is put last.

However, this is just as cosmetic in limma, since you specify explicitly which coefficient to extract, while e.g. in DESeq2 the last coefficient is tested by default. I.e.

design<-model.matrix(~Sex + Sample)

Additionally, you can also inspect the p-values for the blocking
factor "Sex", if you do not have a lot of rejections there, you
might even leave it out entirely and do a simple two groups
comparison.

Best wishes,

Bernd