nested factors in edgeR [was: something wrong with my design matrix in edgeR]
1
0
Entering edit mode
@gordon-smyth
Last seen 38 minutes ago
WEHI, Melbourne, Australia
Dear Li, Thanks for showing the targets file -- that makes your experiment very clear. Your design matrix can be seen to be incorrect because it has five columns for an experiment that has only four distinct groups. The problem is that the populations are subgroups, nested within the latitudes north and south. So your two factors are nested rather than crossed. I assume you want to test for differences between the two north populations (FBK and NWL), and then for differences between the two south populations (SOU and CAR), and then later for N vs S. There are many ways to do this, but the following is perhaps the easiest: pop <- factor(targets$pop, levels=c("FBK","NWL","SOU","CAR")) design <- model.matrix(~pop) Then later fit <- glmFit(d, design) so the coefficients 2-3 compare NWL, SOU and CAR respectively back to FBK. To test NWL vs FBK (within the north group): lrt <- glmFit(d, fit, coef=2) topTags(lrt) To test CAR vs SOU (within the south group): lrt <- glmFit(d, fit, contrast=c(0,0,-1,1)) topTags(lrt) To test S vs N overall: lrt <- glmFit(d, fit, contrast=c(0,-1,1,1)/2) topTags(lrt) Best wishes Gordon --------------- original message -------------- [BioC] something wrong with my design matrix in edgeR Wang, Li li.wang at ttu.edu Thu Apr 26 18:56:46 CEST 2012 Dear List members My targets include 31 samples representing two groups (south and north) and also four populations (FBK, NWL, SOU, CAR). > targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE) > targets files group pop 1 FBK01.txt N FBK 2 FBK04.txt N FBK 3 FBK05.txt N FBK 4 FBK08.txt N FBK 5 FBK09.txt N FBK 6 FBK10.txt N FBK 7 FBK13.txt N FBK 8 NWL02.txt N NWL 9 NWL04.txt N NWL 10 NWL06.txt N NWL 11 NWL08.txt N NWL 12 NWL09.txt N NWL 13 NWL10.txt N NWL 14 NWL11.txt N NWL 15 NWL14.txt N NWL 16 SOU01.txt S SOU 17 SOU02.txt S SOU 18 SOU03.txt S SOU 19 SOU04.txt S SOU 20 SOU07.txt S SOU 21 SOU09.txt S SOU 22 SOU11.txt S SOU 23 SOU15.txt S SOU 24 CAR01.txt S CAR 25 CAR02.txt S CAR 26 CAR03.txt S CAR 27 CAR04.txt S CAR 28 CAR05.txt S CAR 29 CAR11.txt S CAR 30 CAR12.txt S CAR 31 CAR14.txt S CAR We are interested with the differential expression between north and south group by adjusting for differences between populations. > latitude <- factor(targets$group) > latitude <- relevel(latitude, ref="N") > pop <- factor(targets$pop) > design <- model.matrix(~pop+latitude) > rownames(design) <- rownames(d$samples) > design (Intercept) popFBK popNWL popSOU latitudeS 1 1 1 0 0 0 2 1 1 0 0 0 3 1 1 0 0 0 4 1 1 0 0 0 5 1 1 0 0 0 6 1 1 0 0 0 7 1 1 0 0 0 8 1 0 1 0 0 9 1 0 1 0 0 10 1 0 1 0 0 11 1 0 1 0 0 12 1 0 1 0 0 13 1 0 1 0 0 14 1 0 1 0 0 15 1 0 1 0 0 16 1 0 0 1 1 17 1 0 0 1 1 18 1 0 0 1 1 19 1 0 0 1 1 20 1 0 0 1 1 21 1 0 0 1 1 22 1 0 0 1 1 23 1 0 0 1 1 24 1 0 0 0 1 25 1 0 0 0 1 26 1 0 0 0 1 27 1 0 0 0 1 28 1 0 0 0 1 29 1 0 0 0 1 30 1 0 0 0 1 31 1 0 0 0 1 attr(,"assign") [1] 0 1 1 1 2 attr(,"contrasts") attr(,"contrasts")$pop [1] "contr.treatment" attr(,"contrasts")$latitude [1] "contr.treatment" I am wondering maybe there is something wrong with my design matrix. So, when I tried to estimate the dispersions, I got the following errors. > d <- estimateGLMCommonDisp(d, design) Error in solve.default(R, t(beta)) : system is computationally singular: reciprocal condition number = 5.16368e-17 The following is session information. Any comments are very appreciated. > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] qvalue_1.30.0 edgeR_2.6.0 limma_3.12.0 loaded via a namespace (and not attached): [1] tcltk_2.15.0 tools_2.15.0 Thanks in advance! Best wishes Li ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
• 1.4k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 38 minutes ago
WEHI, Melbourne, Australia
Dear Li, There was typo in my code. (I wrote glmFit where I meant glmLRT.) Correct code is: pop <- factor(targets$pop, levels=c("FBK","NWL","SOU","CAR")) design <- model.matrix(~pop) Then later fit <- glmFit(d, design) To test NWL vs FBK (within the north group): lrt <- glmLRT(d, fit, coef=2) topTags(lrt) To test CAR vs SOU (within the south group): lrt <- glmLRT(d, fit, contrast=c(0,0,-1,1)) topTags(lrt) To test S vs N overall: lrt <- glmLRT(d, fit, contrast=c(0,-1,1,1)/2) topTags(lrt) Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. smyth at wehi.edu.au http://www.wehi.edu.au http://www.statsci.org/smyth ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 1010 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6