Dear Bioconductor Support Site,
I'm currently working with an RNA-Seq dataset, which consists of 84 samples(Collaborative Cross mice genetic panel of normal, diabetic and obese phenotypes). My code for the analysis used so far is the following:
head(sampleInfo) # the phenotype data information
SampleName Sex Obesity Diabetes
1 DT_77 M Obese Diabetic
2 DT_39 F Obese Normal
3 DT_145 M Normal Normal
4 DT_130 F Obese Normal
5 DT_103 M Normal Normal
6 DT_124 M Obese Diabetic
Next, after the creation of DGEList object/filtering/normalization, for the construction of the design matrix, I considered two separate approaches,to find DE genes concerning the obesity phenotype, but taking into account diabetes, while blocking on the sex confounder:
1) First Approach
# Create the design matrix condition1.group <- as.factor(dge$samples$Obesity) condition2.group <- as.factor(dge$samples$Diabetes) sex.group <- as.factor(dge$samples$Sex) design <- model.matrix(~condition1.group : condition2.group + sex.group)
y3 <- estimateDisp(dge, design, robust=TRUE)
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, : Design matrix not of full rank. The following coefficients not estimable: condition1.groupObese:condition2.groupNormal head(design) (Intercept) sex.groupM condition1.groupNormal:condition2.groupDiabetic condition1.groupObese:condition2.groupDiabetic 1 1 1 0 1 2 1 0 0 0 3 1 1 0 0 4 1 0 0 0 5 1 1 0 0 6 1 1 0 1 condition1.groupNormal:condition2.groupNormal condition1.groupObese:condition2.groupNormal 1 0 0 2 0 1 3 1 0 4 0 1 5 1 0 6 0 0
fit <- glmQLFit(y3, design, robust=TRUE)
fit2 <- glmQLFTest(fit, coef=2)
2) Second Approach
mult_con <- factor(paste(condition1.group, condition2.group, sep = ".")) sample2 <- cbind(sampleinfo,Group=mult_con) design <- model.matrix(~0+sample2$Group + sex.group)
head(design) sample2$GroupNormal.Diabetic sample2$GroupNormal.Normal sample2$GroupObese.Diabetic sample2$GroupObese.Normal sex.groupM 1 0 0 1 0 1 2 0 0 0 1 0 3 0 1 0 0 1 4 0 0 0 1 0 5 0 1 0 0 1 6 0 0 1 0 1
my.contrasts <- makeContrasts(OvsN = sample2$GroupObese.Normal- sample2$GroupNormal.Normal, ONvsOD = sample2$GroupObese.Normal-sample2$GroupObese.Diabetic, levels=design)
# Obese_Normal vs Obese_Diabetic example
qlf.ON.OD <- glmQLFTest(fit, contrast=my.contrasts[,"ONvsOD"]) de.ON.OD <- topTags(qlf.ON.OD,n=nrow(qlf.ON.OD),adjust.method="BH", p.value=0.05) dat.ON.OD <- de.ON.OD$table
Thus, my questions are the following;
A) 1) For the 1rst approach: regarding the initial error of full Rank, should I remove manually some coefficients that are not of "interest " ? For example, the aforementioned "condition1.groupObese:condition2.groupNormal" ?
2) Moreover, is the formulation and interpretation of the interaction approach correct ? The coefficient condition1.groupNormal:condition2.groupDiabetic , essentially compares the Normal samples from the Obesity group, with the Diabetic samples from the Diabetes group, correct ?
B) Overall, which of the two approaches should I consider ? Should I expect significant differences between the two approaches for the same comparisons ?
Thank you in advance and excuse me for my long post !!
Dear Aaron,
Thank you for your reply and your input, especially for the average effect of obesity between patient and normal samples.
Best regards,
Fotakis Georgios