DESeq2: linear combination trouble
2
0
Entering edit mode
JamesB • 0
@jamesb-12194
Last seen 7.2 years ago

I am having a lot of trouble designing a formula to look at differential gene expression in my study. This is largely based around errors in my DEseq2 code leading to “Error in checkFullRank(modelMatrix)” warnings. I have looked at the manual and multiple comments/forum threads, but cannot work it out.

 

Essentially I have three animal morphs, two of which are solid colours and one that has stripes. I want to look at the gene expression difference between the three colours, so we have sequenced (RNA-Seq) tissue samples of all three (Red, Purple and Blue), but due to the fact that the colours are on different parts of the Red-Blue body we have sequenced two body landmarks from each; a truncated example can be seen below:

 

Condition       Individual      Tissue landmark

Red                 1                      A

Blue                1                      B

Red                 2                      A

Red                 2                      B

Purple             3                      A

Purple             3                      B

 

I have been able to look at pairwise comparisons of colour specific gene expression (taking into account the landmark) following:

 

d <-data.frame(s1=c(4:8),s2=c(5:9),s3=c(4:8),s4=c(5:9),s5=c(4:8),s6=c(5:9),s7=c(1:5),s8=c(1:5),s9=c(4:8),s10=c(5:9),s11=c(4:8),s12=c(5:9),s13=c(4:8),s14=c(5:9),s15=c(4:8),s16=c(5:9),s17=c(4:8),s18=c(5:9))   #Not real data

row.names(d) <- c("clust1","clust2","clust3","clust4","clust5")

d

countData <- as.matrix(d)

 

colData <- data.frame(row.names = c("s1","s2","s3","s4","s5","s6","s7","s8","s9","s10","s11","s12","s13","s14","s15","s16","s17","s18"),

                     condition = as.factor(c("Red","Blue","Red","Blue","Red","Red","Purple","Purple","Purple","Purple","Red","Red","Red","Red","Purple","Purple","Red","Blue")),

                     indiv = as.factor(c("RB_1","RB_1","RB_2","RB_2","R_1","R_1","P_1","P_1","P_2","P_2","R_2","R_2","R_3","R_3","P_3","P_3","RB_3","RB_3")),

                     landmark = as.factor(c("L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2")))

 

colData

 

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ landmark + condition)

 

However, really I need to include the individual in a paired analysis (which I do not think the above fully accounts for), but here is where I encounter problems.

 

I tried:

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ landmark + indiv + condition)

 

However, this returns the error:

---

Error in checkFullRank(modelMatrix) :

  the model matrix is not full rank, so the model cannot be fit as specified.

  One or more variables or interaction terms in the design formula are linear

  combinations of the others and must be removed.

 

  Please read the vignette section 'Model matrix not full rank':

---

 

I have checked the manual, and tried many iterations with various variables and “nested” groups, but I just seem to yo-yo between Linear combinations and columns of 0’s.

 

Is what I am wanting possible with the dataset? And if so how would I account for individual variation between my samples?

 

Hope that is clear. Much obliged for any help.                     

 

James

paired samples deseq2 deseq • 1.4k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 14 hours ago
United States

The simple answer is that you can't account for pairing in this experiment. If each individual had red, blue and purple conditions then you could, but they don't, so you can't.

One way to think about pairing is in the context of a paired t-test. To do a paired t-test, you take all your pairs, compute the differences, and then test to see if the differences are larger or smaller than zero. So in your example, we would be doing

red - blue
red - blue
red - red
purple - purple
purple - purple
red - red

Which is obviously not what you want. And DESeq2 can't account for the dependence structure inherent in your data (neither can edgeR). I think the only thing you can do is to use limma-voom and estimate the within-animal correlation using duplicateCorrelation.

ADD COMMENT
0
Entering edit mode
JamesB • 0
@jamesb-12194
Last seen 7.2 years ago

Thank you,  James MacDonald, your response is highly appreciated. 

I had sort of come to that conclusion, but glad someone with more experience has confirmed it for me.  Unfortunately it was the only design open to us, but knowing this we will look at alternative set ups in the future.

For now, I will check out the limma-voom option you mention.

ADD COMMENT

Login before adding your answer.

Traffic: 687 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6