Question

Closed:Question on Design formula for DESeq2

Tim • 0

@tim-22492

Last seen 4.4 years ago

Hello,

I am having some problems with my design for DESeq2 and am wondering if anyone could help. I'm using v1.8.2. Please forgive me: I have about 4 weeks of total bioinformatics experience (coding and statistics). Below is my colData. I renamed the columns and levels so that I could more easily follow the examples from the vignette, although the labels are jumbled as a result.

  grp ind cnd
1    X   1   A
2    X   1   B
3    X   2   A
4    X   2   B
5    X   3   A
6    X   3   B
7    Y   1   A
8    Y   1   B
9    Y   2   A
10   Y   2   B
11   Y   3   A
12   Y   3   B
13   Z   1   A
14   Z   1   B
15   N   4   C

In my experiment, however, "grp" is the condition: I have 3 conditions (X, Y, Z) and one control (N). "ind" is the cell type: 3 cell types (1, 2, 3), plus 4 for the control. "cnd" is the day harvested: Day 8 (A), Day 40 (B), and C for the control sample.

In this pilot experiment, I have 15 different samples: 14 samples that can each be described by a T cell type, a condition on the mouse, and a time point when it was harvested, plus 1 sample as the naive control that does not fall into the other categories. My goals:

1. Generate a PCA plot to see how the samples cluster.
2. For 3 samples, Cell Type 1 harvested from the 3 different conditions on Day 40 (ind=1, cnd=B, grp=X,Y,Z), find the significant DEGs.
3. Other comparisons like #2.
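For reference, the samples involved in goal 2 can be picked out of the sample table with a few lines of base R. This is only a sketch: `coldata` and `goal2` are illustrative names that do not appear elsewhere in the post, and the table is retyped from the colData shown above.

```r
# Base R only; `coldata` and `goal2` are hypothetical names for illustration.
coldata <- data.frame(
  grp = c(rep(c("X", "Y"), each = 6), rep("Z", 2), "N"),
  ind = c(rep(c("1", "1", "2", "2", "3", "3"), 2), "1", "1", "4"),
  cnd = c(rep(c("A", "B"), 7), "C"),
  stringsAsFactors = TRUE
)

# Goal 2: cell type 1 (ind == "1") harvested on Day 40 (cnd == "B").
goal2 <- subset(coldata, ind == "1" & cnd == "B")
goal2

# Number of samples per condition within this subset.
table(droplevels(goal2$grp))
```

The subset contains one sample per condition, so this particular comparison has no replicates within any group.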

I have tried designs like design = ~ ind, which runs without error, but I am unsure whether the results are valid because the samples are not described by cell type alone. Every design that included the other variables gave me "Matrix not full rank" errors (either a linear combination of columns or columns of zeros). I know the problem is with my design and that I don't know how to construct it to meet my goals. Here are the lines from my most recent attempt:

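library(DESeq2)

# Sample table: 15 samples; the last one is the naive control (N, 4, C).
colData <- DataFrame(grp=c(rep(c("X","Y"),each=6), rep("Z", 2),"N"),
                     ind=factor(c("1","1","2","2","3","3","1","1","2","2","3","3","1","1","4")),
                     cnd=c(rep(c("A","B"),7),"C"))

# Copies of ind and cnd under the names used in the design formula.
colData$ind.n <- factor(c("1","1","2","2","3","3","1","1","2","2","3","3","1","1","4"))
colData$cnd.n <- c(rep(c("A","B"),7),"C")
as.data.frame(colData)

# Build the model matrix by hand and drop the columns that are all zero.
m1 <- model.matrix(~ grp + grp:ind.n + grp:cnd.n, colData)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)
m1

# tidy = TRUE tells DESeq2 that the first column of mycounts holds the row names.
dds <- DESeqDataSetFromMatrix(countData=mycounts, colData=colData,
                              design= ~ grp + grp:ind.n + grp:cnd.n, tidy = TRUE)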

Running the last line gives me the error:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  Levels or combinations of levels without any samples have resulted in
  column(s) of zeros in the model matrix.

If that call had succeeded, I would have run this afterwards:

dds <- DESeq(dds, full = m1, betaPrior = FALSE)

I would then use DESeq2's plotting and results functions to accomplish my goals above (though I haven't explored results() in depth yet).

To clarify the experiment: each of the 14 samples contains the data from 1 of 3 sorted T cell subtypes, harvested on either Day 8 or Day 40 from a mouse under 1 of 3 conditions. The last sample is naive T cells from a control mouse.
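One way to inspect the rank problem without DESeq2 is to build the same model matrix in base R and compare its numerical rank with its number of columns. The following is a minimal sketch, not a fix: `coldata`, `m`, `all_zero`, and `m_nz` are illustrative names, and the formula is the one from the attempt above, written with the original column names.

```r
# Base R only; all object names here are hypothetical, chosen for illustration.
coldata <- data.frame(
  grp = c(rep(c("X", "Y"), each = 6), rep("Z", 2), "N"),
  ind = c(rep(c("1", "1", "2", "2", "3", "3"), 2), "1", "1", "4"),
  cnd = c(rep(c("A", "B"), 7), "C"),
  stringsAsFactors = TRUE
)

# Same design as in the attempt above.
m <- model.matrix(~ grp + grp:ind + grp:cnd, coldata)

# Columns that are zero for every sample (level combinations with no samples).
all_zero <- colSums(m != 0) == 0
colnames(m)[all_zero]

# Drop them; drop = FALSE keeps the result a matrix.
m_nz <- m[, !all_zero, drop = FALSE]

# Compare the numerical rank with the number of remaining columns.
c(samples = nrow(m_nz), columns = ncol(m_nz), rank = qr(m_nz)$rank)
```

If the reported rank is smaller than the number of columns, some of the remaining columns are still linear combinations of others, so removing the all-zero columns alone would not make the matrix full rank.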

DESeq2 RNASeq • 128 views

updated 4.4 years ago by Michael Love 41k • written 4.4 years ago by Tim • 0