Subject paired DESeq2 design
1
0
Entering edit mode
dr ▴ 10
@dr-9473
Last seen 4 months ago
United States

Hi,

I have an experimental design where I took the left and right brain hemispheres from mice across several time points following treatment at time 0 to the left brain hemisphere only. Therefore the right hemisphere serves as a paired control for the left hemisphere at each of the time points. Here's the samples data.frame:

library(dplyr)
df <- data.frame(animal_id = c("id1","id1","id2","id2","id3","id3","id4","id4","id5","id5","id6","id6","id7","id7","id8","id8","id9","id9","id10","id10","id11","id11","id12","id12","id13","id13","id14","id14","id15","id15","id16","id16","id17","id17","id18","id18","id19","id19","id20","id20","id21","id21","id22","id22","id23","id23","id24","id24","id25","id25","id26","id26","id27","id27","id28","id28","id29","id29","id30","id30"),
time_point = c(0,0,0,0,2,2,2,2,2,2,3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,14,14,14,14,14,14,14,14,26,26,26,26,26,26,26,26,50,50,50,50,50,50,50,50,74,74,170,170,170,170,170,170,170,170),
hemisphere = c("L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R"),
cov1 = c(99,99,98,98,42.2,42.2,47.6,47.6,38.7,38.7,73.5,73.5,40.6,40.6,37.4,37.4,29.9,29.9,35.1,35.1,38.9,38.9,33.5,33.5,37.9,37.9,38,38,37.3,37.3,45.2,45.2,40.4,40.4,40.3,40.3,39.6,39.6,38.3,38.3,38.9,38.9,37.7,37.7,41.1,41.1,42.8,42.8,37.5,37.5,41.1,41.1,40.8,40.8,42.8,42.8,39,39,40.9,40.9),
cov2 = c(28,28,27.3,27.3,28.2,28.2,28.1,28.1,25.6,25.6,30,30,30.1,30.1,30.3,30.3,30.2,30.2,28.3,28.3,31.6,31.6,28.9,28.9,31.5,31.5,26.4,26.4,26.1,26.1,27.5,27.5,26.4,26.4,23.6,23.6,26.5,26.5,25,25,22.1,22.1,27.8,27.8,23.2,23.2,23.2,23.2,21.1,21.1,25.9,25.9,25.7,25.7,25.4,25.4,26.7,26.7,19,19),
stringsAsFactors = F) %>%
dplyr::mutate(sample_id = paste0(animal_id,"_",hemisphere,"_",time_point))


As you can see I have 30 animals where for each I have both hemispheres at each time point, however the number of animals at each time point varies between 1 to 4.

What I'm interested in is how the difference in gene expression between the left and right hemispheres changes at each time point relative to the first time point, while controlling for the covariates (cov1 and cov2). For this reason I convert time_point to a factor, as well as animal_id and hemisphere (setting hemisphere R as baseline):

df$time_point <- factor(df$time_point)
df$animal_id <- factor(df$animal_id)
df$hemisphere <- factor(df$hemisphere, levels = c("R","L"))


Now I simulate a read count matrix:

set.seed(1)
count.mat <- matrix(as.integer(runif(10000*60 ,10, 200000)), nrow = 10000, ncol = 60,dimnames = list(paste0("gene",1:10000), df$sample_id))  So as the DESeq2 vignette part on Model matrix not full rank describes, my data have variable(s) that can be formed by the combination of other factor levels. For this reason running: DESeq2::DESeqDataSetFromMatrix(countData = count.mat, colData = df, design =~ cov1 + cov2 + animal_id + hemisphere*time_point)  I get the error: Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed. Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')  If I follow the DESeq2 vignette part on Group-specific condition effects, individuals nested within groups for fixing this issue the suggestion is to create a pseudo animal_id like this: df$pseudo_animal_id <- factor(unlist(lapply(unique(df$time_point),function(t)as.integer(factor(dplyr::filter(df,time_point == t)$animal_id)))))


And to create this design matrix:

design.mat <- model.matrix(~ cov1 + cov2 + time_point + time_point:pseudo_animal_id + time_point:hemisphere,data=df)


And then to remove the all-zero columns (empty levels):

all.zero.idx <- unname(which(apply(design.mat, 2, function(x) all(x==0))))
design.mat <- design.mat[,-all.zero.idx]


And then to create the data object with a simplified design matrix:

dds <- DESeq2::DESeqDataSetFromMatrix(countData = count.mat, colData = df, design =~ animal_id + hemisphere)


And then to run DESeq2 with design.mat:

dds <- DESeq2::DESeq(dds, full = design.mat, parallel=T)


The last step fails with the same error as above:

using supplied model matrix
Error in checkFullRank(full) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

vignette('DESeq2')


But if I eliminate cov1 and cov2 from the design matrix it runs fine.

But it's not clear to me how ti fix this covariates issue if I still wanted to account for them.

My higher level question is whether modeling the response as the difference in number of reads between the left and right hemispheres (L-R) per each animal wouldn't simplify this problem.

In that case the design would only be: cov1 + cov2 + time_point.

I guess it's probably not straight forward for the obvious reason that I'll have negative integers due to my subtraction, so I guess my question is what are my options? And in addition, if I have to either give up cov1 and cov2 or 'animal_id`, what would conceptually be a better choice?

DESeq2 paired design • 278 views
0
Entering edit mode
@mikelove
Last seen 20 hours ago
United States

For guidance on statistical design and interpreting coefficients, I recommend to work with a local statistician or someone familiar with linear models in R. I unfortunately only have sufficient time to answer software related questions on the support site.