Is there any chance to solve "the model matrix is not full rank" in DESeq2?
My design matrix is a count matrix, one row per gene and one column per sample,
The sample information (colData in dds) is a dataframe with two column, a column for condition containing pre and post treatment and column containing subject Id.
I am asking this question with regards to do paired samples
The problem is happened when
dds_1 <- DESeqDataSetFromMatrix(countData = df, colData = sample_info, design = ~ subject + condition)
This is the error I get:
Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. Levels or combinations of levels without any samples have resulted in column(s) of zeros in the model matrix.
I read the vignette section 'Model matrix not full rank': but I could not find any solution for the problem.
Is there any solution for this problem?
My design matrix is a count matrix, one row per gene and one column per sample,
The sample information (colData in dds) is a dataframe with two column, a column for condition containing pre and post treatment and column containing subject Id.
I am asking this question with regards to do paired samples The problem is happened when
dds_1 <- DESeqDataSetFromMatrix(countData = df, colData = sample_info, design = ~ subject + condition)
Also, there is a whole section devoted to fixing a model matrix that has columns of all zeros, which for sure will fix your problem. You need to show what you have done to try to fix the issue.
Thank you for your prompt reply but even after removing that, the problem was there still.
Both Basti and I have asked you to show your design matrix and show what you have done. You refuse to do that, and instead you say you did stuff and it didn't work. We already know you did stuff and it didn't work, or you wouldn't have posted here in the first place. We need to know exactly what you did, and exactly what your design matrix looks like, and the only way to know that is if you provide code and your design matrix. If you want help, then please provide what we ask for.
My design matrix is with 100 samples ( 50 samples before treatment and the same 50 samples after treatment and with around 3 million peptide
bigdf_t = transpose(tinydf)
colnames(bigdf_t) <- bigdf_t[1, ] rownames(bigdf_t) = colnames(tinydf) bigdf_t <- bigdf_t[-1,] bigdf_t <- bigdf_t[!(row.names(bigdf_t) %in% c("pre_post")),]
pep_all_zeros = colnames(tinydf[ , colSums(tinydf==0) == 100]) remove rows with all zero values bigdf_t <- bigdf_t[!(row.names(bigdf_t) %in% pep_all_zeros),]
colnames(bigdf_t)[1:50] <- paste("pre", colnames(bigdf_t[1:50]), sep = "_") colnames(bigdf_t)[51:100] <- paste("post", colnames(bigdf_t[51:100]), sep = "_")
sample_info = data.frame(matrix(nrow = ncol(bigdf_t), ncol = 2)) rownames(sample_info) = colnames(bigdf_t) names(sample_info)1 <- "condition" names(sample_info)2 <- "time"
sample_info$condition[sample_info$condition[1:50] == ''] <- 'Pre'
sample_info[1:50,1] <- replace(sample_info[1:50,1], is.na(sample_info[1:50,1]), "Pre") sample_info[51:100,1] <- replace(sample_info[51:100,1], is.na(sample_info[51:100,1]), "Post") sample_info[1:50,2] <- replace(sample_info[1:50,2], is.na(sample_info[1:50,2]), "before") sample_info[51:100,2] <- replace(sample_info[51:100,2], is.na(sample_info[51:100,2]), "after")
all(rownames(sample_info) == colnames(bigdf_t))
chars <- sapply(bigdf_t, is.character)
bigdf_t[ , chars] <- as.data.frame(apply(bigdf_t[ , chars], 2, as.numeric)) bigdf_t = round((10000 * bigdf_t)) # DSeq2 only accept integer and the data is read per million (multiply by 1000A0 as R can not handle big integer)
DSeq2 analysis
dds <- DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ time + condition) (Here gives me the full rank error)
Hope I could provide you your needed information.
Would you please help me in this regard?
We just need to know the sample_info dataset to infer the design matrix, but from the small example of sample_info you give here, I see no difference between condition and time variable, they are the same the variable with just different names
Thanks a lot for your reply.
But what else can I have there in the column "time"? It is the only information that I have from the dataset. Even if I consider the subject id, still the problem is there
If I want to analyze this data which are coming from the paired sample, what should I do?
I would highly appreciate if you could help me in this regard?