DESeq2 package: Error in checkFullRank(modelMatrix) with a multi-factor design
iammiso • 0
@iammiso-18808

Hi all,

I'm sorry if this has been asked many times before. I have searched through many questions and documents, but none of them answered my question.

I am having difficulty with the design formula in DESeqDataSetFromMatrix.

Rather than comparing on group (case vs. control) information alone, we want to include additional clinical data in the comparison.

So I put multiple factors into the design formula. I expected that including them would change the results compared with using only the Group (case, control) information.

This is my design table:

Group    Project_Id    Gender  Age_at_Diagnosis
Case     TCGA-A6-6654  female  65
Case     TCGA-AA-3972  male    72
Case     TCGA-G4-6311  male    80
Case     TCGA-AA-3667  female  36
Case     TCGA-F4-6854  female  77
Case     TCGA-A6-6650  female  69
Control  TCGA-AA-3520  female  86
Control  TCGA-A6-5659  male    82
Control  TCGA-AA-3517  male    60
Control  TCGA-A6-2685  female  48
Control  TCGA-AA-3518  female  81
Control  TCGA-AA-3697  male    77

 

Below is the code I have been trying to run.

> coldata.clinic$Group <- factor(coldata.clinic$Group, levels = c("Control","Case"))
> coldata.clinic$Project_Id <- factor(coldata.clinic$Project_Id)
> coldata.clinic$Gender <- factor(coldata.clinic$Gender)
> coldata.clinic$Age_at_Diagnosis <- factor(coldata.clinic$Age_at_Diagnosis)
>
> dds.clinic <- DESeqDataSetFromMatrix(countData = cts,
+                              colData = coldata.clinic,
+                              design = ~Group+Project_Id+Gender+Age_at_Diagnosis)
Error in checkFullRank(modelMatrix) :
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')
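The situation behind this error can be reproduced with base R alone, without DESeq2. The sketch below rebuilds the 12 rows of the table shown above (the full data set is larger, so the numbers are only illustrative) and compares the rank of the model matrix with its number of columns. The data frame name `coldata` is a stand-in for `coldata.clinic`, and the idea that checkFullRank boils down to this comparison is an assumption, not something stated in this thread.

```r
# Sample table copied from the 12 rows shown in the question.
coldata <- data.frame(
  Group = factor(rep(c("Case", "Control"), each = 6),
                 levels = c("Control", "Case")),
  Project_Id = factor(c("TCGA-A6-6654", "TCGA-AA-3972", "TCGA-G4-6311",
                        "TCGA-AA-3667", "TCGA-F4-6854", "TCGA-A6-6650",
                        "TCGA-AA-3520", "TCGA-A6-5659", "TCGA-AA-3517",
                        "TCGA-A6-2685", "TCGA-AA-3518", "TCGA-AA-3697")),
  Gender = factor(c("female", "male", "male", "female", "female", "female",
                    "female", "male", "male", "female", "female", "male")),
  # Age converted to a factor, exactly as in the code above.
  Age_at_Diagnosis = factor(c(65, 72, 80, 36, 77, 69, 86, 82, 60, 48, 81, 77))
)

# Build the model matrix for the same additive design.
mm <- model.matrix(~ Group + Project_Id + Gender + Age_at_Diagnosis,
                   data = coldata)

n_samples <- nrow(mm)      # one row per sample
n_coefs   <- ncol(mm)      # one column per coefficient to estimate
mm_rank   <- qr(mm)$rank   # number of linearly independent columns

# A design is full rank only when mm_rank equals n_coefs.
cat("samples:", n_samples, " coefficients:", n_coefs, " rank:", mm_rank, "\n")
```

On these rows the design asks for more coefficients than there are samples, so the rank cannot reach the number of columns.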

 

The design includes several factors, but I only need to compare case and control; I do not need to compare the other factors (Project_Id, Gender, Age_at_Diagnosis). I just want the other factors to be taken into account when comparing cases and controls.

And one of these factors cannot be excluded.

How should I proceed in this case?

 


I searched for related problems and saw designs in which the factors are joined with * instead of +. In that case, the above error does not occur.

> dds.clinic <- DESeqDataSetFromMatrix(countData = cts,
+                               colData = coldata.clinic,
+                               design = ~Group*Project_Id*Gender*Age_at_Diagnosis)

What is the difference between + and *?

And which should I use for my analysis?
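On the + versus * question, R's formula syntax itself gives part of the answer: `A + B` fits main effects only, while `A * B` is shorthand for `A + B + A:B`, that is, main effects plus their interaction. A minimal base-R sketch on a made-up four-sample table (the data frame here is hypothetical, not the real colData):

```r
# Hypothetical 2 x 2 layout, just to show which columns each formula creates.
coldata <- data.frame(
  Group  = factor(c("Control", "Control", "Case", "Case"),
                  levels = c("Control", "Case")),
  Gender = factor(c("female", "male", "female", "male"))
)

# Main effects only.
additive <- colnames(model.matrix(~ Group + Gender, data = coldata))

# Main effects plus the Group-by-Gender interaction.
with_interaction <- colnames(model.matrix(~ Group * Gender, data = coldata))

# The explicit spelling of what * expands to.
star_expanded <- colnames(model.matrix(~ Group + Gender + Group:Gender,
                                       data = coldata))

print(additive)
print(with_interaction)
print(setdiff(with_interaction, additive))  # the extra interaction column
```

Chaining four variables with * therefore requests every two-, three- and four-way interaction as well, which adds coefficients rather than removing any.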

 

I have not been able to solve this for a long time. I would appreciate comments from the experts.

 

DESeq2 multiple factor design design model design matrix rna-seq • 2.0k views
@mikelove

Project ID seems to be a unique value for every sample. You can’t include such a term in the design because it removes any sense of replication.
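The loss of replication can be made concrete by counting residual degrees of freedom (samples minus model-matrix rank). The sketch below is base R only. The IDs `P01`..`P12` are placeholders for 12 distinct project IDs, the helper `residual_df` is introduced here for illustration, and keeping age numeric in the second design is an assumption of this sketch rather than something proposed in the thread.

```r
# 12 samples, each with its own ID (placeholder IDs standing in for TCGA IDs).
coldata <- data.frame(
  Group = factor(rep(c("Case", "Control"), each = 6),
                 levels = c("Control", "Case")),
  Project_Id = factor(sprintf("P%02d", 1:12)),
  Gender = factor(c("female", "male", "male", "female", "female", "female",
                    "female", "male", "male", "female", "female", "male")),
  Age_at_Diagnosis = c(65, 72, 80, 36, 77, 69, 86, 82, 60, 48, 81, 77)
)

# Residual degrees of freedom: samples left over after fitting the design.
residual_df <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  nrow(mm) - qr(mm)$rank
}

# A per-sample ID absorbs every sample, leaving nothing to estimate variance.
df_with_id <- residual_df(~ Group + Project_Id, coldata)

# Without the per-sample ID (and with numeric age) samples remain as replicates.
df_without_id <- residual_df(~ Gender + Age_at_Diagnosis + Group, coldata)

cat("residual df with Project_Id:   ", df_with_id, "\n")
cat("residual df without Project_Id:", df_without_id, "\n")
```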


Hi Michael,

 

Some of the Project_Id values are duplicated. Do I still have to remove it?

Also, not all of the design factors are about replication. Is there no way to have the other factors taken into account when comparing cases and controls?


I don’t really have enough information to answer the question. What kind of replication is there? I assume the table above is not the complete table. How many samples are there? You can’t include this variable as you have it, and I don’t have enough information to say much more.


I'm sorry if my explanation was not clear.

I have 470 case samples and 40 control samples. The goal is to compare expression between case and control.

Since I know the clinical information for the case and control samples, I would like to add information such as TCGA_id, gender, and age at diagnosis so that it is taken into account in the comparison of expression between case and control.
(The TCGA_id values look all different, but some are repeated.)

This information does not describe replicate experiment samples.

I hope this is enough explanation for you.


I can’t answer the question without knowing what the nature of the replicates is.

By the way, with 400+ samples, I tend to use limma for such analyses, because it is much faster than a GLM approach. You will still have to figure out how to deal with the replicates, either collapsing or using the duplicateCorrelation function.
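As a rough illustration of the "collapsing" option, the sketch below sums the count columns that share an ID, using base R only. The count matrix and patient labels are made up, and whether summing is appropriate depends on what the repeated IDs actually represent, which this thread leaves open. DESeq2 also provides a collapseReplicates function for this purpose.

```r
# Made-up counts: 3 genes x 5 libraries; patients P1 and P2 have two libraries each.
cts <- matrix(c( 10,  12,   5,   7,  20,
                  0,   1,   3,   2,   4,
                100,  90,  50,  60,  80),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("lib1", "lib2", "lib3", "lib4", "lib5")))

# Hypothetical ID for each library (column of cts).
patient <- c("P1", "P1", "P2", "P2", "P3")

# rowsum() sums rows by group, so transpose, sum per patient, transpose back.
collapsed <- t(rowsum(t(cts), group = patient))

print(collapsed)  # one column per patient instead of one per library
```

After collapsing, each ID appears once, so the remaining columns can be treated as separate samples in the design.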
