DESeq2 with two continuous variable in a design
2
0
Entering edit mode
sofia • 0
@725046bd
Last seen 17 months ago
Canada

Hi everyone, Hi Michael,

I am using DESeq2 to analyze small RNA sequencing results, and I am very uncertain about the results I am getting for a particular design, so I would really appreciate some help.

The following is my coldata and my design

head(colDesign)
            Age        BMI

sample_01   17.0      23.03     
sample_02   16.1      23.66     
sample_03   10.4      18.56     
sample_04   15.3      39.57     
sample_05   14.5      25.19     
sample_06   18.1      18.88     

dds=DESeqDataSetFromMatrix(countData = cts_bmi,
                              colData = colDesign,
                              design = ~ Age+BMI+Age:BMI) 
DESeq(dds)
resultsNames(dds)
[1] "Intercept" "Age"    "BMI"     "Age.BMI"

As you can see, I want to study the interaction between 2 continuous variables, age and BMI. I read the interaction examples on the help page and I also read Michael's answer on a thread regarding the interaction between a discrete and a continuous variable, from what I learned is that, there's no reference point for the continuous variable, it's all embedded in the intercept? So for my design, results(dds, name="Age.BMI") will give the effect of BMI across all ages? Is that correct? And also if I want to use LRT test, should the reduced model be reduced=~Age+BMI?

Thank you in advance, any help is greatly appreciated!

DESeq2 RNASeq R design • 1.6k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

Basically, correct. I'd recommend however to center and scale the numeric covariates for model fitting improvement. Then the coefficients represent changes with one SD of the variable. You can back-convert back by multiplying the coefficient by SD(Age) or SD(BMI) if you want the coefficients on original scale.

ADD COMMENT
0
Entering edit mode

Thank you very much Mike! I have another similar question, so I have a design design= Sex+Sex:Age, and the resultsNames are "Intercept" "Sex_M_vs_F" "SexF.Gestational.Age..wks." "SexM.Gestational.Age..wks." If my understanding is right, results(dds, name="Sex_M_vs_F) is the main effect (the gender effect across all ages), and if I do results(dds, contrast=list("SexF.Gestational.Age..wks." "SexM.Gestational.Age..wks.")), it will test if the age effect is different between males and females? Isn't that the same as the main effect?

ADD REPLY
0
Entering edit mode

See the interactions section of the vignette, and then for further questions I'd recommend consulting with a local statistician. In particular, the main effect is not averaging across all ages.

ADD REPLY
0
Entering edit mode
Laia ▴ 10
@239caad3
Last seen 3 days ago
Belgium

Hi,

I have a similar question. In my design I have two continuous variables that are the conditions at which I cultured my cells: different substrate stiffnesses (3 stiffnesses), and different flow rates applied (5 flow rates). I wrote down the units in colData because otherwise it gave error when fitting the model.

> print(colData)
                                       stiffness       shear
AA_2_val_sorted.bam   100_kpa       25_dyn
AC_2_val_sorted.bam    10_kpa        25_dyn
AE_2_val_sorted.bam     1_kpa         15_dyn
AF_2_val_sorted.bam     1_kpa         15_dyn
AH_2_val_sorted.bam     1_kpa         25_dyn
AI_2_val_sorted.bam    10_kpa          5_dyn
AJ_2_val_sorted.bam   100_kpa        5_dyn
AL_2_val_sorted.bam   100_kpa       15_dyn
AM_2_val_sorted.bam   100_kpa       25_dyn
AO_2_val_sorted.bam    10_kpa        40_dyn
AR_2_val_sorted.bam     1_kpa         15_dyn
AS_2_val_sorted.bam     1_kpa          25_dyn
(...)

Samples are independent (only 1 measure per sample = only 1 timepoint), and I have 3 or 4 replicates per condition. For 1_kpa, there is no 40_dyn flow condition, only static, 5_dyn, 15_dyn, 25_dyn (4 flow rates instead of 5).

I want to see if there is an interaction of shear and stiffness (hope I have enough power), but when I fit the design:

dds <- DESeqDataSetFromMatrix(countData = counts_data,
                              colData = colData,
                              design = ~ stiffness + shear + stiffness:shear)

I get:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

I checked this, and I read several answers in Bioconductor, but still don't know how should I fit my model. Do I need to nest some factor to avoid 1kpa having only 4 flow conditions while 10kpa and 100kpa have both 5 flow conditions?

Any help with understanding this would be very much appreciated.

ADD COMMENT
0
Entering edit mode

I'd recommend consulting with a local statistician or someone familiar with linear models in R, for help designing your statistical analysis.

ADD REPLY

Login before adding your answer.

Traffic: 191 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6