Question: Analysis with DESeq2: should I put all the samples from three conditions in the colData and countData, or perform the analyses separately?
10 months ago by AROA SUÁREZ VEGA

Hello! I am performing my differential expression RNA-Seq analysis with DESeq2.

I have the following design, with three conditions (FO, Control and LU):

ID_seq            ID_animal    Condition
C8DRYANXX_4_22    LUFO209      FO
C8DRYANXX_4_25    LUFO219      FO
C8DRYANXX_5_14    LUFO169      FO
C8DRYANXX_6_18    LUFO177      FO
C8DRYANXX_4_23    LUFO218      Control
C8DRYANXX_5_15    LUFO171      Control
C8DRYANXX_6_20    LUFO181      Control
C8DRYANXX_6_21    LUFO197      Control
C8F23ACXX_7_20    LUFO181      Control
C8DRYANXX_4_27    LUFO238      LU
C8DRYANXX_5_16    LUFO173      LU
C8DRYANXX_6_19    LUFO179      LU
C8EB2ANXX_5_27    LUFO238      LU
C8F23ACXX_7_19    LUFO179      LU
C8F23ACXX_8_13    LUFO163      LU
HHGFTBBXX_6_11    LUFO215      LU
HHGFTBBXX_6_12    LUFO234      LU

And the PCA of my data: [PCA plot not included]

When I perform the analysis like this:

> dds <- DESeqDataSetFromMatrix(DE_genesCondition, colData, design = ~Condition)
> DESeq.dsCollapsed <- collapseReplicates(dds, groupby = dds$ID_animal)
> DESeq.dsCollapsed <- DESeq(DESeq.dsCollapsed)

I obtain the following results:

FOvsControl: 37 differentially expressed genes (DEG)

LUvsControl: 2515 DEG

LUvsFO: 817 DEG

However, when I perform the analyses independently, that is, including in the colData data frame only the samples involved in a given contrast (for example, only the Control and LU samples) and running DESeq separately three times, I obtain these results:

FOvsControl: 237 DEG

LUvsControl: 1992 DEG

LUvsFO: 672 DEG

As can be seen, the results differ between the two approaches. The first thing that draws my attention is the large increase in DEGs for FOvsControl in the second approach, which could be due to the effect of the LU samples on the dispersion estimates when DESeq is run on all samples together in the first approach. However, I do not know which of these two approaches is more appropriate for analysing my experiment. Could anyone help me?


10 months ago by Michael Love (United States)
Michael Love wrote:

This is discussed in one of the Frequently Asked Questions (FAQ) in the DESeq2 vignette. Please check there for the explanation, and post a comment on this answer if you have more questions.
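
For readers who want to see what the all-samples-together approach looks like in practice, here is a minimal sketch. It assumes the usual DESeq2 workflow of fitting one model with design = ~Condition and then extracting each pairwise comparison with the contrast argument of results(). The counts are simulated toy data, not the poster's data, and the object names (gene_means, res_FO_vs_Control and so on) are made up for illustration. Whether a joint fit or separate pairwise fits is preferable for a given data set is what the FAQ entry discusses, so treat this only as an illustration of the mechanics.

```r
library(DESeq2)

set.seed(1)

# Toy design: three conditions, four samples each (hypothetical, for illustration)
conditions <- rep(c("Control", "FO", "LU"), each = 4)
n_genes <- 1000
n_samples <- length(conditions)

# Simulated negative binomial counts with gene-specific means and a
# mean-dependent dispersion (0.1 + 4/mean); no true differences are simulated
gene_means <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = gene_means, size = 1 / (0.1 + 4 / gene_means)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

colData <- data.frame(
  Condition = factor(conditions, levels = c("Control", "FO", "LU")),
  row.names = colnames(counts)
)

# One fit on all samples: dispersions are estimated using all three groups
dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData,
                              design = ~ Condition)
dds <- DESeq(dds)

# Each pairwise comparison is pulled from the same fitted object;
# contrast = c(factor name, numerator level, denominator level)
res_FO_vs_Control <- results(dds, contrast = c("Condition", "FO", "Control"))
res_LU_vs_Control <- results(dds, contrast = c("Condition", "LU", "Control"))
res_LU_vs_FO      <- results(dds, contrast = c("Condition", "LU", "FO"))

summary(res_LU_vs_FO)
```

With this layout the three result tables share one set of dispersion estimates, which is one reason the DEG counts can differ from those obtained by subsetting the samples and running DESeq three times.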


Thank you very much for your help. Sorry, I hadn't read that.

Reply written 10 months ago by AROA SUÁREZ VEGA