Question

collapseReplicates changes order of factors

0

Entering edit mode

@antonio-miguel-de-jesus-domingues-5182

Last seen 11 weeks ago

Germany

IGNORE this issue. It is likely that I did something wrong at some point since I am having problems reproducing the problem myself.

tldr: it seems that after `collapseReplicates` the levels of newly created factors are shifted affecting the results. **If** this observation is correct and not a due to a mistake i my design, I suggest adding a warning in either in the function itself, or a note in the vignette.

I was doing an analysis with a few variables and in one of my analysis I want to compare a mix of two of those: `DevStage` combined with `Treatment`. Following usual instructions, instead of creating a more complicated model, I simply created a new condition combining these variables:

(my apologies for having to source the sample data but it was the minimal example I could use)

library("DESeq2")

source("https://gist.githubusercontent.com/adomingues/65ddd8a658b547952407eb5c01072e77/raw/98206cb399dcfeef6acda6513631eeb3c2be8fff/deseq2_factors.R")

samples$NewCondition <- factor(paste(samples$DevStage, samples$Treatment, sep = "_"))

ddsTmp <- makeExampleDESeqDataSet(m=28)
colnames(ddsTmp) <- rownames(samples)

countdata <- assay(ddsTmp)

dds <- DESeqDataSetFromMatrix(
  countData = countdata,
  colData = samples,
  design = ~ NewCondition
)
colData(dds)

DataFrame with 28 rows and 5 columns
                                                sample DevStage     Type
                                              <factor> <factor> <factor>
wt_Fish_1dpf_rep1_mRNA.paired   wt_Fish_1dpf_rep1_mRNA     1dpf   paired
wt_Fish_1dpf_rep2_mRNA.paired   wt_Fish_1dpf_rep2_mRNA     1dpf   paired
wt_Fish_2dpf_rep1_mRNA.paired   wt_Fish_2dpf_rep1_mRNA     2dpf   paired
wt_Fish_2dpf_rep2_mRNA.paired   wt_Fish_2dpf_rep2_mRNA     2dpf   paired
wt_Fish_3dpf_rep1_mRNA.paired   wt_Fish_3dpf_rep1_mRNA     3dpf   paired
...                                                ...      ...      ...
wt_PGCs_6dpf_rep2_mRNA.paired   wt_PGCs_6dpf_rep2_mRNA     6dpf   paired
wt_PGCs_6dpf_rep3_mRNA.paired   wt_PGCs_6dpf_rep3_mRNA     6dpf   paired
wt_PGCs_10dpf_rep1_mRNA.paired wt_PGCs_10dpf_rep1_mRNA    10dpf   paired
wt_PGCs_10dpf_rep2_mRNA.paired wt_PGCs_10dpf_rep2_mRNA    10dpf   paired
wt_PGCs_10dpf_rep3_mRNA.paired wt_PGCs_10dpf_rep3_mRNA    10dpf   paired
                                 Treatment  NewCondition
                               <character>      <factor>
wt_Fish_1dpf_rep1_mRNA.paired      control  1dpf_control
wt_Fish_1dpf_rep2_mRNA.paired      control  1dpf_control
wt_Fish_2dpf_rep1_mRNA.paired      control  2dpf_control
wt_Fish_2dpf_rep2_mRNA.paired      control  2dpf_control
wt_Fish_3dpf_rep1_mRNA.paired      control  3dpf_control
...                                    ...           ...
wt_PGCs_6dpf_rep2_mRNA.paired      treated  6dpf_treated
wt_PGCs_6dpf_rep3_mRNA.paired      treated  6dpf_treated
wt_PGCs_10dpf_rep1_mRNA.paired     treated 10dpf_treated
wt_PGCs_10dpf_rep2_mRNA.paired     treated 10dpf_treated
wt_PGCs_10dpf_rep3_mRNA.paired     treated 10dpf_treated

Here what I am trying to compare is the effect of `treatment` at each `DevStage`:

- 1dpf_treated vs 1dpf_control
- 2dpf_treated vs 2dpf_control
- ...

Since in this set-up we also sequenced some samples with single and paired end reads, after QC I merged the replicated for the DGE analysis:

ddsColl <- collapseReplicates(dds, dds$sample, dds$Type)
ddsColl <- DESeq(ddsColl)
colData(ddsColl)
DataFrame with 25 rows and 7 columns
                                         sample DevStage     Type   Treatment
                                       <factor> <factor> <factor> <character>
wt_Fish_10dpf_rep1_mRNA wt_Fish_10dpf_rep1_mRNA    10dpf   paired     control
wt_Fish_10dpf_rep2_mRNA wt_Fish_10dpf_rep2_mRNA    10dpf   paired     control
wt_Fish_1dpf_rep1_mRNA   wt_Fish_1dpf_rep1_mRNA     1dpf   paired     control
wt_Fish_1dpf_rep2_mRNA   wt_Fish_1dpf_rep2_mRNA     1dpf   paired     control
wt_Fish_2dpf_rep1_mRNA   wt_Fish_2dpf_rep1_mRNA     2dpf   paired     control
...                                         ...      ...      ...         ...
wt_PGCs_3dpf_rep2_mRNA   wt_PGCs_3dpf_rep2_mRNA     3dpf   paired     treated
wt_PGCs_3dpf_rep3_mRNA   wt_PGCs_3dpf_rep3_mRNA     3dpf   paired     treated
wt_PGCs_6dpf_rep1_mRNA   wt_PGCs_6dpf_rep1_mRNA     6dpf   paired     treated
wt_PGCs_6dpf_rep2_mRNA   wt_PGCs_6dpf_rep2_mRNA     6dpf   paired     treated
wt_PGCs_6dpf_rep3_mRNA   wt_PGCs_6dpf_rep3_mRNA     6dpf   paired     treated
                         NewCondition runsCollapsed        sizeFactor
                             <factor>   <character>         <numeric>
wt_Fish_10dpf_rep1_mRNA 10dpf_control        paired  0.94359672670364
wt_Fish_10dpf_rep2_mRNA 10dpf_control        paired 0.944675357175298
wt_Fish_1dpf_rep1_mRNA   1dpf_control        paired 0.976524986018752
wt_Fish_1dpf_rep2_mRNA   1dpf_control        paired 0.944935039261004
wt_Fish_2dpf_rep1_mRNA   2dpf_control        paired 0.988405339505884
...                               ...           ...               ...
wt_PGCs_3dpf_rep2_mRNA   3dpf_treated        paired 0.971152123564414
wt_PGCs_3dpf_rep3_mRNA   3dpf_treated        paired 0.926392628153864
wt_PGCs_6dpf_rep1_mRNA   6dpf_treated paired,single  1.97410478743642
wt_PGCs_6dpf_rep2_mRNA   6dpf_treated        paired 0.972588828338043
wt_PGCs_6dpf_rep3_mRNA   6dpf_treated        paired 0.952043225967793

However when looking at the results, the FC and p-values did not match what I could see in either the IGV tracks or the read counts (not shown here). I dug deeper and found that the issue were the sifted factor levels after collapsing the replicates, specifically for rows after the libraries where collapsed:

# Before the first row collapsing
# levels match what is expected
ddsColl$NewCondition[13]
str(ddsColl$NewCondition[13])
levels(ddsColl$NewCondition)[2]

# After the level name and value do not match
ddsColl$NewCondition[15]
str(ddsColl$NewCondition[15])
levels(ddsColl$NewCondition)[5]

The way I solve it was to create the `NewCondition` after `collapseReplicates`. It is possible that this is a quite specific issue to my particular experimental design or I made a mistake somewhere. Nonetheless I would like to alert the developers and users for this in case someone else runs into similar issue, or when running many comparisons doesn't visually confirm the results of all of them. I noticed becasue there is a set of "control" genes in my set-up that checked to see the effect in all comparisons. Otherwise it would have gone unnoticed.

SessionInfo:

deseq2 bug • 885 views

ADD COMMENT • link 5.4 years ago António Miguel de Jesus Domingues ▴ 490

score 0 · Answer 1 · 2018-12-03

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

I'm trying to reproduce this but I don't see the factor levels change after collapseReplicates.

Here, before and after, "B" is the reference level:

> dds <- makeExampleDESeqDataSet()
> dds$sample <- rep(1:6,each=2)
> dds$condition
 [1] A A A A A A B B B B B B
Levels: A B
> dds$condition <- relevel(dds$condition, "B")
> dds$condition
 [1] A A A A A A B B B B B B
Levels: B A
> dds <- collapseReplicates(dds, groupby=dds$sample)
> dds$condition
[1] A A A B B B
Levels: B A

Can you explain more, perhaps using this small example, what you are seeing?

ADD COMMENT • link 5.4 years ago Michael Love 41k

0

Entering edit mode

I actually tried to create a smaller example in the same way, and like you also failed. That is why I mention that this could be a localized issue. That said, I will try to replicate with a small subset of and get back to you.

ADD REPLY • link 5.4 years ago António Miguel de Jesus Domingues ▴ 490

0

Entering edit mode

I am also having problems replicating it so it is likely that at some I made a mistake. I will add a note at the top of the post. Apologies for the time lost but I was certain that something was going on.

ADD REPLY • link 5.4 years ago António Miguel de Jesus Domingues ▴ 490