Question: Why DESeq2 gives large logfold for group with all zero count?
0
gravatar for lanhuong
2.3 years ago by
lanhuong20
lanhuong20 wrote:

I have a following issue when running DESeq2 with a continuous effect variable.

In my experiment there are 2 time points and 2 groups. Each sample (individual) has a sample at timepoint 1 and timepoint 2, but belongs to only one of the groups.

dsgn <- "~1 + Batch + Group + Timepoint + Timepoint:Group + Timepoint:Group:X"
dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = countData, 
  colData = colData,
  design = formula(dsgn)
)
dds <- DESeq2::estimateSizeFactors(dds, "poscounts")
dds <-  DESeq2::DESeq(dds)
resGAT0 <- results(dds, name = "GroupA.Timepoint0.X")

I am interested in the effect of a continuous variable X within each group A or B at each time point 0 or 1. I thought, the code above would achieve this. However, when inspecting results I found that many features (species) declared significant by DESeq2 (with a large logfold change) in the 'resGAT0' table had in fact all zero counts in the original data within that particular group A at time point 0. Is there some issue with the way I set up my DESeq2 analysis? In that case how to I obtain the actual effect for X within each of the 4 groups/time points combinations?

Is the issue related to the fact that the DESeq2::DESeq(dds) command returned a warning:

195 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Thanks you!

 

 

 

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by lanhuong20
Answer: Why DESeq2 gives large logfold for group with all zero count?
0
gravatar for Michael Love
2.3 years ago by
Michael Love25k
United States
Michael Love25k wrote:

It would be helpful if you could print the full colData. 

ADD COMMENTlink written 2.3 years ago by Michael Love25k

Another comment on this:

"the effect of a continuous variable X within each group A or B at each time point 0 or 1"

The term you have pulled out is: whether the difference in the slope in log counts associated with X from time=0 compared to timepoint=1 is itself different in group A vs group B. It's not exactly the same as the description of the term you are after. Another note is that it's typical to have e.g. time=0 and group=A as the reference level, but it seems you have it reversed.

I might suggest that you combine time and group into a single factor, call that "group" and then use a design of ~group:X, which will create a slope for each combination of time (0 or 1) and group (A or B as you have it).

ADD REPLYlink written 2.3 years ago by Michael Love25k

Thank you for a quick response!

I found that including the term Timepoint first would give me the correct ordering:

dsgn <- ~1 + Batch + Timepoint +  Group + Timepoint:Group + Timepoint:Group:X

My full column names for the design matrix would be (DESeq2::resultsNames(dds)):

[1] "Intercept"           "Batch_G2_vs_G1"      "Batch_G3_vs_G1"      "Batch_G4_vs_G1"    
[5] "Batch_G5_vs_G1"      "Timepoint_0_vs_1"    "Group_B_vs_A"        "Timepoint0.GroupB" 
[9] "Timepoint1.GroupA.X" "Timepoint0.GroupA.X" "Timepoint1.GroupB.X" "Timepoint0.GroupB.X"

1) In the current notation then what should be the terms I need to pull out to get the effect (slope) of the X variable in each group?

2) I like your idea of creating a new 4 level category factor variable, Cat, e.g. Tmpt0_GroupA. In this way my design should be probably:

dsgn <- ~1 + Batch + Cat + Cat:X

With this design I get:

[1] "Intercept"                       "Batch_G2_vs_G1"                                  
[3] "Batch_G3_vs_G1"                   "Batch_G4_vs_G1"                                  
[5] "Batch_G5_vs_G1"                   "Cat_Tmp1_GroupB_vs_Tmp1_GroupA"
[7] "Cat_Tmp0_GroupA_vs_Tmp1_GroupA"   "Cat_Tmp0_GroupB_vs_Tmp1_GroupA"
[9] "CatTmp1_GroupA.X"                 "CatTmp1_GroupB.X"           
[11] "CatTmp0_GroupA.X"                "CatTmp0_GroupB.X"      

This would produce for intercept coefficients and 4 X-slope coefficient the same way as the previous design. I assume in this case I can simply call e.g.

results(dds, name = "CatTmp0_GroupA.X")

to obtain the logfolds?

How would doing the same look like in the previous notation?

Thank you!

Lan

 

 

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by lanhuong20

hi,

"What should be the terms I need to pull out to get the effect (slope) of the X variable in each group?" - rather than answer this, which takes a long time, I would prefer you to use the design I recommended, which gives you four coefficients, one for the slope in each group, which is what you want. You can then pull them out with results(dds, name="...").

Your code above isn't what I was suggesting, instead use:

dds$z <- factor(paste0(dds$group, dds$time))
design(dds) <- ~z:x

If batch is not confounded with group you can add batch too: ~batch + z:x

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Michael Love25k

Wouldn't this mean that all groups share the same intercept? I am interested in the effect (slope) of the variable x in each group, and each group should be allowed to have dissimilar mean (intercept).

 

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by lanhuong20

You can add 'group' if it is not confounded with batch, to have different intercepts.

ADD REPLYlink written 2.3 years ago by Michael Love25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 341 users visited in the last hour