DESeq2 - couple of clarifications
@federico-marini-6465

Hey Mike,

a couple of questions on DESeq2, but first of all, some code to make my questions reproducible:

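library(airway)
library(DESeq2)
library(magrittr)
# load the airway SummarizedExperiment object into the session
data(airway)
# build the dataset from the raw count matrix, adjusting for cell line
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                              colData = colData(airway),
                                              design = ~ cell + dex)
dds_airway <- DESeq(dds_airway)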

 

  • alpha & independentFiltering. Could it be a small bug that when I set independentFiltering to FALSE, alpha is not "set" in the DESeqResults object? Please compare the output of these commands:
results(dds_airway, contrast = c("dex", "trt", "untrt"), alpha = 0.05, independentFiltering = TRUE) %>% summary
results(dds_airway, contrast = c("dex", "trt", "untrt"), alpha = 0.05, independentFiltering = FALSE) %>% summary
results(dds_airway, contrast = c("dex", "trt", "untrt"), alpha = 0.05, independentFiltering = FALSE) %>% summary(alpha = 0.05)
  • For an app I am developing, I am trying to cover "automatically" the cases where the covariate is a factor, is continuous, or has more than two levels. A quick check that I am doing it right, according to the documentation:
     factor -> contrast = the 3-element vector
     numeric -> name = the character name of the numeric variable
     more than 2 levels -> rerun DESeq with "LRT" as test and then use the full & reduced model to specify the contrast

    Moreover, are you by chance aware of a dataset where a continuous covariate was used in a (possibly meaningful) way? As a toy case I am using airway with the read length, and I am (correctly) getting very few hits. If not, do you know a robust way of simulating such a dataset?
  • I have seen you now recommending the salmon path for generating the counts, especially after the DTE/DGE/DTU paper by you and Charlotte. I found it a little harder to explain to collaboration partners because of the extra modeling step already at the counting level, and this is keeping me with the "old and safe" featureCounts-based approach. Do you have a suggestion on how best to sell the advantages of the new method, apart from linking to your paper?

 

Thank you in advance!

Federico

@mikelove

Regarding 'alpha' in results() and summary(), when you have independentFiltering=TRUE, then the alpha is used by the function to optimize the independent filtering, and then it's used again as a relevant threshold by summary() when alpha is not explicitly provided to summary(). If you have independentFiltering=FALSE, then alpha is ignored by results() and not passed to summary(). I've clarified this just now in the help page.
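
For readers who want to reproduce this, here is a minimal sketch using the airway setup from the question. It only shows the safe pattern of passing alpha explicitly to summary() when independent filtering is turned off. The object names dds and res_nofilt are illustrative, and whether alpha is stored in the results object without filtering may depend on the DESeq2 version.

```r
library(airway)
library(DESeq2)

# same setup as in the question
data(airway)
dds <- DESeqDataSetFromMatrix(assay(airway),
                              colData = colData(airway),
                              design = ~ cell + dex)
dds <- DESeq(dds)

# without independent filtering, do not rely on alpha being carried over
res_nofilt <- results(dds, contrast = c("dex", "trt", "untrt"),
                      alpha = 0.05, independentFiltering = FALSE)

# pass the threshold explicitly so summary() reports at the intended level
summary(res_nofilt, alpha = 0.05)
```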

The second question sounds right, although for a factor with more than two levels, sometimes users want to do 2-3 pairwise comparisons (B vs A, C vs A, sometimes C vs B), and sometimes they want an LRT.
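
As a rough sketch of the two options, the cell factor in airway (which has more than two levels) can serve as a stand-in. The choice of cell as the factor of interest and the object names are assumptions made for illustration only, not a recommendation for that dataset.

```r
library(airway)
library(DESeq2)

data(airway)
dds <- DESeqDataSetFromMatrix(assay(airway),
                              colData = colData(airway),
                              design = ~ cell + dex)
dds <- DESeq(dds)

# pairwise comparisons: one results() call per pair of levels
lv <- levels(dds$cell)
res_BvsA <- results(dds, contrast = c("cell", lv[2], lv[1]))
res_CvsA <- results(dds, contrast = c("cell", lv[3], lv[1]))

# LRT: one test for any difference across all levels of cell,
# comparing the full design against a reduced design without cell
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ dex)
res_lrt <- results(dds_lrt)
```

Note that with the LRT the p-values refer to the whole factor, while the reported log2 fold change column still corresponds to a single coefficient.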

When other developers have worked on wrappers for DESeq2 (for example, ReportingTools), they've encountered a number of headaches by trying to call results() internally to their software, because it takes a lot of effort to provide all the functionality that results() provides. This is why I've often recommended that, if possible, developers let users interface with DESeq2::results() directly, and then operate on the DESeqResults table instead. But it's up to you. 

I don't have a publicly available, processed dataset in mind with a numeric covariate, but I'm sure many exist. The trick is that you first need to do some exploration to make sure that a linear relationship between the covariate and log counts makes sense, i.e. to rule out the possibility of saturation, or convex or concave patterns.
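
One possible way to get a toy dataset is to simulate negative binomial counts whose log2 mean is linear in a covariate. The sketch below does that and then plots log counts against the covariate for one gene as a basic linearity check. Everything in it is an assumption chosen for illustration (the covariate x, the effect size, the dispersion, the number of genes); it is not a validated simulation framework.

```r
library(DESeq2)
set.seed(1)

n_genes <- 1000
n_samples <- 12
x <- seq(0, 5, length.out = n_samples)          # hypothetical continuous covariate
beta <- c(rep(0.5, 100), rep(0, n_genes - 100)) # log2 change per unit of x
baseline <- runif(n_genes, 4, 12)               # log2 mean at x = 0

# log2 mean is linear in x; rows are genes, columns are samples
log2_mu <- baseline + outer(beta, x)
counts <- matrix(rnbinom(n_genes * n_samples, mu = 2^log2_mu, size = 10),
                 nrow = n_genes)
rownames(counts) <- paste0("g", seq_len(n_genes))
colnames(counts) <- paste0("s", seq_len(n_samples))
coldata <- data.frame(x = x, row.names = colnames(counts))

dds_sim <- DESeqDataSetFromMatrix(counts, colData = coldata, design = ~ x)
dds_sim <- DESeq(dds_sim)
res_x <- results(dds_sim, name = "x")

# exploration: log normalized counts vs covariate for one gene, with a smoother;
# a curved or flattening trend would argue against a linear term
norm_log <- log2(counts(dds_sim, normalized = TRUE) + 1)
plot(x, norm_log[1, ], xlab = "x", ylab = "log2 normalized count")
lines(lowess(x, norm_log[1, ]))
```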

Re: selling the new methods, it's good to keep in mind that the estimated counts are highly correlated with the unique counts. The bonuses are: much faster and more efficient generation of these matrices, the possibility of recovering multi-mapping reads through probabilistic assignment, and avoiding any potential issues with DTU, which could throw off inference from gene-level unique counts.


Thank you for the clarification!

As for the LRT vs pairwise, you are right. I wanted at least to prompt the user that (s)he can perform the LRT when more than 2 levels are available.

I also had my personal small portion of pain with using ReportingTools, so I know what you mean - it is still quite a great tool, kudos to the developers for it!

Thanks for the tip on the dataset, I will look deeper - and in the meantime hope some other user might already have been looking for the same thing.

Finally, good points for selling the new method. I also found a recent presentation by Charlotte at CSAMA, so I have gathered enough info to become a good prophet for the novel approach.
