Question

DESeq2 with data containing 3 factors or more and outlier detection

3

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

hi Olga, I CC the Bioconductor mailing list. Answers in line below. On Thu, May 8, 2014 at 8:57 AM, solgakar at bi.technion.ac.il <solgakar at="" bi.technion.ac.il=""> wrote: > > > - If I have data with three factors: condition, age and group. > > The levels of condition are: treated and untreated > > The levels of age are: young and adult > > The levels of group are: 1 and 2 > > Suppose I want to receive the result for: treated-1 vs. treated-2 for age=young and the result for: treated-1 vs. treated-2 in age=adult. > > From what I know, there is no way to receive interaction between three factors at a time. Yes, this is true for betaPrior=TRUE, the default. I only implemented the shrinkage of effects for models with terms up to order two. There is no limitation, however if you set betaPrior=FALSE. > > With that assumption, how can I receive these results? Is it better to separate the data into two separate sets: young and adult, and in each to receive the result : treated-1 vs. treated-2. > > Or is there a better way to receive the results I want using only one full data set containing all the samples? > > I had an idea to combine the factor age and condition into one factor: age_condition (with levels young_treated, young_untreated, adult_treated, adult_untreated), since I am not interested in the interaction between those two. Yes, this combining levels seems like the cleanest approach when users have a lot of groups they are interested in comparing. > Using there method, if I would call the results function using the "list" option in contrast with: "age_conditionyoung_treated.group1", "age_conditionyoung_treated.group2", will I receive the result I expect? yes, this would work, but for the group 1 vs 2 effect for treated and young, you need to also add in the main effect for group 1 vs 2: results(dds, contrast=list(c("group1","age_conditionyoung_treated.grou p1"),c("group2","age_conditionyoung_treated.group2") The contrast as you have it is for testing whether the 1 vs 2 effect is different in young and treated samples than the main 1 vs 2 effect. Another option, with a caveat. You might even consider combining all three factors into, say a variable 'g', if you are interested in more easily extracting the contrasts between groups, for example: results(dds, contrast=c("g","young_treated_condition1","young_treated_condition2")) The caveat is: it is easier for comparing groups, but doesn't provide access to the interactions. Additionally, this option will give the same contrasts of groups with betaPrior=FALSE, but the contrasts of groups will not be exactly the same with this option and using a beta prior. This is a modeling choice that the user must decided, whether the fold changes for only only the interaction terms should be moderated (using a design with interaction terms), or whether the fold changes from each unique combination of level should be moderated (here combining all factors into one factor). > - I refer to the outlier detection algorithm. is the detection of outlier Is done separately for each result or is it done once for an entire data set? The outlier filtering is done for the entire gene, for simplicity of implementation. However, the user can perform any filtering they chose by setting cooksCutoff=FALSE, and then using the matrix of Cook's distances which are stored in assays(dds)[["cooks"]]. For example: mcols(dds)$myOutlierFlag <- rowSums( assays(dds)[["cooks"]][ , c(2,3,4) ] > 10 ] res$pvalue[ mcols(dds)$myOutlierFlag ] <- NA res$padj <- p.adjust( res$pvalue, method="BH" ) The above code sets a flag for genes when samples 2,3 and 4 have a Cook's distance over 10. One can find their own cutoff by looking at the Cook's values over the proportion that the maximum count contributes to the total count for a row (here I use a very simple unexported function): plot( DESeq2:::propMaxOfTotal(counts(dds)[ , c(2,3,4) ] ), apply(assays(dds)[["cooks"]][ , c(2,3,4) ] , 1, max) ) Mike > > Meaning, if one sample is set to be an outlier, will the gene receive "NA" in the pvalue and adjusted pvalue for all comparisons or only those that sample_1 is relevant to? > > I ask this question since if sample_1 is an outlier but I am interested in the comparison only regarding samples 2-6, I wouldn't want to lose the information for this gene in this comparison. > > - Is there an easy way to detect which sample is the outlier for a specific gene in a specific comparison if I want to flag it in any way? > > Thank you very much, > > Olga Karinsky. > > > > > >

• 2.2k views

ADD COMMENT • link 10.0 years ago Michael Love 41k