Search
Question: DESeq2 design subcategories
0
gravatar for rbronste
8 months ago by
rbronste60
rbronste60 wrote:

Hi,

Wondering how I can modify the following DESeq2 design to extract subcategories, for instance binding events differentially enriched in only Male Vehicle over others, as currently it is set to give me sex differential peaks from the ChIP-seq dataset

In addition, would this be possible with the current input matrix design that I have outlined below?

sampleNames<-c("MBV1","FBV1", "MBE7, "FBE3")
sampleSex<-c("male", "female", "male", "female")
sampleTreatment<-c("vehicle", "vehicle", "BB", "BB")
colData<-data.frame(sampleName=sampleNames, sex=sampleSex, treatment=sampleTreatment)

dds<-DESeqDataSetFromMatrix(countData = countData,
                            colData = colData,
                            design =  ~ treatment + sex)

Thank you!

ADD COMMENTlink modified 8 months ago by Michael Love19k • written 8 months ago by rbronste60
2
gravatar for Michael Love
8 months ago by
Michael Love19k
United States
Michael Love19k wrote:

If you only have four samples, you don't have enough replicates to fit a male specific treatment effect. You need at least one residual degree of freedom, and to have sex specific treatment effects would be 4 coefficients and 4 samples = 0 residual degrees of freedom.

ADD COMMENTlink written 8 months ago by Michael Love19k

Actually I have considerably more samples and did not include them in the code given an error from Bioconductor about inappropriate language usage, from long string of characters. Hopefully it works in this reply.

sampleNames<-c("MBV1",    "MBV2",    "MBV3",    "FBV1",    "FBV2",    "FBV3", "MBE7", "MBE8", "MBE9", "FBE1", "FBE2", "FBE3")
sampleSex<-c("male", "male", "male", "female", "female", "female", "male", "male", "male", "female", "female", "female")
sampleTreatment<-c("vehicle", "vehicle", "vehicle", "vehicle", "vehicle", "vehicle", "BB", "BB", "BB", "BB", "BB", "BB")

So it looks something more like what I have above, can you give some advice about the design I was interested in? Thank you very much!

ADD REPLYlink written 8 months ago by rbronste60
1
> table(sampleSex, sampleTreatment)
         sampleTreatment
sampleSex BB vehicle
   female  3       3
   male    3       3

Then you can use ~sex + sex:treatment to extract sex-specific treatment effects. You can pull them out and build a results table using the 'name' argument of results().

ADD REPLYlink written 8 months ago by Michael Love19k

Exactly what I was after, thank you! 

ADD REPLYlink written 8 months ago by rbronste60

The one thing I am a little confused about is here I can tell how to extract the "sexmale.treatmentBB" and "sexfemale.treatmentBB" for this particular design as I want the peaks specific to all four individual conditions (e.g. Male vehicle, male BB, female Vehicle, female BB) 

Thanks!

resultsNames(dds)

[1] "Intercept"                  "sex_male_vs_female"         "sexfemale.treatmentvehicle" "sexmale.treatmentvehicle"
ADD REPLYlink written 8 months ago by rbronste60
1

I think a point of confusion here is that DESeq2 performs comparisons on counts, it doesn’t do what you describe. You have four conditions so you can compare B vs A and D vs C and then also the contrast of those two.  Regarding which level is chosen as the reference level, see the vignette section about releveling factors.

ADD REPLYlink written 8 months ago by Michael Love19k

Right so all I am trying to do is perform analysis on counts where I can get differential calls on for instance all samples that are both male and BB (treatment), so you're indicating that DESeq2 can't take the counts from those specific samples relative to all others to get a differential signature? 

ADD REPLYlink written 8 months ago by rbronste60
1

Not really. It’s not clear to me how best to compare A vs others. You can compare A vs the average of others, but I’m not big fan of this approach because the average of multiple groups is an artificial point. You can still get a LFC of 0 when the level of A is clearly distinct from all other groups (if the average LFCs of A vs each other group is 0). I’d prefer to do three pairwise comparisons and merge the FDR sets.

ADD REPLYlink written 8 months ago by Michael Love19k

Ok I understand what you're saying in this regard. So one question is, I did the following:

dds<-DESeqDataSetFromMatrix(countData = countData,

                            colData = colData,
                            design =  ~ sex + sex:treatment)

What this did in terms of identifying specific binding counts is show me instances where, for either male vehicle or male BB, there are very low counts compared with the rest of the binding matrix which includes female vehicle and BB. I guess what I wanted was the opposite where it shows me the higher counts for one triplet of replicates (e.g. Male BB) vs the 3 other conditions. So it seems like it worked in the way you describe however doing the opposite. Does that make sense? Thanks again for your input!

ADD REPLYlink written 8 months ago by rbronste60
1

Then use coefficient  averaging:

Use a design of ~0 + group

Then results(dds, contrast=c(1, -1/3, -1/3, -1/3)), etc.

ADD REPLYlink written 8 months ago by Michael Love19k

Ok will give it a try, thanks! I guess I was not representing to you that there is not necessarily a covariate here with separate counts but rather combinations of conditions/sex such as Male BB which are actually representative of a single count column, or rather of 3 replicate count columns. So the group makes sense.

ADD REPLYlink modified 8 months ago • written 8 months ago by rbronste60

A bit of a tangential question, but do you have any suggestions on how to effectively visualize results of something like this comparison, basically some kind of 4-way (dimension) graph that shows differential peaks unique to each of the 4 categories? Thanks!

ADD REPLYlink written 6 months ago by rbronste60
1

Humans are best at seeing two dimensions at a time. I'd say the best way to look at counts across multiple groups and multiple significant genes is a heatmap, and we have some code for that in the vignette, and there are even more complex packages for building multi-heatmap diagrams, e.g. ComplexHeatmap.

ADD REPLYlink written 6 months ago by Michael Love19k

Thanks for the tip! The code in the vignette refers to doing this with raw counts and I was more interested in doing this with output of results() such as from:

results(dds, contrast=c(1, -1/3, -1/3, -1/3))

With all four possible coefficients plotted. Thanks again!

ADD REPLYlink written 6 months ago by rbronste60
1

Also see the vignette section on LFC thresholds, on how to select only positive LFC.

ADD REPLYlink written 8 months ago by Michael Love19k

In your experience is there a good approach to picking this positive LFC threshold? I know its experiment specific, but just curious about your thoughts. Thanks.

ADD REPLYlink written 8 months ago by rbronste60
1

I only recommended it here because you explicitly asked for it above:

"what I wanted was the opposite where it shows me the higher counts for one triplet of replicates (e.g. Male BB) vs the 3 other conditions"

ADD REPLYlink written 8 months ago by Michael Love19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 280 users visited in the last hour