Question

DESeq2 design subcategories

0

Entering edit mode

rbronste ▴ 60

@rbronste-12189

Last seen 4.4 years ago

Hi,

Wondering how I can modify the following DESeq2 design to extract subcategories, for instance binding events differentially enriched in only Male Vehicle over others, as currently it is set to give me sex differential peaks from the ChIP-seq dataset

In addition, would this be possible with the current input matrix design that I have outlined below?

sampleNames<-c("MBV1","FBV1", "MBE7, "FBE3")
sampleSex<-c("male", "female", "male", "female")
sampleTreatment<-c("vehicle", "vehicle", "BB", "BB")
colData<-data.frame(sampleName=sampleNames, sex=sampleSex, treatment=sampleTreatment)

dds<-DESeqDataSetFromMatrix(countData = countData,
                            colData = colData,
                            design =  ~ treatment + sex)

Thank you!

deseq2 deseq differential binding analysis • 1.8k views

ADD COMMENT • link updated 6.4 years ago by Michael Love 41k • written 6.4 years ago by rbronste ▴ 60

score 2 · Answer 1 · 2017-12-13

2

Entering edit mode

Michael Love 41k

@mikelove

Last seen 7 hours ago

United States

If you only have four samples, you don't have enough replicates to fit a male specific treatment effect. You need at least one residual degree of freedom, and to have sex specific treatment effects would be 4 coefficients and 4 samples = 0 residual degrees of freedom.

ADD COMMENT • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Actually I have considerably more samples and did not include them in the code given an error from Bioconductor about inappropriate language usage, from long string of characters. Hopefully it works in this reply.

sampleNames<-c("MBV1",    "MBV2",    "MBV3",    "FBV1",    "FBV2",    "FBV3", "MBE7", "MBE8", "MBE9", "FBE1", "FBE2", "FBE3")
sampleSex<-c("male", "male", "male", "female", "female", "female", "male", "male", "male", "female", "female", "female")
sampleTreatment<-c("vehicle", "vehicle", "vehicle", "vehicle", "vehicle", "vehicle", "BB", "BB", "BB", "BB", "BB", "BB")

So it looks something more like what I have above, can you give some advice about the design I was interested in? Thank you very much!

ADD REPLY • link 6.4 years ago rbronste ▴ 60

1

Entering edit mode

> table(sampleSex, sampleTreatment)
         sampleTreatment
sampleSex BB vehicle
   female  3       3
   male    3       3

Then you can use ~sex + sex:treatment to extract sex-specific treatment effects. You can pull them out and build a results table using the 'name' argument of results().

ADD REPLY • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Exactly what I was after, thank you!

ADD REPLY • link 6.4 years ago rbronste ▴ 60

0

Entering edit mode

The one thing I am a little confused about is here I can tell how to extract the "sexmale.treatmentBB" and "sexfemale.treatmentBB" for this particular design as I want the peaks specific to all four individual conditions (e.g. Male vehicle, male BB, female Vehicle, female BB)

Thanks!

resultsNames(dds)

[1] "Intercept"                  "sex_male_vs_female"         "sexfemale.treatmentvehicle" "sexmale.treatmentvehicle"

ADD REPLY • link 6.4 years ago rbronste ▴ 60

1

Entering edit mode

I think a point of confusion here is that DESeq2 performs comparisons on counts, it doesn’t do what you describe. You have four conditions so you can compare B vs A and D vs C and then also the contrast of those two. Regarding which level is chosen as the reference level, see the vignette section about releveling factors.

ADD REPLY • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Right so all I am trying to do is perform analysis on counts where I can get differential calls on for instance all samples that are both male and BB (treatment), so you're indicating that DESeq2 can't take the counts from those specific samples relative to all others to get a differential signature?

ADD REPLY • link 6.4 years ago rbronste ▴ 60

1

Entering edit mode

Not really. It’s not clear to me how best to compare A vs others. You can compare A vs the average of others, but I’m not big fan of this approach because the average of multiple groups is an artificial point. You can still get a LFC of 0 when the level of A is clearly distinct from all other groups (if the average LFCs of A vs each other group is 0). I’d prefer to do three pairwise comparisons and merge the FDR sets.

ADD REPLY • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Ok I understand what you're saying in this regard. So one question is, I did the following:

dds<-DESeqDataSetFromMatrix(countData = countData,

                            colData = colData,
                            design =  ~ sex + sex:treatment)

What this did in terms of identifying specific binding counts is show me instances where, for either male vehicle or male BB, there are very low counts compared with the rest of the binding matrix which includes female vehicle and BB. I guess what I wanted was the opposite where it shows me the higher counts for one triplet of replicates (e.g. Male BB) vs the 3 other conditions. So it seems like it worked in the way you describe however doing the opposite. Does that make sense? Thanks again for your input!

ADD REPLY • link 6.4 years ago rbronste ▴ 60

1

Entering edit mode

Then use coefficient averaging:

Use a design of ~0 + group

Then results(dds, contrast=c(1, -1/3, -1/3, -1/3)), etc.

ADD REPLY • link 6.4 years ago Michael Love 41k

0

Entering edit mode

Ok will give it a try, thanks! I guess I was not representing to you that there is not necessarily a covariate here with separate counts but rather combinations of conditions/sex such as Male BB which are actually representative of a single count column, or rather of 3 replicate count columns. So the group makes sense.

ADD REPLY • link 6.4 years ago rbronste ▴ 60

0

Entering edit mode

A bit of a tangential question, but do you have any suggestions on how to effectively visualize results of something like this comparison, basically some kind of 4-way (dimension) graph that shows differential peaks unique to each of the 4 categories? Thanks!

ADD REPLY • link 6.2 years ago rbronste ▴ 60

1

Entering edit mode

Humans are best at seeing two dimensions at a time. I'd say the best way to look at counts across multiple groups and multiple significant genes is a heatmap, and we have some code for that in the vignette, and there are even more complex packages for building multi-heatmap diagrams, e.g. ComplexHeatmap.

ADD REPLY • link 6.2 years ago Michael Love 41k

0

Entering edit mode

Thanks for the tip! The code in the vignette refers to doing this with raw counts and I was more interested in doing this with output of results() such as from:

results(dds, contrast=c(1, -1/3, -1/3, -1/3))

With all four possible coefficients plotted. Thanks again!

ADD REPLY • link 6.2 years ago rbronste ▴ 60

1

Entering edit mode

Also see the vignette section on LFC thresholds, on how to select only positive LFC.

ADD REPLY • link 6.4 years ago Michael Love 41k

0

Entering edit mode

In your experience is there a good approach to picking this positive LFC threshold? I know its experiment specific, but just curious about your thoughts. Thanks.

ADD REPLY • link 6.4 years ago rbronste ▴ 60

1

Entering edit mode

I only recommended it here because you explicitly asked for it above:

"what I wanted was the opposite where it shows me the higher counts for one triplet of replicates (e.g. Male BB) vs the 3 other conditions"

ADD REPLY • link 6.4 years ago Michael Love 41k