Question

Deseq 2 multiple comparisons (3 groups, 24 samples)

0

Entering edit mode

Matthew.baldwin • 0

@matthewbaldwin-17095

Last seen 2.3 years ago

United Kingdom

Dear all,

I am hoping you can help the correct test for my experimental design which is as follows: I tested the cells of 3 different donors (rotator cuff tears), on 8 different nano fibre scaffolds (different anisotropy; random vs aligned and different fibre diameters; 300nm,1000nm,2000 and 4000).

I wish to compare the effects of each nano scaffold against all other scaffolds e.g. aligned300 vs aligned 1000 or aligned300 vs random4000, creating 56 possible comparisons.

Do do this I have constructed the following data frame:

samplenames<-meta_data$SampleID 
samples <- data.frame((samplenames),donor=as.factor(c(rep("1",4),rep("2",4),rep("3",4))), anisotrophy=as.factor(c(rep("aligned",12), rep("random",12))), diameter=as.factor(rep(c("300","1000","2000","4000"),6)))

X.samplenames. <fctr>	donor <fctr>	anisotrophy <fctr>	diameter <fctr>
i25-Healthy1-Aligned-300	1	aligned	300
i26-Healthy1-Aligned-1000	1	aligned	1000
i27-Healthy1-Aligned-2000	1	aligned	2000
i28-Healthy1-Aligned-4000	1	aligned	4000
i33-Healthy2-Aligned-300	2	aligned	300
i34-Healthy2-Aligned-1000	2	aligned	1000
i35-Healthy2-Aligned-2000	2	aligned	2000
i36-Healthy2-Aligned-4000	2	aligned	4000
i41-Healthy3-Aligned-300	3	aligned	300
i42-Healthy3-Aligned-1000	3	aligned	1000

Q1: Having read the Deseq2 manual I think that in order to undertake the multiple comparisons, I need to perform the LRT analysis, is this correct?

With this in mind I have undertaken the following:

dds_multi<- DESeqDataSetFromMatrix(countData =healthy, colData=samples, design=~donor+anisotrophy+diameter+donor:anisotrophy+donor:diameter+anisotrophy:diameter)
keep <- rowSums(counts(dds_multi)) >= 10
dds <- dds_multi[keep,]
dds_multi <- DESeq(dds_multi, test="LRT", reduced=~donor + anisotrophy + diameter)

This gives me the following results:

resultsNames(dds_multi)

[1] "Intercept"                      "donor_2_vs_1"                  
 [3] "donor_3_vs_1"                   "anisotrophy_random_vs_aligned" 
 [5] "diameter_2000_vs_1000"          "diameter_300_vs_1000"          
 [7] "diameter_4000_vs_1000"          "donor2.anisotrophyrandom"      
 [9] "donor3.anisotrophyrandom"       "donor2.diameter2000"           
[11] "donor3.diameter2000"            "donor2.diameter300"            
[13] "donor3.diameter300"             "donor2.diameter4000"           
[15] "donor3.diameter4000"            "anisotrophyrandom.diameter2000"
[17] "anisotrophyrandom.diameter300"  "anisotrophyrandom.diameter4000"

I then report each comparison in turn until I get results for each of the 56 possible comparisons, for example:

res1<- results(dds_multi, alpha = 0.05, name = "diameter_2000_vs_1000", test="Wald") or 
res2<- results(dds_multi, alpha = 0.05, contrast=list(c("diameter_2000_vs_1000","anisotrophyrandom.diameter2000")),  test="Wald")

Q2: Is this the correct way to do this?

thanks for any help that can be offered,

Mat

deseq • 1.3k views

ADD COMMENT • link updated 5.6 years ago by Michael Love 41k • written 5.6 years ago by Matthew.baldwin • 0

score 0 · Answer 1 · 2018-08-29

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 2 minutes ago

United States

I’m not sure doing all pairwise comparisons of 8 levels is the best way to approach this (and the interaction model doesn’t seem to be what you want I’d guess). What is your end goal with this analysis?

ADD COMMENT • link 5.6 years ago Michael Love 41k

0

Entering edit mode

Thank Michael,

Grateful for your help. The goal for the analysis are several fold:

1) Which factor (anisotrophy or diameter) produces the greatest change at an RNA level for tendon cells

2) Which genes are influenced by different anisotrophy (random vs aligned)

3) Which genes are influences by different diameter (300, 1000, 2000, 4000nm). And is the effect of diameter further modified by the scaffolds overall alignment (e.g. 300 random vs 300 random)

thanks,

Mat

ADD REPLY • link 5.6 years ago Matthew.baldwin • 0

0

Entering edit mode

Did you consider modeling diameter as a numeric? Do you have any idea how expression will change over diameter?

ADD REPLY • link 5.6 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

It would be fine to model as a diameter as a numeric e.g. 0.3, 1,2,4. We believe that the greatest difference will be for the smallest i.e. 0.3um and largest diameters i.e. 4um

ADD REPLY • link 5.6 years ago Matthew.baldwin • 0

0

Entering edit mode

So one option is ~donor + condition + diameter, with diameter as a numeric. This assumes no interaction between condition and diameter. Another option is ~donor + condition + condition:diameter, where you will get different diameter slopes for the two conditions.

ADD REPLY • link 5.6 years ago Michael Love 41k

0

Entering edit mode

Thanks Michael.

I have thought further and noted in the vignette about grouping interactions. I have refined my 3 questions:

1) For scaffolds with aligned fibre orientation, what is the effect of fibre diameter ?( i.e. 300Aligned vs 1000Aligned, 300Aligned vs 2000Aligned and 300Aligned vs 4000Aligned)

2) For scaffolds with random fibre orientation, what is effect of fibre diameter ? (i.e. 300random vs 1000random, 300random vs 2000random and 300random vs 4000random)

3) For a fixed fibre diameter, what effect does loss of scaffold alignment have? (i.e 300random vs 300 aligned, 1000random vs 1000aligned, 2000random vs 2000aligned, 4000random vs 4000aligned).

As such I have simplified the design to:

dds_disease<- DESeqDataSetFromMatrix(countData =disease, colData=samples_disease, design=~donor+anisotrophy+diameter+anisotrophy:diameter)
keep <- rowSums(counts(dds_disease)) >= 10
dds_disease <- dds_disease[keep,]
dds_disease$group <- factor(paste0(dds_disease$anisotrophy, dds_disease$diameter))
dds_disease$group <- relevel(dds_disease$group, ref = "aligned300")
design(dds_disease) <- ~ group
dds_disease <- DESeq(dds_disease)
resultsNames(dds_disease)

This produces the following comparisons:

[1] "Intercept" "group_aligned300_vs_random300" "group_aligned1000_vs_random300" "group_aligned2000_vs_random300" "group_aligned4000_vs_random300" "group_random1000_vs_random300" "group_random2000_vs_random300" "group_random4000_vs_random300"

This will allow me to answer question 1. For example:

res_disease_LFC <- lfcShrink(dds_disease, coef="group_aligned4000_vs_aligned300", type="apeglm")

However, since I cannot use the contrast argument under the lfcShrink function, to generate the comparisons I want I think I need to relevel and then rerun the Deseq2 function. Is this permitted? i.e.

dds_disease$group <- relevel(dds_disease$group, ref = "random300")
dds_disease <- DESeq(dds_disease)
resultsNames(dds_disease)

This then generates the below comparisons, which I can use to answer question 2:

"Intercept"                      "group_aligned300_vs_random300"  "group_aligned1000_vs_random300" "group_aligned2000_vs_random300"  "group_aligned4000_vs_random300" "group_random1000_vs_random300"  "group_random2000_vs_random300"  "group_random4000_vs_random300"

thanks again for all the help. Very much appreciated,

Mat

ADD REPLY • link 5.5 years ago Matthew.baldwin • 0

0

Entering edit mode

hi Mat,

Looks good. Yes, to use apeglm for LFC shrinkage, you need to relevel, but you only need to rerun nbinomWaldTest() actually, which should be much faster. So you don't need to rerun dispersion estimation.

ADD REPLY • link 5.5 years ago Michael Love 41k