Question: Deseq 2 multiple comparisons (3 groups, 24 samples)
0
14 months ago by
Matthew.baldwin0 wrote:

Dear all,

I am hoping you can help the correct test for my experimental design which is as follows: I tested the cells of 3 different donors (rotator cuff tears), on 8 different nano fibre scaffolds (different anisotropy; random vs aligned and different fibre diameters; 300nm,1000nm,2000 and 4000).

I wish to compare the effects of each nano scaffold against all other scaffolds e.g. aligned300 vs aligned 1000 or aligned300 vs random4000, creating 56 possible comparisons.

Do do this I have constructed the following data frame:

samplenames<-meta_dataSampleID samples <- data.frame((samplenames),donor=as.factor(c(rep("1",4),rep("2",4),rep("3",4))), anisotrophy=as.factor(c(rep("aligned",12), rep("random",12))), diameter=as.factor(rep(c("300","1000","2000","4000"),6)))​ X.samplenames. <fctr> donor <fctr> anisotrophy <fctr> diameter <fctr> i25-Healthy1-Aligned-300 1 aligned 300 i26-Healthy1-Aligned-1000 1 aligned 1000 i27-Healthy1-Aligned-2000 1 aligned 2000 i28-Healthy1-Aligned-4000 1 aligned 4000 i33-Healthy2-Aligned-300 2 aligned 300 i34-Healthy2-Aligned-1000 2 aligned 1000 i35-Healthy2-Aligned-2000 2 aligned 2000 i36-Healthy2-Aligned-4000 2 aligned 4000 i41-Healthy3-Aligned-300 3 aligned 300 i42-Healthy3-Aligned-1000 3 aligned 1000 Q1: Having read the Deseq2 manual I think that in order to undertake the multiple comparisons, I need to perform the LRT analysis, is this correct? With this in mind I have undertaken the following: dds_multi<- DESeqDataSetFromMatrix(countData =healthy, colData=samples, design=~donor+anisotrophy+diameter+donor:anisotrophy+donor:diameter+anisotrophy:diameter) keep <- rowSums(counts(dds_multi)) >= 10 dds <- dds_multi[keep,] dds_multi <- DESeq(dds_multi, test="LRT", reduced=~donor + anisotrophy + diameter)​ This gives me the following results: resultsNames(dds_multi)​ [1] "Intercept" "donor_2_vs_1" [3] "donor_3_vs_1" "anisotrophy_random_vs_aligned" [5] "diameter_2000_vs_1000" "diameter_300_vs_1000" [7] "diameter_4000_vs_1000" "donor2.anisotrophyrandom" [9] "donor3.anisotrophyrandom" "donor2.diameter2000" [11] "donor3.diameter2000" "donor2.diameter300" [13] "donor3.diameter300" "donor2.diameter4000" [15] "donor3.diameter4000" "anisotrophyrandom.diameter2000" [17] "anisotrophyrandom.diameter300" "anisotrophyrandom.diameter4000" I then report each comparison in turn until I get results for each of the 56 possible comparisons, for example: res1<- results(dds_multi, alpha = 0.05, name = "diameter_2000_vs_1000", test="Wald")​ or res2<- results(dds_multi, alpha = 0.05, contrast=list(c("diameter_2000_vs_1000","anisotrophyrandom.diameter2000")), test="Wald")  Q2: Is this the correct way to do this? thanks for any help that can be offered, Mat deseq • 328 views ADD COMMENTlink modified 14 months ago by Michael Love26k • written 14 months ago by Matthew.baldwin0 Answer: Deseq 2 multiple comparisons (3 groups, 24 samples) 0 14 months ago by Michael Love26k United States Michael Love26k wrote: I’m not sure doing all pairwise comparisons of 8 levels is the best way to approach this (and the interaction model doesn’t seem to be what you want I’d guess). What is your end goal with this analysis? ADD COMMENTlink written 14 months ago by Michael Love26k Thank Michael, Grateful for your help. The goal for the analysis are several fold: 1) Which factor (anisotrophy or diameter) produces the greatest change at an RNA level for tendon cells 2) Which genes are influenced by different anisotrophy (random vs aligned) 3) Which genes are influences by different diameter (300, 1000, 2000, 4000nm). And is the effect of diameter further modified by the scaffolds overall alignment (e.g. 300 random vs 300 random) thanks, Mat ADD REPLYlink written 14 months ago by Matthew.baldwin0 Did you consider modeling diameter as a numeric? Do you have any idea how expression will change over diameter? ADD REPLYlink written 14 months ago by Michael Love26k Hi Michael, It would be fine to model as a diameter as a numeric e.g. 0.3, 1,2,4. We believe that the greatest difference will be for the smallest i.e. 0.3um and largest diameters i.e. 4um ADD REPLYlink written 14 months ago by Matthew.baldwin0 So one option is ~donor + condition + diameter, with diameter as a numeric. This assumes no interaction between condition and diameter. Another option is ~donor + condition + condition:diameter, where you will get different diameter slopes for the two conditions. ADD REPLYlink written 14 months ago by Michael Love26k Thanks Michael. I have thought further and noted in the vignette about grouping interactions. I have refined my 3 questions: 1) For scaffolds with aligned fibre orientation, what is the effect of fibre diameter ?( i.e. 300Aligned vs 1000Aligned, 300Aligned vs 2000Aligned and 300Aligned vs 4000Aligned) 2) For scaffolds with random fibre orientation, what is effect of fibre diameter ? (i.e. 300random vs 1000random, 300random vs 2000random and 300random vs 4000random) 3) For a fixed fibre diameter, what effect does loss of scaffold alignment have? (i.e 300random vs 300 aligned, 1000random vs 1000aligned, 2000random vs 2000aligned, 4000random vs 4000aligned). As such I have simplified the design to: dds_disease<- DESeqDataSetFromMatrix(countData =disease, colData=samples_disease, design=~donor+anisotrophy+diameter+anisotrophy:diameter) keep <- rowSums(counts(dds_disease)) >= 10 dds_disease <- dds_disease[keep,] dds_diseasegroup <- factor(paste0(dds_disease$anisotrophy, dds_disease$diameter))
dds_disease$group <- relevel(dds_disease$group, ref = "aligned300")
design(dds_disease) <- ~ group
dds_disease <- DESeq(dds_disease)
resultsNames(dds_disease)

This produces the following comparisons:

[1] "Intercept"                      "group_aligned300_vs_random300"  "group_aligned1000_vs_random300" "group_aligned2000_vs_random300" "group_aligned4000_vs_random300" "group_random1000_vs_random300"  "group_random2000_vs_random300"  "group_random4000_vs_random300"

This will allow me to answer question 1. For example:

res_disease_LFC <- lfcShrink(dds_disease, coef="group_aligned4000_vs_aligned300", type="apeglm")

However, since I cannot use the contrast argument under the lfcShrink function, to generate the comparisons I want I think I need to relevel and then rerun the Deseq2 function. Is this permitted? i.e.

dds_disease$group <- relevel(dds_disease$group, ref = "random300")
dds_disease <- DESeq(dds_disease)
resultsNames(dds_disease)

This then generates the below comparisons, which I can use to answer question 2:

"Intercept"                      "group_aligned300_vs_random300"  "group_aligned1000_vs_random300" "group_aligned2000_vs_random300"  "group_aligned4000_vs_random300" "group_random1000_vs_random300"  "group_random2000_vs_random300"  "group_random4000_vs_random300"

thanks again for all the help. Very much appreciated,

Mat

hi Mat,

Looks good. Yes, to use apeglm for LFC shrinkage, you need to relevel, but you only need to rerun nbinomWaldTest() actually, which should be much faster. So you don't need to rerun dispersion estimation.