Dear all,
I am hoping you can help the correct test for my experimental design which is as follows: I tested the cells of 3 different donors (rotator cuff tears), on 8 different nano fibre scaffolds (different anisotropy; random vs aligned and different fibre diameters; 300nm,1000nm,2000 and 4000).
I wish to compare the effects of each nano scaffold against all other scaffolds e.g. aligned300 vs aligned 1000 or aligned300 vs random4000, creating 56 possible comparisons.
Do do this I have constructed the following data frame:
samplenames<meta_data$SampleID samples < data.frame((samplenames),donor=as.factor(c(rep("1",4),rep("2",4),rep("3",4))), anisotrophy=as.factor(c(rep("aligned",12), rep("random",12))), diameter=as.factor(rep(c("300","1000","2000","4000"),6)))
X.samplenames. <fctr> 
donor <fctr> 
anisotrophy <fctr> 
diameter <fctr> 


i25Healthy1Aligned300  1  aligned  300  
i26Healthy1Aligned1000  1  aligned  1000  
i27Healthy1Aligned2000  1  aligned  2000  
i28Healthy1Aligned4000  1  aligned  4000  
i33Healthy2Aligned300  2  aligned  300  
i34Healthy2Aligned1000  2  aligned  1000  
i35Healthy2Aligned2000  2  aligned  2000  
i36Healthy2Aligned4000  2  aligned  4000  
i41Healthy3Aligned300  3  aligned  300  
i42Healthy3Aligned1000  3  aligned  1000 



Q1: Having read the Deseq2 manual I think that in order to undertake the multiple comparisons, I need to perform the LRT analysis, is this correct?
With this in mind I have undertaken the following:
dds_multi< DESeqDataSetFromMatrix(countData =healthy, colData=samples, design=~donor+anisotrophy+diameter+donor:anisotrophy+donor:diameter+anisotrophy:diameter) keep < rowSums(counts(dds_multi)) >= 10 dds < dds_multi[keep,] dds_multi < DESeq(dds_multi, test="LRT", reduced=~donor + anisotrophy + diameter)
This gives me the following results:
resultsNames(dds_multi)
[1] "Intercept" "donor_2_vs_1" [3] "donor_3_vs_1" "anisotrophy_random_vs_aligned" [5] "diameter_2000_vs_1000" "diameter_300_vs_1000" [7] "diameter_4000_vs_1000" "donor2.anisotrophyrandom" [9] "donor3.anisotrophyrandom" "donor2.diameter2000" [11] "donor3.diameter2000" "donor2.diameter300" [13] "donor3.diameter300" "donor2.diameter4000" [15] "donor3.diameter4000" "anisotrophyrandom.diameter2000" [17] "anisotrophyrandom.diameter300" "anisotrophyrandom.diameter4000"
I then report each comparison in turn until I get results for each of the 56 possible comparisons, for example:
res1< results(dds_multi, alpha = 0.05, name = "diameter_2000_vs_1000", test="Wald") or res2< results(dds_multi, alpha = 0.05, contrast=list(c("diameter_2000_vs_1000","anisotrophyrandom.diameter2000")), test="Wald")
Q2: Is this the correct way to do this?
thanks for any help that can be offered,
Mat
Thank Michael,
Grateful for your help. The goal for the analysis are several fold:
1) Which factor (anisotrophy or diameter) produces the greatest change at an RNA level for tendon cells
2) Which genes are influenced by different anisotrophy (random vs aligned)
3) Which genes are influences by different diameter (300, 1000, 2000, 4000nm). And is the effect of diameter further modified by the scaffolds overall alignment (e.g. 300 random vs 300 random)
thanks,
Mat
Did you consider modeling diameter as a numeric? Do you have any idea how expression will change over diameter?
Hi Michael,
It would be fine to model as a diameter as a numeric e.g. 0.3, 1,2,4. We believe that the greatest difference will be for the smallest i.e. 0.3um and largest diameters i.e. 4um
So one option is ~donor + condition + diameter, with diameter as a numeric. This assumes no interaction between condition and diameter. Another option is ~donor + condition + condition:diameter, where you will get different diameter slopes for the two conditions.
Thanks Michael.
I have thought further and noted in the vignette about grouping interactions. I have refined my 3 questions:
1) For scaffolds with aligned fibre orientation, what is the effect of fibre diameter ?( i.e. 300Aligned vs 1000Aligned, 300Aligned vs 2000Aligned and 300Aligned vs 4000Aligned)
2) For scaffolds with random fibre orientation, what is effect of fibre diameter ? (i.e. 300random vs 1000random, 300random vs 2000random and 300random vs 4000random)
3) For a fixed fibre diameter, what effect does loss of scaffold alignment have? (i.e 300random vs 300 aligned, 1000random vs 1000aligned, 2000random vs 2000aligned, 4000random vs 4000aligned).
As such I have simplified the design to:
This produces the following comparisons:
[1] "Intercept" "group_aligned300_vs_random300" "group_aligned1000_vs_random300" "group_aligned2000_vs_random300" "group_aligned4000_vs_random300" "group_random1000_vs_random300" "group_random2000_vs_random300" "group_random4000_vs_random300"
This will allow me to answer question 1. For example:
However, since I cannot use the contrast argument under the lfcShrink function, to generate the comparisons I want I think I need to relevel and then rerun the Deseq2 function. Is this permitted? i.e.
This then generates the below comparisons, which I can use to answer question 2:
thanks again for all the help. Very much appreciated,
Mat
hi Mat,
Looks good. Yes, to use apeglm for LFC shrinkage, you need to relevel, but you only need to rerun nbinomWaldTest() actually, which should be much faster. So you don't need to rerun dispersion estimation.