edgeR & blocking with complex experimental design
1
0
Entering edit mode
Molly • 0
@2cc0015d
Last seen 7 days ago
United States

Hi support team,

I'm an RNA-seq & edgeR beginner trying to wrap my head around whether/how to use blocking in my complex experimental setup. Thanks for taking a look! This post is a doozy so I've put the main questions in bold.

I am studying the petals of two flower species that have contrasting petal morphologies. For my RNA-seq project, I collected petals from both species at four developmental stages, and dissected each petal into proximal and distal halves. I collected five replicates each, so I have 80 samples total (2 species, 4 stages, 2 halves, 5 replicates). Each proximal replicate consists of all proximal petal-halves from a single flower, and has a corresponding replicate containing the distal halves of the same petals (ie, each proximal-distal sample pair came from the same flower, like samples 1 & 21 in the table below):

> sample.info
species tissue stage
1        A   prox     1
2        A   prox     1
3        A   prox     1
4        A   prox     1
5        A   prox     1
6        A   prox     2
7        A   prox     2
8        A   prox     2
9        A   prox     2
10       A   prox     2
11       A   prox     3
12       A   prox     3
13       A   prox     3
14       A   prox     3
15       A   prox     3
16       A   prox     4
17       A   prox     4
18       A   prox     4
19       A   prox     4
20       A   prox     4
21       A   dist     1
22       A   dist     1
23       A   dist     1
24       A   dist     1
25       A   dist     1
26       A   dist     2
27       A   dist     2
28       A   dist     2
29       A   dist     2
30       A   dist     2
31       A   dist     3
32       A   dist     3
33       A   dist     3
34       A   dist     3
35       A   dist     3
36       A   dist     4
37       A   dist     4
38       A   dist     4
39       A   dist     4
40       A   dist     4
41       B   prox     1
42       B   prox     1
43       B   prox     1
44       B   prox     1
45       B   prox     1
46       B   prox     2
47       B   prox     2
48       B   prox     2
49       B   prox     2
50       B   prox     2
51       B   prox     3
52       B   prox     3
53       B   prox     3
54       B   prox     3
55       B   prox     3
56       B   prox     4
57       B   prox     4
58       B   prox     4
59       B   prox     4
60       B   prox     4
61       B   dist     1
62       B   dist     1
63       B   dist     1
64       B   dist     1
65       B   dist     1
66       B   dist     2
67       B   dist     2
68       B   dist     2
69       B   dist     2
70       B   dist     2
71       B   dist     3
72       B   dist     3
73       B   dist     3
74       B   dist     3
75       B   dist     3
76       B   dist     4
77       B   dist     4
78       B   dist     4
79       B   dist     4
80       B   dist     4


I've been using exact tests for pairwise comparisons between proximal & distal tissue within a stage and species, or between species A & B at the same stage within proximal tissue, for example. One big question I have is whether I need to be treating the prox/dist samples that come from the same flower as pairs, like in the edgeR User's Guide examples where control and treatment samples come from the same patient. It's not quite the same scenario because they are different tissue types and I can't consider one a control, but any advice you have on that front would be appreciated!

I'd also like to do some broader DE analyses such as comparing the proximal half with the distal half across all developmental stages of one species, or comparing the entire stage 1 petal (prox and dist) between species, for example. From the edgeR User's Guide, I think I have three options here assuming I don't have to account for sample pairs (let's use the first example of the entire prox half vs entire distal half):

• group all proximal tissue together and all distal tissue together and do a pairwise comparison with an exact test
• use a GLM approach & group samples by stage and tissue type (8 groups for one species), set up multiple contrasts between prox & dist at different stages, and find DE genes that are common between these contrasts.
• compare prox vs. dist using stage as a blocking factor

Do you have advice as to which of the three (or something else entirely) would be most appropriate for this experimental setup? I have tried each and gotten very different results:

• For the first case using an exact test, around 70 genes were DE between prox & dist. They are all super robustly DE btw prox and dist at all developmental stages
• For the GLM approach, I used the red/white furry/smooth example on p.32 of the User's Guide as inspiration, but my setup is much more complex and results in 16 contrasts to cover each pair of stage/tissue type (stage1 prox vs stage 1 dist, stage 1 prox vs stage 2 dist, etc). There are no genes in common across all 16 contrasts; across the four contrasts where the stages matched (ie stage 3 prox vs stage 3 dist), there are only 5 genes in common.
• When I set up the analysis with blocking, there are ~2500 genes DE between proximal and distal sides. I still don't entirely understand what blocking is doing and what that DE list means - I know it's not a compilation of all genes DE between prox and dist at any stage, right? Any illumination you could provide would be amazing! (or, maybe this isn't a case where blocking is appropriate, in which case I'm off the hook :)

Thank you for taking the time to read this lengthy post, I really appreciate it!

Molly

Here is the session info if you need it:

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.5     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3     tibble_3.1.1    ggplot2_3.3.3
[9] tidyverse_1.3.1 edgeR_3.30.3    limma_3.44.3

loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 locfit_1.5-9.4   splines_4.0.0    haven_2.4.1      lattice_0.20-44  rhdf5_2.32.4     colorspace_2.0-1
[8] vctrs_0.3.8      generics_0.1.0   utf8_1.2.1       rlang_0.4.11     pillar_1.6.0     glue_1.4.2       withr_2.4.2
[15] DBI_1.1.1        dbplyr_2.1.1     modelr_0.1.8     readxl_1.3.1     lifecycle_1.0.0  munsell_0.5.0    gtable_0.3.0
[22] cellranger_1.1.0 rvest_1.0.0      fansi_0.4.2      broom_0.7.6      Rcpp_1.0.6       scales_1.1.1     backports_1.2.1
[29] jsonlite_1.7.2   fs_1.5.0         hms_1.0.0        stringi_1.5.3    grid_4.0.0       cli_2.5.0        tools_4.0.0
[36] magrittr_2.0.1   crayon_1.4.1     pkgconfig_2.0.3  ellipsis_0.3.2   xml2_1.3.2       reprex_2.0.0     lubridate_1.7.10
[43] assertthat_0.2.1 httr_1.4.2       rstudioapi_0.13  Rhdf5lib_1.10.1  R6_2.5.0         compiler_4.0.0

edgeR blocking • 96 views
1
Entering edit mode
Yunshun Chen ▴ 750
@yunshun-chen-5451
Last seen 2 days ago
Australia

I wouldn't recommend the exact test as it can't handle complex designs like the one you have.

Your second approach is equivalent to fitting an interactive model whereas your third approach is to fit an additive model. The additive model assumes the differences between dist and prox are consistent across all stages. You may want to explore your data (eg. using an MDS plot) to see if such assumption is valid prior to your model selection.