edgeR design matrix for multiple factors
1
0
Entering edit mode
Debayan ▴ 20
@debayan-24308
Last seen 16 months ago
Spain

Hi Everyone, I apologize if my question is very basic, I am kind of new to Linear models, though I have been trying to familiarize myself with it lately. I have the following setup : 4 treatments , 3 replicates each in 4 different time points. My aim is to get for each Treatment ( AK,A6,A1 ) comparisons between Timepoints ( 7 vs. 3, 14 vs. 3 and 21 vs. 3). But these comparisons should have a reference level of AN at Timepoint 3. Thus, I want for AK : 7 vs. 3, 14 vs. 3 and 21 vs. 3, and the same for A6 and A1, but all of these with AN 3 as a reference. Following this tutorial : https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf Section 3.3.1 . I am showing R code for the design .

dd1$TimePoint=relevel(as.factor(dd1$TimePoint),ref="3")
dd1$Treatment=relevel(as.factor(dd1$Treatment),ref="AN")

dd1$group=paste0(dd1$TimePoint,dd1\$Treatment)

dd1
SampleID TimePoint Treatment Replicate group
1    AN0301         3        AN         1   3AN
2    AN0302         3        AN         2   3AN
3    AN0303         3        AN         3   3AN
4    AN0701         7        AN         1   7AN
5    AN0702         7        AN         2   7AN
6    AN0703         7        AN         3   7AN
7    AN1401        14        AN         1  14AN
8    AN1402        14        AN         2  14AN
9    AN1403        14        AN         3  14AN
10   AN2101        21        AN         1  21AN
11   AN2102        21        AN         2  21AN
12   AN2103        21        AN         3  21AN
13   AK0301         3        AK         1   3AK
14   AK0302         3        AK         2   3AK
15   AK0303         3        AK         3   3AK
16   AK0701         7        AK         1   7AK
17   AK0702         7        AK         2   7AK
18   AK0703         7        AK         3   7AK
19   AK1401        14        AK         1  14AK
20   AK1402        14        AK         2  14AK
21   AK1403        14        AK         3  14AK
22   AK2101        21        AK         1  21AK
23   AK2102        21        AK         2  21AK
24   AK2103        21        AK         3  21AK
25   A60301         3        A6         1   3A6
26   A60302         3        A6         2   3A6
27   A60303         3        A6         3   3A6
28   A60701         7        A6         1   7A6
29   A60702         7        A6         2   7A6
30   A60703         7        A6         3   7A6
31   A61401        14        A6         1  14A6
32   A61402        14        A6         2  14A6
33   A61403        14        A6         3  14A6
34   A62101        21        A6         1  21A6
35   A62102        21        A6         2  21A6
36   A62103        21        A6         3  21A6
37   A10301         3        A1         1   3A1
38   A10302         3        A1         2   3A1
39   A10303         3        A1         3   3A1
40   A10701         7        A1         1   7A1
41   A10702         7        A1         2   7A1
42   A10703         7        A1         3   7A1
43   A11401        14        A1         1  14A1
44   A11402        14        A1         2  14A1
45   A11403        14        A1         3  14A1
46   A12101        21        A1         1  21A1
47   A12102        21        A1         2  21A1
48   A12103        21        A1         3  21A1

mm1=model.matrix(~ 0+group,data=dd1)

colnames(mm1)
[1] "group14A1" "group14A6" "group14AK" "group14AN" "group21A1" "group21A6" "group21AK" "group21AN" "group3A1"
[10] "group3A6"  "group3AK"  "group3AN"  "group7A1"  "group7A6"  "group7AK"  "group7AN"

sessionInfo( )
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /nfs/software/as/el7.2/EasyBuild/CRG/software/OpenBLAS/0.3.9-GCC-9.3.0/lib/libopenblas_sandybridgep-r0.3.9.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] DESeq2_1.30.0               csaw_1.24.3
[3] SummarizedExperiment_1.20.0 Biobase_2.50.0
[5] MatrixGenerics_1.2.1        matrixStats_0.58.0
[7] GenomicRanges_1.42.0        GenomeInfoDb_1.26.2
[9] IRanges_2.24.1              S4Vectors_0.28.1
[11] BiocGenerics_0.36.0         edgeR_3.32.1
[13] limma_3.46.0

loaded via a namespace (and not attached):
[1] httr_1.4.1               bit64_0.9-7              splines_4.0.0
[7] blob_1.2.1               GenomeInfoDbData_1.2.4   Rsamtools_2.6.0
[10] progress_1.2.2           pillar_1.4.3             RSQLite_2.2.0
[13] lattice_0.20-41          glue_1.4.0               digest_0.6.25
[16] RColorBrewer_1.1-2       XVector_0.30.0           colorspace_1.4-1
[19] Matrix_1.2-18            XML_3.99-0.3             pkgconfig_2.0.3
[22] biomaRt_2.46.2           genefilter_1.72.1        zlibbioc_1.36.0
[25] purrr_0.3.4              xtable_1.8-4             scales_1.1.0
[28] BiocParallel_1.24.1      tibble_3.0.1             openssl_1.4.1
[31] annotate_1.68.0          ggplot2_3.3.0            ellipsis_0.3.0
[34] GenomicFeatures_1.42.1   survival_3.1-12          magrittr_1.5
[37] crayon_1.3.4             memoise_1.1.0            xml2_1.3.2
[40] tools_4.0.0              prettyunits_1.1.1        hms_0.5.3
[43] lifecycle_0.2.0          stringr_1.4.0            munsell_0.5.0
[46] locfit_1.5-9.4           DelayedArray_0.16.1      AnnotationDbi_1.52.0
[49] Biostrings_2.58.0        compiler_4.0.0           rlang_0.4.5
[52] grid_4.0.0               RCurl_1.98-1.2           rappdirs_0.3.1
[55] bitops_1.0-6             gtable_0.3.0             DBI_1.1.0
[58] curl_4.3                 R6_2.4.1                 GenomicAlignments_1.26.0
[61] dplyr_0.8.5              rtracklayer_1.50.0       bit_1.1-15.2
[64] stringi_1.4.6            Rcpp_1.0.4.6             vctrs_0.2.4
[67] geneplotter_1.68.0       dbplyr_1.4.3             tidyselect_1.0.0

Now my idea is to use contrasts like group7AK-group3AK, group14AK-group3AK and so on. My question is : Is this setup correct ? Would it actually show changes with respect to the reference level of group3AN ? Also since these are time points, is it ok to use TimePoint as a factor ?

Thank you very much, Debayan

MutlipleFactors DesignMatrix edgeR • 1.0k views
1
Entering edit mode
@gordon-smyth
Last seen 11 hours ago
WEHI, Melbourne, Australia

Is this setup correct?

Looks fine.

is it ok to use TimePoint as a factor ?

Yes, certainly.

these comparisons should have a reference level of AN at Condition 3.

No, you are making comparisons directly. The concept of a reference level doesn't arise.

0
Entering edit mode

Hi Gordon, Thank you very much for the clarification. If I created the model matrix with an intercept, i.e. mm1=model.matrix(~ group,data=dd1) , would my comparisons like group7AK-group3AK be with reference to the reference level of group ( lets say group3AN )? I mean the value of the intercept would be the mean of the reference level of the group ? Thanks again, Debayan

1
Entering edit mode

There is no such thing as making a comparison relative to a reference level. I think you might be misinterpreting something you've read. The comparison of 7AK vs 3AK is just that, and will be same regardless of whether there is an intercept in the model or not.

0
Entering edit mode

Hi Gordon, Thanks, I understand now the confusion I was having. Thanks again, Debayan