Model matrix for a complex study design using aldex.glm module from ALDEx2 R package
0
0
Entering edit mode
amir • 0
@5d4df873
Last seen 9 months ago
Japan

Hello,

I have a question about the complex study design in the R ALDEx2 package (https://bioconductor.org/packages/devel/bioc/vignettes/ALDEx2/inst/doc/ALDEx2_vignette.html#46_Complex_study_designs_and_the_aldexglm_module). My goal is to compare more than two groups (e.g. two or more treatments vs control). In a simple case with two groups, we use a vector with two groups:

data(selex)
#subset only the last 400 features for efficiency
selex.sub <- selex[1:400,]
conds <- c(rep("NS", 7), rep("S", 7))

Where seven samples are from the "NS" (non-selective) and the other seven are from the "S" (selective) condition. This is clear. But I don't understand why we don't do the same in the complex case with a model matrix:

data(selex)
selex.sub <- selex[1:500, ]
 covariates <- data.frame("A" = sample(0:1, 14, replace = TRUE),
     "B" = c(rep(0, 7), rep(1, 7)),
     "Z" = sample(c(1,2,3), 14, replace=TRUE))
mm <- model.matrix(~ A + Z + B, covariates)

In this case, the model matrix will be the following (I set the random seed to 1):

   (Intercept) A Z B
1            1 0 1 0
2            1 1 2 0
3            1 0 2 0
4            1 0 2 0
5            1 1 2 0
6            1 0 3 0
7            1 0 1 0
8            1 0 3 1
9            1 1 1 1
10           1 1 1 1
11           1 0 1 1
12           1 0 1 1
13           1 0 2 1
14           1 0 1 1
attr(,"assign")
[1] 0 1 2 3

Why do we use the sample() command? Why do we allow Z to take values 2 or 3? Shouldn't we have a matrix where values don't overlap, like:

   (Intercept) A B Z
1            1 1 0 0
2            1 1 0 0
3            1 1 0 0
4            1 1 0 0
5            1 0 1 0
6            1 0 1 0
7            1 0 1 0
8            1 0 1 0
9            1 0 0 1
10           1 0 0 1
11           1 0 0 1
12           1 0 0 1
13           1 0 0 1
14           1 0 0 1
attr(,"assign")
[1] 0 1 2 3

I noticed that if I follow the tutorial, the output of aldex.glm.effect(x) is a list of two elements (A and B). Whereas if I use my matrix of non-overlapping values, aldex.glm.effect(x) gives a list of 3 (A, B, Z). What could be the reason? And what do we use here as a reference group (I mean, what is considered a control group)? It might sound like a simple question, but I'm a beginner at ALDEx2. I would be grateful for any help or insight.

If it helps, here is the information from sessionInfo():

> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ALDEx2_1.32.0         zCompositions_1.4.0-1 truncnorm_1.0-9       NADA_1.6-2           
[5] survival_3.5-5        MASS_7.3-60          

loaded via a namespace (and not attached):
 [1] utf8_1.2.3                  generics_0.1.3              bitops_1.0-7               
 [4] xml2_1.3.4                  lattice_0.21-8              magrittr_2.0.3             
 [7] grid_4.3.0                  Matrix_1.5-5                GenomeInfoDb_1.36.0        
[10] ggtext_0.1.2                fansi_1.0.4                 scales_1.2.1               
[13] codetools_0.2-19            cli_3.6.1                   crayon_1.5.2               
[16] rlang_1.1.1                 XVector_0.40.0              Biobase_2.60.0             
[19] munsell_0.5.0               splines_4.3.0               DelayedArray_0.26.3        
[22] S4Arrays_1.0.4              tools_4.3.0                 parallel_4.3.0             
[25] BiocParallel_1.34.2         dplyr_1.1.2                 colorspace_2.1-0           
[28] ggplot2_3.4.2               GenomeInfoDbData_1.2.10     SummarizedExperiment_1.30.2
[31] RcppZiggurat_0.1.6          Rfast_2.0.7                 BiocGenerics_0.46.0        
[34] vctrs_0.6.2                 R6_2.5.1                    matrixStats_0.63.0         
[37] stats4_4.3.0                lifecycle_1.0.3             zlibbioc_1.46.0            
[40] S4Vectors_0.38.1            IRanges_2.34.0              pkgconfig_2.0.3            
[43] pillar_1.9.0                gtable_0.3.3                glue_1.6.2                 
[46] Rcpp_1.0.10                 tibble_3.2.1                GenomicRanges_1.52.0       
[49] tidyselect_1.2.0            rstudioapi_0.14             MatrixGenerics_1.12.2      
[52] compiler_4.3.0              RCurl_1.98-1.12             gridtext_0.1.5
ALDEx2 modelmatrix covariates • 492 views
ADD COMMENT

Login before adding your answer.

Traffic: 669 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6