edgeR: DEGs in a CRD factorial (2x2x2)
1
0
Entering edit mode
tsl0026 • 0
@16f5d2f8
Last seen 9 weeks ago
United States

Hey guys, I need some help regarding using edgeR in CRD w/ factorial arrangement experiments. The main problem here is to fit the group/design into the linear model.

To briefly present the trial design:

Main factors: 3 -> Bird type (CM or S), Age (01 or 14), vaccination status (NO or YES). That gave me a 2 x 2 x 2 factorial arrangement.

Objective: Compare the differences in gene expression in each organ (HG or TC) within each timepoint (24 or 48). For that, analyze main factors interactions or if they act together on each gene in each organ/timepoint.

edgeR approach used: Experiments with all combinations of multiple factors

The groups:

[1] CM.14.NO  CM.14.NO  CM.14.NO  CM.14.YES CM.14.YES CM.14.YES CM.14.YES CM.01.NO  CM.01.NO 
[10] CM.01.NO  CM.01.YES CM.01.YES CM.01.YES CM.01.YES S.14.NO   S.14.NO   S.14.NO   S.14.NO  
[19] S.14.YES  S.14.YES  S.14.YES  S.14.YES  S.01.NO   S.01.NO   S.01.NO   S.01.YES  S.01.YES 
[28] S.01.YES  S.01.YES  S.01.YES 
Levels: CM.01.NO CM.01.YES CM.14.NO CM.14.YES S.01.NO S.01.YES S.14.NO S.14.YES

The design matrix:

[1] "(Intercept)", "temp.samples$TYPES"                                        
[3] "temp.samples$AGE14", "temp.samples$VACCYES"                                      
[5] "temp.samples$TYPES_temp.samples$AGE14" "temp.samples$TYPES_temp.samples$VACCYES"                   
[7] "temp.samples$AGE14_temp.samples$VACCYES" "temp.samples$TYPES_temp.samples$AGE14_temp.samples$VACCYES"

Code:

group <- factor(paste(temp.samples$TYPE, temp.samples$AGE, temp.samples$VACC, sep = "." ))
design <- model.matrix(~ temp.samples$TYPE*temp.samples$AGE*temp.samples$VACC, data=temp.samples) 
colnames(design) <- gsub(":", "_", colnames(design))

# Create DGEList and fit it in the model
temp.y <- DGEList(counts = temp.data, group = group)
temp.keep <- filterByExpr(temp.y) #filter out low expressed genes
temp.y <- temp.y[temp.keep, , keep.lib.sizes = FALSE]
temp.y <- calcNormFactors(temp.y) #normalize
temp.y <- estimateDisp(temp.y, design, robust = TRUE)
temp.y <- estimateGLMCommonDisp(temp.y, design = design)
temp.y <- estimateGLMTrendedDisp(temp.y, design = design)
temp.y <- estimateGLMTagwiseDisp(temp.y, design = design)
fit <- voomLmFit(temp.y, design, temp.samples)
colnames(fit)

# Get all levels of the factor
levels <- levels(temp.y$samples$group)

# Generate all combinations of levels
combinations <- combn(levels, 2)

# Create contrasts for all combinations
contrasts <- apply(combinations, 2, function(x) {
contrast <- paste(x[1], "-", x[2])
if (all(x %in% colnames(design))) {
  # Create contrast matrix
  contr.matrix <- makeContrasts(contrasts = contrast, levels = design)
  } else {
    message(paste("Skipping contrast:", contrast, "because not all levels exist in the design matrix."))
    contr.matrix <- NULL
  }
    contr.matrix
})

#Apply F-Test
lrt <- glmLRT(fit, coef= 8)
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8    LC_MONETARY=Portuguese_Brazil.utf8
[4] LC_NUMERIC=C                       LC_TIME=Portuguese_Brazil.utf8    

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] UpSetR_1.4.0         eulerr_7.0.0         ggplot2_3.4.4        dplyr_1.1.3          tibble_3.2.1         org.Gg.eg.db_3.18.0 
 [7] AnnotationDbi_1.64.0 IRanges_2.36.0       S4Vectors_0.40.0     Biobase_2.62.0       BiocGenerics_0.48.0  edgeR_4.0.0         
[13] limma_3.58.0        

loaded via a namespace (and not attached):
 [1] utf8_1.2.4              generics_0.1.3          bitops_1.0-7            RSQLite_2.3.1           lattice_0.21-8         
 [6] magrittr_2.0.3          evaluate_0.23           grid_4.3.1              fastmap_1.1.1           blob_1.2.4             
[11] plyr_1.8.9              GenomeInfoDb_1.38.5     DBI_1.2.1               gridExtra_2.3           httr_1.4.7             
[16] fansi_1.0.5             scales_1.3.0            Biostrings_2.70.0       cli_3.6.1               rlang_1.1.1            
[21] crayon_1.5.2            XVector_0.42.0          splines_4.3.1           munsell_0.5.0           bit64_4.0.5            
[26] withr_3.0.0             cachem_1.0.8            tools_4.3.1             memoise_2.0.1           colorspace_2.1-0       
[31] locfit_1.5-9.8          GenomeInfoDbData_1.2.11 vctrs_0.6.4             R6_2.5.1                png_0.1-8              
[36] lifecycle_1.0.4         zlibbioc_1.48.0         KEGGREST_1.42.0         bit_4.0.5               pkgconfig_2.0.3        
[41] gtable_0.3.4            pillar_1.9.0            glue_1.6.2              Rcpp_1.0.11             statmod_1.5.0          
[46] xfun_0.41               tidyselect_1.2.0        highr_0.10              rstudioapi_0.15.0       knitr_1.45             
[51] compiler_4.3.1          RCurl_1.98-1.12
edgeR • 399 views
ADD COMMENT
1
Entering edit mode

The objective that you have stated

Compare the differences in gene expression in each organ (HG or TC) within each timepoint (24 or 48).

doesn't make sense because organ and timepoint are not factors in your experiment.

What is your actual objective?

ADD REPLY
0
Entering edit mode

Hey Gordon,

Organ and timepoint are not factors. The factors are the treatments (bird type, age, and vaccination status).

the objective is to test the interaction between the three main factors (bird type, age, and vaccination status) on each organ, per time point.

ADD REPLY
1
Entering edit mode
@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia

Your code is enormously more complicated than the approach recommended in the edgeR User's Guide. You seem to be trying to combine several different mutually exclusive analysis strategies and hence getting into the inevitable pickle.

The correct design matrix is much simpler, all you need is

design <- model.matrix(~group)
colnames(design) <- levels(group)

To create the DGEList and fit the model:

y <- DGEList(counts=data,group=group)
keep <- filterByExpr(y)
y <- y[keep,,keep.lib.sizes=FALSE]
y <- normLibSizes(y)
fit <- glmQLFit(y,design)

That's all that is needed. Now we can test for differential expression (DE). For example, does vaccination status cause DE for each bird type of each age?

Cont.matrix <- makeContrasts(
   Vacc.CM.01 = CM.01.YES - CM.01.NO,
   Vacc.CM.14 = CM.14.YES - CM.14.NO,
   Vacc.S.01  =  S.01.YES -  S.01.NO,
   Vacc.S.14  =  S.14.YES -  S.14.NO,
   levels=design)

Test for vaccination effects in CM birds of age 01:

q <- glmQLFTest(fit, contrasts=Cont.matrix[,"Vacc.CM.01"])
topTags(q)

And so on for the other contrasts.

ADD COMMENT
0
Entering edit mode

In this example that you state gives pairwise comparisons in which is possible to check the effects of only one main factor (example: vaccination status). I already have the pairwise results for each main factor, that its pretty much straightforward to get with edgeR. Right now I want to test the interaction between bird type, age, and vaccination status. It does not seems possible to do it with edgeR.

ADD REPLY
0
Entering edit mode

The group-means design matrix in my example allows you to test the effects of any combination of factors. There is no restriction of any kind.

It is easy to test for interaction using either the design matrix in my example or using a factorial design matrix. The documentation tells you how and there are also answers on this forum that tell you how. For example, you could simply use

design <- model.matrix(~TYPE*AGE*VACC)
fit <- glmQLFit(y, design)
q <- glmQLFTest(fit)
topTags(q)

That's all. There's no need for contrasts. edgeR just picks out and tests the 3-way interaction automatically.

ADD REPLY
0
Entering edit mode

Thank you Gordon !!

It is my first time using both, the package and the forum, so I'm pretty much sure your explanation it will be very useful (besides my common doubt).

ADD REPLY

Login before adding your answer.

Traffic: 458 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6