Multi-level design RRBS analysis with voomLmFit
1
1
Entering edit mode
@altintasali-11054
Last seen 8 days ago
Denmark

Hi all,

I have a multi-level design for an RRBS experiment, which is identical to the multi-level design mentioned in Section 9.7 of limma's User Guide.

Experimental design

# Experimental design
meta <- data.frame(library = as.factor(1:12),
                   patient = as.factor(rep(1:6, each = 2)),
                   disease = rep(c("healthy", "sick"), each = 6),
                   tissue = rep(c("liver","muscle"), times = 6))
meta$group <- paste(meta$disease, meta$tissue, sep = ".")

#  library patient disease tissue          group
#        1       1 healthy  liver  healthy.liver
#        2       1 healthy muscle healthy.muscle
#        3       2 healthy  liver  healthy.liver
#        4       2 healthy muscle healthy.muscle
#        5       3 healthy  liver  healthy.liver
#        6       3 healthy muscle healthy.muscle
#        7       4    sick  liver     sick.liver
#        8       4    sick muscle    sick.muscle
#        9       5    sick  liver     sick.liver
#       10       5    sick muscle    sick.muscle
#       11       6    sick  liver     sick.liver
#       12       6    sick muscle    sick.muscle

I was considering using voomLmFit (or alternatively voom together with duplicateCorrelation) in order to make comparisons across tissues and disease groups while blocking for patient IDs.

(I must admit that I got the idea of using voom from the DMRcate package as it internally uses voom).

Question 1: Do you think if it is ok to use voomLmFit (or voom together with duplicateCorrelation) for multi-level design RRBS analysis?

Well if the answer is no, then my code and questions below do not make sense, but let me give it a try anyways...

Toy dataset

Here is the toy data set for the RRBS experiment. Please note that _Me and _Un stands for "methylated" and "unmethylated" counts, respectively.

set.seed(123)
counts <- matrix(data = round(runif(n = 24000, min = 0, max = 100)),
                 ncol = nrow(meta)*2)
cols <- expand.grid(c("Me","Un"), meta$library)
colnames(counts) <- paste(cols[,2], cols[,1], sep = "_")
rownames(counts) <- paste("CpG", 1:nrow(counts), sep = "_")
head(counts)

Differentially methylated CpGs

Below is how I set up the parameters for voomLmFit.

library(edgeR)
design <- modelMatrixMeth(~0 + group, meta)
colnames(design) <- gsub("group", "", colnames(design))

v <- voomLmFit(counts = counts, 
               design = design, 
               block = rep(meta$patient, each = 2),
               sample.weights = TRUE)

Please not that I needed to use block = rep(meta$patient, each = 2) to match the blocking factor (meta$patient) for the count matrix (count). Also, I have used sample.weights = TRUE, which should be equivalent of using voomWithQualityWeights.

Question 2: Do the parameters make sense for voomLmFit especially considering that I needed to use block = rep(meta$patient, each = 2).

voomLmfit step is followed by contrast.fit and eBayes afterwards:

contrasts <- makeContrasts(
    healthyVSsick_liver = healthy.liver - sick.liver, 
    healthyVSsick_muscle = healthy.muscle - sick.muscle, 
    liverVSmuscle_healthy = healthy.liver - healthy.muscle,
    liverVSmuscle_sick = sick.liver - sick.muscle, 
    levels = design)
colnames(contrasts)

DMRs <- list()
for (i in colnames(contrasts)){
  fit <- contrasts.fit(v, contrasts = contrasts[,i])  
  fit <- eBayes(fit)
  DMRs[[i]] <- topTable(fit, 
                        coef = colnames(contrasts[,i]))
}
DMRs

Thank you so much for your time and insights in advance.

edgeR voomLmFit limma duplicateCorrelation voom • 1.9k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

I don't see any problems with the approach you've taken. Yes, you can use voomLmFit instead of edgeR in this way. Yes, you can use duplicateCorrelation after expanding the blocking variable for the Me/Un structure of the data.

Later

As discussed later (Zero intra-block correlation in RRBS analysis using voomLmFit with block), the "Poisson-trick" approach taken by edgeR and limma for methylation data includes a conditioning term for each sample, which prevents the use of duplicateCorrelation or random blocks. The code you show will run but the intra-block correlation will be set to zero.

ADD COMMENT
0
Entering edit mode

Dear Gordon,

Thank you so much for your invaluable feedback. Now, I am more confident using voomLmFit for RRBS data.

Warmest regards!

ADD REPLY

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6