Question: voomWithQualityWeights, design and weights
1
gravatar for anne.biton
19 months ago by
anne.biton10
United States
anne.biton10 wrote:

Hello,

I am using voom to perform differential expression analysis in an RNA-Seq project comparing 8 WT mice to 7 KO mice. For each mouse i have ~4 brain tissues A, B, C, D and i am interested in the comparison between KO and WT within and across tissues.

There are two sequencing batches, two flow cells in each batch, and within one of these sequencing batches i have another batch associated with the day of RNA extraction. The number of samples from the same condition/batch/tissue varies between 1 and 4.

The exploratory analysis of the data shows that the tissue effect is very strong, the first principal component (PC) captures 40% of the variance and separates tissue A from tissues BCD ; the following PCs (PC2 to PC7) captures signal associated with the annotated batches and/or tissue. There is a lot of variation within tissue related to the batches and some outlier samples within tissue. Only PC9 shows (light) separation of KO vs WT.

Since i have an unbalanced paired design, i chose to use the random effect approach in voom, blocking on the mouse variable. I merged the status (KO or WT) and the tissue into a combined factor (status_tissue) therefore including 8 levels and created the following design matrix including the batch as a covariate.   

design <- model.matrix(~ 0 + status_tissue + batch_extract_RNA, data = annotforde)
dge  <- DGEList(dataused[, rownames(design)], 
                remove.zeros = TRUE)

I then ran voomWithQualityWeights with the code below, hoping it would help tackle the observed heterogeneity, and latter tested for the wanted contrasts using eBayes (robust = TRUE mode).

dge <- calcNormFactors(dge, method = 'TMM')
vv <- voomWithQualityWeights(dge, design, plot = TRUE)
corfit <- duplicateCorrelation(vv, design,
                               block=annotforde[rownames(design),]$mouse)
vv <- voomWithQualityWeights(dge,design,plot=TRUE, 
            block= annotforde[rownames(design),]$mouse,
            correlation=corfit$consensus)
fitvoom <- lmFit(vv, design, 
                  block = annotforde[rownames(design),]$mouse, 
                  correlation = corfit$consensus)

> corfit$consensus
[1] 0.1302782

voomWithQualityWeights plots

 

With voomWithQualityWeights and the current design, I obtain the following number of genes with adjusted p-value below 5%: all tissues: 69, tissue A: 8, tissue B: 15, tissue C: 123, tissue D: 83.

Using the unweighted version of voom, i obtain the following number of genes with adjusted p-value below 5%: all tissues: 37, tissue A: 10, tissue B: 1, tissue C: 5, tissue D: 12.

I see association of the weights with the different batches and tissues (plots below), and also some association with the flow cells. The inclusion in the design of the flow cell decreases the number of genes with p.adj < 0.05 for tissue C by 75% and increases it for tissue D by 50%.

My questions:
* Is the design used as input for voomWithQualityWeights correct or should i rather compute the weights by group for example?
* Is it unusual to get such a big difference between unweighted voom and classical voom ?
* Any advice on the inclusion of the flow cell in the design ? its inclusion has basically no effect on the unweighted voom but huge effect on the weighted voom.

 

Best,
Anne

 

voom weitghs distribution

 

 

 

 

 

 

 

 

 

ADD COMMENTlink modified 19 months ago by Matthew Ritchie750 • written 19 months ago by anne.biton10
Answer: voomWithQualityWeights, design and weights
0
gravatar for Matthew Ritchie
19 months ago by
Australia
Matthew Ritchie750 wrote:

Your approach sounds reasonable to me.

I would typically do an MDS plot over the first few dimensions to see what variables are important in a given data set. If there is good separation between samples based on any of these (tissue, genotype (hopefully yes to both!), flow cell, day of collection (less interesting, but often important)), then I would definitely include them in the model provided there are sufficient degrees of freedom available for their estimation.

If there is further sample heterogeneity, then running voomWithQualityWeights using the design matrix you've arrived at can often help get you more differential expression, as you have observed here.

ADD COMMENTlink written 19 months ago by Matthew Ritchie750
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 324 users visited in the last hour