Question: Clustering differential expression results from voomWithQualityWeights
gravatar for hrishi27n
2.5 years ago by
hrishi27n20 wrote:

Hello all,

I am very new to RNA-seq analysis and have a very specific question about using voomWithQualityWeights

For my analysis, I am using the expected counts generated from rsem tool to run my differential expression analysis. 

I am using edgeR for TMM normalization and then voomWithQualityWeights before fitting my model. I have also removed genes with a zero counts from the expected count dataframe. 

Overall I see ~600 genes differentially expressed between my two groups. I want to run hierarchal clustering on the 600 DE genes. It seems that using expected counts values for these 600 genes doesn't really produce two separate clusters, which I assume I should see since these genes are DE. 

I would like to know, what variable(v$E, v$weights) from the voomWithQualityWeights actually goes into the lmFit, so that I could use same variable to run hierarchal clustering. 

Apologies if this post is confusing, the following is my code. 

## DataFrame with expected-counts from rsem. 
CountFrame <- (fread(CountMatrix, header=TRUE,sep=",",data.table=FALSE))
rownames(CountFrame) <- CountFrame[,1]
CountFrame[,1] <- NULL
keep <- CountFrame[rowSums(CountFrame==0) <= 25,] ## I have 86 samples
CountFrame <- keep

## DataFrame describing the grouping along with other covariates(age,sex)
phenoFrame <-read.table(phenoMatrix,header=T,sep="\t")
group <-factor(phenoFrame$CATEGORY)
sex <- phenoFrame$SEX
age <-phenoFrame$AGE

myDGElist <-DGEList(counts=CountFrame,group=group)
myNormalized <- calcNormFactors(myDGElist)
design <- model.matrix(~group)
rownames(design) <- colnames(myNormalized)

## voom-limma with weight
v <- voomWithQualityWeights(myNormalized, design=design, normalization="none", plot=FALSE)
fit <- lmFit(v,design)
fit <- eBayes(fit)
top<-topTable(fit, coef=2, number=nrow(fit), adjust.method="BH","P")

### ~600 significantly genes with adjusted p-value <0.05
limma edger rna rsem • 1.4k views
ADD COMMENTlink modified 2.5 years ago by Gordon Smyth38k • written 2.5 years ago by hrishi27n20

As a side note: you mention that you removed genes with zero counts (although you didn't show in your core). You will probably have to filter out a bit more than that. Make sure that you inspect the voom variance:mean plot that is produced when you call voomWithQualityWeights(...., plot=TRUE).

ADD REPLYlink written 2.5 years ago by Steve Lianoglou12k

updated the code

ADD REPLYlink written 2.5 years ago by hrishi27n20
Answer: Using voomWithQualityWeights for differential expression analysis.
gravatar for Gordon Smyth
2.5 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

Both variables (v$E and v$weights) go into lmFit. You can't reproduce the voom() or lmFit() calculations in a hierarchical clustering algorithm.

Are you trying to cluster genes or samples? There really is no requirement for the genes to cluster into two clusters. It is the samples that should separate, but even that might not be clear because hierarchical clustering can't downweight poorer quality samples like voomWithQualityWeights() does.

I will assume that you are trying to make heatmap. The simplest way way to make a heatmap is to pass v$E to heatmap.2(), demonstrated in the limma workflow example:

My favourite method though would be like this. This makes a heatmap of the 100 most DE genes:

logCPM <- cpm(MyNormalized, log=TRUE, prior.count=3)
o <- order(fit$p.value[,2])
coolmap( logCPM[o[1:100],] )

You have to understand though that voomWithQualityWeights() is designed to handle outlier samples, and outlier samples may not cluster nicely in the heatmap. If all the samples separated beautifully according to simple clustering algorithm, then you probably wouldn't need to downweight outlier samples, would you?

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Gordon Smyth38k

Gordon thank you very much for your response. I would like to cluster the samples by using just the differentially expressed genes(I tried using all genes but saw no clusters). I tried doing PCA, hierarchal clustering and even MDS plots on the samples using the rsem values but don't really see to separation. Is it possible to have DGE without samples clustering into two separate groups? 

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by hrishi27n20

Yes, it is possible. I've added more detail to my answer above by way of explanation.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Gordon Smyth38k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 215 users visited in the last hour