Question

Clustering differential expression results from voomWithQualityWeights

0

Entering edit mode

hrishi27n ▴ 20

@hrishi27n-11821

Last seen 2.6 years ago

United States

Hello all,

I am very new to RNA-seq analysis and have a very specific question about using voomWithQualityWeights

For my analysis, I am using the expected counts generated from rsem tool to run my differential expression analysis.

I am using edgeR for TMM normalization and then voomWithQualityWeights before fitting my model. I have also removed genes with a zero counts from the expected count dataframe.

Overall I see ~600 genes differentially expressed between my two groups. I want to run hierarchal clustering on the 600 DE genes. It seems that using expected counts values for these 600 genes doesn't really produce two separate clusters, which I assume I should see since these genes are DE.

I would like to know, what variable(v$E, v$weights) from the voomWithQualityWeights actually goes into the lmFit, so that I could use same variable to run hierarchal clustering.

Apologies if this post is confusing, the following is my code.

## DataFrame with expected-counts from rsem. 
CountFrame <- (fread(CountMatrix, header=TRUE,sep=",",data.table=FALSE))
rownames(CountFrame) <- CountFrame[,1]
CountFrame[,1] <- NULL
keep <- CountFrame[rowSums(CountFrame==0) <= 25,] ## I have 86 samples
CountFrame <- keep

## DataFrame describing the grouping along with other covariates(age,sex)
phenoFrame <-read.table(phenoMatrix,header=T,sep="\t")
group <-factor(phenoFrame$CATEGORY)
sex <- phenoFrame$SEX
age <-phenoFrame$AGE

##edgeR
myDGElist <-DGEList(counts=CountFrame,group=group)
myNormalized <- calcNormFactors(myDGElist)
design <- model.matrix(~group)
rownames(design) <- colnames(myNormalized)

## voom-limma with weight
v <- voomWithQualityWeights(myNormalized, design=design, normalization="none", plot=FALSE)
fit <- lmFit(v,design)
fit <- eBayes(fit)
top<-topTable(fit, coef=2, number=nrow(fit), adjust.method="BH", sort.by="P")

### ~600 significantly genes with adjusted p-value <0.05

RNA limma rsem edgeR • 3.8k views

ADD COMMENT • link updated 7.0 years ago by Gordon Smyth 50k • written 7.0 years ago by hrishi27n ▴ 20

0

Entering edit mode

As a side note: you mention that you removed genes with zero counts (although you didn't show in your core). You will probably have to filter out a bit more than that. Make sure that you inspect the voom variance:mean plot that is produced when you call voomWithQualityWeights(...., plot=TRUE).

ADD REPLY • link 7.0 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

updated the code

ADD REPLY • link 7.0 years ago hrishi27n ▴ 20

score 6 · Accepted Answer · 2017-04-14

Both variables (v$E and v$weights) go into lmFit. You can't reproduce the voom() or lmFit() calculations in a hierarchical clustering algorithm.

Are you trying to cluster genes or samples? There really is no requirement for the genes to cluster into two clusters. It is the samples that should separate, but even that might not be clear because hierarchical clustering can't downweight poorer quality samples like voomWithQualityWeights() does.

I will assume that you are trying to make heatmap. The simplest way way to make a heatmap is to pass v$E to heatmap.2(), demonstrated in the limma workflow example:

https://f1000research.com/articles/5-1408

My favourite method though would be like this. This makes a heatmap of the 100 most DE genes:

logCPM <- cpm(MyNormalized, log=TRUE, prior.count=3)
o <- order(fit$p.value[,2])
coolmap( logCPM[o[1:100],] )

You have to understand though that voomWithQualityWeights() is designed to handle outlier samples, and outlier samples may not cluster nicely in the heatmap. If all the samples separated beautifully according to simple clustering algorithm, then you probably wouldn't need to downweight outlier samples, would you?