Clustering differential expression results from voomWithQualityWeights
1
0
Entering edit mode
hrishi27n ▴ 20
@hrishi27n-11821
Last seen 3 months ago
United States

Hello all,

I am very new to RNA-seq analysis and have a very specific question about using voomWithQualityWeights

For my analysis, I am using the expected counts generated from rsem tool to run my differential expression analysis. 

I am using edgeR for TMM normalization and then voomWithQualityWeights before fitting my model. I have also removed genes with a zero counts from the expected count dataframe. 

Overall I see ~600 genes differentially expressed between my two groups. I want to run hierarchal clustering on the 600 DE genes. It seems that using expected counts values for these 600 genes doesn't really produce two separate clusters, which I assume I should see since these genes are DE. 

I would like to know, what variable(v$E, v$weights) from the voomWithQualityWeights actually goes into the lmFit, so that I could use same variable to run hierarchal clustering. 

Apologies if this post is confusing, the following is my code. 

## DataFrame with expected-counts from rsem. 
CountFrame <- (fread(CountMatrix, header=TRUE,sep=",",data.table=FALSE))
rownames(CountFrame) <- CountFrame[,1]
CountFrame[,1] <- NULL
keep <- CountFrame[rowSums(CountFrame==0) <= 25,] ## I have 86 samples
CountFrame <- keep

## DataFrame describing the grouping along with other covariates(age,sex)
phenoFrame <-read.table(phenoMatrix,header=T,sep="\t")
group <-factor(phenoFrame$CATEGORY)
sex <- phenoFrame$SEX
age <-phenoFrame$AGE

##edgeR
myDGElist <-DGEList(counts=CountFrame,group=group)
myNormalized <- calcNormFactors(myDGElist)
design <- model.matrix(~group)
rownames(design) <- colnames(myNormalized)

## voom-limma with weight
v <- voomWithQualityWeights(myNormalized, design=design, normalization="none", plot=FALSE)
fit <- lmFit(v,design)
fit <- eBayes(fit)
top<-topTable(fit, coef=2, number=nrow(fit), adjust.method="BH", sort.by="P")

### ~600 significantly genes with adjusted p-value <0.05
RNA limma rsem edgeR • 2.2k views
ADD COMMENT
0
Entering edit mode

As a side note: you mention that you removed genes with zero counts (although you didn't show in your core). You will probably have to filter out a bit more than that. Make sure that you inspect the voom variance:mean plot that is produced when you call voomWithQualityWeights(...., plot=TRUE).

ADD REPLY
0
Entering edit mode

updated the code

ADD REPLY
6
Entering edit mode
@gordon-smyth
Last seen 23 minutes ago
WEHI, Melbourne, Australia

Both variables (v$E and v$weights) go into lmFit. You can't reproduce the voom() or lmFit() calculations in a hierarchical clustering algorithm.

Are you trying to cluster genes or samples? There really is no requirement for the genes to cluster into two clusters. It is the samples that should separate, but even that might not be clear because hierarchical clustering can't downweight poorer quality samples like voomWithQualityWeights() does.

I will assume that you are trying to make heatmap. The simplest way way to make a heatmap is to pass v$E to heatmap.2(), demonstrated in the limma workflow example:

  https://f1000research.com/articles/5-1408

My favourite method though would be like this. This makes a heatmap of the 100 most DE genes:

logCPM <- cpm(MyNormalized, log=TRUE, prior.count=3)
o <- order(fit$p.value[,2])
coolmap( logCPM[o[1:100],] )

You have to understand though that voomWithQualityWeights() is designed to handle outlier samples, and outlier samples may not cluster nicely in the heatmap. If all the samples separated beautifully according to simple clustering algorithm, then you probably wouldn't need to downweight outlier samples, would you?

ADD COMMENT
0
Entering edit mode

Gordon thank you very much for your response. I would like to cluster the samples by using just the differentially expressed genes(I tried using all genes but saw no clusters). I tried doing PCA, hierarchal clustering and even MDS plots on the samples using the rsem values but don't really see to separation. Is it possible to have DGE without samples clustering into two separate groups? 

ADD REPLY
0
Entering edit mode

Yes, it is possible. I've added more detail to my answer above by way of explanation.

ADD REPLY

Login before adding your answer.

Traffic: 365 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6