Closed: Unsupervised clustering heatmap of gene expression data
Biologist ▴ 110
@biologist-9801

Hello,

I have obtained Level 3 gene-level transcription estimates from the Xena browser, given as log2(x+1)-transformed RSEM normalized counts. I would like to plot a hierarchical clustering heatmap of the top 30% most variable genes. I have a few questions about this.

1) Do I need to normalize the data again?

2) How should I apply filtering steps to reduce the number of genes for clustering?

If I do need to normalize the data again, do you think the code below is right? Assume a matrix "h" with genes as rows and samples as columns. The matrix has about 20,000 genes.

library(limma)
y <- normalizeQuantiles(h) #Quantile Normalization

# keep genes with more than 10 counts, i.e. log2(x+1) > log2(11), in at least 14 samples
keep <- rowSums(y > log2(11)) >= 14
table(keep)

keep
FALSE  TRUE 
 3624 16906

y2 <- y[keep,]

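library("genefilter") # loaded here, although IQR() itself comes from the stats package
vars <- apply(y2, 1, IQR) # per-gene interquartile range as the variability measure
f2 <- y2[vars > quantile(vars, 0.7), ] # keep genes above the 70th percentile of IQR, i.e. the top 30% most variable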

I finally got around 5000 genes with these steps. Do you think this approach is right for unsupervised clustering?
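
To make the intended final step concrete, here is a minimal sketch of the clustering heatmap I have in mind. It is only an illustration under assumptions of my own: the matrix "f2_demo" is simulated and stands in for "f2", each gene is converted to a z-score first, and the distance (1 - Pearson correlation) and linkage (average) are choices I picked for the example, not something the data source prescribes.

```r
# Simulated stand-in for the filtered matrix "f2" (genes x samples); all values are hypothetical
set.seed(1)
f2_demo <- matrix(rnorm(200 * 12, mean = 8, sd = 2), nrow = 200,
                  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12)))

# Centre and scale each gene (row) so clustering reflects the expression pattern,
# not the absolute expression level
z <- t(scale(t(f2_demo)))

# Hierarchical clustering of samples and of genes:
# distance = 1 - Pearson correlation, average linkage (assumed choices)
sample_hc <- hclust(as.dist(1 - cor(z)), method = "average")
gene_hc   <- hclust(as.dist(1 - cor(t(z))), method = "average")

# Base-R heatmap using the precomputed dendrograms;
# scale = "none" because the rows are already z-scores
heatmap(z,
        Rowv = as.dendrogram(gene_hc),
        Colv = as.dendrogram(sample_hc),
        scale = "none",
        labRow = rep("", nrow(z)),  # blank gene labels; too many rows to read
        col = colorRampPalette(c("blue", "white", "red"))(50))
```

With the real data, "f2_demo" would simply be replaced by "f2"; whether the z-scoring and the correlation distance are appropriate here is part of what I am asking.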

Thank you

hierarchical clustering heatmap rsem geneexpression • 28 views