Question: How to select top 10% highly variable genes in microarray data?
Biologist70 wrote, 21 months ago:

Hi,

I'm working with microarray data, using the "oligo" R package for background correction and normalisation of the expression values. After normalisation I want to calculate Z-scores to generate a heatmap.

As there are around 25,000 genes with expression values in the matrix, I want to create a heatmap with only the top 10% most variable genes.

I'm looking for the best statistical way to select the top 10% most variable genes to plot in the heatmap.

After some Google searching I found the following:

"normdata" is a matrix with 25,000 genes after background correction and normalisation.

        x <- apply(normdata, 1, IQR) #Calculate IQR
        y <- normdata[x > quantile(x, 0.9), ] #selecting top 10% highly variable genes

Do you think the above code is the right way to select the top 10% highly variable genes?

Thank you

Tags: microarray, oligo, R, snp6.0
Answer: How to select top 10% highly variable genes in microarray data?
James W. MacDonald51k (United States) wrote, 21 months ago:

That's a way to do it, so long as you also account for NA values. Or you could use varFilter in the genefilter package, which will be much faster.

> library(genefilter)
> z <- matrix(rnorm(1e6), ncol = 10)
> system.time(varFilter(z, var.cutoff = 0.9))
   user  system elapsed
   0.05    0.00    0.05

> fun <- function(z){y <- apply(z, 1, IQR); z[y > quantile(y, 0.9),]}
> system.time(fun(z))
   user  system elapsed
   6.08    0.00    6.14

But even with 1e5 'genes' your way only requires you to wait six extra seconds...
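
A minimal sketch of the NA handling mentioned above, using base R only. The helper name `select_top_variable` and its `frac` argument are hypothetical (they are not from this thread or from any package), and the simulated `normdata` merely stands in for the real normalised matrix:

```r
# Hypothetical helper: keep the top `frac` most variable rows, ranked by IQR.
select_top_variable <- function(mat, frac = 0.1) {
  # Per-gene IQR, ignoring missing values within each row
  iqr <- apply(mat, 1, IQR, na.rm = TRUE)
  # Cutoff at the (1 - frac) quantile of the per-gene IQRs
  cutoff <- quantile(iqr, probs = 1 - frac, na.rm = TRUE)
  # which() silently drops rows whose IQR is NA (e.g. all-NA rows)
  keep <- which(iqr > cutoff)
  mat[keep, , drop = FALSE]
}

# Simulated stand-in for the normalised expression matrix
set.seed(1)
normdata <- matrix(rnorm(1000 * 10), ncol = 10,
                   dimnames = list(paste0("gene", 1:1000), paste0("s", 1:10)))
normdata[1, 2] <- NA  # a single missing value would otherwise break quantile()

top <- select_top_variable(normdata, frac = 0.1)
dim(top)
```

Without `na.rm = TRUE`, one NA in any row makes that row's IQR fail, so handling it at both the `IQR` and `quantile` steps seems the safer choice here.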


Dear James,

Thanks for the reply. I'm not asking which is faster; I'm asking whether the code above can be used to select the top 10% highly variable genes.

One more question: do I need to select the top 10% highly variable genes before or after normalisation?

Thank you

Reply written 21 months ago by Biologist70
Answer: How to select top 10% highly variable genes in microarray data?
SamGG190 (France) wrote, 21 months ago:

Hi,

I am not an expert, but IMHO your code is correct for what you want to achieve.

Selection should take place AFTER normalization, but if your samples are roughly similar there should not be much difference between doing it before or after.

Just a word concerning the Z-score: it scales each gene by its own dispersion in the heatmap, so the differences in spread that the IQR selection is based on are no longer visible. I always look at row-centred data before using the Z-score.

Best.
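
To illustrate the difference between row-centred data and row Z-scores, here is a small base R sketch. The matrix `y` is simulated and only stands in for the selected genes; the variable names are assumptions for illustration:

```r
# Simulated stand-in for the selected top-variable genes (rows = genes)
set.seed(1)
y <- matrix(rnorm(50 * 6, mean = 8), ncol = 6,
            dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))

# Row-centring: subtract each gene's mean, keep its original spread
centred <- y - rowMeans(y)

# Row Z-score: centre, then divide by each gene's standard deviation.
# scale() works on columns, so transpose before and after.
zscore <- t(scale(t(y)))

range(apply(centred, 1, sd))  # spreads still differ between genes
range(apply(zscore, 1, sd))   # every gene now has unit standard deviation
```

Plotting `centred` keeps the differences in variability between genes visible, while `zscore` puts every gene on the same scale, which is probably why looking at both is useful.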
