Question: Why does genefilter not work for me?
3.4 years ago by Nemo80 (India):

I tried to filter out genes that are not expressed, using genefilter as follows:

f1 <- pOverA(0.25, 3.5)

ffun1 <- filterfun(f1)

flrGene <- genefilter(data,ffun1)

sum(flrGene)

It returns zero. Why? Does that mean I should keep all the genes? Is there another way to remove genes with very low expression across samples?

3.4 years ago by James W. MacDonald (United States):

That actually means you should eliminate all genes, as none has at least 25% of its values > 3.5. But this seems really weird to me, so you will have to tell us more: what these data are, how you generated your 'data' object (a bad object name, btw, as there is also a function called 'data' that you are masking), etc. You should also show us what you get for

head(data)

or

head(exprs(data))

if data is an ExpressionSet.
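
For what it's worth, here is a minimal base-R sketch of what that filter is checking. It assumes that pOverA(p, A) keeps a row when the proportion of its values greater than A is at least p, and it uses a made-up matrix called `toy` rather than the poster's data.

```r
## Toy matrix: 3 genes (rows) x 4 samples (columns); all values are invented
toy <- matrix(c(5.0, 6.1, 2.0, 1.5,    # gene1: 2 of 4 values > 3.5
                1.0, 0.5, 0.2, 0.9,    # gene2: 0 of 4 values > 3.5
                4.0, 3.0, 3.2, 2.8),   # gene3: 1 of 4 values > 3.5
              nrow = 3, byrow = TRUE,
              dimnames = list(c("gene1", "gene2", "gene3"), paste0("S", 1:4)))

p <- 0.25   # required proportion of samples
A <- 3.5    # expression cutoff

## Proportion of samples above A, computed per gene
prop_over <- rowMeans(toy > A)

## TRUE means "keep this gene", FALSE means "filter it out"
keep <- prop_over >= p
sum(keep)
```

A sum of zero from a vector like `keep` means no gene passed the test, so every gene would be dropped rather than kept.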

@James W. MacDonald In fact, the data are microarray data consisting of 30000 probes and 2000 samples; each row represents a gene (probe) and each column a sample. The values are log fold changes. Running head(data) gives something like the output below (since it is a large data set, I show only a few columns and rows):

NAME        M1       M2       M3       M4
1007_s_at   -0.2815  -0.2032  -0.2539   0.041
1053_at     -0.0113   0.0285  -0.0675   0.0048
117_at      -0.0448  -0.136   -0.2189   0.0637
121_at      -0.081    0.1412   0.0464  -0.018
1255_g_at    0.0486  -0.0239   0.0753  -0.067

That's sort of weird, as those are Affy IDs, and Affy arrays are single color. Are these paired samples for which you have computed fold changes manually?

Anyway, you don't want to use pOverA() for fold change data, as you will have both positive and negative values. pOverA() is intended for single-color expression values, which are strictly positive, and usually range from say 3 to 14 or so, after taking logs.
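
To illustrate why the original filter returned zero, here is a small sketch with simulated values. The matrix `lfc` is hypothetical (random numbers centred on zero with a spread roughly like the values posted above), and the test is the same base-R stand-in for pOverA(0.25, 3.5), assuming it checks that at least 25% of values exceed 3.5.

```r
set.seed(1)

## Hypothetical log fold changes: 100 genes x 20 samples, centred on zero
lfc <- matrix(rnorm(100 * 20, mean = 0, sd = 0.2), nrow = 100)

## Log ratios sit on both sides of zero and never get near 3.5
range(lfc)

## Base-R equivalent of the pOverA(0.25, 3.5) test
keep <- rowMeans(lfc > 3.5) >= 0.25

## Number of genes passing the filter
sum(keep)
```

With values on this scale, no gene can reach a cutoff of 3.5, so the filter flags everything for removal.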

If you want to filter out genes that don't appear to change, you can just define a fold change that you think isn't different from zero, and then do the test:

fc <- 0.3
## this is just a number - you have to choose something reasonable

ind <- rowSums(abs(data) > fc) > 500

## I choose 500, as that is 25% of 2000, which is what you did previously

data.filt <- data[ind,]
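
If the object is a data frame with the probe IDs in a NAME column, as the head(data) output suggests (that is an assumption on my part), the IDs need to come out of the way before calling abs(). A minimal sketch using the five rows posted above, with the count cutoff expressed as a fraction of the number of samples:

```r
## First rows from the head(data) output posted in this thread
df <- data.frame(
  NAME = c("1007_s_at", "1053_at", "117_at", "121_at", "1255_g_at"),
  M1 = c(-0.2815, -0.0113, -0.0448, -0.0810,  0.0486),
  M2 = c(-0.2032,  0.0285, -0.1360,  0.1412, -0.0239),
  M3 = c(-0.2539, -0.0675, -0.2189,  0.0464,  0.0753),
  M4 = c( 0.0410,  0.0048,  0.0637, -0.0180, -0.0670),
  stringsAsFactors = FALSE
)

## Move the IDs to row names so that only numeric columns remain
mat <- as.matrix(df[, -1])
rownames(mat) <- df$NAME

fc <- 0.2          # arbitrary cutoff; choose something reasonable
min_frac <- 0.25   # fraction of samples that must exceed the cutoff

## Count, per gene, the samples whose absolute log fold change exceeds fc
n_over <- rowSums(abs(mat) > fc)

## Keep genes with more than min_frac of samples over the cutoff
ind <- n_over > min_frac * ncol(mat)
mat.filt <- mat[ind, , drop = FALSE]
rownames(mat.filt)
```

The `drop = FALSE` keeps the result a matrix even if only one gene survives, which happens easily with a toy example this small.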



@James W. MacDonald what is weird about it?

You are certainly right that pOverA() is for positive values, and your solution is a good idea. Thanks!

What is weird about it is that Affy arrays are single color, meaning you only hybridize one sample to the array. Since there is only one sample per array, the data are not by default a ratio (because a ratio implies two samples, and you only hybed one to the array).

So the fact that you apparently have Affy data but also seem to have log ratios is not what I expect for Affy data. There is evidently more going on with these data than in a run-of-the-mill analysis.

@James W. MacDonald What is your suggestion? Do you have any reference for it?

By the way, setting fc to 0.2 removed over 20000 genes. Do you think this is a good approach to get rid of genes that are not highly expressed?

I have information on more than 5 cells. Should I keep the same selected genes and discard the other genes? If so, how should I do it?

I'm not sure what you are asking here. In addition, as I already mentioned, these data do not fulfill my expectations for Affy data, and I have no idea why you have log ratios rather than log expression values.

I am very hesitant to give any analysis advice as a general rule, and in this case that goes double since I really have no idea about these data, nor what you are trying to do. I would highly recommend that you find a local statistician to help you with this analysis, especially if you are trying to do real science rather than just practicing.