Question: Why does genefilter not work for me?
Nemo (India) wrote 2.8 years ago:

I tried to filter out genes that are not expressed, using the genefilter package as follows:

f1 <- pOverA(0.25, 3.5)

ffun1 <- filterfun(f1)

flrGene <- genefilter(data,ffun1)

sum(flrGene)

This returns zero. Why? Does that mean I should keep all the genes? Is there another method to remove genes with very low expression across samples?

James W. MacDonald (United States) wrote 2.8 years ago:

That actually means you should eliminate all genes, as none have at least 25% of their values > 3.5. But this seems really weird to me, so you will have to tell us more: what these data are, how you generated your 'data' object (a bad object name, by the way, as it masks the existing function called 'data'), etc. You should also show us what you get for

head(data)

or

head(exprs(data))

if data is an ExpressionSet.
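
For readers following along, here is a minimal base-R sketch of what the filter checks. It assumes that pOverA(p, A) is TRUE for a row when the proportion of values greater than A is at least p. The matrices `expr` and `lfc` and the helper `p_over_a` are hypothetical stand-ins for illustration, not the poster's data or the genefilter implementation.

```r
set.seed(1)

# Hypothetical stand-ins: 100 probes x 20 samples.
# 'expr' mimics log2 single-color intensities (roughly 3 to 14);
# 'lfc' mimics log fold changes centered on zero.
expr <- matrix(runif(100 * 20, min = 3, max = 14), nrow = 100)
lfc  <- matrix(rnorm(100 * 20, mean = 0, sd = 0.2), nrow = 100)

# Base-R version of the check: proportion of samples > A is at least p.
p_over_a <- function(m, p, A) rowMeans(m > A) >= p

sum(p_over_a(expr, 0.25, 3.5))  # intensity-scale data: probes pass
sum(p_over_a(lfc,  0.25, 3.5))  # log fold changes: values never approach 3.5
```

On data centered on zero, a threshold of 3.5 is far outside the range of the values, so a count of zero passing genes is expected rather than a sign that the function is broken.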


@James W. MacDonald In fact, these are microarray data consisting of 30000 probes and 2000 samples; each row represents a gene (probe) and each column a sample. The values are log fold changes. Running head(data) gives something like the output below (since it is a large data set, I show only a few columns and rows):

NAME        M1       M2       M3       M4
1007_s_at   -0.2815  -0.2032  -0.2539   0.041
1053_at     -0.0113   0.0285  -0.0675   0.0048
117_at      -0.0448  -0.136   -0.2189   0.0637
121_at      -0.081    0.1412   0.0464  -0.018
1255_g_at    0.0486  -0.0239   0.0753  -0.067
Reply written 2.8 years ago by Nemo

That's sort of weird, as those are Affy IDs, and Affy arrays are single color. Are these paired samples for which you computed fold changes manually?

Anyway, you don't want to use pOverA() for fold change data, as you will have both positive and negative values. pOverA() is intended for single-color expression values, which are strictly positive, and usually range from say 3 to 14 or so, after taking logs.

If you want to filter out genes that don't appear to change, you can just define a fold change that you think isn't different from zero, and then do the test:

fc <- 0.3 
## this is just a number - you have to choose something reasonable

ind <- rowSums(abs(data) > fc) > 500

## I choose 500, as that is 25% of 2000, which is what you did previously

data.filt <- data[ind,]
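
A toy run of the filter above, as a sketch under stated assumptions: the data frame `df` is simulated, the sample cutoff is scaled to 25% of its columns, and the NAME-column handling assumes the data were read into a data frame with probe IDs in the first column, as the posted head(data) output suggests. All object names here are hypothetical.

```r
set.seed(42)

# Hypothetical data frame shaped like the posted output:
# a NAME column of probe IDs followed by numeric log fold changes.
n_probes  <- 200
n_samples <- 40
df <- data.frame(NAME = paste0("probe_", seq_len(n_probes)),
                 matrix(rnorm(n_probes * n_samples, sd = 0.1), nrow = n_probes))

# Make the first 10 probes change strongly in every sample.
df[1:10, -1] <- df[1:10, -1] + 1

# Move probe IDs to row names so abs() sees only numbers.
lfc <- as.matrix(df[, -1])
rownames(lfc) <- df$NAME

fc <- 0.3                        # illustrative threshold, not a recommendation
min_samples <- 0.25 * ncol(lfc)  # 25% of the samples, as in the reply

# Keep probes whose absolute log fold change exceeds fc in enough samples.
ind <- rowSums(abs(lfc) > fc) > min_samples
lfc.filt <- lfc[ind, , drop = FALSE]
dim(lfc.filt)
```

Using `drop = FALSE` keeps the result a matrix even if only one probe survives the filter.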

Reply written 2.8 years ago by James W. MacDonald

@James W. MacDonald What is weird about it?

You are certainly right that pOverA() is for positive values, and your solution is a good idea. Thanks!

Could I have your email address so that I can send you an email?

Reply written 2.8 years ago by Nemo

What is weird about it is that Affy arrays are single color, meaning you only hybridize one sample to the array. Since there is only one sample per array, the data are not by default a ratio (because a ratio implies two samples, and you only hybridized one to the array).

So the fact that you apparently have Affy data, but you also seem to have log ratios is not within my expectation for Affy data. So there is evidently more going on with these data than the run of the mill analysis.

Reply written 2.8 years ago by James W. MacDonald

@James W. MacDonald What is your suggestion? Do you have a reference for it?

By the way, by setting fc to 0.2, I removed over 20000 genes. Do you think this is a good approach to get rid of genes that are not highly expressed?

I have information on over 5 cells. Should I keep the same selected genes and discard the others? If so, how should I do it?

Reply written 2.8 years ago by Nemo

I'm not sure what you are asking here. In addition, as I already mentioned, these data do not fulfill my expectations for Affy data, and I have no idea why you have log ratios rather than log expression values.

I am very hesitant to give any analysis advice as a general rule, and in this case that goes double since I really have no idea about these data, nor what you are trying to do. I would highly recommend that you find a local statistician to help you with this analysis, especially if you are trying to do real science rather than just practicing.

Reply written 2.8 years ago by James W. MacDonald