Question

Why does genefilter not work for me?

0

Nemo ▴ 80

@nemo-7332

Last seen 8.0 years ago

India

I tried to filter out genes that are not expressed, using the genefilter package as follows:

f1 <- pOverA(0.25, 3.5)

ffun1 <- filterfun(f1)

flrGene <- genefilter(data,ffun1)

sum(flrGene)

It gives me zero. Why? Does that mean I should keep all the genes? Is there another method to remove genes with very low expression across samples?

R microarray genefilter • 2.2k views

ADD COMMENT • link 11.0 years ago Nemo ▴ 80

score 1 · Accepted Answer · 2015-02-13

1

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 days ago

United States

That actually means you should eliminate all genes, because no gene has at least 25% of its samples with a value above 3.5. But this seems really weird to me, so you will have to tell us more: what these data are, how you generated your 'data' object (a bad object name, by the way, as there is also a function called 'data' that you are masking), and so on. You should also show us what you get for

head(data)

or

head(exprs(data))

if data is an ExpressionSet.
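For readers who want to see what the filter is testing, here is a minimal base-R sketch of the rule that pOverA(0.25, 3.5) applies to each row, as I understand it from the genefilter documentation: keep a row when at least a proportion p of its values exceed A. The toy matrix and its row names are made up for illustration and are not the poster's data.

```r
# Toy matrix: 3 probes x 4 samples (values invented for illustration)
x <- rbind(
  lowFC   = c(-0.28, -0.20, -0.25, 0.04),  # log ratios close to zero
  highExp = c(7.1, 8.3, 6.9, 7.7),         # typical log2 intensities
  mixed   = c(2.0, 3.0, 3.6, 3.4)          # one of four values above 3.5
)

p <- 0.25  # required proportion of samples
A <- 3.5   # threshold each value must exceed

# Base-R equivalent of applying the pOverA(p, A) test to every row:
# the fraction of values above A must be at least p
keep <- rowMeans(x > A) >= p
keep
sum(keep)
```

A row of values that all sit near zero can never pass a threshold of 3.5, so a result of zero from sum() is what you would expect if the input is not on the usual log-expression scale.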

ADD COMMENT • link 11.0 years ago James W. MacDonald 68k

0

@James W. MacDonald In fact, these are microarray data consisting of 30000 probes and 2000 samples; each row represents a gene (probe) and each column a sample. The values are log fold changes. Running head(data) gives something like the output below (since it is a large data set, I show only a few rows and columns):

NAME	M1	M2	M3	M4
1007_s_at	-0.2815	-0.2032	-0.2539	0.041
1053_at	-0.0113	0.0285	-0.0675	0.0048
117_at	-0.0448	-0.136	-0.2189	0.0637
121_at	-0.081	0.1412	0.0464	-0.018
1255_g_at	0.0486	-0.0239	0.0753	-0.067

ADD REPLY • link 11.0 years ago Nemo ▴ 80

1

That's sort of weird, as those are Affy IDs, and Affy arrays are single color. Are these paired samples for which you have computed fold changes manually?

Anyway, you don't want to use pOverA() for fold change data, as you will have both positive and negative values. pOverA() is intended for single-color expression values, which are strictly positive, and usually range from say 3 to 14 or so, after taking logs.

If you want to filter out genes that don't appear to change, you can just define a fold change that you think isn't different from zero, and then do the test:

fc <- 0.3 
## this is just a number - you have to choose something reasonable

ind <- rowSums(abs(data) > fc) > 500

## I chose 500, as that is 25% of 2000, which is what you did previously

data.filt <- data[ind,]
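The same idea can be sketched with the cutoff expressed as a proportion of samples rather than a hard-coded count, which avoids recomputing 500 when the number of columns changes. This is only an illustration on simulated values: lfc, prop and the simulated effect are hypothetical names and numbers, not anything from the poster's data. Note that it uses >= for the proportion, whereas the code above uses a strict > 500.

```r
set.seed(1)

# Simulated log fold changes: 1000 probes x 20 samples (not real data)
lfc <- matrix(rnorm(1000 * 20, sd = 0.1), nrow = 1000,
              dimnames = list(paste0("probe", 1:1000), paste0("M", 1:20)))

# Make the first 50 probes change by about 1 in half of the samples
lfc[1:50, 1:10] <- lfc[1:50, 1:10] + 1

fc   <- 0.3   # fold-change cutoff; the analyst has to choose this
prop <- 0.25  # fraction of samples that must exceed the cutoff

# Keep probes whose absolute log fold change exceeds fc
# in at least a fraction prop of the samples
ind <- rowMeans(abs(lfc) > fc) >= prop
lfc.filt <- lfc[ind, , drop = FALSE]
dim(lfc.filt)
```

If the table was read in as a data frame whose first column holds the probe IDs (the NAME column in the printout suggests this, but that is an assumption), the numeric part would need to be extracted first, for example with as.matrix(data[, -1]), before abs() and rowMeans() will work.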

ADD REPLY • link 11.0 years ago James W. MacDonald 68k

0

@James W. MacDonald What is weird about it?

You are certainly right that pOverA() is meant for positive values, and your solution is a good idea. Thanks!

Could I have your email address so that I can send you an email?

ADD REPLY • link 11.0 years ago Nemo ▴ 80

0

What is weird about it is that Affy arrays are single color, meaning you only hybridize one sample to the array. Since there is only one sample per array, the data are not by default a ratio (because a ratio implies two samples, and you only hybridized one to the array).

So apparently you have Affy data, but you also seem to have log ratios, which is not what I expect from Affy data. There is evidently more going on with these data than in a run-of-the-mill analysis.

ADD REPLY • link 11.0 years ago James W. MacDonald 68k

0

@James W. MacDonald What is your suggestion? Do you have a reference for it?

By the way, setting fc to 0.2 removed over 20000 genes. Do you think this is a good approach for getting rid of genes that are not highly expressed?

I have information on more than 5 cells. Should I keep the same selected genes and discard the others? If so, how should I do it?

ADD REPLY • link 11.0 years ago Nemo ▴ 80

0

I'm not sure what you are asking here. In addition, as I already mentioned, these data do not fulfill my expectations for Affy data, and I have no idea why you have log ratios rather than log expression values.

I am very hesitant to give any analysis advice as a general rule, and in this case that goes double since I really have no idea about these data, nor what you are trying to do. I would highly recommend that you find a local statistician to help you with this analysis, especially if you are trying to do real science rather than just practicing.

ADD REPLY • link 11.0 years ago James W. MacDonald 68k