Question

Non-specific filtering methodologies for ExpressionSet in R/Bioconductor

svlachavas ▴ 840

@svlachavas-7225

Germany/Heidelberg/German Cancer Resear…

I'm currently preprocessing 34 CEL files in R to find differentially expressed genes using various statistical tests. Before any statistical inference, I would like to ask which method of non-specific filtering is optimal for my normalized ExpressionSet.

Should I use criteria such as variance or standard deviation via the genefilter package, or also filter on present/absent calls?

My Affymetrix platform is the HG-U133 Plus 2.0 array.

Thanks in advance!

genefilter R bioconductor differential gene expression

updated 11.1 years ago by Gordon Smyth 53k • written 11.1 years ago by svlachavas ▴ 840

score 2 · Answer 1 · 2015-01-08

James W. MacDonald 68k

@james-w-macdonald-5106

United States

There isn't really an 'optimal' filtering method. As with most things, there are tradeoffs involved when you are excluding data, and people tend to have their own opinions about what is and isn't a reasonable thing to do.

As you already know, there are methods in the genefilter package that can be used to filter data in a non-specific manner, and you can also remove probesets based on present/absent calls. Your goal as an analyst is to understand the tradeoffs involved with any filtering method you might care to use, and to have a defensible reason for those you choose.
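As a rough illustration of a variance-type filter, here is a minimal sketch. The simulated matrix, the choice of IQR as the spread measure, and the 0.5 quantile cutoff are all assumptions made for the example, not recommendations. The genefilter call is guarded so the base R part runs on its own.

```r
set.seed(3)
# Hypothetical normalized log2 expression matrix: 400 probe-sets x 10 arrays.
expr <- matrix(rnorm(4000, mean = 7, sd = 1.5), nrow = 400,
               dimnames = list(paste0("probe", 1:400), paste0("array", 1:10)))

# Variance-type filter in base R: keep the probe-sets whose IQR across
# arrays is above the median IQR (the 0.5 quantile is an arbitrary choice).
iqr <- apply(expr, 1, IQR)
keep <- iqr > quantile(iqr, 0.5)
expr_filtered <- expr[keep, , drop = FALSE]

# The same kind of filter via genefilter, if Biobase and genefilter are installed.
if (requireNamespace("Biobase", quietly = TRUE) &&
    requireNamespace("genefilter", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(assayData = expr)
  eset_filtered <- genefilter::varFilter(eset, var.func = IQR,
                                         var.cutoff = 0.5,
                                         filterByQuantile = TRUE)
}
```

Whatever cutoff is used, the tradeoff is the same: a stricter filter leaves fewer tests to correct for, but it can also discard probe-sets with small but real changes.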

written 11.1 years ago by James W. MacDonald 68k

Thank you for your answer! I understand that there is no "gold standard" for non-specific filtering, since the choice depends on the specific characteristics of the dataset under analysis. My question is more about whether to use the optional step of filtering on present/absent calls (MAS5.0 or the panp package in R), or instead to perform non-specific filtering with one of the other options after quality control and normalization.

written 11.1 years ago by svlachavas ▴ 840

score 2 · Answer 2 · 2015-01-11

The filtering that is appropriate for a particular data set depends on the downstream analysis that you intend to do with the filtered results and, to a somewhat lesser extent, on how you preprocessed the Affymetrix data.

Filtering out consistently non-expressed probe-sets is by far the most common filtering step, because keeping probe-sets in your analysis that are never expressed is hardly ever useful. Apart from that, it is better not to filter unless you know what you're doing.
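One way to remove consistently non-expressed probe-sets can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, and the threshold `A` and the minimum number of arrays `k` are hypothetical values that would need to be chosen for a real dataset (for example, `k` is often tied to the smallest group size).

```r
set.seed(2)
# Hypothetical log2 expression matrix: 500 probe-sets x 8 arrays.
expr <- matrix(rnorm(4000, mean = 6, sd = 2), nrow = 500,
               dimnames = list(paste0("probe", 1:500), paste0("array", 1:8)))

# Call a probe-set "expressed" on an array if its log2 value exceeds A,
# and keep it only if that holds on at least k arrays.
# Both A and k are illustrative choices, not recommendations.
A <- 6
k <- 3
n_expressed <- rowSums(expr > A)
keep <- n_expressed >= k
expr_filtered <- expr[keep, , drop = FALSE]

# How many probe-sets pass and fail the filter
table(keep)
```

The genefilter package expresses a similar rule through `kOverA()` combined with `filterfun()` and `genefilter()`, if you prefer to stay within that package.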

If you plan to use limma for the differential expression analysis, then filtering is not much needed, especially if you use trend=TRUE in the eBayes step. I personally prefer to keep it simple. Do some simple filtering on mean log-expression, or don't filter at all.
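A minimal sketch of that simple approach, assuming a log2 expression matrix such as the one returned by `exprs()` on a normalized ExpressionSet. The simulated data, the two-group design, and the cutoff of 5 are all hypothetical and only serve to make the example runnable. The limma part is guarded in case the package is not installed.

```r
set.seed(1)
# Simulated log2 expression matrix standing in for exprs(eset):
# 1000 probe-sets x 6 arrays (hypothetical data, not from the question).
expr <- matrix(rnorm(6000, mean = 7, sd = 2), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("array", 1:6)))

# Simple non-specific filter: keep probe-sets whose mean log2 expression
# exceeds a cutoff. The cutoff of 5 is an arbitrary illustrative value.
cutoff <- 5
keep <- rowMeans(expr) > cutoff
expr_filtered <- expr[keep, , drop = FALSE]

# Two-group design (3 vs 3), also hypothetical.
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# limma fit with an intensity-dependent prior variance (trend = TRUE).
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr_filtered, design)
  fit <- limma::eBayes(fit, trend = TRUE)
  top <- limma::topTable(fit, coef = 2, number = 10)
}
```

Because the filter uses only the overall mean and ignores the group labels, it does not depend on the contrast being tested, which is what makes it non-specific.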