Non-specific filtering methodologies for ExpressionSet in R/Bioconductor
svlachavas ▴ 780
@svlachavas-7225
Last seen 8 hours ago
Germany/Heidelberg/German Cancer Resear…

I'm currently preprocessing 34 CEL files in R to find differentially expressed genes using various statistical tests. Before any statistical inference, I would like to ask which method of non-specific filtering is optimal for my normalized ExpressionSet.

Should I use criteria such as variance or standard deviation via the genefilter package, or also filter on present/absent calls?

My Affymetrix platform is the HG-U133 Plus 2.0 array.

@james-w-macdonald-5106
Last seen 5 days ago
United States

There isn't really an 'optimal' filtering method. As with most things, there are tradeoffs involved when you are excluding data, and people tend to have their own opinions about what is and isn't a reasonable thing to do.

As you already know, there are methods in the genefilter package that can be used to filter data in a non-specific manner, and you can also remove probesets based on present/absent calls. Your goal as an analyst is to understand the tradeoffs involved with any filtering method you might care to use, and to have a defensible reason for those you choose.
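To make the tradeoff concrete, here is a minimal sketch of one non-specific filter: keeping the more variable half of the probesets, ranked by interquartile range. It uses base R on a simulated matrix. The matrix `expr`, the 0.5 quantile cutoff, and all variable names are assumptions for illustration, not a recommendation. With a real ExpressionSet the matrix would come from `exprs()`, and genefilter's `varFilter` packages a similar idea.

```r
set.seed(1)

# Simulated log2 expression matrix: 1000 probesets x 34 arrays (hypothetical data)
expr <- matrix(rnorm(1000 * 34, mean = 7, sd = 1), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("array", 1:34)))

# Spread of each probeset across arrays, measured by interquartile range
probe_iqr <- apply(expr, 1, IQR)

# Assumed cutoff: the median IQR, so roughly the less variable half is dropped
cutoff <- quantile(probe_iqr, 0.5)
keep <- probe_iqr > cutoff

expr_filtered <- expr[keep, , drop = FALSE]
dim(expr_filtered)
```

The cost of this kind of filter is that a probeset with a small but consistent difference between groups can be discarded along with the uninformative ones, which is the sort of tradeoff that needs a defensible reason.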


Thank you for your answer! I understand that there is no "gold standard" for non-specific filtering, since it depends on the specific characteristics of the dataset under analysis. My question is more about whether to include the optional step of filtering on present/absent calls (MAS5.0 or the panp package in R), or instead, after quality control and normalization, to perform non-specific filtering using one of the various other options?

@gordon-smyth
Last seen 27 minutes ago
WEHI, Melbourne, Australia

The filtering that is appropriate for a particular data set depends on the downstream analysis that you intend to do with the filtered results and, to a somewhat lesser extent, on how you preprocessed the Affymetrix data.

Filtering out consistently non-expressed probe-sets is by far the most common filtering step, because keeping probe-sets in your analysis that are never expressed is hardly ever useful. Apart from that, it is better not to filter unless you know what you're doing.
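One way to read "consistently non-expressed" is sketched below: a probe-set is dropped only if it fails to exceed a background level in enough arrays. This is a base R illustration on simulated data. The threshold of 5 on the log2 scale and the minimum of 10 arrays are hypothetical values chosen for the example, and would need to be set from your own data (for instance, from the size of the smallest experimental group).

```r
set.seed(3)
n_probes <- 200
n_arrays <- 34

# Simulated log2 expression: everything sits at a background level of about 3
expr <- matrix(rnorm(n_probes * n_arrays, mean = 3, sd = 0.3), nrow = n_probes)

# Hypothetical: the first 50 probe-sets are expressed, but only in the first 10 arrays
expr[1:50, 1:10] <- expr[1:50, 1:10] + 5

threshold  <- 5   # assumed background cutoff on the log2 scale
min_arrays <- 10  # assumed minimum number of arrays that must exceed it

# Count, for each probe-set, the arrays in which it is above the cutoff
n_above <- rowSums(expr > threshold)
keep <- n_above >= min_arrays

expr_filtered <- expr[keep, , drop = FALSE]
sum(keep)
```

Counting arrays rather than averaging over them retains probe-sets that are expressed in only one group, which is often exactly the signal a differential expression analysis is looking for.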

If you plan to use limma for the differential expression analysis, then filtering is not much needed, especially if you use trend=TRUE in the eBayes step. I personally prefer to keep it simple. Do some simple filtering on mean log-expression, or don't filter at all.
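As a rough illustration, "mean log-expression" here means the average of a probe-set's normalized log2 values across all arrays. The sketch below computes it with `rowMeans` on a simulated matrix and keeps probe-sets above a cutoff. The data, the cutoff of 5, and the variable names are assumptions for the example. With a real ExpressionSet the matrix would come from `exprs()`, and a sensible cutoff is usually chosen by looking at a histogram of the means.

```r
set.seed(2)
n_probes <- 1000
n_arrays <- 34

# Hypothetical data: first half of the probe-sets near background (about 3),
# second half clearly expressed (about 8), all on the log2 scale
base_level <- rep(c(3, 8), each = n_probes / 2)
expr <- matrix(rnorm(n_probes * n_arrays, mean = base_level, sd = 0.3),
               nrow = n_probes)

# Mean log-expression: one average per probe-set, taken across the arrays
mean_log_expr <- rowMeans(expr)

cutoff <- 5  # assumed value for this simulated data only
keep <- mean_log_expr > cutoff

expr_filtered <- expr[keep, , drop = FALSE]
table(keep)
```

Because this filter uses only the overall average and ignores which group each array belongs to, it does not look at the comparison being tested, which is what makes it "non-specific".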


I apologize, I'm very new to microarray data analysis. Would you please elaborate on the meaning of "mean log-expression"? Thanks.