Question: Non-specific filtering methodologies for ExpressionSet in R/Bioconductor
0
svlachavas (700), Greece/Athens/National Hellenic Research Foundation, wrote 4.7 years ago:

I'm currently preprocessing 34 CEL files in R to find differentially expressed genes using various statistical tests. Before any statistical inference, I would like to ask which method of non-specific filtering is optimal for my normalized ExpressionSet.

Should I use criteria such as variance or standard deviation via the genefilter package, or should I also filter on present/absent calls?

My Affymetrix platform is the HG-U133 Plus 2.0 array.

Thanks in advance!

modified 4.7 years ago by Gordon Smyth (38k) • written 4.7 years ago by svlachavas (700)
Answer: Non-specific filtering methodologies for ExpressionSet in R/Bioconductor
2
James W. MacDonald (51k), United States, wrote 4.7 years ago:

There isn't really an 'optimal' filtering method. As with most things, there are tradeoffs involved when you are excluding data, and people tend to have their own opinions about what is and isn't a reasonable thing to do.

As you already know, there are methods in the genefilter package that can be used to filter data in a non-specific manner, and you can also remove probesets based on present/absent calls. Your goal as an analyst is to understand the tradeoffs involved with any filtering method you might care to use, and to have a defensible reason for those you choose.
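As a rough illustration of the variance-style filtering mentioned above, here is a minimal sketch in base R. The matrix `expr` is simulated and only stands in for `exprs()` of a normalized ExpressionSet; the choice of IQR as the spread measure and the 0.5 quantile cutoff are assumptions made for the example, not recommendations.

```r
# Simulated log2 expression matrix standing in for exprs(eset):
# 1000 probesets (rows) x 34 arrays (columns). All values are made up.
set.seed(1)
expr <- matrix(rnorm(1000 * 34, mean = 7, sd = 1), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000),
                               paste0("array", 1:34)))

# Spread of each probeset across arrays, measured by the interquartile range.
iqr <- apply(expr, 1, IQR)

# Keep the more variable half of the probesets (cutoff is illustrative only).
keep_var <- iqr > quantile(iqr, 0.5)
expr_var <- expr[keep_var, , drop = FALSE]

# With a real ExpressionSet, genefilter provides a comparable call:
#   library(genefilter)
#   eset_var <- varFilter(eset, var.func = IQR, var.cutoff = 0.5)
```

The tradeoff here is the usual one: a stricter cutoff reduces the multiple-testing burden but can discard probesets with small, real differences.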

written 4.7 years ago by James W. MacDonald (51k)

Thank you for your answer! I understand that there is no "gold standard" for non-specific filtering, since it depends on the specific characteristics of the dataset being analyzed. My question is more about whether to include the optional step of filtering on present/absent calls (MAS5.0 or the panp package in R), or instead to perform non-specific filtering with one of the various other options after quality control and normalization.

written 4.7 years ago by svlachavas (700)
Answer: Non-specific filtering methodologies for ExpressionSet in R/Bioconductor
2
Gordon Smyth (38k), Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia, wrote 4.7 years ago:

The filtering that is appropriate for a particular data set depends on the downstream analysis that you intend to do with the filtered results and, to a somewhat lesser extent, on how you preprocessed the Affymetrix data.

Filtering out consistently non-expressed probe-sets is by far the most common filtering step, because keeping probe-sets in your analysis that are never expressed is hardly ever useful. Apart from that, it is better not to filter unless you know what you're doing.
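One common way to express "consistently non-expressed" is a k-over-A rule: keep a probe-set only if it exceeds an intensity threshold A in at least k arrays. The sketch below is a base R illustration on simulated data; the values of `A` and `k` are hypothetical and would need to be chosen from the actual intensity distribution and group sizes.

```r
# Simulated log2 intensities: 500 probe-sets near background (~4) and
# 500 clearly expressed (~8), across 34 arrays. All values are made up.
set.seed(2)
expr <- rbind(matrix(rnorm(500 * 34, mean = 4, sd = 0.5), nrow = 500),
              matrix(rnorm(500 * 34, mean = 8, sd = 0.5), nrow = 500))

A <- 6  # hypothetical log2 intensity above which a probe-set counts as expressed
k <- 3  # hypothetical minimum number of arrays that must exceed A

# Keep probe-sets that are above A in at least k of the arrays.
keep_expr <- rowSums(expr > A) >= k
expr_kept <- expr[keep_expr, , drop = FALSE]

# The genefilter package expresses the same rule as:
#   library(genefilter)
#   keep_expr <- genefilter(expr, filterfun(kOverA(k, A)))
```

Setting k to roughly the size of the smallest experimental group keeps probe-sets that are expressed in only one condition.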

If you plan to use limma for the differential expression analysis, then filtering is not much needed, especially if you use trend=TRUE in the eBayes step. I personally prefer to keep it simple. Do some simple filtering on mean log-expression, or don't filter at all.
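To make this concrete: "mean log-expression" is simply the average, across all arrays, of a probe-set's log2 intensity. Below is a minimal sketch of filtering on it and then fitting limma with trend=TRUE. The data, the two-group design and the cutoff of 5 are assumptions made for illustration; the limma step runs only if the package is installed.

```r
# Simulated log2 expression: 1000 probe-sets x 34 arrays, with each
# probe-set given its own baseline level. All values are made up.
set.seed(3)
mu   <- runif(1000, min = 3, max = 12)
expr <- matrix(rnorm(1000 * 34, mean = mu, sd = 0.5), nrow = 1000)

# Hypothetical design: two groups of 17 arrays each.
group <- factor(rep(c("A", "B"), each = 17))

# Mean log-expression: row-wise average of the log2 values.
ave_expr <- rowMeans(expr)

# Keep probe-sets whose average exceeds an illustrative cutoff.
keep   <- ave_expr > 5
expr_f <- expr[keep, , drop = FALSE]

# Differential expression with an intensity-dependent prior variance.
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)
  fit <- limma::lmFit(expr_f, design)
  fit <- limma::eBayes(fit, trend = TRUE)
  top <- limma::topTable(fit, coef = 2, number = 10)
}
```

In practice the cutoff is usually picked by looking at a histogram of `ave_expr` rather than fixed in advance.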

modified 4.7 years ago • written 4.7 years ago by Gordon Smyth (38k)

I apologize, I'm very new to microarray data analysis. Would you please elaborate on the meaning of "mean log-expression"? Thanks.

written 2.2 years ago by alerodriguez (0)