Selection of appropriate gene filtering method for discovering statistically significant genes for Affymetrix arrays
svlachavas ▴ 830
@svlachavas-7225
Last seen 6 months ago
Germany/Heidelberg/German Cancer Resear…

I have been preprocessing a large data set in R in order to identify genes that are differentially expressed between two conditions. My main question is this: from what I have found in the literature, some people filter on present/absent calls to remove probes that are absent in all (or in a significant percentage) of their arrays. On the other hand, a substantial number of methodologies apply, after normalization and quality control, various forms of non-specific filtering (for instance, based on variance) prior to the statistical tests. Which of these approaches is appropriate? My Affymetrix platform is the HG-U133 Plus 2.0 array.

bioconductor differential gene expression independent filtering • 1.9k views
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

How are you intending to assess differential expression? If you are intending to use limma, then you may filter on present/absent calls or on average log-expression, but you must not filter on variance.

All filtering methods that are unsupervised with respect to the sample allocations are considered non-specific for filtering purposes, including present/absent calls and average expression.
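
To make the idea of a non-specific filter concrete, here is a minimal sketch of filtering on average log-expression. It is written in plain Python rather than R so that it stays self-contained; the probeset IDs, the toy values, the function name `filter_by_average_expression`, and the cut-off of 4.0 are all assumptions chosen for illustration, not recommended settings.

```python
from statistics import mean


def filter_by_average_expression(log_expr, min_mean=4.0):
    """Keep probesets whose mean log2 expression over ALL samples is >= min_mean.

    log_expr maps a probeset ID to a list of log2 expression values,
    one value per sample. The sample groups are never consulted, which
    is what makes the filter non-specific (unsupervised).
    """
    return {pid: vals for pid, vals in log_expr.items() if mean(vals) >= min_mean}


# Toy data: hypothetical probeset IDs, four samples each.
log_expr = {
    "probe_a": [3.1, 2.9, 3.3, 3.0],    # low in every sample
    "probe_b": [8.2, 8.0, 10.5, 10.9],  # clearly expressed
    "probe_c": [5.0, 4.0, 5.0, 4.0],    # moderately expressed
}

kept = filter_by_average_expression(log_expr, min_mean=4.0)
print(sorted(kept))  # ['probe_b', 'probe_c']
```

Note that the function never sees which condition each sample belongs to; it only looks at the overall level of each probeset.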


As a first approach, I want to compare my 34 samples, which fall into two conditions defined in the phenoData. I tried running limma after filtering on variance, but I have read in other threads, and also in papers, that combining variance filtering with limma is not recommended. On the other hand, if I choose to filter on absent/present calls, for Affymetrix I have roughly two main options: mas5calls and the panp package, which can be used for my specific platform. I have already run the commands to generate the present/absent calls, but my main question is whether I should also filter out marginal calls or leave them in my ExpressionSet. Also, for this large data set, could I use other methods instead, such as a multiple testing procedure with unequal variances and FDR correction, or the SAM test? I could paste a small sample of my script here so that you can give me an opinion on filtering based on absent/present calls. Thank you again for your consideration of this matter!

Best regards
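
For reference, the "FDR correction" step mentioned above can be sketched as follows. This assumes the Benjamini-Hochberg procedure, which is one common choice but is not named in the post; the function name `benjamini_hochberg` and the toy p-values are hypothetical, and the sketch is in plain Python rather than R to keep it self-contained.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, one per probeset (made up for illustration).
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
print([round(a, 4) for a in adj])  # [0.04, 0.0533, 0.0533, 0.2]
```

The adjustment depends on the number of tests `m`, so removing uninformative probesets beforehand changes the adjusted values of the probesets that remain.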


You seem to be asking the same questions on two different threads; see "Non-specific filtering methodogies for ExpressionSet in R/Bioconductor".

svlachavas ▴ 830
@svlachavas-7225
Last seen 6 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Gordon,

Please excuse me; I did not intend to ask the same question twice, and perhaps I was misunderstood. My basic question is whether I should use present/absent calls with limma. Secondly, if I filter on present/absent calls to remove uninformative probesets with Absent calls, what arbitrary cut-off should I use, in your opinion? For instance, I have 34 CEL files (samples) in my data set. Should I filter out the probesets with more than 80% absent calls? Moreover, after filtering, is it useful to apply an algorithm such as kNN for missing-value imputation?

Best regards
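
The cut-off described above can be sketched as below. This is only an illustration in plain Python, not the R workflow itself: it assumes the detection calls have already been exported as "P" (present), "M" (marginal) and "A" (absent), and the function names, the toy table, and the `marginal_as_absent` switch are all hypothetical. Whether marginal calls should count as absent is exactly the open question, so the sketch leaves it as a parameter.

```python
def absent_fraction(calls, marginal_as_absent=False):
    """Fraction of samples in which a probeset is called absent.

    calls is a list of detection calls, one per sample: "P", "M" or "A".
    """
    absent = {"A", "M"} if marginal_as_absent else {"A"}
    return sum(c in absent for c in calls) / len(calls)


def filter_by_absent_calls(call_table, max_absent=0.8, marginal_as_absent=False):
    """Return the IDs of probesets NOT absent in more than max_absent of samples."""
    return [
        pid
        for pid, calls in call_table.items()
        if absent_fraction(calls, marginal_as_absent) <= max_absent
    ]


# Toy call table: hypothetical probeset IDs, five samples each.
call_table = {
    "probe_a": ["A", "A", "A", "A", "A"],  # absent everywhere
    "probe_b": ["P", "P", "A", "P", "P"],  # mostly present
    "probe_c": ["A", "A", "A", "M", "M"],  # depends on how "M" is treated
}

kept_default = filter_by_absent_calls(call_table)
kept_strict = filter_by_absent_calls(call_table, marginal_as_absent=True)
print(kept_default)  # ['probe_b', 'probe_c']
print(kept_strict)   # ['probe_b']
```

With 34 samples, "more than 80% absent" means absent in at least 28 arrays, since 27/34 is about 79% and 28/34 is about 82%. Like the average-expression filter, this rule never looks at the sample groups.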
