averaging multiple probes for same gene on agilent array
alison waller ▴ 180
@alison-waller-2505
Last seen 9.7 years ago
Dear Bioconductor list,

I am analysing data from a custom Agilent array with 3600 spots using limma. There are usually 3 probes for each gene (some genes have only one probe), and every probe is spotted in duplicate. I would like to obtain an average M value for each gene.

Examples of the spot IDs are below:

    D137-cbdb_A1587_1
    D137-cbdb_A1587_1
    D137-cbdb_A1587_2
    D137-cbdb_A1587_2
    D137-cbdb_A1587_3
    D137-cbdb_A1587_3
    D138-cbdb_A1594
    D138-cbdb_A1594

One option I thought of was to adjust the GAL file so that all probes for the same gene share an identical ID, and then use the avereps() function:

    ID    Name
    D137  D137-cbdb_A1587_1
    D137  D137-cbdb_A1587_1
    D137  D137-cbdb_A1587_2
    D137  D137-cbdb_A1587_2
    D137  D137-cbdb_A1587_3
    D137  D137-cbdb_A1587_3
    D138  D138-cbdb_A1594
    D138  D138-cbdb_A1594

However, the avereps() function seems more suitable for true duplicates. For probe sets I would like to use a weighted average in which probes whose intensities are further from the mean of the probe set are down-weighted (for example, the Tukey biweight).

Does anyone have experience with similar arrays, or suggestions for an appropriate function?

Thank you,
Alison

---------------------------------------------------------
Alison Waller Ph.D
alison.waller at utoronto.ca
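For illustration only, the down-weighted average described in the question could look roughly like the sketch below. It is written in Python/NumPy rather than R, and it is not limma's avereps() or any other Bioconductor function. The helper names (gene_id, tukey_biweight, summarize_by_gene) are hypothetical, the M values are made-up toy numbers, and the tuning constants c = 5 and eps = 1e-4 are assumed defaults for a one-step Tukey biweight centred on the median. The gene ID is assumed to be the part of the spot name before the first hyphen, as in the ID column proposed above.

```python
from collections import defaultdict

import numpy as np


def gene_id(spot_name):
    """Assumed convention: the gene ID is the text before the first hyphen."""
    return spot_name.split("-", 1)[0]


def tukey_biweight(x, c=5.0, eps=1e-4):
    """One-step Tukey biweight average centred on the median.

    Values far from the median (in units of c * MAD) get weight 0;
    values close to it get a weight near 1.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    u = (x - med) / (c * mad + eps)
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return float(np.sum(w * x) / np.sum(w))


def summarize_by_gene(names, M):
    """Collapse a spots-by-arrays matrix of M values to one row per gene."""
    M = np.asarray(M, dtype=float)
    rows = defaultdict(list)
    for i, name in enumerate(names):
        rows[gene_id(name)].append(i)
    return {
        gene: np.array([tukey_biweight(M[idx, j]) for j in range(M.shape[1])])
        for gene, idx in rows.items()
    }


# Toy data: 8 spots (rows) on 2 arrays (columns); the values are invented.
names = [
    "D137-cbdb_A1587_1", "D137-cbdb_A1587_1",
    "D137-cbdb_A1587_2", "D137-cbdb_A1587_2",
    "D137-cbdb_A1587_3", "D137-cbdb_A1587_3",
    "D138-cbdb_A1594", "D138-cbdb_A1594",
]
M = np.array([
    [1.0, -0.50],
    [1.1, -0.40],
    [0.9, -0.60],
    [1.0, -0.50],
    [1.2, -0.45],
    [4.0, -0.55],  # outlying spot on the first array
    [0.3, 0.00],
    [0.4, 0.10],
])

for gene, values in summarize_by_gene(names, M).items():
    print(gene, np.round(values, 3))
```

In this toy example the outlying spot (4.0) gets zero weight, so the D137 summary on the first array stays close to the other five spots, whereas a plain mean would be pulled upwards. With only two or three distinct probes per gene the MAD is a very rough scale estimate, so the weights should be read with caution.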
@sean-davis-490
Last seen 4 months ago
United States
On Wed, Jul 22, 2009 at 3:00 PM, Alison Waller <alison.waller@utoronto.ca> wrote:

> However, the avereps() function seems more suitable for actual duplicates,
> for probesets I would like to use some weighted average where probes with
> intensities which are further from the mean of the probe set are down
> weighted (for example the tukey biweight).
>
> Does anyone have experience with similar arrays or suggestions of an
> appropriate function.

While this sounds like a good idea, it has some significant disadvantages compared with keeping the probes separate. I think most folks would suggest that you do your analyses at the probe level, as each probe is measuring the same thing. So, I would suggest summarizing only to the level of the probe and not to the level of the gene.

Sean
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 9.7 years ago
Hi Alison,

I agree that, from a biologist's point of view, summarization at the gene level is very much wanted, so I would prefer to summarize as early as possible (before testing for differential expression). I think, however, that the strategy will depend a bit on the rationale of the probe design: if probes are, e.g., always placed on different exons, then you might expect very different Ms, and summarization becomes very problematic (also from a biological point of view).

My personal way of dealing with this problem on Agilent arrays is to first filter the probes, before gene summarization, based on several criteria:

a) Agilent spot quality criteria (whatever you have, whatever you like)
b) at present I also apply A-value cutoffs, as the Ms are not reliable below and above certain expression levels

My gene summarization is based on the assumption that the highest Ms are the most meaningful (maybe the most 'real'). Therefore I do not calculate medians or something similar, but simply keep the probe with the highest median of absolute Ms across the arrays. If most of your genes comprise 3 probes, it is difficult to average anyway.

If anyone has better ideas, I am looking forward to hearing them!

Best,
Tobias

On Jul 22, 2009, at 9:00 PM, Alison Waller wrote:

> I would like to obtain an average M value for each gene.

----------------------------------------------------------------------
Tobias Straub ++4989218075439
Adolf-Butenandt-Institute, München D
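As a rough sketch of the selection rule described in this reply (filter first, then keep, for each gene, the probe with the highest median of absolute M across arrays), something like the following could be used. It is Python/NumPy rather than R and is not Tobias's actual code or a limma function. The function name pick_probe_per_gene, the a_min/a_max parameters, and the choice to filter on each probe's mean A value across arrays are all assumptions made for illustration, and the numbers are invented. Rows are assumed to be probes whose duplicate spots have already been averaged.

```python
import numpy as np


def pick_probe_per_gene(gene_ids, M, A=None, a_min=None, a_max=None):
    """Return {gene: row index} of the probe with the largest median |M|.

    gene_ids : one gene label per row of M
    M, A     : probes-by-arrays matrices of log-ratios and mean log-intensities
    a_min, a_max : optional (hypothetical) cutoffs on a probe's mean A value
    """
    M = np.asarray(M, dtype=float)
    keep = np.ones(M.shape[0], dtype=bool)
    if A is not None:
        mean_a = np.asarray(A, dtype=float).mean(axis=1)
        if a_min is not None:
            keep &= mean_a >= a_min
        if a_max is not None:
            keep &= mean_a <= a_max

    # Score each probe by the median of its absolute M values across arrays.
    score = np.median(np.abs(M), axis=1)

    best = {}
    for i, gene in enumerate(gene_ids):
        if not keep[i]:
            continue
        if gene not in best or score[i] > score[best[gene]]:
            best[gene] = i
    return best


# Toy data: 4 probes (rows) on 3 arrays (columns); the values are invented.
gene_ids = ["D137", "D137", "D137", "D138"]
M = np.array([
    [0.2, -0.3, 0.1],
    [1.5, -1.2, 1.8],
    [0.5, 0.4, -0.6],
    [0.1, 0.0, -0.1],
])
A = np.array([
    [8.0, 8.0, 8.0],
    [4.0, 4.0, 4.0],  # low-intensity probe
    [9.0, 9.0, 9.0],
    [10.0, 10.0, 10.0],
])

print(pick_probe_per_gene(gene_ids, M))
print(pick_probe_per_gene(gene_ids, M, A=A, a_min=6.0))
```

Filtering before scoring matters here: a low-intensity probe with large but unreliable Ms would otherwise be the one selected for its gene.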