Dealing with multiple probes per gene and multiple locations per probe.
@nathan-harmston-2904
Hi everyone,

The aim of the project I'm currently working on is to discover pathway signatures (I am thinking about using an approach like GSEA with KEGG or GO, or something more modular). Some vignettes/tutorials recommend reducing the number of probes per gene to one by retaining the probe with the most variation, since it will be the most informative. However, would it not be better to take the probe closest to the polyA tail of the gene, which according to some sources (in the lab I'm working at) is the most reliable probe in the gene? Is there a good reason for choosing variability over reliability? I have had a quick look through some papers and been unable to find any information that would point me towards one or the other (apart from the BioC vignettes).

Another problem I was wondering about is how to deal with probes that map to multiple locations. Is a BioConductor package available for this, since it seems like a frequent issue in microarray analysis? How would you actually deal with this problem? My current approach is to remove probes which hit multiple locations on the genome (I have a list from http://microarray.csc.mrc.ac.uk/scampa/section.html?id=5 and was going to use nsFilter, if I get it working correctly). But again this seems to throw away a lot of information. Is there a good way of dealing with these probes that doesn't discard information?

Out of interest, how reliable is the annotation provided? Is it completely derived from the affy annotations? The number of probes where the affy Entrez ID and the Ensembl ID match is approx 30000, which isn't that great a statistic. How do people tend to deal with problems like this?

Sorry for the multiple questions in one post, but I think they are all related to each other.

Many thanks in advance,

Nathan

simultaneously loving and hating R at the same time
Annotation GO probe affy
@sean-davis-490
On Tue, Jul 15, 2008 at 4:21 AM, Nathan Harmston <iwanttobeabadger at googlemail.com> wrote:

> Hi everyone,
>
> Currently the aim of a project I'm working on is to discover pathway
> signatures (and I am thinking about using an approach like GSEA using KEGG
> or GO or something more modular). I have seen in some vignettes/tutorials
> that they recommend reducing the number of probes per gene to one by
> retaining the probe with the most variation since it will be the most
> informative. However, would it not be best to take the probe which is
> closest to the polyA tail of the gene, which according to some sources (in
> the lab I'm working at) is the most reliable probe in the gene? Is there a
> good reason for choosing variability over reliability, I have done a quick
> look through some papers and been unable to find any information which would
> point me towards one or another (apart from the bioC vignettes).

Just to clarify, are you talking about probes or probesets? If you know which probesets are the most reliable, you could certainly use those. However, in the absence of such information, variability across an experiment that has biological variability is thought to be a surrogate for measuring something important.

> Another problem I was wondering about is trying to deal with the multiple
> locations per probe problem? I was wondering if a BioConductor package was
> available for this, since it seems like a frequent issue with microarray
> analysis. How would you actually deal with this problem, my current approach
> is to remove probes which hit to multiple locations on the genome (I have a
> list from http://microarray.csc.mrc.ac.uk/scampa/section.html?id=5 and was
> going to use nsFilter (if I get it working correctly)). But again this seems
> like a lot of information is thrown away, is there a good way of dealing
> with these probes which doesn't result in a throwing away of information?

Restricting to probes that map only once to the genome is not really the right criterion. Instead, one actually wants to use probes that map to only one gene. Whether or not a probe hits anywhere else in the genome (but not another gene) is irrelevant for mRNA expression. The limitation of mapping to transcripts and then to genes is that the transcriptome of any given organism is not entirely known.

> Out of interest, how reliable is the annotation provided? Is it completely
> derived from the affy annotations. The number of probes where affy entrez id
> and the ensemblid match is approx 30000, which isn't that great a statistic.
> How do people tend to deal with problems like this?

The annotations are derived by remapping the accessions supplied by affy to current annotation sources. However, there have been several reannotations based on alignment approaches. A search of the archives might be helpful here.

Sean
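To make the two filtering ideas in this thread concrete (use only probesets that map to exactly one gene, then keep the most variable probeset per gene), here is a minimal sketch in plain Python. It is not the Bioconductor API and not nsFilter itself. The function names, the toy expression values, the gene labels and the choice of interquartile range as the variability measure are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range of a list of expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def select_probesets(expression, probeset_to_genes):
    """Return {gene: probeset} with one probeset per gene.

    expression        -- {probeset_id: [value per sample]}
    probeset_to_genes -- {probeset_id: set of gene ids it maps to}

    Probesets that map to zero genes or to more than one gene are
    dropped. Among the rest, the probeset with the largest IQR across
    samples is kept for each gene (ties keep the first one seen).
    """
    best = {}  # gene -> (probeset, iqr)
    for probeset, values in expression.items():
        genes = probeset_to_genes.get(probeset, set())
        if len(genes) != 1:
            continue  # unannotated or ambiguous: maps to != 1 gene
        gene = next(iter(genes))
        spread = iqr(values)
        if gene not in best or spread > best[gene][1]:
            best[gene] = (probeset, spread)
    return {gene: probeset for gene, (probeset, _) in best.items()}


# Hypothetical toy data: four probesets measured on four samples.
expression = {
    "ps1": [5.0, 5.1, 5.0, 5.2],   # GENE_A, nearly flat
    "ps2": [4.0, 6.0, 8.0, 10.0],  # GENE_A, varies a lot
    "ps3": [7.0, 7.5, 9.0, 6.0],   # hits two genes, so it is dropped
    "ps4": [3.0, 3.0, 3.1, 3.0],   # only probeset for GENE_D
}
probeset_to_genes = {
    "ps1": {"GENE_A"},
    "ps2": {"GENE_A"},
    "ps3": {"GENE_B", "GENE_C"},
    "ps4": {"GENE_D"},
}

selected = select_probesets(expression, probeset_to_genes)
print(selected)
```

Note that the ambiguity test here is on the gene mapping rather than on the number of genomic hits, which follows the point made above. If you do know which probesets are most reliable, you could swap the IQR for a reliability score without changing the rest.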