athPkgBuilder data source: missing probesets
Nianhua Li ▴ 870
@nianhua-li-1606
Last seen 8.3 years ago
Hi Tine, Bjorn, Thomas and other Arabidopsis experts,

Thanks a lot for the feedback. I will get the update done this week if you could help me solve the following problem :P

In TAIR's probe-to-locus mapping file, for example

ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt

some probesets are mapped to more than one locus. However, in the annotation packages ath1121501 and ag, all annotations (e.g. agCHRLOC, agENZYME) are indexed by probeset identifier. This assumes a one-to-one mapping between probeset and gene, so that the annotation of a gene is the annotation of its probeset.

How should the mappings from one probeset to multiple loci be handled? I can think of three possible solutions:

1. pick the "best" locus, but how?
2. mix the annotations of all mapped loci together
3. set to NA

Any suggestions are highly appreciated. Many thanks!

nianhua
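To make options 2 and 3 from the question concrete, here is a minimal Python sketch. It is not how the annotation packages are built; the probeset IDs, locus names, annotation values, and the functions `group_loci` and `annotate` are all hypothetical and stand in for rows of the TAIR mapping file.

```python
from collections import defaultdict

# Hypothetical probeset-to-locus pairs standing in for rows of the TAIR file.
PAIRS = [
    ("ps_1", "locus_a"),
    ("ps_2", "locus_b"),
    ("ps_2", "locus_c"),
]

# Hypothetical per-locus annotation (here, a chromosome label).
LOCUS_ANNOTATION = {"locus_a": "chr1", "locus_b": "chr2", "locus_c": "chr3"}


def group_loci(pairs):
    """Collect the distinct loci reported for each probeset, in input order."""
    loci = defaultdict(list)
    for probeset, locus in pairs:
        if locus not in loci[probeset]:
            loci[probeset].append(locus)
    return dict(loci)


def annotate(pairs, annotation, strategy):
    """Build a probeset-indexed annotation table.

    strategy "merge" keeps the annotations of every mapped locus (option 2);
    strategy "na" sets probesets with several loci to None (option 3).
    """
    if strategy not in ("merge", "na"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    table = {}
    for probeset, loci in group_loci(pairs).items():
        if len(loci) > 1 and strategy == "na":
            table[probeset] = None
        else:
            # Sorted, de-duplicated annotation values of all mapped loci.
            table[probeset] = sorted(
                {annotation[locus] for locus in loci if locus in annotation}
            )
    return table


merged = annotate(PAIRS, LOCUS_ANNOTATION, "merge")
masked = annotate(PAIRS, LOCUS_ANNOTATION, "na")
```

Option 1 is left out because the question itself gives no criterion for a "best" locus. Note that the merged table holds a list per probeset, so any code that expects a single value per probeset would have to handle that.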
Annotation • 834 views
Thomas Girke ★ 1.7k
@thomas-girke-993
Last seen 21 months ago
United States
I would go for the solution that supports a one-to-many relationship for the probe-to-locus mappings. This way there is no information loss.

Thomas
Thomas Girke wrote:
> I would go for the solution that supports a one-to-many relationship
> for the probe-to-locus mappings. This way there is no information loss.

The problem with that approach is that it will break an awful lot of downstream code that assumes these are one-to-one mappings. We really would need a full release cycle (starting in early October) to get such a change to work and to minimize the likely negative effects. We would also like to be sure that there are good reasons for the one-to-many result; it is problematic for other reasons as well.

best wishes
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Fred Hutchinson Cancer Research Center
Tine Casneuf ▴ 80
@tine-casneuf-1773
Last seen 8.3 years ago
But to me it is hard to think of a way to choose the 'best' locus among those detected by a probeset. Also, if you pick one of them, it appears as if only one locus is detected by this probeset, while there are actually more. I presume there are not many cases where a measurement from these probesets is useful, so you might as well annotate them with 'multiple'.

tine
Nianhua,

I guess in this case, I'd support your second solution:

>> 2. mix the annotations to all mapped loci together

In the future it may make a lot of sense to support one-to-many relationships on this level. I bet there are a lot of chips from other organisms where a considerable fraction of probe sets maps to several genes.

Thanks again for doing this!

Best,
Thomas

--
Thomas Girke, Ph.D.
Center for Plant Cell Biology (CEPCEB)
University of California, Riverside
Björn Usadel ▴ 250
@bjorn-usadel-1492
Last seen 8.3 years ago
Hi Nianhua,

as Tine pointed out, I would suggest you choose 'multiple', or you do it the hard way.

For our purposes (MapMan visualization based on classification) I test whether the multiple genes hit by one probeset have a similar function. If this is the case, I mix the annotations, assuming that it might be a [diverged] gene family, in which case there might be some information left when I sample the whole class (Affy used to tag them _s_, but Affy is way outdated). However, if a probeset turns out to hit genes of different classes [not gene families; the ancient _x_ tag] (e.g. glycolysis and, say, proteasome-dependent degradation), I annotate the probeset as "hitting multi" and put it in a special "non-evaluable" class.

You could also try to determine whether it is really a gene family that is hit, in which case the annotations would be similar anyway. But that takes a lot of querying and eventually needs manual intervention.

Thanks for your work.

Cheers,
Björn
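The class-based rule Björn describes could be sketched roughly as follows in Python. This is only an illustration of the idea, not MapMan code: the function `classify_probeset`, the locus names, and the class labels are hypothetical, and treating a locus without a known class as "multi" is an assumption of the sketch.

```python
def classify_probeset(loci, locus_class):
    """Label a probeset by the functional classes of the loci it hits.

    Returns "single" for one locus, "family" when all loci share one known
    class (annotations may then be mixed), and "multi" otherwise (the
    probeset is set aside and not evaluated). A locus without a known class
    forces "multi"; that choice is an assumption of this sketch.
    """
    distinct = set(loci)
    if not distinct:
        raise ValueError("a probeset needs at least one locus")
    if len(distinct) == 1:
        return "single"
    classes = {locus_class.get(locus) for locus in distinct}
    if None in classes or len(classes) > 1:
        return "multi"
    return "family"


# Hypothetical functional classes for three made-up loci.
LOCUS_CLASS = {
    "locus_a": "glycolysis",
    "locus_b": "glycolysis",
    "locus_c": "protein degradation",
}

labels = {
    "ps_1": classify_probeset(["locus_a"], LOCUS_CLASS),
    "ps_2": classify_probeset(["locus_a", "locus_b"], LOCUS_CLASS),
    "ps_3": classify_probeset(["locus_a", "locus_c"], LOCUS_CLASS),
}
```

Here `ps_2` would have its annotations mixed, while `ps_3` would go into the separate class that is excluded from evaluation.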
@justin-borevitz-1002
Last seen 8.3 years ago
Hi,

We have re-annotated the ATH1 probes with the V6 annotation for Arabidopsis. You can find it here:

http://naturalvariation.org/methods/ath1V6anno.RData

In this probe setup, 25-mers with multiple gene matches are excluded. We use probe-level modeling for gene expression estimates.

This is likely more than you wanted, but I thought I would put it out there as my solution to the problem.

Justin Borevitz
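The probe-level filtering described in this reply (excluding 25-mers that match more than one gene) might look roughly like the Python sketch below. It does not read the RData file mentioned above; the function `keep_unique_probes` and all probe and gene names are hypothetical, and dropping probes with no match at all is an assumption of the sketch.

```python
from collections import defaultdict


def keep_unique_probes(probe_hits):
    """Group probes by gene, keeping only probes that match exactly one gene.

    probe_hits maps a probe id to the set of genes its 25-mer matches.
    Probes with several matches are excluded; also dropping probes with no
    match is an assumption of this sketch.
    """
    by_gene = defaultdict(list)
    for probe, genes in probe_hits.items():
        if len(genes) == 1:
            (gene,) = genes
            by_gene[gene].append(probe)
    return dict(by_gene)


# Hypothetical alignment results for four probes.
PROBE_HITS = {
    "probe_1": {"gene_a"},
    "probe_2": {"gene_a", "gene_b"},
    "probe_3": {"gene_b"},
    "probe_4": set(),
}

probesets = keep_unique_probes(PROBE_HITS)
```

With this approach the ambiguity is removed before any probeset-level annotation is built, at the cost of fewer probes per gene.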