Inconsistent annotation of affy probeset on Affymetrix chip for rat: 230.2
@christoph-preuss-2889
Last seen 9.6 years ago
Hi everyone,

We analyzed a global expression microarray data set, using gcrma for the normalization step and limma to find differentially expressed genes. One of the most significantly differentially expressed probesets, "1375535_at", is annotated in the Bioconductor package "rat2302" / "rat2302.db" as:

- Gene Symbol: Lpin1
- Location: Chr 6

We also looked at the Affymetrix web site, where the same probeset is annotated as "Transcribed sequence" on chromosome X.

Affymetrix Annotation RG 230 2.0 Chip:
- ProbeSetID: 1375535_at
- Target Sequence:

>RAT230_2:1375535_AT
gaagttagagagctgtttccccactttacattttaaaatatgtatgccaggatntaatca
ttcctttaagtgtacacttcaaggagagatgtgccgaataagaaaatagctttctctagc
gtgaagggttttgcgtccgccgagttcttaaggtcttttttaagagctactgtgtatgag
tgtgtgtatgtgtgcgcatgcatgttcctgcgactagtcattcattcacatggtgatcag
acaacaatgggagctggttcgtctaccttatcttgtgggtcctggagttcaatctcagat
catcaggctgggcagcaagtgccttcaccctccgagccatcttgccatcccacagctgag
cgtctaatatgacattgccgatga

Interestingly, in a blastn search the given target sequence for the probeset matches only a mouse sequence and not even a rat mRNA.

The question is: which annotation should we trust? Is there any way to validate the probeset annotation? Many thanks in advance for any help.

cheers,
Christoph Preuss
(Leibniz-Institute for Arteriosclerosis Research, University of Muenster, Germany)
Microarray Annotation limma gcrma • 1.3k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 7.7 years ago
United States
Christoph Preuss wrote:
> The question is which annotation should we trust?
> Is there any chance to validate the probeset annotation?

Hi Christoph,

I can only really speak for the Bioconductor annotations, which are generated from public sources together with an initial mapping of each probe or probeset to a public accession (usually a GenBank accession, an Entrez Gene ID or a related type of ID).

"1375535_at" is an Affymetrix probeset, so for this initial mapping we are ultimately at the mercy of Affymetrix to tell us accurately what the probeset is. After that we do the rest ourselves: we join the probeset-to-ID mapping with additional information gathered from public sources (primarily NCBI) to get the rest of the information in the package. The file that you get from Affymetrix may contain a lot of the same data as our packages, but unless they describe it somewhere, I don't think we know for certain where they collected all of their information. The only information that we ever take from them is the initial mapping of their probeset onto a public accession.

I dug up the latest Affymetrix mapping files that we used to generate this package and investigated. In the file that I have (collected in late March), the probeset you listed is indicated to be Lpin1 and to be located on chromosome 6, which agrees completely with the information that we gathered from NCBI and GoldenPath at that time. As of this morning, NCBI still lists this gene as Lipin1, located on chromosome 6. However, the Affymetrix file also has a field right next to that one, called "Alignments", which lists the X chromosome. When I pull up an even more recent file from Affymetrix, I see that they no longer list the location of this gene (the value has been replaced with "---"), and they also no longer list the gene's name or symbol. But they still list chromosome "X" in the alignment field and have even assigned different accessions to this probeset.

So the short answer is that Affymetrix has changed their mind about what they claim this probeset is measuring.

I hope this helps you,
Marc
Hi Christoph,

I would recommend obtaining the sequences of the actual probes that make up this probeset (from NetAffx) and aligning them to the latest genome using BLAT, so that you can convince yourself which mRNA these probes are most likely to detect. I find that aligning the probes often tells you far more than the Affymetrix consensus sequence ever will. Be very concerned if your probes start aligning all over the genome!

cheers,
Mark

On 03/07/2008, at 3:47 AM, Marc Carlson wrote:
> So the short answer is that Affymetrix has changed their mind about
> what they are claiming this probeset is measuring.
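The probe-level check described above can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions, not a substitute for BLAT: it counts exact matches of each probe on both strands of a handful of candidate sequences, and it would be far too slow for whole chromosomes. The function names (`revcomp`, `count_hits`, `hit_table`) and the toy sequences are made up for the example; they are not the real probes of "1375535_at".

```r
# Reverse-complement a DNA string (N is left as N).
revcomp <- function(s) {
  comp <- chartr("ACGTNacgtn", "TGCANtgcan", s)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

# Count exact occurrences of `probe` in `target`, on either strand.
count_hits <- function(probe, target) {
  probe  <- toupper(probe)
  target <- toupper(target)
  n_occ <- function(p) {
    k <- nchar(p)
    n <- nchar(target)
    if (k == 0 || k > n) return(0L)
    starts <- seq_len(n - k + 1L)
    # compare every window of length k against the probe
    sum(substring(target, starts, starts + k - 1L) == p)
  }
  rc <- revcomp(probe)
  # a palindromic probe would otherwise be counted twice
  if (rc == probe) n_occ(probe) else n_occ(probe) + n_occ(rc)
}

# Probes x targets matrix of hit counts.
hit_table <- function(probes, targets) {
  hits <- matrix(0L, nrow = length(probes), ncol = length(targets),
                 dimnames = list(names(probes), names(targets)))
  for (i in seq_along(probes)) {
    for (j in seq_along(targets)) {
      hits[i, j] <- count_hits(probes[[i]], targets[[j]])
    }
  }
  hits
}

# Toy data (hypothetical sequences, for illustration only).
targets <- c(locusA = "TTGACCGTAGGCTAAC", locusB = "AACCTACGGTCTT")
probes  <- c(p1 = "GACCGTAGG", p2 = "GCTAAC")

hits <- hit_table(probes, targets)
print(hits)

# Probes that hit more than one candidate locus deserve a closer look.
print(names(which(rowSums(hits > 0) > 1)))
```

In this toy case `p1` matches one locus on the forward strand and the other on the reverse strand, which is the "aligning in more than one place" situation to watch for.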
Hi,

It is actually a bit simpler than Mark has suggested.

biocLite("rat2302probe")

will get the probe sequences used (at least as reported in late March, but they should not change). Then you could ask Herve to build a BSgenome package for rat and use Biostrings to do the matching, or save the probes and use BLAT or any other string matcher (MAQ).

best wishes
Robert

Mark Cowley wrote:
> I would recommend obtaining the sequences of the actual probes that make
> up this probeset (from NetAffx), then align them to the latest genome
> using BLAT, thereby you can convince yourself which mRNA that these
> probes will be most likely to detect.

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
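As a rough sketch of the "save the probes and use BLAT" route: the helper below turns a probe table into FASTA lines for one probeset. It assumes the column layout that Bioconductor probe packages such as rat2302probe provide (`Probe.Set.Name`, `sequence`, `x`, `y`); the helper name `probes_to_fasta` and the output file name are hypothetical. The part that touches the real package is guarded, so the code still runs when rat2302probe is not installed. Note that current Bioconductor releases install packages with `BiocManager::install("rat2302probe")` rather than `biocLite()`.

```r
# Build FASTA lines (header, sequence, header, sequence, ...) for the
# probes of one probeset. Each header encodes the probeset ID and the
# probe's x/y position on the array.
probes_to_fasta <- function(probe_table, probeset_id) {
  sel <- probe_table[probe_table$Probe.Set.Name == probeset_id, , drop = FALSE]
  if (nrow(sel) == 0) return(character(0))
  headers <- sprintf(">%s_x%d_y%d", sel$Probe.Set.Name, sel$x, sel$y)
  # rbind() + as.vector() interleaves headers and sequences
  as.vector(rbind(headers, as.character(sel$sequence)))
}

# Only runs if the probe package is available.
if (requireNamespace("rat2302probe", quietly = TRUE)) {
  data("rat2302probe", package = "rat2302probe")
  fasta <- probes_to_fasta(as.data.frame(rat2302probe), "1375535_at")
  out_file <- "1375535_at_probes.fa"   # hypothetical file name
  writeLines(fasta, out_file)
  message(length(fasta) / 2, " probes written to ", out_file)
}
```

The resulting file can be pasted into a BLAT web form or passed to any other aligner; aligning the individual 25-mers rather than the consensus target sequence is the point of the exercise.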
@james-w-macdonald-5106
Last seen 3 hours ago
United States
Hi Christoph,

Christoph Preuss wrote:
> Interestingly, the given target sequence for the probeset matches only
> a mouse sequence and not even a rat mRNA (blastn search).

Using BLAT, I get a near-100% hit for LOC680227 on the X chromosome:

http://genome.brc.mcw.edu/cgi-bin/hgTracks?hgsid=1313763&hgt.out3=10x&position=chrX%3A91440116-91478515&hgtgroup_map_close=0&hgtgroup_phenDis_close=0&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0

Note that we are just packaging existing annotation data in an easy-to-use format (using Affy's own probeset ID to Entrez Gene mapping, or the probeset ID to UniGene mapping if no Entrez Gene ID is present). It is not uncommon for different annotation databases to disagree. Since we are not purveyors of annotation data, but are just passing on existing data, it is always in your best interest to check for consistency.

Best,
Jim
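A consistency check of this kind can be roughed out in base R by merging two annotation tables on the probeset ID and flagging fields that disagree. Everything here is an illustrative assumption: the function name `compare_annotations`, the column names, and the second probeset "example_at" are made up. Only the values for "1375535_at" follow what is reported in this thread (Lpin1 on chromosome 6 in the Bioconductor package; no symbol and chromosome X in the newer Affymetrix file).

```r
# Merge two annotation tables and add one <field>_agree column per field.
# A field counts as agreeing only if both sources have a value and the
# values are identical; a missing value on either side is a disagreement.
compare_annotations <- function(a, b, by = "probeset",
                                fields = c("symbol", "chr")) {
  m <- merge(a, b, by = by, suffixes = c(".a", ".b"))
  for (f in fields) {
    va <- as.character(m[[paste0(f, ".a")]])
    vb <- as.character(m[[paste0(f, ".b")]])
    m[[paste0(f, "_agree")]] <- !is.na(va) & !is.na(vb) & va == vb
  }
  m
}

# Source A: values as reported for the Bioconductor package.
bioc <- data.frame(probeset = c("1375535_at", "example_at"),
                   symbol   = c("Lpin1", "GeneA"),
                   chr      = c("6", "1"),
                   stringsAsFactors = FALSE)

# Source B: values as reported for the newer Affymetrix file
# ("---" for the symbol is represented as NA).
affy <- data.frame(probeset = c("1375535_at", "example_at"),
                   symbol   = c(NA, "GeneA"),
                   chr      = c("X", "1"),
                   stringsAsFactors = FALSE)

res <- compare_annotations(bioc, affy)

# Probesets where any field disagrees between the two sources.
print(res[!(res$symbol_agree & res$chr_agree), ])
```

Probesets flagged this way are the ones worth re-aligning at the probe level before trusting either annotation.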