Filter out pombe probeset from cerevisiae probesets for yeast2 Affymetrix chip
Guiyuan Lei ▴ 90
@guiyuan-lei-2506
Last seen 9.6 years ago
Hi Jim,

Thanks for the suggestion. To get gene names/symbols for as many cerevisiae probesets as possible, I downloaded the Yeast2 annotation from Affymetrix:
http://www.affymetrix.com/Auth/analysis/downloads/na24/ivt/Yeast_2.na24.annot.csv.zip

First, I found that Bioconductor has names for more cerevisiae probesets than Affymetrix does. In Yeast2GENENAME (from Bioconductor), 4640 of the 5900 probesets have gene names (after filtering out the 5028 pombe probesets listed in the mask file s_cerevisiae.msk), whereas only 4557 of the 5900 probesets have gene symbols in Yeast_2.na24.annot.csv (again after filtering out the 5028 probesets labeled with species "pombe" in that file). The Yeast_2.na24.annot.csv I used is the latest version, updated in November 2007. How can Affymetrix have less information than a third party like Bioconductor?

Second, the s_pombe.zip file from the following Affymetrix page is NOT consistent with Affymetrix's own annotation file (Yeast_2.na24.annot.csv, mentioned above):
http://www.affymetrix.com/Auth/support/downloads/mask_files/s_pombe.zip
5814 probesets are labeled "cerevisiae" in Yeast_2.na24.annot.csv, so I would expect at least 5814 probesets in s_pombe.msk in order to mask the cerevisiae probesets, but there are only 5749. In addition, the probeset "177968_at" is not among the 10928 probesets on the Yeast2 chip, yet it is in s_pombe.msk!

Best regards,
Guiyuan

On Nov 29, 2007 4:21 PM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
> Hi Guiyuan,
>
> Guiyuan Lei wrote:
> > Hi Jim,
> >
> > Many thanks. I have checked the s_pombe.msk and s_cerevisiae.msk files. The overlap between pombe and cerevisiae consists of the probesets with the prefixes "AFFX" and "RPTR". One strange thing is that a probeset called "177968_at" is in s_pombe.msk but is NOT among the 10928 probesets! So the overlap is 152 probesets.
> >
> > One more question: in Yeast2GENENAME, many probesets have only an ID and no gene name (it is "NA"). Is it possible to get a gene name/symbol for all 10928 probesets?
>
> You might check either netaffx or biomaRt, but if there are no gene names for certain probesets in the annotation package that usually indicates that the probesets in question interrogate things that have yet to be named (e.g., ESTs, inferred genes, etc).
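The cross-check described in this post (which probesets appear in both mask files, and which mask entries are not on the chip at all) comes down to a few set operations. Below is a minimal Python sketch of that idea. It assumes the probeset IDs have already been extracted into plain lists; parsing the .msk files themselves is not shown, and every ID except "177968_at" is made up for illustration.

```python
def compare_probesets(chip_ids, cerevisiae_mask_ids, pombe_mask_ids):
    """Cross-check two mask lists against the probesets actually on the chip."""
    chip = set(chip_ids)
    cer_mask = set(cerevisiae_mask_ids)
    pom_mask = set(pombe_mask_ids)
    listed = cer_mask | pom_mask
    return {
        # IDs present in both masks (the post reports AFFX/RPTR probesets here)
        "in_both_masks": sorted(cer_mask & pom_mask),
        # IDs that a mask mentions but the chip does not contain
        "not_on_chip": sorted(listed - chip),
        # chip probesets that neither mask mentions
        "in_neither_mask": sorted(chip - listed),
    }


# Made-up IDs for illustration; only "177968_at" is taken from the post.
chip = ["AFFX-demo_at", "RPTR-demo_at", "cer_demo_at", "pom_demo_at"]
# As described in the post, s_cerevisiae.msk lists pombe probesets and vice versa.
cerevisiae_mask = ["AFFX-demo_at", "RPTR-demo_at", "pom_demo_at"]
pombe_mask = ["AFFX-demo_at", "RPTR-demo_at", "cer_demo_at", "177968_at"]

report = compare_probesets(chip, cerevisiae_mask, pombe_mask)
print(report["in_both_masks"])  # ['AFFX-demo_at', 'RPTR-demo_at']
print(report["not_on_chip"])    # ['177968_at']
```

With the real lists, the length of `in_both_masks` should correspond to the overlap count reported above, and `not_on_chip` would surface stray entries such as "177968_at".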
Annotation yeast2 biomaRt • 1.0k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 7.7 years ago
United States
Hi guys,

It is definitely possible for our annotations to have more information about a particular field (like a gene symbol) than Affymetrix has. This is because we don't just repackage our annotation information directly from Affymetrix. Instead, we gather ONLY ID assignments from Affymetrix: things like Entrez Gene IDs, GenBank accessions, etc. Our annotation pipeline collects one appropriate gene-based ID for each probeset from Affymetrix, and this is meant to be basic information ONLY about precisely which gene a particular probe is designed to measure. This minimal information is the only piece of data that we gather from the Affymetrix annotation files.

Then we take that ID information to other repositories like NCBI and use it to get related information such as gene symbols. I can't tell you exactly what process Affymetrix uses to create its annotations, but given the large number of reasonable choices they could make, it seems pretty likely that they do something slightly different from what we do.

Marc
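To illustrate why a two-step lookup can yield more named probesets than the vendor file, here is a small Python sketch. It is not the actual Bioconductor annotation pipeline. All IDs, symbols, and function names are hypothetical, and it only shows the general idea of taking the ID assignment from one source and resolving the symbol in another.

```python
def symbols_via_ids(probe_to_gene_id, gene_id_to_symbol):
    """Look up a symbol for each probeset through its gene ID (None if unknown)."""
    return {
        probe: gene_id_to_symbol.get(gene_id)
        for probe, gene_id in probe_to_gene_id.items()
    }


def count_named(probe_to_symbol):
    """Count probesets that ended up with a symbol."""
    return sum(1 for symbol in probe_to_symbol.values() if symbol is not None)


# Hypothetical data: none of these IDs or symbols are real.
vendor_ids = {"p1_at": "G1", "p2_at": "G2", "p3_at": None}  # ID assignments only
vendor_symbols = {"p1_at": "SYM1", "p2_at": None, "p3_at": None}
repository = {"G1": "SYM1", "G2": "SYM2"}  # stands in for an external gene table

pipeline_symbols = symbols_via_ids(vendor_ids, repository)
print(count_named(pipeline_symbols), count_named(vendor_symbols))  # 2 1
```

In this toy case the external table knows a symbol for G2 that the vendor's own symbol column lacks, so the two-step route names one more probeset. A probeset with no gene ID stays unnamed either way.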
Hi,

Thanks, Marc, for pointing out that Bioconductor annotations can have more information about a particular field (like a gene symbol) than Affymetrix.

As for the mask files, Affymetrix support said:

"Only the csv annotation file is updated whereas the mask files are not. Therefore, I would suggest you to create your own mask files that will allow you to see only the data related to the strain of interest. Please see the GCOS manual page 665 for details in masks, the mask is a file containing the list of probesetIDs of interest in a txt format:
http://www.affymetrix.com/Auth/support/downloads/manuals/gcos_manual.zip
The probesetID can be downloaded using the annotation file
http://www.affymetrix.com/Auth/analysis/downloads/na24/ivt/Yeast_2.na24.annot.csv.zip"

Cheers,
Guiyuan
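Following the support suggestion, a probeset ID list for one species could be pulled from the annotation CSV roughly as in the Python sketch below. Several things here are assumptions rather than confirmed facts: the column names "Probe Set ID" and "Species Scientific Name", the leading "#" metadata lines, and the one-ID-per-line output. The exact layout GCOS expects for a mask file, and whether it should list the probesets to keep or the ones to hide, should be checked in the GCOS manual.

```python
import csv
import io


def probesets_for_species(csv_lines, species_keyword,
                          id_col="Probe Set ID",
                          species_col="Species Scientific Name"):
    """Return probeset IDs whose species field contains species_keyword."""
    # Skip '#' metadata lines (assumed) so the header row comes first.
    data_lines = (line for line in csv_lines if not line.startswith("#"))
    keyword = species_keyword.lower()
    return [row[id_col] for row in csv.DictReader(data_lines)
            if keyword in row[species_col].lower()]


def write_id_list(handle, probeset_ids):
    """Write one probeset ID per line to an open text handle."""
    for probeset_id in probeset_ids:
        handle.write(probeset_id + "\n")


# Tiny made-up sample standing in for Yeast_2.na24.annot.csv.
sample = [
    "#%demo-header=not real Affymetrix metadata",
    '"Probe Set ID","Species Scientific Name","Gene Symbol"',
    '"demo_1_at","Saccharomyces cerevisiae","SYM1"',
    '"demo_2_at","Schizosaccharomyces pombe","---"',
    '"demo_3_at","Saccharomyces cerevisiae","---"',
]

cerevisiae_ids = probesets_for_species(sample, "cerevisiae")
buffer = io.StringIO()
write_id_list(buffer, cerevisiae_ids)
print(buffer.getvalue(), end="")
```

For the real file, an open file handle (for example `open(path, newline="")`) can be passed in place of `sample`, and a file opened for writing in place of `buffer`.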