Hi all,

we found that about 8-9% of all probesets in the hgu133a package have no
information on chromosomal location (i.e. base pairs from the telomere).
However, other public databases such as Golden Path offer start and end
positions for each probeset on the array. What is the reason for this
discrepancy?

Which paths does AnnBuilder use to map probeset IDs to chromosomal
locations?

Would it be safe to use chromosomal locations obtained from other sources
directly instead of relying on the hgu133a package?

Regards,
Hilmar
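
To make the 8-9% figure above concrete, here is a minimal sketch of how such a missing fraction could be counted. It assumes the probeset-to-location mapping has already been exported into a plain Python dictionary; the dictionary `chrloc`, its probe IDs, and its positions are hypothetical stand-ins, not data from the hgu133a package.

```python
# Hypothetical stand-in for a probeset -> chromosomal location mapping.
# None marks a probeset without a location; all IDs and positions are made up.
chrloc = {
    "probe_a": 1234567,
    "probe_b": None,
    "probe_c": 7654321,
    "probe_d": 424242,
}


def missing_locations(mapping):
    """Return the sorted probeset IDs whose location is missing."""
    return sorted(pid for pid, loc in mapping.items() if loc is None)


def missing_fraction(mapping):
    """Return the share of probesets without a location (0.0 if empty)."""
    if not mapping:
        return 0.0
    return len(missing_locations(mapping)) / len(mapping)


print(missing_locations(chrloc))
print(f"{missing_fraction(chrloc):.1%} of probesets lack a location")
```

Running the same count on an export from a second source (for example Golden Path) would show whether the gap is specific to the annotation package.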
AnnBuilder does use Golden Path for the chromosomal locations of genes, but
please remember that the annotation packages were built a few months ago, so
it is not unusual for people to find discrepancies. One solution is to build
the annotation package yourself using AnnBuilder.
>From: Hilmar Berger <hilmar.berger at imise.uni-leipzig.de>
>Date: Thu, 13 Jul 2006 11:54:45 +0000 (UTC)
>Subject: [BioC] Missing chromosomal locations in hgu133a package
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
Hi Jianhua,

thanks for your answer. I am aware that I could create annotation packages
on my own using AnnBuilder. However, the comparison I mentioned was between
GoldenPath data and annotation packages of about the same date (mid 2005),
and even current annotation packages show this discrepancy in the number of
chromosomal locations provided in hgu133a. So I wondered whether there is a
good reason not to provide chromosomal locations for some probesets (e.g.
unreliable mappings in GoldenPath), or whether this is due to the particular
paths AnnBuilder uses to merge all sources.

Regards,
Hilmar
AnnBuilder does not make any decision about which probes get mapped and
which do not. I will have to step through the mapping process to tell you
what the problem might be. Could you again send me a few of the probes that
are missing?

Thanks.
Jianhua Zhang
There may be two reasons.

1. AnnBuilder uses RefLink.txt and RefGene.txt from Golden Path for
   annotation. In a test run, location data were missing from the file for
   some of the genes/probes (e.g. 8847/216856_s_at). I need to talk to
   Golden Path about this.
2. The base file used to build the annotation package may not be in sync
   with the current probe-GB mapping provided by Affymetrix. Nianhua, do you
   know if a fresh download from Affymetrix was used for release 1.8?
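
The first reason can be illustrated with a small sketch that follows one probe through a chain of lookup tables and reports the first link that is missing. This is only an illustration under stated assumptions, not AnnBuilder's actual code: it assumes a link table that relates genes to transcripts and a second table that carries coordinates. The function name `trace_probe`, the three toy dictionaries, and every identifier except 216856_s_at and 8847 are hypothetical.

```python
# Toy lookup tables. Only 216856_s_at / 8847 come from the thread;
# every other identifier and all coordinates are made up.
probe_to_gene = {"216856_s_at": "8847", "probe_x": "gene_x"}
gene_to_transcript = {"8847": "transcript_1", "gene_x": "transcript_2"}
# transcript_1 is deliberately absent to mimic a missing location record.
transcript_to_location = {"transcript_2": ("chrN", 100, 200)}


def trace_probe(probe_id, probe_gene, gene_transcript, transcript_location):
    """Follow probe -> gene -> transcript -> location.

    Returns (status, location), where status names the first missing link
    or is "ok" when the whole chain resolves.
    """
    gene = probe_gene.get(probe_id)
    if gene is None:
        return ("no gene for probe", None)
    transcript = gene_transcript.get(gene)
    if transcript is None:
        return ("no transcript for gene", None)
    location = transcript_location.get(transcript)
    if location is None:
        return ("no location for transcript", None)
    return ("ok", location)


for probe in ("216856_s_at", "probe_x", "probe_y"):
    status = trace_probe(
        probe, probe_to_gene, gene_to_transcript, transcript_to_location
    )
    print(probe, status)
```

Grouping the missing probesets by the returned status would show whether most of the gap comes from the location file or from an outdated probe-to-gene base file, which are the two reasons listed above.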
>
>--
>
>Hilmar Berger
>Studienkoordinator
>Institut für medizinische Informatik, Statistik und Epidemiologie
>Universität Leipzig
>Härtelstr. 16-18
>D-04107 Leipzig
>
>Tel. +49 341 97 16 101
>Fax. +49 341 97 16 109
>email: hilmar.berger at imise.uni-leipzig.de
Jianhua Zhang
> 2. The base file used to build the annotation package may not be in sync
> with the current probe-GB mapping provided by Affymetrix. Nianhua, do you
> know if a fresh download from Affymetrix was used for release 1.8?
>
The probe-GB mapping from Affymetrix that was used for release 1.8 was dated
Dec. 19, 2005. Six files were used to get a unified probe to Entrez Gene
mapping:
1. probe-GB mapping from Affymetrix
2. probe-UniGene mapping from Affymetrix
3. probe-Entrez Gene mapping from Affymetrix
4. probe-Entrez Gene mapping from DCHIP
5. probe-Entrez Gene mapping from U. Michigan
6. probe-Entrez Gene mapping from another source (EA)
Source 1 is the primary source, and 2-6 are supplemental sources. Sources 1-3
are all dated Dec. 19, 2005. Sources 4-6 are fairly old files (more than a
year old).
thanks
nianhua
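
One simple way to combine a primary source with supplemental ones is a priority merge, sketched below. This is an assumption made for illustration and not necessarily how AnnBuilder unifies its inputs: it presumes each source has already been resolved to a probe to Entrez Gene dictionary, and the function name `unify` and all probe and gene IDs are hypothetical.

```python
def unify(sources):
    """Merge probe -> gene mappings given in order of decreasing trust.

    The first source that supplies a non-missing gene for a probe wins;
    later sources only fill probes that are still unmapped.
    """
    unified = {}
    for source in sources:
        for probe, gene in source.items():
            if gene is not None and probe not in unified:
                unified[probe] = gene
    return unified


# Made-up example data: one primary source and two supplemental sources.
primary = {"probe_1": "gene_A", "probe_2": None}
supplemental_1 = {"probe_2": "gene_B", "probe_3": "gene_C"}
supplemental_2 = {"probe_1": "gene_Z", "probe_3": "gene_Y", "probe_4": "gene_D"}

merged = unify([primary, supplemental_1, supplemental_2])
for probe in sorted(merged):
    print(probe, merged[probe])
```

With a merge like this, a probe that only an old supplemental file can map inherits whatever was current when that file was made, which is one way stale inputs could leave probes without an up-to-date gene and hence without a location.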