hs133phsentrezg metadata

De Bondt, An-7114 [PRDBE] ▴ 190

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061012/b9681ae8/attachment.pl

ADD COMMENT • link updated 17.5 years ago by Nianhua Li ▴ 870 • written 17.6 years ago by De Bondt, An-7114 [PRDBE] ▴ 190

James W. MacDonald 65k

De Bondt, An-7114 [PRDBE] wrote:
> Dear useR,
>
> The 'hs133phsentrezg' metadata have only 'hs133phsentrezgGENENAME' mapping
> info. The 'hgu133plus2' metadata has also 'hgu133plus2CHRLOC' info (besides
> lots of other info). How can I find 'hs133phsentrezgCHRLOC' info?
>
> Thanks in advance,
> An De Bondt

I hadn't realized how sparse the information in these annotation packages really is. I think your best bet is to use biomaRt to get the annotation you want.

Something like

    > mart <- useMart("ensembl","hsapiens_gene_ensembl")
    Checking attributes and filters ... ok
    > a <- getBM("chromosome_location", "entrezgene", sub("_at", "", ls(hs133phsentrezgGENENAME)[1:10]), mart=mart, output="list")
    > sapply(a[[1]], length)
        1    10   100  1000 10000 10001 10002 10003 10004 10005
       62   157   233  1457   907   105    92   371    80   123
    > a[[1]][[1]]
     [1] 63544227 63545175 63546378 63546412 63547557
     [6] 63547599 63547672 63548610 63548624 63548943
    [11] 63549372 63549373 63549374 63549543 63550044
    [16] 63550148 63556679 63556692 63556702 63556866
    [21] 63556880 63556894 63556903 63557399 63557422
    [26] 63557669 63558246 63558375 63559327 63559846
    [31] 63560064 63560292 63560992 63561327 63561328
    [36] 63561647 63561650 63550747 63553556 63556291
    [41] 63556303 63550430 63550488 63550864 63550878
    [46] 63551634 63552081 63552199 63552253 63552624
    [51] 63552827 63553507 63554072 63554973 63554974
    [56] 63554975 63554981 63554982 63554984 63555253
    [61] 63555261 63555962

Should do the trick.

HTH,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

ADD COMMENT • link 17.6 years ago James W. MacDonald 65k
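For readers who want one start/end pair per probeset rather than a list of positions, here is a minimal base-R sketch of the two steps in the answer above: stripping the "_at" suffix to recover Entrez Gene IDs, and collapsing each gene's positions to a range. It does not call biomaRt. The `positions` list is a stand-in for the query result; the entries for gene "1" are a few of the positions printed above, while the probeset names and the values for "10" and "100" are made up for illustration.

```r
# Hypothetical probeset IDs in the style of the hs133phsentrezg package
probesets <- c("1_at", "10_at", "100_at")

# Drop the trailing "_at" to recover the Entrez Gene IDs used as query values
entrez <- sub("_at$", "", probesets)

# Stand-in for the per-gene position list returned by the query.
# Gene "1": a few positions from the thread; "10" and "100": made-up values.
positions <- list(
  "1"   = c(63544227, 63545175, 63561650, 63555962),
  "10"  = c(18293000, 18303000),
  "100" = c(42681000, 42714000)
)

# Collapse each gene's positions to a single start/end pair
loc <- data.frame(
  probeset = probesets,
  entrez   = entrez,
  start    = unname(vapply(positions[entrez], min, numeric(1))),
  end      = unname(vapply(positions[entrez], max, numeric(1))),
  stringsAsFactors = FALSE
)
print(loc)
```

Taking the minimum and maximum is only one possible summary; it assumes all positions for a gene lie on the same chromosome.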

James W. MacDonald 65k

Hi An,

You should not respond just to me. The goal is to keep these conversations on the list so others can benefit as well.

De Bondt, An-7114 [PRDBE] wrote:
> Dear Jim,
>
> Indeed, this is the info I was looking for, thanks!
> Could you also give me guidance on how I can get this CHRLOC info into a
> metadata package like e.g. hs133phsentrezg? I guess I would have to create
> a .CDF file first but I do not know how this file needs to be set up...
> Probably a tab delimited file with as many rows as gene identifiers on
> the chip and with the following columns:
> gene identifiers on the chip
> gene name
> chromosome
> chromosome_start of the identifier on the chip
> chromosome_end of the identifier on the chip
>
> Is this right or should I post this on the mailing list?
>
> Thanks,
> An

Well, trying to reverse-engineer a metaData package is probably more trouble than it is worth. Why exactly do you need this data to be in a package? The rationale for the metaData packages is to supply end users with a single package that has a relatively simple interface to the data, but once you have the data in your working environment, it is there for you to use.

Anyway, if you really want the data in an annotation package, you can use AnnBuilder to make one yourself. There are a couple of vignettes in that package that show how to do things, and if you have problems, there are plenty of threads on the list that you can search for common answers.

I guess the only compelling reason I can think of for wanting a package is if the goal is to use annaffy to output annotated tables with your data. Is this the case? If so, you can do the same sort of thing using biomaRt and htmlpage() in the annotate package. There is a vignette in biomaRt that shows how to do that. I have also written some functions for affycoretools that automate the process, but they currently don't include the chromosomal location, mainly because I don't find that information very useful for, say, an HTML table. However, if there is interest, I am willing to add that capability.

Best,

Jim

ADD COMMENT • link 17.5 years ago James W. MacDonald 65k

De Bondt, An-7114 [PRDBE] ▴ 190

Dear Jim,

The reason I need the info in a package is that I want to use buildMACAT() from the macat package. This function needs as input a data file, the associated biological information per sample, and the chip data package.

Thanks for your suggestions,
An

ADD COMMENT • link 17.5 years ago De Bondt, An-7114 [PRDBE] ▴ 190

Manhong Dai ▴ 200

Hi An,

Our custom CDF annotation package has only the gene name for each probeset because we designed it this way.

A probeset's probes can match different locations or chromosomes, and some probes have no match on the genome at all, but they belong to the probeset because they all perfectly match the gene's sequence. So it is difficult to assign a single genome location to the probeset.

But we do have Map/Group files for the probes' genome locations. They show that the probes of most probesets have adjacent genome locations, but some don't. Those files are at
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download_v8.asp
if you are using version 8 of the custom CDF.

For more detail, please google 'custom cdf' or just drop me a message.

Best,
Manhong Dai

ADD COMMENT • link 17.5 years ago Manhong Dai ▴ 200

Hi Manhong,

Manhong Dai wrote:
> Hi An,
>
> Our custom CDF annotation package has only gene name for each probeset
> because we designed it this way.
>
> A probeset's probes could have matches on different location or
> chromosomes, even some probes have no match on genome at all, but they
> belong to this probeset because they all have perfect match on the
> gene's sequence.

This doesn't make sense to me. How can a probe not match to the genome, yet have a perfect match to a gene's sequence?

I was also under the impression that the matching for the probes that remain in an MBNI cdf was first done to the genome, and those probes that didn't blast to the genome were discarded. From

http://brainarray.mhri.med.umich.edu/Brainarray/Database/CustomCDF/cdfreadme.htm

I get:

A probe must only hit one UniGene cluster and one genomic location

A probe must hit only one genomic location

Does this mean a probe that hits < 1 genomic location will be included? I assumed this meant a probe had to hit exactly one location.

Best,

Jim

ADD REPLY • link 17.5 years ago James W. MacDonald 65k

Hi Jim,

In our custom CDF, some probes with hits < 1 are used. For example, when a probe hits one allele of a SNP, and the SNP's other allele has exactly one match (hits = 1) with the genome, then although the probe itself has no hit with the genome at all, we use this probe and its genome location as a candidate for all custom CDFs. The portion of such probes is small.

Our UG and ENTREZG custom CDFs do have a rule that each probe must hit only one genome location and one UG cluster.

But in the REFSEQ custom CDF, when a probe matches a REFSEQ sequence but has no match to the genome at all, the probe is still used because REFSEQ is more reliable than the genome.

For example, probe 4 of
http://arrayanalysis.mbni.med.umich.edu/ps/ps_pb.jsp?p=NM_000019_at&c=Hs133P_Hs_REFSEQ_8
has no match to genome.

Best,
Manhong Dai

ADD REPLY • link 17.5 years ago Manhong Dai ▴ 200

Hi Manhong,

OK, I understand that part. However, for most of the annotation data (including the chromosomal location), what is normally supplied is the information at the gene level, rather than the probe level. I guess one could argue that knowing where exactly the probesets are supposed to bind might be of interest, but the annotation packages are intended to annotate probesets to genes.

While it is true that some of the probes might bind to different parts of the genome, this can be handled by supplying multiple locations. For instance, in the hgu133plus2 package we have:

    > get("1007_s_at", hgu133plus2CHRLOC)
    6_qbl_hap2          6 6_cox_hap1 6_qbl_hap2 6_cox_hap1
       2098794   30959839    2300465    2099260    2300931
             6 6_cox_hap1          6 6_qbl_hap2
      30960305    2305069   30964443    2103398

Best,

Jim

ADD REPLY • link 17.5 years ago James W. MacDonald 65k
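The named vector returned for "1007_s_at" mixes entries from chromosome 6 and two haplotype assemblies, so it can be easier to read once regrouped by name. The following is a base-R sketch only: the vector is typed in by hand from the output quoted above (the pairing of names and values is read off the wrapped printout, which is an assumption), and nothing here depends on the hgu133plus2 package being installed.

```r
# Named vector as printed for "1007_s_at": names are chromosomes or
# haplotype assemblies, values are positions (transcribed by hand)
chrloc <- c(
  "6_qbl_hap2" = 2098794, "6" = 30959839, "6_cox_hap1" = 2300465,
  "6_qbl_hap2" = 2099260, "6_cox_hap1" = 2300931, "6" = 30960305,
  "6_cox_hap1" = 2305069, "6" = 30964443, "6_qbl_hap2" = 2103398
)

# Group the positions by the name they carry
by_chr <- split(unname(chrloc), names(chrloc))

# Smallest and largest position per group, one row per chromosome/haplotype
ranges <- t(sapply(by_chr, range))
colnames(ranges) <- c("min", "max")
print(ranges)
```

Each group here holds three positions, which suggests the same three locations are reported once per assembly.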

Hi Jim,

It is always inaccurate to assign one or multiple chromosome locations to each probeset, for either Affy's original CDF or our custom CDFs.

We know a probeset's genome location should be calculated from its probes' genome locations, and it should be a range, or in a small percentage of cases multiple ranges. However, it is hard to design a universal criterion for partitioning the probes' locations, as it depends on how wide one decides a range should be.

Probeset '1007_s_at' shows exactly this problem. It seems the result from get("1007_s_at", hgu133plus2CHRLOC) only shows locations, instead of ranges. In addition, the locations shown in the current annotation file are likely to be based on an older version of the genome assembly. One could argue that this result should be the genome location of probes, instead of the probeset, but this probeset has 16 probes, each of which has a genome location according to our alignment result
http://arrayanalysis.mbni.med.umich.edu/ps/ps_pb.jsp?p=1007_s_at&c=Hs133P_AFFY_ORIGINAL

So we could add CHRLOC to the annotation package in our next release of the custom CDF, but it would only indicate the location of individual probes having a genomic sequence match, instead of the genomic span of probesets.

Best,
Manhong Dai

ADD REPLY • link 17.5 years ago Manhong Dai ▴ 200
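The point that the partition "depends on how wide one defines a range should be" can be illustrated with a small base-R sketch. This is not the method used for the custom CDFs; it is one simple gap-based rule, with a hypothetical helper `group_into_ranges()` and made-up probe positions that are assumed to lie on a single chromosome.

```r
# Hypothetical helper: split positions into ranges wherever the gap between
# neighbouring positions exceeds max_gap (all positions on one chromosome)
group_into_ranges <- function(pos, max_gap) {
  if (length(pos) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0), n = integer(0)))
  }
  pos <- sort(pos)
  # A new group starts at the first position and after every large gap
  grp <- cumsum(c(TRUE, diff(pos) > max_gap))
  data.frame(
    start = as.numeric(tapply(pos, grp, min)),
    end   = as.numeric(tapply(pos, grp, max)),
    n     = as.integer(tapply(pos, grp, length))
  )
}

# Made-up probe positions for one probeset
probe_pos <- c(30959839, 30960305, 30964443, 30960100, 2098794)

narrow <- group_into_ranges(probe_pos, max_gap = 1000)
wide   <- group_into_ranges(probe_pos, max_gap = 10000)
print(narrow)
print(wide)
```

The same five positions yield a different number of ranges under the two thresholds, which is the ambiguity described in the post above.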

Nianhua Li ▴ 870

> We know a probeset's genome location should be calculated from its
> probes' genome location, and it should be a range or multiple ranges
> under a small percentage of situations. However, it is hard to design a
> universal criteria to partition probes' location, as it depends on how
> wide one defines a range should be.
>
> Probe '1007_s_at' exactly shows this problem. it seems the result from
> get("1007_s_at", hgu133plus2CHRLOC) only shows locations, instead of
> ranges. In addition, the location shows in the current annotation file
> is likely to be based on older version of genome assembly.

Hi, ManHong,

Just to clarify: the basic assumption of hgu133plus2 is that each probeset maps to one gene. Based on this assumption (even though it may not be true in reality), the annotations of the genes are used to annotate the probesets. So get("1007_s_at", hgu133plus2CHRLOC) returns the *transcription start position* of Entrez Gene 780 (discoidin domain receptor family, member 1, DDR1), not the genome location of "1007_s_at". The information was obtained from the UCSC Genome Browser this August. I think it is hg18.

thanks
nianhua

ADD COMMENT • link 17.5 years ago Nianhua Li ▴ 870
