Question

Annotation Tools package


Yingfang Tian ▴ 20

@yingfang-tian-3615

Last seen 11.5 years ago

Dear Dr Kuhn,

We are Yingfang Tian and Brad Ander from the University of California at Davis. We are working on a cross-platform analysis of three platforms: Illumina Human Ref-8, Affymetrix U133 Plus 2 array, and Affymetrix Human Exon array.

We are trying to use your annotationTools package in R and are able to at least translate probes from U133 arrays to Illumina, similar to the example you give in the BMC Bioinformatics paper. We were wondering how to import a large number of probesets, or all of them, into the myPS object. It may be a basic R command, but unfortunately we rely on commercial software for the majority of our analyses and have limited experience with R (hopefully that can change on both fronts).

From the paper, it seems that we can generate a list of RefSeq IDs from the Affy probesets and then use this list (set it as the myPS object) to pull out the Illumina probe IDs, using the RefSeq column as the identifier column. We can export all of these with a simple write command. Right now, we are thinking of bridging to/from the Exon arrays with UniGene. I guess we will have to see how that works.

Again, we are having success when dealing with a few probesets, but there must be a way to get ALL probesets. Can you please help us with this, possibly with example command syntax? In the paper you mention mapping all the mouse probes across platforms, so you must have had to deal with this.

We will likely want to try the cross-species analysis in the near future as well, so learning how to get past the limit of entering each probe/gene/etc. manually will be a big help.

Kind regards,
Yingfang and Brad

--
Yingfang Tian, PhD
M.I.N.D. Institute
University of California at Davis
2805 50th Street, Room 2434
Sacramento, CA 95817
Tel: 916-703-0384

probe annotationTools • 2.1k views

ADD COMMENT • link updated 16.5 years ago by Brad Ander ▴ 20 • written 16.5 years ago by Yingfang Tian ▴ 20

score 0 · Answer 1 · 2009-08-10

Hi Yingfang,

Once you have loaded your Affymetrix annotation into R (assume it is contained in an R object named 'annot'), you could, for instance, select all probe sets by subsetting the data.frame to extract its first column:

    allps <- annot[,1]

I am not sure this answers your question, though. Could you please send some lines of code to help me understand what is going wrong?

Best,
Alexandre

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch On Behalf Of Yingfang Tian
Sent: Thursday, 6 August 2009 19:23
To: bioconductor at stat.math.ethz.ch
Cc: Brad Ander
Subject: Re: [BioC] Annotation Tools package

[...]
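As a minimal sketch of the subsetting idea above, the snippet below builds a small stand-in data.frame and extracts every probe set ID from its first column. The object name `annot` follows the reply; the column names and the three rows are illustrative assumptions, not the contents of a real Affymetrix annotation file.

```r
# Toy stand-in for an annotation table. A real Affymetrix file has far more
# rows and columns; the column names and values here are illustrative only.
annot <- data.frame(
  Probe.Set.ID = c("1007_s_at", "1053_at", "117_at"),
  Entrez.Gene  = c("780", "5982", "3310"),
  stringsAsFactors = FALSE
)

# Selecting column 1 of a data.frame returns a plain character vector,
# so every probe set ID is captured without typing any of them by hand.
allps <- annot[, 1]

length(allps)  # number of probe sets selected
head(allps)    # first few IDs
```

The resulting vector can then be passed wherever a hand-typed vector such as `c("117_at", "1007_s_at")` was used before.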

score 0 · Answer 2 · 2009-08-11


Brad Ander ▴ 20

@brad-ander-3620

Last seen 11.5 years ago

Thank you for the help, Alexandre. That command worked.

We were trying to alter the contents of the myPS variable in the example file:

    myPS <- c("117_at", "1007_s_at")

We tried just including as many probesets as possible in the parentheses, but that was not practical. When we replaced it with your suggestion of

    allPS <- annotation_HGU133Plus2[,1]

we were able to get the full list of probesets from that column in the Affy annotation, convert them to Entrez Gene IDs, and then use that list to get the Illumina IDs.

For your information (or anybody's), we were using the following.

Affy probeset -> Entrez:

    library(annotationTools)
    library(MASS)  # provides write.matrix
    annotationFile <- "HG-U133_Plus_2.na29.annot.csv"
    dataDirectory <- system.file("data", package = "annotationTools")
    annotation_HGU133Plus2 <- read.csv(paste(dataDirectory, annotationFile, sep = "/"), colClasses = "character")
    allPS <- annotation_HGU133Plus2[,1]
    getANNOTATION(allPS, annotation_HGU133Plus2, diagnose = FALSE, identifierCol = 1, annotationCol = 19, noAnnotationSymbol = NA, noAnnotationProvidedSymbol = "---", sep = " /// ")
    entrez <- getANNOTATION(allPS, annotation_HGU133Plus2, diagnose = FALSE, identifierCol = 1, annotationCol = 19)
    write.matrix(entrez, file = "humanentrez.csv", sep = " ")

Entrez -> Illumina:

    annotationFileIll <- "HumanRef-8_V3_0_R2_11282963_Ab.csv"
    dataDirectory <- system.file("data", package = "annotationTools")
    annotation_Illumina <- read.csv(paste(dataDirectory, annotationFileIll, sep = "/"), colClasses = "character")
    getANNOTATION(entrez, annotation_Illumina, diagnose = FALSE, identifierCol = 9, annotationCol = 14, noAnnotationSymbol = NA, noAnnotationProvidedSymbol = "---", sep = " /// ")
    illuminaID <- getANNOTATION(entrez, annotation_Illumina, diagnose = FALSE, identifierCol = 9, annotationCol = 14)
    write.matrix(illuminaID, file = "illuminaID.csv", sep = " ")

It may not have been the most perfect use of the code, but it seems to work (we are still learning).

Thanks for your help. If there are any suggestions you feel are important, please let us know.

Kind regards,
Brad

--
Brad Ander, PhD
M.I.N.D. Institute
University of California at Davis
Room 2434
2805 50th Street
Sacramento, CA 95817
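The two-step bridge described above (probe set -> Entrez Gene -> Illumina probe) can be illustrated without the real annotation files. The sketch below uses base R's `match()` on two tiny made-up tables. It is not how annotationTools implements `getANNOTATION`, and it ignores multi-valued entries separated by " /// ". The table contents, the column names, and the helper name `lookup` are all hypothetical.

```r
# Made-up annotation tables standing in for the Affymetrix and Illumina files.
affy <- data.frame(
  probeset = c("ps_1", "ps_2", "ps_3"),
  entrez   = c("101", "102", "---"),   # "---" marks "no annotation provided"
  stringsAsFactors = FALSE
)
illumina <- data.frame(
  probe  = c("ILMN_A", "ILMN_B"),
  entrez = c("102", "101"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each id, find the first row of 'table' whose
# 'from' column equals it and return that row's 'to' value (NA if no match).
lookup <- function(ids, table, from, to) {
  table[[to]][match(ids, table[[from]])]
}

# Step 1: all probe sets -> Entrez Gene IDs.
allPS  <- affy[, 1]
entrez <- lookup(allPS, affy, "probeset", "entrez")
entrez[entrez == "---"] <- NA  # treat the placeholder as missing

# Step 2: Entrez Gene IDs -> Illumina probe IDs.
illuminaID <- lookup(entrez, illumina, "entrez", "probe")

# Keep the three columns side by side and export them in one file.
result <- data.frame(probeset = allPS, entrez = entrez,
                     illumina = illuminaID, stringsAsFactors = FALSE)
out_file <- file.path(tempdir(), "affy_to_illumina.csv")
write.csv(result, file = out_file, row.names = FALSE)
```

Keeping the source identifier next to each translated identifier in one table makes unmatched probe sets easy to spot, since they appear as `NA` in the last column.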

ADD COMMENT • link 16.5 years ago Brad Ander ▴ 20


Hi Brad,

A couple of comments on your code (see below).

> Affy Probeset -> Entrez:
>
> annotationFile <- "HG-U133_Plus_2.na29.annot.csv"
> dataDirectory <- system.file("data", package = "annotationTools")
> annotation_HGU133Plus2 <- read.csv(paste(dataDirectory, annotationFile, sep = "/"), colClasses = "character")

I assume here that you have the annotation file "HG-U133_Plus_2.na29.annot.csv" in the 'data' subdirectory of the annotationTools installation directory. I used this code in the vignette to load an example annotation file (stored in 'data'). You can, however, save the annotation file anywhere on your file system.

Second, I am not sure your annotation file loaded correctly, since Affymetrix now includes a header in its annotation files. To skip the header (lines preceded by the hash sign, '#'), please use the following command (assuming, for instance, that you work under Windows and saved your annotation file under "C:/Annotations"):

    annotation_HGU133Plus2 <- read.csv("C:/Annotations/HG-U133_Plus_2.na29.annot.csv", colClasses = "character", comment.char = "#")

You can check the size of annotation_HGU133Plus2 with

    dim(annotation_HGU133Plus2)

to make sure that you now have what you expected (that is, a data.frame of 54675 rows and 41 columns).

I changed the vignette accordingly some time ago, and the change will be incorporated in the next Bioconductor release.

Alexandre
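To see the effect of `comment.char` without the real annotation file, the sketch below writes a small temporary CSV whose first lines start with '#', reads it back, and checks the dimensions. The header text, column names, and rows are invented for illustration; only the `read.csv` arguments mirror the command above.

```r
# Write a small stand-in file: '#'-prefixed header lines, then a CSV table.
# The header text and the rows are illustrative, not real Affymetrix content.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "#%illustrative header line 1",
  "#%illustrative header line 2",
  '"Probe Set ID","Entrez Gene"',
  '"1007_s_at","780"',
  '"117_at","3310"'
), csv_path)

# comment.char = "#" tells read.csv to ignore everything from '#' to the end
# of a line, so the two header lines are skipped and the third line supplies
# the column names.
annotation <- read.csv(csv_path, colClasses = "character", comment.char = "#")

dim(annotation)       # rows and columns actually read
colnames(annotation)  # spaces become dots because check.names defaults to TRUE
```

Comparing `dim()` against the expected table size, as suggested above, is a quick way to confirm that the header lines were not read as data.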

ADD REPLY • link 16.5 years ago Alexandre Kuhn ▴ 60


Thank you for the suggestions, Alexandre.

We were indeed placing the annotation files into the data directory of annotationTools; we now just use our working directory by leaving out:

    dataDirectory <- system.file("data", package = "annotationTools")

You are also correct about the header. We got around this by deleting the header in the annotation file, but your suggestion to skip the comment lines marked '#' will make it convenient to just use the files as provided. As we had it, all rows/columns were read properly.

Thanks for the help and the great tool.

Brad

--
Brad Ander, PhD
M.I.N.D. Institute
University of California at Davis
Room 2434
2805 50th Street
Sacramento, CA 95817

2009/8/12 Alexandre Kuhn <kuhnam at mail.nih.gov>:
> Hi Brad,
>
> A couple of comments on your code (see below)
> [...]

ADD REPLY • link 16.5 years ago Brad Ander ▴ 20