Help with annotation packages
Amy Mikhail ▴ 460
@amy-mikhail-1317
Dear list,

I have a mosquito microarray that I would like to annotate, but am having some trouble figuring out which packages are appropriate to use. After reading the AnnBuilder, annotate and biomaRt vignettes, I am still really unsure if any of those packages would do what I want. So here is my question:

The array is for Anopheles gambiae, and consists of about 13,500 cDNA spots from PCR plates - probe sequences between 150 and 500 bp in length. The manufacturer of my array provided a .GAL file with it - this was made in GenePix and lists Ensembl gene transcripts under the column "name" and Ensembl gene identifiers under the column "ID".

What I would really like is to add an extra column to this .GAL file (or actually my .gpr outputs from GenePix) which would contain gene function/ontology information, so that everything I do with my results thereafter would come up with the GO information as well (e.g. topTable from limma).

I know that the latest An. gambiae annotation available in Ensembl is agam_P3, and would like to use this, but have to bear in mind that the microarray probe IDs were provided from an earlier build, so a number of genes on the array will not be present in the agam_P3 list. If the package I use flags these as NAs or whatever, that would be fine for the moment.

My confusion is really over which package to use:

I understand that biomaRt can handle single queries or queries for a small list of (e.g.) DE genes, but not the entire probe set. Is that right? Also, I note that other list users with queries relating to biomaRt have been directed to use the devel version. I think this doesn't work with R 2.2.1?

It also seems that the annotate package is only suitable for species that Bioconductor has specifically created libraries for, and that there are currently only libraries for human, mouse and rat ... so not suitable for me either?

Lastly, the AnnBuilder package sounds most like what I'm after, but I'm a bit confused about whether it is limited in the public data repositories it can use, as the probe IDs I have are from Ensembl, not Entrez Gene. Also I gather I would have to query the data package that AnnBuilder creates every time I want the annotation info for a given list of genes, rather than it being linked to my .gpr or .GAL files. Have I understood that correctly, and if so is there any way to link annotation info to the .GAL file itself? Also, is Perl something one has to download in order to use this package (please excuse the very naive question as I'm not a bioinformatician)?

So just to recap: all I actually want to do is merge the agam_P3 annotation list with my .GAL file, and make sure that the new columns appear as part of the output from limma, etc.

Looking forward to any advice / suggestions,

Regards,
Amy

R: 2.2.1, Bioconductor: 1.7, OS: Windows XP.

-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
Microarray Annotation GO Anopheles gambiae probe annotate limma AnnBuilder biomaRt GO • 1.4k views
@james-w-macdonald-5106
Hi Amy,

Amy Mikhail wrote:
> What I would really like is to add an extra column to this .GAL file (or
> actually my .gpr outputs from GenePix) which would contain gene
> function/ontology information, so that everything I do with my results
> thereafter would come up with the GO information as well (e.g. toptable
> from limma).

I don't know if this is going to work out too well. There is a one-to-many relationship between, e.g., an Ensembl ID and its GO terms (especially if you are using all three GO types), so it is not likely that you will be able to get all your GO terms to fit into a topTable.

However, you could use topTable() to get the most interesting genes, then extract the Ensembl IDs associated with them and use biomaRt to get the GO terms using the getGO() function. If you then want to look at your data along with the GO terms, you could use htmlpage() in the annotate package to make an HTML table where you *could* visualize things with a one-to-many relationship.

This will take some work on your part to figure things out, but the main workflow would be something like:

Run topTable() to get a vector of Ensembl IDs.

Load biomaRt and set up a connection using useMart(). To figure out the dataset to use, I usually do

    mart <- useMart("ensembl")
    listDatasets(mart)
    mart <- useMart("ensembl", dataset = "<name of dataset from above>")

Then use getGO() to get a data.frame containing all the GO terms. You can then use the 'ensembl_gene_id' column to parse out which GO terms belong to which ID. You want to make a list of the same length as your vector of Ensembl IDs and then stick the unique GO terms for each Ensembl ID into the corresponding position of the list.

For example, say you have a vector of Ensembl IDs. Then you would do something like this (not tested):

    ens.ids <- <character vector of ids>
    go <- getGO(ens.ids, "ensembl", mart = mart)
    mylist <- vector("list", length(ens.ids))
    for(i in seq(along = ens.ids))
        mylist[[i]] <- unique(subset(go, go[,4] == ens.ids[i], select = 2))

You can then use htmlpage() in the annotate package to turn this into an HTML table. You won't be able to make links to databases currently because there isn't a function to make links to Ensembl. However, you could put the mylist from above along with vectors of Ensembl IDs, p-values, t-statistics, and a data.frame of your expression values into another list and use that for the 'othernames' argument to htmlpage().

As I mentioned above, this will take some work on your part since I have only sketched the basic idea here. However, this is likely the easiest way to go, as compared to building an annotation package.

HTH,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
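The list-building step in the reply above can be tried offline with a toy data.frame standing in for the biomaRt result, since the real query needs a network connection. This is only a sketch: the gene IDs, the column names ensembl_gene_id and go_id, and the go.col vector are assumptions made for illustration, not output from a real query.

```r
# Toy stand-in for the data.frame a biomaRt GO query might return.
# IDs and column names are hypothetical placeholders.
go <- data.frame(
  ensembl_gene_id = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  go_id           = c("GO:0000001", "GO:0000002", "GO:0000001", "GO:0000003"),
  stringsAsFactors = FALSE
)

# IDs as they might come out of topTable(); GENE_C has no GO rows at all.
ens.ids <- c("GENE_A", "GENE_B", "GENE_C")

# One list element per ID, holding that ID's unique GO terms
# (same idea as the for loop above, keyed by column name instead of position).
mylist <- lapply(ens.ids, function(id) unique(go$go_id[go$ensembl_gene_id == id]))
names(mylist) <- ens.ids

# Optionally collapse each element to a single string so the one-to-many
# relationship fits in one table column; IDs without GO terms become NA.
go.col <- vapply(
  mylist,
  function(x) if (length(x) > 0) paste(x, collapse = "; ") else NA_character_,
  character(1)
)
print(go.col)
```

Collapsing to one string per gene loses the list structure, so it is a convenience for display rather than something to compute on. Keeping mylist around is the safer choice if the terms are needed later.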
@steffen-durinck-519
Hi Amy,

> My confusion is really over which package to use:
>
> I understand that Biomart can handle single queries or queries for a small
> list of (e.g.) DE genes, but not the entire probe set. Is that right?

This is not true. You can use it for long lists as well. The most I have tried is annotating about 20,000 IDs at once, and even longer lists are possible.

> Also, I note that other list users with queries relating to Biomart have
> been directed to use the devel version. I think this doesn't work with R
> 2.2.1?

This is correct. It is recommended to use the latest devel version of biomaRt, and you'll need R-2.3.0 for this. You can find a Windows version here:

http://cran.r-project.org/bin/windows/base/rdevel.html

Cheers,
Steffen
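If the whole probe set is annotated in one go, one possible way to attach the result to the probe table is to match a one-row-per-gene annotation table onto it by ID. This is a sketch under assumptions, not something proposed in the thread: the data frames, IDs and column names below are made up, with genes standing in for the probe table read from the .gpr/.GAL files. match() leaves NA for array IDs that are missing from the annotation, which is the behaviour described as acceptable in the question.

```r
# Stand-in for the probe table from the .GAL/.gpr files (hypothetical values).
genes <- data.frame(
  ID   = c("GENE_A", "GENE_B", "GENE_OLD"),
  Name = c("TRANSCRIPT_A", "TRANSCRIPT_B", "TRANSCRIPT_OLD"),
  stringsAsFactors = FALSE
)

# Stand-in annotation with one row per gene, GO terms already collapsed
# to a single string. GENE_OLD is absent, as for an ID from an older build.
annot <- data.frame(
  ensembl_gene_id = c("GENE_B", "GENE_A"),
  GO              = c("GO:0000003", "GO:0000001; GO:0000002"),
  stringsAsFactors = FALSE
)

# match() returns, for each spot, the row of annot with the same ID (or NA),
# so spot order and row count are preserved, unlike a plain merge().
genes$GO <- annot$GO[match(genes$ID, annot$ensembl_gene_id)]
print(genes)
```

As far as I know, limma's topTable() reports the columns held in the genes component of the fitted object, so a column added to the genes table before fitting should show up in the output. That is worth checking against your own limma version.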
Hi Steffen and Jim,

Thanks for the suggestions. I'll try the whole probeset and a topTable subset in biomaRt and see how I get on.

Cheers,
Amy