problem with rat database
@alberto-goldoni-3477
Dear All,

I'm analyzing Agilent microarrays with the "rgug4130a.db" database. Using topTable(fit2,number=500,adjust="BH") I obtained 500 genes like these:

          Row Col ProbeUID ControlType    ProbeName  GeneName SystematicName      Description
    16096  79  38    15309           0  A_43_P10328  CB606456       CB606456 unknown function
    8109   40 109     7609           0 A_42_P552092 203358_Rn      203358_Rn  Rat c-fos mRNA.
          X.hda.str...ref. X.ref.str...ref. X.hda.str...ref.str.     AveExpr           F     P.Value   adj.P.Val
    16096      3.988290607     -0.951656306          4.939946913 10.29735936 36.77263264 0.000212298 0.641094595
    8109       5.670956889      4.413365374          1.257591514 13.47699544 33.20342601 0.000292278 0.641094595

As you can see, for many genes, like the first one (CB606456), the Description column says "unknown function".

So I performed a very simple search:

1) First, in Ensembl, searching for the GeneName "CB606456" under "Locations of DnaAlignFeature" gives the genomic location chr 7:16261621-16262210.
2) Then, in the Rat Genome Database (http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=735058), I found one gene at this position:

    735058  GENE  Angptl4  angiopoietin-like 4  7  16261623  16267852

So my question is: why does R give me "unknown function" with the "rgug4130a.db" database, when the genomic location from Ensembl leads to the Angptl4 gene in RGD?

And is there a function that lets R perform this kind of search automatically? About 100 of my 500 genes are "unknown function", so a function that automates this search would be very useful.

Best regards to all, and thanks to whoever answers.

--
Dr. Alberto Goldoni
Parma, Italy
@vincent-j-carey-jr-4
1) You did not provide sessionInfo(), which is critical for diagnosing an issue that may pertain to software version; revisions to annotation packages can have all sorts of consequences.

2) I am not sure rgug4130a.db has anything to do with this.

    > get("CB606456", revmap(rgug4130aSYMBOL))
    Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
      value for "CB606456" not found

and so on. Look at the featureData component of the object passed to lmFit; the annotation may be in there. If this does not clarify things, please give a very explicit indication of how the topTable was generated, going back to the structure of the object passed to lmFit.
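For an object created by read.maimages, the probe annotation suggested above normally sits in the `genes` component, and limma carries it through lmFit into the topTable output. The following is a minimal sketch on toy data, assuming limma is installed; the matrices are random, the design is intercept-only, and every probe name after the first two is a placeholder.

```r
library(limma)

set.seed(1)
n_probes <- 10
n_arrays <- 3

# Toy log-ratios (M) and average intensities (A); random values, not real data.
M <- matrix(rnorm(n_probes * n_arrays), nrow = n_probes)
A <- matrix(rnorm(n_probes * n_arrays, mean = 10), nrow = n_probes)

# Probe annotation as it would be read from the Agilent files.
# Only the first two rows come from the post; the rest are placeholders.
genes <- data.frame(
  ProbeName   = c("A_43_P10328", "A_42_P552092", paste0("toy_probe_", 3:n_probes)),
  Description = c("unknown function", "Rat c-fos mRNA.", rep("toy", n_probes - 2)),
  stringsAsFactors = FALSE
)

MA <- new("MAList", list(M = M, A = A, genes = genes))

# Intercept-only design, purely for illustration.
design <- matrix(1, nrow = n_arrays, ncol = 1)
fit <- eBayes(lmFit(MA, design))
tab <- topTable(fit, coef = 1, number = n_probes)

names(MA$genes)  # annotation columns stored with the data
names(tab)       # the same columns reappear next to the statistics
```

If "unknown function" already appears in `MA$genes$Description`, the text came from the array files rather than from any annotation package.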
@Vincent

The chip used is the "rgug4130a", so I have to use the "rgug4130a.db" database.

This is how I obtained the topTable:

    library(limma)
    library(vsn)
    targets <- readTargets("targets.txt")
    RG <- read.maimages(targets$FileName, source="agilent")
    MA <- normalizeBetweenArrays(RG, method="Aquantile")
    # 'design' is created earlier in the session (not shown in this post)
    contrast.matrix <- cbind("(hda+str)-(ref)"=c(1,0),
                             "(ref+str)-(ref)"=c(0,1),
                             "(hda+str)-(ref+str)"=c(1,-1))
    rownames(contrast.matrix) <- colnames(design)
    fit <- lmFit(MA, design)
    fit2 <- contrasts.fit(fit, contrast.matrix)
    fit2 <- eBayes(fit2)
    geni500 <- topTable(fit2, number=500, adjust="BH")

    > sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
    [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] AnnotationDbi_1.12.0 Biobase_2.10.0       limma_3.6.9

    loaded via a namespace (and not attached):
    [1] DBI_0.2-5     RSQLite_0.9-4 tools_2.12.1

--
Dr. Alberto Goldoni
Parma, Italy
Hi, Alberto.

The data in your topTable result are taken from the feature extraction result file. In other words, rgug4130a.db is not used in what you show above. You could add to your annotation using either rgug4130a.db or biomaRt, but you will need to perform these steps yourself. As to why some of your probes do not appear to have annotation, you would probably need to contact Agilent, as they are the source of your current annotation.

Hope that helps,
Sean
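Adding annotation by hand, as described above, comes down to looking up each ProbeName in a probe-to-gene map and attaching the result as a new column. The sketch below uses base R only so that it runs without the annotation package; `probe2symbol` is a hypothetical lookup whose single entry is invented for illustration, and the commented lines show how such a vector might be built from rgug4130a.db.

```r
# Toy stand-in for a few columns of the topTable() result in the post.
geni500 <- data.frame(
  ProbeName   = c("A_43_P10328", "A_42_P552092"),
  GeneName    = c("CB606456", "203358_Rn"),
  Description = c("unknown function", "Rat c-fos mRNA."),
  stringsAsFactors = FALSE
)

# Hypothetical probe-to-symbol lookup (the mapping below is invented for
# illustration). With rgug4130a.db installed, something like
#   library(rgug4130a.db)
#   hits <- mget(geni500$ProbeName, rgug4130aSYMBOL, ifnotfound = NA)
#   probe2symbol <- sapply(hits, function(s) s[1])
# should give a named character vector of the same shape.
probe2symbol <- c(A_42_P552092 = "Fos")

# Look up each probe by name; probes absent from the lookup get NA.
geni500$Symbol <- unname(probe2symbol[geni500$ProbeName])

# Probes that still lack a symbol are the ones to chase up elsewhere.
unresolved <- geni500$ProbeName[is.na(geni500$Symbol)]
unresolved
```

Keeping the unresolved probes in a separate vector makes it easy to pass only those on to a second source such as biomaRt.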
@Davis

You are right! But I have tried this kind of search:

    library("rgug4130a.db")
    x <- rgug4130aENSEMBL
    mapped_genes <- mappedkeys(x)
    xx <- as.list(x[mapped_genes])

and this approach:

    x <- rgug4130aGENENAME
    mapped_probes <- mappedkeys(x)
    xx <- as.list(x[mapped_probes])

but the results are the same: some genes still come back as "unknown function".

I would like to know whether there is a way to perform the search against another database, directly against the Rat Genome Database, or with biomaRt, but I don't know how. I have about 100 genes with an "unknown function", and a script or function that does this automatically, instead of searching for the genes one by one, would be very useful.

Best regards.

--
Dr. Alberto Goldoni
Parma, Italy
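The manual lookup described in this thread (probe location first, then the gene at that location) can be automated once both sets of coordinates are in data frames. The sketch below is base R with a hypothetical helper `genes_at`; the single probe and gene rows use the coordinates quoted in the original post, and in practice the gene table would be exported from RGD or fetched with biomaRt for the same genome assembly as the probe alignments.

```r
# Probe locations found by alignment (coordinates taken from the post).
probes <- data.frame(
  probe = "A_43_P10328",
  chr   = "7",
  start = 16261621,
  end   = 16262210,
  stringsAsFactors = FALSE
)

# Gene coordinates, e.g. exported from RGD or Ensembl (one row from the post).
genes <- data.frame(
  symbol = "Angptl4",
  chr    = "7",
  start  = 16261623,
  end    = 16267852,
  stringsAsFactors = FALSE
)

# Return the symbols of all genes overlapping one probe interval
# (hypothetical helper, not part of any package).
genes_at <- function(chr, start, end, genes) {
  hit <- genes$chr == chr & genes$start <= end & genes$end >= start
  paste(genes$symbol[hit], collapse = ";")
}

# Apply the helper to every probe row.
probes$gene <- mapply(genes_at, probes$chr, probes$start, probes$end,
                      MoreArgs = list(genes = genes), USE.NAMES = FALSE)
probes
```

For thousands of probes the same overlap test is usually done with findOverlaps from the GenomicRanges package, which scales better than a row-by-row loop. Coordinates differ between assemblies, so both tables need to refer to the same one.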
Hi Alberto,

The annotation packages work by taking a probe-to-gene mapping from the manufacturer and then returning the relevant information for the gene that a specific probe maps to.

If the manufacturer provided no mapping between a probe and a gene, we have not attempted to guess one for you.

But if you feel that you have better information about where these probes map (perhaps you took the time to align them to the genome and see which genes are nearby, as you did for this one), then you could supply that mapping to the SQLForge code in the AnnotationDbi package and produce a new annotation package based on it. The details are described in one of the vignettes from the AnnotationDbi package:

http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/SQLForge.pdf

I hope this clarifies things,

Marc
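The input that SQLForge expects is a tab-delimited file with two columns and no header: the probe ID, then a gene identifier. A minimal sketch of preparing such a file follows; the gene ID is a placeholder rather than a real Entrez Gene ID, and the makeDBPackage call in the comments is modelled on the vignette's example with made-up prefix, version and chip name values, so treat it as an assumption to check against the vignette.

```r
# Custom probe-to-gene table. The gene ID is a placeholder: replace it with
# the real Entrez Gene ID found for each probe.
mapping <- data.frame(
  probe   = "A_43_P10328",
  gene_id = "PLACEHOLDER_ENTREZ_ID",
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with two columns and no header:
# probe ID, then gene ID.
map_file <- file.path(tempdir(), "rgug4130a_custom.txt")
write.table(mapping, map_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(map_file)

# The package would then be built along these lines (not run here; it needs
# the rat.db0 package, and all argument values are illustrative):
#   library(AnnotationDbi)   # the SQLForge code lives in AnnotationForge
#                            # in current Bioconductor releases
#   makeDBPackage("RATCHIP_DB", affy = FALSE, prefix = "rgug4130acustom",
#                 fileName = map_file, baseMapType = "eg",
#                 outputDir = tempdir(), version = "0.0.1",
#                 manufacturer = "Agilent", chipName = "rgug4130a custom",
#                 manufacturerUrl = "http://www.agilent.com")
```

Here `baseMapType = "eg"` states that the second column holds Entrez Gene IDs; GenBank accessions would need a different value, as the vignette explains.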
Very clear. Thanks Marc.
Alberto Goldoni >>>> Parma, Italy >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- ----------------------------------------------------- Dr. Alberto Goldoni Parma, Italy
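For anyone following Sean's suggestion quoted above, here is a minimal sketch of what "adding to your annotation yourself" could look like once you have a lookup table keyed by probe ID. It uses base R only. The `top` data frame is a toy stand-in for the `topTable()` output, built from the two rows quoted in this thread, and `extra` is a hypothetical lookup table: in practice it might come from rgug4130a.db or biomaRt, which are not called here. The Angptl4 entry is only the candidate gene found by hand in the original question, not a confirmed annotation for that probe.

```r
# Toy stand-in for a topTable() result; values copied from the rows quoted in this thread.
top <- data.frame(
  ProbeName   = c("A_43_P10328", "A_42_P552092"),
  GeneName    = c("CB606456", "203358_Rn"),
  Description = c("unknown function", "Rat c-fos mRNA."),
  P.Value     = c(0.000212298, 0.000292278),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table keyed by probe ID (assumption: you built this
# yourself from an annotation source; the single row is illustrative only).
extra <- data.frame(
  ProbeName = "A_43_P10328",
  Symbol    = "Angptl4",
  GeneTitle = "angiopoietin-like 4",
  stringsAsFactors = FALSE
)

# match() returns, for each probe in 'top', its row in 'extra' (NA if absent),
# so the original row order of the topTable is preserved.
idx <- match(top$ProbeName, extra$ProbeName)
top$Symbol    <- extra$Symbol[idx]
top$GeneTitle <- extra$GeneTitle[idx]

print(top[, c("ProbeName", "GeneName", "Symbol", "GeneTitle")])
```

Probes that are missing from the lookup table simply get `NA` in the new columns, which makes the still-unannotated rows easy to list with `top[is.na(top$Symbol), ]`.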
@alberto-goldoni-3477
Dear All, I'm working with an Agilent rat microarray. I would like to know if there is a script or a package that, starting from the GeneName "CB606456", gives me the genomic location (strand), for example chr 7:16261621-16262210, as you get by pasting the GeneName "CB606456" into the Ensembl database. I ask because I have hundreds of gene names and would like to obtain their genomic locations automatically. Best regards -- ----------------------------------------------------- Dr. Alberto Goldoni Parma, Italy
Hi Alberto,

On 5/25/2011 5:35 AM, Alberto Goldoni wrote:
> Dear All,
> I'm working with an Agilent rat microarray. I would like to know if
> there is a script or a package that, starting from the GeneName
> "CB606456", gives me the genomic location (strand), for example
> chr 7:16261621-16262210, as you get by pasting the GeneName "CB606456"
> into the Ensembl database.

That's not a gene name, as it points to an EST, which isn't AFAIK a gene. Instead it is an EST ID or some such thing.

The problem with the sort of query you want is that most of the available tools aren't as sophisticated as the Ensembl browser, which seamlessly queries lots of different databases when you paste something in the query window. If you want to do this sort of thing programmatically, you will have to know for sure what sort of inputs you are using, as you have to know which table to query.

Are all of the IDs you have similar to the one you supply? If so, you will need to figure out exactly what they are (Ensembl isn't particularly enlightening), and then maybe we can help you to do your query. If you have other IDs, like Entrez Gene or Ensembl, that would be much better.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
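Once the IDs have been resolved to genomic locations, the second half of the manual search described in the question (finding which gene sits at a given position) is a plain interval-overlap test. Below is a minimal base R sketch of that step. The helper names `parse_location()` and `genes_at()` are made up for illustration, and the `genes` table is a toy one holding only the Angptl4 coordinates quoted from RGD earlier in this thread. A real gene table would have to come from an annotation source of your choice, and its coordinates must be on the same genome assembly as the query.

```r
# Parse a location string such as "chr 7:16261621-16262210"
# into chromosome, start and end.
parse_location <- function(loc) {
  cleaned   <- gsub("\\s+", "", sub("^chr", "", loc))    # "7:16261621-16262210"
  chr_range <- strsplit(cleaned, ":", fixed = TRUE)[[1]]
  bounds    <- as.numeric(strsplit(chr_range[2], "-", fixed = TRUE)[[1]])
  list(chr = chr_range[1], start = bounds[1], end = bounds[2])
}

# Toy gene table (assumption: one row, coordinates as quoted from RGD in this thread).
genes <- data.frame(
  rgd_id = 735058,
  symbol = "Angptl4",
  chr    = "7",
  start  = 16261623,
  end    = 16267852,
  stringsAsFactors = FALSE
)

# Return the symbols of all genes whose interval overlaps the query location.
genes_at <- function(loc, genes) {
  q <- parse_location(loc)
  overlaps <- genes$chr == q$chr & genes$start <= q$end & genes$end >= q$start
  genes$symbol[overlaps]
}

genes_at("chr 7:16261621-16262210", genes)
```

Applied over a vector of locations with `lapply()`, this gives one character vector of candidate symbols per ID. An empty result means no gene in the table overlaps that location.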