Changes in annotations?
Alex Sanchez ▴ 90
@alex-sanchez-3227
Last seen 8.1 years ago
Hello,

I recently had to review an analysis I did some time ago. It was done on Affymetrix hgu133plus2 chips with R 2.4 and BioC 1.9. I have re-run the analysis using R 2.9 and BioC 2.4 (sessionInfo below).

I was surprised by the changes in the annotations: many probesets that had an annotation have become NAs, whereas some have changed their symbol and their Entrez Gene ID.

To be specific, I summarize my question with the top genes of my list. The list I obtained 2 years ago is:

    probeset      locuslink  symbol
    238900_at     3123       HLA-DRB1
    232583_at     8440       NCK2
    236307_at     60468      BACH2
    223620_at     2857       GPR34
    219759_at     64167      LRAP
    201702_s_at   5514       PPP1R10
    232882_at     2308       FOXO1A
    213446_s_at   8826       IQGAP1
    234033_at     9693       RAPGEF2
    243006_at     2534       FYN
    244648_at     54520      CCDC93
    243691_at     23142      DCUN1D4
    239264_at     60412      EXOC4
    243546_at     143686     SESN3
    205239_at     374        AREG
    1565703_at    55520      ELAC1
    244061_at     55843      ARHGAP15
    230505_at     26037      SIPA1L1
    242688_at     9320       TRIP12
    1556474_a_at  285097     FLJ38379
    232614_at     596        BCL2
    1565689_at    3839       KPNA3
    236685_at     NA         NA
    225173_at     93663      ARHGAP18
    241893_at     4249       MGAT5

I used the following code to reproduce the issue with the annotations:

    #####################################################################
    ## Verification using R 2.9 & BioC 2.4
    #####################################################################
    > probes <- c("238900_at", "232583_at", "236307_at", "223620_at", "219759_at",
    +   "201702_s_at", "232882_at", "213446_s_at", "234033_at", "243006_at",
    +   "244648_at", "243691_at", "239264_at", "243546_at", "205239_at",
    +   "1565703_at", "244061_at", "230505_at", "242688_at", "1556474_a_at",
    +   "232614_at", "1565689_at", "236685_at", "225173_at", "241893_at")
    >
    > library(hgu133plus2.db)
    > library(annotate)
    >
    > entrezs <- getEG(probes, "hgu133plus2")
    > symbols <- getSYMBOL(probes, "hgu133plus2")
    > sel2 <- cbind(probes, entrezs, symbols)
    > sel2
                 probes         entrezs     symbols
    238900_at    "238900_at"    "100133484" "LOC100133484"
    232583_at    "232583_at"    NA          NA
    236307_at    "236307_at"    NA          NA
    223620_at    "223620_at"    "2857"      "GPR34"
    219759_at    "219759_at"    "64167"     "ERAP2"
    201702_s_at  "201702_s_at"  "5514"      "PPP1R10"
    232882_at    "232882_at"    NA          NA
    213446_s_at  "213446_s_at"  "8826"      "IQGAP1"
    234033_at    "234033_at"    NA          NA
    243006_at    "243006_at"    NA          NA
    244648_at    "244648_at"    NA          NA
    243691_at    "243691_at"    NA          NA
    239264_at    "239264_at"    NA          NA
    243546_at    "243546_at"    NA          NA
    205239_at    "205239_at"    "374"       "AREG"
    1565703_at   "1565703_at"   "4089"      "SMAD4"
    244061_at    "244061_at"    NA          NA
    230505_at    "230505_at"    "145474"    "LOC145474"
    242688_at    "242688_at"    NA          NA
    1556474_a_at "1556474_a_at" "285097"    "FLJ38379"
    232614_at    "232614_at"    NA          NA
    1565689_at   "1565689_at"   NA          NA
    236685_at    "236685_at"    NA          NA
    225173_at    "225173_at"    "93663"     "ARHGAP18"
    241893_at    "241893_at"    NA          NA
    > sessionInfo()
    R version 2.9.0 (2009-04-17)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] annotate_1.22.0 hgu133plus2.db_2.2.11 RSQLite_0.7-1 DBI_0.2-4 AnnotationDbi_1.6.0 Biobase_2.4.1

    loaded via a namespace (and not attached):
    [1] xtable_1.5-5
    #############################################

Many probesets seem to have changed. Can someone explain to me what is happening (or what I may be doing wrong)?

The same code does not work with R 2.4, but if I replace hgu133plus2.db with hgu133plus2 and getEG with getLL I obtain the original results:

    ###############################################
    ### Review of annotations with R 2.4 and BioC 1.9
    ###############################################
    ### This code is executed on a clean new session with R 2.4 and BioC 1.9
    > probes <- c("238900_at", "232583_at", "236307_at", "223620_at", "219759_at",
    +   "201702_s_at", "232882_at", "213446_s_at", "234033_at", "243006_at",
    +   "244648_at", "243691_at", "239264_at", "243546_at", "205239_at",
    +   "1565703_at", "244061_at", "230505_at", "242688_at", "1556474_a_at",
    +   "232614_at", "1565689_at", "236685_at", "225173_at", "241893_at")
    >
    > library(hgu133plus2)
    > library(annotate)
    >
    > LLs <- getLL(probes, "hgu133plus2")
    > symbols <- getSYMBOL(probes, "hgu133plus2")
    > sel1 <- cbind(probes, LLs, symbols)
    > sel1
                 probes         LLs      symbols
    238900_at    "238900_at"    "3123"   "HLA-DRB1"
    232583_at    "232583_at"    "8440"   "NCK2"
    236307_at    "236307_at"    "60468"  "BACH2"
    223620_at    "223620_at"    "2857"   "GPR34"
    219759_at    "219759_at"    "64167"  "ERAP2"
    201702_s_at  "201702_s_at"  "5514"   "PPP1R10"
    232882_at    "232882_at"    "2308"   "FOXO1"
    213446_s_at  "213446_s_at"  "8826"   "IQGAP1"
    234033_at    "234033_at"    "9693"   "RAPGEF2"
    243006_at    "243006_at"    "2534"   "FYN"
    244648_at    "244648_at"    "54520"  "CCDC93"
    243691_at    "243691_at"    "23142"  "DCUN1D4"
    239264_at    "239264_at"    "60412"  "EXOC4"
    243546_at    "243546_at"    "143686" "SESN3"
    205239_at    "205239_at"    "374"    "AREG"
    1565703_at   "1565703_at"   "4089"   "SMAD4"
    244061_at    "244061_at"    "55843"  "ARHGAP15"
    230505_at    "230505_at"    "145474" "LOC145474"
    242688_at    "242688_at"    "9320"   "TRIP12"
    1556474_a_at "1556474_a_at" "285097" "FLJ38379"
    232614_at    "232614_at"    "596"    "BCL2"
    1565689_at   "1565689_at"   "3839"   "KPNA3"
    236685_at    "236685_at"    NA       NA
    225173_at    "225173_at"    "93663"  "ARHGAP18"
    241893_at    "241893_at"    "4249"   "MGAT5"
    > sessionInfo()
    R version 2.4.1 (2006-12-18)
    i386-pc-mingw32

    locale:
    LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252

    attached base packages:
    [1] "tools" "stats" "graphics" "grDevices"
    [5] "utils" "datasets" "methods" "base"

    other attached packages:
    annotate Biobase  hgu133plus2
    "1.12.1" "1.12.2" "1.14.0"
    ########################################################

In summary: if I use R 2.4/BioC 1.9 I obtain the same results I obtained 2 years ago, but if I follow the same steps using R 2.9/BioC 2.4 the results change dramatically. I have repeated the analyses using BioC 2.01 in R 2.7 and BioC 2.2 in R 2.8 (results not shown here). BioC 2.0 yields the same as 1.9, and BioC 2.2 the same as 2.4.

Any help to understand what's happening would be appreciated.

Alex Sanchez

----------------------------------------------------------------------
Dr. Alex Sánchez. Statistics Department. University of Barcelona.
Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain
asanchez_at_ub.edu
Statistics and Bioinformatics Unit
Institut de Recerca. Hospital Universitari Vall d'Hebron
Passeig Vall d'Hebron 112-119. 08034 Barcelona
asanchez_at_ir.vhebron.net
----------------------------------------------------------------------
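The comparison described in the question can be made systematic. The following is a minimal sketch in base R, using a handful of symbols copied from the two listings above; `classify_change()` is a hypothetical helper written for this illustration and is not part of annotate or any other package.

```r
# Symbols for a few probesets, copied from the sel1 (BioC 1.9) and
# sel2 (BioC 2.4) listings in the question.
old <- c("238900_at" = "HLA-DRB1", "232583_at" = "NCK2",
         "223620_at" = "GPR34",    "205239_at" = "AREG",
         "236685_at" = NA)
new <- c("238900_at" = "LOC100133484", "232583_at" = NA,
         "223620_at" = "GPR34",        "205239_at" = "AREG",
         "236685_at" = NA)

# Hypothetical helper: label each probeset by how its annotation moved
# between two releases. Both vectors must be named by the same probesets,
# in the same order.
classify_change <- function(old, new) {
  stopifnot(identical(names(old), names(new)))
  status <- ifelse(is.na(old) & is.na(new), "never annotated",
            ifelse(!is.na(old) & is.na(new), "lost (now NA)",
            ifelse(is.na(old) & !is.na(new), "gained",
            ifelse(old == new, "unchanged", "changed"))))
  names(status) <- names(old)
  status
}

status <- classify_change(old, new)
print(table(status))
```

With the full vectors returned by getSYMBOL() in each session, the same table would show at a glance how many probesets were renamed and how many lost their annotation.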
Annotation hgu133plus2 Biobase annotate • 851 views
@james-w-macdonald-5106
Last seen 3 hours ago
United States
Hi Alex,

This is a question that comes up on the Bioc list fairly regularly, and the answer is in two parts:

First, the annotations supplied in the various metadata packages supplied by BioC are *not* our annotations, but are simply a re-packaging of data we collect from various sources. As an example, we use the mappings of Affymetrix Probe ID to Entrez Gene ID from the annotation csv files you can download from the Affy website. We then map the Entrez Gene IDs to other annotation using primarily NCBI data. So if you go to Affy's netaffx site (free registration required) and query on, say, 238900_at, you get this:

https://www.affymetrix.com/analysis/netaffx/fullrecord.affx?pk=HG-U133_PLUS_2%3A238900_AT

And you will note that the first Entrez Gene ID listed there is 100133484, which happens to be a defunct ID. However, this is the first of many listed there (and we need a one-to-one mapping), so we chose that one. A more likely Entrez Gene ID can be found further down the list, but we simply don't have the resources to figure out if there is a better choice in that list (for every reporter on every Affy chip we annotate). Nor do we have the resources to ensure that any of the mappings that Affy make are reasonable to begin with. We have to trust that they (with *way* more resources than us) are doing a reasonable job.

The second part of the answer has to do with the 'moving target' aspect of biological annotations. These data change all the time, and there is the recurring question of whether one should do an analysis and 'freeze' it to that point in time, or whether the annotations should be updated on a regular basis, with the realization that things can and will change.

Without looking at each reporter ID you list, I can't say if the changes are due to Affy changing their annotation csv files, or to changing knowledge of the genome, but I suspect it is a combination of the two.
Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
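One way to act on the 'freeze' option mentioned in this answer is to store the annotation table used for an analysis together with the package version it came from. The sketch below uses base R only; `freeze_annotation()` is a hypothetical helper, and the three rows and the version string "1.14.0" are copied from the R 2.4 session shown in the question.

```r
# A few rows copied from the R 2.4 / BioC 1.9 listing in the question.
annot <- data.frame(
  probe  = c("238900_at", "232583_at", "236685_at"),
  entrez = c("3123", "8440", NA),
  symbol = c("HLA-DRB1", "NCK2", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a Bioconductor function): bundle the annotation
# table with the metadata needed to know later which release produced it.
freeze_annotation <- function(annot, pkg, version, file) {
  snapshot <- list(
    annotation = annot,
    package    = pkg,
    version    = version,
    r_version  = R.version.string,
    frozen_on  = Sys.Date()
  )
  saveRDS(snapshot, file)
  invisible(snapshot)
}

snapshot_file <- tempfile(fileext = ".rds")
freeze_annotation(annot, pkg = "hgu133plus2", version = "1.14.0",
                  file = snapshot_file)

# Later, the exact table used at analysis time can be restored.
restored <- readRDS(snapshot_file)
print(restored$version)
```

In a live session the version would normally be taken from `as.character(packageVersion("hgu133plus2.db"))` rather than typed by hand, and the snapshot written next to the analysis results instead of to a temporary file.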
For probe 238900_at, the Affy csv file gives for the symbol:

    HLA-DRB1 /// HLA-DRB2 /// HLA-DRB3 /// HLA-DRB4 /// HLA-DRB5 /// LOC100133484 /// LOC100133661 /// LOC100133811 /// LOC730415 /// RNASE2 /// ZNF749

and for the Gene ID:

    100133484 /// 100133661 /// 100133811 /// 3123 /// 3124 /// 3125 /// 3126 /// 3127 /// 388567 /// 6036 /// 730415

So it is confusing even without the "moving target" matter.

If a probe has multiple choices for symbol and ID, what would happen if the response were "multiple, you must choose one"? Or should the response be to select all of the identifiers?

Thank you

(In reply to the message from "James W. MacDonald" <jmacdon at med.umich.edu> to Alex Sanchez <asanchez at ub.edu>, Cc: bioconductor at stat.math.ethz.ch, dated Mon, 06 Jul 2009 09:58:05 -0400, Subject: Re: [BioC] Changes in annotations?)
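The two policies raised in this comment (choose one identifier, or keep them all) can be compared directly on the Gene ID field quoted above. This is a minimal sketch in base R; `split_affy_field()` is a hypothetical helper, and it assumes only that the field is delimited by "///" as shown in the comment.

```r
# Gene ID field for 238900_at, copied from the comment above.
gene_id_field <- paste(
  "100133484 /// 100133661 /// 100133811 /// 3123 /// 3124 /// 3125",
  "/// 3126 /// 3127 /// 388567 /// 6036 /// 730415"
)

# Hypothetical helper: split a "///"-delimited field into its entries.
split_affy_field <- function(x) {
  trimws(strsplit(x, "///", fixed = TRUE)[[1]])
}

all_ids  <- split_affy_field(gene_id_field)  # "select all" policy
first_id <- all_ids[1]                       # "choose the first" policy

print(first_id)
print(length(all_ids))

# The entries in this particular field are already in plain string-sorted
# order, which suggests (but does not prove) that position in the list
# carries no biological priority.
print(identical(all_ids, sort(all_ids, method = "radix")))
```

Taking the first entry reproduces the 100133484 mapping described in the answer above, while keeping the whole vector preserves 3123 (the ID previously reported for HLA-DRB1) at the cost of a one-to-many mapping.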
Alex Sanchez ▴ 90
@alex-sanchez-3227
Last seen 8.1 years ago
Hi James, and thanks for the help.

I know, of course, that what Bioconductor does is take the annotations from public data sources. I will now turn to Affymetrix to see if someone there, or in their forums, can explain why so many annotations have been turned into NAs.

There seem to be two problems here:

- The first one, pointed out by Loren in his message today, is synonyms. For instance, the first probeset in my list was 238900_at. In the first version I used (BioC 1.9) the gene symbol provided by getSYMBOL was HLA-DRB1, whereas now (BioC 2.4) it is LOC100133484. However, the original symbol is still in the Affymetrix annotation file "HG-U133_Plus_2.na28.annot.csv". It seems as if the new symbol is intended to show the relation with the new Entrez ID (100133484). In any case it is annoying, but it can be accepted.

- What seems worse is what happens to some annotations. If, for instance, I take the second probeset in my list, 232583_at, and look for it in the Affy annotation file, I find that, apart from some links to databases such as GenBank, it no longer has a gene symbol or an Entrez ID (or most other annotations). In the previous version of the annotations this probeset was one of two associated with gene NCK2 (232583_at; 203315_at). The problem is that the second probeset (203315_at) was not selected by the analysis. That is, the only evidence suggesting that gene NCK2 was differentially expressed (232583_at) does not suggest it anymore.

Last but not least, this is not happening to just a few probesets. My original list had 417 probesets. Of these, 210 have changed (or been given a synonym for) their gene symbol, and of those 210, 160 have become NAs. That seems too many to feel comfortable :-(

Thanks for the help

Alex

......................................................................
Dr. Alex Sánchez. Associate Professor. Statistics Department. University of Barcelona.
Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain
asanchez_at_ub.edu
Statistics and Bioinformatics Unit
Institut de Recerca. Hospital Universitari Vall d'Hebron
Passeig Vall d'Hebron 112-119. 08034 Barcelona
asanchez_at_ir.vhebron.net
......................................................................
and BioC 1.9 >> >>> probes<-c("238900_at" , "232583_at", "236307_at" ,"223620_at" , >>> "219759_at" , >> + "201702_s_at" , "232882_at" , "213446_s_at", "234033_at", >> "243006_at" , + "244648_at" , "243691_at" , "239264_at" , >> "243546_at" , "205239_at" , >> + "1565703_at" , "244061_at" , "230505_at" , "242688_at" , >> "1556474_a_at", >> + "232614_at" , "1565689_at" , "236685_at" , "225173_at" , >> "241893_at") >>> LLs<- getLL(rownames(sel), "hgu133plus2") >>> symbols<- getSYMBOL(rownames(sel), "hgu133plus2") >>> sel1<- cbind(probes, LLs, symbols) >>> sel1 >> probes LLs symbols 238900_at "238900_at" >> "3123" "HLA-DRB1" 232583_at "232583_at" "8440" "NCK2" >> 236307_at "236307_at" "60468" "BACH2" 223620_at "223620_at" >> "2857" "GPR34" 219759_at "219759_at" "64167" "ERAP2" >> 201702_s_at "201702_s_at" "5514" "PPP1R10" 232882_at "232882_at" >> "2308" "FOXO1" 213446_s_at "213446_s_at" "8826" "IQGAP1" >> 234033_at "234033_at" "9693" "RAPGEF2" 243006_at "243006_at" >> "2534" "FYN" 244648_at "244648_at" "54520" "CCDC93" >> 243691_at "243691_at" "23142" "DCUN1D4" 239264_at "239264_at" >> "60412" "EXOC4" 243546_at "243546_at" "143686" "SESN3" >> 205239_at "205239_at" "374" "AREG" 1565703_at "1565703_at" >> "4089" "SMAD4" 244061_at "244061_at" "55843" "ARHGAP15" >> 230505_at "230505_at" "145474" "LOC145474" >> 242688_at "242688_at" "9320" "TRIP12" 1556474_a_at >> "1556474_a_at" "285097" "FLJ38379" 232614_at "232614_at" "596" >> "BCL2" 1565689_at "1565689_at" "3839" "KPNA3" 236685_at >> "236685_at" NA NA 225173_at "225173_at" "93663" >> "ARHGAP18" 241893_at "241893_at" "4249" "MGAT5" >>> sessionInfo() >> R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: >> LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETA RY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252 >> >> attached base packages: >> [1] "tools" "stats" "graphics" "grDevices" >> [5] "utils" "datasets" "methods" "base" other attached >> packages: >> annotate Biobase hgu133plus2 "1.12.1" 
"1.12.2" "1.14.0" >> ######################################################## >> >> In summary. If I use R 2.4/BioC 1.9 I obtain the same results I ibtained >> 2 years ago, but If I do the same steps using R2.9/BioC2.4 the results >> change dramatically. >> I have repeated the analyses using BioC 2.01 in R 2.7 and BioC 2.2 in R >> 2.8 (results not shown here). BioC 2.0 yield the same as 1.9 and BioC 2.2 >> the same as 2.4, >> >> Any help to understand what's happening would be appreciated >> >> Alex Sanchez >> >> ------------------------------------------------------------------- ---------------------------------- >> Dr. Alex S?nchez. Statistics Department. University of Barcelona. >> Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain >> asanchez_at_ub.edu >> Statistics and Bioinformatics Unit >> Institut de Recerca. Hospital Universitari Vall 'Hebron >> Passeig Vall d'Hebron 112-119. 08034 Barcelona >> asanchez_at_ir.vhebron.net >> ------------------------------------------------------------------- --------------------------------- >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> ------------------------------------------------------------------- ----- >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > Douglas Lab > University of Michigan > Department of Human Genetics > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 >
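For readers who want to quantify this kind of drift in their own gene lists, here is a minimal base-R sketch. It assumes you already have the symbols from both releases as named character vectors (for example, the output of getSYMBOL() under each installation). The six probesets and their symbols are copied from the outputs posted in this thread; `classify_changes` is a hypothetical helper written for this illustration, not a Bioconductor function.

```r
# Symbols reported in this thread for six probesets under each release.
old_sym <- c("238900_at" = "HLA-DRB1", "232583_at" = "NCK2",
             "223620_at" = "GPR34",    "205239_at" = "AREG",
             "236685_at" = NA,         "232614_at" = "BCL2")
new_sym <- c("238900_at" = "LOC100133484", "232583_at" = NA,
             "223620_at" = "GPR34",        "205239_at" = "AREG",
             "236685_at" = NA,             "232614_at" = NA)

# Hypothetical helper: label each probeset by how its annotation moved
# between the two releases. Both vectors must cover the same probesets
# in the same order.
classify_changes <- function(old, new) {
  stopifnot(identical(names(old), names(new)))
  both <- !is.na(old) & !is.na(new)
  status <- rep("never annotated", length(old))  # NA in both releases
  status[both & old == new]        <- "unchanged"
  status[both & old != new]        <- "changed"
  status[!is.na(old) & is.na(new)] <- "lost"
  status[is.na(old) & !is.na(new)] <- "gained"
  data.frame(probeset = names(old), old = unname(old), new = unname(new),
             status = status, stringsAsFactors = FALSE)
}

cmp <- classify_changes(old_sym, new_sym)
print(cmp)
print(table(cmp$status))
```

Running the same comparison over a full list of probesets gives the kind of changed-versus-lost breakdown discussed in this thread, and makes it easy to pull out the "lost" probesets for manual checking.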
Hi Alex,

We have taken this up as an internal topic of discussion and hopefully will be able to come up with a solution that is a bit better than what we are doing right now.

However, some of this is going to be unavoidable. The chips were based on UniGene build 133 (and we are now on build 219), so things are going to tend to move around. But I don't think this is the main reason for what you are seeing, as the latest Affy annotation file is still based on UCSC build hg18 (which is based on NCBI build 36.1), so these data are over three years old.

Instead, it appears that Affy have changed something about how they are annotating things. Some of this is an improvement, and some not so much.

For instance, 203315_at does interrogate NCK2:

http://genome.ucsc.edu/cgi-bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=136510693&hgt.out3=10x&position=chr2%3A105876577-105876826&hgtgroup_map_close=0&hgtgroup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_close=1&hgtgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_encodeCompAndVar_close=1

whereas 232583_at does not:

http://genome.ucsc.edu/cgi-bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=136510693&hgt.out3=10x&position=chr2%3A105752637-105752886&hgtgroup_map_close=0&hgtgroup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_close=1&hgtgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_encodeCompAndVar_close=1

so in this case they have improved their annotations.

In the case of 238900_at, they seem to have added in a bunch of EG IDs that are based on a model rather than being actual reviewed genes. In addition, that probeset seems to hit an intron, so adding in all this other annotation appears pointless:

http://genome.ucsc.edu/cgi-bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=136511081&hgt.out3=10x&position=chr6%3A32653723-32653972&hgtgroup_map_close=0&hgtgroup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_close=1&hgtgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_encodeCompAndVar_close=1

What this really comes down to is the fact that you can't take anything for granted. Not only do you have to validate a set of significant results using some other technology, you also need to ensure that the chip you are using is actually measuring what you think it is measuring prior to doing the validation.

It should be relatively painless to set up a pipeline where you could use the BSgenome.Hsapiens.UCSC.hg18 package along with functions from BSgenome/Biostrings to map probesets to the genome and then use the org.Hs.eg.db package to see if a given probeset is even interrogating the gene it is supposed to. One could then use rtracklayer to visualize things in the UCSC genome browser to make sure you aren't interrogating an intron.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
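Once probe alignments and exon coordinates are available, the final check in the pipeline described above comes down to an interval comparison. The following is only a base-R sketch of that last step, under stated assumptions: the two probe windows are the hg18 positions from the UCSC links in this thread, but the exon table is HYPOTHETICAL (placeholder coordinates chosen so the sketch mirrors the outcome described in the thread, not the real NCK2 gene model), and `probe_hits_exon` is an illustrative helper, not a function from BSgenome, Biostrings, or rtracklayer. In a real pipeline the alignments and exon models would come from those packages and a transcript annotation.

```r
# Probe alignment windows taken from the UCSC links in this thread (hg18).
probes <- data.frame(
  probeset = c("203315_at", "232583_at"),
  chrom    = c("chr2", "chr2"),
  start    = c(105876577, 105752637),
  end      = c(105876826, 105752886),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL exon table: placeholder coordinates for illustration only,
# NOT the real NCK2 exon model.
exons <- data.frame(
  gene  = c("NCK2", "NCK2"),
  chrom = c("chr2", "chr2"),
  start = c(105864000, 105875000),
  end   = c(105865000, 105877500),
  stringsAsFactors = FALSE
)

# For each probe window, return the gene(s) whose exons it overlaps
# (closed intervals, same chromosome); NA means no exon is hit.
probe_hits_exon <- function(probes, exons) {
  vapply(seq_len(nrow(probes)), function(i) {
    hit <- exons$chrom == probes$chrom[i] &
      exons$start <= probes$end[i] &
      exons$end >= probes$start[i]
    if (any(hit)) {
      paste(unique(exons$gene[hit]), collapse = ";")
    } else {
      NA_character_
    }
  }, character(1))
}

probes$exon_gene <- probe_hits_exon(probes, exons)
print(probes[, c("probeset", "exon_gene")])
```

A probeset whose window returns NA here would be a candidate for the intronic or intergenic cases discussed above, and worth inspecting in the genome browser before trusting its gene label.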
Thank you.

Why is it not "best" to include everything Affy includes, i.e. all the synonyms and IDs? Or is this not possible in software?
and BioC 1.9 >>>> >>>>> probes<-c("238900_at" , "232583_at", "236307_at" ,"223620_at" , >>>>> "219759_at" , >>>> + "201702_s_at" , "232882_at" , "213446_s_at", "234033_at", >>>> "243006_at" , + "244648_at" , "243691_at" , "239264_at" , >>>> "243546_at" , "205239_at" , >>>> + "1565703_at" , "244061_at" , "230505_at" , "242688_at" , >>>> "1556474_a_at", >>>> + "232614_at" , "1565689_at" , "236685_at" , "225173_at" , >>>> "241893_at") >>>>> LLs<- getLL(rownames(sel), "hgu133plus2") >>>>> symbols<- getSYMBOL(rownames(sel), "hgu133plus2") >>>>> sel1<- cbind(probes, LLs, symbols) >>>>> sel1 >>>> probes LLs symbols 238900_at >>>> "238900_at" "3123" "HLA-DRB1" 232583_at "232583_at" "8440" >>>> "NCK2" 236307_at "236307_at" "60468" "BACH2" 223620_at >>>> "223620_at" "2857" "GPR34" 219759_at "219759_at" "64167" >>>> "ERAP2" 201702_s_at "201702_s_at" "5514" "PPP1R10" 232882_at >>>> "232882_at" "2308" "FOXO1" 213446_s_at "213446_s_at" "8826" >>>> "IQGAP1" 234033_at "234033_at" "9693" "RAPGEF2" 243006_at >>>> "243006_at" "2534" "FYN" 244648_at "244648_at" "54520" >>>> "CCDC93" 243691_at "243691_at" "23142" "DCUN1D4" 239264_at >>>> "239264_at" "60412" "EXOC4" 243546_at "243546_at" "143686" >>>> "SESN3" 205239_at "205239_at" "374" "AREG" 1565703_at >>>> "1565703_at" "4089" "SMAD4" 244061_at "244061_at" "55843" >>>> "ARHGAP15" 230505_at "230505_at" "145474" "LOC145474" >>>> 242688_at "242688_at" "9320" "TRIP12" 1556474_a_at >>>> "1556474_a_at" "285097" "FLJ38379" 232614_at "232614_at" "596" >>>> "BCL2" 1565689_at "1565689_at" "3839" "KPNA3" 236685_at >>>> "236685_at" NA NA 225173_at "225173_at" >>>> "93663" "ARHGAP18" 241893_at "241893_at" "4249" "MGAT5" >>>>> sessionInfo() >>>> R version 2.4.1 (2006-12-18) i386-pc-mingw32 locale: >>>> LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONE TARY=Spani >>>> sh_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252 >>>> >>>> >>>> attached base packages: >>>> [1] "tools" "stats" "graphics" "grDevices" >>>> [5] "utils" "datasets" 
"methods" "base" other attached >>>> packages: >>>> annotate Biobase hgu133plus2 "1.12.1" "1.12.2" "1.14.0" >>>> ######################################################## >>>> >>>> In summary. If I use R 2.4/BioC 1.9 I obtain the same results I >>>> ibtained 2 years ago, but If I do the same steps using R2.9/BioC2.4 >>>> the results change dramatically. >>>> I have repeated the analyses using BioC 2.01 in R 2.7 and BioC 2.2 in >>>> R 2.8 (results not shown here). BioC 2.0 yield the same as 1.9 and >>>> BioC 2.2 the same as 2.4, >>>> >>>> Any help to understand what's happening would be appreciated >>>> >>>> Alex Sanchez >>>> >>>> ----------------------------------------------------------------- ---------- >>>> -------------------------- >>>> >>>> Dr. Alex S?nchez. Statistics Department. University of Barcelona. >>>> Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain >>>> asanchez_at_ub.edu >>>> Statistics and Bioinformatics Unit >>>> Institut de Recerca. Hospital Universitari Vall 'Hebron >>>> Passeig Vall d'Hebron 112-119. 08034 Barcelona >>>> asanchez_at_ir.vhebron.net >>>> ----------------------------------------------------------------- ---------- >>>> ------------------------- >>>> >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> >>>> ----------------------------------------------------------------- ------- >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> Douglas Lab >>> University of Michigan >>> Department of Human Genetics >>> 5912 Buhl >>> 1241 E. Catherine St. >>> Ann Arbor MI 48109-5618 >>> 734-615-7826 >>> >> >> > > -- > James W. MacDonald, M.S. 
> Biostatistician > Douglas Lab > University of Michigan > Department of Human Genetics > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
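For anyone re-running an old analysis, one quick way to see how far the annotations have drifted is to put the two snapshots side by side and classify each probeset. The following is only a minimal sketch in base R: it uses the Entrez Gene IDs of the first ten probesets from the two runs in the original post, and the object names (`old`, `new`, `status`) are illustrative, not part of any Bioconductor package.

```r
# Entrez Gene IDs for the first ten probesets, copied from the two runs
# reported in the original post (old = BioC 1.9, new = BioC 2.4).
probes <- c("238900_at", "232583_at", "236307_at", "223620_at", "219759_at",
            "201702_s_at", "232882_at", "213446_s_at", "234033_at", "243006_at")
old <- c("3123", "8440", "60468", "2857", "64167",
         "5514", "2308", "8826", "9693", "2534")
new <- c("100133484", NA, NA, "2857", "64167",
         "5514", NA, "8826", NA, NA)

# Classify each probeset. Anything not flagged below keeps the default
# label "unchanged" (this includes probesets that are NA in both runs).
status <- rep("unchanged", length(probes))
status[!is.na(old) & !is.na(new) & old != new] <- "changed"
status[!is.na(old) & is.na(new)] <- "lost"
names(status) <- probes

# Summary counts and the probesets that lost their annotation.
print(table(status))
lost <- names(status)[status == "lost"]
print(lost)
```

On these ten probesets this reports five annotations lost, one changed (238900_at) and four unchanged. The same comparison should extend to a full result list by replacing the hand-typed vectors with the columns of the two annotation tables.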
Hi Loren,

I am working on this right now. But it is really not as simple as it sounds. I hope to have more to say about this very soon.

Marc

Loren Engrav wrote:
> Thank you
>
> Why is it not "best" to include everything Affy includes? All the synonyms and IDs
>
> Or is this not software possible?
>
>> From: "James W. MacDonald" <jmacdon at med.umich.edu>
>> Date: Thu, 09 Jul 2009 09:36:46 -0400
>> To: Alex Sanchez <asanchez at ub.edu>
>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] Changes in annotations?
>>
>> Hi Alex,
>>
>> We have taken this up as an internal topic of discussion and hopefully will be able to come up with a solution that is a bit better than what we are doing right now.
>>
>> However, some of this is going to be unavoidable. The chips were based on UniGene build 133 (and we are now on build 219), so things are going to tend to move around. But I don't think this is the main reason for what you are seeing, as the latest Affy annotation file is still based on UCSC build hg18 (which is based on NCBI build 36.1), so these data are over three years old.
>>
>> Instead, it appears that Affy have changed something about how they are annotating things. Some of this is an improvement, and some not so much.
>>
>> For instance, 203315_at does interrogate NCK2:
>>
>> http://genome.ucsc.edu/cgi-bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=136510693&hgt.out3=10x&position=chr2%3A105876577-105876826&hgtgroup_map_close=0&hgtgroup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_close=1&hgtgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_encodeCompAndVar_close=1
>>
>> whereas 232583_at does not:
>>
>> http://genome.ucsc.edu/cgi-bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=136510693&hgt.out3=10x&position=chr2%3A105752637-105752886&hgtgroup_map_close=0&hgtgroup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_close=1&hgtgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_encodeCompAndVar_close=1
>>
>> so in this case they have improved their annotations.
>>
>> In the case of 238900_at, they seem to have added in a bunch of EG IDs that are based on a model rather than being actual reviewed genes. In addition, that probeset seems to hit an intron, so adding in all this other annotation appears pointless:
>>
>> http://genome.ucsc.edu/cgi-bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=136511081&hgt.out3=10x&position=chr6%3A32653723-32653972&hgtgroup_map_close=0&hgtgroup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_close=1&hgtgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_encodeCompAndVar_close=1
>>
>> What this really comes down to is the fact that you can't take anything for granted. Not only do you have to validate a set of significant results using some other technology, you also need to ensure that the chip you are using is actually measuring what you think prior to doing the validation.
>>
>> It should be relatively painless to set up a pipeline where you could use the BSgenome.Hsapiens.UCSC.hg18 package along with functions from BSgenome/Biostrings to map probesets to the genome and then use the org.Hs.eg.db package to see if a given probeset is even interrogating the gene it is supposed to. One could then use rtracklayer to visualize things in the UCSC genome browser to make sure you weren't interrogating an intron.
>>
>> Best,
>>
>> Jim
>>
>> Alex Sanchez wrote:
>>
>>> Hi James and thanks for the help.
>>>
>>> I know, of course, that what Bioconductor does is to take the annotations from public data sources. I will now turn to Affymetrix to see if someone there, or in their forums, can explain why so many annotations have been turned into "NA's".
>>>
>>> There seem to be two problems here:
>>>
>>> - The first one, pointed out by Loren in his message today, is synonyms. For instance, the first probeset in my list was 238900_at. In the first version I used (BioC 1.9) the gene symbol provided by getSYMBOL was HLA-DRB1, whereas now (BioC 2.4) it is LOC100133484. However, the original symbol is still in the Affymetrix annotation file "HG-U133_Plus_2.na28.annot.csv". It seems as if the new symbol is intended to show the relation with the new Entrez ID (100133484). In any case it is annoying, but it can be lived with.
>>> - What seems worse is what happens to some annotations. If, for instance, I take the second probeset in my list, 232583_at, and I look for it in the Affy annotation file, I find that apart from some links to databases such as GenBank it no longer has a gene symbol or an Entrez ID (or most other annotations). In the previous version of the annotations this probeset was one of two associated with gene NCK2 (232583_at; 203315_at). The problem is that the second probeset (203315_at) was not selected by the analysis. That is, the only evidence suggesting that gene NCK2 was differentially expressed (232583_at) no longer supports it.
>>>
>>> Last, but by no means least, this is not happening to just a few probesets. My original list had 417 probesets. 210 have changed/synonymized their gene symbols, and of these 210, 160 have become NA's.
>>>
>>> Seems too many to feel comfortable :-(
>>>
>>> Thanks for the help
>>>
>>> Alex
>>>
>>> Dr. Alex Sánchez.
>>> Associate Professor. Statistics Department. University of Barcelona.
>>> Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain
>>> asanchez_at_ub.edu
>>> Statistics and Bioinformatics Unit
>>> Institut de Recerca. Hospital Universitari Vall d'Hebron
>>> Passeig Vall d'Hebron 112-119.
>>> 08034 Barcelona
>>> asanchez_at_ir.vhebron.net
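The check Jim describes (does a probeset's target region actually fall inside an exon of the gene it is annotated to?) boils down to an interval-overlap test once the coordinates are in hand. Below is a rough base-R sketch of that last step only. The probe-target windows are the hg18 positions from the UCSC links quoted above; the exon table is an assumption, filled with invented placeholder intervals that are not real NCK2 or HLA-DRB1 coordinates and were picked only so that both outcomes are exercised. The helper name `hits_exon` is hypothetical as well.

```r
# Probe-target windows taken from the UCSC links quoted in this thread (hg18).
targets <- data.frame(
  probeset = c("203315_at", "232583_at", "238900_at"),
  chr      = c("chr2", "chr2", "chr6"),
  start    = c(105876577, 105752637, 32653723),
  end      = c(105876826, 105752886, 32653972),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL exon intervals, for illustration only. In a real check these
# would come from a transcript annotation matching the same genome build.
exons <- data.frame(
  chr   = c("chr2", "chr2", "chr6"),
  start = c(105876000, 105800000, 32650000),
  end   = c(105877000, 105800500, 32650400),
  stringsAsFactors = FALSE
)

# TRUE if the window overlaps at least one exon on the same chromosome.
hits_exon <- function(chr, start, end, exons) {
  any(exons$chr == chr & exons$start <= end & exons$end >= start)
}

targets$in_exon <- mapply(hits_exon, targets$chr, targets$start, targets$end,
                          MoreArgs = list(exons = exons), USE.NAMES = FALSE)
print(targets)
```

With real exon coordinates, probesets flagged `FALSE` would be the ones to inspect by hand in the genome browser. For whole-chip checks, an interval package such as GenomicRanges (its `findOverlaps` function) would probably be a better fit than this row-by-row loop.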
Thank you > From: Marc Carlson <mcarlson at="" fhcrc.org=""> > Date: Fri, 10 Jul 2009 08:46:36 -0700 > To: Loren Engrav <engrav at="" u.washington.edu=""> > Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> > Subject: Re: [BioC] Changes in annotations? > > Hi Loren, > > I am working on this right now. But it is really not as simple as it > sounds. I hope to have more to say about this very soon. > > > Marc > > > > Loren Engrav wrote: >> Thank you >> >> Why is it not "best" to include everything Affy includes? All the synonyms >> and IDs >> >> Or is this not software possible? >> >> >> >> >>> From: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >>> Date: Thu, 09 Jul 2009 09:36:46 -0400 >>> To: Alex Sanchez <asanchez at="" ub.edu=""> >>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> >>> Subject: Re: [BioC] Changes in annotations? >>> >>> Hi Alex, >>> >>> We have taken this up as an internal topic of discussion and hopefully >>> will be able to come up with a solution that is a bit better than what >>> we are doing right now. >>> >>> However, some of this is going to be unavoidable. The chips were based >>> on UniGene build 133 (and we are now on build 219), so things are going >>> to tend to move around. But I don't think this is the main reason for >>> what you are seeing, as the latest Affy annotation file is still based >>> on UCSC build hg18 (which is based on NCBI build 36.1), so these data >>> are over three years old. >>> >>> Instead, it appears that Affy have changed something about how they are >>> annotating things. Some of this is an improvement, and some not so much. 
>>> >>> For instance, 203315_at does interrogate NCK2: >>> >>> http://genome.ucsc.edu/cgi- bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=1365 >>> 10 >>> 693&hgt.out3=10x&position=chr2%3A105876577-105876826&hgtgroup_map_ close=0&hg >>> tg >>> roup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&h gtgroup_ex >>> pr >>> ession_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close =0&hgtgrou >>> p_ >>> varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevel s_close=1& >>> hg >>> tgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_en codeCompAn >>> dV >>> ar_close=1 >>> >>> whereas 232583_at does not: >>> >>> http://genome.ucsc.edu/cgi- bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=1365 >>> 10 >>> 693&hgt.out3=10x&position=chr2%3A105752637-105752886&hgtgroup_map_ close=0&hg >>> tg >>> roup_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&h gtgroup_ex >>> pr >>> ession_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close =0&hgtgrou >>> p_ >>> varRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevel s_close=1& >>> hg >>> tgroup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_en codeCompAn >>> dV >>> ar_close=1 >>> >>> so in this case they have improved their annotations. >>> >>> In the case of 238900_at, they seem to have added in a bunch of EG IDs >>> that are based on a model rather than being actual reviewed genes. 
In >>> addition, that probeset seems to hit an intron, so adding in all this >>> other annotation appears pointless: >>> >>> http://genome.ucsc.edu/cgi- bin/hgTracks?insideX=115&revCmplDisp=0&hgsid=1365 >>> 11 >>> 081&hgt.out3=10x&position=chr6%3A32653723-32653972&hgtgroup_map_cl ose=0&hgtg >>> ro >>> up_phenDis_close=1&hgtgroup_genes_close=0&hgtgroup_rna_close=0&hgt group_expr >>> es >>> sion_close=1&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0 &hgtgroup_ >>> va >>> rRep_close=0&hgtgroup_encodeGenes_close=1&hgtgroup_encodeTxLevels_ close=1&hg >>> tg >>> roup_encodeChip_close=1&hgtgroup_encodeChrom_close=1&hgtgroup_enco deCompAndV >>> ar >>> _close=1 >>> >>> What this really comes down to is the fact that you can't take anything >>> for granted. Not only do you have to validate a set of significant >>> results using some other technology, you also need to ensure that the >>> chip you are using is actually measuring what you think prior to doing >>> the validation. >>> >>> It should be relatively painless to set up a pipeline where you could >>> use the BSgenome.Hsapiens.UCSC.hg18 package along with functions from >>> BSgenome/Biostrings to map probesets to the genome and then use the >>> org.Hs.eg.db package to see if a given probeset is even interrogating >>> the gene it is supposed to. One could then use rtracklayer to visualize >>> things in the UCSC genome browser to make sure you weren't interrogating >>> an intron. >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> >>> Alex Sanchez wrote: >>> >>>> Hi James and thanks for the help. >>>> >>>> I know, of course, that what Bioconductor does is to take the >>>> annotations from public data sources. >>>> I will now turn to affymetrix to see if someone there, or in their >>>> forums, can explain why so many annotations have been turned into "NA's" >>>> >>>> There seem to be two problems here >>>> - The first one, pointed by Loren in his message today is synonims. 
>>>> For instance, the first probeset in my list was 238900_at.
>>>> In the first version I used (BioC 1.9) the gene symbol provided by
>>>> getSYMBOL was HLA-DRB1, whereas now (BioC 2.4) it is LOC100133484.
>>>> However, the original symbol is still in the Affymetrix annotation file
>>>> "HG-U133_Plus_2.na28.annot.csv". It seems as if the new symbol is intended
>>>> to show the relation with the new Entrez ID (100133484).
>>>> In any case it is annoying, but it can be accepted.
>>>> - What seems worse is what happens to some annotations: If, for
>>>> instance, I take the second probeset in my list, 232583_at, and I look
>>>> for it in the Affy annotation file, I find that apart from some links to
>>>> databases such as GenBank it no longer has a gene symbol or an Entrez ID
>>>> (or most other annotations).
>>>> In the previous version of the annotations this probeset was one of two
>>>> associated with gene NCK2 (232583_at;203315_at).
>>>> The problem is that the second probeset (203315_at) was not selected by
>>>> the analysis. That is, the only evidence suggesting that gene NCK2 was
>>>> differentially expressed (232583_at) no longer suggests it.
>>>>
>>>> Last, but by no means least, this is not happening to just a few probesets.
>>>> My original list had 417 probesets. 210 have changed their gene symbols
>>>> (or been given synonyms), and of these 210, 160 have become NAs.
>>>>
>>>> Seems too many to feel comfortable :-(
>>>>
>>>> Thanks for the help
>>>>
>>>> Alex
>>>>
>>>> Dr. Alex Sánchez.
>>>> Associate Professor. Statistics Department. University of Barcelona.
>>>> Facultat de Biologia UB. Avda Diagonal 645. 08028 Barcelona. Spain
>>>> asanchez_at_ub.edu
>>>> Statistics and Bioinformatics Unit
>>>> Institut de Recerca. Hospital Universitari Vall d'Hebron
>>>> Passeig Vall d'Hebron 112-119. 08034 Barcelona
>>>> asanchez_at_ir.vhebron.net
>>>>
>>>> ----- Original Message -----
>>>> From: "James W. MacDonald" <jmacdon at med.umich.edu>
>>>> To: "Alex Sanchez" <asanchez at ub.edu>
>>>> Cc: <bioconductor at stat.math.ethz.ch>
>>>> Sent: Monday, July 06, 2009 3:58 PM
>>>> Subject: Re: [BioC] Changes in annotations?
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> This is a question that comes up on the Bioc list fairly regularly,
>>>>> and the answer is in two parts:
>>>>>
>>>>> First, the annotations supplied in the various metadata packages
>>>>> supplied by BioC are *not* our annotations, but are simply a
>>>>> re-packaging of data we collect from various sources. As an example,
>>>>> we use the mappings of Affymetrix Probe ID to Entrez Gene ID from the
>>>>> annotation csv files you can download from the Affy website. We then
>>>>> map the Entrez Gene IDs to other annotation using primarily NCBI data.
>>>>> So if you go to Affy's netaffx site (free registration required) and
>>>>> query on say, 238900_at, you get this:
>>>>>
>>>>> https://www.affymetrix.com/analysis/netaffx/fullrecord.affx?pk=HG-U133_PLUS_2%3A238900_AT
>>>>>
>>>>> And you will note that the first Entrez Gene ID listed there is
>>>>> 100133484, which happens to be a defunct ID. However, this is the
>>>>> first of many listed there (and we need a one-to-one mapping), so we
>>>>> chose that one. A more likely Entrez Gene ID can be found further down
>>>>> the list, but we simply don't have the resources to figure out if
>>>>> there is a better choice in that list (for every reporter on every
>>>>> Affy chip we annotate). Nor do we have the resources to ensure that
>>>>> any of the mappings that Affy make are reasonable to begin with.
>>>>> We have to trust that they (with *way* more resources than us) are doing
>>>>> a reasonable job.
>>>>>
>>>>> The second part of the answer has to do with the 'moving target'
>>>>> aspect of Biological annotations. These data change all the time, and
>>>>> there is the recurring question of whether one should do an analysis
>>>>> and 'freeze' it to that point in time, or should the annotations be
>>>>> updated on a regular basis, with the realization that things can and
>>>>> will change?
>>>>>
>>>>> Without looking at each reporter ID you list, I can't say if the
>>>>> changes are due to Affy changing their annotation csv files, or to
>>>>> changing knowledge of the genome, but I suspect it is a combination of
>>>>> the two.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>> Alex Sanchez wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I have had to review recently an analysis I did some time ago. This
>>>>>> was done on affymetrix hgu133plus2 chips with R 2.4 and BioC 1.9. I
>>>>>> have re-run the analyses using R 2.9 and BioC 2.4 (sessionInfo below).
>>>>>> I have been surprised by the changes in the annotations: Many
>>>>>> probesets that had had an annotation have become NA's whereas some
>>>>>> have changed their symbol and their Entrez gene.
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> Douglas Lab
>>>>> University of Michigan
>>>>> Department of Human Genetics
>>>>> 5912 Buhl
>>>>> 1241 E. Catherine St.
>>>>> Ann Arbor MI 48109-5618
>>>>> 734-615-7826
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
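
For anyone who wants to quantify the kind of change discussed in this thread, here is a minimal base-R sketch that tallies how many probesets kept, changed, or lost their Entrez Gene ID between two annotation releases. The two ID vectors are copied by hand from the BioC 1.9 and BioC 2.4 outputs posted in the original question; the names `old_eg`, `new_eg` and `status` are made up for this illustration and are not part of any package. In practice one would presumably fill the vectors from `getEG()` (or `getLL()` in the old release) rather than typing them in.

```r
# Probesets from the original question, in the order they were posted
probes <- c("238900_at", "232583_at", "236307_at", "223620_at", "219759_at",
            "201702_s_at", "232882_at", "213446_s_at", "234033_at", "243006_at",
            "244648_at", "243691_at", "239264_at", "243546_at", "205239_at",
            "1565703_at", "244061_at", "230505_at", "242688_at", "1556474_a_at",
            "232614_at", "1565689_at", "236685_at", "225173_at", "241893_at")

# Entrez Gene IDs reported with R 2.4 / BioC 1.9 (hand-copied from the post)
old_eg <- c("3123", "8440", "60468", "2857", "64167",
            "5514", "2308", "8826", "9693", "2534",
            "54520", "23142", "60412", "143686", "374",
            "4089", "55843", "145474", "9320", "285097",
            "596", "3839", NA, "93663", "4249")

# Entrez Gene IDs reported with R 2.9 / BioC 2.4 (hand-copied from the post)
new_eg <- c("100133484", NA, NA, "2857", "64167",
            "5514", NA, "8826", NA, NA,
            NA, NA, NA, NA, "374",
            "4089", NA, "145474", NA, "285097",
            NA, NA, NA, "93663", NA)

# Classify each probeset by what happened to its Entrez Gene ID
status <- ifelse(is.na(old_eg) & is.na(new_eg), "still NA",
          ifelse(!is.na(old_eg) & is.na(new_eg), "lost",
          ifelse(is.na(old_eg), "gained",
          ifelse(old_eg == new_eg, "unchanged", "changed"))))

comparison <- data.frame(probe = probes, old = old_eg, new = new_eg,
                         status = status, stringsAsFactors = FALSE)

# Summary counts per category
print(table(comparison$status))

# The probesets that need a closer look
print(comparison[comparison$status %in% c("lost", "changed"), ])
```

For these 25 probesets the tally is 14 lost, 9 unchanged, 1 changed (238900_at) and 1 that was NA in both releases. The same table could be extended with a symbol column if symbol changes for an unchanged Entrez ID (synonyms) should be counted separately.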