Fwd: Annotation discrepancy
Eric Zollars ▴ 30
@eric-zollars-6299
United States
All-

I have been attempting to compare the probe sequences on the HGU133 Plus 2.0 chip with those on the HT HGU133+ PM chip. I am doing this to compare the values of the vectors in frma.

The HT chip is a subset of HGU133 Plus 2.0 with the mismatch probes removed and some probesets reduced in size.

Looking at the probe package:

hthgu133pluspmprobe$sequence: 519370

However, an AffyBatch object ('dat') made from HT CEL files gives a different count:

Index <- pmindex(dat)
tv = unlist(Index)
length(tv) #536460

It appears that the AffyBatch reports 536460 sequences, while the hthgu133pluspmprobe package reports only 519370.

What is the difference? Is it possible to find information on the 17090 sequences that are not in the hthgu133pluspmprobe package?

Thanks for any information or direction.

Eric Zollars

Session info below: Bioconductor 2.13, R 3.0.2

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] affy_1.40.0              hthgu133pluspmcdf_2.13.0    hgu133plus2frmavecs_1.3.0
[4] hgu133plus2probe_2.13.0  hthgu133pluspmprobe_2.13.0  AnnotationDbi_1.24.0
[7] Biobase_2.22.0           BiocGenerics_0.8.0          BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] affyio_1.30.0          DBI_0.2-7       IRanges_1.20.6
[4] preprocessCore_1.24.0  RSQLite_0.11.4  stats4_3.0.2
[7] tools_3.0.2            zlibbioc_1.8.0

--
Eric Zollars MD, PhD
Fellow, Division of Rheumatology
The Johns Hopkins Hospital
@james-w-macdonald-5106
United States
Hi Eric,

Most if not all of those probes are the oligo-dT probes that surround the chip (and I believe there are some in the middle as well). These probes are used by the scanner as 'landing lights' to allow the scanner to accurately align to the array prior to doing the scan.

The scanner does collect data from these probes, which ends up in the cel file, but they are then ignored when the array is processed further.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Jim-

Thanks for the response.

However, in the hgu133plus2probe package there is complete agreement between what is in the probe package and what the AffyBatch object reports (604258 sequences).

Why would that be so?
Hi Eric,

Good point. So let's look, shall we?

> library(hthgu133pluspmprobe)
> library(hthgu133pluspmcdf)
> ht <- as.data.frame(hthgu133pluspmprobe)
> prb.lst <- tapply(1:nrow(ht), ht$Probe.Set.Name, function(x) ht[x,2:3])
> cdf.lst <- mget(ls(hthgu133pluspmcdf), hthgu133pluspmcdf)
> names(prb.lst) <- tolower(names(prb.lst)) ## because stupid Affy can't keep their names consistent
> names(cdf.lst) <- tolower(names(cdf.lst))
> all.equal(names(prb.lst), names(cdf.lst))
[1] TRUE
> prb.lst.len <- sapply(prb.lst, nrow)
> cdf.lst.len <- sapply(cdf.lst, nrow)
> all.equal(prb.lst.len, cdf.lst.len)
[1] "Mean relative difference: 427.25"
> length(which(prb.lst.len != cdf.lst.len))
[1] 40
> cbind(prb.lst.len, cdf.lst.len)[prb.lst.len != cdf.lst.len,]
                        prb.lst.len cdf.lst.len
affx-nonspecificgc10_at           1         952
affx-nonspecificgc11_at           1         960
affx-nonspecificgc12_at           1         973
affx-nonspecificgc13_at           1         968
affx-nonspecificgc14_at           1         960
affx-nonspecificgc15_at           1         949
affx-nonspecificgc16_at           1         963
affx-nonspecificgc17_at           1         942
affx-nonspecificgc18_at           1         912
affx-nonspecificgc19_at           1         849
affx-nonspecificgc20_at           1         813
affx-nonspecificgc21_at           1         697
affx-nonspecificgc22_at           1         585
affx-nonspecificgc23_at           1         407
affx-nonspecificgc24_at           1         268
affx-nonspecificgc25_at           1           9
affx-nonspecificgc3_at            1          25
affx-nonspecificgc4_at            1         322
affx-nonspecificgc5_at            1         703
affx-nonspecificgc6_at            1         873
affx-nonspecificgc7_at            1         914
affx-nonspecificgc8_at            1         940
affx-nonspecificgc9_at            1         959
affx-r2-taga_at                   1          11
affx-r2-tagb_at                   1          11
affx-r2-tagc_at                   1          11
affx-r2-tagd_at                   1          11
affx-r2-tage_at                   1          11
affx-r2-tagf_at                   1          11
affx-r2-tagg_at                   1          11
affx-r2-tagh_at                   1          11
affx-r2-tagin-3_at                1          11
affx-r2-tagin-5_at                1          11
affx-r2-tagin-m_at                1          11
affx-r2-tagj-3_at                 1          11
affx-r2-tagj-5_at                 1          11
affx-r2-tago-3_at                 1          11
affx-r2-tago-5_at                 1          11
affx-r2-tagq-3_at                 1          11
affx-r2-tagq-5_at                 1          11

So there you go - there's a bunch of control probes of different sorts for which Affy gives us a single sequence, but for which there appear to be lots of probes.

Netaffx seems unwilling to say much about the nonspecificgc probes, but as an example, it does say there are 11 individual probe sequences for e.g., affx-r2-tagin-3_at.

Best,

Jim
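The 40 probesets in that table appear to account for the whole discrepancy. The following is a minimal sketch in base R that uses only numbers copied from the output above (the variable names nonspecific_gc, r2_tag, cdf_len and prb_len are made up for illustration, and no Bioconductor packages are needed). It assumes each of the 40 probesets has exactly one row in the probe package, as the prb.lst.len column shows.

```r
# CDF probe counts for the 23 affx-nonspecificgc* probesets,
# copied from the cdf.lst.len column of the table above.
nonspecific_gc <- c(952, 960, 973, 968, 960, 949, 963, 942, 912, 849, 813, 697,
                    585, 407, 268, 9, 25, 322, 703, 873, 914, 940, 959)

# The 17 affx-r2-tag* probesets each have 11 probes in the CDF.
r2_tag <- rep(11, 17)

cdf_len <- c(nonspecific_gc, r2_tag)
prb_len <- rep(1, length(cdf_len))  # one sequence each in the probe package

# Probes present in the CDF but absent from the probe package.
extra <- sum(cdf_len - prb_len)

# Gap between the two totals reported in the original question.
gap <- 536460 - 519370

# Sum of absolute differences divided by the sum of the target values,
# taken over the differing elements only.
mean_rel_diff <- sum(abs(cdf_len - prb_len)) / sum(abs(prb_len))

print(c(probesets = length(cdf_len), extra = extra, gap = gap))
print(mean_rel_diff)
```

Both extra and gap come out to 17090, and mean_rel_diff is 427.25, the same figure all.equal() printed above. So, at least on these numbers, the control probesets alone explain the difference between pmindex() and the probe package.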