ath1121501probe_1.0 error (was GCRMA missing value error on ATH1 chip)

Matthew Hannah ▴ 940

@matthew-hannah-621


Thanks,

The main problem is that I'm a complete beginner with R, trying to learn as I go along. I've investigated a bit more, and because the ath1121501probe package already corresponds to the ATH1-121501_probe_tab file, there's no point in reading in that file. What I did, with assistance from help, was:

    > x <- ath1121501probe$Probe.Set.Name
    > y <- unique(x, incomparables = FALSE)
    > y

This returned a vector of the probeset names from the ath1121501probe package. It contains 22814 probesets, which is 4 more than there are on the chip! So this could account for the 43 extra sequences. The problem now is that I don't know a simple way of finding and eliminating them.

I can get a vector of the actual probesets on the chip with:

    > data <- ReadAffy()
    > gn <- geneNames(data)

So basically I want to compare gn with y to find the 4 values that occur only in y. I combined gn and y with:

    > z <- c(gn, y)

But the function I want to use, 'uniquecombs', is in the mgcv package, and I don't seem to be able to get that from CRAN for my Windows R 1.9 devel. Is there an easier way?

Thanks
Matt

-----Original Message-----
From: Robert Gentleman [mailto:rgentlem@jimmy.harvard.edu]
Sent: Friday, 13 February 2004 14:04
To: Matthew Hannah
Subject: Re: [BioC] ath1121501probe_1.0 error (was GCRMA missing value error on ATH1 chip)

On Fri, Feb 13, 2004 at 12:59:48PM +0100, Matthew Hannah wrote:
> Thanks,
>
> I've investigated this some more and found that the ATH1-121501_probe_tab.zip
> file from the affy website contains 251,121 sequences whilst the CEL files and
> the ATH1-121501_probe_fasta.zip only contain 251,078 probes. It therefore seems
> that the errors were there in the tab file before the BioC ath1121501probe
> package was made. I've emailed affymetrix about it but don't expect a quick
> response judging from past queries.
>
> So does anyone know how to find the extra values in the tab file? It doesn't
> look like there are simply extra values added at the start or finish. Does anyone
> familiar with R know how to obtain a list of Affy ID vs. # of probes from the
> ath1121501probe package or by reading in the ATH1-121501_probe_tab file? This
> would be easy to cross-reference with the Affy ID vs. probe number that you get
> from the CEL file during MAS5 analysis.

Basically you can look at the file format, then use scan to suck in pretty much anything. Stick the input into a data.frame and then compare the affy ids from the two files. I can't imagine that it is more than a 1/2 hour of work. And if you did it generally we could add it to one of the packages (matchprobes?) so that others could do the same if this problem resurfaces.

Because of the costs involved in changing the layout of a chip I expect that Affy often has to drop some probes from the analysis. However, since they do not seem to version anything (or at least not the last time I checked) there is very little point to checking until a problem is found - like now.

>
> Has this been an issue for any other chips, are we just trusting affymetrix to
> provide the correct sequence data? I've seen some data showing that ~700 ATH1
> probesets don't match their intended target when an independent BLAST was done.
> It would be nice to have a tool here too.

We have played a bit with notions of sensitivity and specificity (do the probes go to the gene they are annotated at, and do they only go there). That would not be too hard to do (although some substantial computing resource would be needed, and again a lack of version numbers on Affy's part makes life a little hard). However, a somewhat larger problem looms, and that is determining just what to blast against (and probably with short sequences I would not blast but rather use some sort of perfect matching - with 1 error algorithm; the Biostrings package has some stuff that could be used for this purpose).

Robert

> Thanks
> Matt
>
> >Hi,
> > there seems to be a disagreement on how many pm probes there are on the
> >chip. This is causing problems in matching the pm intensities with
> >sequences. I am not sure if this is true for all ATH1 chips...
> >
> > After reading in your Cel file into "object",
> >###########
> > pmIndex <- unlist(indexProbes(object,"pm"))
> > length(pmIndex)
> > #[1] 251078
> > #however the probe package gives 251121 pm probe sequences.
> > length(get("ath1121501probe")$sequence)
> > [1] 251121
> >
> > right now I am not sure which should be fixed: whether the probe
> >package has some redundant sequences that are not PM probes or
> >indexProbes missed some pm probes?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

--
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax:   (617) 632-2444                    |
| Department of Biostatistics      office: M1B20                            |
| Harvard School of Public Health  email: rgentlem@jimmy.harvard.edu        |
+---------------------------------------------------------------------------+
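
For the comparison asked about at the top of this post (finding the probeset names that are in y but not in gn), base R's setdiff() may be enough, so mgcv should not be needed. The following is only a minimal sketch: the vectors are made-up toy values standing in for ath1121501probe$Probe.Set.Name and geneNames(data), which are not loaded here, and the names probe_sets, extra_sets, probes_per_set, and cleaned are hypothetical.

```r
# Toy stand-ins (hypothetical values) for the two vectors in the post:
#   probe_sets plays the role of ath1121501probe$Probe.Set.Name (one entry per probe)
#   gn         plays the role of geneNames(data)                (one entry per probeset)
probe_sets <- c("A_at", "A_at", "B_at", "B_at", "B_at", "X_at", "Y_at", "Y_at")
gn <- c("A_at", "B_at")

# Unique probeset names in the probe package, as in the post
y <- unique(probe_sets)

# Probeset names present in the probe package but absent from the chip
extra_sets <- setdiff(y, gn)
print(extra_sets)

# Affy ID vs. number of probes, the cross-reference table asked for in the thread
probes_per_set <- table(probe_sets)
print(probes_per_set[extra_sets])

# Total number of probes belonging to the extra probesets
n_extra <- sum(probes_per_set[extra_sets])

# Keep only the probes whose probeset exists on the chip
cleaned <- probe_sets[probe_sets %in% gn]
```

With the real data, one would substitute the actual x and gn vectors; if the sum over the extra probesets came to 43, that would be consistent with the 251,121 vs. 251,078 discrepancy, but that has not been checked here.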

GO probe affy gcrma Biostrings

21.9 years ago Matthew Hannah ▴ 940
