Matthew Hannah
Thanks,
The main problem is that I'm a complete beginner with R, trying to
learn as I go along. I've investigated a bit more, and because the
ath1121501probe package already corresponds to the
ATH1-121501_probe_tab file, there's no point in reading in that file.
What I did with assistance from help was-
>x <- ath1121501probe$Probe.Set.Name
>y <-unique(x, incomparables = FALSE)
>y
This returned a vector of the probeset names from the ath1121501probe
package containing 22814 probesets, which is 4 more than there are on
the chip! So this could account for the 43 extra sequences. The problem
now is that I don't know a simple way of finding and eliminating them.
I can get a vector of the actual probesets on the chip by-
>data <- ReadAffy()
>gn <- geneNames(data)
So basically I want to compare gn with y to find the 4 values that
occur only in y. I combined gn and y with-
>z <-c(gn, y)
But the function I want to use, 'uniquecombs', is in the mgcv package,
and I don't seem to be able to get that from CRAN for my Windows
R 1.9 devel.
Is there an easier way?
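One possibly easier route, as a minimal sketch using base R only: setdiff() returns the elements of its first argument that are missing from the second, so no extra package should be needed. The vectors below are invented toy stand-ins for gn and y, and the small data frame only assumes a Probe.Set.Name column like the one used above; the IDs and sequences are made up for illustration.

```r
# Toy stand-ins for the two vectors in the message; these IDs are invented.
gn <- c("244901_at", "244902_at", "244903_at")                # probesets on the chip
y  <- c("244901_at", "244902_at", "244903_at", "extra_1_at")  # probesets in the probe package

# Probeset names present in y but absent from gn
extra <- setdiff(y, gn)
print(extra)

# Toy probe table, one row per probe (hypothetical layout and contents)
probes <- data.frame(
  Probe.Set.Name = c("244901_at", "244901_at", "extra_1_at", "244903_at"),
  sequence       = c("ACGT", "CGTA", "GTAC", "TACG"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose probeset is actually on the chip
kept <- probes[probes$Probe.Set.Name %in% gn, ]
print(nrow(kept))
```

With the real objects, the same two calls (setdiff(y, gn) and the %in% filter) would be applied to y, gn and the probe package data frame directly.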
Thanks
Matt
-----Original Message-----
From: Robert Gentleman [mailto:rgentlem@jimmy.harvard.edu]
Sent: Friday, 13 February 2004 14:04
To: Matthew Hannah
Subject: Re: [BioC] ath1121501probe_1.0 error (was GCRMA missing value error on ATH1 chip)
On Fri, Feb 13, 2004 at 12:59:48PM +0100, Matthew Hannah wrote:
> Thanks,
>
> I've investigated this some more and found that the
> ATH1-121501_probe_tab.zip file from the Affy website contains 251,121
> sequences whilst the CEL files and the ATH1-121501_probe_fasta.zip only
> contain 251,078 probes. It therefore seems that the errors were there
> in the tab file before the BioC ath1121501probe package was made. I've
> emailed Affymetrix about it but don't expect a quick response judging
> from past queries.
>
> So does anyone know how to find the extra values in the tab file? It
> doesn't look like there are simply extra values added at the start or
> finish. Does anyone familiar with R know how to obtain a list of Affy
> ID vs. number of probes from the ath1121501probe package, or by
> reading in the ATH1-121501_probe_tab file? This would be easy to
> cross-reference with the Affy ID vs. probe number that you get from
> the CEL file during MAS5 analysis.
Basically you can look at the file format, then use scan to suck in
pretty much anything. Stick the input into a data.frame and then
compare the affy ids from the two files. I can't imagine that it is
more than a 1/2 hour of work. And if you did it generally we could
add it to one of the packages (matchprobes?) so that others could do
the same if this problem resurfaces.
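A rough sketch of that comparison, under stated assumptions: the tiny file written below stands in for the real probe_tab file, the column names ProbeSetName and ProbeSequence are hypothetical rather than the real header, and chip_counts is an invented stand-in for the probes-per-probeset counts obtained from the CEL/CDF side. read.delim is used here in place of a hand-written scan call.

```r
# Write a tiny tab-delimited file standing in for the real probe_tab file.
# Column names and contents are assumptions made for illustration only.
tab_file <- tempfile(fileext = ".txt")
writeLines(c(
  "ProbeSetName\tProbeSequence",
  "A_at\tACGTACGT",
  "A_at\tCGTACGTA",
  "A_at\tGTACGTAC",
  "B_at\tTTTTACGT",
  "B_at\tTTTACGTA"
), tab_file)

# Read the file into a data.frame
tab <- read.delim(tab_file, stringsAsFactors = FALSE)

# Probes per probeset according to the tab file
tab_counts <- table(tab$ProbeSetName)

# Hypothetical probes-per-probeset counts from the chip side
chip_counts <- c(A_at = 2L, B_at = 2L)

# Line the two counts up by Affy ID and flag the disagreements
cmp <- data.frame(
  id   = names(chip_counts),
  tab  = as.integer(tab_counts[names(chip_counts)]),
  chip = as.integer(chip_counts),
  stringsAsFactors = FALSE
)
cmp$diff <- cmp$tab - cmp$chip
print(cmp[cmp$diff != 0, ])
```

Any row with a non-zero diff marks a probeset for which the tab file and the chip disagree on the number of probes.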
Because of the costs involved in changing the layout of a chip I
expect that Affy often has to drop some probes from the analysis.
However, since they do not seem to version anything (or at least not
the last time I checked) there is very little point to checking until
a problem is found - like now.
>
> Has this been an issue for any other chips? Are we just trusting
> Affymetrix to provide the correct sequence data? I've seen some data
> showing that ~700 ATH1 probesets don't match their intended target
> when an independent BLAST was done.
>
It would be nice to have a tool here too. We have played a bit with
notions of sensitivity and specificity (do the probes go to the gene
they are annotated to, and do they go only there?). That would not be
too hard to do (although some substantial computing resources would
be needed, and again the lack of version numbers on Affy's part makes
life a little hard). However, a somewhat larger problem looms, and
that is determining just what to blast against (and with sequences
this short I would probably not blast but rather use some sort of
perfect-matching-with-one-error algorithm; the Biostrings package
has some stuff that could be used for this purpose).
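To make the matching-with-one-error idea concrete, here is a minimal base R sketch. It is not the Biostrings implementation: match_with_mismatch is a hypothetical helper written for this illustration, it allows substitutions only (no gaps), and the sequences are invented. Biostrings' matchPattern() with its max.mismatch argument should offer a packaged and much faster version of the same idea.

```r
# Return the 1-based start positions where `probe` matches `target`
# with at most `max_mismatch` mismatching bases (substitutions only, no gaps).
# Hypothetical helper for illustration; not part of any package.
match_with_mismatch <- function(probe, target, max_mismatch = 1) {
  p  <- strsplit(probe, "")[[1]]
  tg <- strsplit(target, "")[[1]]
  n_starts <- length(tg) - length(p) + 1
  if (n_starts < 1) return(integer(0))
  hits <- integer(0)
  for (s in seq_len(n_starts)) {
    window <- tg[s:(s + length(p) - 1)]
    # Count positions where the window and the probe disagree
    if (sum(window != p) <= max_mismatch) hits <- c(hits, s)
  }
  hits
}

# Invented target with one exact copy of the probe and one copy with a single error
target <- "TTACGTACCTTACGAACC"
print(match_with_mismatch("ACGTACC", target))
print(match_with_mismatch("ACGTACC", target, max_mismatch = 0))
```

A probe that hits more than one location at this tolerance would be a candidate for the specificity check described above.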
Robert
> Thanks
> Matt
>
>
>
> > Hi,
> > there seems to be a disagreement on how many pm probes there are on
> > the chip. This is causing problems in matching the pm intensities
> > with sequences. I am not sure if this is true for all ATH1 chips...
> >
> > After reading in your CEL file into "object",
> > ###########
> > pmIndex <- unlist(indexProbes(object, "pm"))
> > length(pmIndex)
> > # [1] 251078
> > # however the probe package gives 251121 pm probe sequences.
> > length(get("ath1121501probe")$sequence)
> > # [1] 251121
> >
> > right now I am not sure which should be fixed -- whether the probe
> > package has some redundant sequences that are not PM probes or
> > indexProbes missed some pm probes?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
--
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax: (617) 632-2444                      |
| Department of Biostatistics      office: M1B20                            |
| Harvard School of Public Health  email: rgentlem@jimmy.harvard.edu        |
+---------------------------------------------------------------------------+