GEOquery: incomplete feature data from GPL soft file
On Mon, Jun 17, 2013 at 7:09 AM, Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za> wrote:
> Hi,
>
> I am getting incorrect feature annotation data when loading a dataset from
> GPL4133.
> The feature data looks like this:
>
> head(fData(eset)[, 1:2])
>        ID  COL
> 12     12  266
> NA   <NA> <NA>
> NA.1 <NA> <NA>
> 15     15  266
> 16     16  266
> NA.2 <NA> <NA>
>
> This possibly also results in having less features in the final expression
> matrix, if it is at some point restricted to feature names matching the
> ones in the loaded annotation data.
>
> The real issue here seems to be with the soft file being badly formatted,
> with lines having double quotes where there should not be:
>
> 12 266 148 A_24_P66027 A_24_P66027 FALSE
> NM_004900 NM_004900 9582 APOBEC3B apolipoprotein B
> mRNA editing enzyme, catalytic polypeptide-like 3B" Hs.226307 ...
>
> Looking at the way GEOquery loads the annotation soft files, we see that
> they are read using quote="\"", which clearly returns a messed up
> data.frame.

Thanks, Renaud for the report. I finally got around to making this adjustment, so this should work for you now.

Sean
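
For readers who want to see the effect of the quote setting described in the report, here is a minimal sketch in base R. It does not use GEOquery's actual parsing code; the three-row table, its column names, and the two invented descriptions are made up for illustration, and only the stray double quote mirrors the GPL4133 line quoted above. It reads the same tab-separated text twice with read.table, once with quote="\"" and once with quoting disabled.

```r
# Toy tab-separated table (hypothetical data, not the real GPL4133 file).
# The first data row ends with a stray double quote, as in the report.
txt <- paste(
  "ID\tCOL\tDESCRIPTION",
  "12\t266\tcatalytic polypeptide-like 3B\"",
  "13\t266\tsecond toy feature",
  "15\t266\tthird toy feature",
  sep = "\n"
)

# With quote = "\"" the stray quote opens a quoted field, so the
# following lines can be absorbed into that field instead of being
# read as separate rows. Warnings are suppressed for the demo.
bad <- suppressWarnings(
  read.table(text = txt, sep = "\t", header = TRUE,
             quote = "\"", fill = TRUE, stringsAsFactors = FALSE)
)

# With quote = "" no character is treated as a quote, so every
# line is read as its own row.
good <- read.table(text = txt, sep = "\t", header = TRUE,
                   quote = "", stringsAsFactors = FALSE)

cat("rows read with quote = '\"':", nrow(bad), "\n")
cat("rows read with quote = '':  ", nrow(good), "\n")
```

If the two row counts differ on your own annotation table, stray quotes in the file are a likely cause, and the missing rows would explain NA entries like those in the fData output above.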