Question: GEOquery: incomplete feature data from GPL soft file
Sean Davis (United States) wrote, 5.0 years ago:
On Mon, Jun 17, 2013 at 7:09 AM, Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za> wrote:
> Hi,
>
> I am getting incorrect feature annotation data when loading a dataset from
> GPL4133.
> The feature data looks like this:
>
> head(fData(eset)[, 1:2])
>        ID  COL
> 12     12  266
> NA   <NA> <NA>
> NA.1 <NA> <NA>
> 15     15  266
> 16     16  266
> NA.2 <NA> <NA>
>
> This possibly also results in having less features in the final expression
> matrix, if it is at some point restricted to feature names matching the
> ones in the loaded annotation data.
>
> The real issue here seems to be with the soft file being badly formatted,
> with lines having double quotes where there should not be:
>
> 12 266 148 A_24_P66027 A_24_P66027 FALSE
> NM_004900 NM_004900 9582 APOBEC3B apolipoprotein B
> mRNA editing enzyme, catalytic polypeptide-like 3B" Hs.226307 ...
>
> Looking at the way GEOquery loads the annotation soft files, we see that
> they are read using `quote="\""`, which clearly returns a messed up
> data.frame.

Thanks, Renaud for the report. I finally got around to making this adjustment, so this should work for you now.

Sean
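
For readers who want to see the effect of the `quote` setting described in the report, here is a minimal sketch in base R. It does not reproduce GEOquery's internal code or the actual change made to it, neither of which is shown in this thread; the four-row table, its column names, and the variable names are made up for illustration, with stray double quotes placed the way the GPL4133 excerpt suggests.

```r
# Hypothetical tab-separated annotation table (NOT the real GPL4133 file).
# Rows 12 and 14 each contain a stray, unmatched double quote.
soft_lines <- c(
  "ID\tCOL\tDESCRIPTION\tUNIGENE",
  "12\t266\tpolypeptide-like 3B\"\tHs.1",
  "13\t266\tsecond feature\tHs.2",
  "14\t266\tthird 5\" feature\tHs.3",
  "15\t266\tfourth feature\tHs.4"
)

# quote = "\"": the first stray quote opens a quoted string that only ends at
# the next stray quote, so the lines in between are swallowed into one field.
with_quotes <- read.delim(text = soft_lines, quote = "\"",
                          stringsAsFactors = FALSE)

# quote = "": double quotes are treated as ordinary characters, so every
# line becomes its own row.
no_quotes <- read.delim(text = soft_lines, quote = "",
                        stringsAsFactors = FALSE)

nrow(with_quotes)  # fewer rows than the file contains
nrow(no_quotes)    # one row per data line
no_quotes$ID
```

Disabling quote handling is only appropriate when the file never relies on quotes to protect embedded tabs or newlines, which is the assumption made in this sketch.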