Ibrahim Emam
Hi Steven,
On 15 Mar 2013, at 17:05, Steven Sheridan <ssherida@embl.de> wrote:
> Hi Ibrahim,
>
> There's a new issue with the queryAE() function. It now reports that
> no datasets have processed files (which is an issue, since my script
> processes only datasets with processed files--i.e., none). The broken
> line is this one:
> pr = getNodeSet(x2,"/experiments//fgem[@count]")
>
> The relevant part of the downloaded XML file is:
> <fgem name="E-GEOD-28146.processed.1.zip" available="true"/>
>
> Perhaps ArrayExpress changed "count" to "available" (it still says
> count="true" for raw data files). In any case, changing queryAE from
> fgem[@count] to fgem[@available] fixed the issue.
>
>
> But as you're probably aware, the XML file from ArrayExpress
> includes this comment:
> This section is deprecated and unsupported.
> Please use webservice located at:
> http://www.ebi.ac.uk/arrayexpress/xml/files
> Will the next version of the ArrayExpress package use this new
> webservice?
>
Thanks for pointing this out. I was only aware of the change in the
XML for files (http://www.ebi.ac.uk/arrayexpress/xml/files), where the
"kind" of processed file changed from "fgem" to "processed". That
broke getAE(), since it uses the new files webservice URL. I have
pushed that fix to the development branch. It should be easy to update
queryAE() to use the new files webservice too. I'll try to squeeze it
in for the coming release.
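
For anyone hitting the same problem, here is a minimal sketch of the
XPath change Steven describes, using the XML package. Only the <fgem>
line is taken from this thread; the wrapper elements around it are
assumptions for illustration, and this is not the actual queryAE()
source.

```r
library(XML)  # provides xmlParse() and getNodeSet()

# Only the <fgem> line comes from the thread; the surrounding
# <experiments>/<experiment>/<files> wrapper is an assumed layout.
doc_text <- '<experiments>
  <experiment>
    <files>
      <fgem name="E-GEOD-28146.processed.1.zip" available="true"/>
    </files>
  </experiment>
</experiments>'

x2 <- xmlParse(doc_text, asText = TRUE)

# Old expression: matches <fgem> nodes that carry a "count" attribute.
old_hits <- getNodeSet(x2, "/experiments//fgem[@count]")

# Updated expression: matches <fgem> nodes that carry "available".
new_hits <- getNodeSet(x2, "/experiments//fgem[@available]")

length(old_hits)  # no match, since the attribute is now "available"
length(new_hits)  # one match
```

The predicate [@available] only tests that the attribute is present,
not that its value is "true", which mirrors the one-word change
reported above.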
>
> Final comment: queryAE() is a very slow function, because it does a
> lot. I am just using it to look up the accessions of datasets that
> include my search terms and have processed data; as such, I've
> modified the function so it runs much more quickly. I think it would
> be useful to offer this option to others, either as another function
> or as option(s) within the queryAE function.
>
Point taken. I have actually been using some Java code to do the
exact same thing, but indeed it's worth having a simple retrieval
option for queryAE(). Since I took on the maintenance of the package,
queryAE() has been the least of my concerns, mainly because it was not
broken after the AE2 migration, but you are right that it needs
optimisation. I'd be happy to take on your modifications, with due
credit of course.
Thanks
Ibrahim
> Regards,
>
> Steve
>
>
> On 2/22/2013 6:15 PM, Ibrahim Emam wrote:
>> Hi Steven,
>>
>> Thanks for your input. I will add this to the code. I believe there
>> should be a BioC release soon.
>> Good to know someone out there is actually using the package ;)
>>
>> Best regards,
>> Ibrahim
>>
>> On 22 Feb 2013, at 14:11, Steven Sheridan wrote:
>>
>>> Hello Ibrahim,
>>>
>>> I have been working with the ArrayExpress package, and I have a
>>> small suggestion for the queryAE() function. If the search keywords
>>> contain characters that are illegal in filenames, the function will
>>> fail upon trying to save the xml file. (The ArrayExpress advanced
>>> query syntax uses brackets:
>>> http://www.ebi.ac.uk/fg/doc/help/ae_help.html#AdvancedSearchesNewInterface)
>>>
>>> My fix was just to sanitize the filename before downloading the
>>> xml (the gsub() line below):
>>>
>>> queryfilename = paste("query",keywords,species,".xml",sep="")
>>> ## sanitize the name so that the file can be stored
>>> queryfilename = gsub("[\\Q*()[]<>:?/\\\\E]", ".", queryfilename, perl=TRUE)
>>> query = try(download.file(qr, queryfilename, mode="wb"))
>>>
>>> Cheers,
>>>
>>> Steve
>>
>
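
As a rough illustration of the filename fix quoted above, the gsub()
call can be tried on its own in base R. The helper name
sanitize_filename() and the keyword string are made up for this
sketch; only the pattern and the paste() call come from the thread.

```r
# Hypothetical helper wrapping the gsub() call from the thread.
# Inside the character class, \Q...\E quotes the listed characters
# literally, and the doubled backslash adds "\" itself to the set.
sanitize_filename <- function(x) {
  gsub("[\\Q*()[]<>:?/\\\\E]", ".", x, perl = TRUE)
}

keywords <- "term:(foo bar)"  # made-up query containing brackets and a colon
species <- ""

queryfilename <- paste("query", keywords, species, ".xml", sep = "")
clean_name <- sanitize_filename(queryfilename)

queryfilename  # "queryterm:(foo bar).xml"
clean_name     # "queryterm..foo bar..xml"
```

Each offending character is replaced by a dot, so the name stays
readable, but two different queries can map to the same filename.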