Fix for queryAE() in the ArrayExpress package
Ibrahim Emam ▴ 50
@ibrahim-emam-5823
Last seen 10.3 years ago
Hi Steven,

On 15 Mar 2013, at 17:05, Steven Sheridan <ssherida@embl.de> wrote:

> Hi Ibrahim,
>
> There's a new issue with the queryAE() function. It now reports that no datasets have processed files (which is an issue, since my script processes only datasets with processed files--i.e., none). The broken line is this one:
>
> pr = getNodeSet(x2,"/experiments//fgem[@count]")
>
> The relevant part of the downloaded XML file is:
>
> <fgem name="E-GEOD-28146.processed.1.zip" available="true"/>
>
> Perhaps ArrayExpress changed "count" to "available" (it still says count="true" for raw data files). In any case, changing queryAE from fgem[@count] to fgem[@available] fixed the issue.
>
> But as you're probably aware, the XML file from ArrayExpress includes this comment:
>
> This section is deprecated and unsupported.
> Please use webservice located at:
> http://www.ebi.ac.uk/arrayexpress/xml/files
>
> Will the next version of the ArrayExpress package use this new webservice?

Thanks for pointing this out. I was only aware of the change in the XML returned by the files webservice (http://www.ebi.ac.uk/arrayexpress/xml/files), where the "kind" of processed files changed from "fgem" to "processed". That broke getAE(), since it uses the new files webservice URL. I have pushed a fix for that to the development branch. It should be easy to update queryAE() to use the new files webservice too. I'll try to squeeze it in for the coming release.

> Final comment: queryAE() is a very slow function, because it does a lot. I am just using it to look up the accessions of datasets that include my search terms and have processed data; as such, I've modified the function so it runs much more quickly. I think it would be useful to offer this option to others, either as another function or as option(s) within the queryAE function.

Point taken. I have actually been using some Java code to do the exact same thing, but indeed it's worth having a simple retrieval option for queryAE().

Actually, since I took on the maintenance of the package, queryAE() has been the least of my concerns, mainly because it was not broken after the AE2 migration, but you are right that it needs optimisation. I'd be happy to take on your modifications, with due credit of course.

Thanks
Ibrahim

> Regards,
>
> Steve
>
> On 2/22/2013 6:15 PM, Ibrahim Emam wrote:
>> Hi Steven,
>>
>> Thanks for your input. I will add this to the code. I believe there should be a bioC release soon.
>> Good to know someone out there is actually using the package ;)
>>
>> Best regards,
>> Ibrahim
>>
>> On 22 Feb 2013, at 14:11, Steven Sheridan wrote:
>>
>>> Hello Ibrahim,
>>>
>>> I have been working with the ArrayExpress package, and I have a small suggestion for the queryAE() function. If the search keywords contain characters that are illegal in filenames, the function will fail upon trying to save the xml file. (The ArrayExpress advanced query syntax uses brackets: http://www.ebi.ac.uk/fg/doc/help/ae_help.html#AdvancedSearchesNewInterface)
>>>
>>> My fix was just to sanitize the filename before downloading the xml (the second line below):
>>>
>>> queryfilename = paste("query",keywords,species,".xml",sep="")
>>> queryfilename = gsub("[\\Q*()[]<>:?/\\\\E]", ".", queryfilename, perl=T) ## sanitizes names to allow filename to be stored
>>> query = try(download.file(qr, queryfilename, mode="wb"))
>>>
>>> Cheers,
>>>
>>> Steve
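For anyone hitting the same queryAE() problem, the attribute change discussed in the thread can be reproduced on a small inline document. This is only a sketch: it assumes the CRAN XML package is installed, and the surrounding XML (including the `<raw>` element name and the nesting) is a hypothetical fragment built around the single `<fgem>` line quoted above, not the actual ArrayExpress response.

```r
library(XML)  # assumption: the CRAN 'XML' package is available

# Hypothetical fragment; only the <fgem> element is taken from the thread.
xml_text <- '<experiments>
  <experiment>
    <accession>E-GEOD-28146</accession>
    <files>
      <raw name="E-GEOD-28146.raw.1.zip" count="true"/>
      <fgem name="E-GEOD-28146.processed.1.zip" available="true"/>
    </files>
  </experiment>
</experiments>'

doc <- xmlParse(xml_text, asText = TRUE)

# The old XPath requires a "count" attribute on <fgem>, so it matches nothing.
old_hits <- getNodeSet(doc, "/experiments//fgem[@count]")

# The XPath reported in the thread as fixing the issue.
new_hits <- getNodeSet(doc, "/experiments//fgem[@available]")

# A more tolerant variant that accepts either attribute name.
either_hits <- getNodeSet(doc, "/experiments//fgem[@count or @available]")

# Names of the processed files that were found.
processed_names <- sapply(new_hits, xmlGetAttr, "name")
print(processed_names)
```

The `@count or @available` form is one way to stay compatible with both versions of the XML, should the older attribute still appear in some responses.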
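The filename sanitisation suggested in the oldest message can also be tried on its own, without downloading anything. The sketch below uses base R only. The helper name `sanitize_query_filename` is hypothetical (it is not part of the ArrayExpress package), and the character class is written with explicit escapes instead of the `\Q...\E` form used in the thread; it is intended to cover the same characters.

```r
# Hypothetical helper, not part of the ArrayExpress package.
# Builds the query filename the way the thread describes and replaces
# characters that are commonly illegal in filenames with ".".
sanitize_query_filename <- function(keywords, species) {
  name <- paste("query", keywords, species, ".xml", sep = "")
  # Character class: [ ] * ( ) < > : ? / and backslash
  gsub("[\\[\\]*()<>:?/\\\\]", ".", name, perl = TRUE)
}

# Advanced query syntax with brackets, as mentioned in the thread.
safe_name <- sanitize_query_filename("cancer AND (breast OR lung)", "homo sapiens")
print(safe_name)
```

Replacing each offending character with "." keeps the filename readable while avoiding the failure when the XML is saved; two different queries can map to the same filename, which may or may not matter for a given script.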
