GEOquery
@harpreet-saini-2897
Last seen 6.6 years ago
Hi,

I am trying to obtain GSE matrix files as expression sets by setting GSEMatrix to TRUE, as follows:

> gse <- getGEO("GSE2553", GSEMatrix = TRUE)

I am getting the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 8 elements
In addition: Warning messages:
1: In if (nchar(val) == nchar(x)) return(NA) :
  the condition has length > 1 and only the first element will be used
2: In if (nchar(val) == nchar(x)) return(NA) :
  the condition has length > 1 and only the first element will be used

Any help?

Thanks,
Harpreet
@sean-davis-490
Last seen 3 days ago
United States
On Mon, Jul 7, 2008 at 11:34 PM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
> [...]
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 1 did not have 8 elements
> [...]
> Any help?

Thanks, Harpreet, for the report. Could you send the output of sessionInfo()? On R-devel, I am not able to reproduce the error, so it would help to have the further detail.

Sean
@sean-davis-490
Last seen 3 days ago
United States
On Tue, Jul 8, 2008 at 12:00 AM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
> Hi Sean,
>
> Here is the output of sessionInfo()
>
> R version 2.7.0 (2008-04-22)
> i686-pc-linux-gnu
>
> locale:
> C
>
> attached base packages:
> [1] tools stats graphics grDevices datasets utils methods
> [8] base
>
> other attached packages:
> [1] GEOquery_2.4.0 RCurl_0.9-3 Biobase_2.0.1

Thanks, Harpreet. That looks fine. Was the download interrupted? If you could try it again and include the entire session (input and output) if it fails, that might be helpful.

Sean
@sean-davis-490
Last seen 3 days ago
United States
On Tue, Jul 8, 2008 at 6:17 AM, Svetlana Vinogradova <kintany at gmail.com> wrote:
> Dear Sean Davis,
>
> I'm a student of Moscow State University and I want to use GEOquery to get
> the information from GEO. I use R on ubuntu, so I've downloaded the
> source... Then I'm trying to install the package but got some problems...
>
> > install.packages("GEOquery_1.7.2.tar.gz", repos=NULL,
> +   lib=.libPaths()[[1]])
> * Installing *source* package 'GEOquery' ...
> ** R
> ** inst
> ** help
> >>> Building/Updating help pages for package 'GEOquery'
>      Formats: text html latex example
>   GDS-class            text html latex
>   GDS2MA               text html latex example
>   GEOData-class        text html latex
>      missing link(s): dataTable-class
>   GEODataTable-class   text html latex
>   GPL-class            text html latex
>   GSE-class            text html latex
>   GSM-class            text html latex
>   getGEO               text html latex example
>   getGEOfile           text html latex example
>   parseGEO             text html latex
> ** building package indices ...
> * DONE (GEOquery)
>
> Could you help me to solve it?

Hello, Svetlana. Thanks for the interest in GEOquery. The output above looks OK; I do not see any errors. However, the better way to install Bioconductor packages (and, indeed, R packages in general) is to use the biocLite() script:

source('http://bioconductor.org/biocLite.R')
biocLite('GEOquery')

Doing so ensures that the package versions match each other and the appropriate version of R. You can refer to the Bioconductor website for more information about installing Bioconductor packages. If you have further problems, could you include the output of sessionInfo() in the email?

Sean
@sean-davis-490
Last seen 3 days ago
United States
On Tue, Jul 8, 2008 at 11:09 AM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
> Hi Sean,
>
> Sorry to bother you again.
> But, I tried again many times, and still I am getting the same error:
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 1 did not have 8 elements
> In addition: Warning messages:
> 1: In if (nchar(val) == nchar(x)) return(NA) :
>   the condition has length > 1 and only the first element will be used
> 2: In if (nchar(val) == nchar(x)) return(NA) :
>   the condition has length > 1 and only the first element will be used
>
> I use the following commands:
>
> > library(GEOquery)
> > gse4 <- getGEO("GSE4201", GSEMatrix = TRUE)
>
> --
> Harpreet Kaur Saini
> Team 101, Room No. D313
> Wellcome Trust Sanger Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge, CB10 1SA
> United Kingdom
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.

Thanks, Harpreet. Could you send me the complete input and output? I cannot see what the output looks like to see if there is a problem with download or what else might be the problem.

Sean
@sean-davis-490
Last seen 3 days ago
United States
On Tue, Jul 8, 2008 at 11:56 AM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
> Here is the output:
>
> > getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
> [1] An HTML page (abridged) containing:
>   - a DOCTYPE referencing http://www.w3.org/TR/html4/loose.dtd
>   - the heading "FTP Directory: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"
>   - links to the parent directory and to GSE4201_series_matrix.txt.gz
>     (Apr 13 05:32, 909K), with icons served from
>     cachesrv1a.internal.sanger.ac.uk
>   - the footer "Generated Tue, 08 Jul 2008 15:54:09 GMT by
>     cachesrv1a.internal.sanger.ac.uk (squid/2.7.STABLE3)"
> Warning messages:
> 1: In if (nchar(val) == nchar(x)) return(NA) :
>   the condition has length > 1 and only the first element will be used
> 2: In if (nchar(val) == nchar(x)) return(NA) :
>   the condition has length > 1 and only the first element will be used

So, this appears to be the problem. It looks like your proxy is intercepting the ftp directory listing and converting it to HTML. I do not know how to solve this problem, as it appears to be a proxy configuration issue at your institution. However, I can't say for sure. The output of the getURL() command should look like:

> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
[1] "-r--r--r-- 1 ftp anonymous 930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz\n"

Notice how yours is much longer and is HTML, not plain text.

Sean

> Sean Davis wrote:
>> On Tue, Jul 8, 2008 at 11:46 AM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
>>> Hi Sean,
>>>
>>> There is one more thing. In my .Rprofile file, the download.file.method
>>> option is 'wget' and we are behind the firewall.
>>>
>>> But when I used the GSEMatrix=FALSE option, it works:
>>>
>>> > g<-getGEO("GSE4201",GSEMatrix=F)
>>> --16:41:59--
>>> ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/by_series/GSE4201/GSE4201_family.soft.gz
>>> => /tmp/Rtmpv8lgf9/GSE4201.soft.gz'
>>> Resolving wwwcache.sanger.ac.uk... 172.18.24.2, 172.18.24.1
>>> Connecting to wwwcache.sanger.ac.uk[172.18.24.2]:3128... connected.
>>> Proxy request sent, awaiting response... 200 OK
>>> Length: 4,305,926 [text/plain]
>>>
>>> 100%[====================================>] 4,305,926 10.77M/s
>>>
>>> 16:41:59 (10.76 MB/s) - /tmp/Rtmpv8lgf9/GSE4201.soft.gz' saved
>>> [4305926/4305926]
>>>
>>> File stored at:
>>> /tmp/Rtmpv8lgf9/GSE4201.soft
>>> Parsing....
>>> ^PLATFORM = GPL1319
>>>
>>> Harpreet
>>
>> Harpreet, could you do me another favor and send the output of:
>>
>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
>>
>> Sean
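The difference Sean describes, a raw FTP listing versus an HTML index page generated by the proxy, can be checked in R before any parsing is attempted. The following is only a sketch in base R: classify_listing() and listing_files() are hypothetical helpers, not functions from GEOquery or RCurl, and the HTML test string is a shortened stand-in for the page shown above.

```r
# Hypothetical helper: classify the text returned for an FTP directory URL.
classify_listing <- function(txt) {
  # A proxy-generated index page contains HTML markup.
  if (grepl("<html|<!DOCTYPE|</body>", txt, ignore.case = TRUE)) {
    return("html")
  }
  # A raw FTP listing line starts with a permission string such as "-r--r--r--".
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]
  if (length(lines) > 0 && all(grepl("^[-dl][-rwxsStT]{9}\\s", lines))) {
    return("ftp")
  }
  "unknown"
}

# Hypothetical helper: take the last whitespace-separated field of each
# listing line, which is the file name.
listing_files <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]
  fields <- strsplit(trimws(lines), "\\s+")
  vapply(fields, function(f) f[length(f)], character(1))
}

# The expected listing quoted in the thread, and a shortened HTML stand-in.
good <- "-r--r--r-- 1 ftp anonymous 930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz\n"
bad  <- "<html><head></head><body>FTP Directory</body></html>\n"

good_kind  <- classify_listing(good)
bad_kind   <- classify_listing(bad)
good_files <- listing_files(good)
```

A check like this would turn the confusing scan() error into an explicit "the listing came back as HTML" message, which points at the proxy rather than at the data.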
Hi Sean,

I'm trying to help Harpreet to get the GEOquery library working properly over here. Thanks to what you pointed out, we were able to track the problem down to curl using our http proxy, which is not required for ftp transfers. We still have one problem: I can't figure out how to turn off the "ftp.use.epsv" option in RCurl. On a linux terminal, I can use:

curl --disable-epsv "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"
-r--r--r-- 1 ftp anonymous 930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz

(without the --disable-epsv it times out unless I set the ftp_proxy, but then I get the HTML index instead of the file listing)

Inside R, I imagine I have to turn the "ftp.use.epsv" option off, and I've tried doing something like this:

myCurl <- getCurlOptionsConstants()
myCurl[["ftp.use.epsv"]] <- 0
getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/", .opts=list(myCurl))

but it keeps timing out...

I also tried:

curlSetOpt("ftp.use.epsv"=0)

but that doesn't seem to have any effect on what getCurlOptionsConstants() returns; it just creates a CURLOptions object, which I can't figure out how to use.

Do you have any suggestions, or should I search for help directly with the RCurl developers?

Many thanks,

Cei

> So, this appears to be the problem. It looks like your proxy is
> intercepting the ftp directory listing and converting it to HTML.
> [...]
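For readers who hit the same wall, here is a hedged sketch of two ways a libcurl option such as ftp.use.epsv is normally handed to RCurl: per request through .opts, or through a reusable handle created with getCurlHandle(). It assumes RCurl is installed. The wrapper names list_dir_once() and list_dir_with_handle() are hypothetical, and nothing contacts the network until one of them is called. As far as I can tell, getCurlOptionsConstants() returns the table of option identifiers rather than a set of current values, which would explain why modifying its result did not help.

```r
# Directory URL taken from the thread.
geo_dir <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"

# Hypothetical wrapper 1: pass the option for a single request.
list_dir_once <- function(url) {
  RCurl::getURL(url, .opts = list(ftp.use.epsv = FALSE))
}

# Hypothetical wrapper 2: create a handle that carries the option and
# reuse it for one or more requests.
list_dir_with_handle <- function(url) {
  handle <- RCurl::getCurlHandle(ftp.use.epsv = FALSE)
  RCurl::getURL(url, curl = handle)
}

# Usage (requires network access and the RCurl package):
# cat(list_dir_once(geo_dir))
```

Whether disabling EPSV is enough depends on the local firewall and proxy, so treat this as a starting point rather than a guaranteed fix.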
Ok, I just realized that the options can be passed quite easily:

getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/", "ftp.use.epsv"=0)

But now we return to the original issue: how do I use this parameter to get getGEO working, since it doesn't pass on extra parameters?

Let me re-state:

library(GEOquery)
g<-getGEO("GSE4201",GSEMatrix=TRUE)

This times out when no ftp_proxy is set (which could be solved if I were able to disable the ftp.use.epsv option of RCurl):

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  couldn't connect to host

or, if I use our proxy server, it gets trapped in HTML garbage:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 8 elements

which apparently cannot be worked around. I've already asked our IT department to see if they could change the proxy server settings.

Any suggestions?

Cei

> sessionInfo()
R version 2.7.0 (2008-04-22)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base

other attached packages:
[1] biomaRt_1.14.0 GEOquery_2.4.0 RCurl_0.9-3 Biobase_2.0.1

loaded via a namespace (and not attached):
[1] XML_1.95-2

--
Cei Abreu-Goodger, PhD
Wellcome Trust Sanger Institute
Computational and Functional Genomics
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
On Fri, Jul 11, 2008 at 11:57 AM, Cei Abreu-Goodger <cei at sanger.ac.uk> wrote:
> Ok, I just realized that the options can be passed quite easily:
> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/",
> "ftp.use.epsv"=0)
>
> But now, we return to the original issue, how do I use this parameter to get
> getGEO working, since it doesn't pass on extra parameters.
> [...]
> Which apparently cannot be worked around, I've already asked our IT
> department to see if they could change the proxy server settings.
>
> Any suggestions?

Thanks for all the work to get to this point. I'll look into what the best changes would be for GEOquery. The original idea of the getGEO() function was to maximize simplicity, but there are obviously issues that come up with passing arguments to internal functions.

Sean
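Until getGEO() can forward curl options, two workarounds might be worth trying. This is only a sketch under stated assumptions, not a confirmed fix for the versions in this thread (GEOquery 2.4.0, RCurl 0.9-3). The first sets a session-wide default through options(RCurlOptions = ...), which RCurl consults when it creates a handle. The second skips the directory listing by downloading the series matrix file directly, which the wget log above suggests works through the proxy, and then parses the local copy with the filename argument of getGEO(). series_matrix_url() and fetch_series_matrix() are hypothetical helpers, and the URL layout is the one quoted in this thread, which may since have changed.

```r
# Workaround 1 (assumption: the installed RCurl honours this option when
# creating handles): make every new curl handle disable EPSV.
options(RCurlOptions = list(ftp.use.epsv = FALSE))

# Hypothetical helper: build the file URL from the layout quoted in the thread.
series_matrix_url <- function(gse) {
  fname <- paste0(gse, "_series_matrix.txt.gz")
  paste0("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/", gse, "/", fname)
}

# Workaround 2 (assumption: the installed GEOquery can parse a local series
# matrix file via 'filename'): download the file, then parse it locally.
fetch_series_matrix <- function(gse, destdir = tempdir()) {
  url  <- series_matrix_url(gse)
  dest <- file.path(destdir, basename(url))
  download.file(url, destfile = dest, mode = "wb")
  GEOquery::getGEO(filename = dest)
}

# Usage (requires network access and GEOquery):
# eset <- fetch_series_matrix("GSE4201")
```

The second route relies on download.file(), so it also respects the download.file.method setting ('wget') mentioned earlier in the thread.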
@sean-davis-490
Last seen 3 days ago
United States
On Tue, Jul 22, 2008 at 10:53 AM, Florence Combes <florence.combes at cea.fr> wrote:
> Dear Mister Davis,
>
> I am currently working as a bioinformatician in a public lab in Grenoble,
> France.
>
> I am using the GEOquery package to download data from A. thaliana that are
> stored in the GEO database.
> As I would like to get all the data for A. thaliana it is quite large, and I
> am interested in the 'GSEMatrix=TRUE' option, like here:
>
> gse.5738<-getGEO("GSE5738",GSEMatrix=TRUE)
>
> but I cannot see how to then access the processed data in the structure
> gse.5738. I have:
>
> > str(gse.5738)
> List of 1
>  $ GSE5738_series_matrix.txt.gz:Formal class 'ExpressionSet' [package
> "Biobase"] with 6 slots
>
> and all seems ok but nowhere do I see values which seem to be processed
> microarray data.
>
> Would you mind giving me a hint please?

First, thanks for the interest in GEOquery. Next, I think you might want to update GEOquery to the latest version for your version of R using the installation instructions on the Bioconductor website (use biocLite).

Now, to answer your question, gse.5738 is a list of ExpressionSet objects. In this case, it is a list of length 1. You can get at the ExpressionSet object itself by doing:

gse.5738[[1]]

This is an ExpressionSet and is one of the standard data structures for microarray data in Bioconductor. There is documentation about the ExpressionSet class in the Biobase vignettes, which I encourage you to look into.

Hope that helps,
Sean
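To make the last step concrete, here is a small sketch of pulling values out of an ExpressionSet with the Biobase accessors exprs(), sampleNames(), featureNames() and pData(). It assumes Biobase is installed. A toy two-sample object stands in for gse.5738 so the example runs offline; the probe and sample names are made up for illustration.

```r
library(Biobase)

# Toy expression matrix: rows are features (probes), columns are samples.
m <- matrix(c(7.1, 8.3, 5.2, 6.4, 9.0, 4.8), nrow = 3,
            dimnames = list(c("probe1", "probe2", "probe3"),
                            c("GSM_a", "GSM_b")))

# getGEO(..., GSEMatrix = TRUE) returns a list of ExpressionSet objects;
# this list mimics that shape with a single made-up element.
gse_list <- list(toy_series_matrix.txt.gz = ExpressionSet(assayData = m))

eset <- gse_list[[1]]            # the ExpressionSet itself

expr_values <- exprs(eset)       # numeric matrix of processed values
samples     <- sampleNames(eset) # column (sample) identifiers
features    <- featureNames(eset)# row (probe) identifiers
pheno       <- pData(eset)       # per-sample annotation as a data.frame

expr_values["probe2", "GSM_b"]   # a single value by probe and sample
```

With real GEO data, pData(eset) holds the sample characteristics from the series matrix header, and exprs(eset) is the table of processed values Florence was looking for.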