Question: GEOquery: getGEO() doesn't work (error "invalid 'nlines' argument")
7.5 years ago, Guest User wrote:
Hi!

I am trying to use the GEOquery package to retrieve the meta information (phenodata) and the data tables of GEO samples (from GEO series without GEO datasets).

I already have the data (.soft.gz files), so I tried it the following way, for example:

    GSE19711 <- getGEO(filename=system.file("mypath/GSE19711_family.soft.gz", package="GEOquery"))

But I get the following error:

    Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
      invalid 'nlines' argument
    In addition: Warning messages:
    1: In file(fname, "r") :
      file("") only supports open = "w+" and open = "w+b": using the former
    2: In file(con, "r") :
      file("") only supports open = "w+" and open = "w+b": using the former
    3: In file(fname, "r") :
      file("") only supports open = "w+" and open = "w+b": using the former

I tried it on Windows and Linux, and also with the newest versions of R and GEOquery. The same error occurs on both machines, and also with other GEO accession numbers.

What is going wrong, and what can I do to get the information I need?

I get the same error when I use a path to a file that does not exist, but I am quite sure that I set the working directory and the path to the GEO file correctly.

Greetings,
Simone

-- output of sessionInfo():

    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252    LC_MONETARY=German_Austria.1252
    [4] LC_NUMERIC=C                    LC_TIME=German_Austria.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] GEOquery_2.23.3 Biobase_2.16.0 BiocGenerics_0.2.0 BiocInstaller_1.4.4

    loaded via a namespace (and not attached):
    [1] RCurl_1.91-1.1 tools_2.15.0 XML_3.9-4.1

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
     [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] GEOquery_2.21.9 Biobase_2.14.0

    loaded via a namespace (and not attached):
    [1] RCurl_1.91-1 tools_2.14.1 XML_3.9-4

    > traceback()
    6: read.table(con, sep = "\t", header = FALSE, nrows = nseries)
    5: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
    4: suppressWarnings(read.table(con, sep = "\t", header = FALSE, nrows = nseries))
    3: parseGSEMatrix(fname)
    2: parseGEO(filename, GSElimits)
    1: getGEO(filename = system.file("homo_sapiens/peripheral_whole_blood/GSE19711/GSE19711_family.soft.gz",
           package = "GEOquery"))

--
Sent via the guest posting facility at bioconductor.org.
Answer: GEOquery: getGEO() doesn't work (error "invalid 'nlines' argument")
7.5 years ago, Sean Davis (United States) wrote:
On Tue, May 29, 2012 at 9:18 AM, Simone [guest] <guest@bioconductor.org> wrote:

> I already have got the data (.soft.gz files), so I tried it the following
> way for example:
>
> GSE19711 <- getGEO(filename=system.file("mypath/GSE19711_family.soft.gz",
> package="GEOquery"))

Hi, Simone.

The system.file() part of your command above is not necessary (and is probably the problem). system.file() is for locating files that came with a specific software package. So, you want something like:

    GSE19711 <- getGEO(filename='mypath/GSE19711_family.soft.gz')

Note that you will have to do a fair bit of work to get the data out of a SOFT format file. Instead, you should consider using a GSEMatrix file. Alternatively, download the raw data and use a platform-appropriate package to read in and analyze the data.

Finally, note that you do not need to download files separately. You can use GEOquery to download and even make a repository of GEO files.

    GSE19711 <- getGEO('GSE19711',destdir='mypath')

Executing the command above again will not download the file again.

Sean
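The failure mode Sean describes can be reproduced without GEOquery. The following is a minimal sketch in base R, not part of the original thread: the package name "stats" is used only because it is always installed, and resolve_local_file() is a hypothetical helper, not a GEOquery function. It shows that system.file() returns an empty string when the requested file is not inside the package, which is consistent with the file("") warnings in the original error.

```r
# system.file() looks inside an installed package; for a file that is not
# there it returns "" rather than raising an error.
not_found <- system.file("mypath/GSE19711_family.soft.gz", package = "stats")
print(nzchar(not_found))  # FALSE: the result is an empty string

# Hypothetical helper (not part of GEOquery): fail early with a clear
# message instead of passing an empty or wrong path on to a parser.
resolve_local_file <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("File not found: '", path, "'")
  }
  normalizePath(path)
}

# Demonstration with a temporary file standing in for a downloaded SOFT file.
demo_file <- tempfile(fileext = ".soft.gz")
writeLines("^SERIES = demo", demo_file)
resolved <- resolve_local_file(demo_file)
print(file.exists(resolved))
```

With a check like this in front of the getGEO(filename = ...) call, a wrong path would probably surface as "File not found" rather than as the less obvious "invalid 'nlines' argument".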
Hi Sean,

> The "system.file" part of your command above is not necessary (and is
> probably the problem). System.file is for locating files that came
> with a specific software package.

This works! Thanks a lot!

> Note that you will have to do a fair bit of work to get the data out
> of a SOFT format file. Instead, you should consider using a GSEMatrix
> file. Alternatively, download the raw data and use a
> platform-appropriate package to read in and analyze the data.
> Finally, note that you do not need to download files separately.

Well, my problem is that I am not sure what the "best" way to get the data I need is. I'll try to give an example:

We have the GEO Series GSE19711. For all the samples of this series, I need some specific information. Let's use the first sample of GSE19711 as an example:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM491937

I need to know the age of the patient ("ageatdiagnosis"), whether it is a pre- or a post-treatment sample, the sex of the patient (in this case all samples are from women), and maybe some other information (in the case of other series). And of course I need the data matrix itself, so that I can finally create something similar to an ExpressionSet, but using the methylumi package, because all this is about methylation and not gene expression.

I have to deal with several thousand samples from many different GEO series, so I want to automate fetching the phenodata information of the patients. Searching for a way to do this, I found the GEOquery package. I thought the SOFT files would be the best thing to work with, because they are available for all the series I want to analyze and, as far as I could tell, they contain all the information available. (So far I have worked only with expression data, where I used raw files, but phenodata files were always available as well, so it was a lot easier.)

If you can think of any better way to get the data I need and to annotate the sample <-> phenodata information easily, please tell me; I would be very happy.

Simone
Reply written 7.5 years ago by ecsi@gmx.net
Answer: GEOquery: getGEO() doesn't work (error "invalid 'nlines' argument")
7.5 years ago, James W. MacDonald (United States) wrote:
Hi Simone,

On 5/29/2012 9:18 AM, Simone [guest] wrote:
> I already have got the data (.soft.gz files), so I tried it the following way for example:
>
> GSE19711 <- getGEO(filename=system.file("mypath/GSE19711_family.soft.gz", package="GEOquery"))

Why are you using system.file() in this context? Did you really download the soft file to your GEOquery library directory? That seems odd to me. The default is to download to a tempdir, which is likely to be something like

    C:\Users\<yourusername>\AppData\Local\Temp\<sometmpdir>

Or did you download somewhere else? Where does mypath point?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Hi Jim,

> Why are you using system.file() in this context?

Because there is an example in the GEOquery vignette ("2 Getting Started using GEOquery") which does it like this.

> Did you really download the soft file to your GEOquery library
> directory? That seems odd to me.

I downloaded it to a local data repository in our network (it is obligatory to do it this way in this case).

Why does it seem odd to you? Because I downloaded the soft file? This was a recommendation from a colleague who works a lot with GEO. We thought the soft files would be the best option because they contain all the information available and, furthermore, they are available for all the GEO series I have to analyze. As I already wrote in reply to Sean's answer, if there is any better way to do it, I will be happy to hear about it!

Best,
Simone
Reply written 7.5 years ago by ecsi@gmx.net
Hi Simone,

On 5/29/2012 10:25 AM, ecsi at gmx.net wrote:
> Hi Jim,
>
>> Why are you using system.file() in this context?
>
> Because there is an example in the GEOquery vignette ("2 Getting
> Started using GEOquery") which does it like this.

I see. That is one of the downsides of the vignette system - in order to have a vignette work correctly, using some external data, those data have to be parked somewhere in the package directory. An alternative would be to have a separate data package, but that means end users have to download one additional thing.

So the reason the vignette uses that paradigm is because the data being used are in the package directory. However, as you note below, you _haven't_ downloaded data to the package directory, so system.file() isn't the way to go. In other words, system.file() is only designed to help people easily detect where a given install of R has its package directory - it is not intended for reading files in general.

>> Did you really download the soft file to your GEOquery library
>> directory? That seems odd to me.
>
> I downloaded it to a local data repository in our network (it is
> obligatory to do it this way in this case).
>
> Why does it seem odd to you? Because I downloaded the soft file?

No, not that you downloaded the file; what seemed odd was that you were using system.file(), which implies that you had downloaded the soft file to a very specific place. Let me give you an example:

On my Linux box

    > system.file(package="GEOquery")
    [1] "/misc/staff/jmacdon/R-devel/library/GEOquery"

On my Windows box

    > system.file(package="GEOquery")
    [1] "C:/Users/bioinf_admin/R/win-library/2.14/GEOquery"

So when you use system.file() you are specifically telling GEOquery to look for a file that is in your GEOquery library directory, rather than telling GEOquery the actual directory. That is what Sean was getting at in his response to you.

> This was a recommendation of a colleague who works a lot with GEO, we
> thought the soft files would be the best option because they contain
> all the information available and furthermore they are available for
> all the GEO series I have analyze. As I already wrote in reply to the
> answer of Sean, if there is any better way to do it, I will be happy
> to hear about it!

Sean already gave it to you. To further elaborate:

    > mypath <- "C:/Users/bioinf_admin/Desktop/"
    > GSE19711 <- getGEO('GSE19711',destdir=mypath)

This will result in a list of ExpressionSets

    > length(GSE19711)
    [1] 3
    > GSE19711[[1]]
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 27578 features, 255 samples
      element names: exprs
    protocolData: none
    phenoData
      sampleNames: GSM491937 GSM491938 ... GSM492191 (255 total)
      varLabels: title geo_accession ... data_row_count (44 total)
      varMetadata: labelDescription
    featureData
      featureNames: cg00000292 cg00002426 ... cg27665659 (27578 total)
      fvarLabels: ID Name ... ORF (38 total)
      fvarMetadata: Column Description labelDescription
    experimentData: use 'experimentData(object)'
    Annotation: GPL8490

I doubt you will be able to automate too much of this, as the phenoData slots for these ExpressionSets can contain whatever the experimenter thought was interesting, in addition to what is required by GEO:

    > names(pData(phenoData(GSE19711[[1]])))
     [1] "title"                   "geo_accession"
     [3] "status"                  "submission_date"
     [5] "last_update_date"        "type"
     [7] "channel_count"           "source_name_ch1"
     [9] "organism_ch1"            "characteristics_ch1"
    [11] "characteristics_ch1.1"   "characteristics_ch1.2"
    [13] "characteristics_ch1.3"   "characteristics_ch1.4"
    [15] "characteristics_ch1.5"   "characteristics_ch1.6"
    [17] "characteristics_ch1.7"   "characteristics_ch1.8"
    [19] "characteristics_ch1.9"   "characteristics_ch1.10"
    [21] "characteristics_ch1.11"  "characteristics_ch1.12"
    [23] "characteristics_ch1.13"  "molecule_ch1"
    [25] "extract_protocol_ch1"    "label_ch1"
    [27] "label_protocol_ch1"      "taxid_ch1"
    [29] "hyb_protocol"            "scan_protocol"
    [31] "description"             "data_processing"
    [33] "platform_id"             "contact_name"
    [35] "contact_email"           "contact_phone"
    [37] "contact_department"      "contact_institute"
    [39] "contact_address"         "contact_city"
    [41] "contact_zip/postal_code" "contact_country"
    [43] "supplementary_file"      "data_row_count"

And we can then see what the characteristics are:

    > head(pData(phenoData(GSE19711[[1]])), 2)[,11:23]
              characteristics_ch1.1          characteristics_ch1.2
    GSM491937 agegroupatsampledraw: 65 to 70 ageatrecruitment: 68
    GSM491938 agegroupatsampledraw: Over 75  ageatrecruitment: 81
              characteristics_ch1.3 characteristics_ch1.4     characteristics_ch1.5
    GSM491937 ageatdiagnosis: 68    histology: Endometrioid   stage: Ic
    GSM491938 ageatdiagnosis: 80    histology: Carcinosarcoma stage: IIIb
              characteristics_ch1.6 characteristics_ch1.7
    GSM491937 grade: Grade 2        pre-treatment sample: Yes
    GSM491938 grade: Grade 3        pre-treatment sample: No
              characteristics_ch1.8      characteristics_ch1.9
    GSM491937 post-treatment sample: No  ca125: 1717
    GSM491938 post-treatment sample: Yes ca125: 32.89
              characteristics_ch1.10 characteristics_ch1.11
    GSM491937 batch: 1               beadchip_well: 4447820175_A
    GSM491938 batch: 1               beadchip_well: 4447820175_B
              characteristics_ch1.12     characteristics_ch1.13
    GSM491937 bs conversion c1: Grn 5706 bs conversion c2: Grn 5538
    GSM491938 bs conversion c1: Grn 6861 bs conversion c2: Grn 6141

Does that help?

Best,

Jim
Reply written 7.5 years ago by James W. MacDonald
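The characteristics columns in Jim's output all follow a "key: value" pattern, so they can be split mechanically. The following is a minimal sketch in base R, not code from the thread: it uses a small hand-built data frame that mimics three of those columns (not the real pData() result), and split_characteristics() is a hypothetical helper. It assumes every row of a column carries the same key, which may not hold for every GEO series.

```r
# Toy stand-in for pData(phenoData(GSE19711[[1]])); values copied from the
# output in the thread, two samples and three columns only.
pd <- data.frame(
  characteristics_ch1.2 = c("ageatrecruitment: 68", "ageatrecruitment: 81"),
  characteristics_ch1.3 = c("ageatdiagnosis: 68", "ageatdiagnosis: 80"),
  characteristics_ch1.7 = c("pre-treatment sample: Yes", "pre-treatment sample: No"),
  row.names = c("GSM491937", "GSM491938"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn every "key: value" column into a column named
# after the key and holding only the value.
split_characteristics <- function(pd) {
  cols <- grep("^characteristics_ch1", names(pd), value = TRUE)
  # Values: drop everything up to and including the first colon.
  out <- lapply(cols, function(col) {
    sub("^[^:]*:[[:space:]]*", "", as.character(pd[[col]]))
  })
  # Keys: taken from the first row of each column (assumption, see above).
  names(out) <- vapply(cols, function(col) {
    sub(":.*$", "", as.character(pd[[col]])[1])
  }, character(1), USE.NAMES = FALSE)
  data.frame(out, row.names = rownames(pd), check.names = FALSE,
             stringsAsFactors = FALSE)
}

pheno <- split_characteristics(pd)
print(pheno)
```

On a real series the same helper could be applied to the data frame returned by pData(), but the resulting column names will differ from series to series, which is the limit on automation that Jim points out.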
> So when you use system.file() you are specifically telling GEOquery to
> look for a file that is in your GEOquery library directory, rather
> than telling GEOquery the actual directory.

Thank you for explaining the system.file() thing; I didn't know that it was about the package directory. I thought it was necessary in order to access the downloaded files, but now I understand what's happening.

> > mypath <- "C:/Users/bioinf_admin/Desktop/"
> > GSE19711 <- getGEO('GSE19711',destdir=mypath)
>
> This will result in a list of ExpressionSets

The problem is that I work with methylation data here, so I have to create MethyLumiSets instead of ExpressionSets.

My idea was to create phenodata.txt files using the data I get from getGEO():

    > GSE19711 <- getGEO(filename="mypath/GSE19711_family.soft.gz")

(By the way, I always get warnings when doing this, but it seems to work anyway:

    > warnings()
    Warning messages:
    1: In readLines(con, n = chunksize) :
      seek on a gzfile connection returned an internal error
    ...)

And then to access the information with some code like this, for example:

    > Meta(GSMList(GSE19711)[[1]])$characteristics_ch1[3]
    [1] "ageatrecruitment: 68"

Then I would extract the relevant substrings and create a data.frame with all the information I need (age, sex, treatment, etc.), doing all this in an apply function for every GSE or something like that. Furthermore, I would get the data matrices from the soft files as well, and finally create MethyLumiSets out of the data matrices and the phenodata.txt files I created.

Maybe it would be better to first create ExpressionSets and convert them into MethyLumiSets somehow, but I would have to manipulate the objects anyway, because I can't use the phenodata information as it comes from GEO in these cases. I need the phenodata to be in the same style for all the GEO sets I have to analyze, so in any case I'll have to do the work of extracting (only) the information I need for the different GEO sets.

But I'm still not quite sure about the best way to create the MethyLumiSets efficiently ...

Best,
Simone
Reply written 7.5 years ago by ecsi@gmx.net
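The plan Simone outlines (pull "key: value" strings per sample, keep only the fields of interest, and give them the same column names for every series) might look roughly like the sketch below. It is base R only and runs on toy data shaped like the Meta(...)$characteristics_ch1 vectors quoted in her message; key_map, get_value() and standardise() are hypothetical names introduced for illustration, and each real series would need its own mapping.

```r
# Toy stand-in for
#   lapply(GSMList(gse), function(g) Meta(g)$characteristics_ch1)
# i.e. one character vector of "key: value" strings per sample.
chars <- list(
  GSM491937 = c("ageatrecruitment: 68", "ageatdiagnosis: 68",
                "pre-treatment sample: Yes"),
  GSM491938 = c("ageatrecruitment: 81", "ageatdiagnosis: 80",
                "pre-treatment sample: No")
)

# Hypothetical mapping from series-specific keys to the column names wanted
# in every phenodata table.
key_map <- c("ageatdiagnosis" = "age", "pre-treatment sample" = "pre_treatment")

# Look up one key in one sample's vector; NA if the key is absent.
get_value <- function(x, key) {
  keys <- sub(":.*$", "", x)
  hit <- match(key, keys)
  if (is.na(hit)) NA_character_ else sub("^[^:]*:[[:space:]]*", "", x[hit])
}

# Build one data frame with a row per sample and a column per mapped key.
standardise <- function(chars, key_map) {
  cols <- lapply(names(key_map), function(key) {
    vapply(chars, get_value, character(1), key = key, USE.NAMES = FALSE)
  })
  names(cols) <- unname(key_map)
  data.frame(sample = names(chars), cols, stringsAsFactors = FALSE)
}

pheno <- standardise(chars, key_map)
pheno$age <- as.numeric(pheno$age)
print(pheno)
```

Looking values up by key rather than by position (such as characteristics_ch1[3]) keeps the extraction working when a series lists its characteristics in a different order, and a missing key shows up as NA instead of silently shifting columns.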
On Tue, May 29, 2012 at 11:45 AM, <ecsi@gmx.net> wrote:

> Maybe it would be better to first create ExpressionSets and convert them
> into MethyLumiSets somehow, but I would have to manipulate the objects
> anyway, because I can't use the phenodata information as it comes from GEO
> in these cases. I need the phenodata to be the same style for all the GEO
> sets I have to analyze, so in any case I'll have to do the work to extract
> (only) the information I need for the different GEO sets.
>
> But I'm still not quite sure about the best way to create the
> MethyLumiSets efficiently ...

From what you have described so far, an ExpressionSet will suffice. I don't think there is a need for a MethyLumiSet since you describe simply getting the normalized data from the GSE.

I'd suggest going the route that Jim outlined using GSEMatrix files and moving forward from there.

If you can fill in details of what downstream analysis you want to do with the data, perhaps we can be more directive on that point.

Sean
Reply written 7.5 years ago by Sean Davis
Hi Sean,

> From what you have described so far, an ExpressionSet will suffice. I
> don't think there is a need for a MethyLumiSet since you describe
> simply getting the normalized data from the GSE.

Well, I am still not really sure. Initially I wanted to download the raw data wherever available. Then we decided to use the soft files for all GEO series to make the task a bit easier, and because I was told that it should be okay to use already normalized data sets in the case of methylation data. However, this is not really a Bioconductor question, and maybe I should talk to my supervisor about it again ... it is the first time I have worked with methylation data ...

> I'd suggest going the route that Jim outlined using GSEMatrix files
> and moving forward from there.

I tried this, but then some files get downloaded to another directory and not to the one I have to use. All data have to be downloaded to a mapped network drive and must not be downloaded to C:/ (see my reply to Jim's message).

> If you can fill in details of what downstream analysis you want to do
> with the data, perhaps we can be more directive on that point.

Well, so far our general plan is to conduct region-based differential methylation analysis, but also to investigate, for example, differential methylation of enhancers with known long-distance chromosomal interactions, the relationship to gene expression data, and many other things. But I think it won't be a problem to get the data from ExpressionSets later and put them into MethyLumiSets or whatever is necessary.

Best,
Simone
Reply written 7.5 years ago by ecsi@gmx.net
> Well, I am still not really sure. Initially I wanted to download the
> raw data wherever available. Then we decided to use the soft files
> for all GEO series to make the task a bit easier, and because I was
> told that it should be okay to use already normalized data sets in
> case of methylation data. However, this is not really a Bioconductor
> question and maybe I should talk to my supervisor again about this ...
> it is the first time I work with methylation data ...

Okay, we've talked it over again and things have changed. I need the raw data.

So maybe the best way would be to create ExpressionSets with getGEO() to have all the sample meta information, manipulate the phenodata of the ExpressionSets the way I need them, download the raw data with getGEOSuppFiles(), normalize it the way we want to, and then put these data into the ExpressionSets I have.

Still, I have the problem that not all files get downloaded to the directory I set for destdir. The series_matrix files are downloaded to the correct directory, but some other files always end up in the Temp directory on C:/.
Reply written 7.5 years ago by ecsi@gmx.net
Good morning,

> > mypath <- "C:/Users/bioinf_admin/Desktop/"
> > GSE19711 <- getGEO('GSE19711',destdir=mypath)

I tried this, changing mypath to a directory on a mapped network drive (Z:/). Generally it worked, but some files were downloaded to a directory on my local hard disk (C:/), which is not desired (see below; sorry for the German, R sometimes gives German messages when German locale settings are used at system level). How can I tell GEOquery that all files must be downloaded to the directory given in "mypath"?

    > GSE19711 <- getGEO("GSE19711", destdir=mypath)
    Found 3 file(s)
    GSE19711_series_matrix-1.txt.gz
    versuche URL 'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/SeriesMatrix/GSE19711/GSE19711_series_matrix-1.txt.gz'
    using Synchronous WinInet calls
    URL geöffnet
    downloaded 27.9 Mb

    File stored at:
    C:\Users\Myuser\AppData\Local\Temp\RtmpOWBTrt/GPL8490.soft
    GSE19711_series_matrix-2.txt.gz
    versuche URL 'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/SeriesMatrix/GSE19711/GSE19711_series_matrix-2.txt.gz'
    using Synchronous WinInet calls
    URL geöffnet
    downloaded 27.8 Mb

    Using locally cached version of GPL8490 found here:
    C:\Users\Myuser\AppData\Local\Temp\RtmpOWBTrt/GPL8490.soft
    GSE19711_series_matrix-3.txt.gz
    versuche URL 'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/SeriesMatrix/GSE19711/GSE19711_series_matrix-3.txt.gz'
    using Synchronous WinInet calls
    URL geöffnet
    downloaded 3.5 Mb

    Using locally cached version of GPL8490 found here:
    C:\Users\Myuser\AppData\Local\Temp\RtmpOWBTrt/GPL8490.soft
    Warnmeldung:
    In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
      heruntergeladene Länge 23108257 != angegebener Länge 200

Best,
Simone
Reply written 7.5 years ago by ecsi@gmx.net
On Wed, May 30, 2012 at 5:37 AM, <ecsi@gmx.net> wrote:

> I tried this, changing mypath to a mapped network drive directory (Z:/).
> Generally it worked, but files were downloaded to a directory at my local
> hard disk (C:/) which is not desired (see below, sorry for the german, R
> sometimes gives german errors when using german locale settings at system
> level). How can I tell GEOquery that all files must be downloaded to the
> directory given in "mypath"?
>
> File stored at:
> C:\Users\Myuser\AppData\Local\Temp\RtmpOWBTrt/GPL8490.soft

Sorry, Simone. That is a bug. It has been fixed in GEOquery and will be available as a new version in a day or so.

Sean
Reply written 7.5 years ago by Sean Davis
> > File stored at:
> > C:\Users\Myuser\AppData\Local\Temp\RtmpOWBTrt/GPL8490.soft
>
> Sorry, Simone. That is a bug. It has been fixed in GEOquery and will
> be available as a new version in a day or so.

Great, I will wait for the new version, thanks a lot!

Simone
Reply written 7.5 years ago by ecsi@gmx.net
Hello Sean,
I am having the same problem using getGEO().
I ran a query and am downloading the GSE IDs it returned.
Most of them download without error or interruption, but a few downloads are incomplete, and the error message is the same as Simone's.
The dataset giving me the error is GSE8650 (the first platform is fine, but the second, GPL97, is the problematic one).
The code is:

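library(GEOquery)
f.gse.ids <- unique(f.results$gse_ids)
for (id in f.gse.ids) {
  # Download the series matrix files for each GSE ID (or reuse the copies in destdir)
  gse <- getGEO(id, destdir = "/home/uwakah/project_test_1/data_script/DISEASED/GEO_ARRAY_DISEASED", GSEMatrix = TRUE)
}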

my sessionInfo()
-------------------
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
[1] C

attached base packages:
[1] parallel  stats4    methods   stats     graphics  grDevices utils
[8] datasets  base

other attached packages:
 [1] GOstats_2.34.0       graph_1.46.0         Category_2.34.2
 [4] Matrix_1.2-3         globaltest_5.22.0    GO.db_3.1.2
 [7] hgu133a.db_3.1.3     annotate_1.46.1      XML_3.98-1.3
[10] hgu133plus2.db_3.1.3 limma_3.24.15        simpleaffy_2.44.0
[13] gcrma_2.40.0         genefilter_1.50.0    affy_1.46.1
[16] GEOmetadb_1.28.0     GEOquery_2.34.0      org.Mm.eg.db_3.1.2
[19] org.Hs.eg.db_3.1.2   RSQLite_1.0.0        DBI_0.3.1
[22] AnnotationDbi_1.30.1 GenomeInfoDb_1.4.3   IRanges_2.2.9
[25] S4Vectors_0.6.6      Biobase_2.28.0       BiocGenerics_0.14.0
[28] R.utils_2.2.0        R.oo_1.19.0          R.methodsS3_1.7.0
[31] BiocInstaller_1.18.5

loaded via a namespace (and not attached):
 [1] XVector_0.8.0          bitops_1.0-6           tools_3.2.2
 [4] zlibbioc_1.14.0        lattice_0.20-33        preprocessCore_1.30.0
 [7] Biostrings_2.36.4      grid_3.2.2             GSEABase_1.30.2
[10] RBGL_1.44.0            survival_2.38-3        splines_3.2.2
[13] AnnotationForge_1.10.1 xtable_1.8-2           RCurl_1.95-4.7
[16] affyio_1.36.0

Error message
-------------
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
invalid 'nlines' argument

Has this bug been fixed, or can you help? Thanks.

Regards

Innocentia

Reply written 3.8 years ago (modified 3.8 years ago) by i.eberechi
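In a download loop like the one above, a single failing ID stops everything after it. One common way around this is to wrap each call in tryCatch() and collect the failures for a later retry. The sketch below is an illustration, not code from the thread: fetch_all() and fake_fetch() are hypothetical names, and the fetch function is passed in as an argument so the pattern can be tried without GEOquery or network access.

```r
# Hypothetical wrapper: call `fetch` for each ID, record failures, keep going.
# In the loop from the post, `fetch` would be something like
#   function(id) getGEO(id, destdir = mydir, GSEMatrix = TRUE)
# where `mydir` is a placeholder for the download directory.
fetch_all <- function(ids, fetch) {
  results <- list()
  failed <- character(0)
  for (id in ids) {
    res <- tryCatch(fetch(id), error = function(e) {
      message("Failed for ", id, ": ", conditionMessage(e))
      NULL
    })
    if (is.null(res)) failed <- c(failed, id) else results[[id]] <- res
  }
  list(results = results, failed = failed)
}

# Stand-in fetcher that fails for one ID, mimicking the GSE8650 case.
fake_fetch <- function(id) {
  if (id == "GSE8650") stop("invalid 'nlines' argument")
  paste("data for", id)
}

out <- fetch_all(c("GSE19711", "GSE8650"), fake_fetch)
print(names(out$results))  # IDs that succeeded
print(out$failed)          # IDs to retry later
```

Unlike the original loop, which overwrites gse on every iteration, this version keeps each result in a list keyed by ID, so the successful downloads remain available while the failed ones are investigated.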

Try removing the cached GPL97 file and then try again.  That bug was fixed several years ago.

 

Reply written 3.8 years ago by Sean Davis
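Sean's suggestion of removing the cached platform file can be done from within R. The following is a minimal sketch using base R only: remove_cached() is a hypothetical helper, the "<accession>.soft" file name follows the log output earlier in the thread (GPL8490.soft), and the optional ".gz" suffix is an assumption about how other GEOquery versions may name the file. The demonstration runs in a scratch directory with empty dummy files.

```r
# Hypothetical helper: delete cached copies of one platform file from a
# download directory so that the next getGEO() call fetches it afresh.
remove_cached <- function(accession, dir) {
  # Anchored pattern so that "GPL97" does not also match "GPL970".
  pattern <- paste0("^", accession, "\\.soft(\\.gz)?$")
  hits <- list.files(dir, pattern = pattern, full.names = TRUE)
  removed <- file.remove(hits)
  hits[removed]  # paths that were actually deleted
}

# Demonstration in a scratch directory with dummy files.
cache_dir <- file.path(tempdir(), "geo_cache_demo")
dir.create(cache_dir, showWarnings = FALSE)
file.create(file.path(cache_dir, c("GPL97.soft", "GPL96.soft", "GPL970.soft")))

gone <- remove_cached("GPL97", cache_dir)
print(basename(gone))
print(list.files(cache_dir))
```

In practice the directory to clean would be the destdir passed to getGEO(); since the thread shows that older GEOquery versions cached GPL files in tempdir() instead, that location may be worth checking as well.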

Hi Sean,

Thanks a lot for your response. In my case the source of the problem was disk space: there wasn't enough of it. Once I deleted some large files to free up space, I ran the script again and it continued.

 

Reply written 3.8 years ago by i.eberechi