Hi,
I am re-downloading a GSE series I have worked with for several years, and suddenly encountered an error in code that I have also been using without issue for several years. As it turns out, the probeset information for the GPL3921 platform is not downloading properly. I am not sure whether this is a GEOquery download issue or whether the GPL SOFT file on GEO is corrupt (it was last updated on the GEO website on 8/12/16).
I have tried the download several times (deleting the cached copy each time), and either no probeset information is downloaded, or only partial probeset information is downloaded, with the rest left as NA.
This GSE has 2 datasets attached to it; there is no problem with the second GPL, GPL4685.
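A minimal sketch of how the two failure modes described above could be told apart programmatically, assuming the probe IDs are available as a vector (for example from `fData(eset)$ID`). The helper name `summarize_feature_ids` and the toy vector are hypothetical and only for illustration; the commented GEOquery call needs network access and is not run here.

```r
# Hypothetical helper: count present vs. missing (NA) probe IDs.
# `ids` can be a character vector, a factor, or NULL (no featureData).
summarize_feature_ids <- function(ids) {
  ids <- as.character(ids)          # factor -> character, NULL -> character(0)
  n_total <- length(ids)
  n_missing <- sum(is.na(ids))
  first_missing <- if (n_missing > 0) which(is.na(ids))[1] else NA_integer_
  list(
    total = n_total,                # number of feature IDs seen
    missing = n_missing,            # how many of them are NA
    first_missing = first_missing,  # index of the first NA, if any
    complete = n_total > 0 && n_missing == 0
  )
}

# Toy illustration with a made-up vector (not real download output).
toy_ids <- c("1007_s_at", "1053_at", "117_at", NA, NA)
str(summarize_feature_ids(toy_ids))

# Possible use with GEOquery (assumes network access; not run here):
# library(GEOquery)
# library(Biobase)
# gse <- getGEO("GSE50444", GSEMatrix = TRUE, destdir = tempdir())
# summarize_feature_ids(fData(gse[[1]])$ID)
```

A result with `total` equal to 0 would correspond to the "featureData: none" case, while a non-zero `missing` count would correspond to the partial download case.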
Case #1: No probeset information is downloaded (see "featureData: none" below), and a warning is produced.
> gse50444 <- getGEO('gse50444', GSEMatrix = TRUE)
Download warning produced:
Warning message:
In readLines(con, 1) :
incomplete final line found on '/var/folders/4z/w7_jy74n1nx7sf4hdcn4h4tm0000gn/T//RtmpWDrYrH/GPL3921.soft'
No featureData is associated with the ExpressionSet:
> gse50444.gpl3921 <- gse50444[[1]]
> gse50444.gpl3921
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22277 features, 13 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM1219374 GSM1219375 ... GSM1219390 (13 total)
  varLabels: title geo_accession ... data_row_count (35 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL3921
Case #2: No download warning is produced, but only part of the probeset information is downloaded (different attempts produced different partial downloads, e.g. the first 53 probes, the first 10 probes, etc.), and NAs are introduced for the remainder.
> gse50444 <- getGEO('gse50444', GSEMatrix = TRUE)
> gse50444.gpl3921 <- gse50444[[1]]
> gse50444.gpl4685 <- gse50444[[2]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22277 features, 13 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM1219374 GSM1219375 ... GSM1219390 (13 total)
  varLabels: title geo_accession ... data_row_count (35 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... NA.22263 (22277 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL3921
Partial information for probeset downloaded:
> featureData(gse50444.gpl3921)$ID[1:50]
 [1] 1007_s_at 1053_at   117_at    121_at    1255_g_at 1294_at   1316_at   1320_at
 [9] 1405_i_at 1431_at   1438_at   1487_at   1494_f_at <NA>      <NA>      <NA>
[17] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
[25] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
[33] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
[41] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
[49] <NA>      <NA>
> exprs(gse50444.gpl3921)[1:25, 1:5]
           GSM1219374  GSM1219375  GSM1219376  GSM1219377  GSM1219382
1007_s_at  0.41385892 -0.63996374 -1.73610412 -1.57517607 -1.07331220
1053_at   -0.12878988 -1.00403804 -0.73231870 -1.16391881 -0.77008163
117_at    -1.26070358  1.26937376  1.00263645  0.95988674  0.85146114
121_at     1.77211067  1.66987587 -0.98424704 -1.59976481 -0.50925136
1255_g_at  2.10582626  0.12738699 -0.12883987 -0.31430216  0.76752995
1294_at   -1.35346212 -1.54300280 -1.51740293 -0.78928326  0.08942978
1316_at    0.64331735  0.88807124  0.72578215  0.57944596 -2.87895589
1320_at   -2.48933703 -0.68146255  0.04693923 -1.29023574  1.42692626
1405_i_at -1.24342923 -0.98998659 -0.76558138 -0.89682371  1.56075510
1431_at    1.11565231  0.19878199  0.82711198  0.26595467  0.34545898
1438_at    1.39958352  1.15349628  1.25051712  1.27550538 -0.99547150
1487_at    0.07843472  0.85933824  1.45393915  1.96911334 -2.01832498
1494_f_at  0.46256203 -1.68187670  0.68478356  0.57685428  0.16596064
NA        -0.11693547  0.43283197 -1.72592572 -1.81656636  0.59357536
NA.1       2.11614697  2.09774366 -0.46119476 -0.67307001  0.17831501
NA.2      -0.87134590 -0.01708055 -2.50694865 -1.15106562 -0.04434486
NA.3      -0.40180877 -0.43061066 -0.81721768 -0.99795365 -0.25247241
NA.4       1.14204017 -0.75941455 -1.89897048 -1.57962924  1.43090640
NA.5       0.37910692  0.12075928 -0.05456004  0.97155238 -0.64881493
NA.6       1.43481442  0.10490435  2.01492490  1.28410018 -0.17900697
NA.7       1.43346276  0.52050705  1.46460961  1.89953704 -0.96485518
NA.8       1.78761277  0.91176840 -0.07419414  0.03137913  1.46269737
NA.9       0.58698465  0.78548994  1.77385158  1.52612317 -1.36141323
NA.10      0.98628179  0.81391549  1.72908608  1.69564966 -1.22285385
NA.11      0.03509782  0.43000220 -2.50945545 -1.18538264  0.38360561
Session Info is below.
Thank you.
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets
[9] methods   base

other attached packages:
 [1] HGNChelper_0.3.1     org.Hs.eg.db_3.3.0   AnnotationDbi_1.34.4 IRanges_2.6.1
 [5] S4Vectors_0.10.2     lumi_2.24.0          mclust_5.2           limma_3.28.17
 [9] affy_1.50.0          cluster_2.0.4        reshape_0.8.5        ggplot2_2.1.0
[13] gplots_3.0.1         RColorBrewer_1.1-2   GEOquery_2.38.4      Biobase_2.32.0
[17] BiocGenerics_0.18.0  BiocInstaller_1.22.3 gridExtra_2.2.1      dendextend_1.2.0