getGEO invalid multibyte string
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.2 years ago
I tried to fetch and parse the GSE94 series from GEO using GEOquery library (gse<-getGEO('GSE94')). Operation is aborted with the message:

Error in make.names(as.character(names), allow_) :
  invalid multibyte string 29
In addition: There were 12 warnings (use warnings() to see them)

is that an error of getGEO or a problem of the data set?

Tobias

> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-apple-darwin8.9.1

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"

other attached packages:
GEOquery
 "2.0.6"

======================================================================
Dr. Tobias Straub
Adolf-Butenandt-Institute, Molecular Biology
tel: +49-89-2180 75 439
Schillerstr. 44, 80336 Munich, Germany
@martin-morgan-1513
Last seen 4 months ago
United States
Hi Tobias --

Very clever fix! I think the problem would also be solved by setting your 'locale' to a non-UTF-8 one, e.g.,

> Sys.setlocale(locale="C")

(you can revert to the previous settings with locale="en_US.UTF-8").

I also think there is a little mismatch between how clever R is about locales and how clever developers like us are about locales. I would recommend, at least at the current moment in time, setting the locale to "C" unless UTF-8 encoding is needed.

Martin

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
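The locale switch suggested above can be scoped so that it is undone automatically. The following is only a sketch: `with_c_ctype` is a hypothetical helper (it is not part of GEOquery or base R), and it changes only the `LC_CTYPE` category rather than the whole locale, on the assumption that character handling is what matters here.

```r
# Hypothetical helper (not part of GEOquery or base R): evaluate an
# expression with LC_CTYPE temporarily set to "C", then restore the
# previous setting even if the expression fails.
with_c_ctype <- function(expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", "C")
  force(expr)  # expr is evaluated lazily, i.e. only now, under "C"
}

# Stand-in workload that just reports the active setting; in this
# thread the expression would be getGEO('GSE94').
inside <- with_c_ctype(Sys.getlocale("LC_CTYPE"))
print(inside)
```

With GEOquery loaded, the call would look like `gse <- with_c_ctype(getGEO('GSE94'))`. Whether this avoids the error for GSE94 has not been checked here; it simply packages the suggestion made in the post.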
@saroj-mohapatra-1446
Last seen 10.2 years ago
Hi Tobias:

I could not reproduce the error with R 2.5.0 and GEOquery 2.0.5

> library(GEOquery)
> gse<-getGEO('GSE94')
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/by_series/GSE94/GSE94_family.soft.gz'
ftp data connection made, file length 100293949 bytes
opened URL
downloaded 97943Kb

File stored at:
C:\DOCUME~1\smohapat\LOCALS~1\Temp\RtmpabhBsb/GSE94.soft.gz
Parsing....
^PLATFORM = GPL218
^SAMPLE = GSM2628 ^SAMPLE = GSM2629 ^SAMPLE = GSM2630
^SAMPLE = GSM3514 ^SAMPLE = GSM3515 ^SAMPLE = GSM3516 ^SAMPLE = GSM3517 ^SAMPLE = GSM3518
^SAMPLE = GSM3519 ^SAMPLE = GSM3520 ^SAMPLE = GSM3521 ^SAMPLE = GSM3522 ^SAMPLE = GSM3523
^SAMPLE = GSM3524 ^SAMPLE = GSM3525 ^SAMPLE = GSM3526 ^SAMPLE = GSM3527 ^SAMPLE = GSM3528
^SAMPLE = GSM3529 ^SAMPLE = GSM3530 ^SAMPLE = GSM3531 ^SAMPLE = GSM3532 ^SAMPLE = GSM3533
^SAMPLE = GSM3534 ^SAMPLE = GSM3535 ^SAMPLE = GSM3536 ^SAMPLE = GSM3537 ^SAMPLE = GSM3538
^SAMPLE = GSM3539 ^SAMPLE = GSM3540 ^SAMPLE = GSM3541 ^SAMPLE = GSM3542 ^SAMPLE = GSM3543
^SAMPLE = GSM3544 ^SAMPLE = GSM3545 ^SAMPLE = GSM3546 ^SAMPLE = GSM3547 ^SAMPLE = GSM3548
^SAMPLE = GSM3549 ^SAMPLE = GSM3550 ^SAMPLE = GSM3551 ^SAMPLE = GSM3552 ^SAMPLE = GSM3553
^SAMPLE = GSM3554 ^SAMPLE = GSM3555 ^SAMPLE = GSM3556 ^SAMPLE = GSM3557 ^SAMPLE = GSM3558
^SAMPLE = GSM3559 ^SAMPLE = GSM3560 ^SAMPLE = GSM3561 ^SAMPLE = GSM3562 ^SAMPLE = GSM3563
^SAMPLE = GSM3564 ^SAMPLE = GSM3565 ^SAMPLE = GSM3566 ^SAMPLE = GSM3567 ^SAMPLE = GSM3568
^SAMPLE = GSM3569 ^SAMPLE = GSM3570 ^SAMPLE = GSM3571 ^SAMPLE = GSM3572 ^SAMPLE = GSM3573
^SAMPLE = GSM3574 ^SAMPLE = GSM3575 ^SAMPLE = GSM3576 ^SAMPLE = GSM3577 ^SAMPLE = GSM3578
^SAMPLE = GSM3579 ^SAMPLE = GSM3580 ^SAMPLE = GSM3581 ^SAMPLE = GSM3582 ^SAMPLE = GSM3583
^SAMPLE = GSM3584 ^SAMPLE = GSM3585 ^SAMPLE = GSM3586 ^SAMPLE = GSM3587 ^SAMPLE = GSM3588
^SAMPLE = GSM3589 ^SAMPLE = GSM3590 ^SAMPLE = GSM3591 ^SAMPLE = GSM3592 ^SAMPLE = GSM3593
^SAMPLE = GSM3594 ^SAMPLE = GSM3595 ^SAMPLE = GSM3596 ^SAMPLE = GSM3597 ^SAMPLE = GSM3598
^SAMPLE = GSM3599 ^SAMPLE = GSM3600 ^SAMPLE = GSM3601 ^SAMPLE = GSM3602 ^SAMPLE = GSM3603
^SAMPLE = GSM3604 ^SAMPLE = GSM3605 ^SAMPLE = GSM3606 ^SAMPLE = GSM3607 ^SAMPLE = GSM3608
^SAMPLE = GSM3609 ^SAMPLE = GSM3610 ^SAMPLE = GSM3611 ^SAMPLE = GSM3612 ^SAMPLE = GSM3613
^SAMPLE = GSM3614 ^SAMPLE = GSM3615 ^SAMPLE = GSM3616 ^SAMPLE = GSM3617 ^SAMPLE = GSM3618
^SAMPLE = GSM3619 ^SAMPLE = GSM3620 ^SAMPLE = GSM3621 ^SAMPLE = GSM3622 ^SAMPLE = GSM3623
^SAMPLE = GSM3624 ^SAMPLE = GSM3625 ^SAMPLE = GSM3626 ^SAMPLE = GSM3627 ^SAMPLE = GSM3628
^SAMPLE = GSM3629 ^SAMPLE = GSM3630 ^SAMPLE = GSM3631 ^SAMPLE = GSM3632 ^SAMPLE = GSM3633
^SAMPLE = GSM3634 ^SAMPLE = GSM3635 ^SAMPLE = GSM3636 ^SAMPLE = GSM3637 ^SAMPLE = GSM3638
^SAMPLE = GSM3639 ^SAMPLE = GSM3640 ^SAMPLE = GSM3641 ^SAMPLE = GSM3642 ^SAMPLE = GSM3643
^SAMPLE = GSM3644 ^SAMPLE = GSM3645 ^SAMPLE = GSM3646 ^SAMPLE = GSM3647 ^SAMPLE = GSM3648
^SAMPLE = GSM3649 ^SAMPLE = GSM3650 ^SAMPLE = GSM3651 ^SAMPLE = GSM3652 ^SAMPLE = GSM3653
^SAMPLE = GSM3654 ^SAMPLE = GSM3655 ^SAMPLE = GSM3656 ^SAMPLE = GSM3657 ^SAMPLE = GSM3658
^SAMPLE = GSM3659 ^SAMPLE = GSM3660 ^SAMPLE = GSM3661 ^SAMPLE = GSM3662 ^SAMPLE = GSM3663
^SAMPLE = GSM3664 ^SAMPLE = GSM3665 ^SAMPLE = GSM3666 ^SAMPLE = GSM3667 ^SAMPLE = GSM3668

> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"

other attached packages:
GEOquery
 "2.0.5"
>

Best,
Saroj
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.2 years ago
ok, I figured it out myself! modified GEOquery

*** GEOquery Fri Aug 10 08:24:53 2007
--- GEOquery_new Fri Aug 31 14:43:15 2007
***************
*** 495,500 ****
--- 495,501 ----
  nextEntity <- ""
  while(!finished) {
  line <- readLines(con,1)
+ line <- iconv(line, "LATIN2", "UTF-8")
  if(length(line)==0) finished <- TRUE
  a[lines] <- line
  lines <- lines+1
***************
*** 510,515 ****
--- 511,517 ----
  finished <- FALSE
  while(!finished) {
  line <- readLines(con,1)
+ line <- iconv(line, "LATIN2", "UTF-8")
  if(length(line)==0) {
  finished <- TRUE
  } else {
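To see what the added `iconv` line does to a single input line, here is a small self-contained sketch. The three bytes are made up for illustration and are not taken from the GSE94 file; `validUTF8` is assumed to be available (R 3.3.0 or later), and the encoding name "LATIN2" is copied from the patch, although the names accepted by `iconv` are platform dependent (see `iconvlist()`).

```r
# Illustrative bytes (not from GSE94): "Mün" as a Latin-2 file stores
# it, with the u-umlaut as the single byte 0xFC.
latin2_line <- rawToChar(as.raw(c(0x4d, 0xfc, 0x6e)))

# 0xFC followed by an ASCII byte is not a valid UTF-8 sequence, which
# is the kind of input a UTF-8 locale rejects as an invalid multibyte
# string.
is_valid_before <- validUTF8(latin2_line)

# Same call as in the patch: read the bytes as Latin-2, emit UTF-8.
utf8_line <- iconv(latin2_line, "LATIN2", "UTF-8")
is_valid_after <- validUTF8(utf8_line)

print(c(before = is_valid_before, after = is_valid_after))
print(charToRaw(utf8_line))  # the umlaut is now the two bytes c3 bc
```

The conversion only gives the right characters if the input really is Latin-2. The patch assumes this for every line read from the SOFT file, so lines that are already UTF-8 and contain non-ASCII characters would be converted a second time and come out garbled.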