GenomicFeatures Transcripts Retrieval Fails
@sharvarigujjasanoficom-6610
Hi Steve,

I get the same error when trying to run:

    txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='knownGene')
    Error in function (type, msg, asError = TRUE) : couldn't connect to host

    txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
    Error in function (type, msg, asError = TRUE) : couldn't connect to host

I did install the required packages, so I am not sure what I am missing here:

    source("http://bioconductor.org/biocLite.R")
    biocLite()
    biocLite(c("GenomicFeatures", "AnnotationDbi"))
    library("GenomicFeatures")

Could you please help me with this error?

Many thanks,
Sharvari Gujja
@james-w-macdonald-5106
United States
Hi Sharvari,

On 6/17/2014 10:56 AM, Sharvari.Gujja at sanofi.com wrote:
> I get the same error trying to run txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='knownGene')
>
> Error in function (type, msg, asError = TRUE) : couldn't connect to host

This error means you are not able to connect to UCSC. This may be due to an intermittent outage on their end, or possibly because you are behind a firewall.

But note that if you want the knownGene transcript package, you can get that from Bioconductor without having to build it yourself:

    library(BiocInstaller)
    biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")

If you want the ensGene table you will have to build that one yourself. I just tried that using your code, and it works for me:

    > txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
    Download the ensGene table ... OK
    Extract the 'transcripts' data frame ... OK
    Extract the 'splicings' data frame ... OK
    Download and preprocess the 'chrominfo' data frame ... OK
    Prepare the 'metadata' data frame ... OK
    Make the TranscriptDb object ... OK
    Warning message:
    In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
      UCSC data anomaly in 19284 transcript(s): the cds cumulative length is
      not a multiple of 3 for transcripts ‘ENST00000513161’
      ‘ENST00000417833’ ‘ENST00000450884’ ‘ENST00000431193’
      ‘ENST00000367667’ ‘ENST00000498306’ ‘ENST00000434641’
      ‘ENST00000462097’ ‘ENST00000475119’ ‘ENST00000480643’
      ‘ENST00000525843’ ‘ENST00000498419’ ‘ENST00000532678’
      ‘ENST00000460428’ ‘ENST00000478853’ ‘ENST00000372925’
      ‘ENST00000437607’ ‘ENST00000416121’ ‘ENST00000582567’
      ‘ENST00000413489’ ‘ENST00000425265’ ‘ENST00000534717’
      ‘ENST00000436685’ ‘ENST00000606954’ ‘ENST00000484054’
      ‘ENST00000414971’ ‘ENST00000443667’ ‘ENST00000417191’
      ‘ENST00000559578’ ‘ENST00000482110’ ‘ENST00000524607’
      ‘ENST00000419169’ ‘ENST00000295713’ ‘ENST00000609181’
      ‘ENST00000327794’ ‘ENST00000450490’ ‘ENST00000602582’
      ‘ENST00000453676’ ‘ENST00000513088’ ‘ENST [... truncated]
    > txdb
    TranscriptDb object:
    | Db type: TranscriptDb
    | Supporting package: GenomicFeatures
    | Data source: UCSC
    | Genome: hg19
    | Organism: Homo sapiens
    | UCSC Table: ensGene
    | Resource URL: http://genome.ucsc.edu/
    | Type of Gene ID: Ensembl gene ID
    | Full dataset: yes
    | miRBase build ID: NA
    | transcript_nrow: 204940
    | exon_nrow: 584914
    | cds_nrow: 280379
    | Db created by: GenomicFeatures package from Bioconductor
    | Creation time: 2014-06-17 09:34:13 -0700 (Tue, 17 Jun 2014)
    | GenomicFeatures version at creation time: 1.16.2
    | RSQLite version at creation time: 0.11.4
    | DBSCHEMAVERSION: 1.0

So you might try again. If you are on Windows, you might be having a proxy issue, in which case you might use the setInternet2() function prior to running makeTranscriptDbFromUCSC().

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
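Because the connection error can be transient, one option is to retry the build a few times before giving up. The following is only a minimal sketch and not part of GenomicFeatures: `with_retry` and its `tries` and `wait` arguments are hypothetical names introduced here for illustration. It calls any zero-argument function inside tryCatch() and re-raises the last error if every attempt fails.

```r
# Hypothetical helper (not a GenomicFeatures function): call `build()` up to
# `tries` times, sleeping `wait` seconds between failed attempts.
with_retry <- function(build, tries = 3, wait = 5) {
  last_error <- NULL
  for (i in seq_len(tries)) {
    # Capture an error as a condition object instead of aborting.
    result <- tryCatch(build(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < tries) Sys.sleep(wait)
  }
  # Every attempt failed: re-signal the most recent error.
  stop(last_error)
}

# Usage, assuming GenomicFeatures is installed and UCSC is reachable:
# library(GenomicFeatures)
# txdb <- with_retry(function() {
#   makeTranscriptDbFromUCSC(genome = "hg19", tablename = "ensGene")
# })
```

Retrying only helps with an intermittent outage. If a firewall or proxy blocks the connection, every attempt will fail with the same message.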
Hi Jim,

Thanks for the reply. Yes, I am running this on Windows. I followed your suggestion to use the setInternet2() function first, but I still get an error:

    > setInternet2()
    > txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
    Error in function (type, msg, asError = TRUE) : couldn't connect to host

I also tried:

    > biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
    BioC_mirror: http://bioconductor.org
    Using Bioconductor version 2.14 (BiocInstaller 1.14.2), R version 3.1.0.
    Installing package(s) 'TxDb.Hsapiens.UCSC.hg19.knownGene'
    trying URL 'http://bioconductor.org/packages/2.14/data/annotation/bin/windows/contrib/3.1/TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0.zip'
    Content type 'application/zip' length 18546564 bytes (17.7 Mb)
    opened URL
    downloaded 17.7 Mb

How do I read this table "TxDb.Hsapiens.UCSC.hg19.knownGene"? Also, is there documentation on the differences between "knownGene" and "ensGene"?

Thanks for helping.
Sharvari
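On reading the installed annotation package: it is loaded like any other package, and it exposes the transcript database as an object with the same name as the package. The sketch below assumes the package installed cleanly. The variable names (`have_pkg`, `tx`, `ex_by_gene`) are arbitrary choices made for this illustration, and the guard around the code is there only so that the block does nothing when the package is missing.

```r
# Check for the annotation package without failing when it is absent.
have_pkg <- requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene",
                             quietly = TRUE)

if (have_pkg) {
  # Attaching the package also attaches GenomicFeatures, which provides
  # the accessor functions used below.
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)

  # The database object carries the same name as the package.
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
  print(txdb)  # metadata summary: genome, source table, row counts

  # All transcripts as genomic ranges.
  tx <- transcripts(txdb)
  print(head(tx))

  # Exons grouped by gene; for knownGene the list names are gene IDs.
  ex_by_gene <- exonsBy(txdb, by = "gene")
  print(length(ex_by_gene))
} else {
  message("TxDb.Hsapiens.UCSC.hg19.knownGene is not installed.")
}
```

The same accessors should work on a database built with makeTranscriptDbFromUCSC(), since both are the same kind of object.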
Hi Sharvari,

On 6/17/2014 1:04 PM, sharvari gujja wrote:
> Thanks for the reply. Yes, I am running this on Windows. I followed your
> suggestion to use setInternet2() function first, but I still get an error:
>
> > setInternet2()
> > txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
> Error in function (type, msg, asError = TRUE) : couldn't connect to host

You might consult one of your IT people about that, but otherwise I have nothing more for you. Well, except you could use the makeTranscriptDbFromGFF() function if you are really a fan of ensGene. You can go to UCSC using a browser, then click on the 'Tables' menu item. There you could choose

    group: Genes and Gene Predictions
    track: Ensembl Genes
    table: ensGene
    output format: GTF - gene transfer format

Set the output file to be something reasonable, then click 'get output'. You can then use that to create a TxDb. But unless you are dead set on using Ensembl genes, it's probably not worth the bother.

> How do I read this table "TxDb.Hsapiens.UCSC.hg19.knownGene"? Also, is
> there documentation on the differences between "knownGene" and "ensGene"?

You want to do some reading:

http://bioconductor.org/packages/release/bioc/vignettes/GenomicFeatures/inst/doc/GenomicFeatures.pdf

There are literally a bazillion things you can do with a TxDb object, so unless you have a use case that you want to talk about, you will have to do some self-learning (which you should be doing anyway, so there you go).

As far as documentation, you can start with UCSC's table page. If you do as I describe above, and then click on the 'describe table schema' button you get a page that says that the genes and gene predictions come from Ensembl, and they have a link to ensembl.org, where you can do more reading. For the knownGene table, if you change to

    track: UCSC Genes
    table: knownGene

and then click 'describe table schema' there is a whole webpage describing how they generate those data.

Best,

Jim
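The GTF route described above can be sketched as follows. This is a hedged illustration written against the function name used in this thread (GenomicFeatures 1.16). Later GenomicFeatures releases renamed the function to makeTxDbFromGFF(), so the call may need adjusting on a newer installation. The wrapper `build_txdb_from_gtf` and the file name `hg19_ensGene.gtf` are hypothetical and stand in for whatever output file name was chosen in the Table Browser.

```r
# Hypothetical wrapper: build a transcript database from a GTF file that was
# downloaded by hand from the UCSC Table Browser.
build_txdb_from_gtf <- function(gtf_file) {
  # Fail early with a clear message if the download is not where we expect.
  if (!file.exists(gtf_file)) {
    stop("GTF file not found: ", gtf_file)
  }
  # Function name as of GenomicFeatures 1.16; newer releases call this
  # makeTxDbFromGFF().
  GenomicFeatures::makeTranscriptDbFromGFF(gtf_file, format = "gtf")
}

# Usage, assuming the file was saved in the working directory:
# txdb <- build_txdb_from_gtf("hg19_ensGene.gtf")
#
# Optionally cache the result so the database is not rebuilt every session:
# AnnotationDbi::saveDb(txdb, file = "hg19_ensGene.sqlite")
# txdb <- AnnotationDbi::loadDb("hg19_ensGene.sqlite")
```

One caveat worth checking in the downloaded file: as far as I know, GTF output from the Table Browser fills the gene_id field with the transcript name, so gene-level grouping in the resulting database may not reflect real Ensembl gene IDs.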