Hi,
I'm trying to make a TxDb object from BioMart.
When running this command:
txdb <- makeTxDbFromBiomart(biomart="plants_mart", dataset="athaliana_eg_gene", host="plants.ensembl.org")
I get this:
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... Error in .Ensembl_getMySQLCoreDir(dataset) :
found 0 or more than 1 subdir for "athaliana_eg_gene" dataset at ftp://ftp.ensembl.org/pub/current_mysql/
Could you please tell me what I am doing wrong?
I get the same error when running makeTxDb package.
Thanks a lot.
Thanks a lot, really helpful.
Is it normal to get these warnings:
Warning messages:
1: In readLines(gtf, n = 10) : line 1 appears to contain an embedded nul
2: In readLines(gtf, n = 10) : line 3 appears to contain an embedded nul
3: In readLines(gtf, n = 10) : line 5 appears to contain an embedded nul
4: In grep(tmp, pattern = "^#") : input string 1 is invalid in this locale
5: In grep(tmp, pattern = "^#") : input string 2 is invalid in this locale
6: In grep(tmp, pattern = "^#") : input string 3 is invalid in this locale
7: In grep(tmp, pattern = "^#") : input string 4 is invalid in this locale
8: In grep(tmp, pattern = "^#") : input string 5 is invalid in this locale
9: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism, :
I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
These warnings are strange. The one related to entrezid is fine: Entrez IDs are not provided in the GTF file, so the database column will be empty. Warnings about failing to fetch the sequence lengths should also be OK: the function first tries to get them from ensembl and fails, but should then be able to fetch them from ensemblgenomes. Just check afterwards with seqinfo() whether you've got sequence lengths (I did, using ensembldb from BioC 3.4).
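To make the sequence-length check concrete, here is a minimal sketch in base R. The helper name `missing_seqlengths` and the toy vector are made up for illustration and are not part of ensembldb. It only assumes that the lengths arrive as a named vector, which is what seqlengths() returns.

```r
# Return the names of sequences whose length is missing (NA).
# `lens` is a named integer/numeric vector such as the one returned by seqlengths().
# NOTE: `missing_seqlengths` is a hypothetical helper, not an ensembldb function.
missing_seqlengths <- function(lens) {
  names(lens)[is.na(lens)]
}

# Toy input for illustration only; these are not real A. thaliana values.
toy <- c(chrA = 1000L, chrB = NA, chrC = 2500L)
missing_seqlengths(toy)

# With the real database (assumes ensembldb and GenomeInfoDb are installed
# and `edb` was created with EnsDb(dbFile)):
# missing_seqlengths(GenomeInfoDb::seqlengths(seqinfo(edb)))
```

An empty result would mean every sequence has a length, so the failed first lookup can be ignored.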
Could you provide the output of sessionInfo()? And what exactly are you doing? Did you download the GTF file locally and then use the ensDbFromGtf function?
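In case it helps to narrow things down, this is roughly what I mean by working from a local copy. It is only a sketch: the wrapper name `build_ensdb_from_local_gtf` is made up, the URL is the Ensembl Genomes release 34 GTF for A. thaliana, and ensembldb is assumed to be installed. Whether it changes the readLines warnings is something to test, not something I can promise.

```r
# URL of the GTF file (Ensembl Genomes plants, release 34).
gtf_url <- "ftp://ftp.ensemblgenomes.org/pub/plants/release-34/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.34.gtf.gz"

# Keep the original file name: ensDbFromGtf tries to guess organism,
# genome version and Ensembl version from it.
local_gtf <- file.path(tempdir(), basename(gtf_url))

# Hypothetical wrapper: download the compressed GTF in binary mode,
# then build the SQLite database from the local copy.
build_ensdb_from_local_gtf <- function(url, dest) {
  download.file(url, destfile = dest, mode = "wb")
  ensembldb::ensDbFromGtf(dest)  # returns the path to the SQLite file
}

# Usage (needs network access and the ensembldb package):
# dbFile <- build_ensdb_from_local_gtf(gtf_url, local_gtf)
# edb <- ensembldb::EnsDb(dbFile)
```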
Hi,
I used this script.
dbFile <- ensDbFromGtf("ftp://ftp.ensemblgenomes.org/pub/plants/release-34/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.34.gtf.gz")
## Load the database.
edb <- EnsDb(dbFile)
seqinfo(edb)
And I've got this:
Importing GTF file...trying URL 'ftp://ftp.ensemblgenomes.org/pub/plants/release-34/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.34.gtf.gz'
downloaded 9.5 MB
OK
Processing metadata...OK
Processing genes...
Attribute availability:
o gene_id... OK
o gene_name... OK
o entrezid... Nope
o gene_biotype... OK
OK
Processing transcripts...
Attribute availability:
o transcript_id... OK
o gene_id... OK
o transcript_biotype... OK
OK
Processing exons...OK
Processing chromosomes...Fetch seqlengths from ensembl, dataset athaliana_gene_ensembl version 34...Error in function (type, msg, asError = TRUE) :
Server denied you to change to the given directory
Unable to get sequence lengths from Ensembl for dataset: athaliana_gene_ensembl. Error was:
OK
OK
Generating index...OK
-------------
Verifying validity of the information in the database:
Checking transcripts...OK
Checking exons...OK
Warning messages:
1: In readLines(gtf, n = 10) : line 1 appears to contain an embedded nul
2: In readLines(gtf, n = 10) : line 5 appears to contain an embedded nul
3: In readLines(gtf, n = 10) : line 6 appears to contain an embedded nul
4: In readLines(gtf, n = 10) : line 7 appears to contain an embedded nul
5: In readLines(gtf, n = 10) : line 8 appears to contain an embedded nul
6: In readLines(gtf, n = 10) : line 9 appears to contain an embedded nul
7: In readLines(gtf, n = 10) : line 10 appears to contain an embedded nul
8: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism, :
I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
I want to make a TxDb object.
Thanks a lot.
D.
I proposed this as an alternative to the TxDb object. EnsDb objects provide the same annotations, methods and functionality, but are specifically designed for Ensembl annotations. There is no way to convert an EnsDb to a TxDb, but you should be able to use the EnsDb as if it were a TxDb.
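As a rough illustration of using the EnsDb in place of a TxDb, the usual accessors can be called on it directly. The wrapper `annotation_overview` below is just a made-up convenience function for this post. It assumes ensembldb is installed and that `edb` is the object created with EnsDb(dbFile).

```r
# Hypothetical helper collecting the accessors one would typically call on a TxDb.
# Assumes `edb` is an EnsDb object and the ensembldb package is installed.
annotation_overview <- function(edb) {
  list(
    genes       = ensembldb::genes(edb),                   # one range per gene
    transcripts = ensembldb::transcripts(edb),             # one range per transcript
    exons_by_tx = ensembldb::exonsBy(edb, by = "tx"),      # exons grouped by transcript
    tx_by_gene  = ensembldb::transcriptsBy(edb, by = "gene")  # transcripts grouped by gene
  )
}

# Usage:
# ov <- annotation_overview(edb)
# ov$genes
```

Downstream code that only relies on these accessors should not need to know whether it was given a TxDb or an EnsDb.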
Yes, it works, thanks. Is there a way to add entrezid?
Thanks a lot.
D.
I can provide you with an EnsDb package for A. thaliana for ensemblgenomes-34 (corresponds to Ensembl 87), built from the MySQL database dumps using the Ensembl Perl API. But I checked, and there is no entrezid available there either. It seems NCBI does not provide annotations for plants?