Question

Question about TranscriptDb and makeTranscriptDb method

Song Li ▴ 60

@song-li-4383

Hi all,

I want to thank you for the incredible package, which greatly simplifies our RNA-seq analysis.

I am working with Arabidopsis RNA-seq data, and it seems that I have to build a TranscriptDb object myself. Is there a function that reads a GTF file and makes a TranscriptDb object?

Thanks,
Song Li

--
Postdoctoral Associate
Institute for Genome Sciences and Policy
Duke University

TranscriptDb • 896 views

updated 13.4 years ago by Hervé Pagès 16k • written 13.4 years ago by Song Li ▴ 60

score 0 · Answer 1 · 2010-12-11

Hervé Pagès 16k

@herve-pages-1542

Seattle, WA, United States

Hi Song,

On 12/10/2010 11:16 AM, Song Li wrote:
> However, I am working with Arabidopsis RNA-seq data, however, it seems
> that I have to build a transcriptDb object by myself. Is there a
> function that reads GTF file and make transcriptDB object?

No, we don't have this yet, but we might add it in the future. In the meantime, you can build a TranscriptDb object for Arabidopsis by using the alyrata_eg_gene dataset from the plant_mart_7 Mart:

> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromBiomart("plant_mart_7", "alyrata_eg_gene")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In .normargSplicings(splicings, unique_tx_ids) :
  no CDS information for this TranscriptDb object
2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
  chromosome lengths and circularity flags are not available for this TranscriptDb object
> txdb
TranscriptDb object:
| Db type: TranscriptDb
| Data source: BioMart
| BioMart database: plant_mart_7
| BioMart database version: ENSEMBL PLANT 7 (EBI UK)
| BioMart dataset: alyrata_eg_gene
| BioMart dataset description: Arabidopsis lyrata genes (Araly1)
| BioMart dataset version: Araly1
| Full dataset: yes
| transcript_nrow: 32667
| exon_nrow: 174271
| cds_nrow: 0
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-12-11 17:43:13 -0800 (Sat, 11 Dec 2010)
| GenomicFeatures version at creation time: 1.2.3
| RSQLite version at creation time: 0.9-4
| DBSCHEMAVERSION: 1.0

Just a reminder, though: if you decide to use this, it's *crucial* that you align your RNA-seq data against the reference genome that corresponds to those annotations (I'm not sure which one it is; you'll need to investigate).

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
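For readers who do have a GTF file in hand, one possible workaround is to parse it into the data frames that makeTranscriptDb() takes. The following is only a rough sketch in base R, not a function from GenomicFeatures: gtfToTxTables is a hypothetical helper, the example lines are made-up data, and the column names (tx_id, tx_name, tx_chrom, tx_strand, tx_start, tx_end, exon_rank, exon_start, exon_end) follow my reading of the makeTranscriptDb() documentation, so please check them against your installed version. It assumes every exon line carries a transcript_id attribute and ignores CDS and gene information.

```r
# Hypothetical helper (not part of GenomicFeatures): turn the exon lines of a
# GTF file into 'transcripts' and 'splicings' data frames.
gtfToTxTables <- function(gtf_lines) {
  # Drop comment lines and empty lines, then split the tab-separated fields.
  gtf_lines <- gtf_lines[!grepl("^#", gtf_lines) & nzchar(gtf_lines)]
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  gtf <- data.frame(
    chrom   = vapply(fields, `[`, character(1), 1),
    feature = vapply(fields, `[`, character(1), 3),
    start   = as.integer(vapply(fields, `[`, character(1), 4)),
    end     = as.integer(vapply(fields, `[`, character(1), 5)),
    strand  = vapply(fields, `[`, character(1), 7),
    attrs   = vapply(fields, `[`, character(1), 9),
    stringsAsFactors = FALSE
  )
  exons <- gtf[gtf$feature == "exon", ]

  # Pull the transcript_id value out of the attribute column and give each
  # transcript an integer id in order of first appearance.
  exons$tx_name <- sub('.*transcript_id "([^"]+)".*', "\\1", exons$attrs)
  tx_names <- unique(exons$tx_name)
  exons$tx_id <- match(exons$tx_name, tx_names)

  # One row per transcript: its span is the range covered by its exons.
  first <- !duplicated(exons$tx_id)
  transcripts <- data.frame(
    tx_id     = seq_along(tx_names),
    tx_name   = tx_names,
    tx_chrom  = exons$chrom[first],
    tx_strand = exons$strand[first],
    tx_start  = as.integer(tapply(exons$start, exons$tx_id, min)),
    tx_end    = as.integer(tapply(exons$end, exons$tx_id, max)),
    stringsAsFactors = FALSE
  )

  # Exon rank runs 5' to 3': ascending start on "+", descending on "-".
  key <- ifelse(exons$strand == "-", -exons$start, exons$start)
  exons <- exons[order(exons$tx_id, key), ]
  exons$exon_rank <- ave(exons$tx_id, exons$tx_id, FUN = seq_along)
  splicings <- data.frame(
    tx_id      = exons$tx_id,
    exon_rank  = exons$exon_rank,
    exon_start = exons$start,
    exon_end   = exons$end
  )

  list(transcripts = transcripts, splicings = splicings)
}

# Made-up example lines; a real file would be read with readLines("your.gtf").
example_gtf <- c(
  'Chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
  'Chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
  'Chr2\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";',
  'Chr2\tsrc\texon\t700\t800\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";'
)
tables <- gtfToTxTables(example_gtf)

# With GenomicFeatures installed, the tables could then be passed on
# (left commented out because it needs the package):
# txdb <- makeTranscriptDb(tables$transcripts, tables$splicings)
```

As with the BioMart route, the chromosome names and coordinates in the GTF file have to match the reference genome the reads were aligned against.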

written 13.4 years ago by Hervé Pagès 16k

Hi Hervé,

Thank you for the reply.

I am a little worried about the warning message that "CDS" is not available. However, it does not seem to be a crucial factor to consider at this moment.

Best,
Song

written 13.4 years ago by Song Li ▴ 60

Hi Song,

The message about CDS availability refers only to the ranges needed to populate the CDS tables. If, like a lot of people, you will only be asking questions about transcripts and exons, I bet that this will not affect you.

Marc

On 12/13/2010 07:00 AM, Song Li wrote:
> I am little worried about the warning message that "CDS" is not
> available. However, it does not seem to be a crucial factor to
> consider at this moment.

written 13.4 years ago by Marc Carlson ★ 7.2k