Search
Question: Renaming the seqlevels in a transcript database from GenomicFeatures
4
gravatar for Fong Chun Chan
3.4 years ago by
Fong Chun Chan320 wrote:
Hi, I am using the GenomicFeatures package to extract exons from a transcript database file. I am using the ensembl transcript database which has no chr, yet my bam files that I am working have the chr prefix. I was thinking one could append chr to the seqlevels in the transcript database like this: library('GenomicFeatures') txdb <- loadDb(txdbFile) newSeqNames <- paste('chr', seqlevels(txdb), sep = '') names(newSeqNames) <- seqlevels(txdb) txdb <- renameSeqlevels( txdb, newSeqNames ) seqlevels(txdb) [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11" "chr12" [13] "chr13" "chr14" This appears to work. But when I run: transcripts(txdb, vals = list( tx_chrom = "chr1") ) GRanges with 0 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> --- seqlengths: It suggests that the chromosome renaming is not working...because when I run: transcripts(txdb, vals = list( tx_chrom = '1') ) GRanges with 17239 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> [1] chr1 [11869, 14409] + | 126213 ENST00000456328 Has anyone had any luck in getting this to work? Thanks, [[alternative HTML version deleted]]
ADD COMMENTlink modified 3.4 years ago by Sonali Arora360 • written 3.4 years ago by Fong Chun Chan320
3
gravatar for Sonali Arora
3.4 years ago by
Sonali Arora360
United States
Sonali Arora360 wrote:
Hi Fong, A TranscriptDb cannot be directly subset with 'keepSeqlevels' and 'dropSeqlevels'. But the GRanges or GRangesLists extracted from the TranscriptDb can be subset. Below I show an example for how to extract GRanges, subset and change the chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1") ## load packages library(GenomicFeatures) library(TxDb.Hsapiens.UCSC.hg18.knownGene) ## extract GRanges txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene seqlevels(txdb) txbygene <- transcriptsBy(txdb, "gene") seqlevels(txbygene) ## subset to get only ranges from chr 1 to 22 auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC", group="auto") txbygene <- keepSeqlevels(txbygene,auto) seqlevels(txbygene) ##convert GRangesList to GRanges ucsc_gr <- unlist(txbygene) ## renaming styles. newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI") ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle) ##check that naming was successful ncbi_gr[seqnames(ncbi_gr)=="1"] ucsc_gr[seqnames(ucsc_gr)=="chr1"] You can easily adapt this example for your scenario - to convert from "1" to "chr1". For more details : Please look at examples section of: http://www.bioconductor.org/packages/devel/bioc/vignettes/GenomeInfoDb /inst/doc/GenomeInfoDb.pdf and the man page for ?keepSeqlevels. Thanks and Regards, Sonali. On 6/26/2014 2:02 PM, Fong Chun Chan wrote: > Hi, > > I am using the GenomicFeatures package to extract exons from a transcript > database file. I am using the ensembl transcript database which has no chr, > yet my bam files that I am working have the chr prefix. > > I was thinking one could append chr to the seqlevels in the transcript > database like this: > > library('GenomicFeatures') > txdb <- loadDb(txdbFile) > newSeqNames <- paste('chr', seqlevels(txdb), sep = '') > names(newSeqNames) <- seqlevels(txdb) > txdb <- renameSeqlevels( txdb, newSeqNames ) > seqlevels(txdb) > > [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" > "chr11" "chr12" [13] "chr13" "chr14" > > This appears to work. > > But when I run: > > transcripts(txdb, vals = list( tx_chrom = "chr1") ) > > GRanges with 0 ranges and 2 metadata columns: > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > --- > seqlengths: > > It suggests that the chromosome renaming is not working...because when I > run: > > transcripts(txdb, vals = list( tx_chrom = '1') ) > > GRanges with 17239 ranges and 2 metadata columns: > seqnames ranges strand | tx_id > tx_name > <rle> <iranges> <rle> | <integer> > <character> > [1] chr1 [11869, 14409] + | 126213 > ENST00000456328 > > Has anyone had any luck in getting this to work? > > Thanks, > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Thanks and Regards, Sonali
ADD COMMENTlink written 3.4 years ago by Sonali Arora360
Hi Sonali, I'm pretty sure what Fong has reported is a legitimate bug. The seqlevels are mapped at some level, but the values for tx_chrom have not been updated. Also, what you say about keepSeqlevels and TranscriptDb is not true, at least for devel (and this stuff was written a year ago): > seqlevels(keepSeqlevels(txdb, "1")) [1] "1" Michael On Fri, Jun 27, 2014 at 12:18 PM, Sonali Arora <sarora@fhcrc.org> wrote: > Hi Fong, > > A TranscriptDb cannot be directly subset with 'keepSeqlevels' and > 'dropSeqlevels'. > But the GRanges or GRangesLists extracted from the TranscriptDb can be > subset. > Below I show an example for how to extract GRanges, subset and change the > chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1") > > ## load packages > library(GenomicFeatures) > library(TxDb.Hsapiens.UCSC.hg18.knownGene) > > ## extract GRanges > txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene > seqlevels(txdb) > txbygene <- transcriptsBy(txdb, "gene") > seqlevels(txbygene) > > ## subset to get only ranges from chr 1 to 22 > auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC", > group="auto") > txbygene <- keepSeqlevels(txbygene,auto) > seqlevels(txbygene) > > ##convert GRangesList to GRanges > ucsc_gr <- unlist(txbygene) > > ## renaming styles. > newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI") > ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle) > > ##check that naming was successful > ncbi_gr[seqnames(ncbi_gr)=="1"] > ucsc_gr[seqnames(ucsc_gr)=="chr1"] > > You can easily adapt this example for your scenario - to convert from > "1" to "chr1". For more details : Please look at examples section of: > http://www.bioconductor.org/packages/devel/bioc/vignettes/ > GenomeInfoDb/inst/doc/GenomeInfoDb.pdf > and the man page for ?keepSeqlevels. > > Thanks and Regards, > Sonali. > > > > > On 6/26/2014 2:02 PM, Fong Chun Chan wrote: > >> Hi, >> >> I am using the GenomicFeatures package to extract exons from a transcript >> database file. I am using the ensembl transcript database which has no >> chr, >> yet my bam files that I am working have the chr prefix. >> >> I was thinking one could append chr to the seqlevels in the transcript >> database like this: >> >> library('GenomicFeatures') >> txdb <- loadDb(txdbFile) >> newSeqNames <- paste('chr', seqlevels(txdb), sep = '') >> names(newSeqNames) <- seqlevels(txdb) >> txdb <- renameSeqlevels( txdb, newSeqNames ) >> seqlevels(txdb) >> >> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" >> "chr11" "chr12" [13] "chr13" "chr14" >> >> This appears to work. >> >> But when I run: >> >> transcripts(txdb, vals = list( tx_chrom = "chr1") ) >> >> GRanges with 0 ranges and 2 metadata columns: >> seqnames ranges strand | tx_id tx_name >> <rle> <iranges> <rle> | <integer> <character> >> --- >> seqlengths: >> >> It suggests that the chromosome renaming is not working...because when I >> run: >> >> transcripts(txdb, vals = list( tx_chrom = '1') ) >> >> GRanges with 17239 ranges and 2 metadata columns: >> seqnames ranges strand | tx_id >> tx_name >> <rle> <iranges> <rle> | <integer> >> <character> >> [1] chr1 [11869, 14409] + | 126213 >> ENST00000456328 >> >> Has anyone had any luck in getting this to work? >> >> Thanks, >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > -- > Thanks and Regards, > Sonali > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane. > science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLYlink written 3.4 years ago by Michael Lawrence9.8k
Hi guys, Indeed, 'transcripts(txdb, vals=list(tx_chrom="some_chrom"))' is broken when the "effective" seqlevels are different from the "native" ones. More precisely, the transcripts(), exons(), and cds() extractors are not translating the user-supplied seqlevels to native seqlevels before injecting them in the SQL query they generate. We need to fix that. We will post again here when it's done. Thanks for the report. H. On 06/27/2014 07:53 PM, Michael Lawrence wrote: > Hi Sonali, > > I'm pretty sure what Fong has reported is a legitimate bug. The seqlevels > are mapped at some level, but the values for tx_chrom have not been > updated. > > Also, what you say about keepSeqlevels and TranscriptDb is not true, at > least for devel (and this stuff was written a year ago): > >> seqlevels(keepSeqlevels(txdb, "1")) > [1] "1" > > Michael > > > On Fri, Jun 27, 2014 at 12:18 PM, Sonali Arora <sarora at="" fhcrc.org=""> wrote: > >> Hi Fong, >> >> A TranscriptDb cannot be directly subset with 'keepSeqlevels' and >> 'dropSeqlevels'. >> But the GRanges or GRangesLists extracted from the TranscriptDb can be >> subset. >> Below I show an example for how to extract GRanges, subset and change the >> chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1") >> >> ## load packages >> library(GenomicFeatures) >> library(TxDb.Hsapiens.UCSC.hg18.knownGene) >> >> ## extract GRanges >> txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene >> seqlevels(txdb) >> txbygene <- transcriptsBy(txdb, "gene") >> seqlevels(txbygene) >> >> ## subset to get only ranges from chr 1 to 22 >> auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC", >> group="auto") >> txbygene <- keepSeqlevels(txbygene,auto) >> seqlevels(txbygene) >> >> ##convert GRangesList to GRanges >> ucsc_gr <- unlist(txbygene) >> >> ## renaming styles. >> newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI") >> ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle) >> >> ##check that naming was successful >> ncbi_gr[seqnames(ncbi_gr)=="1"] >> ucsc_gr[seqnames(ucsc_gr)=="chr1"] >> >> You can easily adapt this example for your scenario - to convert from >> "1" to "chr1". For more details : Please look at examples section of: >> http://www.bioconductor.org/packages/devel/bioc/vignettes/ >> GenomeInfoDb/inst/doc/GenomeInfoDb.pdf >> and the man page for ?keepSeqlevels. >> >> Thanks and Regards, >> Sonali. >> >> >> >> >> On 6/26/2014 2:02 PM, Fong Chun Chan wrote: >> >>> Hi, >>> >>> I am using the GenomicFeatures package to extract exons from a transcript >>> database file. I am using the ensembl transcript database which has no >>> chr, >>> yet my bam files that I am working have the chr prefix. >>> >>> I was thinking one could append chr to the seqlevels in the transcript >>> database like this: >>> >>> library('GenomicFeatures') >>> txdb <- loadDb(txdbFile) >>> newSeqNames <- paste('chr', seqlevels(txdb), sep = '') >>> names(newSeqNames) <- seqlevels(txdb) >>> txdb <- renameSeqlevels( txdb, newSeqNames ) >>> seqlevels(txdb) >>> >>> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" >>> "chr11" "chr12" [13] "chr13" "chr14" >>> >>> This appears to work. >>> >>> But when I run: >>> >>> transcripts(txdb, vals = list( tx_chrom = "chr1") ) >>> >>> GRanges with 0 ranges and 2 metadata columns: >>> seqnames ranges strand | tx_id tx_name >>> <rle> <iranges> <rle> | <integer> <character> >>> --- >>> seqlengths: >>> >>> It suggests that the chromosome renaming is not working...because when I >>> run: >>> >>> transcripts(txdb, vals = list( tx_chrom = '1') ) >>> >>> GRanges with 17239 ranges and 2 metadata columns: >>> seqnames ranges strand | tx_id >>> tx_name >>> <rle> <iranges> <rle> | <integer> >>> <character> >>> [1] chr1 [11869, 14409] + | 126213 >>> ENST00000456328 >>> >>> Has anyone had any luck in getting this to work? >>> >>> Thanks, >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane. >>> science.biology.informatics.conductor >>> >> >> -- >> Thanks and Regards, >> Sonali >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLYlink written 3.4 years ago by Hervé Pagès ♦♦ 13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 159 users visited in the last hour