predictCoding() return empty Granges

0

Entering edit mode

sun ▴ 100

@sun-5534

Last seen 9.1 years ago

United States

After I successfully adjusted the chromosomes of VCF and TranscriptDb objects to match the names of the BSgenome object, I ran predictCoding(), it returned nothing except a warning message, see below, > coding <- predictCoding(vcf, txdb, seqSource=Athaliana) *Warning message: In .setActiveSubjectSeq(query, subject) : circular sequence(s) in query 'chrM' ignored* > coding GRanges with 0 ranges and 1 metadata column: seqnames ranges strand | GENEID <rle> <iranges> <rle> | <character> --- seqlengths: Here are the values of VCF, TranscriptDb and BSgenome Object, > vcf class: VCF dim: 551201 2 genome: TAIR10 exptData(1): header fixed(4): REF ALT QUAL FILTER info(18): DP DP4 ... PR VDB geno(6): GT GQ ... SP PL rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920 *Qustion: After used renameSeqlevels() to change chr name of vcf (Chr to chr), the rownames here are still old chr name. not sure if this will be the problem for predictCoding(). * rowData values names(1): paramRangeID colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam colData names(1): Samples > txdb TranscriptDb object: | Db type: TranscriptDb | Supporting package: GenomicFeatures | Data source: TAIR 10 | Genus and Species: Arabodpsis | miRBase build ID: NA | transcript_nrow: 35386 | exon_nrow: 207465 | cds_nrow: 197160 | Db created by: GenomicFeatures package from Bioconductor | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012) | GenomicFeatures version at creation time: 1.9.37 | RSQLite version at creation time: 0.11.1 | DBSCHEMAVERSION: 1.0 > Athaliana Arabidopsis genome | | organism: Arabidopsis thaliana (Arabidopsis) | provider: TAIR | provider version: 04232008 | release date: NA | release name: dumped from ADB: Mar/14/08 | | sequences (see '?seqnames'): | chr1 chr2 chr3 chr4 chr5 chrC chrM | | (use the '$' or '[[' operator to access a given sequence) > sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] BSgenome.Athaliana.TAIR.04232008_1.3.18 BSgenome_1.26.0 VariantAnnotation_1.4.0 [4] Rsamtools_1.10.1 Biostrings_2.26.0 GenomicFeatures_1.10.0 [7] AnnotationDbi_1.20.0 Biobase_2.18.0 GenomicRanges_1.10.0 [10] IRanges_1.16.2 BiocGenerics_0.4.0 vimcom_0.9-2 [13] setwidth_1.0-0 colorout_0.9-9 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.95-0.1.2 RSQLite_0.11.2 XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.1 [7] parallel_2.15.1 rtracklayer_1.18.0 stats4_2.15.1 tools_2.15.1 zlibbioc_1.4.0 any ideas? thank you. Rebecca On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès <hpages@fhcrc.org> wrote: > Hi Rebecca, > > > On 10/04/2012 12:10 PM, sun wrote: > >> Hi All, >> >> I am going to use "coding <- predictCoding(vcf, txdb, >> seqSource=Athaliana)" >> to detect coding SNPs. The problem is that the chromosome names are not >> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is >> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr name is >> "chr*" . >> >> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome >> names) of the VCF object to match that of the txdb annotation. But how can >> I adjust the chr name of BSgenome or TranscriptDB? >> > > In BioC 2.11 (released yesterday), you can rename the chromosomes of a > TranscriptDb object, so you could rename the chromosomes of your > VCF and TranscriptDb objects to match the names of the BSgenome object. > > E.g. for the TranscriptDb object: > > seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb)) > > Note that renaming the chromosomes of a TranscriptDb object is a new > feature and is not fully implemented yet. For example, if you use > select() on the object you'll still get the original names (those > stored in the db), and if you try to specify a chromosome name thru > the 'vals' arg of the transcripts(), exons() and cds() extractors, > you still need to use the original names. This will be addressed soon. > > Our plan is to also support renaming of the chromosomes of BSgenome > and SNPlocs objects very soon. > > Also, an additional level of convenience will be provided via the > seqnameStyle() getter and setter, so you'll be able to quickly rename > with something like: > > seqnameStyle(x) <- "UCSC" > > or > > seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome) > > This will work on almost any 'x' object that contains chromosome > names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF, > BSgenome, SNPlocs, etc...) > > Cheers, > H. > > > > >> Thanks, >> >> Rebecca >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]

Annotation Cancer Arabidopsis thaliana BSgenome SNPlocs TranscriptDb BSgenome Annotation • 1.6k views

ADD COMMENT • link updated 12.5 years ago by Hervé Pagès 16k • written 12.5 years ago by sun ▴ 100

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 3.2 years ago

United States

Hi Sun, On 10/04/2012 02:44 PM, sun wrote: > After I successfully adjusted the chromosomes of VCF and TranscriptDb > objects to match the names of the BSgenome object, I ran predictCoding(), > it returned nothing except a warning message, see below, > >> coding<- predictCoding(vcf, txdb, seqSource=Athaliana) > *Warning message: > In .setActiveSubjectSeq(query, subject) : > circular sequence(s) in query 'chrM' ignored* >> coding > GRanges with 0 ranges and 1 metadata column: > seqnames ranges strand | GENEID > <rle> <iranges> <rle> |<character> > --- > seqlengths: > > Here are the values of VCF, TranscriptDb and BSgenome Object, > >> vcf > class: VCF > dim: 551201 2 > genome: TAIR10 > exptData(1): header > fixed(4): REF ALT QUAL FILTER > info(18): DP DP4 ... PR VDB > geno(6): GT GQ ... SP PL > rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920 *Qustion: > After used renameSeqlevels() to change chr name of vcf (Chr to chr), the > rownames here are still old chr name. not sure if this will be the problem > for predictCoding(). * Yes, having different chromosome names in the VCF file and the txdb will be a problem. Any time we want to match / find overlaps in ranges it is important that the seqnames (i.e., chromosomes) match. There is an example of how to use renameSeqlevels() on a VCF file on the predictCoding() man page. library(VariantAnnotation) ?predictCoding ## Read variants from a VCF file fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") vcf <- readVcf(fl, "hg19") ## Rename seqlevels in the VCF object to match those in the TxDb vcf<- renameSeqlevels(vcf, c("22"="chr22")) ## Confirm common seqlevels intersect(seqlevels(vcf), seqlevels(txdb)) Be sure to confirm that you have common seqlevels between your VCF and annotation. If you are having trouble please send a small reproducible example. Valerie > rowData values names(1): paramRangeID > colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam > colData names(1): Samples > >> txdb > TranscriptDb object: > | Db type: TranscriptDb > | Supporting package: GenomicFeatures > | Data source: TAIR 10 > | Genus and Species: Arabodpsis > | miRBase build ID: NA > | transcript_nrow: 35386 > | exon_nrow: 207465 > | cds_nrow: 197160 > | Db created by: GenomicFeatures package from Bioconductor > | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012) > | GenomicFeatures version at creation time: 1.9.37 > | RSQLite version at creation time: 0.11.1 > | DBSCHEMAVERSION: 1.0 > >> Athaliana > Arabidopsis genome > | > | organism: Arabidopsis thaliana (Arabidopsis) > | provider: TAIR > | provider version: 04232008 > | release date: NA > | release name: dumped from ADB: Mar/14/08 > | > | sequences (see '?seqnames'): > | chr1 chr2 chr3 chr4 chr5 chrC chrM > | > | (use the '$' or '[[' operator to access a given sequence) > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BSgenome.Athaliana.TAIR.04232008_1.3.18 > BSgenome_1.26.0 > VariantAnnotation_1.4.0 > [4] Rsamtools_1.10.1 > Biostrings_2.26.0 > GenomicFeatures_1.10.0 > [7] AnnotationDbi_1.20.0 > Biobase_2.18.0 > GenomicRanges_1.10.0 > [10] IRanges_1.16.2 > BiocGenerics_0.4.0 > vimcom_0.9-2 > [13] setwidth_1.0-0 > colorout_0.9-9 > > loaded via a namespace (and not attached): > [1] DBI_0.2-5 RCurl_1.95-0.1.2 RSQLite_0.11.2 > XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.1 > [7] parallel_2.15.1 rtracklayer_1.18.0 stats4_2.15.1 > tools_2.15.1 zlibbioc_1.4.0 > > any ideas? thank you. > > Rebecca > > > On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès<hpages@fhcrc.org> wrote: > >> Hi Rebecca, >> >> >> On 10/04/2012 12:10 PM, sun wrote: >> >>> Hi All, >>> >>> I am going to use "coding<- predictCoding(vcf, txdb, >>> seqSource=Athaliana)" >>> to detect coding SNPs. The problem is that the chromosome names are not >>> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is >>> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr name is >>> "chr*" . >>> >>> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome >>> names) of the VCF object to match that of the txdb annotation. But how can >>> I adjust the chr name of BSgenome or TranscriptDB? >>> >> In BioC 2.11 (released yesterday), you can rename the chromosomes of a >> TranscriptDb object, so you could rename the chromosomes of your >> VCF and TranscriptDb objects to match the names of the BSgenome object. >> >> E.g. for the TranscriptDb object: >> >> seqlevels(txdb)<- sub("^c", "C", seqlevels(txdb)) >> >> Note that renaming the chromosomes of a TranscriptDb object is a new >> feature and is not fully implemented yet. For example, if you use >> select() on the object you'll still get the original names (those >> stored in the db), and if you try to specify a chromosome name thru >> the 'vals' arg of the transcripts(), exons() and cds() extractors, >> you still need to use the original names. This will be addressed soon. >> >> Our plan is to also support renaming of the chromosomes of BSgenome >> and SNPlocs objects very soon. >> >> Also, an additional level of convenience will be provided via the >> seqnameStyle() getter and setter, so you'll be able to quickly rename >> with something like: >> >> seqnameStyle(x)<- "UCSC" >> >> or >> >> seqnameStyle(vcf)<- seqnameStyle(txdb)<- seqnameStyle(genome) >> >> This will work on almost any 'x' object that contains chromosome >> names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF, >> BSgenome, SNPlocs, etc...) >> >> Cheers, >> H. >> >> >> >> >>> Thanks, >>> >>> Rebecca >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages@fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD COMMENT • link 12.5 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 13 hours ago

Seattle, WA, United States

Hi Rebecca, On 10/04/2012 02:44 PM, sun wrote: > After I successfully adjusted the chromosomes of VCF and TranscriptDb > objects to match the names of the BSgenome object, I ran predictCoding(), > it returned nothing except a warning message, see below, > >> coding <- predictCoding(vcf, txdb, seqSource=Athaliana) > *Warning message: > In .setActiveSubjectSeq(query, subject) : > circular sequence(s) in query 'chrM' ignored* >> coding > GRanges with 0 ranges and 1 metadata column: > seqnames ranges strand | GENEID > <rle> <iranges> <rle> | <character> > --- > seqlengths: > > Here are the values of VCF, TranscriptDb and BSgenome Object, > >> vcf > class: VCF > dim: 551201 2 > genome: TAIR10 > exptData(1): header > fixed(4): REF ALT QUAL FILTER > info(18): DP DP4 ... PR VDB > geno(6): GT GQ ... SP PL > rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920 *Qustion: > After used renameSeqlevels() to change chr name of vcf (Chr to chr), the > rownames here are still old chr name. not sure if this will be the problem > for predictCoding(). * > rowData values names(1): paramRangeID > colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam > colData names(1): Samples > >> txdb > TranscriptDb object: > | Db type: TranscriptDb > | Supporting package: GenomicFeatures > | Data source: TAIR 10 > | Genus and Species: Arabodpsis > | miRBase build ID: NA > | transcript_nrow: 35386 > | exon_nrow: 207465 > | cds_nrow: 197160 > | Db created by: GenomicFeatures package from Bioconductor > | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012) > | GenomicFeatures version at creation time: 1.9.37 > | RSQLite version at creation time: 0.11.1 > | DBSCHEMAVERSION: 1.0 > >> Athaliana > Arabidopsis genome > | > | organism: Arabidopsis thaliana (Arabidopsis) > | provider: TAIR > | provider version: 04232008 > | release date: NA > | release name: dumped from ADB: Mar/14/08 > | > | sequences (see '?seqnames'): > | chr1 chr2 chr3 chr4 chr5 chrC chrM > | > | (use the '$' or '[[' operator to access a given sequence) We provide 2 BSgenome packages for Arabidopsis: a very old one BSgenome.Athaliana.TAIR.04232008 with chromosome names and lengths: > seqlengths(Athaliana) chr1 chr2 chr3 chr4 chr5 chrC chrM 30432563 19705359 23470805 18585042 26992728 154478 366924 and a somewhat less old one: BSgenome.Athaliana.TAIR.TAIR9 with chromosome names and lengths: > seqlengths(Athaliana) Chr1 Chr2 Chr3 Chr4 Chr5 ChrM ChrC 30427671 19698289 23459830 18585056 26975502 366924 154478 They correspond to 2 different assemblies of course. 2 things: (1) The code I showed you in the previous thread for renaming the sequence names of your TranscriptDb object was assuming that you were using the less old BSgenome package: seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb)) However it seems that you are using the very old one (where chr names are "chr*"), and that your txdb chr names are "Chr*", so you would actually need to do something like: seqlevels(txdb) <- sub("^C", "c", seqlevels(txdb)) (2) However, and this is much more important than (1): it seems that your VCF and TranscriptDb objects are based on the TAIR10 assembly. So it makes no sense to use anything else but a BSgenome package that contains the exact same assembly. Otherwise it's very likely that most of the predictions returned by predictCoding() will be wrong! Unfortunately at the moment we don't provide a BSgenome package for TAIR10 (nobody requested one so far). I'll try to make one in the next couple of weeks. I'll also remove BSgenome.Athaliana.TAIR.04232008 from the devel branch (BioC 2.12). Cheers, H. > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BSgenome.Athaliana.TAIR.04232008_1.3.18 > BSgenome_1.26.0 > VariantAnnotation_1.4.0 > [4] Rsamtools_1.10.1 > Biostrings_2.26.0 > GenomicFeatures_1.10.0 > [7] AnnotationDbi_1.20.0 > Biobase_2.18.0 > GenomicRanges_1.10.0 > [10] IRanges_1.16.2 > BiocGenerics_0.4.0 > vimcom_0.9-2 > [13] setwidth_1.0-0 > colorout_0.9-9 > > loaded via a namespace (and not attached): > [1] DBI_0.2-5 RCurl_1.95-0.1.2 RSQLite_0.11.2 > XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.1 > [7] parallel_2.15.1 rtracklayer_1.18.0 stats4_2.15.1 > tools_2.15.1 zlibbioc_1.4.0 > > any ideas? thank you. > > Rebecca > > > On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès <hpages at="" fhcrc.org=""> wrote: > >> Hi Rebecca, >> >> >> On 10/04/2012 12:10 PM, sun wrote: >> >>> Hi All, >>> >>> I am going to use "coding <- predictCoding(vcf, txdb, >>> seqSource=Athaliana)" >>> to detect coding SNPs. The problem is that the chromosome names are not >>> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is >>> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr name is >>> "chr*" . >>> >>> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome >>> names) of the VCF object to match that of the txdb annotation. But how can >>> I adjust the chr name of BSgenome or TranscriptDB? >>> >> >> In BioC 2.11 (released yesterday), you can rename the chromosomes of a >> TranscriptDb object, so you could rename the chromosomes of your >> VCF and TranscriptDb objects to match the names of the BSgenome object. >> >> E.g. for the TranscriptDb object: >> >> seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb)) >> >> Note that renaming the chromosomes of a TranscriptDb object is a new >> feature and is not fully implemented yet. For example, if you use >> select() on the object you'll still get the original names (those >> stored in the db), and if you try to specify a chromosome name thru >> the 'vals' arg of the transcripts(), exons() and cds() extractors, >> you still need to use the original names. This will be addressed soon. >> >> Our plan is to also support renaming of the chromosomes of BSgenome >> and SNPlocs objects very soon. >> >> Also, an additional level of convenience will be provided via the >> seqnameStyle() getter and setter, so you'll be able to quickly rename >> with something like: >> >> seqnameStyle(x) <- "UCSC" >> >> or >> >> seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome) >> >> This will work on almost any 'x' object that contains chromosome >> names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF, >> BSgenome, SNPlocs, etc...) >> >> Cheers, >> H. >> >> >> >> >>> Thanks, >>> >>> Rebecca >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD COMMENT • link 12.5 years ago Hervé Pagès 16k

0

Entering edit mode

So the good news is that TAIR10 is the same assembly as TAIR9. See last line in: ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/README _whole_chromosomes.txt We don't need to add a TAIR10 BSgenome package to Bioconductor. You can use TAIR9. Cheers, H. On 10/04/2012 11:01 PM, Hervé Pagès wrote: > Hi Rebecca, > > On 10/04/2012 02:44 PM, sun wrote: >> After I successfully adjusted the chromosomes of VCF and TranscriptDb >> objects to match the names of the BSgenome object, I ran predictCoding(), >> it returned nothing except a warning message, see below, >> >>> coding <- predictCoding(vcf, txdb, seqSource=Athaliana) >> *Warning message: >> In .setActiveSubjectSeq(query, subject) : >> circular sequence(s) in query 'chrM' ignored* >>> coding >> GRanges with 0 ranges and 1 metadata column: >> seqnames ranges strand | GENEID >> <rle> <iranges> <rle> | <character> >> --- >> seqlengths: >> >> Here are the values of VCF, TranscriptDb and BSgenome Object, >> >>> vcf >> class: VCF >> dim: 551201 2 >> genome: TAIR10 >> exptData(1): header >> fixed(4): REF ALT QUAL FILTER >> info(18): DP DP4 ... PR VDB >> geno(6): GT GQ ... SP PL >> rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920 >> *Qustion: >> After used renameSeqlevels() to change chr name of vcf (Chr to chr), the >> rownames here are still old chr name. not sure if this will be the >> problem >> for predictCoding(). * >> rowData values names(1): paramRangeID >> colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam >> colData names(1): Samples >> >>> txdb >> TranscriptDb object: >> | Db type: TranscriptDb >> | Supporting package: GenomicFeatures >> | Data source: TAIR 10 >> | Genus and Species: Arabodpsis >> | miRBase build ID: NA >> | transcript_nrow: 35386 >> | exon_nrow: 207465 >> | cds_nrow: 197160 >> | Db created by: GenomicFeatures package from Bioconductor >> | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012) >> | GenomicFeatures version at creation time: 1.9.37 >> | RSQLite version at creation time: 0.11.1 >> | DBSCHEMAVERSION: 1.0 >> >>> Athaliana >> Arabidopsis genome >> | >> | organism: Arabidopsis thaliana (Arabidopsis) >> | provider: TAIR >> | provider version: 04232008 >> | release date: NA >> | release name: dumped from ADB: Mar/14/08 >> | >> | sequences (see '?seqnames'): >> | chr1 chr2 chr3 chr4 chr5 chrC chrM >> | >> | (use the '$' or '[[' operator to access a given sequence) > > We provide 2 BSgenome packages for Arabidopsis: a very old one > > BSgenome.Athaliana.TAIR.04232008 > > with chromosome names and lengths: > > > seqlengths(Athaliana) > chr1 chr2 chr3 chr4 chr5 chrC chrM > 30432563 19705359 23470805 18585042 26992728 154478 366924 > > and a somewhat less old one: > > BSgenome.Athaliana.TAIR.TAIR9 > > with chromosome names and lengths: > > > seqlengths(Athaliana) > Chr1 Chr2 Chr3 Chr4 Chr5 ChrM ChrC > 30427671 19698289 23459830 18585056 26975502 366924 154478 > > They correspond to 2 different assemblies of course. > > 2 things: > > (1) The code I showed you in the previous thread for renaming > the sequence names of your TranscriptDb object was assuming > that you were using the less old BSgenome package: > > seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb)) > > However it seems that you are using the very old one (where > chr names are "chr*"), and that your txdb chr names are > "Chr*", so you would actually need to do something like: > > seqlevels(txdb) <- sub("^C", "c", seqlevels(txdb)) > > (2) However, and this is much more important than (1): it seems that > your VCF and TranscriptDb objects are based on the TAIR10 assembly. > So it makes no sense to use anything else but a BSgenome > package that contains the exact same assembly. Otherwise it's > very likely that most of the predictions returned by > predictCoding() will be wrong! > > Unfortunately at the moment we don't provide a BSgenome package for > TAIR10 (nobody requested one so far). I'll try to make one in the > next couple of weeks. I'll also remove > BSgenome.Athaliana.TAIR.04232008 from the devel branch (BioC 2.12). > > Cheers, > H. > > >> >>> sessionInfo() >> R version 2.15.1 (2012-06-22) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] BSgenome.Athaliana.TAIR.04232008_1.3.18 >> BSgenome_1.26.0 >> VariantAnnotation_1.4.0 >> [4] Rsamtools_1.10.1 >> Biostrings_2.26.0 >> GenomicFeatures_1.10.0 >> [7] AnnotationDbi_1.20.0 >> Biobase_2.18.0 >> GenomicRanges_1.10.0 >> [10] IRanges_1.16.2 >> BiocGenerics_0.4.0 >> vimcom_0.9-2 >> [13] setwidth_1.0-0 >> colorout_0.9-9 >> >> loaded via a namespace (and not attached): >> [1] DBI_0.2-5 RCurl_1.95-0.1.2 RSQLite_0.11.2 >> XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.1 >> [7] parallel_2.15.1 rtracklayer_1.18.0 stats4_2.15.1 >> tools_2.15.1 zlibbioc_1.4.0 >> >> any ideas? thank you. >> >> Rebecca >> >> >> On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès <hpages at="" fhcrc.org=""> wrote: >> >>> Hi Rebecca, >>> >>> >>> On 10/04/2012 12:10 PM, sun wrote: >>> >>>> Hi All, >>>> >>>> I am going to use "coding <- predictCoding(vcf, txdb, >>>> seqSource=Athaliana)" >>>> to detect coding SNPs. The problem is that the chromosome names are not >>>> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is >>>> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr >>>> name is >>>> "chr*" . >>>> >>>> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome >>>> names) of the VCF object to match that of the txdb annotation. But >>>> how can >>>> I adjust the chr name of BSgenome or TranscriptDB? >>>> >>> >>> In BioC 2.11 (released yesterday), you can rename the chromosomes of a >>> TranscriptDb object, so you could rename the chromosomes of your >>> VCF and TranscriptDb objects to match the names of the BSgenome object. >>> >>> E.g. for the TranscriptDb object: >>> >>> seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb)) >>> >>> Note that renaming the chromosomes of a TranscriptDb object is a new >>> feature and is not fully implemented yet. For example, if you use >>> select() on the object you'll still get the original names (those >>> stored in the db), and if you try to specify a chromosome name thru >>> the 'vals' arg of the transcripts(), exons() and cds() extractors, >>> you still need to use the original names. This will be addressed soon. >>> >>> Our plan is to also support renaming of the chromosomes of BSgenome >>> and SNPlocs objects very soon. >>> >>> Also, an additional level of convenience will be provided via the >>> seqnameStyle() getter and setter, so you'll be able to quickly rename >>> with something like: >>> >>> seqnameStyle(x) <- "UCSC" >>> >>> or >>> >>> seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome) >>> >>> This will work on almost any 'x' object that contains chromosome >>> names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF, >>> BSgenome, SNPlocs, etc...) >>> >>> Cheers, >>> H. >>> >>> >>> >>> >>>> Thanks, >>>> >>>> Rebecca >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> >>>> Search the archives: http://news.gmane.org/gmane.** >>>> science.biology.informatics.**conductor<http: news.gmane.org="" gma="" ne.science.biology.informatics.conductor=""> >>>> >>>> >>>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 12.5 years ago Hervé Pagès 16k

Login before adding your answer.