Entering edit mode
After I successfully adjusted the chromosomes of VCF and TranscriptDb
objects to match the names of the BSgenome object, I ran
predictCoding(),
it returned nothing except a warning message, see below,
> coding <- predictCoding(vcf, txdb, seqSource=Athaliana)
*Warning message:
In .setActiveSubjectSeq(query, subject) :
circular sequence(s) in query 'chrM' ignored*
> coding
GRanges with 0 ranges and 1 metadata column:
seqnames ranges strand | GENEID
<rle> <iranges> <rle> | <character>
---
seqlengths:
Here are the values of VCF, TranscriptDb and BSgenome Object,
> vcf
class: VCF
dim: 551201 2
genome: TAIR10
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(18): DP DP4 ... PR VDB
geno(6): GT GQ ... SP PL
rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920
*Qustion:
After used renameSeqlevels() to change chr name of vcf (Chr to chr),
the
rownames here are still old chr name. not sure if this will be the
problem
for predictCoding(). *
rowData values names(1): paramRangeID
colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam
colData names(1): Samples
> txdb
TranscriptDb object:
| Db type: TranscriptDb
| Supporting package: GenomicFeatures
| Data source: TAIR 10
| Genus and Species: Arabodpsis
| miRBase build ID: NA
| transcript_nrow: 35386
| exon_nrow: 207465
| cds_nrow: 197160
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012)
| GenomicFeatures version at creation time: 1.9.37
| RSQLite version at creation time: 0.11.1
| DBSCHEMAVERSION: 1.0
> Athaliana
Arabidopsis genome
|
| organism: Arabidopsis thaliana (Arabidopsis)
| provider: TAIR
| provider version: 04232008
| release date: NA
| release name: dumped from ADB: Mar/14/08
|
| sequences (see '?seqnames'):
| chr1 chr2 chr3 chr4 chr5 chrC chrM
|
| (use the '$' or '[[' operator to access a given sequence)
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Athaliana.TAIR.04232008_1.3.18
BSgenome_1.26.0
VariantAnnotation_1.4.0
[4] Rsamtools_1.10.1
Biostrings_2.26.0
GenomicFeatures_1.10.0
[7] AnnotationDbi_1.20.0
Biobase_2.18.0
GenomicRanges_1.10.0
[10] IRanges_1.16.2
BiocGenerics_0.4.0
vimcom_0.9-2
[13] setwidth_1.0-0
colorout_0.9-9
loaded via a namespace (and not attached):
[1] DBI_0.2-5 RCurl_1.95-0.1.2 RSQLite_0.11.2
XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.1
[7] parallel_2.15.1 rtracklayer_1.18.0 stats4_2.15.1
tools_2.15.1 zlibbioc_1.4.0
any ideas? thank you.
Rebecca
On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
> Hi Rebecca,
>
>
> On 10/04/2012 12:10 PM, sun wrote:
>
>> Hi All,
>>
>> I am going to use "coding <- predictCoding(vcf, txdb,
>> seqSource=Athaliana)"
>> to detect coding SNPs. The problem is that the chromosome names are
not
>> consistent among VCF, txdb and BSgenome. In vcf, the chromosome
name is
>> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr
name is
>> "chr*" .
>>
>> I know I can use renameSeqlevels() to adjust the seqlevels
(chromosome
>> names) of the VCF object to match that of the txdb annotation. But
how can
>> I adjust the chr name of BSgenome or TranscriptDB?
>>
>
> In BioC 2.11 (released yesterday), you can rename the chromosomes of
a
> TranscriptDb object, so you could rename the chromosomes of your
> VCF and TranscriptDb objects to match the names of the BSgenome
object.
>
> E.g. for the TranscriptDb object:
>
> seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb))
>
> Note that renaming the chromosomes of a TranscriptDb object is a new
> feature and is not fully implemented yet. For example, if you use
> select() on the object you'll still get the original names (those
> stored in the db), and if you try to specify a chromosome name thru
> the 'vals' arg of the transcripts(), exons() and cds() extractors,
> you still need to use the original names. This will be addressed
soon.
>
> Our plan is to also support renaming of the chromosomes of BSgenome
> and SNPlocs objects very soon.
>
> Also, an additional level of convenience will be provided via the
> seqnameStyle() getter and setter, so you'll be able to quickly
rename
> with something like:
>
> seqnameStyle(x) <- "UCSC"
>
> or
>
> seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome)
>
> This will work on almost any 'x' object that contains chromosome
> names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF,
> BSgenome, SNPlocs, etc...)
>
> Cheers,
> H.
>
>
>
>
>> Thanks,
>>
>> Rebecca
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor="">
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
[[alternative HTML version deleted]]