I have encountered an unexpected interaction between renameSeqLevels and isActiveSeq when called on a TxDb. Specifically, when I use both renameSeqLevels (or seqlevels directly as a setter function) and isActiveSeq, I am then unable to access transcript information via select().
Here is code to reproduce the behavior:
library (GenomicFeatures) refSeqDb = suppressWarnings (makeTranscriptDbFromUCSC ("hg19", tablename = "refGene")) isActiveSeq (refSeqDb)[seqlevels (refSeqDb)] = FALSE assembledchromosomes = grep ("^chr([0-9]+|[MXY])$", seqlevels (refSeqDb), value = TRUE) isActiveSeq (refSeqDb)[intersect (seqlevels (refSeqDb), assembledchromosomes)] = TRUE head (select (refSeqDb, keys = names (exonsBy (refSeqDb, by = "tx")), columns = "TXNAME", keytype = "TXID"))
This code block returns:
TXID TXNAME 1 1 NR_046018 2 2 NR_036051 3 3 NR_036266 4 4 NR_036267 5 5 NR_036268 6 6 NM_001005484
If I instead invoke renameSeqLevels after creating refSeqDb as follows:
refSeqDb = suppressWarnings (makeTranscriptDbFromUCSC ("hg19", tablename = "refGene")) refSeqDb = renameSeqlevels (refSeqDb, gsub ("^chr", "", seqlevels (refSeqDb))) isActiveSeq (refSeqDb)[seqlevels (refSeqDb)] = FALSE assembledchromosomes = grep ("^([0-9]+|[MXY])$", seqlevels (refSeqDb), value = TRUE) isActiveSeq (refSeqDb)[intersect (seqlevels (refSeqDb), assembledchromosomes)] = TRUE head (select (refSeqDb, keys = names (exonsBy (refSeqDb, by = "tx")), columns = "TXNAME", keytype = "TXID"))
then I get:
[1] TXID TXNAME <0 rows> (or 0-length row.names)
I obtain the same result if I replace the call to renameSeqLevels
with:
seqlevels (refSeqDb) = gsub ("^chr", "", seqlevels (refSeqDb))
My version information is:
> sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.10.5 (Yosemite) locale: [1] C attached base packages: [1] stats4 parallel grDevices utils datasets stats graphics methods base other attached packages: [1] GenomicFeatures_1.20.3 AnnotationDbi_1.30.1 Biobase_2.28.0 GenomicRanges_1.20.5 GenomeInfoDb_1.4.2 IRanges_2.2.7 S4Vectors_0.6.3 [8] BiocGenerics_0.14.0 loaded via a namespace (and not attached): [1] XVector_0.8.0 zlibbioc_1.14.0 GenomicAlignments_1.4.1 BiocParallel_1.2.20 tools_3.2.2 DBI_0.3.1 lambda.r_1.1.7 [8] futile.logger_1.4.1 rtracklayer_1.28.9 futile.options_1.0.0 bitops_1.0-6 RCurl_1.95-4.7 biomaRt_2.24.0 RSQLite_1.0.0 [15] compiler_3.2.2 Biostrings_2.36.3 Rsamtools_1.20.4 XML_3.98-1.3
Hi Robert,
One limitation of
seqlevelsStyle()
at the moment is that it knows how to rename the chromosomes only, but not the scaffolds. This is something that we plan to work on in the future but that won't happen before the next release (scheduled in 1 month).BTW, what you originally reported in this thread is buggy behavior of
select()
after renaming of the chromosomes. There are at least 2 problems, which can simply be reproduced with:First problem. After renaming of the chromosomes,
select()
still returns the original names.Second problem. After renaming of the chromosomes,
select()
breaks if we specify a set of active sequences.So we need to fix that.
Cheers,
H.