Hi Kasper,
Yes instantiating a separate Views on each chromosome would be tedious. I just added the BSgenomeViews class to BSgenome 1.35.6 (this is in BioC devel) so now you can do:
library(BSgenome.Mmusculus.UCSC.mm10)
genome <- BSgenome.Mmusculus.UCSC.mm10
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
ex <- exons(txdb, columns=c("tx_name", "gene_id"))
v <- Views(genome, ex)
Then:
> v
BSgenomeViews object with 254827 views and 2 metadata columns:
seqnames ranges strand dna |
<Rle> <IRanges> <Rle> <DNAStringSet> |
[1] chr1 [4807893, 4807982] + [GCACTGTCCG...CACCGCCGCG] |
[2] chr1 [4808455, 4808486] + [GTTATTTTCC...GAGATACAGG] |
[3] chr1 [4828584, 4828649] + [GCATGGATGG...GTCCACATGC] |
[4] chr1 [4830268, 4830315] + [CCCTGTGATG...TGCCTTCTTG] |
[5] chr1 [4832311, 4832381] + [GTTTGATATC...GCAGAAACCG] |
... ... ... ... ... ...
[254823] chrY [90531602, 90531635] - [GCGGTTGGGA...TAGATGAGCT] |
[254824] chrY [90603501, 90603569] - [GTTTGAAGTC...CTCATCATCG] |
[254825] chrY [90604098, 90604282] - [ACCCAGGAGA...ACTGCATGAG] |
[254826] chrY [90605010, 90605191] - [AGACATCCTG...TGATGAAGGA] |
[254827] chrY [90605679, 90605864] - [GGCCTTGCTC...TCTCCAGAAG] |
tx_name gene_id
<CharacterList> <CharacterList>
[1] uc007afg.1,uc007afh.1 18777
[2] uc007afg.1,uc007afh.1 18777
[3] uc007afg.1,uc007afh.1 18777
[4] uc007afg.1,uc007afh.1 18777
[5] uc007afg.1,uc007afh.1 18777
... ... ...
[254823] uc029ygu.1 NA
[254824] uc029ygv.1 NA
[254825] uc029ygv.1 NA
[254826] uc029ygv.1 NA
[254827] uc029ygv.1 NA
-------
seqinfo: 66 sequences (1 circular) from mm10 genome
This is just a BSgenome object and a GRanges object bundle together. You can get them back with:
subject(v)
granges(v)
You can use [[
to extract a given sequence:
> v[[2]]
32-letter "DNAString" instance
seq: GTTATTTTCCTTCACGGATTGGGAGATACAGG
or coerce to DNAStringSet to extract them all:
as(v, "DNAStringSet") # might take a while
Some of the most frequently used DNAStringSet methods work on v
:
> alphabetFrequency(v, collapse=TRUE)
A C G T M R W S
21951343 20431781 20686000 21448065 0 0 0 0
Y K V H D B N -
0 0 0 0 0 0 2 0
+ .
0 0
More could be added...
Will add a man page for this new class. Let me know on the bioc-devel mailing list how it goes for you.
H.