I wish to extract a TxFeatures object for the MYC gene. I've successfully used this code for another gene (CD19). I've read the Genomic Ranges and TxDB vignettes through but missing the subtleties...
Perhaps I do not understand the clearly `TxFeatures` object? Help is greatly appreciated!
## make a EnsDb class object library(EnsDb.Hsapiens.v86) edb <- EnsDb.Hsapiens.v86 ### Option 1 GRanges objects derived from ensembldb MYC <- genes(edb, filter = list(GenenameFilter("MYC"), GeneBiotypeFilter("protein_coding"))) seqlevels(MYC) > MYC GRanges object with 1 range and 6 metadata columns: seqnames ranges strand | gene_id gene_name entrezid <Rle> <IRanges> <Rle> | <character> <character> <character> ENSG00000136997 8 [127735434, 127741434] + | ENSG00000136997 MYC 4609 gene_biotype seq_coord_system symbol <character> <character> <character> ENSG00000136997 protein_coding chromosome MYC ------- seqinfo: 1 sequence from GRCh38 genome
## Option 2: construct GRange object by hand myc <- GRanges( seqnames = Rle(8, 1), ranges = IRanges(start=127735434, end=127741434, names=1), strand = Rle(strand("+"), 1) ) > myc GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> 1 8 [127735434, 127741434] + ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths
## make a TxDb object ##### library(SGSeq) hsaEnsembl <- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl") seqlevels(hsaEnsembl) <- "8" ## extract TxFeatures from TxFeatures subset txf <- convertToTxFeatures(hsaEnsembl)
Apparently nothing overlapped?
> txf_MYC <- txf[txf %over% MYC] > txf_MYC TxFeatures object with 0 ranges and 0 metadata columns: seqnames ranges strand type txName geneName <Rle> <IRanges> <Rle> <factor> <CharacterList> <CharacterList> ------- seqinfo: 555 sequences (1 circular) from an unspecified genome > txf_myc <- txf[txf %over% myc] > txf_myc TxFeatures object with 0 ranges and 0 metadata columns: seqnames ranges strand type txName geneName <Rle> <IRanges> <Rle> <factor> <CharacterList> <CharacterList> ------- seqinfo: 555 sequences (1 circular) from an unspecified genome
If I test with another GRanges for another gene, it works fine.
bcl6 <- GRanges( seqnames=Rle(3,1), ranges=IRanges(start=187721377, end=187745727,names=1), strand=Rle(strand("-"),1) ) > bcl6 GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> 1 3 [187721377, 187745727] - ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths ## Resetting seqlevels my TxDb object > seqlevels(hsaEnsembl) <- seqlevels0(hsaEnsembl) > txf <- convertToTxFeatures(hsaEnsembl) > txf_bcl6 <- txf[txf %over% bcl6] > txf_bcl6 TxFeatures object with 44 ranges and 0 metadata columns: seqnames ranges strand type txName geneName <Rle> <IRanges> <Rle> <factor> <CharacterList> <CharacterList> [1] 3 [187721377, 187722601] - L ENST00000621333,ENST00000406870 ENSG00000113916 [2] 3 [187721387, 187722601] - L ENST00000419510 ENSG00000113916 [3] 3 [187722398, 187722601] - L ENST00000232014 ENSG00000113916 [4] 3 [187722432, 187722601] - L ENST00000450123 ENSG00000113916 [5] 3 [187722601, 187724941] - J ENST00000621333,ENST00000406870,ENST00000419510,... ENSG00000113916 ... ... ... ... ... ... ... [40] 3 [187745410, 187745437] - F ENST00000480458 ENSG00000113916 [41] 3 [187745410, 187745459] - F ENST00000496823 ENSG00000113916 [42] 3 [187745410, 187745472] - F ENST00000470319 ENSG00000113916 [43] 3 [187745410, 187745725] - F ENST00000621333 ENSG00000113916 [44] 3 [187745410, 187745727] - F ENST00000406870 ENSG00000113916 ------- seqinfo: 1 sequence from an unspecified genome