Hi!
I have a set of target peaks that I would like to link to promoter regions from the mm10 genome annotation. I retrieve the promoter regions and subset them to standard chromosomes as follows:
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
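# Default promoter windows: 2000 bp upstream and 200 bp downstream of each TSS
promoters <- promoters(TxDb.Mmusculus.UCSC.mm10.knownGene)
# Keep only the standard chromosomes (chr1-chr19, chrX, chrY)
promoters <- promoters[seqnames(promoters) %in% paste0("chr", c(1:19, "X", "Y"))]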
This gives me the following GRanges object:
GRanges object with 142314 ranges and 2 metadata columns:
                       seqnames            ranges strand |     tx_id              tx_name
                          <Rle>         <IRanges>  <Rle> | <integer>          <character>
  ENSMUST00000193812.1     chr1   3071253-3073452      + |         1 ENSMUST00000193812.1
  ENSMUST00000082908.1     chr1   3100016-3102215      + |         2 ENSMUST00000082908.1
  ENSMUST00000192857.1     chr1   3250757-3252956      + |         3 ENSMUST00000192857.1
  ENSMUST00000161581.1     chr1   3464587-3466786      + |         4 ENSMUST00000161581.1
  ENSMUST00000192183.1     chr1   3529795-3531994      + |         5 ENSMUST00000192183.1
                   ...      ...               ...    ... .       ...                  ...
  ENSMUST00000187582.6     chrY 90667426-90669625      - |    142310 ENSMUST00000187582.6
  ENSMUST00000191048.1     chrY 90667426-90669625      - |    142311 ENSMUST00000191048.1
  ENSMUST00000238676.1     chrY 90755268-90757467      - |    142312 ENSMUST00000238676.1
  ENSMUST00000177893.1     chrY 90754622-90756821      - |    142313 ENSMUST00000177893.1
  ENSMUST00000179623.1     chrY 90838978-90841177      - |    142314 ENSMUST00000179623.1
  -------
  seqinfo: 66 sequences (1 circular) from mm10 genome
I would like to reduce it to a single canonical transcript per gene and add the gene symbol as an extra metadata column.
Could you please help me figure out how to do so? Thank you!!