Genomic features for genomic positions
Asma rabe ▴ 290
@asma-rabe-4697
Japan

I have a list of differentially methylated regions (DMRs) identified with bumphunter in the minfi package, in the following format:

chr   start  end  …
chr1
chr2
chr3
…

I want to identify the genomic features of these positions (5' UTR, exon, gene body or 3' UTR).

I thought of using findOverlaps and converting the TxDb to GFF, so I converted the TxDb:

txdbGFF<-asGFF(txdb)

However, the only feature types available are gene, mRNA, exon and CDS:

unique(txdbGFF$type)

[1] "gene" "mRNA" "exon" "CDS" 

How can I get more detailed information, such as 5' UTR, intergenic, etc.?

 

Thank you

@valerie-obenchain-4275
United States

I would recommend putting your 'list of regions' in a GRanges or GRangesList object if it isn't in one already.
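As a rough sketch of that first step, assuming the regions sit in a data frame with chr, start and end columns: the coordinates and the extra 'value' column below are made up for illustration, and 'dmr_df' / 'dmr_gr' are hypothetical names.

```r
library(GenomicRanges)

# Hypothetical DMR table; real coordinates would come from the bumphunter output.
dmr_df <- data.frame(
  chr   = c("chr1", "chr2", "chr3"),
  start = c(1000000, 2000000, 3000000),
  end   = c(1000500, 2000800, 3000300),
  value = c(0.21, -0.35, 0.18)
)

# makeGRangesFromDataFrame() recognises the chr/start/end column names;
# keep.extra.columns = TRUE carries 'value' along as a metadata column.
dmr_gr <- makeGRangesFromDataFrame(dmr_df, keep.extra.columns = TRUE)
dmr_gr
```

Since no strand column is present, the ranges get strand "*", so later overlaps match features on either strand.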

You don't need to convert the TxDb to a GFF. Instead, use the functions in GenomicFeatures to extract the regions of interest, then call findOverlaps(). See ?transcripts and ?transcriptsBy for a listing of functions that extract regions from a TxDb.

Another option is to use locateVariants() in the VariantAnnotation package. That function performs findOverlaps() between your ranges and a TxDb based on the 'region' argument you provide. In the example below, 'gr' is a GRanges of your positions of interest.
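A minimal sketch of the extract-then-overlap approach, assuming hg19 coordinates and the UCSC knownGene TxDb. The toy 'gr' and the choice of feature sets are illustrative only, and 'intergenic' is defined here simply as "overlaps no gene", which is an assumption rather than a standard definition.

```r
library(GenomicRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical regions; substitute your own GRanges.
gr <- GRanges(c("chr1", "chr2", "chr3"),
              IRanges(start = c(1000000, 2000000, 3000000), width = 500))

# Feature sets extracted directly from the TxDb.
features <- list(
  fiveUTR  = fiveUTRsByTranscript(txdb),
  threeUTR = threeUTRsByTranscript(txdb),
  exon     = exons(txdb),
  intron   = intronsByTranscript(txdb),
  gene     = genes(txdb)
)

# One logical column per feature type: TRUE if the region overlaps it.
hits <- as.data.frame(lapply(features, function(f) overlapsAny(gr, f)))

# Assumed definition: a region is intergenic if it overlaps no gene.
hits$intergenic <- !hits$gene

cbind(as.data.frame(gr), hits)
```

A region can be TRUE in several columns at once, for example when it spans an exon-intron boundary or overlaps different transcripts of the same gene.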

library(VariantAnnotation)

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

This call identifies ranges in 'gr' that fall in coding regions:

loc_all <- locateVariants(gr,  txdb, CodingVariants())

This identifies ranges in 'gr' that fall in 5' UTR regions:

loc_5utr <- locateVariants(gr, txdb, FiveUTRVariants())

See ?locateVariants for more 'region' options.
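For instance, a sketch that asks for every region type in one call via AllVariants() and then tabulates the result. The toy 'gr' is hypothetical, and the counts depend entirely on your own regions and annotation version.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical regions; substitute your own GRanges.
gr <- GRanges(c("chr1", "chr2", "chr3"),
              IRanges(start = c(1000000, 2000000, 3000000), width = 500))

# AllVariants() covers coding, intron, 5'/3' UTR, splice site,
# promoter and intergenic in a single call.
loc_every <- locateVariants(gr, txdb, AllVariants())

# LOCATION is a factor naming the region type of each match.
table(loc_every$LOCATION)
```

The QUERYID column of the result maps each row back to the index of the region in 'gr'.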

Valerie


Thank you very much. Great package! I was able to annotate the regions using locateVariants().

The output has many redundancies. I used unique() (see ?unique) to remove the redundant entries from the GRanges object.

 


The output has one row per "variant-transcript" match so yes, there can be duplicates. The package vignette has an example of summarizing the output from locateVariants() by gene (or other feature) regardless of transcript. Maybe not applicable to your use case this time but useful for the future.
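One possible way to collapse the per-transcript rows, sketched here under the assumption that only the region index, the location type and the gene matter. This is not the vignette's code, and the names 'loc_df' and 'by_region' are hypothetical.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical regions; substitute your own GRanges.
gr <- GRanges(c("chr1", "chr2", "chr3"),
              IRanges(start = c(1000000, 2000000, 3000000), width = 500))

loc_every <- locateVariants(gr, txdb, AllVariants())

# Drop the transcript-level columns, then remove rows that are now identical.
loc_df <- unique(data.frame(
  QUERYID  = loc_every$QUERYID,
  LOCATION = as.character(loc_every$LOCATION),
  GENEID   = loc_every$GENEID,
  stringsAsFactors = FALSE
))

# For each input region (by index into 'gr'), the distinct location types found.
by_region <- lapply(split(loc_df$LOCATION, loc_df$QUERYID), unique)
by_region
```

Regions with no match at all do not appear in 'by_region', so its length can be smaller than length(gr).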

Valerie

Vince Schulz ▴ 160
@vince-schulz-3553
United States

You may want to look at the annotatePeak function in the ChIPseeker package that does this.
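A short sketch of what that could look like, assuming hg19 regions in a GRanges. The toy 'gr' is hypothetical, and the tssRegion window of +/- 3 kb is just one common choice for what counts as a promoter.

```r
library(GenomicRanges)
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical regions; substitute your own GRanges.
gr <- GRanges(c("chr1", "chr2", "chr3"),
              IRanges(start = c(1000000, 2000000, 3000000), width = 500))

# annotatePeak() assigns each region a single annotation
# (promoter, 5' UTR, 3' UTR, exon, intron, downstream, distal intergenic).
peak_anno <- annotatePeak(gr, TxDb = txdb, tssRegion = c(-3000, 3000))

# The 'annotation' column holds the assigned feature for each region.
anno_df <- as.data.frame(peak_anno)
anno_df[, c("seqnames", "start", "end", "annotation")]
```

Unlike the overlap-based approaches, this returns one annotation per region, chosen by a priority order among the feature types.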

Vince
