Question: Genomic features for genomic positions
Asma rabe (Japan) wrote, 15 months ago:

I have a list of regions identified as differentially methylated regions (DMRs) using bumphunter in the minfi package, in this format:

chr    start   end   …
chr1   …
chr2   …
chr3   …
…

I want to identify which genomic features these regions fall in (5' UTR, exon, gene body or 3' UTR).

I thought of using findOverlaps() after converting the TxDb to GFF, so I converted it:

txdbGFF <- asGFF(txdb)

but the only feature types in the result are gene, mRNA, exon and CDS:

unique(txdbGFF$type)

[1] "gene" "mRNA" "exon" "CDS"

How can I get more detailed annotation, such as 5' UTR, intergenic, etc.?

Thank you

Valerie Obenchain (United States) wrote, 15 months ago:

I would recommend putting your 'list of regions' into a GRanges or GRangesList object if they aren't already.
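A minimal sketch of that conversion, assuming the regions sit in a data.frame with `chr`, `start` and `end` columns. With minfi's bumphunter() this is usually the `table` element of the result; here a small made-up `dmrs` table (hypothetical values) stands in for it. makeGRangesFromDataFrame() from GenomicRanges recognises `chr` as the sequence-name column and can keep the remaining columns as metadata:

```r
library(GenomicRanges)

# Hypothetical DMR table in the chr/start/end layout from the question;
# the coordinates and 'value' column are made up for illustration.
dmrs <- data.frame(
  chr   = c("chr1", "chr2", "chr3"),
  start = c(1000000L, 2000000L, 3000000L),
  end   = c(1000500L, 2000800L, 3000300L),
  value = c(0.21, -0.15, 0.32)
)

# 'chr' is picked up as seqnames; 'value' is kept as a metadata column.
# Strand is absent, so every range gets strand "*".
gr <- makeGRangesFromDataFrame(dmrs, keep.extra.columns = TRUE)
gr
```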

You don't need to convert the TxDb to GFF. Instead, use the functions in GenomicFeatures to extract the regions of interest (e.g. fiveUTRsByTranscript(), threeUTRsByTranscript(), intronsByTranscript()), then call findOverlaps(). See ?transcripts and ?transcriptsBy for a listing of the functions that extract regions from a TxDb.
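A minimal sketch of the extract-then-overlap approach, assuming the hg19 UCSC knownGene TxDb used elsewhere in this thread. The two query ranges are arbitrary hg19 coordinates for illustration, the 2000/200 bp promoter window is an assumed choice, and "intergenic" here simply means "overlaps no gene span", which is a simplification. Each region gets one TRUE/FALSE column per feature type; the columns are not mutually exclusive, since a region can touch several features:

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical query regions (hg19); replace with your own GRanges of DMRs
gr <- GRanges(c("chr1", "chr17"),
              IRanges(c(1000000, 41244000), width = 500))

# Feature sets extracted from the TxDb (GRanges or GRangesList per type);
# the promoter window is an assumption, adjust as needed
features <- list(
  promoter  = promoters(txdb, upstream = 2000, downstream = 200),
  fiveUTR   = fiveUTRsByTranscript(txdb),
  CDS       = cdsBy(txdb, by = "tx"),
  intron    = intronsByTranscript(txdb),
  threeUTR  = threeUTRsByTranscript(txdb),
  gene_body = genes(txdb)
)

# Add one logical metadata column per feature type:
# TRUE if the region overlaps at least one range of that type
for (type in names(features)) {
  mcols(gr)[[type]] <- overlapsAny(gr, features[[type]], ignore.strand = TRUE)
}

# Simplified label: no overlap with any gene span
gr$intergenic <- !gr$gene_body
gr
```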

Another option is locateVariants() in the VariantAnnotation package. That function calls findOverlaps() between your ranges and a TxDb, restricted to the region type you pass as the 'region' argument. In the example below, 'gr' is a GRanges of your positions of interest.

library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

This call identifies ranges in 'gr' that fall in coding regions:

loc_coding <- locateVariants(gr, txdb, CodingVariants())

This identifies ranges in 'gr' that fall in 5' UTR regions:

loc_5utr <- locateVariants(gr, txdb, FiveUTRVariants())

See ?locateVariants for more 'region' options.
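As a sketch of one of those options, AllVariants() requests every region type in a single call, and the LOCATION column of the result says which type each hit is, while QUERYID points back to the row of 'gr'. The query ranges are arbitrary hg19 coordinates chosen for illustration:

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical query ranges (hg19); use your own GRanges here
gr <- GRanges(c("chr1", "chr17"),
              IRanges(c(1000000, 41244000), width = 500))

# AllVariants() covers coding, intron, 5'/3' UTR, splice site,
# promoter and intergenic regions at once
loc <- locateVariants(gr, txdb, AllVariants())

# How many hits fall in each region type
table(loc$LOCATION)
```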

Valerie


Thank you very much. Great package! I was able to annotate the regions using locateVariants().

The output has many redundancies, so I used unique() (see ?unique) to remove the redundant entries from the GRanges object.

written 15 months ago by Asma rabe

The output has one row per "variant-transcript" match, so yes, there can be duplicates. The package vignette has an example of summarizing the output of locateVariants() by gene (or another feature) regardless of transcript. Maybe not applicable to your use case this time, but useful in the future.
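The vignette's example is not reproduced here. As a separate minimal sketch (the choice of columns is an assumption, not the vignette's code), dropping the per-transcript columns such as TXID and CDSID and de-duplicating leaves one row per query range, region type and gene. The query range is an arbitrary hg19 coordinate for illustration:

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gr <- GRanges("chr17", IRanges(41244000, width = 500))  # hypothetical region
loc <- locateVariants(gr, txdb, AllVariants())

# Keep only query / region type / gene, so repeats caused by
# multiple transcripts of the same gene collapse into one row
per_gene <- unique(data.frame(
  QUERYID  = loc$QUERYID,
  LOCATION = as.character(loc$LOCATION),
  GENEID   = as.character(loc$GENEID)
))

# One comma-separated string of region types per query range
region_summary <- tapply(per_gene$LOCATION, per_gene$QUERYID,
                         function(x) paste(sort(unique(x)), collapse = ","))
region_summary
```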

Valerie

Vince Schulz (United States) wrote, 15 months ago:

You may want to look at the annotatePeak() function in the ChIPseeker package, which does this.

Vince
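A minimal sketch of that suggestion, assuming the same hg19 knownGene TxDb used elsewhere in this thread. The query ranges are arbitrary hg19 coordinates, and the ±3 kb promoter window is the function's default, written out explicitly. The "annotation" column of the result assigns each region a single feature label:

```r
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Hypothetical DMRs (hg19); annotatePeak() also accepts a BED file path
gr <- GRanges(c("chr1", "chr17"),
              IRanges(c(1000000, 41244000), width = 500))

# tssRegion sets the window (bp) around the TSS counted as "Promoter"
anno <- annotatePeak(gr, tssRegion = c(-3000, 3000), TxDb = txdb)

# 'annotation' holds labels such as "Promoter (<=1kb)", "5' UTR",
# "Exon (...)", "Intron (...)", "3' UTR" or "Distal Intergenic"
anno_df <- as.data.frame(anno)
anno_df[, c("seqnames", "start", "end", "annotation")]
```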
