How to get annotations for all genes that overlap a region with annotatePeak
1
0
Entering edit mode
@pegahtaklifi-24293
Last seen 2.7 years ago
Iran

Hello everyone! I am trying to annotate a list of regions as my regions are 502 bp long sometimes they overlap with more than 2 distinct genes , for example chr2:112541661_112542162 overlaps with both POLR1B and LOC105373562 ( the first gene is on + strand and the latter is on - strand). I am using annotatePeak function as follows:

txdb<- TxDb.Hsapiens.UCSC.hg38.knownGene
G.coor<- as_granges(coor , seqnames=seqnames , start=start , end=end)
annotation<- annotatePeak(G.coor , TxDb = txdb ,level = "gene"  , addFlankGeneInfo=TRUE )

and this message appears

>> preparing features information...         2021-07-12 16:38:13 
  1613 genes were dropped because they have exons located on both strands of the same reference sequence or on more
  than one reference sequence, so cannot be represented by a single genomic range.
  Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object, or use suppressMessages() to
  suppress this message.
>> identifying nearest features...       2021-07-12 16:38:15 
>> calculating distance from peak to TSS...  2021-07-12 16:38:16 
>> assigning genomic annotation...       2021-07-12 16:38:16 
>> adding flank feature information from peaks...    2021-07-12 16:38:48 
>> assigning chromosome lengths          2021-07-12 16:38:49 
>> done...                   2021-07-12 16:38:49 

for my example (chr2:112541661_112542162 ) this only shows POLR1B gene for annotation. how can I get annotations for all genes that overlap with my list of regions ?

also I'm not sure how to set 'single.strand.genes.only=FALSE' as it is not a parameter of annotatePeak

I would appreciate your help.

txdb R annotatation • 2.8k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

When using the TxDb.Hsapiens.UCSC.hg38.knownGene package, annotatePeak will just do

features <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)

And you will see the message about the genes being dropped. Unfortunately the help page for this function is less than helpful, as it says

Arguments:

    peak: peak file or GRanges object

tssRegion: Region Range of TSS

    TxDb: TxDb or EnsDb annotation object

Which is not entirely correct. You can also pass in a GRanges object as the TxDb argument. So simply doing

TxDb <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)
annotation<- annotatePeak(G.coor , TxDb = TxDb ,level = "gene"  , addFlankGeneInfo=TRUE )

should get you where you want to go.

ADD COMMENT
0
Entering edit mode

Thank you for your response . Unfortunately when I try

annotation<- annotatePeak(G.coor , TxDb = TxDb ,level = "gene"  , addFlankGeneInfo=TRUE )

I get this error :

>> preparing features information...         2021-07-13 16:42:11 
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘genes’ for signature ‘"CompressedGRangesList"’

any idea how to fix this ? btw i used genes function from GenomicFeatures package, is that the same function as you mentioned ?

ADD REPLY
0
Entering edit mode

and when I try to unlist txdb :

txdb<- unlist(txdb)
annotation<- annotatePeak(G.coor , TxDb = TxDb ,level = "gene"  , addFlankGeneInfo=TRUE )

I get this message and the output does not have annotations ( intron , exon , ...)

#
#.. 'TxDb' is a self-defined 'GRanges' object...
#
#.. Some parameters of 'annotatePeak' will be disable,
#.. including:
#.. level, assignGenomicAnnotation, genomicAnnotationPriority,
#.. annoDb, addFlankGeneInfo and flankDistance.
#
#.. Some plotting functions are designed for visualizing genomic annotation
#.. and will not be available for the output object.
#
>> preparing features information...         2021-07-13 16:51:51 
>> identifying nearest features...       2021-07-13 16:51:51 
>> calculating distance from peak to TSS...  2021-07-13 16:51:52 
>> done...
ADD REPLY
0
Entering edit mode

Ah, yes. If you say single.strand.genes.only = FALSE, you get a CompressedGRangesList because of things like

> z <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)
> z[lengths(z) > 3L]
GRangesList object of length 3:
$`100302278`
GRanges object with 4 ranges and 0 metadata columns:
      seqnames              ranges strand
         <Rle>           <IRanges>  <Rle>
  [1]     chr1         30366-30503      +
  [2]     chr9         30144-30281      +
  [3]    chr15 101960459-101960596      -
  [4]    chr19         71973-72110      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`101954271`
GRanges object with 6 ranges and 0 metadata columns:
      seqnames              ranges strand
         <Rle>           <IRanges>  <Rle>
  [1]     chr2 174557966-174558072      -
  [2]     chr3 181231737-181231843      +
  [3]    chr10   13217269-13217375      +
  [4]    chr15   67839939-67840045      -
  [5]    chr19      893484-1021628      +
  [6]     chrX 141118030-141118136      -
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

Where there are genes that occur on different chromosomes. All you have to do is unlist.

> unlist(z)
GRanges object with 27582 ranges and 0 metadata columns:
        seqnames              ranges strand
           <Rle>           <IRanges>  <Rle>
      1    chr19   58345178-58362751      -
     10     chr8   18391282-18401218      +
    100    chr20   44619522-44652233      -
   1000    chr18   27932879-28177946      -
  10000     chr1 243488233-243851079      -
    ...      ...                 ...    ...
   9991     chr9 112217716-112333664      -
   9992    chr21   34364006-34371381      +
   9993    chr22   19036282-19122454      -
   9994     chr6   89829894-89874436      +
   9997    chr22   50523568-50526461      -
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome
ADD REPLY
0
Entering edit mode

Missed your second comment. Yes, using your own GRanges object disables the functionality to query a TxDb object to get the gene parts because you no longer supply a TxDb. You can do one or the other if you want to use the stock functionality.

ADD REPLY
1
Entering edit mode

There is one other option - either run through ChIPseeker:::getGene under the debugger and change the line

features <- genes(TxDb)

to

features <- genes(TxDb, single.strand.genes.only = FALSE)
features <- unlist(features)

and see if that change will work correctly, or if you really care, you could fork the GitHub repo, make changes to include genes on multiple chromosomes, and then send a PR to the authors. Or maybe just file an issue on the repo asking for them to make changes for you.

ADD REPLY
0
Entering edit mode

thank you very much for your response

ADD REPLY

Login before adding your answer.

Traffic: 525 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6