Question

How to get annotations for all genes that overlap a region with annotatePeak

0

Entering edit mode

pegah.taklifi ▴ 30

@pegahtaklifi-24293

Last seen 3.2 years ago

Iran

Hello everyone! I am trying to annotate a list of regions as my regions are 502 bp long sometimes they overlap with more than 2 distinct genes , for example chr2:112541661_112542162 overlaps with both POLR1B and LOC105373562 ( the first gene is on + strand and the latter is on - strand). I am using annotatePeak function as follows:

txdb<- TxDb.Hsapiens.UCSC.hg38.knownGene
G.coor<- as_granges(coor , seqnames=seqnames , start=start , end=end)
annotation<- annotatePeak(G.coor , TxDb = txdb ,level = "gene"  , addFlankGeneInfo=TRUE )

and this message appears

>> preparing features information...         2021-07-12 16:38:13 
  1613 genes were dropped because they have exons located on both strands of the same reference sequence or on more
  than one reference sequence, so cannot be represented by a single genomic range.
  Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object, or use suppressMessages() to
  suppress this message.
>> identifying nearest features...       2021-07-12 16:38:15 
>> calculating distance from peak to TSS...  2021-07-12 16:38:16 
>> assigning genomic annotation...       2021-07-12 16:38:16 
>> adding flank feature information from peaks...    2021-07-12 16:38:48 
>> assigning chromosome lengths          2021-07-12 16:38:49 
>> done...                   2021-07-12 16:38:49

for my example (chr2:112541661_112542162 ) this only shows POLR1B gene for annotation. how can I get annotations for all genes that overlap with my list of regions ?

also I'm not sure how to set 'single.strand.genes.only=FALSE' as it is not a parameter of annotatePeak

I would appreciate your help.

txdb R annotatation • 3.2k views

ADD COMMENT • link 4.0 years ago pegah.taklifi ▴ 30

score 0 · Answer 1 · 2021-07-12

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

When using the TxDb.Hsapiens.UCSC.hg38.knownGene package, annotatePeak will just do

features <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)

And you will see the message about the genes being dropped. Unfortunately the help page for this function is less than helpful, as it says

Arguments:

    peak: peak file or GRanges object

tssRegion: Region Range of TSS

    TxDb: TxDb or EnsDb annotation object

Which is not entirely correct. You can also pass in a GRanges object as the TxDb argument. So simply doing

TxDb <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)
annotation<- annotatePeak(G.coor , TxDb = TxDb ,level = "gene"  , addFlankGeneInfo=TRUE )

should get you where you want to go.

ADD COMMENT • link 4.0 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you for your response . Unfortunately when I try

annotation<- annotatePeak(G.coor , TxDb = TxDb ,level = "gene"  , addFlankGeneInfo=TRUE )

I get this error :

>> preparing features information...         2021-07-13 16:42:11 
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘genes’ for signature ‘"CompressedGRangesList"’

any idea how to fix this ? btw i used genes function from GenomicFeatures package, is that the same function as you mentioned ?

ADD REPLY • link 4.0 years ago pegah.taklifi ▴ 30

0

Entering edit mode

and when I try to unlist txdb :

txdb<- unlist(txdb)
annotation<- annotatePeak(G.coor , TxDb = TxDb ,level = "gene"  , addFlankGeneInfo=TRUE )

I get this message and the output does not have annotations ( intron , exon , ...)

#
#.. 'TxDb' is a self-defined 'GRanges' object...
#
#.. Some parameters of 'annotatePeak' will be disable,
#.. including:
#.. level, assignGenomicAnnotation, genomicAnnotationPriority,
#.. annoDb, addFlankGeneInfo and flankDistance.
#
#.. Some plotting functions are designed for visualizing genomic annotation
#.. and will not be available for the output object.
#
>> preparing features information...         2021-07-13 16:51:51 
>> identifying nearest features...       2021-07-13 16:51:51 
>> calculating distance from peak to TSS...  2021-07-13 16:51:52 
>> done...

ADD REPLY • link 4.0 years ago pegah.taklifi ▴ 30

0

Entering edit mode

Ah, yes. If you say single.strand.genes.only = FALSE, you get a CompressedGRangesList because of things like

> z <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)
> z[lengths(z) > 3L]
GRangesList object of length 3:
$`100302278`
GRanges object with 4 ranges and 0 metadata columns:
      seqnames              ranges strand
         <Rle>           <IRanges>  <Rle>
  [1]     chr1         30366-30503      +
  [2]     chr9         30144-30281      +
  [3]    chr15 101960459-101960596      -
  [4]    chr19         71973-72110      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`101954271`
GRanges object with 6 ranges and 0 metadata columns:
      seqnames              ranges strand
         <Rle>           <IRanges>  <Rle>
  [1]     chr2 174557966-174558072      -
  [2]     chr3 181231737-181231843      +
  [3]    chr10   13217269-13217375      +
  [4]    chr15   67839939-67840045      -
  [5]    chr19      893484-1021628      +
  [6]     chrX 141118030-141118136      -
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

Where there are genes that occur on different chromosomes. All you have to do is unlist.

> unlist(z)
GRanges object with 27582 ranges and 0 metadata columns:
        seqnames              ranges strand
           <Rle>           <IRanges>  <Rle>
      1    chr19   58345178-58362751      -
     10     chr8   18391282-18401218      +
    100    chr20   44619522-44652233      -
   1000    chr18   27932879-28177946      -
  10000     chr1 243488233-243851079      -
    ...      ...                 ...    ...
   9991     chr9 112217716-112333664      -
   9992    chr21   34364006-34371381      +
   9993    chr22   19036282-19122454      -
   9994     chr6   89829894-89874436      +
   9997    chr22   50523568-50526461      -
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

ADD REPLY • link 4.0 years ago James W. MacDonald 68k

0

Entering edit mode

Missed your second comment. Yes, using your own GRanges object disables the functionality to query a TxDb object to get the gene parts because you no longer supply a TxDb. You can do one or the other if you want to use the stock functionality.

ADD REPLY • link 4.0 years ago James W. MacDonald 68k

1

Entering edit mode

There is one other option - either run through ChIPseeker:::getGene under the debugger and change the line

features <- genes(TxDb)

to

features <- genes(TxDb, single.strand.genes.only = FALSE)
features <- unlist(features)

and see if that change will work correctly, or if you really care, you could fork the GitHub repo, make changes to include genes on multiple chromosomes, and then send a PR to the authors. Or maybe just file an issue on the repo asking for them to make changes for you.