Question: How to construct a subset of TxDb object from TxDb.Hsapiens.UCSC.hg38.knownGene and selected genes
0
6 months ago by
Ruichao.Zeng0 wrote:

Hi all, I just wish to construct a subset of TxDb object based on the TxDb.Hsapiens.UCSC.hg38.knownGene and my special genes. Thus, does anyone know how to do it?

Many thanks

modified 6 months ago • written 6 months ago by Ruichao.Zeng0

What exactly are you trying to do? Can you not subset the output of the TxDb rather than trying to subset the TxDb itself?

For example, I have a target genes list, so I only want to construct a subset of TxDb for those target genes list rather than apply the entire TxDb.Hsapiens.UCSC.hg38.knownGene for the following analysis. Specifically, filtering the target genes corresponding information from TxDb.Hsapiens.UCSC.hg38.knownGene and then generate a new subset of TxDb.

That doesn't tell me anything more than your original post. I understand you want a subset, but not why, or how you plan on using it and why it's not just as easy or easier to extract everything and then subset after the fact.

If I just want a set of genes, it's easy enough to subset a GRanges or GRangesList

> gnsIlike <- sample(keys(TxDb.Hsapiens.UCSC.hg38.knownGene), 500)
> genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)[gnsIlike,]
GRangesList object of length 500:
$90589 GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr19 12140217-12156731 -$4074
GRanges object with 1 range and 0 metadata columns:
seqnames          ranges strand
[1]    chr12 8940363-8949955      -

$79800 GRanges object with 1 range and 0 metadata columns: seqnames ranges strand [1] chr2 202912218-202986485 + ... <497 more elements> ------- seqinfo: 455 sequences (1 circular) from hg38 genome > transcriptsBy(TxDb.Hsapiens.UCSC.hg38.knownGene)[gnsIlike] GRangesList object of length 500:$90589
GRanges object with 3 ranges and 2 metadata columns:
seqnames            ranges strand |     tx_id     tx_name
<Rle>         <IRanges>  <Rle> | <integer> <character>
[1]    chr19 12140217-12156731      - |    162058  uc060txu.1
[2]    chr19 12144890-12156714      - |    162060  uc060txw.1
[3]    chr19 12145265-12156729      - |    162061  uc010dyo.3

\$4074
GRanges object with 13 ranges and 2 metadata columns:
seqnames          ranges strand |  tx_id    tx_name
[1]    chr12 8940363-8943420      - | 108458 uc058kyv.1
[2]    chr12 8940363-8949636      - | 108459 uc058kyw.1
[3]    chr12 8940364-8949761      - | 108460 uc021quk.2
[4]    chr12 8940365-8949955      - | 108461 uc001qvf.4
[5]    chr12 8940751-8942517      - | 108462 uc058kyx.1
...      ...             ...    ... .    ...        ...
[9]    chr12 8943296-8945479      - | 108466 uc058kzb.1
[10]    chr12 8943825-8949735      - | 108467 uc058kzc.1
[11]    chr12 8945237-8949610      - | 108468 uc058kzd.1
[12]    chr12 8945433-8949645      - | 108469 uc058kze.1
[13]    chr12 8945485-8946848      - | 108470 uc058kzf.1

...
<498 more elements>
-------
seqinfo: 455 sequences (1 circular) from hg38 genome


Is that not simpler than going through all the work to get a subsetted TxDb?

Apology for my poor description. Specifically, I try to apply annotatePeak function from ChIPseeker package to call peaks. Then, this function requires a TxDb argument that should be exactly TxDb object. Apparently, the TxDb.Hsapiens.UCSC.hg38.knownGene meets this. However, due to I just have a small list of special genes, so I wish to construct a subset of TxDb object based on the TxDb.Hsapiens.UCSC.hg38.knownGene and my target gene, and then run the annotatePeak function, which can save much more time compared to apply the entire TxDb.Hsapiens.UCSC.hg38.knownGene. Hope this can help you understand what I exactly wish to do.

You won't save any time by doing that. It will take way more time to generate the subsetted TxDb package than it would take to simply run annotatePeak on the whole thing, even if you have to run annotatePeak multiple times. Don't try to save pennies by spending dollars.