transcripts are not true in TxDb.Hsapiens.UCSC.hg38.knownGene
BioEpi • 0

Hello, I used TxDb.Hsapiens.UCSC.hg38.knownGene/GenomicFeatures to retrieve gene promoters and other genomic features. Here is the code:

library('TxDb.Hsapiens.UCSC.hg38.knownGene')
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
PR <- promoters(txdb, upstream=2000, downstream=0)

But when I take a look at the PR results:

[image]

they are quite weird: some of the promoters do not belong to any gene.

TxDb.Hsapiens.UCSC.hg38.knownGene • 1.0k views


The image you included above doesn't show that some promoters don't belong to any gene, so it doesn't really help your point.

According to the man page (?GenomicFeatures::promoters):

The ‘promoters’ function computes user-defined promoter regions for the transcripts in a TxDb-like object.

Note that this is actually why the names on the returned GRanges object are the transcript names and not the gene ids.

So the only thing that we can count on is that there's going to be a one-to-one relationship between promoters and transcripts:

PR <- promoters(txdb, upstream=2000, downstream=0, columns=c("tx_id", "tx_name", "gene_id"))
# GRanges object with 258145 ranges and 3 metadata columns:
#                             seqnames        ranges strand |     tx_id           tx_name
#                                <Rle>     <IRanges>  <Rle> | <integer>       <character>
#   ENST00000456328.2             chr1    9869-11868      + |         1 ENST00000456328.2
#   ENST00000450305.2             chr1   10010-12009      + |         2 ENST00000450305.2
#   ENST00000473358.1             chr1   27554-29553      + |         3 ENST00000473358.1
#   ENST00000469289.1             chr1   28267-30266      + |         4 ENST00000469289.1
#   ENST00000607096.1             chr1   28366-30365      + |         5 ENST00000607096.1
#                 ...              ...           ...    ... .       ...               ...
#   ENST00000619779.1 chrUn_GL000220v1 153997-155996      + |    258141 ENST00000619779.1
#   ENST00000620265.1 chrUn_KI270442v1 378608-380607      + |    258142 ENST00000620265.1
#   ENST00000611690.1 chrUn_KI270442v1 217402-219401      - |    258143 ENST00000611690.1
#   ENST00000616830.1 chrUn_KI270744v1   51115-53114      - |    258144 ENST00000616830.1
#   ENST00000612925.1 chrUn_KI270750v1 146668-148667      + |    258145 ENST00000612925.1
#                             gene_id
#                     <CharacterList>
#   ENST00000456328.2       100287102
#   ENST00000450305.2       100287102
#   ENST00000473358.1       107985730
#   ENST00000469289.1       107985730
#   ENST00000607096.1       100302278
#                ...             ...
#   ENST00000619779.1                
#   ENST00000620265.1                
#   ENST00000611690.1                
#   ENST00000616830.1                
#   ENST00000612925.1                
#  -------
#  seqinfo: 640 sequences (1 circular) from hg38 genome

Now the fact that some transcripts are not associated with any gene is just the way things are in the GENCODE V38 track from the UCSC folks, which is the track that TxDb.Hsapiens.UCSC.hg38.knownGene is based on.
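If it helps, here is a minimal sketch of one way to see how many promoters have no gene attached and to set them aside. It assumes the gene_id metadata column is a CharacterList, as in the output above, so that lengths() gives the number of gene ids per transcript. The names n_genes, PR_with_gene and PR_no_gene are only illustrative.

```r
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# One promoter per transcript, keeping the gene_id column
PR <- promoters(txdb, upstream=2000, downstream=0,
                columns=c("tx_name", "gene_id"))

# Number of gene ids attached to each transcript (0 means no gene)
n_genes <- lengths(PR$gene_id)
table(n_genes)

# Split the promoters by whether the transcript is linked to a gene
PR_with_gene <- PR[n_genes > 0]
PR_no_gene <- PR[n_genes == 0]
```

Whether you keep or drop the promoters in PR_no_gene depends on your analysis. They are still valid transcript promoters; they just have no gene id in this annotation.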




Thanks a lot. I also have another issue: I used threeUTRsByTranscript(txdb, use.names=TRUE) to extract the 3'UTRs of genes, but the start and end coordinates of the 3'UTR highlighted in blue are the same, which looks weird.

[image]

For other genomic features, such as exons, I get the same problem:

[image]

Would anyone like to explain this phenomenon?


OK, OK, let's slow down and take the time to discuss the basics:

  • This is a different question. Your original question has been answered. If you're satisfied with the answer, please mark it as an accepted answer. If you're not, then please tell us why here. For any new question, please start a new thread.

  • Show the exact commands you use as well as their output. This should all be done in markdown. No need to upload screenshots that show the cropped version of some random output and with your drawings on top of it.

  • Avoid the use of things like "looks weird" or "explain this phenomenon". This is vague and subjective. Instead describe (with words) exactly and precisely what the problem is.

Looking forward to answering your next question in a different thread.



