Calculate the expression of gene
1
0
Entering edit mode
cookie ▴ 20
@cookie-7508
Last seen 9.1 years ago
USA

Hi,

I am trying to figure out how to calculate the expression of a gene via GenomicFeatures and IRanges.

My solution to this is based on the idea where I find group the TxDb by genes, then see if how many transcripts are "above" the gene, that is how many are overlapping, within the range of the gene. This includes obtaining the RNA-seq data from UCSC, then making it a IRange or GRange object, and then using the findOverlaps function from GenomicFeatures to see how many hits I get. Is this the right way to do it?

This is the a small code example:

library(IRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb.test=transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene,by="gene")
gr <- GRanges( seqnames=Rle(c("chr1")),
  ranges=IRanges(start=c(2244435, 3231243, 2142344), end=c(21444351,3331243, 2342344)),)
findOverlaps(gr,txdb.test)


The results then show:
 

> findOverlaps(gr,txdb.test)
Hits of length 251
queryLength: 3
subjectLength: 23459
    queryHits subjectHits 
     <integer>   <integer> 
 1           1         275 
 2           1         481 
 3           1         484 
 4           1         637 
 5           1         885 
 ...       ...         ... 
 247         3         275 
 248         3        3115 
 249         3       14052 
 250         3       17865 
 251         3       20150 

 

Here, we can see that there a 251 hits, and where it was detected. More hits would mean more expression, which means that these genes are more expressed than the others. I hope I am using the right way of thinking here. Am I doing this right?

 

genomicranges iranges • 1.9k views
ADD COMMENT
0
Entering edit mode

Probably you would rather countOverlaps in your own code, or use GenomicRanges::summarizeOverlaps() for a more general purpose tool. As Jim mentions this type of counting operation might be a first step in, e.g., differential expression between groups.

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 14 hours ago
United States

I think you misunderstand what the TxDb packages are for. These packages simply contain gene information from UCSC (e.g., for each gene, where is the 3' UTR, the exons, the introns, and the 5' UTR).

They don't contain any information about how much a given gene is expressed. All you have done is to count the number of transcripts that are within some gene ranges. This has nothing to do with expression of the gene.

ADD COMMENT

Login before adding your answer.

Traffic: 876 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6