TPM and FPKM for lncRNA
0
0
Entering edit mode
Ina Hoeschele ▴ 620
@ina-hoeschele-2992
Last seen 3.3 years ago
United States

Hi,

I have total RNA-seq data and would like to compute TPM and FPKM values for a read count matrix generated for lncRNAs using the annotation file gencode.v28.long_noncoding_RNAs.gtf. To compute TPM and FPKM from the counts I first need the lengths of the lncRNA genes, which I obtain by summing their exon lengths accounting for overlaps using the R package GenomicFeatures. Then I compute effective length as gene length minus average fragment length + 1 (average fragment length was average insert length for any sample as provided to me by the sequencing lab). I used the following code:

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("/home/inah/MESA_mRNAseq_New/gencode.v28.long_noncoding_RNAs.gtf", format="gtf",organism="Homo sapiens")
exons.list.per.gene <- exonsBy(txdb,by="gene")
exonic.gene.sizes <- lapply(exons.list.per.gene,function(x){sum(width(reduce(x)))})
exonic.gene.lengths <- as.numeric(exonic.gene.sizes)
names(exonic.gene.lengths) <- names(exonic.gene.sizes)
summary(exonic.gene.lengths)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     68     499     734    1440    1775  205012

35 of the 15779 exonic.gene.lengths values are less than the average fragment length of 134 (for one sample).  So something must be going wrong here. So the effective gene length would be negative for these 35 lncRNAs. 

Does any GenomicFeatures (or other) expert have an idea of what is going on here? Am I calculating the lengths correctly? Is this maybe a problem with lncRNAs?

Thank you ... Ina

 

 

 

 

GenomicFeatures TPM FPKM lncRNA • 1.7k views
ADD COMMENT
0
Entering edit mode

Update:

I just did the same for protein-coding genes using mRNA-seq data and am also getting gene lengths that are shorter than the fragment lengths, so something may be wrong with my gene length calculation? See below.

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("/home/inah/MESA_mRNAseq_New/Homo_sapiens.GRCh38.85.gtf", format="gtf",organism="Homo sapiens")
exons.list.per.gene <- exonsBy(txdb,by="gene")
exonic.gene.sizes <- lapply(exons.list.per.gene,function(x){sum(width(reduce(x)))})
exonic.gene.lengths <- as.numeric(exonic.gene.sizes)
names(exonic.gene.lengths) <- names(exonic.gene.sizes)
summary(exonic.gene.lengths)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 #     8     407     950    2328    3148  205012
sum(exonic.gene.lengths<134)    # 5889

 

 

 

ADD REPLY

Login before adding your answer.

Traffic: 888 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6