Question: TPM and FPKM for lncRNA
0
gravatar for Ina Hoeschele
17 months ago by
Ina Hoeschele610
United States
Ina Hoeschele610 wrote:

Hi,

I have total RNA-seq data and would like to compute TPM and FPKM values for a read count matrix generated for lncRNAs using the annotation file gencode.v28.long_noncoding_RNAs.gtf. To compute TPM and FPKM from the counts I first need the lengths of the lncRNA genes, which I obtain by summing their exon lengths accounting for overlaps using the R package GenomicFeatures. Then I compute effective length as gene length minus average fragment length + 1 (average fragment length was average insert length for any sample as provided to me by the sequencing lab). I used the following code:

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("/home/inah/MESA_mRNAseq_New/gencode.v28.long_noncoding_RNAs.gtf", format="gtf",organism="Homo sapiens")
exons.list.per.gene <- exonsBy(txdb,by="gene")
exonic.gene.sizes <- lapply(exons.list.per.gene,function(x){sum(width(reduce(x)))})
exonic.gene.lengths <- as.numeric(exonic.gene.sizes)
names(exonic.gene.lengths) <- names(exonic.gene.sizes)
summary(exonic.gene.lengths)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     68     499     734    1440    1775  205012

35 of the 15779 exonic.gene.lengths values are less than the average fragment length of 134 (for one sample).  So something must be going wrong here. So the effective gene length would be negative for these 35 lncRNAs. 

Does any GenomicFeatures (or other) expert have an idea of what is going on here? Am I calculating the lengths correctly? Is this maybe a problem with lncRNAs?

Thank you ... Ina

 

 

 

 

genomicfeatures lncrna fpkm tpm • 391 views
ADD COMMENTlink written 17 months ago by Ina Hoeschele610

Update:

I just did the same for protein-coding genes using mRNA-seq data and am also getting gene lengths that are shorter than the fragment lengths, so something may be wrong with my gene length calculation? See below.

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("/home/inah/MESA_mRNAseq_New/Homo_sapiens.GRCh38.85.gtf", format="gtf",organism="Homo sapiens")
exons.list.per.gene <- exonsBy(txdb,by="gene")
exonic.gene.sizes <- lapply(exons.list.per.gene,function(x){sum(width(reduce(x)))})
exonic.gene.lengths <- as.numeric(exonic.gene.sizes)
names(exonic.gene.lengths) <- names(exonic.gene.sizes)
summary(exonic.gene.lengths)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 #     8     407     950    2328    3148  205012
sum(exonic.gene.lengths<134)    # 5889

 

 

 

ADD REPLYlink written 17 months ago by Ina Hoeschele610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 177 users visited in the last hour