I've been using edgeR for differential expression analysis of data generated from the same tissue under different conditions.
Now I have an RNA-seq dataset A (n=20) and would like to compare it with another RNA-seq dataset B (n=1,000 across different tissues). Since dataset B consists of normalized, batch-effect-adjusted RPKM values, I need to generate RPKM values for my own dataset A.
I already have a count table and would like to use rpkm() in edgeR, but first I need a gene length vector. My question is how to compute gene lengths from an "Ensembl.gtf" file, taking the following into account:
1. Gene 1 is much longer than Gene 2 if both exons and introns are counted. But Gene 1 has only 3 exons and Gene 2 has 10 exons --> in terms of transcript length, Gene 2 > Gene 1.
2. The same gene can have more than one transcript isoform, and different isoforms are expressed in different tissues.
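
To make the question concrete, one candidate definition is the union of all annotated exons of a gene across its isoforms: introns are excluded (point 1) and overlapping exons from different isoforms are counted once (point 2). Below is a minimal Python sketch of that definition, not a definitive method. It assumes a standard 9-column, tab-separated GTF with "exon" feature rows and a gene_id attribute, and that each gene_id sits on a single chromosome. The function name union_exon_lengths and the toy records are hypothetical, made up for illustration. It does not address which isoform is actually expressed in a given tissue.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "ENSG...";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def union_exon_lengths(gtf_lines):
    """Return {gene_id: number of bases covered by at least one exon}."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows contribute to the length
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive on both ends.
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    lengths = {}
    for gene, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or directly adjacent exon: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths


# Hypothetical toy records: gene G1 has two isoforms with overlapping exons.
toy_gtf = [
    '1\ttoy\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\ttoy\texon\t150\t249\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    '1\ttoy\texon\t400\t449\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\ttoy\texon\t1000\t1099\t.\t+\t.\tgene_id "G2"; transcript_id "T3";',
]
toy_lengths = union_exon_lengths(toy_gtf)
```

For a real file, the lines could be read with open("Ensembl.gtf"), and the resulting lengths, ordered to match the rows of the count table, would be what gets passed as the gene.length argument of rpkm() in edgeR.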