Question

Run featureCounts in isoform/transcript level in Rsubread

1

Entering edit mode

Likai Mao ▴ 40

@likai-mao-9454

Last seen 8.1 years ago

QIMR, Brisbane, Australia

Hi, I need transcript or isoform level counts. I tried gene level first. The result of GAPDH gene looks good:

ENSG00000111640.14 35281 44063 44933 37448 41543 54689 32989 31520 23272 61452 38870 58003 22751 19844 28379 36308 27316 51177 42593 49435 35901 32326 48705 39685 35096 34391 24142 38428 30607 28948 45369 35688 36033 25738 20554 44144

The parameters used in featureCounts function are:

countPrimaryAlignmentsOnly=TRUE, isPairedEnd=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2

I then tried some parameter combinations of for transcript level. All combinations produced very low (<100) counts for GAPDH except that one trial created high TOTAL counts (part):

ENST00000466525.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ENST00000466525.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ENST00000466525.1 10987 13962 13813 11035 13000 17130 10222 9887 7574 19108 12917 18216 7412 6201 9120 11259 8610 16360 13844 15615 11169 10326 15290 12717 11104 10663 7754 12000 9672 8880 14516 11399 11810 8286 6541 13904 ENST00000466525.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ENST00000396858.5 7 11 11 6 12 5 8 12 8 9 4 10 7 3 5 9 3 14 5 13 5 10 11 7 1 6 11 10 3 12 10 10 9 8 2 10 ENST00000396858.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The parameters used are:

useMetaFeatures=F, countPrimaryAlignmentsOnly=TRUE, GTF.attrType='transcript_id', isPairedEnd=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2

Note that each transcript had multiple lines probably because of the combination of "useMetaFeatures=F" and not using "GTF.featureType='transcript'".

The total counts are still much lower (<1/3) than gene level for GAPDH gene:

Total 11004 13978 13827 11048 13016 17145 10235 9903 7586 19127 12923 18238 7421 6209 9128 11271 8616 16389 13857 15635 11176 10341 15305 12730 11105 10677 7768 12019 9680 8897 14533 11412 11825 8299 6545 13929

Although Rsubread I used (1.18.0) is not the latest version, I tried latest Subread on another data set and produced exactly same results.

What's the proper way to get transcript level counts? Thanks.

The GTF file was downloaded from GENCODE. Gene GAPDH in this GTF has the following lines (part, it has 11 transcripts):

chr12 HAVANA gene 6533927 6538374 . + . gene_id "ENSG00000111640.14"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; level 2; havana_gene "OTTHUMG00000137379.1"; chr12 HAVANA transcript 6533927 6538371 . + . gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1"; chr12 HAVANA exon 6533927 6534569 . + . gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 1; exon_id "ENSE00001932108.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1"; chr12 HAVANA exon 6534810 6534861 . + . gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 2; exon_id "ENSE00003562276.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1"; chr12 HAVANA CDS 6534833 6534861 . + 0 gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 2; exon_id "ENSE00003562276.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1"; chr12 HAVANA start_codon 6534833 6534835 . + 0 gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 2; exon_id "ENSE00003562276.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";

chr12 HAVANA exon 6536494 6536593 . + . gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 3; exon_id "ENSE00003571091.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1"; chr12 HAVANA CDS 6536494 6536593 . + 1 gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 3; exon_id "ENSE00003571091.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";

...

chr12 HAVANA stop_codon 6538168 6538170 . + 0 gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 9; exon_id "ENSE00001902446.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1"; ... chr12 HAVANA transcript 6534512 6535141 . + . gene_id "ENSG00000111640.14"; transcript_id "ENST00000496049.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "retained_intron"; transcript_status "KNOWN"; transcript_name "GAPDH-010"; level 2; transcript_support_level "2"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268068.1"; ...

rsubread • 5.2k views

ADD COMMENT • link updated 8.1 years ago by Wei Shi ★ 3.6k • written 8.2 years ago by Likai Mao ▴ 40

score 1 · Answer 1 · 2016-02-29

The reason why you got very low counts for transcripts is because transcripts from the same gene usually have a large overlap between them and featureCounts does not assign any read that overlaps with more than one transcript in its default setting (although you can set allowMultiOverlap=TRUE to instruct featureCounts to assign such reads to all their overlapping transcripts).

In your parameter setting for counting reads to transcripts (I pasted below), you set useMetaFeatures=FALSE which will instruct featureCounts to count reads to exons. But how did you get counts for transcripts with this parameter setting?

useMetaFeatures=F, countPrimaryAlignmentsOnly=TRUE, GTF.attrType='transcript_id', isPairedEnd=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2

score 1 · Answer 2 · 2016-03-01

Yes, you should specify GTF.featureType='exon' when you perform gene level quantification.

For your transcript-level counting, although you can get more reads counted by specifying 'allowMultiOverlap=TRUE' you should be aware that any single read can only originate from one transcript but with this option on the counting program now assigns it to all its overlapping transcripts. This may lead to misleading results depending on what you are going to do with these counts.