Run featureCounts in isoform/transcript level in Rsubread
2
1
Entering edit mode
Likai Mao ▴ 40
@likai-mao-9454
Last seen 8.1 years ago
QIMR, Brisbane, Australia

Hi, I need transcript or isoform level counts. I tried gene level first. The result of GAPDH gene looks good:

ENSG00000111640.14 35281 44063 44933 37448 41543 54689 32989 31520 23272 61452 38870 58003 22751 19844 28379 36308 27316 51177 42593 49435 35901 32326 48705 39685 35096 34391 24142 38428 30607 28948 45369 35688 36033 25738 20554 44144

The parameters used in featureCounts function are:

countPrimaryAlignmentsOnly=TRUE, isPairedEnd=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2

I then tried some parameter combinations of for transcript level. All combinations produced very low (<100) counts for GAPDH except that one trial created high TOTAL counts (part):

ENST00000466525.1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
ENST00000466525.1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
ENST00000466525.1    10987    13962    13813    11035    13000    17130    10222    9887    7574    19108    12917    18216    7412    6201    9120    11259    8610    16360    13844    15615    11169    10326    15290    12717    11104    10663    7754    12000    9672    8880    14516    11399    11810    8286    6541    13904
ENST00000466525.1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
ENST00000396858.5    7    11    11    6    12    5    8    12    8    9    4    10    7    3    5    9    3    14    5    13    5    10    11    7    1    6    11    10    3    12    10    10    9    8    2    10
ENST00000396858.5    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

 

The parameters used are:

useMetaFeatures=F, countPrimaryAlignmentsOnly=TRUE, GTF.attrType='transcript_id', isPairedEnd=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2

Note that each transcript had multiple lines probably because of the combination of "useMetaFeatures=F" and not using "GTF.featureType='transcript'".

The total counts are still much lower (<1/3) than gene level for GAPDH gene:

Total 11004 13978 13827 11048 13016 17145 10235 9903 7586 19127 12923 18238 7421 6209 9128 11271 8616 16389 13857 15635 11176 10341 15305 12730 11105 10677 7768 12019 9680 8897 14533 11412 11825 8299 6545 13929

Although Rsubread I used (1.18.0) is not the latest version, I tried latest Subread on another data set and produced exactly same results.

What's the proper way to get transcript level counts? Thanks.

The GTF file was downloaded from GENCODE. Gene GAPDH in this GTF has the following lines (part, it has 11 transcripts):

chr12    HAVANA    gene    6533927    6538374    .    +    .    gene_id "ENSG00000111640.14"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; level 2; havana_gene "OTTHUMG00000137379.1";
chr12    HAVANA    transcript    6533927    6538371    .    +    .    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";
chr12    HAVANA    exon    6533927    6534569    .    +    .    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 1; exon_id "ENSE00001932108.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";
chr12    HAVANA    exon    6534810    6534861    .    +    .    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 2; exon_id "ENSE00003562276.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";
chr12    HAVANA    CDS    6534833    6534861    .    +    0    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 2; exon_id "ENSE00003562276.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";
chr12    HAVANA    start_codon    6534833    6534835    .    +    0    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 2; exon_id "ENSE00003562276.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";

chr12    HAVANA    exon    6536494    6536593    .    +    .    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 3; exon_id "ENSE00003571091.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";
chr12    HAVANA    CDS    6536494    6536593    .    +    1    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 3; exon_id "ENSE00003571091.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";

...

chr12    HAVANA    stop_codon    6538168    6538170    .    +    0    gene_id "ENSG00000111640.14"; transcript_id "ENST00000229239.9"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GAPDH-001"; exon_number 9; exon_id "ENSE00001902446.1"; level 2; protein_id "ENSP00000229239.5"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS8549.1"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268059.1";
...
chr12    HAVANA    transcript    6534512    6535141    .    +    .    gene_id "ENSG00000111640.14"; transcript_id "ENST00000496049.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GAPDH"; transcript_type "retained_intron"; transcript_status "KNOWN"; transcript_name "GAPDH-010"; level 2; transcript_support_level "2"; havana_gene "OTTHUMG00000137379.1"; havana_transcript "OTTHUMT00000268068.1";
...

rsubread • 5.2k views
ADD COMMENT
1
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 9 days ago
Australia/Melbourne/Olivia Newton-John …

The reason why you got very low counts for transcripts is because transcripts from the same gene usually have a large overlap between them and featureCounts does not assign any read that overlaps with more than one transcript in its default setting (although you can set allowMultiOverlap=TRUE to instruct featureCounts to assign such reads to all their overlapping transcripts).

In your parameter setting for counting reads to transcripts (I pasted below), you set useMetaFeatures=FALSE which will instruct featureCounts to count reads to exons. But how did you get counts for transcripts with this parameter setting?

 

useMetaFeatures=F, countPrimaryAlignmentsOnly=TRUE, GTF.attrType='transcript_id', isPairedEnd=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2

 

ADD COMMENT
0
Entering edit mode

Thanks Wei. I tried allowMultiOverlap=TRUE and it worked! All GAPDH transcripts had high counts.

I set useMetaFeatures=FALSE simply because I wanted to try the effects of this option. But somehow I didn't try allowMultiOverlap.

By the way, I also tried GTF.featureType='transcript' as someone suggested it, which produced slightly higher counts. But I feel I shouldn't include this if I follow gene level quantification that counting only exons. Am I right?

ADD REPLY
1
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 9 days ago
Australia/Melbourne/Olivia Newton-John …

Yes, you should specify GTF.featureType='exon' when you perform gene level quantification.

For your transcript-level counting, although you can get more reads counted by specifying 'allowMultiOverlap=TRUE' you should be aware that any single read can only originate from one transcript but with this option on the counting program now assigns it to all its overlapping transcripts. This may lead to misleading results depending on what you are going to do with these counts.

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 1018 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6