Question

Can Rsubread output use Ensembl gene IDs?

agustin.gonvi ▴ 20

@agustingonvi-20284

Last seen 2.3 years ago

Cleveland, OH

I am using Rsubread to analyze human FASTQ files from the ENA. I downloaded a GRCh38 FASTA from ENCODE to build an index, then used featureCounts(bam.files, annot.inbuilt = "hg38") to count mapped reads for genomic features. Everything seems to work fine, but the rows of the count matrices are labelled with Entrez IDs, and I want the output to use Ensembl IDs. Is there a way to do that?

Thanks.

Ensembl Rsubread • 1.1k views

updated 2.3 years ago by Gordon Smyth 51k • written 2.3 years ago by agustin.gonvi ▴ 20

score 2 · Answer 1 · 2022-03-28

The Rsubread inbuilt annotation is RefSeq rather than Ensembl for the reasons explained here:

https://www.biorxiv.org/content/10.1101/2021.01.07.425794v1

But featureCounts works with any annotation. If you want Ensembl IDs, then input the Ensembl GTF file to featureCounts using annot.ext instead of annot.inbuilt. See help("featureCounts").

Note that this is not merely a matter of relabelling the rows of the count matrix. If you use a different gene annotation, then the whole count matrix will change so as to correspond to the genes and exons in the new annotation file.
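As a minimal sketch of the annot.ext approach described above, the call might look like the following. The GTF file name and the BAM file names are hypothetical placeholders, not taken from the original post, and the default single-end setting is assumed; adjust both to your own data.

```r
library(Rsubread)

# Hypothetical paths: substitute the Ensembl GTF you downloaded
# and the BAM files produced by your own alignment step.
gtf.file  <- "Homo_sapiens.GRCh38.gtf"
bam.files <- c("sample1.bam", "sample2.bam")

# Count reads against the external Ensembl annotation instead of
# the inbuilt RefSeq annotation.
fc <- featureCounts(
  files               = bam.files,
  annot.ext           = gtf.file,
  isGTFAnnotationFile = TRUE,       # annot.ext is a GTF file, not SAF
  GTF.featureType     = "exon",     # count reads overlapping exons
  GTF.attrType        = "gene_id"   # summarize exons by Ensembl gene ID
)

# Row names of the count matrix are now the gene_id values from the GTF.
head(rownames(fc$counts))
dim(fc$counts)
```

One thing to check: the chromosome names in the GTF have to match those in the BAM files. If your reference uses names like "chr1" while the GTF uses "1", most reads may go unassigned. The chrAliases argument of featureCounts takes a file that maps one naming scheme to the other; see help("featureCounts") for its format.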