featureCounts returning a matrix of counts full of 0...
2
1
Entering edit mode
Raito92 ▴ 50
@raito92-20399
Last seen 13 months ago
Italy

Hello, I'm using the RNASeqGeneEdgeRQL workflow to process some RNASeq data. I've tried to create the required matrix of counts through the featureCounts function from Rsubread, by using an external reference genome. That's the code I used.

fastqPath <- list.files("JustThePathofMyFastQFolder", pattern="\\.fastq$", full=TRUE) all.bam <- sub("\\.fastq$", ".bam", fastqPath)

fc <- featureCounts(all.bam, annot.ext="ThePathofMyGtfFile", isGTFAnnotationFile=TRUE, nthreads=16, GTF.featureType="gene", GTF.attrType="gene_id", allowMultiOverlap=TRUE, fraction=TRUE, isPairedEnd=FALSE)


I previously edited my GTF file (after converting a .GFF3 file with the package rtracklayer) so that only the genes are included, because otherwise featureCounts would return an error. I changed 'ID' with 'gene_id' and only kept the lines referring to genes features, as follows:

The old file: https://i.ibb.co/ThQrThC/File-old.png

The new file: https://i.ibb.co/ykz9rpF/File.png

The previous .bam files were successfully created, but the counting step seems to fail.

While running the featureCount script, that's exactly what happens:

It says it can assign successfully 0% of the alignments and has a dramatically low running time:

Then, when I try to visualize the output, by typing fc, that's what runs on my screen:

What may the problem be? Thanks a lot!

Here is my sessioninfo() :

https://i.ibb.co/K9YPXx8/Session-Info-Empty.png

featureCounts rnaseq Rsubread • 1.1k views
1
Entering edit mode

I have removed the RNASeqGeneEdgeRQL tag from your question, and I've also edited the title, because what you are doing has no overlap with the RNASeqGeneEdgeRQL workflow.

2
Entering edit mode
Aspie96 ▴ 60
@aspie96-20441
Last seen 2.3 years ago

This problem can occour when the chromosome names are not the same in the BAM files and in the GFF/GTF file. This is for instance possible if they come from different sources.

If you are unable to get them from the same source, you should check the chrAliases option. However, setting it may not be trivial at all.

1
Entering edit mode

Hello, thanks for your suggestion! The source was actually the problem, since both my GTF and .FA file (for creating the genome index) were from a private database. Using the official releases of the files from NCBI actually made everything work properly!

2
Entering edit mode
Wei Shi ★ 3.3k
@wei-shi-2183
Last seen 1 day ago
Australia/Melbourne/Olivia Newton-John …

First you don't need to modify your GFF file in order to run featureCounts. The command below should work for your original GFF file.

fc <- featureCounts(all.bam, annot.ext="ThePathofMyGtfFile", isGTFAnnotationFile=TRUE, nthreads=16, GTF.featureType="gene", GTF.attrType="ID", allowMultiOverlap=TRUE, fraction=TRUE, isPairedEnd=FALSE)


You do not need to remove any features from your GFF file either because featureCounts automatically extracts those features that are specified by the parameters.

Since your data is RNAseq data, shouldn't you specify GTF.featureType="exon"? The feature type gene represents the whole gene body including both exons and introns, but your reads were only generated from exons.

Second as @Aspie96 pointed above, there might be a mismatch in chromosome name between your annotation and reference genome used for mapping. On the other hand, as shown in featureCounts counting summary (fc\$stat`) most of your reads were not mapped and I am not sure if it is the low mappability that resulted in zero counts.

0
Entering edit mode

Hello, thanks for your answer! I guessed there was some redundancy in the code, but I wanted to specify the options as well, just to be sure. You're right about the GTF.featureType option, but I just wanted to start with a peculiar feature. I'm not entirely sure about the mismatch but for sure it's better, if available, to always take files from the same source.