Question: featureCounts returning a matrix of counts full of 0...
1
gravatar for Raito92
6 months ago by
Raito9240
Italy
Raito9240 wrote:

Hello, I'm using the RNASeqGeneEdgeRQL workflow to process some RNASeq data. I've tried to create the required matrix of counts through the featureCounts function from Rsubread, by using an external reference genome. That's the code I used.

fastqPath <- list.files("JustThePathofMyFastQFolder", pattern="\\.fastq$", full=TRUE)
all.bam <- sub("\\.fastq$", ".bam", fastqPath)

fc <- featureCounts(all.bam, annot.ext="ThePathofMyGtfFile", isGTFAnnotationFile=TRUE, nthreads=16, GTF.featureType="gene", GTF.attrType="gene_id", allowMultiOverlap=TRUE, fraction=TRUE, isPairedEnd=FALSE`)

I previously edited my GTF file (after converting a .GFF3 file with the package rtracklayer) so that only the genes are included, because otherwise featureCounts would return an error. I changed 'ID' with 'gene_id' and only kept the lines referring to genes features, as follows:

The old file: https://i.ibb.co/ThQrThC/File-old.png

The new file: https://i.ibb.co/ykz9rpF/File.png

The previous .bam files were successfully created, but the counting step seems to fail.

While running the featureCount script, that's exactly what happens:

It says it can assign successfully 0% of the alignments and has a dramatically low running time:

enter image description here

Then, when I try to visualize the output, by typing fc, that's what runs on my screen:

enter image description here enter image description here enter image description here enter image description here

enter image description here

What may the problem be? Thanks a lot!

Here is my sessioninfo() :

https://i.ibb.co/K9YPXx8/Session-Info-Empty.png

rnaseq rsubread featurecounts • 256 views
ADD COMMENTlink modified 6 months ago by Gordon Smyth39k • written 6 months ago by Raito9240
1

I have removed the RNASeqGeneEdgeRQL tag from your question, and I've also edited the title, because what you are doing has no overlap with the RNASeqGeneEdgeRQL workflow.

ADD REPLYlink modified 6 months ago • written 6 months ago by Gordon Smyth39k
Answer: featureCounts from RnaSeqGeneEdgeRQL returning a matrix of counts full of 0...
2
gravatar for Aspie96
6 months ago by
Aspie9660
Aspie9660 wrote:

This problem can occour when the chromosome names are not the same in the BAM files and in the GFF/GTF file. This is for instance possible if they come from different sources.

If you are unable to get them from the same source, you should check the chrAliases option. However, setting it may not be trivial at all.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Aspie9660
1

Hello, thanks for your suggestion! The source was actually the problem, since both my GTF and .FA file (for creating the genome index) were from a private database. Using the official releases of the files from NCBI actually made everything work properly!

ADD REPLYlink written 6 months ago by Raito9240
Answer: featureCounts from RnaSeqGeneEdgeRQL returning a matrix of counts full of 0...
2
gravatar for Wei Shi
6 months ago by
Wei Shi3.2k
Australia
Wei Shi3.2k wrote:

First you don't need to modify your GFF file in order to run featureCounts. The command below should work for your original GFF file.

fc <- featureCounts(all.bam, annot.ext="ThePathofMyGtfFile", isGTFAnnotationFile=TRUE, nthreads=16, GTF.featureType="gene", GTF.attrType="ID", allowMultiOverlap=TRUE, fraction=TRUE, isPairedEnd=FALSE)

You do not need to remove any features from your GFF file either because featureCounts automatically extracts those features that are specified by the parameters.

Since your data is RNAseq data, shouldn't you specify GTF.featureType="exon"? The feature type gene represents the whole gene body including both exons and introns, but your reads were only generated from exons.

Second as @Aspie96 pointed above, there might be a mismatch in chromosome name between your annotation and reference genome used for mapping. On the other hand, as shown in featureCounts counting summary (fc$stat) most of your reads were not mapped and I am not sure if it is the low mappability that resulted in zero counts.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Wei Shi3.2k

Hello, thanks for your answer! I guessed there was some redundancy in the code, but I wanted to specify the options as well, just to be sure. You're right about the GTF.featureType option, but I just wanted to start with a peculiar feature. I'm not entirely sure about the mismatch but for sure it's better, if available, to always take files from the same source.

ADD REPLYlink written 6 months ago by Raito9240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 258 users visited in the last hour