Question

Error in featureCounts matrix

ryann • 0

I am analysing RNA-seq data from 24 mouse samples (I am a beginner in both coding and RNA-seq analysis; I got the mouse GTF file from NCBI). I have attached screenshots of the featureCounts summary and of the .txt file I received as output. Most columns of the .txt file look normal, but a few are blank, apparently because the columns become misaligned when I convert the .txt to .xlsx in R. Am I missing something in this conversion, or is something wrong with my featureCounts output file? As the attachment shows, the chromosome information was not removed and count values are missing for a couple of samples.

# Read text file. I have to specify fill = TRUE because otherwise I get an error message:
#   Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
#   line 218 did not have 30 elements
data <- read.table("featurecounts.txt", header = TRUE, fill = TRUE)

# Omit columns 2 to 6
columns_to_keep <- c(1, (7:ncol(data)))
data_subset <- data[, columns_to_keep]

# Write as Excel file
write.xlsx(data_subset, "featurecounts_final.xlsx")

[Screenshot: output from R, a featureCounts Excel file with matrix issues]

[Screenshot: output from the featureCounts program, the text file]

[Screenshot: summary file from the featureCounts program]

score 0 · Answer 1 · 2024-02-26

As a general rule, there is no reason to write the data out and then read it back into R. After running featureCounts, you can instantiate a DGEList object and then analyze it using edgeR or the limma-voom pipeline. At the end you might want to output the results from topTags or topTable to an Excel workbook, but I find it's better to go straight to Glimma to make interactive MA plots, which are usually more informative than a static Excel workbook.
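
A minimal sketch of that workflow in edgeR, for illustration only. The counts here are simulated and the group labels ("control", "treated") are hypothetical; in practice the DGEList would be built from the featureCounts output rather than from a simulated matrix.

```r
library(edgeR)

# Simulated stand-in for a real count matrix (hypothetical data, 6 samples)
set.seed(1)
n_genes <- 200
group <- factor(rep(c("control", "treated"), each = 3))
counts <- matrix(
  rnbinom(n_genes * 6, mu = 50, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("sample", 1:6))
)

# Build the DGEList and the design matrix
y <- DGEList(counts = counts, group = group)
design <- model.matrix(~ group)

# Drop lowly expressed genes, then normalize library sizes
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Estimate dispersions, fit quasi-likelihood models, test treated vs control
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)

# Table of the top 10 genes, ready for inspection or export
top <- topTags(res, n = 10)$table
```

The `top` data frame is the kind of table that could be exported at the end, if a spreadsheet is still wanted.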

You might also consider using the in-built SAF annotation for mouse rather than the GTF. NCBI uses accession-style names such as NC_000067 instead of, say, chr1 as the name for chromosome 1.
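
If the chromosome names in the annotation and in the BAM files do turn out to differ, one option is the `chrAliases` argument of `featureCounts` in Rsubread, which takes a comma-delimited, header-less file with the annotation names in the first column and the alignment names in the second. The sketch below only builds such a file. The two rows are illustrative placeholders and the file names in the commented calls are hypothetical, so check the names actually present in your GTF and BAM headers.

```r
# Hypothetical alias table: column 1 = chromosome name in the annotation,
# column 2 = corresponding chromosome name in the BAM files.
# Replace these rows with the names found in your own GTF and BAM headers.
aliases <- data.frame(
  annotation_name = c("NC_000067", "NC_000068"),
  bam_name = c("chr1", "chr2")
)

# Write a comma-delimited file with no header and no quotes
alias_file <- tempfile(fileext = ".csv")
write.table(aliases, alias_file, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Not run here (bam_files and "mouse.gtf" are placeholders):
# fc <- featureCounts(files = bam_files, annot.ext = "mouse.gtf",
#                     isGTFAnnotationFile = TRUE, chrAliases = alias_file)
# Alternatively, with the in-built mouse annotation:
# fc <- featureCounts(files = bam_files, annot.inbuilt = "mm39")
```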

score 0 · Answer 2 · 2024-02-26

You can download up-to-date Rsubread SAF files for the latest NCBI RefSeq annotation from https://bioinf.wehi.edu.au/Rsubread/annot.

Continuing James MacDonald's comments, the errors you have are not from featureCounts itself but rather from the steps used to convert the output to Excel. You can avoid all the Excel problems by using R code:

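library(Rsubread)
library(edgeR)
# Count reads within R; the arguments (BAM files, annotation, etc.) are elided here
fc <- featureCounts(...)
# Convert the featureCounts output directly to a DGEList for edgeR
y <- featureCounts2DGEList(fc)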
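
If only the text file from the command-line featureCounts program is available, a tab-aware read may avoid the misaligned columns. The sketch below is a guess at the cause, not a diagnosis: it assumes the file is tab-delimited with one leading comment line, and that whitespace splitting or quote characters in gene names are what confuse `read.table`. The file content is a small made-up stand-in.

```r
# Build a tiny stand-in for a featureCounts text file (hypothetical content)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# Program:featureCounts; Command:(elided)",
  paste(c("Geneid", "Chr", "Start", "End", "Strand", "Length",
          "s1.bam", "s2.bam"), collapse = "\t"),
  paste(c("GeneA", "NC_000067;NC_000067", "100;300", "200;400", "+;+",
          "202", "5", "7"), collapse = "\t"),
  paste(c("Gene'B", "NC_000068", "50", "150", "-",
          "101", "0", "3"), collapse = "\t")
), tmp)

# read.delim splits on tabs only; skip the leading comment line and disable
# quote handling so a stray ' or " in a gene name cannot swallow fields
fc_table <- read.delim(tmp, skip = 1, quote = "", check.names = FALSE,
                       stringsAsFactors = FALSE)

# Columns 1 to 6 are annotation; the remaining columns are per-sample counts
annotation <- fc_table[, 1:6]
counts <- as.matrix(fc_table[, -(1:6)])
rownames(counts) <- fc_table$Geneid
```

The resulting `counts` matrix and `annotation` data frame could then be passed to `DGEList(counts = counts, genes = annotation)`.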