Question

featureCounts from Rsubread

Emiliano • 0

Hello,

I am analysing RNA-seq data from human samples, starting from FASTQ files. I did all the quality controls, trimming, etc., and aligned my paired-end reads with STAR. I am now trying to obtain a count matrix in a SummarizedExperiment, and I am counting with featureCounts from Rsubread. I get back a list with counts and annotation. The problem is that in my annotation, Chr, Start, End and Strand have multiple values per gene and are coded as character (I attached a picture to show this). For this reason I cannot generate the GRanges() for the rowData, as this requires a unique value for Chr, Start and End. How can I get around this problem? Thank you

Rsubread • 332 views

updated 4 months ago by Gordon Smyth 52k • written 4 months ago by Emiliano • 0

score 0 · Answer 1 · 2024-07-11

Convert the featureCounts output to a DGEList:

library(Rsubread)
library(edgeR)
fc <- featureCounts( ... )
y <- featureCounts2DGEList(fc)

Then y$genes will have columns Chr, Start, End, Strand and Length, all with unique values.
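Once each gene has a single Chr, Start, End and Strand, building the row ranges is straightforward. The following is only a minimal sketch: the `genes` data frame and `counts` matrix are hypothetical toy stand-ins for y$genes and y$counts, with made-up values, so that the example runs without any BAM files.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy stand-ins for y$genes and y$counts (hypothetical values, illustration only)
genes <- data.frame(
  GeneID = c("geneA", "geneB", "geneC"),
  Chr    = c("chr1", "chr1", "chrX"),
  Start  = c(1000L, 5000L, 200L),
  End    = c(2500L, 7200L, 900L),
  Strand = c("+", "-", "+"),
  Length = c(1200L, 1800L, 650L),
  row.names = c("geneA", "geneB", "geneC")
)
counts <- matrix(c(10L, 0L, 5L, 12L, 3L, 7L), nrow = 3,
                 dimnames = list(rownames(genes), c("sample1", "sample2")))

# One genomic range per gene; GeneID and Length are kept as metadata columns
gr <- makeGRangesFromDataFrame(genes, keep.extra.columns = TRUE,
                               seqnames.field = "Chr", start.field = "Start",
                               end.field = "End", strand.field = "Strand")
names(gr) <- genes$GeneID

# Counts plus row ranges give a RangedSummarizedExperiment
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = gr)
```

With real data you would presumably pass y$genes and y$counts in place of the toy objects, after checking that their row names agree.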

Note that there are some human genes that are on both the X and Y chromosomes. If your annotation includes such genes on both chromosomes, then you may need to mask the pseudoautosomal regions (PARs) of the Y chromosome if you wish to create a unique genomic range for each GeneID. You can type

table(y$genes$Chr)

to check whether this is an issue for your data.
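If you want to see which genes are affected, one option is to look at the raw featureCounts annotation, where each gene's Chr field is a semicolon-separated string with one entry per exon. The sketch below uses a hypothetical toy `annotation` data frame in that style (gene names and values are invented) and lists the genes whose exons fall on more than one chromosome.

```r
# Toy annotation in the raw featureCounts style (hypothetical values):
# one semicolon-separated Chr entry per exon of each gene
annotation <- data.frame(
  GeneID = c("geneA", "geneB", "geneC"),
  Chr    = c("chr1;chr1;chr1", "chrX;chrX;chrY;chrY", "chr2"),
  stringsAsFactors = FALSE
)

# Distinct chromosomes seen for each gene
chr_per_gene <- lapply(strsplit(annotation$Chr, ";", fixed = TRUE), unique)
n_chr <- lengths(chr_per_gene)

# Genes annotated on more than one chromosome (e.g. X and Y copies)
multi_chr_genes <- annotation$GeneID[n_chr > 1]
print(multi_chr_genes)
```

With real data, the same lines could be applied to fc$annotation, assuming it has the usual GeneID and Chr columns.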