am39 • 0
@am39-10874
Last seen 15 months ago

I'm using the featureCounts function of Rsubread to assign aligned reads to features. Roughly 20% of my reads are unassigned (in the "NoFeature" category). Is there any way to see which reads these are? (Maybe it's possible in command-line subread, outside of R?) I'd like to look at my unassigned reads to understand whether they're contamination, come from an un-annotated part of the genome, or something else.

Thank you!

Wei Shi ★ 3.3k
@wei-shi-2183
Last seen 1 day ago
Australia/Melbourne/Olivia Newton-John …

The reportReads parameter of featureCounts lets you output the counting result for each individual read, and from that output you can identify the unassigned reads.
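As a rough illustration of what can be done with that per-read output, here is a minimal Python sketch that pulls out the names of the NoFeature reads. It rests on assumptions that are not stated in this thread: that the per-read file is tab-delimited with the read name in the first column and the assignment status in the second, and that the status label for these reads is `Unassigned_NoFeatures`. Please check both against the file your version of Rsubread actually writes. The function names and the example records are hypothetical.

```python
import io
from collections import Counter


def read_assignments(handle):
    """Yield (read_name, status) pairs from a per-read assignment table.

    Assumption: tab-delimited text, read name in column 1 and
    assignment status in column 2. Shorter lines are skipped.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        yield fields[0], fields[1]


def status_summary(handle):
    """Count how many reads fall into each assignment status."""
    return Counter(status for _, status in read_assignments(handle))


def reads_with_status(handle, status="Unassigned_NoFeatures"):
    """Return the names of reads with the given status.

    The default label is an assumption; adjust it to match your file.
    """
    return [name for name, s in read_assignments(handle) if s == status]


# Hypothetical stand-in for a per-read output file; with real data you
# would open the file written by featureCounts instead.
example = (
    "read1\tAssigned\t1\tgeneA\n"
    "read2\tUnassigned_NoFeatures\t-1\tNA\n"
    "read3\tUnassigned_NoFeatures\t-1\tNA\n"
    "read4\tAssigned\t1\tgeneB\n"
)

summary = status_summary(io.StringIO(example))
no_feature = reads_with_status(io.StringIO(example))
print(summary)
print(no_feature)
```

The resulting read names could then be used to subset the original BAM file and see where in the genome those reads were aligned.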

Thank you - that's what I needed!

@gordon-smyth
Last seen 8 hours ago
WEHI, Melbourne, Australia

The NoFeature reads are from un-annotated parts of the genome. They can't be "contamination" because then they wouldn't be aligned to the genome and hence wouldn't be counted by featureCounts in the first place.

The R version of featureCounts in Rsubread has the same functionality as the command-line version.

Right, sorry I wasn't thinking clearly.

Nevertheless, I'd like to take a look at what those unassigned reads are (i.e., which un-annotated regions of the genome they come from). Is there anywhere in the output where I can find those unassigned reads? I don't see it in the R documentation of the featureCounts function.

I'd say that you can have human DNA contamination and those reads would align to the human genome, typically to intergenic regions (outside the boundaries of annotated genes) and, to a lesser extent, to intronic regions.