Entering edit mode
connor.geraghty
•
0
@connorgeraghty-16065
Last seen 6.1 years ago
Hello,
I have been working my way through learning Rsubread, and I am stuck on the featureCounts() command. The data maps to the reference genome above 90% for these 5 test samples:
NumMapped PropMapped 1 20005679 0.936811 2 15387615 0.909452 3 17158660 0.915605 4 16636690 0.955880 5 16891823 0.951170
But, when I use featureCounts:
fc_PE <- featureCounts(bam.files, annot.inbuilt="mm10", isPairedEnd=TRUE, nthreads = 14)
it returns 0.0% successfully assigned fragments.
NCBI RefSeq annotation for mm10 (build 38.1) is used. ========== _____ _ _ ____ _____ ______ _____ ===== / ____| | | | _ \| __ \| ____| /\ | __ \ ===== | (___ | | | | |_) | |__) | |__ / \ | | | | ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | | ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| | ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/ Rsubread 1.28.1 //========================== featureCounts setting ===========================\\ || || || Input files : 5 BAM files || || P /users/PAS1346/osu9792/testrna/CGL1607_R1. ... || || P /users/PAS1346/osu9792/testrna/CGL1608_R1. ... || || P /users/PAS1346/osu9792/testrna/CGL1609_R1. ... || || P /users/PAS1346/osu9792/testrna/CGL1610_R1. ... || || P /users/PAS1346/osu9792/testrna/CGL1612_R1. ... || || || || Dir for temp files : . || || Threads : 14 || || Level : meta-feature level || || Paired-end : yes || || Strand specific : no || || Multimapping reads : not counted || || Multi-overlapping reads : not counted || || Min overlapping bases : 1 || || || || Chimeric reads : counted || || Both ends mapped : not required || || || \\===================== http://subread.sourceforge.net/ ======================// //================================= Running ==================================\\ || || || Load annotation file /users/PAS1346/osu9792/R/x86_64-pc-linux-gnu-libr ... || || Features : 222996 || || Meta-features : 27179 || || Chromosomes/contigs : 43 || || || || Process BAM file /users/PAS1346/osu9792/testrna/CGL1607_R1.fastq.gz.su ... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || Total fragments : 21355089 || || Successfully assigned fragments : 6919 (0.0%) || || Running time : 0.08 minutes || || || || Process BAM file /users/PAS1346/osu9792/testrna/CGL1608_R1.fastq.gz.su ... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || Total fragments : 16919664 || || Successfully assigned fragments : 6175 (0.0%) || || Running time : 0.06 minutes || || || || Process BAM file /users/PAS1346/osu9792/testrna/CGL1609_R1.fastq.gz.su ... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || Total fragments : 18740233 || || Successfully assigned fragments : 8017 (0.0%) || || Running time : 0.07 minutes || || || || Process BAM file /users/PAS1346/osu9792/testrna/CGL1610_R1.fastq.gz.su ... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || Total fragments : 17404589 || || Successfully assigned fragments : 5861 (0.0%) || || Running time : 0.06 minutes || || || || Process BAM file /users/PAS1346/osu9792/testrna/CGL1612_R1.fastq.gz.su ... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || Total fragments : 17758996 || || Successfully assigned fragments : 5128 (0.0%) || || Running time : 0.06 minutes || || || || Read assignment finished. || || || \\===================== http://subread.sourceforge.net/ ======================//
Do you have any ideas as to why the features aren't producing meanigful counts? Could I have done something upstream that would have compromised the featureCounts() ability?
Any help is greatly appreciated!
Connor
Could you show the counting summary by issuing the following command? This will tell you the reasons why most reads were not counted.
fc_PE$stat
Here is the output from fc_PE$stat:
All samples follow this same pattern with most unassigned due to being either unmapped or no features.
Did you map your reads to mm10? Have you checked if the chr names in your reference genome match those in the annotation?
Yes, the reads were mapped to mm10 (GRCm38.p6 to be exact). What is the best way to check if chr names match between reference genome and annotation?
Here is the line of code to build the reference index based on the download from NCBI Genome:
buildindex(basename="ref_index",reference="GCF_000001635.26_GRCm38.p6_genomic.fna", memory = 48000).
By default would the chromosome names be different between the built index from the NCBI, and the inbuilt mm10 annotation?
The inbuilt annotation was built based on NCBI RefSeq annotation so chr names should match between your mapped reads and the inbuilt annotation. What kind of sequencing data you are analyzing?
These are colon tissue RNA samples from mouse proximal colon. So, some of the transcripts could probably be microbial, but the majority should be mouse.
Your analysis seems to be fine. You probably need to dig deeply into your reads to see where your reads are mapped to. Artifact/contamination in your sequencing or sample prep might cause an issue like this.