10x BAM to count matrix in Rsubread and Rsamtools- cellCounts vs. featureCounts
3
0
Entering edit mode
@cd842ea7
Last seen 20 months ago
United States

Hi, I am new to Single cell analysis but have some experience with NGS data output and manipulation. I have a set of .bam that I assume were the product of the cell ranger pipeline from the ENA (project ID PRJEB36998) unfortunately I have no access to any other output of the pipeline (FASTQ, countmatrix, etc.) but found the reference genome file ( GTF ).

What is the best way to proceed with generating a count matrix from this input using Rsubread package?

The following code using featureCounts works, but I believe it is improper to use on single cell data?

countsmat<-featureCounts( files = c("LN.bam","blood.bam"), annot.ext = "mm39.ncbiRefSeq.gtf.gz", isGTFAnnotationFile = TRUE  )

I then turned to the cellCounts function but realized I needed an Index file (.bai) to proceed, unfortunately buildindex requires the FASTQ files.. Rsamtools package indexBam can take a bam and generate a .bai as shown below, but for some reason I am getting the following error when trying to pass this to cellCounts. Not sure if I am not formatting the sample incorrectly or I am missing something bigger here.

#generate blood.bam.bai in wd followed by cellCounts
indexBam("blood.bam") 
cellCounts(sample = "blood.bam", input.mode = "BAM" , annot.ext = "mm39.ncbiRefSeq.gtf.gz", index = "blood.bam.bai" , isGTFAnnotationFile = TRUE)

outputs

Error in cellCounts(sample = "blood.bam", input.mode = "BAM", annot.ext = "mm39.ncbiRefSeq.gtf.gz", :
Error: index 'C:\Users\cdp\Desktop\LCMV_single cell\blood.bam.bai' is not found

Thanks in advance to anyone who can help me sort this out - I would prefer not to revert bam back to FASTQ then undertake the CellRanger pipeline again from the start.

Session info:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
 [1] Rsubread_2.12.2      Rsamtools_2.14.0    
 [3] Biostrings_2.66.0    XVector_0.38.0      
 [5] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 
 [7] IRanges_2.32.0       S4Vectors_0.36.1    
 [9] BiocGenerics_0.44.0  reshape2_1.4.4      
[11] ggplot2_3.4.0        dplyr_1.1.0         
[13] Matrix_1.5-3         SeuratObject_4.1.3  
[15] Seurat_4.3.0        
Rsamtools Rsubread • 2.2k views
ADD COMMENT
1
Entering edit mode
ATpoint ★ 4.5k
@atpoint-13662
Last seen 1 hour ago
Germany

featureCounts assumes that one bam is one sample (=one column of the count matrix) while in 10x you have many cells per bam.

The most straightforward way is to use https://support.10xgenomics.com/docs/bamtofastq to convert bam back to fastq and simply run CellRanger again. Takes some time but requires no custom code and is most reliable.

Of course you can try emailing the author for the counts in case they respond.

ADD COMMENT
1
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 11 weeks ago
Australia/Melbourne/Olivia Newton-John …

The index parameter in cellCounts is an index built for the reference genome, not the bam files. See cellCounts help page for more details.

ADD COMMENT
0
Entering edit mode

Adding to Wei's answer, buildindex does not the require the FASTQ files but rather the FASTA file for the same reference genome as used by Cell Ranger. The genome and the FASTA file should be publicly available.

ADD REPLY
0
Entering edit mode
swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 16 hours ago
San Diego

I think you might be able to do this by parsing the tags of the bam, but ATPoint is right, it's simpler to just redo the 10X cellranger run.

ADD COMMENT

Login before adding your answer.

Traffic: 964 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6