Question

10x BAM to count matrix in Rsubread and Rsamtools- cellCounts vs. featureCounts

0

Entering edit mode

Christopher • 0

@cd842ea7

Last seen 2.8 years ago

United States

Hi, I am new to Single cell analysis but have some experience with NGS data output and manipulation. I have a set of .bam that I assume were the product of the cell ranger pipeline from the ENA (project ID PRJEB36998) unfortunately I have no access to any other output of the pipeline (FASTQ, countmatrix, etc.) but found the reference genome file ( GTF ).

What is the best way to proceed with generating a count matrix from this input using Rsubread package?

The following code using featureCounts works, but I believe it is improper to use on single cell data?

countsmat<-featureCounts( files = c("LN.bam","blood.bam"), annot.ext = "mm39.ncbiRefSeq.gtf.gz", isGTFAnnotationFile = TRUE  )

I then turned to the cellCounts function but realized I needed an Index file (.bai) to proceed, unfortunately buildindex requires the FASTQ files.. Rsamtools package indexBam can take a bam and generate a .bai as shown below, but for some reason I am getting the following error when trying to pass this to cellCounts. Not sure if I am not formatting the sample incorrectly or I am missing something bigger here.

#generate blood.bam.bai in wd followed by cellCounts
indexBam("blood.bam") 
cellCounts(sample = "blood.bam", input.mode = "BAM" , annot.ext = "mm39.ncbiRefSeq.gtf.gz", index = "blood.bam.bai" , isGTFAnnotationFile = TRUE)

outputs

Error in cellCounts(sample = "blood.bam", input.mode = "BAM", annot.ext = "mm39.ncbiRefSeq.gtf.gz", :
Error: index 'C:\Users\cdp\Desktop\LCMV_single cell\blood.bam.bai' is not found

Thanks in advance to anyone who can help me sort this out - I would prefer not to revert bam back to FASTQ then undertake the CellRanger pipeline again from the start.

Session info:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
 [1] Rsubread_2.12.2      Rsamtools_2.14.0    
 [3] Biostrings_2.66.0    XVector_0.38.0      
 [5] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 
 [7] IRanges_2.32.0       S4Vectors_0.36.1    
 [9] BiocGenerics_0.44.0  reshape2_1.4.4      
[11] ggplot2_3.4.0        dplyr_1.1.0         
[13] Matrix_1.5-3         SeuratObject_4.1.3  
[15] Seurat_4.3.0

Rsamtools Rsubread • 3.2k views

ADD COMMENT • link updated 2.8 years ago by Gordon Smyth 53k • written 2.8 years ago by Christopher • 0

score 1 · Answer 1 · 2023-02-16

featureCounts assumes that one bam is one sample (=one column of the count matrix) while in 10x you have many cells per bam.

The most straightforward way is to use https://support.10xgenomics.com/docs/bamtofastq to convert bam back to fastq and simply run CellRanger again. Takes some time but requires no custom code and is most reliable.

Of course you can try emailing the author for the counts in case they respond.

score 1 · Answer 2 · 2023-02-16

1

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 4 months ago

Australia/Melbourne

The index parameter in cellCounts is an index built for the reference genome, not the bam files. See cellCounts help page for more details.

ADD COMMENT • link 2.8 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Adding to Wei's answer, buildindex does not the require the FASTQ files but rather the FASTA file for the same reference genome as used by Cell Ranger. The genome and the FASTA file should be publicly available.

ADD REPLY • link 2.8 years ago Gordon Smyth 53k

score 0 · Answer 3 · 2023-02-16

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 27 minutes ago

San Diego

I think you might be able to do this by parsing the tags of the bam, but ATPoint is right, it's simpler to just redo the 10X cellranger run.

ADD COMMENT • link 2.8 years ago swbarnes2 ★ 1.4k