Hi, I am new to Single cell analysis but have some experience with NGS data output and manipulation. I have a set of .bam that I assume were the product of the cell ranger pipeline from the ENA (project ID PRJEB36998) unfortunately I have no access to any other output of the pipeline (FASTQ, countmatrix, etc.) but found the reference genome file ( GTF ).
What is the best way to proceed with generating a count matrix from this input using Rsubread package?
The following code using featureCounts
works, but I believe it is improper to use on single cell data?
countsmat<-featureCounts( files = c("LN.bam","blood.bam"), annot.ext = "mm39.ncbiRefSeq.gtf.gz", isGTFAnnotationFile = TRUE )
I then turned to the cellCounts
function but realized I needed an Index file (.bai) to proceed, unfortunately buildindex
requires the FASTQ files.. Rsamtools package indexBam
can take a bam and generate a .bai as shown below, but for some reason I am getting the following error when trying to pass this to cellCounts
. Not sure if I am not formatting the sample incorrectly or I am missing something bigger here.
#generate blood.bam.bai in wd followed by cellCounts
indexBam("blood.bam")
cellCounts(sample = "blood.bam", input.mode = "BAM" , annot.ext = "mm39.ncbiRefSeq.gtf.gz", index = "blood.bam.bai" , isGTFAnnotationFile = TRUE)
outputs
Error in cellCounts(sample = "blood.bam", input.mode = "BAM", annot.ext = "mm39.ncbiRefSeq.gtf.gz", :
Error: index 'C:\Users\cdp\Desktop\LCMV_single cell\blood.bam.bai' is not found
Thanks in advance to anyone who can help me sort this out - I would prefer not to revert bam back to FASTQ then undertake the CellRanger pipeline again from the start.
Session info:
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils
[6] datasets methods base
other attached packages:
[1] Rsubread_2.12.2 Rsamtools_2.14.0
[3] Biostrings_2.66.0 XVector_0.38.0
[5] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[7] IRanges_2.32.0 S4Vectors_0.36.1
[9] BiocGenerics_0.44.0 reshape2_1.4.4
[11] ggplot2_3.4.0 dplyr_1.1.0
[13] Matrix_1.5-3 SeuratObject_4.1.3
[15] Seurat_4.3.0
Adding to Wei's answer, buildindex does not the require the FASTQ files but rather the FASTA file for the same reference genome as used by Cell Ranger. The genome and the FASTA file should be publicly available.