Four samples were sequenced on an Illumina NovaSeq, which is prone to index hopping. I used `DropletUtils::swappedDrops` to correct for this.
```r
library(DropletUtils)
library(Seurat)
# An xlsx file with file-path details for every sample
samples.incl.swapped.reads = rio::import(sample.file.map.swapped.reads)
# File path for each sample; swappedDrops() expects each sample's molecule_info.h5
h5.matrix.filepath.incl.swapped.reads = paste0(samples.incl.swapped.reads$matrix.files.dir, samples.incl.swapped.reads$sample.dir.name, samples.incl.swapped.reads$h5.filt.matrix.filepath)
cleaned.reads.analysis = swappedDrops(h5.matrix.filepath.incl.swapped.reads, min.frac = 0.9, get.swapped = TRUE, get.diagnostics = TRUE)
# One Seurat object per sample, built from the swap-corrected counts
import.samples.cleaned.reads = list()
for (i in seq_along(samples.incl.swapped.reads$sample.name)) {
  import.samples.cleaned.reads[[i]] = CreateSeuratObject(counts = cleaned.reads.analysis$cleaned[[i]], project = samples.incl.swapped.reads$sample.name[i], min.cells = minimum.cells, min.features = minimum.features)
}
```
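To illustrate what `swappedDrops` does conceptually, here is a simplified base-R toy, not the package's implementation. A molecule is identified by its cell barcode, UMI and gene; when the same molecule shows up in several samples, the copy in the sample holding a large enough share (`min.frac`) of its reads is kept and the other copies are treated as swapped, and if no sample reaches `min.frac` every copy is dropped. The data frame `mol` and the helper `assign_molecules()` are hypothetical, and the `>=` comparison is an assumption; the real function reads the read counts from each sample's `molecule_info.h5`.

```r
# Hypothetical toy data: one row per molecule (cell barcode + UMI + gene) per sample,
# with the number of reads supporting that molecule in that sample.
mol <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "GGTT", "GGTT", "CCCA"),
  umi     = c("TTGA", "TTGA", "TTGA", "ACGT", "ACGT", "GATC"),
  gene    = c("g1", "g1", "g1", "g2", "g2", "g3"),
  sample  = c("s1", "s2", "s3", "s1", "s2", "s4"),
  reads   = c(95, 3, 2, 6, 4, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag which copy of each molecule to keep.
assign_molecules <- function(mol, min.frac = 0.9) {
  key <- paste(mol$barcode, mol$umi, mol$gene, sep = "_")
  # Total reads for each molecule across all samples
  total <- ave(mol$reads, key, FUN = sum)
  # Keep a copy only if its sample holds at least min.frac of the molecule's reads
  # (>= is an assumption). With min.frac > 0.5 at most one sample can pass, so all
  # other copies count as swapped; if no sample passes, every copy is dropped.
  mol$keep <- mol$reads / total >= min.frac
  mol
}

res <- assign_molecules(mol)
# Molecules retained per sample after swap removal
print(table(res$sample[res$keep]))
```

In this toy, the `AAAC`/`TTGA`/`g1` molecule stays only in `s1` (95 of 100 reads), the `GGTT` molecule is removed everywhere because neither sample reaches 90%, and the `CCCA` molecule, seen in one sample only, is kept.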
I then merged the import.samples.cleaned.reads objects.
```r
object.list.cleaned.reads$general.filt = merge(x = import.samples.cleaned.reads[[1]], y = import.samples.cleaned.reads[-1], add.cell.ids = samples.incl.swapped.reads$sample.name)
table(object.list.cleaned.reads$general.filt@meta.data$orig.ident)
```

Output:

```
sh1_condition    sh1_normal sh2_condition    sh2_normal
         7365          5858          5670          9461
```
When I compare this with the cell counts obtained WITHOUT `swappedDrops` (reading the `filtered_feature_bc_matrix` folder created by Cell Ranger with Seurat's `Read10X` function), I get FEWER cells:
```
sh1_condition    sh1_normal sh2_condition    sh2_normal
         6395          5162          4932          7387
```
I repeated the same process with `Read10X`, but this time using the `raw_feature_bc_matrix` files:
```
sh1_condition    sh1_normal sh2_condition    sh2_normal
         8241          7199          6704         10696
```
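To make the comparison concrete, the per-sample counts from the three tables can be lined up. This is only arithmetic on the numbers reported above (the matrix name `counts` is hypothetical): the swap-corrected objects sit between the filtered and raw Cell Ranger matrices for every sample, which is presumably because the cleaned matrices have not gone through Cell Ranger's cell calling and are filtered only by `min.cells`/`min.features`.

```r
# Cell counts per sample, copied from the three tables reported in the post
counts <- matrix(
  c(7365, 5858, 5670, 9461,    # swappedDrops-cleaned matrices
    6395, 5162, 4932, 7387,    # Read10X on filtered_feature_bc_matrix
    8241, 7199, 6704, 10696),  # Read10X on raw_feature_bc_matrix
  nrow = 3, byrow = TRUE,
  dimnames = list(c("swappedDrops_cleaned", "filtered_matrix", "raw_matrix"),
                  c("sh1_condition", "sh1_normal", "sh2_condition", "sh2_normal"))
)

# Extra cells kept after swappedDrops relative to Cell Ranger's filtered matrix
extra <- counts["swappedDrops_cleaned", ] - counts["filtered_matrix", ]
# Fold difference, rounded to three decimals
ratio <- round(counts["swappedDrops_cleaned", ] / counts["filtered_matrix", ], 3)
print(extra)
print(ratio)
```

The cleaned objects hold 696 to 2074 more cells than the filtered-matrix objects, roughly 13% to 28% more depending on the sample.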
It seems that `molecule_info.h5` contains raw counts, i.e., counts prior to Cell Ranger's cell filtering. Seurat recommends using the `filtered_feature_bc_matrix` files as input (as done in multiple analyses in our group), because Cell Ranger has already removed the barcodes that contain only ambient RNA; since version 3.0 its cell calling uses an EmptyDrops-based method (the approach implemented in this package's `emptyDrops` function) to retain barcodes from cells with low RNA content that are not just ambient RNA. I tried using `filtered_feature_bc_matrix.h5` as input to `swappedDrops`, but, as expected, this did not work, since `swappedDrops` needs the molecule-level information stored in `molecule_info.h5`.
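One way to approximate Cell Ranger's filtering on the swap-corrected matrices is to reproduce its cell-calling threshold. As a minimal sketch, assuming Cell Ranger 2.x-style behaviour (often called "OrdMag", and the same idea as `DropletUtils::defaultDrops`), the hypothetical `call_cells_ordmag()` below takes the 99th percentile of total UMI counts among the `expected` largest barcodes and calls every barcode with at least 10% of that value a cell. Its exact quantile convention may differ from Cell Ranger's. It does not implement the second, EmptyDrops-like rescue step that Cell Ranger 3.0 and later apply to lower-count barcodes; in DropletUtils that step corresponds to `emptyDrops` (recent releases also provide `emptyDropsCellRanger`).

```r
# Hypothetical helper modelled on Cell Ranger 2.x cell calling: take the
# upper.quant quantile of total UMI counts among the `expected` largest barcodes,
# and call every barcode with at least lower.prop times that value a cell.
call_cells_ordmag <- function(totals, expected = 3000, upper.quant = 0.99, lower.prop = 0.1) {
  top <- sort(totals, decreasing = TRUE)[seq_len(min(expected, length(totals)))]
  threshold <- quantile(top, probs = upper.quant, names = FALSE) * lower.prop
  totals >= threshold
}

# Toy per-barcode UMI totals: 20 large cells, 30 smaller cells, 50 low-count
# barcodes and 900 near-empty droplets.
totals <- c(rep(5000, 20), rep(1200, 30), rep(300, 50), rep(10, 900))
cells <- call_cells_ordmag(totals, expected = 100)
print(sum(cells))  # number of barcodes called as cells
```

On real data one would presumably compute the totals with `Matrix::colSums(cleaned.reads.analysis$cleaned[[i]])`, keep only the columns flagged `TRUE`, and pass that subset to `CreateSeuratObject`, using the same expected-cell number that was given to Cell Ranger (`defaultDrops` uses 3000 by default).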
How can I run `swappedDrops` as described (on `molecule_info.h5`, etc.) but still filter barcodes the way Cell Ranger does, so that my results are comparable with our earlier Seurat analyses based on the `filtered_feature_bc_matrix` files?

Please advise.
For starters, you could format your post properly. See https://commonmark.org/help/. Some punctuation would also be in order.
Done. Formatted. Punctuated.
Excellent.
I guess I should also say a bit more, otherwise the system doesn't let me post a comment consisting of just the word "Excellent". So that's that.