Quality control in SingleCellExperiment
Linda
Hi all,

I am new to R and SingleCellExperiment.

I am currently working through Chapter 6 of the following tutorial:

https://osca.bioconductor.org/

It is not working for me. Here is what I type in the RStudio console:

BiocManager::install("scRNAseq")

library(scRNAseq)

sce.416b <- LunSpikeInData(which="416b")

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
keytype="GENEID", column="SEQNAME")
is.mito <- which(location=="MT")


but I get the following error:

> sce.416b <- LunSpikeInData(which="416b")
snapshotDate(): 2019-10-22
>
> library(AnnotationHub)
> ens.mm.v97 <- AnnotationHub()[["AH73905"]]
snapshotDate(): 2019-10-29
require(“ensembldb”)
Error: failed to load resource
name: AH73905
title: Ensembl 97 EnsDb for Mus musculus
reason: require(“ensembldb”) failed: use BiocManager::install() to install package?
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
there is no package called ‘ensembldb’
> location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
+                    keytype="GENEID", column="SEQNAME")
Error in mapIds(ens.mm.v97, keys = rownames(sce.416b), keytype = "GENEID",  :
could not find function "mapIds"
> is.mito <- which(location=="MT")
Error in which(location == "MT") : object 'location' not found


Can anyone tell me why I am unable to follow this tutorial?

Also, can someone tell me what the heck this code is actually doing:

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
keytype="GENEID", column="SEQNAME")
is.mito <- which(location=="MT")


What is "AH73905", what is "ens.mm.v97", and how exactly is this finding mitochondrial genes in sce.416b?

Any help with this would be really appreciated.

Kind regards, Linford

0
OK, so I have followed the error messages and realised I needed to install an additional package:

BiocManager::install("ensembldb")


Now it appears to work, but I am getting a warning message:

> library(AnnotationHub)
> ens.mm.v97 <- AnnotationHub()[["AH73905"]]
snapshotDate(): 2019-10-29
> location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
+                    keytype="GENEID", column="SEQNAME")
Warning message:
Unable to map 563 of 46604 requested IDs.
> is.mito <- which(location=="MT")


Which I assume is not such a big issue?

library(scater)
df <- perCellQCMetrics(sce.416b, subsets=list(Mito=is.mito))
df


However, if someone could explain exactly how this code is identifying mitochondrial transcripts I would be very grateful.

0
I am glad you have solved the first problem.

I think that is only a warning, not an error, and it is a common one that does not matter here. It means that 563 of the 46604 requested IDs could not be mapped to the reference annotation dataset ens.mm.v97.

The content of location looks like this:

## [1] 1 3102016-3102125 +

seqinfo: 61 sequences from GRCm38 genome

Best wishes!

1
Aaron Lun ★ 26k
Also, can someone tell me what the heck this code is actually doing:

First, breathe deeply. And then look at each step.

library(AnnotationHub)


This loads the AnnotationHub package, which provides easy access to various annotation files in the AnnotationHub. This is a central hub for all sorts of things, e.g., gene models for various organisms, peak calls from various epigenomics projects, dbSNP variants, and so on. It would be a pain to embed each of these files into a separate package, so instead we use a one-stop-shop for all of them.

ens.mm.v97 <- AnnotationHub()[["AH73905"]]


This grabs the file corresponding to the mouse Ensembl annotation, version 97. The "AH73905" is just the identifier for this file in the AnnotationHub (AH, get it?). If that seems a bit cryptic, you could just do query(AnnotationHub(), "Ensembl 97") and dig down to "Mus musculus" to get the same number.

location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
keytype="GENEID", column="SEQNAME")


Pretty simple here. We're just taking all the Ensembl IDs in the row names of sce.416b, and we're mapping them to the SEQNAME, i.e., we're getting the chromosome on which each gene is found. The keytype= just specifies that the character vector contains... well, gene IDs. If you had, say, gene symbols, you could replace that with keytype="SYMBOL".
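To make the mapping step concrete, here is a minimal base-R sketch that imitates the lookup using a small hand-made table in place of the real EnsDb. Everything in it is an assumption for illustration: the gene IDs and chromosomes are invented, and toy_map_ids is a hypothetical helper, not a function from ensembldb or AnnotationDbi.

```r
# Toy stand-in for the EnsDb: a table of gene ID -> chromosome.
# All IDs and chromosomes here are made up for illustration.
toy_annotation <- data.frame(
    GENEID  = c("toy_gene_1", "toy_gene_2", "toy_gene_3", "toy_gene_4"),
    SEQNAME = c("1", "MT", "X", "MT"),
    stringsAsFactors = FALSE
)

# Hypothetical helper imitating the lookup: for each key, return the first
# matching value in 'column', or NA if the key is absent from the table.
toy_map_ids <- function(annotation, keys, keytype, column) {
    hits <- match(keys, annotation[[keytype]])
    out <- annotation[[column]][hits]
    names(out) <- keys
    out
}

# Row names of a pretend dataset; "toy_gene_9" has no annotation entry.
toy_rownames <- c("toy_gene_2", "toy_gene_9", "toy_gene_1", "toy_gene_4")

toy_location <- toy_map_ids(toy_annotation, keys = toy_rownames,
                            keytype = "GENEID", column = "SEQNAME")
print(toy_location)
```

The result is a character vector with one chromosome name per requested ID, named by the ID, in the same order as the keys. An ID that is missing from the table comes back as NA in this sketch.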

I should note that the latest version of LunSpikeInData will load location information by default, but that hasn't made its way into the book yet. Once it does, you won't need to do all of this, but it's good to learn in case you want to annotate your own objects.

is.mito <- which(location=="MT")


Nothing much to say here. Is it mitochondrial or not? which() returns the positions of the genes whose location is "MT", and it skips any NA entries.
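As a rough illustration of how those indices are then used, the base-R sketch below builds a toy location vector and a toy count matrix, then computes the percentage of each cell's counts that come from the mitochondrial rows. All values are invented. I believe this is comparable to what perCellQCMetrics reports as subsets_Mito_percent when given subsets=list(Mito=is.mito), but treat that as an assumption and not as a description of its internals.

```r
# Chromosome per gene, in row order; NA marks an unmapped ID.
# Values are invented for illustration.
toy_location <- c(g1 = "MT", g2 = NA, g3 = "1", g4 = "MT", g5 = "X")

# The comparison yields TRUE/FALSE/NA; which() keeps only the TRUE positions,
# so unmapped genes are silently treated as non-mitochondrial.
toy_is_mito <- which(toy_location == "MT")
print(toy_is_mito)

# Toy count matrix: 5 genes (rows) by 3 cells (columns), filled column-wise.
toy_counts <- matrix(
    c(10, 0, 50, 30, 10,    # cell A
       0, 5, 90,  0,  5,    # cell B
      40, 0, 10, 40, 10),   # cell C
    nrow = 5,
    dimnames = list(names(toy_location), c("A", "B", "C"))
)

# Percentage of each cell's total counts that fall in the mitochondrial rows.
toy_mito_percent <- 100 * colSums(toy_counts[toy_is_mito, , drop = FALSE]) /
    colSums(toy_counts)
print(toy_mito_percent)
```

In this toy data, cell C has most of its counts in the mitochondrial rows, which is the kind of cell the quality control step is meant to flag.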

Which I assume is not such a big issue?

That's correct. For the sake of simplicity, I didn't bother to find the exact annotation that I used to obtain the counts, and there are slight differences between Ensembl versions (usually around genes that no one cares about, e.g., GmXXXX).
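If you want to see for yourself how much was lost, you can count the NA entries and list the affected IDs. The sketch below does this on an invented toy vector; with the real data you would apply the same is.na() calls to location, assuming unmapped IDs are stored there as NA. For scale, 563 of 46604 is about 1.2% of the genes.

```r
# Toy mapping result; NA marks IDs that could not be mapped.
# Names and values are invented for illustration.
toy_location <- c(g1 = "MT", g2 = NA, g3 = "1", g4 = NA, g5 = "X")

# Number of unmapped IDs.
n_unmapped <- sum(is.na(toy_location))

# Which IDs they are.
unmapped_ids <- names(toy_location)[is.na(toy_location)]

# Fraction of all requested IDs that were unmapped.
frac_unmapped <- mean(is.na(toy_location))

print(n_unmapped)
print(unmapped_ids)
print(frac_unmapped)
```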

