Quality control in SingleCellExperiment
Linda • 0
@linda-23123

Hi all,

I am new to R and SingleCellExperiment.

I am currently working through Chapter 6 of the following tutorial, but it is not working:

https://osca.bioconductor.org/

Here is what I type in the RStudio console:

BiocManager::install("scRNAseq")

library(scRNAseq)

sce.416b <- LunSpikeInData(which="416b")

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
                   keytype="GENEID", column="SEQNAME")
is.mito <- which(location=="MT")


but I get the following error:

> sce.416b <- LunSpikeInData(which="416b")
snapshotDate(): 2019-10-22
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
>
> library(AnnotationHub)
> ens.mm.v97 <- AnnotationHub()[["AH73905"]]
snapshotDate(): 2019-10-29
require(“ensembldb”)
Error: failed to load resource
name: AH73905
title: Ensembl 97 EnsDb for Mus musculus
reason: require(“ensembldb”) failed: use BiocManager::install() to install package?
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
there is no package called ‘ensembldb’
> location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
+                    keytype="GENEID", column="SEQNAME")
Error in mapIds(ens.mm.v97, keys = rownames(sce.416b), keytype = "GENEID",  :
could not find function "mapIds"
> is.mito <- which(location=="MT")
Error in which(location == "MT") : object 'location' not found


Can anyone tell me why I am unable to follow this tutorial?

Also, can someone tell me what the heck this code is actually doing:

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
                   keytype="GENEID", column="SEQNAME")
is.mito <- which(location=="MT")


What is "AH73905", what is "ens.mm.v97", and how exactly is this finding mitochondrial genes in sce.416b?

Any help with this would be really appreciated.

Kind regards, Linford


OK, so I have followed the error messages and realised I needed to install an additional package:

BiocManager::install("ensembldb")
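
For anyone hitting the same error, a quick way to see up front which packages are missing is sketched below. It only uses base R; the package list is simply the set of packages mentioned in this thread, and the variable names are made up for illustration.

```r
# Packages used in this thread (all distributed via Bioconductor).
pkgs <- c("scRNAseq", "AnnotationHub", "ensembldb", "scater")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing.pkgs <- pkgs[!installed]
print(missing.pkgs)

# Uncomment to install whatever is missing
# (assumes BiocManager itself is already installed).
# if (length(missing.pkgs) > 0) BiocManager::install(missing.pkgs)
```

An empty result means everything needed for the tutorial code above is available.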


Now it appears to work, but I am getting an error message:

> library(AnnotationHub)
> ens.mm.v97 <- AnnotationHub()[["AH73905"]]
snapshotDate(): 2019-10-29
> location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
+                    keytype="GENEID", column="SEQNAME")
Warning message:
Unable to map 563 of 46604 requested IDs.
> is.mito <- which(location=="MT")


Which I assume is not such a big issue?

library(scater)
df <- perCellQCMetrics(sce.416b, subsets=list(Mito=is.mito))
df


However, if someone could explain exactly how this code is identifying mitochondrial transcripts I would be very grateful.


I am glad you have solved the first problem.

What you have now is only a warning, not an error, and it is a common one that should not matter here. It means that 563 of the requested IDs could not be mapped to the reference annotation dataset ens.mm.v97.

The content of location looks like this:

## [1] 1 3102016-3102125 +

seqinfo: 61 sequences from GRCm38 genome

Best wishes!

Aaron Lun ★ 26k
@alun

Also, can someone tell me what the heck this code is actually doing:

First, breathe deeply. Then look at each step.

library(AnnotationHub)


This loads the AnnotationHub package, which provides easy access to various annotation files in the AnnotationHub. This is a central hub for all sorts of things, e.g., gene models for various organisms, peak calls from various epigenomics projects, dbSNP variants, and so on. It would be a pain to embed each of these files into a separate package, so instead we use a one-stop-shop for all of them.

ens.mm.v97 <- AnnotationHub()[["AH73905"]]


This grabs the file corresponding to the mouse Ensembl annotation, version 97. The "AH73905" is just the identifier for this file in the AnnotationHub (AH, get it?). If that seems a bit cryptic, you could just do query(AnnotationHub(), "Ensembl 97") and dig down to "Mus musculus" to get the same number.

location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
                   keytype="GENEID", column="SEQNAME")


Pretty simple here. We're just taking all the Ensembl IDs in the row names of sce.416b, and we're mapping them to the SEQNAME, i.e., we're getting the chromosome on which each gene is found. The keytype= just specifies that the character vector contains... well, gene IDs. If you had, say, gene symbols, you could replace that with keytype="SYMBOL".
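
To make the idea of the mapping concrete, here is a toy sketch in base R. It is not how mapIds is implemented; the gene IDs and the annotation table are made up, and the point is only to show that each key is looked up in an annotation and that keys with no entry come back as NA.

```r
# Hypothetical annotation: gene ID -> chromosome name (made-up IDs).
annotation <- c(geneA = "1", geneB = "MT", geneC = "X", geneD = "MT")

# Row names of a hypothetical dataset; "geneZ" is absent from the annotation.
ids <- c("geneB", "geneZ", "geneA", "geneD")

# Look up each ID by name. IDs without an entry yield NA,
# which is the situation the "Unable to map" warning reports.
location <- setNames(unname(annotation[ids]), ids)
print(location)
```

The result is a character vector with one entry per requested ID, in the same order as the input, which is what makes it easy to line up with the rows of the dataset.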

I should note that the latest version of LunSpikeInData will load location information by default, but that hasn't made its way into the book yet. So when that happens, you don't need to do all of this, but it's good to learn in case you want to annotate your own objects.

is.mito <- which(location=="MT")


Nothing much to say here. Is it mitochondrial or not?
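
As a rough illustration of what those indices are then used for, the sketch below computes a per-cell mitochondrial percentage by hand on a made-up count matrix. This is only the concept behind passing subsets=list(Mito=is.mito) to perCellQCMetrics, not scater's actual code, and all values are invented.

```r
# Toy chromosome assignments, including one unmapped gene (NA).
location <- c(geneB = "MT", geneZ = NA, geneA = "1", geneD = "MT")

# which() keeps only positions where the comparison is TRUE,
# so the NA from the unmapped gene is silently dropped.
is.mito <- which(location == "MT")
print(is.mito)

# Made-up counts: 4 genes (rows) by 2 cells (columns), filled column by column.
counts <- matrix(c(10, 0, 85, 5,
                   40, 2, 50, 8),
                 nrow = 4,
                 dimnames = list(names(location), c("cell1", "cell2")))

# Percentage of each cell's total counts that come from mitochondrial genes.
mito.percent <- colSums(counts[is.mito, , drop = FALSE]) / colSums(counts) * 100
print(mito.percent)
```

A cell with an unusually high percentage here would be a candidate for removal as low quality.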

Which I assume is not such a big issue?

That's correct. For the sake of simplicity, I didn't bother to find the exact annotation that I used to obtain the counts, and there are slight differences between Ensembl versions (usually around genes that no one cares about, e.g., GmXXXX).
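
If you want to convince yourself of the scale, the numbers from the warning work out as below. The second half uses a made-up location vector to show how the unmapped IDs could be listed; with the real object you would run the same two lines on your own location.

```r
# Figures taken from the warning message above.
n.unmapped <- 563
n.total <- 46604
pct.unmapped <- 100 * n.unmapped / n.total
print(round(pct.unmapped, 2))

# Toy stand-in for the real 'location' vector (made-up IDs).
location <- c(geneB = "MT", geneZ = NA, geneA = "1", geneD = "MT")

# Unmapped IDs are the ones whose chromosome is NA.
unmapped.ids <- names(location)[is.na(location)]
print(unmapped.ids)
```

So only about one percent of the genes lack a chromosome assignment, and they are simply never counted as mitochondrial.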
