Question

Gene ID to Gene Name/Symbol in loom files

Abhishek Singh ▴ 20

@abhishek-singh-4725

Last seen 9 months ago

France

Dear Community,

I am reading single-cell data, processing it (including t-SNE), and finally generating loom files.

The commands I am using to read the files are:

loadSCE <- function(path){
  sce <- scater::read10XResults(path)
  #sce <- normalize(sce) # Data normalization based on scran
  mitochondrialGenes <- as.character(rowData(sce)[startsWith(rowData(sce)$symbol, "mt-"),]$id)
  isSpike(sce, "mt") <- rownames(sce) %in% mitochondrialGenes
  sce <- calculateQCMetrics(sce,
                            feature_controls = list(
                              MT =  isSpike(sce, "mt")
                            ))
}

paths <- list.dirs(path = "/SampleData/TestData/", recursive = FALSE)
for (i in 1:length(paths))
  assign(paste0("sce_",i), loadSCE(paths[i]))

sce=0
for (i in 1:length(paths))
  sce[i]<-print(noquote(paste0("sce_",i)))

t_list <- list()
tlist <- mget(ls(pattern="sce\d+"))
for(i in seq_along(t_list))
{
  metadata(t_list[[i]])["name"] <- paste0("iMates-",i)
}

The output for one of the variables is:

> sce_1
class: SingleCellExperiment 
dim: 33694 5586 
metadata(0):
assays(1): counts
rownames(33694): ENSG00000243485 ENSG00000237613 ... ENSG00000277475 ENSG00000268674
rowData names(11): id symbol ... total_counts log10_total_counts
colnames(5586): AAACCTGAGAAGGTTT-1 AAACCTGAGCGTTCCG-1 ... TTTGTCATCGTCTGCT-1 TTTGTCATCGTTGCCT-1
colData names(30): dataset barcode ... pct_counts_MT is_cell_control
reducedDimNames(0):
spikeNames(1): mt

I guess the problem lies here. If I could fix it at this point, i.e., change the gene IDs to gene symbols and save the result back to the object sce_1, the problem would be solved. Can anyone help me fix this on Boxing Day? :(

Thank you

Scater LoomR SCopeloomR • 3.1k views

ADD COMMENT • link updated 5.9 years ago by Aaron Lun ★ 28k • written 5.9 years ago by Abhishek Singh ▴ 20

score 3 · Answer 1 · 2018-12-26

Aaron Lun ★ 28k

@alun

Last seen 3 hours ago

The city by the bay

I see you didn't take my previous advice about reading the workflows. You might have found something instructive in Workflow #2 (reads), Section 2.3. It goes without saying that you would need to change org.Mm.eg.db to org.Hs.eg.db:

library(org.Hs.eg.db)
symb <- mapIds(org.Hs.eg.db, keys=rownames(sce_1), keytype="ENSEMBL", column="SYMBOL")

At this point, you might be tempted to assign symb as the row names of sce_1. However, some Ensembl IDs share the same gene symbol, which would result in duplicated row names - not good. To avoid this, we use uniquifyFeatureNames:

library(scater)
rownames(sce_1) <- uniquifyFeatureNames(rownames(sce_1), symb)

You can read the documentation to see exactly what it does, but I believe that the function name is fairly self-explanatory.
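
For readers who want a feel for the renaming step without installing anything, here is a minimal base-R sketch of the behaviour described in the documentation, as far as I understand it: a missing symbol falls back to the ID, and a symbol shared by several IDs gets the ID appended. The function name uniquify_sketch and the toy IDs and symbols are made up for illustration; this is not the scater implementation, and the real function may differ in details.

```r
# Hypothetical stand-in for illustration only; not the scater source code.
uniquify_sketch <- function(ids, symbols) {
  stopifnot(length(ids) == length(symbols))
  # Fall back to the ID wherever no symbol was found.
  missing <- is.na(symbols)
  symbols[missing] <- ids[missing]
  # Flag every occurrence of a symbol that appears more than once.
  dup <- symbols %in% symbols[duplicated(symbols)]
  # Disambiguate those entries by appending the (unique) ID.
  symbols[dup] <- paste0(symbols[dup], "_", ids[dup])
  symbols
}

ids  <- c("ID1", "ID2", "ID3", "ID4")        # made-up identifiers
symb <- c("GeneA", NA, "GeneB", "GeneB")     # made-up symbols, one missing, one duplicated
new_names <- uniquify_sketch(ids, symb)
print(new_names)
# "GeneA" "ID2" "GeneB_ID3" "GeneB_ID4"
```

The result has no missing values and no duplicates, which is what row names need.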

P.S. I also see that you didn't take any of my previous advice about your code (https://support.bioconductor.org/p/116056/#116324). I can only hope that these will be resolved in due order.

P.P.S. Consider using LoomExperiment for reading/writing loom files directly to/from Bioconductor data structures.

ADD COMMENT • link 5.9 years ago Aaron Lun ★ 28k

Hi Aaron,

The code I posted here is an old version. The new version (for a new project) includes the edits you suggested.

However, I am running into trouble with the code here. For a single sce object it works fine (I only get a warning):

> symb <- mapIds(org.Hs.eg.db, keys=rownames(sce_1), keytype="ENSEMBL", column="SYMBOL")
'select()' returned 1:many mapping between keys and columns

> rownames(sce_1) <- uniquifyFeatureNames(rownames(sce_1), symb)

But when I put this in a loop over several sce objects, I get an error:

for (i in 1:length(paths)){
  symb <- mapIds(org.Hs.eg.db, keys=rownames(sce[i]), keytype="ENSEMBL", column="SYMBOL")
  rownames(sce[i]) <- uniquifyFeatureNames(rownames(sce[i]), symb)
}

Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : 
  mapIds must have at least one key to match against.

where sce is a list containing sce_1, sce_2 and sce_3.

Please help me sort this out.

Many thanks in advance.

ADD REPLY • link 5.9 years ago Abhishek Singh ▴ 20

Some of your SCE objects don't have row names, so rownames(sce[i]) returns NULL, leading to the error message.
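
One thing that may be worth checking, assuming sce really is a plain R list: single-bracket indexing (sce[i]) returns a list of length one rather than the element itself, and rownames() of a list is NULL. The same happens if sce only holds the object names as strings. The sketch below uses small matrices as made-up stand-ins for the SCE objects, so it runs in base R; objs and the "gene_" prefix are placeholders, not part of the original code.

```r
# Toy stand-ins for SCE objects: plain matrices with row names (made-up data).
m1 <- matrix(0, nrow = 2, ncol = 2, dimnames = list(c("ENSG_A", "ENSG_B"), NULL))
m2 <- matrix(0, nrow = 2, ncol = 2, dimnames = list(c("ENSG_C", "ENSG_D"), NULL))
objs <- list(sce_1 = m1, sce_2 = m2)

rn_single <- rownames(objs[1])     # objs[1] is still a list of length 1
rn_double <- rownames(objs[[1]])   # objs[[1]] is the matrix itself
rn_string <- rownames("sce_1")     # a name stored as a string has no row names

print(is.null(rn_single))
print(rn_double)
print(is.null(rn_string))

# Looping with [[ ]] modifies each element in place.
for (i in seq_along(objs)) {
  # Placeholder for the real renaming step.
  rownames(objs[[i]]) <- paste0("gene_", rownames(objs[[i]]))
}
```

If that is the cause, replacing sce[i] with sce[[i]] inside the loop (and making sure the list holds the objects rather than their names) would be the thing to try.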

ADD REPLY • link 5.9 years ago Aaron Lun ★ 28k