Gene ID to Gene Name/Symbol in loom files
@abhishek-singh-4725

Dear Community,

 

I am reading single-cell data, processing it (including t-SNE), and finally generating loom files.

The commands I am using to read the files are:

 

 

loadSCE <- function(path){
  sce <- scater::read10XResults(path)
  #sce <- normalize(sce) # Data normalization based on scran
  # Flag mitochondrial genes (symbols starting with "mt-") as a control set
  mitochondrialGenes <- as.character(rowData(sce)[startsWith(rowData(sce)$symbol, "mt-"),]$id)
  isSpike(sce, "mt") <- rownames(sce) %in% mitochondrialGenes
  sce <- calculateQCMetrics(sce,
                            feature_controls = list(
                              MT =  isSpike(sce, "mt")
                            ))
  sce  # return the object explicitly
}

paths <- list.dirs(path = "/SampleData/TestData/", recursive = FALSE)

# Create sce_1, sce_2, ... in the global environment
for (i in seq_along(paths))
  assign(paste0("sce_",i), loadSCE(paths[i]))

# Note: sce ends up holding the object names as strings, not the objects
sce=0
for (i in seq_along(paths))
  sce[i]<-print(noquote(paste0("sce_",i)))

# Collect the objects themselves into a named list
t_list <- mget(ls(pattern="^sce_\\d+$"))
for(i in seq_along(t_list))
{
  metadata(t_list[[i]])["name"] <- paste0("iMates-",i)
}

The output for one of the variables is:

> sce_1
class: SingleCellExperiment 
dim: 33694 5586 
metadata(0):
assays(1): counts
rownames(33694): ENSG00000243485 ENSG00000237613 ... ENSG00000277475 ENSG00000268674
rowData names(11): id symbol ... total_counts log10_total_counts
colnames(5586): AAACCTGAGAAGGTTT-1 AAACCTGAGCGTTCCG-1 ... TTTGTCATCGTCTGCT-1 TTTGTCATCGTTGCCT-1
colData names(30): dataset barcode ... pct_counts_MT is_cell_control
reducedDimNames(0):
spikeNames(1): mt

 

I guess the problem is here. If I could fix it here, i.e., change the gene IDs to gene symbols and save them back to the object sce_1, the problem would be solved. Can anyone help me fix this on Boxing Day :(

Thank you

Aaron Lun ★ 28k
@alun

I see you didn't take my previous advice about reading the workflows. You might have found something instructive in Workflow #2 (reads), Section 2.3. It goes without saying that you would need to change org.Mm.eg.db to org.Hs.eg.db:

library(org.Hs.eg.db)
symb <- mapIds(org.Hs.eg.db, keys=rownames(sce_1), keytype="ENSEMBL", column="SYMBOL")

At this point, you might be tempted to assign symb as the row names of sce_1. However, some Ensembl IDs share the same gene symbol, which would result in duplicated row names - not good. To avoid this, we use uniquifyFeatureNames:

library(scater)
rownames(sce_1) <- uniquifyFeatureNames(rownames(sce_1), symb)

You can read the documentation to see exactly what it does, but I believe that the function name is fairly self-explanatory.
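As a rough illustration of the renaming described above, here is a minimal base R sketch of my understanding of what uniquifyFeatureNames does: IDs are used where no symbol was found, and symbols shared by several IDs get the ID appended. The helper name `uniquify_sketch` and the identifiers are made up for this example; this is not the package's source, so check the scater documentation for the exact behaviour.

```r
# Hypothetical helper that mimics the behaviour described for
# scater::uniquifyFeatureNames (an assumption, not the package's code).
uniquify_sketch <- function(ids, symbols) {
  stopifnot(length(ids) == length(symbols))
  out <- as.character(symbols)
  # 1. Fall back to the ID where no symbol was found.
  missing <- is.na(out)
  out[missing] <- ids[missing]
  # 2. Disambiguate symbols shared by several IDs by appending the ID.
  shared <- out %in% out[duplicated(out)]
  out[shared] <- paste0(out[shared], "_", ids[shared])
  out
}

# Made-up identifiers, for illustration only.
ids     <- c("ID0001", "ID0002", "ID0003", "ID0004")
symbols <- c("GeneA", NA, "GeneB", "GeneB")
new_names <- uniquify_sketch(ids, symbols)
print(new_names)
# "GeneA" "ID0002" "GeneB_ID0003" "GeneB_ID0004"
```

The result contains no duplicates and no missing values, so it is safe to use as row names.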

P.S. I also see that you didn't take any of my previous advice about your code (https://support.bioconductor.org/p/116056/#116324). I can only hope that these will be resolved in due order.

P.P.S. Consider using LoomExperiment for reading/writing loom files directly to/from Bioconductor data structures.


Hi Aaron,

The code that I posted here is an old version. The new version (for a new project) has the edits you suggested.

However, I am running into trouble with this code. For a single sce object it works fine (I only get a warning):

> symb <- mapIds(org.Hs.eg.db, keys=rownames(sce_1), keytype="ENSEMBL", column="SYMBOL")
'select()' returned 1:many mapping between keys and columns
> rownames(sce_1) <- uniquifyFeatureNames(rownames(sce_1), symb)

 

But when I put this in a loop over several sce objects, I get an error:

for (i in 1:length(paths)){
  symb <- mapIds(org.Hs.eg.db, keys=rownames(sce[i]), keytype="ENSEMBL", column="SYMBOL")
  rownames(sce[i]) <- uniquifyFeatureNames(rownames(sce[i]), symb)
}
Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : 
  mapIds must have at least one key to match against.

where sce is a list and contains sce_1, sce_2 and sce_3.

Please help me sort this out.

Many thanks in advance.

 


Some of your SCE objects don't have row names, so rownames(sce[i]) returns NULL, leading to the error message.
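One plausible way for rownames(sce[i]) to come back as NULL, assuming sce is a plain list (or a character vector of object names such as "sce_1"), is the single-bracket indexing: sce[i] is a list of length one or a string, neither of which has row names, whereas sce[[i]] is the object itself. The sketch below uses small matrices as stand-ins for SCE objects; `make_obj`, `sce_list` and the identifiers are hypothetical names for this example only.

```r
# Stand-ins for SCE objects: small matrices with row names (made-up data).
make_obj <- function(ids) {
  matrix(0, nrow = length(ids), ncol = 2, dimnames = list(ids, NULL))
}
sce_list <- list(sce_1 = make_obj(c("ID0001", "ID0002")),
                 sce_2 = make_obj(c("ID0003", "ID0004")))

# Single brackets return a list of length one, which has no row names.
single_bracket <- rownames(sce_list[1])
# Double brackets return the element itself.
double_bracket <- rownames(sce_list[[1]])

# A character vector of object names has no row names either.
name_vector <- c("sce_1", "sce_2")
from_names <- rownames(name_vector[1])

# Looping with [[i]] reads and updates each element in place.
for (i in seq_along(sce_list)) {
  rownames(sce_list[[i]]) <- paste0("gene_", rownames(sce_list[[i]]))
}
print(rownames(sce_list[[2]]))
```

If the objects only exist as separate variables, something like mget(c("sce_1", "sce_2", "sce_3")) collects them into a named list that can be looped over this way.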
