I am new to using Bioconductor. I have a SingleCellExperiment object, sce, whose rownames are gene symbols and whose rowData contains the corresponding Ensembl gene IDs (rowData(sce)$ENSEMBL). Using TxDb.Hsapiens.UCSC.hg19.knownGene, I want to find the chromosome of each gene (for downstream quality control on mitochondrial genes) and store these CDSCHROM values as a new column in rowData. The code I have tried looks like this:
```r
location <- mapIds(TxDb.Hsapiens.UCSC.hg19.knownGene, keys = rowData(sce)$ENSEMBL,
                   column = "CDSCHROM", keytype = ???)
rowData(sce)$CHR <- location
```
However, I do not understand how to fill in the keytype argument. I see that "ENSEMBL" is not a valid keytype, so how would I go about this problem?
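One way to see which values keytype= and column= accept is to ask each database object directly. A minimal sketch, assuming both annotation packages are installed:

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Values accepted by keytype= and column= for the TxDb.
# In this UCSC knownGene TxDb, GENEID holds Entrez Gene IDs.
keytypes(txdb)
columns(txdb)   # includes CDSCHROM and TXCHROM

# org.Hs.eg.db knows SYMBOL, ENSEMBL and ENTREZID (among others),
# so it can bridge from your identifiers to the ones the TxDb uses.
keytypes(org.Hs.eg.db)
```

ENSEMBL does not appear among the TxDb keytypes, but GENEID does, and its values are the same Entrez IDs that org.Hs.eg.db exposes as ENTREZID.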
Seeing that "GENEID" is a valid keytype, I thought about doing the following:
```r
geneidSymbols <- mapIds(org.Hs.eg.db, keys = rownames(sce), keytype = "SYMBOL", column = "GENEID")
rowData(sce)$GENEID <- geneidSymbols
```
and then use these gene IDs as the keys in my original code. But "GENEID" is not a valid column for org.Hs.eg.db, so that did not work either.
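The mismatch is one of naming: org.Hs.eg.db calls Entrez Gene IDs ENTREZID, while this TxDb stores the same Entrez IDs under the keytype GENEID. A minimal sketch of a two-step lookup (Ensembl to Entrez with org.Hs.eg.db, then Entrez to chromosome with the TxDb) might look like this. The helper lookup() and the two-gene toy sce are hypothetical illustrations, not part of either package; it assumes SingleCellExperiment, org.Hs.eg.db and the TxDb package are installed.

```r
library(SingleCellExperiment)
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Hypothetical helper: map `keys` (of type `keytype`) to `column` in `db`,
# returning a character vector aligned with `keys`. Only unique non-NA keys
# are sent to mapIds(); NA is returned wherever no mapping exists, and
# multiVals = "first" keeps one value when a key has several.
lookup <- function(db, keys, keytype, column) {
    out <- rep(NA_character_, length(keys))
    ok <- !is.na(keys)
    if (!any(ok)) return(out)
    hits <- mapIds(db, keys = unique(keys[ok]), keytype = keytype,
                   column = column, multiVals = "first")
    out[ok] <- unname(hits[keys[ok]])   # index by name to realign
    out
}

# Hypothetical toy stand-in for the question's sce: symbols as rownames,
# Ensembl gene IDs in rowData(sce)$ENSEMBL.
counts <- matrix(0L, nrow = 2, ncol = 3,
                 dimnames = list(c("TP53", "MT-ND1"), NULL))
sce <- SingleCellExperiment(
    assays = list(counts = counts),
    rowData = DataFrame(ENSEMBL = c("ENSG00000141510", "ENSG00000198888")))

# Step 1: Ensembl gene ID -> Entrez Gene ID via org.Hs.eg.db.
rowData(sce)$ENTREZID <- lookup(org.Hs.eg.db, rowData(sce)$ENSEMBL,
                                "ENSEMBL", "ENTREZID")

# Step 2: Entrez Gene ID -> chromosome via the TxDb, where Entrez IDs
# live under the keytype "GENEID".
rowData(sce)$CHR <- lookup(TxDb.Hsapiens.UCSC.hg19.knownGene,
                           rowData(sce)$ENTREZID, "GENEID", "CDSCHROM")

rowData(sce)
```

On a real object, drop the toy construction and run the two lookup() steps on your own sce. Genes without a coding sequence (for example many noncoding RNAs) have no CDSCHROM, so column = "TXCHROM" may give better coverage; table(rowData(sce)$CHR, useNA = "ifany") shows how many genes stayed unmapped.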
I would appreciate any suggestions as I am new to Bioconductor and scRNA-seq in general. Thank you!
You might try looking at Organism.dplyr, which combines the TxDb and OrgDb information, allowing you to query and filter on data from both.
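Following that suggestion, a rough sketch could join Ensembl IDs to gene ranges in a single query. It assumes src_organism() can build its combined database from the hg19 TxDb, that the tables are named "id" and "ranges_gene" with columns ensembl, entrez and gene_chrom (check src_tbls(src) and the table printouts on your installation), and that R is 4.1 or later for the native pipe; ens is a hypothetical stand-in for rowData(sce)$ENSEMBL.

```r
library(Organism.dplyr)

# Builds a local SQLite database combining the TxDb with org.Hs.eg.db;
# the first call can take a while.
src <- src_organism("TxDb.Hsapiens.UCSC.hg19.knownGene")

# Ensembl ID -> Entrez ID (table "id") joined to gene ranges
# (table "ranges_gene"), keeping only the Ensembl ID and its chromosome.
chrom_by_ensembl <- dplyr::tbl(src, "id") |>
    dplyr::select(ensembl, entrez) |>
    dplyr::filter(!is.na(ensembl)) |>
    dplyr::inner_join(dplyr::tbl(src, "ranges_gene"), by = "entrez") |>
    dplyr::select(ensembl, gene_chrom) |>
    dplyr::distinct() |>
    dplyr::collect()

# Hypothetical stand-in for rowData(sce)$ENSEMBL.
ens <- c("ENSG00000141510", NA)

# match() takes the first row per ID, so a gene with ranges on more
# than one chromosome gets just one of them; unmatched IDs give NA.
chr <- chrom_by_ensembl$gene_chrom[match(ens, chrom_by_ensembl$ensembl)]
chr
```

With a real object, the last step would become rowData(sce)$CHR <- chrom_by_ensembl$gene_chrom[match(rowData(sce)$ENSEMBL, chrom_by_ensembl$ensembl)].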