How to add Ensembl IDs after pseudobulk analysis with DESeq2
Sara

Hi all,

I used DESeq2 to run a pseudobulk analysis on my Seurat object. I have a problem converting gene names to Ensembl IDs: some of my row names are already Ensembl IDs (starting with ENSG) and the others are gene symbols. I also want to add chromosome names. Here is the part of my DESeq2 code for the pseudobulk analysis:

library(DESeq2)
library(AnnotationDbi)
library(org.Hs.eg.db)  # human gene annotation (the IDs are ENSG)

dds <- DESeqDataSetFromMatrix(countData = counts_bcell,
                              colData = colData,
                              design = ~Age+Sex+condition)

#keep genes with at least 10 counts in total
keep <- rowSums(counts(dds)) >=10
dds <- dds[keep,]

#use Control as the reference level
colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")

#run DESeq2 (likelihood ratio test against the reduced model)
dds <- DESeq(dds, test = "LRT", reduced = ~Age+Sex)

#check the coefficients for the comparison
resultsNames(dds)

#Generate result object
res <- results(dds, name = "condition_Patient_vs_Control")
mapped <- data.frame(GeneName = rownames(res),
                     ensemblID = mapIds(org.Hs.eg.db, keys = rownames(res),
                                        keytype = "SYMBOL", column = "ENSEMBL"))

res$ensembl_gene_id <- mapped$ensemblID

If I look at mapped, the rows whose names are already Ensembl IDs get no ensemblID, and neither do some symbols such as LINC01409 and LINC00115:

> mapped
                       GeneName       ensemblID
ENSG00000238009 ENSG00000238009            <NA>
ENSG00000241860 ENSG00000241860            <NA>
ENSG00000290385 ENSG00000290385            <NA>
ENSG00000291215 ENSG00000291215            <NA>
ENSG00000229905 ENSG00000229905            <NA>
LINC01409             LINC01409            <NA>
ENSG00000290784 ENSG00000290784            <NA>
FAM87B                   FAM87B ENSG00000177757
LINC00115             LINC00115            <NA>

Any suggestions, please? Or is there a better way to add the Ensembl ID, chromosome name and biotype?

I appreciate your help. Many thanks!
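The NA values for the ENSG rows are expected: mapIds() is asked to treat every row name as a SYMBOL key, and a string such as ENSG00000238009 is not a gene symbol. One workaround, shown here as a minimal base-R sketch, is to keep the rows that already are Ensembl IDs and look up only the symbols. fill_ensembl() is a hypothetical helper, and the lookup is assumed to be a named character vector (symbol to Ensembl ID) like the one mapIds() returns.

```r
# Hypothetical helper: keep row names that are already Ensembl gene IDs and
# look up only the gene symbols in a named vector (symbol -> Ensembl ID)
fill_ensembl <- function(ids, symbol_to_ensembl) {
  # Human Ensembl gene IDs, optionally with a version suffix such as ".2"
  is_ensg <- grepl("^ENSG[0-9]+(\\.[0-9]+)?$", ids)
  # Start with the Ensembl IDs (version stripped) and NA for the symbols
  out <- ifelse(is_ensg, sub("\\.[0-9]+$", "", ids), NA_character_)
  # Indexing by a name that is missing from the lookup gives NA, not an error
  out[!is_ensg] <- unname(symbol_to_ensembl[ids[!is_ensg]])
  names(out) <- ids
  out
}

# Usage with a tiny hand-made lookup; in the real analysis it could come from
# mapIds(org.Hs.eg.db, keys = symbols, keytype = "SYMBOL", column = "ENSEMBL")
lookup <- c(FAM87B = "ENSG00000177757")
fill_ensembl(c("ENSG00000238009", "LINC01409", "FAM87B"), lookup)
```

In the usage line, the ENSG row is kept as-is, FAM87B is filled in from the lookup, and LINC01409 stays NA because the lookup does not know it. Symbols that the annotation package cannot map still need another source, such as an EnsDb or biomaRt.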


If you check ?DESeqDataSetFromMatrix, note the following:

arguments provided to SummarizedExperiment including rowRanges and metadata. Note that for Bioconductor 3.1, rowRanges must be a GRanges or GRangesList, with potential metadata columns as a DataFrame accessed and stored with mcols. If a user wants to store metadata columns about the rows of the countData, but does not have GRanges or GRangesList information, first construct the DESeqDataSet without rowRanges and then add the DataFrame with mcols(dds).

The key point is that row metadata should be a DataFrame, which is accessed and stored with mcols().

So I would add all annotation information to your dds object right after running DESeqDataSetFromMatrix (thus before filtering), through: mcols(dds) <- ...

See Mike's post for more on this: DESeq2: add annotations (from data frame) to DESeqDataSet
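A minimal sketch of the mcols() approach, following the mcols(dds) <- DataFrame(mcols(dds), ...) pattern from the DESeq2 vignette. It uses simulated data from makeExampleDESeqDataSet() in place of counts_bcell, and the annotation table annot is hypothetical, filled with random toy chromosome and biotype values; in a real analysis it would come from an EnsDb or biomaRt query. The sketch copies the columns into res explicitly rather than relying on results() to carry them over.

```r
library(DESeq2)  # also attaches S4Vectors, which provides DataFrame() and mcols()

set.seed(42)
# Simulated data standing in for counts_bcell/colData (rownames: gene1, gene2, ...)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Hypothetical annotation table with RANDOM toy values, one row per gene
annot <- data.frame(
  gene = rownames(dds),
  chromosome = sample(c(as.character(1:22), "X"), nrow(dds), replace = TRUE),
  biotype = sample(c("protein_coding", "lncRNA"), nrow(dds), replace = TRUE),
  stringsAsFactors = FALSE
)
# Reorder the annotation so it follows the row order of dds
annot <- annot[match(rownames(dds), annot$gene), ]

# Attach it right after building dds, i.e. before any filtering
mcols(dds) <- DataFrame(mcols(dds), annot)

# Filtering subsets the counts and the row annotation together
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]

dds <- DESeq(dds, quiet = TRUE)
res <- results(dds)

# Rows of res follow rows of dds, so the annotation can be copied across
res$chromosome <- mcols(dds)$chromosome
res$biotype <- mcols(dds)$biotype
```

Attaching the annotation before filtering means dds[keep, ] drops the annotation rows together with the filtered genes, so there is no separate table to keep in sync.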

Since you are working with an Ensembl-based dataset, I would make use of an EnsDb. EnsDb objects can be obtained through the AnnotationHub infrastructure, and an EnsDb can even be stored for offline use.

This post of mine may get you started with that: EnsDb.Rnorvegicus for Rnor6
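A minimal sketch of the EnsDb route, assuming the AnnotationHub and ensembldb packages are installed and a network connection is available for the first download (AnnotationHub caches the file locally afterwards). Taking the last query hit assumes it is the newest Ensembl release; it is better to choose the release that matches the reference used to build the counts, because IDs absent from that release will not be found. The ids vector is a small example taken from the question's output, queried by GENEID for the ENSG rows and by GENENAME for the rest.

```r
library(AnnotationHub)
library(ensembldb)  # EnsDb class and its select() method

ah <- AnnotationHub()
# All human EnsDb resources; inspect the titles to choose the Ensembl release
edb_hits <- query(ah, c("EnsDb", "Homo sapiens"))
print(edb_hits$title)
# Assumption: the last hit is the most recent release
edb <- edb_hits[[tail(names(edb_hits), 1)]]

# Mixed row names as in the question: some Ensembl IDs, some symbols
ids <- c("ENSG00000238009", "ENSG00000290784", "FAM87B", "LINC00115")
is_ensg <- grepl("^ENSG", ids)

cols <- c("GENEID", "GENENAME", "SEQNAME", "GENEBIOTYPE")
by_id <- AnnotationDbi::select(edb, keys = ids[is_ensg],
                               keytype = "GENEID", columns = cols)
by_name <- AnnotationDbi::select(edb, keys = ids[!is_ensg],
                                 keytype = "GENENAME", columns = cols)

# A symbol can match more than one gene ID, so check for duplicates
# before joining this table back to the results
annot <- unique(rbind(by_id, by_name))
annot
```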

BioinfGuru

I would use biomaRt.

This is what happens when annotation is done too early. If possible, I would run the DE analysis on the original feature names output by the alignment and quantification software. For example, if Ensembl annotation was used, then all rows would have Ensembl IDs.
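A minimal biomaRt sketch, assuming network access to the Ensembl BioMart server. It fetches the Ensembl ID, symbol, chromosome and biotype, querying the ENSG rows by ID and the remaining rows by symbol; the ids vector is a small example taken from the question's output. useEnsembl() also accepts a version argument if the query should be pinned to the Ensembl release used for quantification. A symbol query can return genes on patch or haplotype scaffolds as extra rows with non-standard chromosome_name values, so the result should be checked before joining it to res.

```r
library(biomaRt)

# Mixed row names as in the question: some Ensembl IDs, some symbols
ids <- c("ENSG00000238009", "ENSG00000290784", "FAM87B", "LINC00115")
is_ensg <- grepl("^ENSG", ids)

# Human genes dataset on the current Ensembl release
mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

attrs <- c("ensembl_gene_id", "external_gene_name",
           "chromosome_name", "gene_biotype")
# Query Ensembl IDs and symbols separately, each with the matching filter
by_id <- getBM(attributes = attrs, filters = "ensembl_gene_id",
               values = ids[is_ensg], mart = mart)
by_name <- getBM(attributes = attrs, filters = "external_gene_name",
                 values = ids[!is_ensg], mart = mart)

annot <- unique(rbind(by_id, by_name))
annot
```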

