Question

How to add Ensembl ids after Pseudobulk analysis by DESeq2


Sara (@95b4edca), Belgium

Hi all,

I used DESeq2 to run a pseudobulk analysis on my Seurat object, and I have a problem converting gene names to Ensembl IDs. Some of my row names are Ensembl IDs (ENSG...) and others are gene symbols. I want to have Ensembl IDs and chromosome names as well. Here is the relevant part of my DESeq2 code for the pseudobulk analysis:

library(DESeq2)
library(AnnotationDbi)
library(org.Hs.eg.db)

dds <- DESeqDataSetFromMatrix(countData = counts_bcell,
                              colData = colData,
                              design = ~Age+Sex+condition)

#filter: keep genes with at least 10 counts in total
keep <- rowSums(counts(dds)) >=10
dds <- dds[keep,]

#make sure condition is a factor with Control as the reference level
dds$condition <- relevel(factor(dds$condition), ref = "Control")

#run DESeq2
dds <- DESeq(dds, test = "LRT", reduced = ~Age+Sex)

#check the coefficients for the comparison
resultsNames(dds)

#Generate result object
res <- results(dds, name = "condition_Patient_vs_Control")

mapped <- data.frame(GeneName = rownames(res),
                     ensemblID = mapIds(org.Hs.eg.db, keys = rownames(res), keytype = "SYMBOL", column = "ENSEMBL"))

res$ensembl_gene_id <- mapped$ensemblID

Looking at mapped, shown below, the rows whose names already start with ENSG get no ensemblID, and some gene symbols (e.g. LINC01409) are NA as well:

> mapped
                       GeneName       ensemblID
ENSG00000238009 ENSG00000238009            <NA>
ENSG00000241860 ENSG00000241860            <NA>
ENSG00000290385 ENSG00000290385            <NA>
ENSG00000291215 ENSG00000291215            <NA>
ENSG00000229905 ENSG00000229905            <NA>
LINC01409             LINC01409            <NA>
ENSG00000290784 ENSG00000290784            <NA>
FAM87B                   FAM87B ENSG00000177757
LINC00115             LINC00115            <NA>
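The NA values for the ENSG rows are expected: those row names are already Ensembl gene IDs, so looking them up with keytype = "SYMBOL" cannot match anything. A minimal sketch, assuming the only two kinds of row names are human Ensembl gene IDs and gene symbols, is to pass the ENSG names through unchanged and look up only the symbols. resolve_ensembl() and its lookup argument are hypothetical names, not part of any package; in practice lookup could wrap AnnotationDbi::mapIds() as in the commented example.

```r
# Hypothetical helper: resolve mixed row names (Ensembl gene IDs or gene symbols)
# to Ensembl gene IDs. `lookup` is any function that takes a character vector of
# symbols and returns Ensembl IDs in the same order, with NA where nothing matches.
resolve_ensembl <- function(ids, lookup) {
  # Row names that already look like human Ensembl gene IDs, optionally versioned
  is_ens <- grepl("^ENSG[0-9]+(\\.[0-9]+)?$", ids)
  out <- rep(NA_character_, length(ids))
  # Keep Ensembl IDs as they are, dropping any ".N" version suffix
  out[is_ens] <- sub("\\.[0-9]+$", "", ids[is_ens])
  # Only the remaining names are treated as symbols and looked up
  if (any(!is_ens)) {
    out[!is_ens] <- unname(lookup(ids[!is_ens]))
  }
  names(out) <- ids
  out
}

# Example lookup based on org.Hs.eg.db (requires the package to be installed):
# symbol_lookup <- function(sym) {
#   AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db, keys = sym,
#                         keytype = "SYMBOL", column = "ENSEMBL",
#                         multiVals = "first")
# }
# res$ensembl_gene_id <- resolve_ensembl(rownames(res), symbol_lookup)
```

This only repairs the ENSG rows; symbols that org.Hs.eg.db cannot link to an Ensembl gene (LINC01409 and LINC00115 in the output above) still come back as NA.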

Any suggestions, or is there a better way to add the Ensembl ID, chromosome name and gene biotype?

I appreciate your help. Many thanks!


Tags: Seurat, DESeq2, single-cell, scRNA, Pseudobulk

Written 22 days ago by Sara; last updated 20 days ago by Guido Hooiveld


Checking ?DESeqDataSetFromMatrix, note what it says about additional arguments:

arguments provided to SummarizedExperiment including rowRanges and metadata. Note that for Bioconductor 3.1, rowRanges must be a GRanges or GRangesList, with potential metadata columns as a DataFrame accessed and stored with mcols. If a user wants to store metadata columns about the rows of the countData, but does not have GRanges or GRangesList information, first construct the DESeqDataSet without rowRanges and then add the DataFrame with mcols(dds).

The key point is that row metadata should be a DataFrame, which is accessed and stored with mcols().

So I would add all annotation information to your dds object right after running DESeqDataSetFromMatrix() (i.e. before filtering), using mcols(dds) <- ...

See Mike's post for more on this: DESeq2: add annotations (from data frame) to DESeqDataSet
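A minimal sketch of that order of operations, using DESeq2's toy makeExampleDESeqDataSet() in place of counts_bcell. The annot table and its chrom/biotype values are placeholders for whatever your annotation source returns; the annotation is aligned to rownames(dds) with match() and appended to the existing row metadata with the mcols(dds) <- DataFrame(mcols(dds), ...) pattern from the DESeq2 vignette.

```r
library(DESeq2)  # also attaches S4Vectors, which provides DataFrame() and mcols()

# Toy data standing in for counts_bcell; rows are named gene1, gene2, ...
dds <- makeExampleDESeqDataSet(n = 100, m = 6)

# Placeholder annotation with one row per gene, deliberately in a different order.
# In practice this would come from an EnsDb, org.Hs.eg.db or biomaRt query.
annot <- data.frame(
  row_id  = rev(rownames(dds)),
  chrom   = rep(c("1", "2"), length.out = nrow(dds)),
  biotype = "protein_coding"
)

# Align the annotation rows to the row order of dds before attaching it
idx <- match(rownames(dds), annot$row_id)
extra <- annot[idx, c("chrom", "biotype")]
rownames(extra) <- NULL

# Append to (rather than overwrite) the existing row metadata
mcols(dds) <- DataFrame(mcols(dds), extra)

# Subsetting dds subsets mcols(dds) in step, so the annotation survives filtering
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
```

As far as I know, results() returns only its own statistics columns, so after running DESeq() the annotation still has to be copied across, e.g. res$chrom <- mcols(dds)$chrom (res and dds share the same row order).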

Since you are working with an Ensembl-based dataset, I would make use of an EnsDb. These can be obtained through the AnnotationHub infrastructure, and an EnsDb can even be stored for offline use.

This post of mine may get you started with that: EnsDb.Rnorvegicus for Rnor6
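A sketch of the EnsDb route, assuming the AnnotationHub and ensembldb packages are installed and the hub is reachable online. Which EnsDb release to use is an assumption you need to settle: it should match the Ensembl/GENCODE release of the reference used for quantification, otherwise recently added gene IDs may be missing. The ENSG rows are matched on gene_id and the symbols on gene_name, which returns the chromosome (seq_name) and biotype (gene_biotype) in one table. The ids vector is a hypothetical stand-in for rownames(res).

```r
library(AnnotationHub)
library(ensembldb)   # provides the EnsDb methods (genes(), mapIds(), ...)

ah <- AnnotationHub()   # downloads/caches the hub metadata on first use

# All human EnsDb records; choose the release matching the annotation used for
# alignment/quantification. Taking the last record (usually the newest release)
# is an assumption here; inspect `ens_records` to pick the right one.
ens_records <- query(ah, c("EnsDb", "Homo sapiens"))
edb <- ens_records[[length(ens_records)]]

# One row per gene with Ensembl ID, symbol, chromosome and biotype
gene_annot <- genes(edb,
                    columns = c("gene_id", "gene_name", "seq_name", "gene_biotype"),
                    return.type = "data.frame")

# Hypothetical mixed row names as in the question
ids <- c("ENSG00000238009", "FAM87B", "LINC00115")
is_ens <- grepl("^ENSG", ids)

# Ensembl IDs are matched on gene_id, symbols on gene_name (first hit if duplicated)
row_idx <- ifelse(is_ens,
                  match(ids, gene_annot$gene_id),
                  match(ids, gene_annot$gene_name))
annot_for_ids <- gene_annot[row_idx, ]
rownames(annot_for_ids) <- ids
```

The resulting annot_for_ids has one row per entry of ids, in the same order, so it can be attached to dds with mcols() as described above.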

Reply by Guido Hooiveld, 20 days ago

score 0 · Answer 1 · 2024-05-26


BioinfGuru (@yagalbi-11519), Ireland

I would use biomaRt.
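A minimal biomaRt sketch, assuming the biomaRt package is installed and the Ensembl BioMart server is reachable. It queries the Ensembl IDs and the symbols separately, each with its own filter, and returns chromosome and biotype alongside. The ids vector is a hypothetical stand-in for rownames(res).

```r
library(biomaRt)

# Connect to the current Ensembl release; pass `version =` to useEnsembl() to pin
# the release that matches your alignment reference.
mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

attrs <- c("ensembl_gene_id", "external_gene_name", "chromosome_name", "gene_biotype")

# Hypothetical mixed row names as in the question
ids <- c("ENSG00000238009", "FAM87B", "LINC00115")
is_ens <- grepl("^ENSG", ids)

# Query Ensembl IDs and symbols separately, each with its matching filter
by_id  <- getBM(attributes = attrs, filters = "ensembl_gene_id",
                values = ids[is_ens], mart = mart)
by_sym <- getBM(attributes = attrs, filters = "external_gene_name",
                values = ids[!is_ens], mart = mart)

# Put the results back in the order of `ids` (NA rows where nothing matched)
pos_id  <- match(ids, by_id$ensembl_gene_id)
pos_sym <- match(ids, by_sym$external_gene_name)
annot <- rbind(by_id, by_sym)[ifelse(is_ens, pos_id, nrow(by_id) + pos_sym), ]
rownames(annot) <- ids
```

A symbol can correspond to several Ensembl genes; match() keeps only the first hit, so check by_sym for duplicated external_gene_name values if that matters for your genes.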

The mix of IDs and symbols in your row names is what happens when annotation is done too early. If possible, I would run the DEG analysis on the original feature IDs output by the alignment and quantification software. For example, if Ensembl annotation was used, every row would have an Ensembl ID.
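If the counts came from Cell Ranger (an assumption; the question does not say), the features.tsv.gz file written next to the matrix lists the gene ID and the gene name for every row, so the original IDs can be recovered from the same reference that produced the counts. Seurat::Read10X(..., gene.column = 1) reads the IDs directly as row names. The base-R sketch below instead builds a name-to-ID table from that file; read_feature_ids() is a hypothetical helper, and it mimics the make.unique() de-duplication that Read10X applies to gene names. Further renaming is not handled (for example, CreateSeuratObject() replaces underscores in feature names with dashes).

```r
# Hypothetical helper: build a lookup from the (possibly de-duplicated) gene names
# used as row names to the gene IDs in a 10x-style features.tsv(.gz) file.
# Columns in that file: 1 = gene ID, 2 = gene name, 3 = feature type.
read_feature_ids <- function(path) {
  # read.delim() reads gzip-compressed files transparently
  feats <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
  data.frame(
    row_name = make.unique(feats[[2]]),  # same de-duplication as Read10X
    gene_id  = feats[[1]],
    stringsAsFactors = FALSE
  )
}

# Usage (the path is a placeholder):
# lookup <- read_feature_ids("filtered_feature_bc_matrix/features.tsv.gz")
# res$ensembl_gene_id <- lookup$gene_id[match(rownames(res), lookup$row_name)]
```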

Answer by BioinfGuru, 21 days ago