Question

Combining DESeq2 output with biomaRt query to display DE genes rather than DE Ensembl IDs

Linda • 0

@linda-23123

Last seen 4.6 years ago

United Kingdom

Sorry, this is probably a very basic question...

I have the following output from DESeq2:

> head(res)
log2 fold change (MLE): Prep r322 vs hp1829 
Wald test p-value: Prep r322 vs hp1829 
DataFrame with 6 rows and 6 columns
                         baseMean      log2FoldChange             lfcSE                 stat               pvalue               padj
                        <numeric>           <numeric>         <numeric>            <numeric>            <numeric>          <numeric>
ENSG00000223972 0.592269283427995    2.45738005468305  6.19618011312043    0.396595968777528    0.691665425740757                 NA
ENSG00000227232  419.788972276199   0.338770334899125 0.382205077243893    0.886357495148993    0.375424915927214  0.725257780579341
ENSG00000238009 0.550554023354792 -0.0363069764959028  5.25663366608541 -0.00690688733554843     0.99448914504777                 NA
ENSG00000237683  17.2262590129968    3.85233823390427  1.14138300025536     3.37514947484097 0.000737756082603317 0.0565243632615917
ENSG00000268903  1.54832852250678    2.00990324663306  2.15702435077862    0.931794416649747    0.351442780657832                 NA
ENSG00000239906 0.475776197736371 -0.0362637105559594  6.76583280846614 -0.00535982954095205    0.995723495236492                 NA

And for these ensembl IDs I have got the gene name via a biomaRt query:

> head(G_list)
  ensembl_gene_id hgnc_symbol ensembl_gene_id_version
1 ENSG00000007923     DNAJC11      ENSG00000007923.11
2 ENSG00000008128      CDK11A      ENSG00000008128.18
3 ENSG00000008130        NADK      ENSG00000008130.11
4 ENSG00000009724       MASP2      ENSG00000009724.12
5 ENSG00000011021       CLCN6      ENSG00000011021.17
6 ENSG00000028137    TNFRSF1B      ENSG00000028137.12

How do I now alter res so that the results of my DE analysis display the gene name (as given in G_list) rather than the Ensembl ID? I have tried to use merge, but it fails with the error below.

z <- merge(res,G_list,by.x=rownames(res),by.y="ensembl_gene_id")
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns

DESeq2 • 1.4k views

ADD COMMENT • link updated 5.0 years ago by James W. MacDonald 68k • written 5.0 years ago by Linda • 0

score 0 · Answer 1 · 2020-11-30

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 2 days ago

San Diego

You can't merge on res directly like that, but you can make a copy of it that is a plain data.frame, and merge on that:

df <- as.data.frame(res)
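To spell out the merge step, here is a minimal sketch. The error in the question comes from by.x: merge expects column names there, not a vector of values, and the special name "row.names" tells it to merge on the row names instead. The objects below are toy stand-ins I made up so the snippet runs without DESeq2 (the numbers are invented; the IDs and symbols are taken from the G_list shown in the question), and all.x = TRUE is my own choice to keep genes that have no symbol.

```r
# Toy stand-in for as.data.frame(res); the numeric values are made up.
df <- data.frame(
  baseMean = c(10.5, 200.1, 3.2),
  log2FoldChange = c(1.2, -0.4, 2.5),
  row.names = c("ENSG00000007923", "ENSG00000008128", "ENSG00000223972")
)

# Toy stand-in for the biomaRt result (first rows of G_list in the question).
G_list <- data.frame(
  ensembl_gene_id = c("ENSG00000007923", "ENSG00000008128", "ENSG00000008130"),
  hgnc_symbol = c("DNAJC11", "CDK11A", "NADK"),
  stringsAsFactors = FALSE
)

# by.x = "row.names" merges on the row names of df;
# all.x = TRUE keeps rows of df that have no match in G_list (symbol is NA).
merged <- merge(df, G_list, by.x = "row.names", by.y = "ensembl_gene_id",
                all.x = TRUE)

# The IDs end up in a column called "Row.names".
print(merged)
```

Note that merge returns the rows sorted by the ID column, so the original row order of res is not preserved.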

score 0 · Answer 2 · 2020-11-30

While merge can be useful, it's often easier to just use match directly. As a super-fake example:

> z <- DataFrame(baseMean = rnorm(26), logFC = rnorm(26), row.names = LETTERS)
> z
DataFrame with 26 rows and 2 columns
     baseMean      logFC
    <numeric>  <numeric>
A    0.704597  -0.398978
B    0.515007  -1.640074
C    0.290537  -0.522406
D   -1.862162  -0.195471
E    2.178323  -0.890097
...       ...        ...
V    0.212373  1.4935439
W   -0.827351 -0.0752781
X   -0.136847 -1.0763688
Y    0.384423 -1.0034517
Z    0.324602  1.9117110
> newdata <- data.frame(newID = LETTERS[sample(1:26, 15)], symbol = letters[1:15])
> newdata
   newID symbol
1      S      a
2      F      b
3      L      c
4      J      d
5      E      e
6      O      f
7      R      g
8      B      h
9      A      i
10     P      j
11     K      k
12     I      l
13     U      m
14     H      n
15     C      o
> z$IDS <- NA
> z$IDS[match(newdata$newID, row.names(z))] <- newdata$symbol
> z
DataFrame with 26 rows and 3 columns
     baseMean      logFC         IDS
    <numeric>  <numeric> <character>
A    0.704597  -0.398978           i
B    0.515007  -1.640074           h
C    0.290537  -0.522406           o
D   -1.862162  -0.195471          NA
E    2.178323  -0.890097           e
...       ...        ...         ...
V    0.212373  1.4935439          NA
W   -0.827351 -0.0752781          NA
X   -0.136847 -1.0763688          NA
Y    0.384423 -1.0034517          NA
Z    0.324602  1.9117110          NA
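
Applied to the objects in the question, the same idea might look like the sketch below. The data are toy stand-ins so it runs without DESeq2 (numbers invented, IDs and symbols copied from the G_list in the question), and the column name symbol is just my choice. Here the lookup goes in the other direction from the example above: for each row of the results, match finds its position in G_list, which gives the same result in one assignment.

```r
# Toy stand-in for res; with real DESeq2 output the same assignment
# should work on the DataFrame itself. The numeric values are made up.
res <- data.frame(
  baseMean = c(10.5, 200.1, 3.2),
  log2FoldChange = c(1.2, -0.4, 2.5),
  row.names = c("ENSG00000007923", "ENSG00000008128", "ENSG00000223972")
)

# Toy stand-in for the biomaRt result (first rows of G_list in the question).
G_list <- data.frame(
  ensembl_gene_id = c("ENSG00000007923", "ENSG00000008128", "ENSG00000008130"),
  hgnc_symbol = c("DNAJC11", "CDK11A", "NADK"),
  stringsAsFactors = FALSE
)

# For each row of res, find the position of its ID in G_list and pull the symbol.
# IDs with no match get NA, and the row order of res is unchanged.
res$symbol <- G_list$hgnc_symbol[match(rownames(res), G_list$ensembl_gene_id)]

print(res)
```

If an ID appears more than once in G_list, match only uses the first occurrence.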