Question: Adding gene names (or symbols) in my DESeq result
ytlin610 wrote (3 months ago):

Hi, I'm working on the DE analysis of my RNA-seq data from the green alga Chlamydomonas, and I'm able to generate a normal DE result with DESeq2, like this:

                    baseMean log2FoldChange     lfcSE      stat    pvalue      padj
                   <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
Cre01.g000450.v5.5  256.1055        -0.2995    0.2954   -1.0140    0.3106    0.7465
Cre01.g000500.v5.5   44.3266        -0.7029    0.3880   -1.8114    0.0701    0.3764
Cre01.g000600.v5.5    2.3502         1.5752    1.8795    0.8381    0.4020    0.8108
Cre01.g000650.v5.5    5.7842         1.3050    0.8817    1.4802    0.1388    0.5241
Cre01.g000850.v5.5    4.7789        -0.0103    0.7810   -0.0132    0.9895    0.9999
...                      ...            ...       ...       ...       ...       ...
Cre36.g759647.v5.5   10.3085         0.2125    1.1183    0.1900    0.8493    0.9771
Cre39.g760097.v5.5    2.7385         0.8043    1.6105    0.4994    0.6175    0.9069
Cre43.g760547.v5.5    2.9478        -2.4908    1.6740   -1.4879    0.1368    0.5233
Cre44.g760747.v5.5  633.6948        -0.0325    0.2354   -0.1380    0.8902    0.9846
Cre48.g761197.v5.5    5.6491        -0.3471    1.0296   -0.3371    0.7360    0.9423

I've also downloaded a text file that maps transcript IDs to gene symbols from JGI (https://phytozome.jgi.doe.gov/pz/portal.html):

Cre01.g000050.t1.1 RWP14
Cre01.g000150.t1.2 ZRT2
Cre01.g000650.t1.1 AMX2
Cre01.g000850.t1.2 CPLD38
Cre01.g000900.t1.2 CPLD20
Cre01.g001400.t1.1 ZMP1
Cre01.g001750.t1.2 TIG1
Cre01.g002200.t1.1 RPB6
Cre01.g002500.t1.2 COP2
Cre01.g003050.t1.2 SEC8
Cre01.g004250.t1.2 TCTEX1
Cre01.g004300.t1.2 ASN1
Cre01.g004450.t1.2 CPLD42
Cre01.g004500.t1.2 LEU1L
Cre01.g004550.t1.2 FAP190
Cre01.g004600.t1.1 RWP12
Cre01.g005150.t1.1 SGA1
Cre01.g005450.t1.2 RSP10
Cre01.g005550.t1.2 ARL2

I'm wondering if there is a direct way to add a column of gene symbols to my DE result by mapping the transcript IDs to the text file above. I've done some research, and I'm not sure whether the org.Hs.eg.db package can help me do it. Thanks!

written 3 months ago by ytlin610 (modified 3 months ago by Michael Love)

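Not an expert, but I had the same question recently and solved it with the fuzzyjoin package:

library(fuzzyjoin)
# Left join: every row of dataframe is kept; transID values are used as regex patterns matched against IDcol
joined <- regex_left_join(dataframe, genelist, by = c("IDcol" = "transID"))

where IDcol is the column of your data frame that contains the IDs, and transID is the ID column of your gene list.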
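
One thing to watch for with any join here: in the excerpts shown in the question, the DESeq2 row names end in .v5.5 while the IDs in the JGI file end in .t1.1 or .t1.2, so the two ID columns may not match as-is. Below is a minimal base R sketch, assuming the shared key is the leading CreNN.gNNNNNN part of each ID. The helper locus_id is hypothetical, and the toy tables only reuse a few values from the question.

```r
# Toy stand-ins for the two tables (a few values copied from the question).
de <- data.frame(
  IDcol = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5", "Cre01.g000850.v5.5"),
  log2FoldChange = c(-0.2995, 1.3050, -0.0103),
  stringsAsFactors = FALSE
)
genelist <- data.frame(
  transID = c("Cre01.g000650.t1.1", "Cre01.g000850.t1.2", "Cre01.g000900.t1.2"),
  symbol = c("AMX2", "CPLD38", "CPLD20"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the leading "CreNN.gNNNNNN" locus part of an ID.
# Assumption: this prefix identifies the same gene in both files.
locus_id <- function(x) sub("^(Cre[0-9]+\\.g[0-9]+).*$", "\\1", x)

# Look up each DE row's locus among the gene-list loci; rows without a match get NA.
idx <- match(locus_id(de$IDcol), locus_id(genelist$transID))
de$symbol <- genelist$symbol[idx]
print(de)
```

If a gene has several transcripts in the symbol file, match() only picks the first one, so it is worth checking the file for repeated loci before relying on this.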

Hope that helps!

written 3 months ago by rina

Hi Rina, thank you for the response, I've successfully joined the two data frames with this package, thanks a lot!

written 3 months ago by ytlin610
Michael Love wrote (3 months ago):

There are numerous ways to do this in R, and Rina has provided one above.

I think the easiest approach is:

res <- results(dds, tidy=TRUE)

This puts the row names into a column called "row".

Then rename the columns of your gene symbol table so that its first column is also called "row", and merge the two tables on it:

names(new.table) <- c("row", "symbol")
m <- merge(res, new.table, all.x=TRUE)

The all.x=TRUE argument keeps every row of res, even those with no matching row in new.table; such rows get NA in the symbol column.
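
To see what all.x=TRUE does, here is a small self-contained sketch with made-up tables. The gene IDs, symbols, and values are hypothetical, and res only stands in for the output of results(dds, tidy=TRUE). It assumes the IDs in the two tables already match exactly.

```r
# Toy stand-in for results(dds, tidy=TRUE): IDs sit in a first column called "row".
res <- data.frame(
  row = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(-0.30, 1.31, -0.01),
  stringsAsFactors = FALSE
)

# Toy symbol table as read from a two-column text file; geneB is deliberately missing.
new.table <- data.frame(
  V1 = c("geneA", "geneC", "geneD"),
  V2 = c("SYM1", "SYM3", "SYM4"),
  stringsAsFactors = FALSE
)
names(new.table) <- c("row", "symbol")

# A duplicated key in new.table would duplicate rows of res in the merged table.
stopifnot(!anyDuplicated(new.table$row))

# merge() joins on the column name the two tables share, here "row".
m <- merge(res, new.table, all.x = TRUE)
print(m)
```

Here geneB is kept with an NA symbol, and geneD, which is absent from res, is dropped. Note that merge() sorts the result by the key column by default, so the row order may differ from res.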

written 3 months ago by Michael Love

Hi Michael, I've also tried your method and it worked great! I can finally label the gene names on my volcano plots, many thanks!

written 3 months ago by ytlin610