Question: Adding gene names (or symbols) in my DESeq result
7 months ago by
ytlin61010
ytlin61010 wrote:

Hi, I'm working on the DE analysis of my RNA-seq data from the green alga Chlamydomonas, and I'm able to generate a normal DE result with DESeq2 like this:

                    baseMean log2FoldChange  lfcSE    stat pvalue   padj
Cre01.g000450.v5.5  256.1055        -0.2995 0.2954 -1.0140 0.3106 0.7465
Cre01.g000500.v5.5   44.3266        -0.7029 0.3880 -1.8114 0.0701 0.3764
Cre01.g000600.v5.5    2.3502         1.5752 1.8795  0.8381 0.4020 0.8108
Cre01.g000650.v5.5    5.7842         1.3050 0.8817  1.4802 0.1388 0.5241
Cre01.g000850.v5.5    4.7789        -0.0103 0.7810 -0.0132 0.9895 0.9999
...                      ...            ...    ...     ...    ...    ...
Cre36.g759647.v5.5   10.3085         0.2125 1.1183  0.1900 0.8493 0.9771
Cre39.g760097.v5.5    2.7385         0.8043 1.6105  0.4994 0.6175 0.9069
Cre43.g760547.v5.5    2.9478        -2.4908 1.6740 -1.4879 0.1368 0.5233
Cre44.g760747.v5.5  633.6948        -0.0325 0.2354 -0.1380 0.8902 0.9846
Cre48.g761197.v5.5    5.6491        -0.3471 1.0296 -0.3371 0.7360 0.9423

I've also downloaded a text file of gene symbols and transcript IDs from JGI (https://phytozome.jgi.doe.gov/pz/portal.html):

Cre01.g000050.t1.1  RWP14
Cre01.g000150.t1.2  ZRT2
Cre01.g000650.t1.1  AMX2
Cre01.g000850.t1.2  CPLD38
Cre01.g000900.t1.2  CPLD20
Cre01.g001400.t1.1  ZMP1
Cre01.g001750.t1.2  TIG1
Cre01.g002200.t1.1  RPB6
Cre01.g002500.t1.2  COP2
Cre01.g003050.t1.2  SEC8
Cre01.g004250.t1.2  TCTEX1
Cre01.g004300.t1.2  ASN1
Cre01.g004450.t1.2  CPLD42
Cre01.g004500.t1.2  LEU1L
Cre01.g004550.t1.2  FAP190
Cre01.g004600.t1.1  RWP12
Cre01.g005150.t1.1  SGA1
Cre01.g005450.t1.2  RSP10
Cre01.g005550.t1.2  ARL2
…                   …

I'm wondering if there is a direct way to add a column of gene symbols to my DE result by mapping the transcript IDs to the text file above. I've done some research, but I'm not sure whether the org.Hs.eg.db package can help me do this. Thanks!

rnaseq deseq2 gene names • 378 views
modified 7 months ago by Michael Love22k • written 7 months ago by ytlin61010

Not an expert, but I had the same question recently and I did it through

library(fuzzyjoin)
regex_left_join(dataframe, genelist, by = c("IDcol" = "transID"))


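where IDcol is the column containing the IDs in your data frame and transID is the ID column in your gene list. regex_left_join treats the values in transID as regular expressions and matches them against IDcol.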

Hope that helps!


Hi Rina, thank you for the response, I've successfully joined the two data frames with this package, thanks a lot!

7 months ago by
Michael Love22k
United States
Michael Love22k wrote:

There are numerous ways to do this in R, and Rina has provided one above.

I think the easiest approach is:

res <- results(dds, tidy=TRUE)


This will put the row names as a column called "row".
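If you already have a results table with the IDs as row names, the same layout can be sketched in base R. The data frame below is a toy stand-in built from two rows of the output in the question, not a real DESeq2 object, and the names res_df and res_tidy are made up for illustration.

```r
# Toy stand-in for as.data.frame(results(dds)); values copied from the question.
res_df <- data.frame(
  baseMean       = c(256.1055, 44.3266),
  log2FoldChange = c(-0.2995, -0.7029),
  row.names      = c("Cre01.g000450.v5.5", "Cre01.g000500.v5.5")
)

# Move the row names into a leading column called "row" and reset the row names.
res_tidy <- data.frame(row = rownames(res_df), res_df,
                       row.names = NULL, stringsAsFactors = FALSE)

print(res_tidy)
```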

Then rename the columns of your table of gene symbols so that the first column is also called "row", and merge the two tables:

names(new.table) <- c("row", "symbol")
m <- merge(res, new.table, all.x=TRUE)


The all.x=TRUE argument says that it should include rows of res even if there is no matching row of new.table.
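One thing to watch for: merge matches IDs exactly, and in the question the DE results use IDs ending in ".v5.5" while the JGI file uses transcript IDs ending in ".t1.1" or ".t1.2". Here is a minimal sketch on toy data copied from the question, assuming the part before those suffixes identifies the same locus in both files. The column name locus and the two regular expressions are my assumptions, so check them against your own IDs.

```r
# Toy stand-in for results(dds, tidy=TRUE); values copied from the question.
res <- data.frame(
  row            = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5", "Cre01.g000850.v5.5"),
  log2FoldChange = c(-0.2995, 1.3050, -0.0103),
  stringsAsFactors = FALSE
)

# Toy stand-in for the JGI transcript ID / symbol file.
new.table <- data.frame(
  transID = c("Cre01.g000050.t1.1", "Cre01.g000650.t1.1", "Cre01.g000850.t1.2"),
  symbol  = c("RWP14", "AMX2", "CPLD38"),
  stringsAsFactors = FALSE
)

# Assumption: stripping ".v<version>" / ".t<transcript>" leaves a shared locus ID.
res$locus       <- sub("\\.v[0-9.]+$", "", res$row)
new.table$locus <- sub("\\.t[0-9.]+$", "", new.table$transID)

# Keep one symbol per locus so the merge cannot duplicate rows of res.
new.table <- new.table[!duplicated(new.table$locus), c("locus", "symbol")]

# Left join: every row of res is kept; symbol is NA where nothing matches.
m <- merge(res, new.table, by = "locus", all.x = TRUE)
print(m)
```

Genes without an entry in the symbol file (Cre01.g000450 here) stay in the table with NA in the symbol column.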