Question

Write an annotated DESEQ2 counts CSV, pls help.

0

Entering edit mode

Dr. • 0

@92e11057

Last seen 14 months ago

United States

Hi all, I thought finding an answer to do this would be easy but since I clearly don't know the required R code to do it, I'm here asking for help. I've generated a DESeq2 csv file containing my normalized counts organized by row (Ensembl ID's) as a R df called DSNorm. See my code below. One, I'm not sure why I'm not getting any results back from BiomaRt (0 rows)? Is it because my filter object isn't a character vector? If that's it, how can I export my DSNorm first column (my ensembl ID's) as a character vector in the filter object? Second, once I get the biomaRt part to function correctly, how can I insert the attained gene_biotypes into a new column of my DSNorm data frame (ensuring the rows align)? Any insight or suggestions would be helpful? Thanks all!

Code should be placed in three backticks as shown below

#load DESeq2
library("DESeq2")

#read in the above data
cts <- read.csv("counts_ILTK.csv", header=TRUE, row.names=1)
coldata <- read.csv("DEG_exp.csv", header=TRUE, row.names=1)
coldata <- coldata[,c("condition","type")]

# construct the Deseq2 data set
dds1 <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~condition, tidy=TRUE)

#calculate normalized counts and export
dds <- estimateSizeFactors(dds1)
table_counts_normalized <- counts(dds, normalized=TRUE)
write.csv(table_counts_normalized, "TKvIL_DESeq2_Counts.csv", row.names=TRUE)

#use biomaRt to get Ensembl bio types for my Genes.
DSNorm <- read.csv("TKvIL_DESeq2_Counts.csv", header=TRUE)

#assign Col1 for row names
rownames(DSNorm) <- DSNorm$ensembID
filter<-rownames(DSNorm)

#Annotate with gene biotype
library("biomaRt")

mart = useMart("ensembl","hsapiens_gene_ensembl")
getBM(attributes='gene_biotype', filters = 'ensembl_gene_id', values = filter, mart = mart)

# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

> getBM(attributes='gene_biotype', filters = 'ensembl_gene_id', values = filter, mart = mart)

Batch submitting query [======>-----------------------------------------------------------------------------------------]   8% eta: 10s
Batch submitting query [==============>---------------------------------------------------------------------------------]  15% eta:  9s
Batch submitting query [=====================>--------------------------------------------------------------------------]  23% eta:  8s
Batch submitting query [=============================>------------------------------------------------------------------]  31% eta:  7s
Batch submitting query [====================================>-----------------------------------------------------------]  38% eta:  6s
Batch submitting query [===========================================>----------------------------------------------------]  46% eta:  5s
Batch submitting query [===================================================>--------------------------------------------]  54% eta:  6s
Batch submitting query [==========================================================>-------------------------------------]  62% eta:  7s
Batch submitting query [=================================================================>------------------------------]  69% eta:  6s
Batch submitting query [=========================================================================>----------------------]  77% eta:  4s
Batch submitting query [================================================================================>---------------]  85% eta:  3s
Batch submitting query [========================================================================================>-------]  92% eta:  1s

[1] gene_biotype
<0 rows> (or 0-length row.names)

sessionInfo( )

DESeq2 biomaRt R • 940 views

ADD COMMENT • link 14 months ago Dr. • 0

score 0 · Answer 1 · 2023-02-14

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

You don't give enough information to go on, but two things to keep in mind.

1.) It's always best to ask to get your filters back, because biomaRt queries a relational database, and there is no expectation that the order will be preserved.

2.) You don't provide enough information for anybody to help you. What are your IDs? You need to show an example.

Here's an example of what could be going on.

> library(biomaRt)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl")
## need some Ensembl IDs
> library(org.Hs.eg.db)

> ids <- head(keys(org.Hs.eg.db, "ENSEMBL"), 20)
> head(ids)
[1] "ENSG00000121410" "ENSG00000175899" "ENSG00000256069" "ENSG00000171428"
[5] "ENSG00000156006" "ENSG00000196136"
## Query, asking for the IDs to be returned as well
> getBM(c("ensembl_gene_id","gene_biotype"), "ensembl_gene_id", ids, mart)
   ensembl_gene_id                       gene_biotype
1  ENSG00000090861                     protein_coding
2  ENSG00000107331                     protein_coding
3  ENSG00000114771                     protein_coding
4  ENSG00000121410                     protein_coding
5  ENSG00000127837                     protein_coding
6  ENSG00000129673                     protein_coding
7  ENSG00000131269                     protein_coding
8  ENSG00000156006                     protein_coding
9  ENSG00000165029                     protein_coding
10 ENSG00000167972                     protein_coding
11 ENSG00000171428                     protein_coding
12 ENSG00000175899                     protein_coding
13 ENSG00000183044                     protein_coding
14 ENSG00000196136                     protein_coding
15 ENSG00000204574                     protein_coding
16 ENSG00000225989                     protein_coding
17 ENSG00000231129                     protein_coding
18 ENSG00000232169                     protein_coding
19 ENSG00000236149                     protein_coding
20 ENSG00000256069 transcribed_unprocessed_pseudogene

## but what if you have versioned IDs?
> z <- getBM("ensembl_gene_id_version", "ensembl_gene_id", ids, mart)
> head(z)
  ensembl_gene_id_version
1      ENSG00000090861.17
2      ENSG00000107331.18
3      ENSG00000114771.14
4      ENSG00000121410.12
5      ENSG00000127837.10
6      ENSG00000129673.10

## try using those, but don't tell biomaRt about it...
> getBM(c("ensembl_gene_id","gene_biotype"), "ensembl_gene_id", z[,1], mart)
[1] ensembl_gene_id gene_biotype   
<0 rows> (or 0-length row.names)

ADD COMMENT • link 14 months ago James W. MacDonald 65k

0

Entering edit mode

Thanks for the reply James! I appreciate it. See the head below for the csv I am trying to annotate wit the gene_biotype. They are Ensembl version ID's. once I run biomaRt, and I do get the gene_biotypes back, how do I add them as a column to my csv file?

 > head(cts)
                   UHRR.TK.A1 UHRR.TK.A2 UHRR.TK.A3 UHRR.TK.B1 UHRR.TK.B2 UHRR.TK.B3 UHRR.TK.C1 UHRR.TK.C2 UHRR.TK.C3
ENSG00000000003.15       2702       2602       2598       2724       3300       2085       3309       2739       2222
ENSG00000000005.6          51         99         47         32         26         10         19         62         28
ENSG00000000419.14       3136       3678       3337       3555       4155       3281       3610       3831       3291
ENSG00000000457.14       1351       1365       1466       1291       1475       1318       1412       1516       1281
ENSG00000000460.17       2106       1997       1914       2045       2517       1760       2611       2279       1824
ENSG00000000938.13        392        449        391        548        399        365        532        439        363
                   UHRR.IL.A1 UHRR.IL.A2 UHRR.IL.A3 UHRR.IL.B1 UHRR.IL.B2 UHRR.IL.B3 UHRR.IL.C1 UHRR.IL.C2 UHRR.IL.C3
ENSG00000000003.15       1850       1932       1923       2091       2251       2628       1791       1745       2108
ENSG00000000005.6          28         16         22         21         36         27         34         18         31
ENSG00000000419.14       4642       4377       3833       5324       4884       5469       4609       3853       4525
ENSG00000000457.14       1171       1196       1124       1338       1465       1476       1078       1083       1286
ENSG00000000460.17       3217       3141       3044       3712       3628       4094       3084       2816       3305
ENSG00000000938.13        468        568        515        577        619        734        473        493        606

ADD REPLY • link 14 months ago Dr. • 0

0

Entering edit mode

That step is simple R processing. Here's an example using fake data.

## some fake data
> fakeo
                           S1         S2         S3           S4         S5
ENSG00000121410.12 -0.4761319  0.5840037 -0.2875571 -0.258449539  0.2065031
ENSG00000156006.5  -1.6718746 -0.3217945  1.2565317 -0.004544629  0.6150507
ENSG00000171428.15  1.1753496  0.6505702 -0.6345850 -0.949316097 -0.9424679
ENSG00000175899.15 -0.8521976  0.3984347  1.7782462 -2.247601988 -0.3682978
ENSG00000196136.18  1.3845189 -0.6521241 -1.5654971  0.097865547  0.8410233
ENSG00000256069.8  -0.5633037 -0.4354026 -0.3113057 -0.432819907 -1.7767897
                           S6          S7          S8          S9          S10
ENSG00000121410.12  2.3968527  0.68541300  1.14910248  0.03592422  0.441716581
ENSG00000156006.5  -0.1770464 -0.58921054 -1.53434867  0.72832514  0.004807051
ENSG00000171428.15 -1.5861187 -1.55417797 -0.05418144 -2.36742024 -0.125530406
ENSG00000175899.15 -1.4260440 -0.04661739  0.02889575  0.29666626  1.109700005
ENSG00000196136.18  0.7343172  0.35358871 -2.31499018 -1.04461456  0.776786110
ENSG00000256069.8   0.7637215  1.70031419  0.67200664 -1.07176537 -1.244707978

> mart <- useEnsembl("ensembl", "hsapiens_gene_ensembl")
> newdat <- getBM(c("ensembl_gene_id_version","gene_biotype"), "ensembl_gene_id_version", row.names(fakeo), mart)
## add a column of NA
## we do this in case there are genes that don't return a biotype
> fakeo$Gene_biotype <- NA
## match the new data to the existing data
> fakeo$Gene_biotype[match(row.names(fakeo), newdat[,1])] <- newdat[,2]
> fakeo
                           S1         S2         S3           S4         S5
ENSG00000121410.12 -0.4761319  0.5840037 -0.2875571 -0.258449539  0.2065031
ENSG00000156006.5  -1.6718746 -0.3217945  1.2565317 -0.004544629  0.6150507
ENSG00000171428.15  1.1753496  0.6505702 -0.6345850 -0.949316097 -0.9424679
ENSG00000175899.15 -0.8521976  0.3984347  1.7782462 -2.247601988 -0.3682978
ENSG00000196136.18  1.3845189 -0.6521241 -1.5654971  0.097865547  0.8410233
ENSG00000256069.8  -0.5633037 -0.4354026 -0.3113057 -0.432819907 -1.7767897
                           S6          S7          S8          S9          S10
ENSG00000121410.12  2.3968527  0.68541300  1.14910248  0.03592422  0.441716581
ENSG00000156006.5  -0.1770464 -0.58921054 -1.53434867  0.72832514  0.004807051
ENSG00000171428.15 -1.5861187 -1.55417797 -0.05418144 -2.36742024 -0.125530406
ENSG00000175899.15 -1.4260440 -0.04661739  0.02889575  0.29666626  1.109700005
ENSG00000196136.18  0.7343172  0.35358871 -2.31499018 -1.04461456  0.776786110
ENSG00000256069.8   0.7637215  1.70031419  0.67200664 -1.07176537 -1.244707978
                                         Gene_biotype
ENSG00000121410.12                     protein_coding
ENSG00000156006.5                      protein_coding
ENSG00000171428.15                     protein_coding
ENSG00000175899.15                     protein_coding
ENSG00000196136.18                     protein_coding
ENSG00000256069.8  transcribed_unprocessed_pseudogene