Hi all, I thought finding out how to do this would be easy, but since I clearly don't know the R code required, I'm here asking for help. I've generated a DESeq2 CSV file containing my normalized counts, with rows labeled by Ensembl ID, and read it into an R data frame called DSNorm (see my code below). First, why am I getting no results back from biomaRt (0 rows)? Is it because my filter object isn't a character vector? If so, how can I extract the first column of DSNorm (my Ensembl IDs) as a character vector for the filter? Second, once the biomaRt part works, how can I insert the returned gene_biotypes into a new column of DSNorm, making sure the rows align? Any insight or suggestions would be helpful. Thanks all!
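For reference, a minimal sketch of pulling the first column of a data frame out as a character vector. The data frame and IDs are made-up placeholders, and the column name `ensembID` is borrowed from the code below. Since `rownames()` of a data frame always returns a character vector, the type of `filter` is probably not what causes the empty result.

```r
# Made-up stand-in for DSNorm: the first column holds (placeholder) Ensembl IDs
DSNorm <- data.frame(
  ensembID = c("ENSG00000000001.5", "ENSG00000000002.2", "ENSG00000000004.1"),
  sample1  = c(10.5, 3.2, 7.7),
  stringsAsFactors = FALSE
)

# [[1]] extracts the first column as a plain vector; as.character() also covers
# the case where read.csv() made it a factor (the default before R 4.0)
filter <- as.character(DSNorm[[1]])

# Row names of a data frame are stored as character too
rownames(DSNorm) <- filter
str(filter)
```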
#load DESeq2
library("DESeq2")
#read in the above data
cts <- read.csv("counts_ILTK.csv", header=TRUE, row.names=1)
coldata <- read.csv("DEG_exp.csv", header=TRUE, row.names=1)
coldata <- coldata[,c("condition","type")]
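# construct the DESeq2 data set
# cts already has the gene IDs as row names (row.names=1), so tidy=TRUE is not used;
# with tidy=TRUE, DESeq2 would take the first remaining (sample) column as the gene IDs
dds1 <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~condition)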
#calculate normalized counts and export
dds <- estimateSizeFactors(dds1)
table_counts_normalized <- counts(dds, normalized=TRUE)
write.csv(table_counts_normalized, "TKvIL_DESeq2_Counts.csv", row.names=TRUE)
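#use biomaRt to get Ensembl biotypes for my genes
#write.csv() gives the row-name column a blank header, which read.csv() renames "X",
#so read it back with row.names=1 to restore the Ensembl IDs as row names
DSNorm <- read.csv("TKvIL_DESeq2_Counts.csv", header=TRUE, row.names=1)
#row names are already a character vector, so they can be used directly as filter values
filter <- rownames(DSNorm)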
#Annotate with gene biotype
library("biomaRt")
mart = useMart("ensembl","hsapiens_gene_ensembl")
getBM(attributes='gene_biotype', filters = 'ensembl_gene_id', values = filter, mart = mart)
> getBM(attributes='gene_biotype', filters = 'ensembl_gene_id', values = filter, mart = mart)
Batch submitting query [========================================================================================>-------] 92% eta: 1s
[1] gene_biotype
<0 rows> (or 0-length row.names)
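If the IDs carry Ensembl version suffixes (for example `ENSG00000000001.5`), a likely reason for the 0 rows is that the `ensembl_gene_id` filter expects unversioned IDs, so none of the values match. A minimal sketch of stripping the suffix, assuming every ID ends in a single `.<digits>` version (the IDs are made-up placeholders). Ensembl's BioMart also offers an `ensembl_gene_id_version` filter, which can be checked with `listFilters(mart)`.

```r
# Made-up versioned Ensembl gene IDs, as they might appear in the counts file
ids <- c("ENSG00000000001.5", "ENSG00000000002.2", "ENSG00000000004.1")

# Remove the trailing ".<version>" so the IDs can match the 'ensembl_gene_id' filter
ids_nov <- sub("\\.[0-9]+$", "", ids)
print(ids_nov)

# With network access, the query could then request the ID alongside the biotype,
# so each result row can be matched back to its gene:
# library("biomaRt")
# mart <- useMart("ensembl", "hsapiens_gene_ensembl")
# bm <- getBM(attributes = c("ensembl_gene_id", "gene_biotype"),
#             filters = "ensembl_gene_id", values = ids_nov, mart = mart)
```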
Thanks for the reply, James! I appreciate it. See the head below of the CSV I'm trying to annotate with gene_biotype. They are Ensembl version IDs. Once I run biomaRt and get the gene_biotypes back, how do I add them as a column to my CSV file?
That step is simple R processing. Here's an example using fake data.
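A minimal sketch of that step with made-up data, using `match()`. The `bm` data frame is a hypothetical stand-in for a `getBM()` result that includes `ensembl_gene_id`; its IDs and biotypes are illustrative only, and the version-stripping regex assumes a single `.<digits>` suffix.

```r
# Hypothetical normalized counts with versioned Ensembl IDs as row names
DSNorm <- data.frame(
  sample1 = c(10.5, 3.2, 7.7),
  sample2 = c(12.1, 0.0, 5.4),
  row.names = c("ENSG00000000001.5", "ENSG00000000002.2", "ENSG00000000004.1")
)

# Stand-in for getBM() output: different row order, and one ID has no hit
bm <- data.frame(
  ensembl_gene_id = c("ENSG00000000004", "ENSG00000000001"),
  gene_biotype    = c("lncRNA", "protein_coding"),
  stringsAsFactors = FALSE
)

# Strip version suffixes so the row names are comparable to biomaRt's IDs
ids_nov <- sub("\\.[0-9]+$", "", rownames(DSNorm))

# match() returns, for each row of DSNorm, the position of its ID in bm (NA if absent),
# so the biotypes line up with DSNorm's rows regardless of bm's order
DSNorm$gene_biotype <- bm$gene_biotype[match(ids_nov, bm$ensembl_gene_id)]
print(DSNorm)
```

Because the lookup is by ID rather than position, the order of the biomaRt output does not matter; if an ID appeared more than once in `bm`, `match()` would use only its first hit. `write.csv()` can then write the annotated table out as before.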
Got it! Thanks!