Question

R is Removing the Gene Names in My Count Data

thatolivebranch • 0

I should preface this post by saying that I am brand-new to computer science and bioinformatics, let alone R and DESeq2.

I am having issues with R removing the gene ID names in my countData. My raw countData appears as follows:

                   2416X12     2416X13     2416X10
ENSOARG00000000001   0          0          0
ENSOARG00000000002   319        30         524
ENSOARG00000000003   0          0          0

From here, I read in the sampleData (colData) and then have to edit the countData so that its columns line up with the rows of the colData. This is that code, along with the resulting countData:

> samplesRemoved <- c('24176X2','24176X8','24176X25','24176X29','24176X32')
> sampleData <- sampleData[!(sampleData$genomicID %in% samplesRemoved), ]
> countData <- countData[, as.character(sampleData$genomicID)]
> sampleData$genomicID == colnames(countData)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> head(countData)

   2416X12     2416X13     2416X10
1    0          0          0
2    319        30         524
3    0          0          0

As you can see, the gene names have been removed, replaced by single-digit numbers. My supervisor says these are placeholders.

My issue is that when I create the dds objects, there are no gene names present - it is just the "placeholders". This naturally causes issues when I try to annotate my DESeq results using biomaRt.

To summarize:

My countData has the gene names; upon removing samples with less than 60% uniquely mapped reads, the gene names are removed and replaced with row numbers.
I can still create dds objects, run DESeq on these objects, and receive results of the differential expression, but there is no gene name present in my results, just those aforementioned placeholders.
Because there are no gene names in my DESeq results, I am unable to annotate my data using biomaRt.
Therefore, I am looking to re-add or "fix" the issue causing R to remove my gene names from my countData or DESeq objects.

I appreciate any insight or information that can be provided.

Thank you!


# check.names = FALSE keeps the sample IDs as-is instead of prefixing them with "X"
df <- data.frame("2416X12" = c(0, 319, 0), "2416X13" = c(0, 30, 0), "2416X10" = c(0, 524, 0), check.names = FALSE)
rownames(df) <- c("ENSOARG00000000001", "ENSOARG00000000002", "ENSOARG00000000003")
df
                   2416X12 2416X13 2416X10
ENSOARG00000000001       0       0       0
ENSOARG00000000002     319      30     524
ENSOARG00000000003       0       0       0

samplesRemoved <- c('24176X2','24176X8','24176X25','24176X29','24176X32')
sampleData <- sampleData[!(sampleData$genomicID %in% samplesRemoved), ]
Error: object 'sampleData' not found

It would help us test the code if you posted the result of head(sampleData).

I think the row names are lost earlier in the script (before the code you have posted). Go back up through the script and run head(countData) after every line that edits countData to find the exact line where you lose the row names. The usual mistakes: Did you import the count data and sample data correctly? At any point, do you use tibbles (which automatically drop row names)?
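To make that line-by-line check less tedious, a small test can be run after each step. This is only a sketch: `has_default_rownames` is a made-up helper name (it is not part of base R or DESeq2), and the toy matrix stands in for the real countData.

```r
# Hypothetical helper (not part of base R or DESeq2): TRUE when an object has no
# row names, or only the automatic "1", "2", ..., n sequence.
has_default_rownames <- function(x) {
  rn <- rownames(x)
  is.null(rn) || identical(rn, as.character(seq_len(nrow(x))))
}

# Toy count matrix with gene IDs as row names (values are illustrative).
counts <- matrix(
  c(0, 319, 0, 0, 30, 0),
  nrow = 3,
  dimnames = list(
    c("ENSOARG00000000001", "ENSOARG00000000002", "ENSOARG00000000003"),
    c("2416X12", "2416X13")
  )
)
has_default_rownames(counts)   # FALSE: the gene IDs are still there

# A step that rebuilds the object without carrying the IDs over loses them.
rebuilt <- data.frame(unname(counts))
has_default_rownames(rebuilt)  # TRUE: only 1, 2, 3 remain
```

Calling the helper on countData after each edit should point to the first line where it flips from FALSE to TRUE.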

Usually I have my counts and metadata in CSV files, so I use:

metadata <- read.csv("path/to/metadata.csv", stringsAsFactors = TRUE, row.names = "sample")
counts <- as.matrix(round(read.csv("path/to/counts.csv", header = TRUE, check.names = FALSE, row.names = 1)))
counts <- counts[, rownames(metadata)] # only keep columns in counts that are rows in metadata
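One import mistake that produces exactly the 1, 2, 3 row names from the question is reading the counts without `row.names`, so the gene IDs end up in an ordinary column. The sketch below assumes that layout; the `gene_id` column name and the inlined table are made up so the example runs without a file, and this may not be what happened in the original script.

```r
# Hypothetical three-gene count table, inlined so the example runs without a file.
csv_text <- "gene_id,2416X12,2416X13,2416X10
ENSOARG00000000001,0,0,0
ENSOARG00000000002,319,30,524
ENSOARG00000000003,0,0,0"

samples <- c("2416X12", "2416X13", "2416X10")

# Without row.names, the gene IDs stay in the gene_id column and the row names
# default to 1, 2, 3. Selecting only the sample columns then drops the IDs.
lost <- read.csv(text = csv_text, check.names = FALSE)
lost_counts <- lost[, samples]
rownames(lost_counts)

# With row.names = 1, the first column becomes the row names and survives
# the column subsetting.
kept <- read.csv(text = csv_text, check.names = FALSE, row.names = 1)
kept_counts <- kept[, samples]
rownames(kept_counts)
```

If the data were read the first way, the IDs can still be recovered from the ID column before it is dropped, for example with `rownames(lost_counts) <- lost$gene_id`.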

Reply by BioinfGuru ▴ 70, 15 months ago

score 0 · Answer 1 · 2024-08-14

Simple subsetting of matrix or data.frame objects won't remove the rownames of that object, so you must somehow be doing that yourself. As an example:

> mat <- matrix(rnorm(100), 10, dimnames = list(letters[1:10], LETTERS[1:10]))
> mat
            A           B           C           D          E            F           G           H          I
a -0.82256365 -0.37198076  0.07137378 -0.71331112  0.6833136 -1.334158319  0.98671641  0.19626838 -2.0007266
b -2.71810772  0.95693101 -0.40648419  0.22544382 -0.8446000 -0.005392807  0.82647472  0.41393284 -0.5892270
c -0.63591072 -1.11432562 -0.10338702 -0.62844455 -0.8710215 -0.556189619 -1.35987664  1.31540058  0.4205688
d -0.46359778  2.58741206  0.59820336 -0.62112069 -0.1307859 -0.233554781 -0.80173782  0.68605100 -0.7928719
e  0.89879645  1.12937029 -0.74531210  0.08488017  0.8476031 -0.735564089 -0.92668446 -0.77919285  1.0992674
f  1.17918168 -0.55807982 -0.72293915  0.10491824 -1.0668308  0.345118895  1.29919497 -0.14056020 -0.9329652
g -0.09452953  0.82752783 -1.39423832  0.13745696 -1.0201883  0.403421337 -0.49858300 -0.04181608  1.0612187
h -0.65160252  0.13314305  1.60116195 -0.30343525 -1.2821592  0.416272876 -0.07024921 -0.69996705 -1.1704220
i  0.74523362  1.35284415 -1.30131747  1.05375189  0.7837923 -0.491782908  1.42802243 -1.38823060 -1.7969556
j -0.82538798 -0.00653038 -0.16957324 -1.58844909  0.6448933 -0.013630793 -0.02089623  0.94204586  0.2263195
            J
a  0.17034655
b -1.65225971
c -0.03031325
d  0.83313149
e  0.33784874
f -0.71377411
g  0.06026062
h -0.75208889
i  0.25610912
j -0.48736031
> mat[,c(1,4,2,3,8)]
            A           D           B           C           H
a -0.82256365 -0.71331112 -0.37198076  0.07137378  0.19626838
b -2.71810772  0.22544382  0.95693101 -0.40648419  0.41393284
c -0.63591072 -0.62844455 -1.11432562 -0.10338702  1.31540058
d -0.46359778 -0.62112069  2.58741206  0.59820336  0.68605100
e  0.89879645  0.08488017  1.12937029 -0.74531210 -0.77919285
f  1.17918168  0.10491824 -0.55807982 -0.72293915 -0.14056020
g -0.09452953  0.13745696  0.82752783 -1.39423832 -0.04181608
h -0.65160252 -0.30343525  0.13314305  1.60116195 -0.69996705
i  0.74523362  1.05375189  1.35284415 -1.30131747 -1.38823060
j -0.82538798 -1.58844909 -0.00653038 -0.16957324  0.94204586

But anyway, let's say R actually is removing the rownames. It's simple enough to save them and re-apply them:

> samplesRemoved <- c('24176X2','24176X8','24176X25','24176X29','24176X32')
> sampleData <- sampleData[!(sampleData$genomicID %in% samplesRemoved), ]
> rn <- row.names(countData)
> countData <- countData[, as.character(sampleData$genomicID)]
> row.names(countData) <- rn
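Since sampleData was never posted, the snippet above cannot be run as-is. Here is a minimal runnable version of the same save-and-re-apply idea on toy data; the extra sample `2416X99`, the `condition` column, and all values are invented for illustration. Note that this only helps if countData still has its gene IDs at the point where `rn` is saved.

```r
# Toy stand-ins for the poster's objects; sample IDs and values are made up.
countData <- data.frame(
  "2416X12" = c(0, 319, 0),
  "2416X13" = c(0, 30, 0),
  "2416X10" = c(0, 524, 0),
  "2416X99" = c(5, 7, 9),
  row.names = c("ENSOARG00000000001", "ENSOARG00000000002", "ENSOARG00000000003"),
  check.names = FALSE
)
sampleData <- data.frame(
  genomicID = c("2416X10", "2416X12", "2416X13", "2416X99"),
  condition = c("control", "treated", "control", "treated")
)
samplesRemoved <- c("2416X99")

# Drop the unwanted samples, then subset and reorder the count columns to match.
sampleData <- sampleData[!(sampleData$genomicID %in% samplesRemoved), ]
rn <- rownames(countData)                      # save the gene IDs
countData <- countData[, as.character(sampleData$genomicID)]
rownames(countData) <- rn                      # re-apply them

# Sanity checks before handing the objects to DESeq2.
stopifnot(identical(colnames(countData), as.character(sampleData$genomicID)))
stopifnot(!is.null(rownames(countData)))
```

On this toy data the re-apply step is a no-op, because the column subsetting keeps the row names anyway, which is the point made at the start of this answer.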