I should preface this post by saying that I am brand-new to computer science and bioinformatics, let alone R and DESeq2.
I am having issues with R removing the gene ID names in my countData. My raw countData appears as follows:
2416X12 2416X13 2416X10
ENSOARG00000000001 0 0 0
ENSOARG00000000002 319 30 524
ENSOARG00000000003 0 0 0
From here, I read in the sampleData (colData) and then have to subset the countData so that its columns line up with the rows of the colData. This is that code, along with the resulting countData:
> samplesRemoved <- c('24176X2','24176X8','24176X25','24176X29','24176X32')
> sampleData <- sampleData[!(sampleData$genomicID %in% samplesRemoved), ]
> countData <- countData[, as.character(sampleData$genomicID)]
> sampleData$genomicID == colnames(countData)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> head(countData)
2416X12 2416X13 2416X10
1 0 0 0
2 319 30 524
3 0 0 0
As you can see, the gene names have been removed, replaced by single-digit numbers. My supervisor says these are placeholders.
My issue is that when I create the dds objects, there are no gene names present - it is just the "placeholders". This naturally causes issues when I try to annotate my DESeq results using biomaRt.
To summarize:
- My countData initially has the gene names; upon removing samples with less than 60% uniquely-mapped reads, the gene names are removed and replaced with single-digit numbers.
- I can still create dds objects, run DESeq on these objects, and receive results of the differential expression, but there is no gene name present in my results, just those aforementioned placeholders.
- Because there are no gene names in my DESeq results, I am unable to annotate my data using biomaRt.
- Therefore, I am looking to re-add the gene names, or to fix whatever is causing R to remove them from my countData or DESeq objects.
I appreciate any insight or information that can be provided.
Thank you!
It would help us test the code if you post the result of
head(sampleData)
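It can also help to look at rownames() directly rather than relying on the printed table. Below is a minimal sketch on toy data showing one common way gene IDs end up as 1, 2, 3: reading a CSV without telling R which column holds the row names. The file, the object names (toy, no_ids, with_ids) and the helper has_default_rownames() are made up for illustration; this is an assumption about a possible cause, not a diagnosis of your script.

```r
# Toy count table; gene IDs and counts are illustrative only.
toy <- data.frame(
  gene = c("ENSOARG00000000001", "ENSOARG00000000002", "ENSOARG00000000003"),
  `2416X12` = c(0, 319, 0),
  `2416X13` = c(0, 30, 0),
  `2416X10` = c(0, 524, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Without row.names, the gene IDs are read as an ordinary column and
# the row names default to the sequence 1, 2, 3, ...
no_ids <- read.csv(tmp, check.names = FALSE)
rownames(no_ids)

# With row.names = 1, the first column is used as the row names.
with_ids <- read.csv(tmp, row.names = 1, check.names = FALSE)
rownames(with_ids)

# Selecting several columns of a plain data.frame keeps the row names.
subset_ids <- with_ids[, c("2416X12", "2416X10")]
rownames(subset_ids)

# Hypothetical helper: TRUE when the row names are just the default 1..n.
has_default_rownames <- function(x) {
  identical(rownames(x), as.character(seq_len(nrow(x))))
}
has_default_rownames(no_ids)
has_default_rownames(with_ids)
```

Calling has_default_rownames(countData) after each step that modifies countData is one way to find the line where the gene IDs disappear.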
I think the row names are lost earlier in the script (before the code you have posted). Go back up through the script using
head(countData)
every time you edit countData to see the exact line where you lose the row names. Usual mistakes: Did you import the count data and sample data correctly? At any point do you use tibbles (which automatically drop row names)? Usually I have my counts and metadata in CSV files, so I use: