I want to delete about 9,000 rows (genes) from 27,000. The rationale is that 20 of our samples are "identical" in the sense that they have all been deleted for an essential gene (permitted by a drug pretreatment), i.e. they are biological replicates, and the 9,000 genes show very strong variation between the 20 independently generated samples. We think we can get rid of some of the noise by focusing on just the differences between the controls and the samples. One problem is that the larger table is either the original RNA counts file or the DESeqDataSet, whereas the other table is a results table. So I can't just search for identical rows, but I can use gene_ID to point to the correct rows. I have tried things like %in% and intersect, but no luck (I am guessing because the rows differ beyond the name). I did read the section on filtering reads (and I have already filtered low counts), but couldn't figure out how to filter by comparing one list to another. Any advice is appreciated!
Putting aside the rationale for wanting to remove these genes, there are many ways to do what you're after.
Let's say your DESeqDataSet object is named dds and your "results table" is called res. Take a minute to familiarize yourself with the type of entries stored in the columns of res by taking a quick peek, e.g. with head(res).
dds should have some type of row-level identifiers. You can find this out by looking at the output of head(rownames(dds)). Can you match those identifiers with any of the entries in the columns of res? If you can't, you've got bigger fish to fry, but let's press on ...
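A quick way to check whether two sets of identifiers line up is to count the overlap. Here is a minimal sketch in base R with made-up IDs: dds_ids and res_ids are stand-ins for rownames(dds) and the identifier column of res, and the version-suffix mismatch is only one assumed example of why %in% or intersect can come back empty.

```r
# Hypothetical identifiers standing in for rownames(dds) and an ID column of res
dds_ids <- c("ENSG0001.1", "ENSG0002.3", "ENSG0003.2", "ENSG0004.1")
res_ids <- c("ENSG0002", "ENSG0004", "ENSG0009")

# How many IDs match as-is? None here, because of the ".N" version suffix
n_raw <- sum(dds_ids %in% res_ids)

# Strip surrounding whitespace and the version suffix, then compare again
clean_ids <- sub("\\.[0-9]+$", "", trimws(dds_ids))
n_clean <- sum(clean_ids %in% res_ids)

# IDs in res that have no partner in dds even after cleaning
unmatched <- setdiff(res_ids, clean_ids)

print(c(raw = n_raw, cleaned = n_clean))
print(unmatched)
```

If the raw overlap is zero but the cleaned overlap is not, the identifiers differ only in formatting and can be harmonized before filtering.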
Depending on how you built dds, it should also have a DataFrame of meta information for its rows (genes), which you can see by looking at rowData(dds). Do any of the entries there match the entries in the columns of res?
Once you've identified the column in res that has identifiers you can match to some gene information in dds, get the identifiers from res that you want to remove, store them in axe, and do something like:
dds2 <- dds[!rowData(dds)$some_identifier %in% axe,]
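To see the mechanics in isolation, here is a small self-contained sketch that uses a plain count matrix and a data frame in place of dds and res. The gene_id and padj column names, the cutoff used to pick genes, and all values are invented for illustration; they are not taken from your data.

```r
# Toy count matrix standing in for the counts in dds (rows = genes, columns = samples)
counts <- matrix(1:20, nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Toy results table standing in for res; gene_id and padj are assumed column names
res <- data.frame(gene_id = c("gene2", "gene4", "gene5"),
                  padj    = c(0.001, 0.20, 0.03))

# Identifiers of the genes to drop (here: those flagged in res at an arbitrary cutoff)
axe <- res$gene_id[res$padj < 0.05]

# Keep every row whose identifier is NOT in axe; drop = FALSE keeps the matrix shape
counts2 <- counts[!rownames(counts) %in% axe, , drop = FALSE]

print(rownames(counts2))
```

With a real DESeqDataSet the equivalent line would be dds2 <- dds[!rownames(dds) %in% axe, ], assuming the row names hold the matching identifiers.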
A DESeqDataSet is a SummarizedExperiment (read that vignette if you haven't already), and both can be indexed like a 2D data structure. If you are having problems with the mechanics of subsetting and filtering 2D objects, it would be helpful to run through a couple of R tutorials before you get too frustrated by R basics in your bioinformatics quest.