Deleted:Removing Ig* and H2-* genes and counts from txi after DESeqDataSetFromTximport() in a DESeq2 analysis prior to DESeq() and results()?
Pratik Mehta ▴ 10
@0512b16f
Last seen 1 day ago
United States

Hey Bioconductor Community,

So I have a bulk RNA-seq dataset with 4 groups, used for 7 different comparisons in various combinations. No complex designs, just simple designs like this: ~ condition (so, x vs y). I began with nf-core/rnaseq using star_salmon, then tximport into DESeq2.

In DESeq2 now, when running the comparisons that are of high interest to me, a lot of immunoglobulin genes are marked as significant. They pretty much dominate the DEG table. I want to see what else is there besides those. So I went back a little upstream, to right after txi <- tximport(...) and dds <- DESeqDataSetFromTximport(txi,..) and right before DESeq() and results(). (Note: This study is in mouse.)

So I did something like this to subset out the Ig*'s:

txi <- tximport(...)
dds <- DESeqDataSetFromTximport(txi,..)

dds <- dds[grep(x = rowData(dds)$mgi_symbols, pattern = "Ig",invert = TRUE),] #added this

DESeq(...)
results(...)

Results looked good in that the immunoglobulins weren't flooding the significant genes anymore... but then the mouse equivalents of the HLA genes, the H2-* genes, were. So I went back upstream and did it like this:

txi <- tximport(...)
dds <- DESeqDataSetFromTximport(txi,..)

dds <- dds[grep(x = rowData(dds)$mgi_symbols, pattern = "Ig",invert = TRUE),] #added this
dds <- dds[grep(x = rowData(dds)$mgi_symbols, pattern = "H2-",invert = TRUE),] #and this

DESeq(...)
results(...)
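One caveat with the patterns above: an unanchored "Ig" also matches non-immunoglobulin symbols such as Igf1 or Igfbp3. Below is a minimal sketch in base R of anchored patterns, using a made-up symbol vector in place of rowData(dds)$mgi_symbols. The pattern "^Ig[hkl]" is an assumption about how the heavy, kappa and lambda immunoglobulin loci are named, not a curated gene list.

```r
# Toy symbols standing in for rowData(dds)$mgi_symbols (hypothetical values)
symbols <- c("Ighm", "Igkc", "Iglc2", "Igf1", "Igfbp3",
             "H2-Aa", "H2-K1", "Actb", NA)

# Unanchored pattern, as in the original filter: matches anything containing "Ig"
broad <- grepl("Ig", symbols)

# Anchored patterns (assumed naming scheme): Igh*, Igk*, Igl* and H2-*
ig  <- grepl("^Ig[hkl]", symbols)
mhc <- grepl("^H2-", symbols)

# grepl() returns FALSE for NA, so genes without a symbol are kept
keep <- !(ig | mhc)

# Symbols the unanchored pattern would have removed as collateral
collateral <- symbols[broad & !ig]
print(collateral)

# Symbols that survive the anchored filter
print(symbols[keep])
```

The same logical vector could then index the object, as in dds <- dds[keep, ]. The anchored pattern would still catch symbols such as Igll1, so it is worth checking the dropped list by eye. If the annotation carries an Ensembl gene biotype column, filtering on the IG_* biotypes may be more robust than matching symbols.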

Results look great now... Really they do. But now I am wondering if someone could tell me whether what I have done is "okay"? I understand that I will need to report the original findings and the stages of subsetting, probably in the supplemental, along with the goodies we have found now, so that our results can be reproduced.

But the heart of my question is: Did I do this type of filtering at the correct stage of the analysis?

I have been reading some posts here and at Biostars that say strictly not to do this. And there's, I think, one post where someone did something like the subsetting I have done and was completely fine with it. And then I think Gordon Smyth talks about the details of doing this in edgeR before or after normalization... In one other post I read, Gordon Smyth was saying that with edgeR he uses RefSeq to bypass some of this even further upstream. I have chosen to do this analysis with DESeq2 because I have become familiar with it, and have also become familiar with using Gencode and Ensembl. I am kind of hoping to stay with these for now. But I would really appreciate it if someone could help with the heart of my question above, please?
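One way to see how much the stage of filtering matters for normalization is to compare size factors computed with and without the removed genes. This is a minimal sketch in base R on made-up counts: it writes out the median-of-ratios calculation by hand, ignores the transcript-length offsets that tximport supplies, and does not stand in for what DESeq() does on real data.

```r
# Hand-written median-of-ratios size factors (illustration only)
size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  # Genes with a zero count in any sample have a -Inf mean and are skipped
  usable <- is.finite(log_geo_means)
  apply(log_counts[usable, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[usable])))
}

# Toy counts (hypothetical): five stable genes, two strongly changing Ig genes
counts <- matrix(
  c(100, 200,
     50, 100,
     80, 160,
    200, 400,
     40,  80,
     10, 5000,
     20, 8000),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Actb", "Gapdh", "Tubb5", "Eef1a1", "Rpl13a", "Ighm", "Igkc"),
    c("x", "y")
  )
)

# Size factors on all genes versus after dropping the Ig rows
keep <- !grepl("^Ig[hkl]", rownames(counts))
sf_all      <- size_factors(counts)
sf_filtered <- size_factors(counts[keep, ])

print(sf_all)
print(sf_filtered)
```

In this toy case the two sets of size factors agree, because the median is not moved by a small minority of extreme genes. On real data the dispersion estimates and the multiple-testing adjustment also depend on which genes are present, so the comparison is only a partial check.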

DESeq2 tximport • 171 views