RNA sequencing raw counts gene filtering
kcarey • 0
@b626d890
United States

Hello All,

I am preparing to run normalization and differential expression analysis on my RNA sequencing raw count data. Before running limma for batch correction (16 tissue sites were used for data collection) and DESeq2 (normalization and DE analysis), I wanted to perform a gene filtering step. I started with ~60,000 genes and have already removed genes with zero counts in 50% or more of the samples, which lowered the number of genes to 35,458. This is still a bit high. Downstream, I will be running DESeq2 and WGCNA, and I want to make sure the genes I keep are robust, but I am not confident about the best approach for further filtering.

I have visualized the data with a histogram, which shows a bimodal distribution, and with a PCA plot. Can you offer any suggestions for filtering? Is this mainly a technical or a biological decision? On the technical side, I was told that a bimodal distribution is typical for RNA sequencing data and that the left bump corresponds to noise. Biologically, however, when I pulled out some genes and looked at boxplots of their expression across samples, the expression made sense given my subtype grouping.

Are there any suggestions for filtering? I have seen people use this before DESeq2:

    # pre-filtering: removing rows with low gene counts

    # Calculate total read counts per gene
    total_counts <- rowSums(counts(dds))

    # Filter genes with at least 10 total read counts
    dds_filtered <- dds[total_counts >= 10, ]

However, this seems arbitrary and not data-specific, and I am not sure how to find a suitable threshold in the literature. I am using high-grade serous ovarian cancer data. After I filter, I plan to batch correct with limma before DESeq2.

Any suggestions will be great!

DESeq2 limma RNAseq • 1.4k views
ATpoint ★ 4.6k
@atpoint-13662
Germany

The vignettes of both limma and DESeq2 have recommendations for prefiltering. Please read them:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pre-filtering

https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf (Section 15.3)
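As a rough illustration of the kind of rule the DESeq2 vignette describes (to my reading, keep genes with a count of at least 10 in at least as many samples as the smallest group, rather than thresholding the total across all samples), here is a minimal base R sketch. The toy matrix, the `group` factor, and all variable names are made up for illustration; with a real `DESeqDataSet`, `counts(dds)` would take the place of `counts_mat`. The limma guide, as far as I recall, relies on `filterByExpr()` from edgeR for the same purpose.

```r
# Toy count matrix: 4 genes x 6 samples (hypothetical values for illustration).
counts_mat <- matrix(
  c( 0,  0,  1,  0,  2,  0,   # geneA: near zero everywhere
    50, 60, 55,  0,  1,  0,   # geneB: expressed in one group only
    12, 15, 11, 20, 18, 14,   # geneC: expressed in every sample
    30,  0,  0,  0,  0,  0),  # geneD: a single high sample
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6))
)

# Hypothetical grouping (e.g. subtype); the smallest group sets the sample threshold.
group <- factor(c("A", "A", "A", "B", "B", "B"))
smallest_group_size <- min(table(group))

# Keep genes with >= 10 counts in at least `smallest_group_size` samples.
keep <- rowSums(counts_mat >= 10) >= smallest_group_size
filtered <- counts_mat[keep, , drop = FALSE]

# For comparison: the total-count rule from the question.
keep_total <- rowSums(counts_mat) >= 10

print(rownames(filtered))
print(names(which(keep_total)))
```

In this toy case the per-sample rule drops geneD, whose total comes from a single sample, while the total-count rule would keep it. Note that neither rule looks at which group a sample belongs to beyond the group size, so the filter stays independent of the test being run.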

> I plan to batch correct with limma before DESeq2.

No, don't. DESeq2 expects raw integer counts, and batch-corrected values are no longer counts. Read the manuals of the tools you use first. Don't reinvent the wheel.
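A minimal sketch of the usual alternative, assuming the batch variable is known: put the batch term in the DESeq2 design formula and leave the counts alone, then apply `limma::removeBatchEffect()` only to transformed values meant for visualization or similar downstream use. The simulated data, the `site` column, and its three levels are hypothetical stand-ins, not the poster's data.

```r
library(DESeq2)

# Simulated counts stand in for real data (1000 genes x 12 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical batch variable: three collection sites, balanced across conditions.
dds$site <- factor(rep(c("site1", "site2", "site3"), times = 4))

# Batch goes into the design; the counts themselves stay untouched integers.
design(dds) <- ~ site + condition
dds <- DESeq(dds)
res <- results(dds)  # condition effect, adjusted for site

# For plots or clustering only: transform first, then remove the batch effect
# from the transformed values while protecting the condition effect.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mm <- model.matrix(~ condition, colData(vsd))
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$site, design = mm)
```

The batch-adjusted `assay(vsd)` matrix is the sort of input one might pass to PCA or a network method such as WGCNA; it should not be fed back into `DESeq()`.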

Thank you for pointing me in the right direction. I was overthinking it.

Kaylin
