Question

Deseq to output continous sequencing data

0

Entering edit mode

Neha • 0

@aeb6d1e5

Last seen 7 months ago

United States

Hi,

I am new to DESeq and am wondering if I can input my raw RNAseq count data, perform all the normalizations and pre-processing as per DeSeq, but not run the negative-binomial GLM. I'm interested in instead outputting the continuous RNAseq data (that accounts for e.g., the library size, composition differences, dispersion etc) and running a different regression model that's not a GLM. I have come across numerous posts on using VST in context of DeSeq-WGCNA which are a similar idea but I have a few questions:

1. Would the following steps be sound and sufficient to get normalized, pre-processed, continuous RNAseq data?

# Step 1. Save model in deseq
 dds_adj <- DESeqDataSetFromMatrix(countData = RNA,
                                  colData = design_formula,
                                  design = ~ covar1 + as.character(covar2) + condition)  

# Step 2. Pre-filter low count RNAs
dds_adj <- estimateSizeFactors(dds_adj)
smallestGroupSize <- ncol(dds_adj)*0.10                                        # 10% of total sample size (N)
keep_1 <- rowSums(counts(dds_adj, normalized=TRUE) >= 1 ) >= smallestGroupSize # cut off is 1 cpm here
dds_adj <- dds_adj[keep_1,]

# Step 3. VST the data to take care of 1. estimating and normalizing for size factor (library composition) and 2. dispersion
vst<-  varianceStabilizingTransformation(dds_adj, blind = FALSE)
norm.data <- assay(vst)

I would like to explore different conditions (e.g., continous and categorical age) but when I consider them in Step1, each leads a slightly different norm.data. I'd like a single, final norm.data. By reading the WGCNA issues, my understanding is that setting blind = T in vst() or setting the design = ~1 could help but that comes with repurcussions. Would it be appropriate to design = ~ covar1 + as.character(covar2) and ignoring the main condition of interest altogether and why? Also, by using a design that includes covariates, it doesn't mean that the RNAseq data is now adjusted for these covariates and I still do need to control for them downstream, right?

Thank you so much!

DESeq2 • 496 views

ADD COMMENT • link updated 8 months ago by Michael Love 43k • written 8 months ago by Neha • 0

score 0 · Answer 1 · 2024-10-28

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 14 days ago

United States

This what VST is for? Have you see the rnaseqGene workflow?

ADD COMMENT • link 8 months ago Michael Love 43k