Different Expression Analysis with EDASeq,DESeq2,edgeR,RUVSeq
1
0
Entering edit mode
Yang WY • 0
@yang-wy-21745
Last seen 4.6 years ago
China Beijing

Hi,

This is my first post here and I'm new to RNA-Seq data analysis, especially different expression analysis (DEA).

I have read some surveys about "the best practise in RNA-Seq analysis". As they said, when working on DEA, first we should normalize the raw counts on GC content, batch effects, library size, other unwanted tecnical variances (using RUVSeq), etc, then we could use DEA algorithms to estimate dispersion, fit the model and calculate LogFC and P-adjust.

What make me confused are:

1、I want to do GC content normalization before DESeq2 or EdgeR pipeline, Here's my source using DESeq2:

 counts_GC <- withinLaneNormalization(raw_count_matrix, "GC", which="full", offset=T)
 counts_DESeq<-DESeqDataSetFromMatrix(countData = counts_GC@assayData$normalizedCounts, colData = 
                                      myfactor,design=~myfactor)
 DESeq2_DEA<-DESeq(counts_DESeq,parallel = T)
 res<-results(DESeq2_DEA)

As you see, I extracted the normalized counts created by withinLaneNormalization() and put it in DESeq2 pipeline, the same I did in edgeR.

While, the user's guide suggest that both DESeq2 and edgeR expect un-normalized counts, if user need EDASeq or cqn normalization before, they should set "offset" like:

EDASeqNormFactors<-exp(-EDASeqoffset)  #DESeq2
disp<-estimateDisp(counts(dataOffset),design, offset=-offst(dataOffset))   #edgeR

Does it means I can only use "offset" to pass EDASeq's results? Is my former source code incorrect?

2、 When I look for some examples using EDASeq+DESeq2 or edgeR, like https://support.bioconductor.org/p/108765/ and https://support.bioconductor.org/p/113405/ , both of them use "offset" generated from EDASeq as normalization factor, I notice that they didn't use the normalize method in classical pipeline such as Median in DESeq2 (as part of DESeq() do ) or TMM in edgeR (as calcNormFactors() do) anymore. If I use "offset", can Median or TMM works? I wrote a new source below, Is it the only way to normalize on GC content and library size before DESeq2 or EdgeR pipeline?

library(DESeq2)
# data is data from EDAseq example
dataOffset <- withinLaneNormalization(data,"gc",which="full",offset=TRUE)
dataOffset <- betweenLaneNormalization(dataOffset,which="full",offset=TRUE)


dds <- DESeqDataSetFromMatrix(countData = dataOffset@assayData$normalizedCounts,
                              colData = pData(dataOffset),
                              design = ~ conditions)


EDASeqNormFactors <- exp(-1 * offst(dataOffset))
normalizationFactors(dds) <- EDASeqNormFactors
# dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds)
res <- res[order(res$pvalue),]
head(res)

3、According to the user's guide of RUVSeq, I first find "Empirical control genes" by edgeR and run RUVg(), I noticed that in the guide, the "set" in RUVg() has been normalized before:

set <- betweenLaneNormalization(set, which="upper")

,after finding "Empirical control genes", it goes:

set2 <- RUVg(set, empirical, k=1)

It treated the normalizaed "set" as input of RUVg(). Can I run RUVg without set normalization? Will it have a bad impact on results?

I appreciate your help,

Yang

deseq2 normalization RUVg edger • 1.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

My current recommendation for DESeq2 is to use Salmon with GC bias correction followed by tximport and DESeqDataSetFromTximport.

This pipeline corrects for a number of important biases including sample-specific GC bias and changes to gene length from any potential differential isoform usage (if this occurs).

ADD COMMENT

Login before adding your answer.

Traffic: 564 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6