Question: DESeq2 baseMean from DiffBind summarizedExperiment
0
8 months ago by
rbronste60
rbronste60 wrote:

So I am using DiffBind to create a count matrix as summarizedExperiment and then running this through DESeq2 and I was wondering about ways to filter for specific baseMean ranges within this data? My script looks like the following:

library("DESeq2")
library("ggplot2")
library("BiocParallel")
library("DiffBind")

parallel=TRUE
BPPARAM=MulticoreParam(4)

# Create a dba.count object, important that dba.count be unnormalized (DBA_SCORE_READS).

# Retrieve counts from dba.count

nrows <- 1095036
ncols <- 12
counts <- matrix(runif(nrows * ncols, 1, 1e5), nrows)
rowRanges<-GRanges(rangedCounts)

sampleNames<-c("MMV1",    "MMV2",    "MMV3",    "FMV1",    "FMV2",    "FMV3", "MME7", "MME8", "MME9", "FME1", "FME2", "FME3")
sampleSex<-c("male", "male", "male", "female", "female", "female", "male", "male", "male", "female", "female", "female")
colData<-data.frame(sampleName=sampleNames, sex=sampleSex)

# Retrieve counts from dba.count without the site interval ranges

counts <- as.matrix(mcols(rangedCounts))

# Construct a SummarizedExperiment

se<-SummarizedExperiment(assays=list(counts=counts),rowRanges=rowRanges, colData=colData)

# DESeq2 analysis

table(sampleSex)

dds<-DESeqDataSet(se, design= ~sex)

dds$group <- factor(paste0(dds$sex))

design(dds) <- ~0 + group

dds <- dds[ rowSums(counts(dds)) > 1, ]

dds <- DESeq(dds, betaPrior = FALSE)
written 8 months ago by rbronste60
Answer: C: DESeq2 baseMean from DiffBind summarizedExperiment
1
8 months ago by
Michael Love23k
United States
Michael Love23k wrote:

I don’t follow the question. You want to pre-filter the dds based on the mean of counts? You can pre-filter the object using dds[keep,] where keep may be defined as rows with mean of normalized count greater than some value or rows where x or more samples have a count greater than or equal to y. This is up to you.

Yes actually that answers part of it, however an even deeper and perhaps slightly naive question is how the baseMean called by DESeq2 relates to the original counts and how does that change with normalization within DESeq2 - trying to determine if the problem is with the stringency of called peaks by MACS2 or something happening to during the differential peak calling. An example being that a lot of shared peaks among samples may have a high baseMean such as those at promoters with the differential peaks having low baseMeans relatively - therefore the overall change in the differential peaks is small. Thanks again!

1

Take a look here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#more-information-on-results-columns

Yes this is quite helpful however I guess I am still a bit confused as what baseMean represents in an RNA-seq experiment vs the experiment I am indicating with the code above? The binding matrix in question above is made from ATAC-seq data. Thanks.

1

The column that DESeq2 creates called “baseMean” is rowMeans(counts(dds, normalized=TRUE)) regardless of the type of input data.

Sure of course, I just mean that baseMean in gene expression data represents to some approximation the transcript abundance however in ChIP or ATAC data merely represents the fragment pileup at a specific site, unless I am incorrect?

1

I'll leave it to the DiffBind package authors I suppose

1

In this case, where the count score is DBA_SCORE_READS, the binding matrix does not have pileup scores, but rather the total number of reads that align overlapping each (consensus) peak. This is similar to what you get in RNA data, where the expression matrix has the raw number of reads that overlap the gene (exons), only in the ATAC case, instead of transcripts, they are overlapping open chromatin regions.