
Filtering counts in SummarizedExperiment

rbronste

Hi, I am making a SummarizedExperiment from a DiffBind dba.peakset in the following way (to use in DESeq2):

library(DiffBind)
library(SummarizedExperiment)  # also attaches GenomicRanges

# Peakset as a GRanges with one count column per sample in mcols()
rangedCounts <- dba.peakset(Adult_count, bRetrieve=TRUE)

# Regions only: granges() drops the per-sample metadata columns
rowRanges <- granges(rangedCounts)

sampleN <- c("MBV1", "MBV2", "FBV1", "FBV2", "MBE7", "MBE8", "FBE1", "FBE2")
sampleS <- c("male", "male", "fem", "fem", "male", "male", "fem", "fem")
sampleT <- c("vehicle", "vehicle", "vehicle", "vehicle", "B", "B", "B", "B")
sampleB <- c("1", "2", "1", "2", "1", "2", "1", "2")
# Distinct column names; a repeated "treatment" would be renamed "treatment.1"
colData <- data.frame(sampleName=sampleN, treatment=sampleT, batch=sampleB, sex=sampleS)

# Per-sample counts as a plain matrix (one column per sample)
counts <- as.matrix(mcols(rangedCounts))

se <- SummarizedExperiment(assays=list(counts=counts), rowRanges=rowRanges, colData=colData)

If I look at the count matrix afterwards, I see something like this:

         MBV1  MBV2  MBV3  FBV1  FBV2  FBV3  MBE7  MBE8  MBE9  FBE1  FBE2  FBE3
  [1,]     1     1     1     1     1     1     1    66     1     1    50    34
  [2,]    11     1     1     1     1     1     6    98     1    11   100     1
  [3,]     1     1     1     1     1     1     1     1     1   116   108     1
  [4,]     1     1    22     2    84     1     1     4     1    64     1    40
  [5,]     1     1    18    74    74     1   102     1   126    22     1     1
  [6,]     1     1     1     1    44     1     1     1   122     1     1     1
  [7,]     1     1     1     1     1     1     1    42     1     1    96     1
  [8,]     1     2   156    20     1    58     1   250   130    62     4   282
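One thing worth checking: the printed matrix has 12 sample columns (MBV1 to FBE3), while sampleN in the code lists 8 names, and SummarizedExperiment() requires exactly one colData row per assay column. Below is a minimal base-R sketch of that check. The placeholder counts only reproduce the printed column names, and dims_match and names_match are illustrative names.

```r
# Placeholder counts: only the 12 column names from the printed output matter here.
sample_cols <- c("MBV1", "MBV2", "MBV3", "FBV1", "FBV2", "FBV3",
                 "MBE7", "MBE8", "MBE9", "FBE1", "FBE2", "FBE3")
counts <- matrix(1L, nrow = 3, ncol = length(sample_cols),
                 dimnames = list(NULL, sample_cols))

# Sample sheet built from the 8 names used in the question's code.
sampleN <- c("MBV1", "MBV2", "FBV1", "FBV2", "MBE7", "MBE8", "FBE1", "FBE2")
colData <- data.frame(sampleName = sampleN)

# One colData row per count column is required; matching names in the same
# order also guards against silently mislabelled samples.
dims_match  <- nrow(colData) == ncol(counts)
names_match <- dims_match && identical(colData$sampleName, colnames(counts))

if (!dims_match) {
  message("colData has ", nrow(colData), " rows but counts has ",
          ncol(counts), " columns")
}
```

If the peakset really contains 12 samples, sampleN, sampleS, sampleT and sampleB each need 12 entries, in the same order as colnames(counts).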

I would like to filter either this matrix or rangedCounts at each position, for example requiring a minimum of 100 for every count in a row, or apply some other manipulation. I know how to use rowSums and rowMeans but am not sure about other kinds of filtering. Please let me know if you can help with this, thanks!
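A minimal sketch of the "every count at least 100" filter, using the eight rows printed above as a toy matrix (base R only; min_count, row_min and filtered are illustrative names). The same logical vector can subset a SummarizedExperiment by rows.

```r
# Counts copied from the eight rows printed in the question.
counts <- matrix(c(
   1, 1,   1,  1,  1,  1,   1,  66,   1,   1,  50,  34,
  11, 1,   1,  1,  1,  1,   6,  98,   1,  11, 100,   1,
   1, 1,   1,  1,  1,  1,   1,   1,   1, 116, 108,   1,
   1, 1,  22,  2, 84,  1,   1,   4,   1,  64,   1,  40,
   1, 1,  18, 74, 74,  1, 102,   1, 126,  22,   1,   1,
   1, 1,   1,  1, 44,  1,   1,   1, 122,   1,   1,   1,
   1, 1,   1,  1,  1,  1,   1,  42,   1,   1,  96,   1,
   1, 2, 156, 20,  1, 58,   1, 250, 130,  62,   4, 282
), nrow = 8, byrow = TRUE,
dimnames = list(NULL, c("MBV1", "MBV2", "MBV3", "FBV1", "FBV2", "FBV3",
                        "MBE7", "MBE8", "MBE9", "FBE1", "FBE2", "FBE3")))

min_count <- 100

# Smallest count across all samples for each region (row).
row_min <- apply(counts, 1, min)

# Keep a region only if every sample reaches min_count.
# Equivalent: rowSums(counts >= min_count) == ncol(counts)
keep <- row_min >= min_count

filtered <- counts[keep, , drop = FALSE]
# With the SummarizedExperiment, the same index subsets rows:
# se_filtered <- se[keep, ]
```

On these rows nothing survives, because every row contains at least one sample with a count of 1.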

diffbind matrix summarizedexperiment rangedCounts

Written 4.6 years ago by rbronste; updated 4.5 years ago by Rory Stark.

James W. MacDonald

This isn't really a Bioconductor question, but instead is a basic 'how do I get R to do things' question. And you seem to want to do one thing, but maybe something else? I mean, do you want to filter to a minimum of 100 (really?) or something else?

Anyway, you can get a long way with simple tests like

z <- rowSums(assay(se) >= 100)

and then filtering on that, depending on how many samples in each genomic region have to have a count of that size.
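Spelling that suggestion out on the same eight printed rows: z counts, for each region, how many samples reach the threshold, and the filter keeps regions where enough samples do. min_samples = 2 is an arbitrary, hypothetical choice; with the SummarizedExperiment, counts would be assay(se) and the subset se[keep, ].

```r
# Counts copied from the eight rows printed in the question.
counts <- matrix(c(
   1, 1,   1,  1,  1,  1,   1,  66,   1,   1,  50,  34,
  11, 1,   1,  1,  1,  1,   6,  98,   1,  11, 100,   1,
   1, 1,   1,  1,  1,  1,   1,   1,   1, 116, 108,   1,
   1, 1,  22,  2, 84,  1,   1,   4,   1,  64,   1,  40,
   1, 1,  18, 74, 74,  1, 102,   1, 126,  22,   1,   1,
   1, 1,   1,  1, 44,  1,   1,   1, 122,   1,   1,   1,
   1, 1,   1,  1,  1,  1,   1,  42,   1,   1,  96,   1,
   1, 2, 156, 20,  1, 58,   1, 250, 130,  62,   4, 282
), nrow = 8, byrow = TRUE,
dimnames = list(NULL, c("MBV1", "MBV2", "MBV3", "FBV1", "FBV2", "FBV3",
                        "MBE7", "MBE8", "MBE9", "FBE1", "FBE2", "FBE3")))

min_count   <- 100  # per-sample count threshold
min_samples <- 2    # hypothetical: samples per region that must reach it

# Number of samples in each region with at least min_count reads.
z <- rowSums(counts >= min_count)

keep <- z >= min_samples
filtered <- counts[keep, , drop = FALSE]
```

Here z is 0, 1, 2, 0, 2, 1, 0, 4, so rows 3, 5 and 8 are kept. Setting min_samples to ncol(counts) gives the stricter every-sample filter.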

Written 4.6 years ago by James W. MacDonald.

rbronste

This isn't really a Bioconductor answer either. I apologize for the vagueness of the question but I think you know what I was asking and why I asked it here. Your response rehashed what I indicated I already understood. Thank you.

Written 4.6 years ago by rbronste.

score 2 · Accepted Answer · 2019-10-22

Rory Stark

You can do this using dba.count() by setting filter=100 and filterFun=min.

You'll end up filtering out most of your peaks: if any single sample has low binding at a site, that site will be removed.
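To see why min is so strict, the sketch below imitates the filter/filterFun rule in base R on the eight rows printed in the question. It assumes dba.count() keeps a site when filterFun applied to that site's per-sample scores is at least filter, and that max is the default filterFun; sites_kept() is a hypothetical helper, not part of DiffBind.

```r
# Counts copied from the eight rows printed in the question.
counts <- matrix(c(
   1, 1,   1,  1,  1,  1,   1,  66,   1,   1,  50,  34,
  11, 1,   1,  1,  1,  1,   6,  98,   1,  11, 100,   1,
   1, 1,   1,  1,  1,  1,   1,   1,   1, 116, 108,   1,
   1, 1,  22,  2, 84,  1,   1,   4,   1,  64,   1,  40,
   1, 1,  18, 74, 74,  1, 102,   1, 126,  22,   1,   1,
   1, 1,   1,  1, 44,  1,   1,   1, 122,   1,   1,   1,
   1, 1,   1,  1,  1,  1,   1,  42,   1,   1,  96,   1,
   1, 2, 156, 20,  1, 58,   1, 250, 130,  62,   4, 282
), nrow = 8, byrow = TRUE)

# Hypothetical helper imitating the assumed rule: a site is kept when
# filterFun(per-sample scores) >= filter. Not DiffBind's own code.
sites_kept <- function(counts, filter, filterFun) {
  sum(apply(counts, 1, filterFun) >= filter)
}

kept_max <- sites_kept(counts, filter = 100, filterFun = max)  # any sample >= 100
kept_min <- sites_kept(counts, filter = 100, filterFun = min)  # every sample >= 100
```

On these rows the max rule keeps 5 of the 8 sites and the min rule keeps none.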

Written 4.5 years ago by Rory Stark.