Question: Filtering counts in SummarizedExperiment
12 days ago by
rbronste60
rbronste60 wrote:

Hi, I am making a SummarizedExperiment from a DiffBind dba.peakset in the following way (to use in DESeq2):

rangedCounts <- dba.peakset(Adult_count, bRetrieve=TRUE)

# Placeholder random counts; overwritten further down by the DiffBind counts
nrows <- 1025488
ncols <- 8
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges<-GRanges(rangedCounts)

sampleN <- c("MBV1", "MBV2", "FBV1", "FBV2", "MBE7", "MBE8", "FBE1", "FBE2")
sampleS <- c("male", "male", "fem", "fem", "male", "male", "fem", "fem")
sampleT <- c("vehicle", "vehicle", "vehicle", "vehicle", "B", "B", "B", "B")
sampleB <- c("1", "2", "1", "2", "1", "2", "1", "2")
# The sex column was originally also named 'treatment', which duplicated a column name
colData <- data.frame(sampleName=sampleN, treatment=sampleT, batch=sampleB, sex=sampleS)

counts <- as.matrix(mcols(rangedCounts))

se<-SummarizedExperiment(assays=list(counts=counts),rowRanges=rowRanges, colData=colData)


If I look at the count matrix afterwards, I see something like this:

         MBV1  MBV2  MBV3  FBV1  FBV2  FBV3  MBE7  MBE8  MBE9  FBE1  FBE2  FBE3
[1,]     1     1     1     1     1     1     1    66     1     1    50    34
[2,]    11     1     1     1     1     1     6    98     1    11   100     1
[3,]     1     1     1     1     1     1     1     1     1   116   108     1
[4,]     1     1    22     2    84     1     1     4     1    64     1    40
[5,]     1     1    18    74    74     1   102     1   126    22     1     1
[6,]     1     1     1     1    44     1     1     1   122     1     1     1
[7,]     1     1     1     1     1     1     1    42     1     1    96     1
[8,]     1     2   156    20     1    58     1   250   130    62     4   282


I would like to take either this matrix or rangedCounts and filter at each position, for example to require a minimum of 100 for every count in the matrix, or to apply some other manipulation. I know how to use rowSums and rowMeans, but I am not sure about other kinds of filtering. Please let me know if you can help with this, thanks!

11 days ago by
United States
James W. MacDonald51k wrote:

This isn't really a Bioconductor question, but instead is a basic 'how do I get R to do things' question. And you seem to want to do one thing, but maybe something else? I mean, do you want to filter to a minimum of 100 (really?) or something else?

Anyway, you can get a long way with simple tests like

z <- rowSums(assay(se) >= 100)



which gives, for each genomic region, the number of samples with a count of at least 100. You can then filter on that, depending on how many of the samples have to have a count of that size.
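
To make that concrete, here is a minimal sketch in base R. The toy matrix and the names keep_any, keep_all, keep_some and min_samples are made up for illustration and stand in for assay(se); the threshold of 100 comes from the question, and the "at least 2 samples" cutoff is only an assumption. The last line covers the other possible reading of the question (raising every count below 100 up to 100), which may or may not be what is wanted.

```r
# Toy count matrix standing in for assay(se); the values are invented
counts <- matrix(
  c(  1,  66,  50,  34,
     11,  98, 100,   1,
    116, 108,   1,   1,
    250, 130, 282, 156),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2", "S3", "S4"))
)

# Per region (row): number of samples with a count of at least 100
z <- rowSums(counts >= 100)

# Three possible filters built on z
keep_any  <- z >= 1               # at least one sample reaches 100
keep_all  <- z == ncol(counts)    # every sample reaches 100
min_samples <- 2                  # hypothetical cutoff, pick your own
keep_some <- z >= min_samples     # at least min_samples samples reach 100

# Subset the matrix; drop = FALSE keeps it a matrix even if one row remains
filtered <- counts[keep_some, , drop = FALSE]

# Alternative reading: raise every count below 100 up to 100
floored <- pmax(counts, 100)
```

With a real SummarizedExperiment the same logical vector can be used directly, e.g. se[keep_some, ], which subsets the assay and the rowRanges together.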