Question: Filtering counts in SummarizedExperiment
12 days ago by
rbronste60
rbronste60 wrote:

Hi, I am making a SummarizedExperiment from a DiffBind dba.peakset in the following way (to use in DESeq2):

rangedCounts <- dba.peakset(Adult_count, bRetrieve=TRUE)

rowRanges <- GRanges(rangedCounts)

sampleN <- c("MBV1", "MBV2", "FBV1", "FBV2", "MBE7", "MBE8", "FBE1", "FBE2")
sampleS <- c("male", "male", "fem", "fem", "male", "male", "fem", "fem")
sampleT <- c("vehicle", "vehicle", "vehicle", "vehicle", "B", "B", "B", "B")
sampleB <- c("1", "2", "1", "2", "1", "2", "1", "2")
colData <- data.frame(sampleName=sampleN, treatment=sampleT, batch=sampleB, sex=sampleS)

counts <- as.matrix(mcols(rangedCounts))

se<-SummarizedExperiment(assays=list(counts=counts),rowRanges=rowRanges, colData=colData)


If I look at the count matrix afterwards, I see something like this:

         MBV1  MBV2  MBV3  FBV1  FBV2  FBV3  MBE7  MBE8  MBE9  FBE1  FBE2  FBE3
[1,]     1     1     1     1     1     1     1    66     1     1    50    34
[2,]    11     1     1     1     1     1     6    98     1    11   100     1
[3,]     1     1     1     1     1     1     1     1     1   116   108     1
[4,]     1     1    22     2    84     1     1     4     1    64     1    40
[5,]     1     1    18    74    74     1   102     1   126    22     1     1
[6,]     1     1     1     1    44     1     1     1   122     1     1     1
[7,]     1     1     1     1     1     1     1    42     1     1    96     1
[8,]     1     2   156    20     1    58     1   250   130    62     4   282


I would like to take either this matrix or rangedCounts and filter at each position, for example by setting a minimum of 100 for every count in the matrix, or apply some other manipulation. I know how to use rowSums and rowMeans, but I am not sure about other kinds of filtering. Please let me know if you can help with this, thanks!

11 days ago by
United States
James W. MacDonald51k wrote:

This isn't really a Bioconductor question so much as a basic 'how do I get R to do things' question. It's also not clear exactly what you want: do you want to filter to a minimum of 100 (really?) or something else?

Anyway, you can get a long way with simple tests like

z <- rowSums(assay(se) >= 100)



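and then filtering on that, depending on how many of the samples have to have a count of that size in each genomic region. Here z holds, for each region (row), the number of samples with a count of at least 100.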
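
As a minimal sketch of that filtering step, the example below uses the eight rows printed in the question as a plain matrix standing in for assay(se). The cutoff name minSamples and its value of 2 are assumptions made purely for illustration, not a recommendation.

```r
# Toy data: the eight rows shown in the question, standing in for assay(se)
counts <- matrix(c(
   1, 1,   1,  1,  1,  1,   1,  66,   1,   1,  50,  34,
  11, 1,   1,  1,  1,  1,   6,  98,   1,  11, 100,   1,
   1, 1,   1,  1,  1,  1,   1,   1,   1, 116, 108,   1,
   1, 1,  22,  2, 84,  1,   1,   4,   1,  64,   1,  40,
   1, 1,  18, 74, 74,  1, 102,   1, 126,  22,   1,   1,
   1, 1,   1,  1, 44,  1,   1,   1, 122,   1,   1,   1,
   1, 1,   1,  1,  1,  1,   1,  42,   1,   1,  96,   1,
   1, 2, 156, 20,  1, 58,   1, 250, 130,  62,   4, 282
), nrow = 8, byrow = TRUE,
dimnames = list(NULL, c("MBV1", "MBV2", "MBV3", "FBV1", "FBV2", "FBV3",
                        "MBE7", "MBE8", "MBE9", "FBE1", "FBE2", "FBE3")))

# For each region (row), count the samples whose count is at least 100
z <- rowSums(counts >= 100)

# Hypothetical cutoff: keep regions where at least 2 samples pass
minSamples <- 2
keep <- z >= minSamples
filtered <- counts[keep, , drop = FALSE]

# On the SummarizedExperiment itself, the same logical vector subsets rows,
# which keeps the assay, rowRanges and colData in sync:
#   z  <- rowSums(assay(se) >= 100)
#   se <- se[z >= minSamples, ]
```

To require the threshold in every sample instead, compare z against ncol(counts) rather than a smaller cutoff. None of the eight toy rows would pass that stricter test.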