Question

Diffbind - retrieving individual sample RPKM values for differentially bound sites

0

Entering edit mode

JD ▴ 10

@jd-9512

Last seen 8.2 years ago

Hello -
I have run a dba analysis on 4 cell types, each with 2 replicates. The way I've setup the call to dba.analyze
results in 10 contrasts. I have two questions about retrieving the results:
1) Is it possible to retrieve the DB peaks from an individual contrast, with the score column reported for all samples, in RPKM?

The code I've used is pasted below. It retrieves peaks for a specified contrast but I've been unable to change the score to RPKM in the output file. By that I mean that the values in the Score column in the output are the same as those in the Score column when I call my_dba$peaks[[1]], rather than the values in the RPKM column.

2) What I would actually like is to export a single bed file/GR object of all DB peaks from all contrasts together,
with the rpkm values for all samples in each contrast. Is this possible?

3) Similarly, for all DB peaks in my analysis (I mean the merged results from all contrasts) is it possible to get the
read counts for the peaks in each contrast across all samples, not just the samples that were in the contrast?

I think that perhaps I can ouput a dba report-based object and retrieve the values I'm looking for from this but I'm just unfamiliar with working with this object type. I appreciate any help you can provide!!

my_dba <- dba(sampleSheet = samples, bRemoveRandom = T, minOverlap = 2)
my_dba <- dba.peakset(my_dba, consensus = -DBA_REPLICATE)
my_dba <- dba.count(my_dba, peaks = all_consensus$masks$Consensus, score = DBA_SCORE_RPKM)
contrast <- dba.contrast(my_dba, categories=DBA_FACTOR, minMembers=2)
my_dba <_ dba.analyze(contrast)
contrast1_df <- dba.report(my_dba, contrast = 1, bCounts=T, bNormalized=T, DataType=DBA_DATA_FRAME, file="contrast1")

diffbind • 2.3k views

ADD COMMENT • link updated 8.1 years ago by Rory Stark ★ 5.1k • written 8.3 years ago by JD ▴ 10

score 1 · Answer 1 · 2016-03-09

Another one that’s been sitting here a while…

1. dba.report() only lets you get the read count scores actually used for the analysis, either normalized or non-normalized. As you have seen, you can get the RPKM score by setting the score to RPKM and then retrieving the score:

> my_dba <- dba.count(my_dba,peaks=NULL,score=DBA_SCORE_RPKM)
> rpkm_scores <- dba.peakset(my_dba, bRetrieve=T

You can also access them “directly” for each sample using the $peaks slot of the DBA object:

> rpkm_1 <- my_dba$peaks[[1]]$RPKM

or, to get the RPKM scores with the control reads subtracted:

> rpkm_1 <- my_dba$peaks[[1]]$cRPKM

You can use these to map the scores for a sample. For example, using the sample data included with the package:

> data(tamoxifen_analysis) > report <- dba.report(tamoxifen,contrast=1, bCounts=TRUE)

Note that the row names in the report are which peak they are in the consensus peakset:

> report$BT4741 <- tamoxifen$peaks[[1]]$RPKM[as.numeric(names(x))]

and repeat for each sample.

For the other things you want to do, you can either a) re-count everything or b) get clever with merging GRanges.

If you are willing to re-count, this can be done in a few lines of code. You can get a set of peaks that are identified as differentially bound in any of (eg) five contrasts using a report-based DBA object:

> res_dba <- dba.report(my_dba, contrast=1:5, bDB=TRUE)
> DBpeaks <- dba.peakset(res_dba, bRetrieve=TRUE)
> rpkm_dba <- dba.count(my_dba, score=DBA_SCORE_RPKM, minOverlap=1)
> rpkm_scores <- dba.peakset(rpkm_dba,bRetrieve=TRUE)

If only a subset of the samples are present int he contrasts, you can supply a mask= argument to dba.count() specifying which samples you are interested in.

A less computationally expensive way to do this is to get the alread-computed RPKM scores and merge them with the peaks you interested in:

> my_dba <- dba.count(my_dba, peaks=NULL, score=DBA_SCORE_RPKM)
> all_rpkm <- dba.peakset(my_dba, bRetrieve=TRUE)
> res_dba <- dba.report(my_dba, contrast=1:5, bDB=TRUE)
> DBpeaks <- dba.peakset(res_dba, bRetrieve=TRUE)
> overlaps <- findOverlaps(all_rpkm,DBpeaks, select=“first”)
> overlaps <- !is.na(overlaps)
> rpkm_scores <- rpkm[overlaps,]

Hope this helps!
-Rory