Search
Question: dba.count with pre-defined regions
0
gravatar for sylvia
10 months ago by
sylvia10
sylvia10 wrote:

Hi,
I'm trying to use dba.contrast and dba.analyze to find differential binding sites within specific regions (ie  promoter region ranging from 1000bp upstream of TSS and 500 bp downstream of TSS for a set of genes). I realized that dba.count can use pre-defined regions with the peaks argument so I did something like the following:

# create dba object

TF.peaks <- dba(sampleSheet = "experiments.csv",peakCaller = "macs",peakFormat = "bed",scoreCol = 5 );

# create a GRanges object with specific region

genes <- read.delim('gene_list.txt', as.is = TRUE);
grange <- makeGRangesFromDataFrame(genes);

promoter <- promoters(grange, upstream = 1000, downstream = 500);

# count the reads in the pre-defined region before running dba.contrast and dba.analyze

TF.counts <- dba.count(DBA = TF.peaks, peaks = promoter, score = DBA_SCORE_RPKM);

I got no error message, however, my TF.counts chromosome regions are beyond the range of some chromosome sizes. For example:
range(TF.counts$peaks[[1]]$Start[TF.counts$peaks[[1]]$Chr == 'chr22'])
[1] 22082728 102264308
I double checked my input TF.peaks and promoter which all have the correct chr22 size
range(TF.peaks$peaks[[1]]$V2[TF.peaks$peaks[[1]]$V1 == 'chr22'])
[1] 17639813 51222046
promoter.sub <- promoter[promoter@seqnames == 'chr22',]
range(promoter.sub@ranges@start)
[1] 17081777 51221591
Any idea what's causing the issue? Or is there a better way to look at a differential binding site for specific regions? (I'm using R v3.3.1 and DiffBind v2.2.12)
Many thanks,

Sylvia

 

ADD COMMENTlink modified 10 months ago by Rory Stark2.6k • written 10 months ago by sylvia10
1
gravatar for Rory Stark
10 months ago by
Rory Stark2.6k
CRUK, Cambridge, UK
Rory Stark2.6k wrote:

If you can send me the intervals you are passing in as promoter, I can take a look at what is happening. If possible, it would be good to have a copy of the main DBA object TF.peaks -- if this is too large, you can send me a download link (Dropbox or similar).

Cheers-

Rory

rory.stark @ cruk.cam.ac.uk

P.S. It is best to tag DiffBind questions with the DiffBind tag as well...

ADD COMMENTlink written 10 months ago by Rory Stark2.6k

Thanks Rory! Just sent.

Best,

Sylvia

ADD REPLYlink written 10 months ago by sylvia10
1
gravatar for Rory Stark
10 months ago by
Rory Stark2.6k
CRUK, Cambridge, UK
Rory Stark2.6k wrote:

I've had a look and identified a bug, which should be fixed by the end of the week in DiffBind 2.6.2.

I see you are using an older version, I can provide a workaround. You just need to pass in the intervals as a data.frame, with the chromosome names (first column) as character strings:

> promoter <- promoters(grange, upstream = 1000, downstream = 500)
> promoterDF <- data.frame(promoter)
> promoterDF[,1] <- as.character(promoterDF[,1])
> TF.counts <- dba.count(DBA = TF.peaks, peaks = promoterDF, score = DBA_SCORE_RPKM)

Cheers-

Rory

ADD COMMENTlink modified 10 months ago • written 10 months ago by Rory Stark2.6k

Hi Rory,

Thanks for the quick reply, It works now! One more naive question, if I used a pre-defined region is there still a way to re-center each peak around the point of greatest enrichment, like the summit function? so I can use a shorter sequence for motif finding?

Best,

Sylvia

ADD REPLYlink written 10 months ago by sylvia10
1

Yes, the summits parameter should work to re-center intervals around the point of highest pileup.

However I just tested this, and found another bug. I'll check in a fix for DiffBind 2.6.3

I do have a workaround for you:

> TF.counts <- dba.count(TF.peaks, peaks = promoter, score = DBA_SCORE_RPKM, summits=TRUE)
> TF.counts$called <- matrix(1,nrow(TF.counts$merged), length(TF.counts$peaks))
> TF.counts <- dba.count(TF.peaks, peaks = NULL, summits=100) # re-centered peaks 201bp wide

Also, a suggestion when working with these intervals representing annotated genomic features (eg. promoter regions), you should probably use the filter parameter to remove promoters that don't get mapped reads, otherwise there is the potential for a lot of intervals with zero or very low counts. I'd suggest setting filter=5 in the first call to dba.count().

Cheers-

Rory

ADD REPLYlink written 10 months ago by Rory Stark2.6k

Hi Rory,

For the second TF.counts do you mean

TF.counts <- dba.count("TF.counts", peaks = NULL, summits=100)? or I should still use TF.peaks?

This is really helpful! Thanks a lot.

Best,

Sylvia

ADD REPLYlink written 10 months ago by sylvia10

Hi Rory,

Sorry, and one more thing, is it normal that not all the peaks having 201bp w after using summits function?

Best,

Sylvia

ADD REPLYlink written 10 months ago by sylvia10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 197 users visited in the last hour