dba.count with pre-defined regions
2
0
Entering edit mode
sylvia ▴ 10
@sylvia-5630
Last seen 3.7 years ago

Hi,
I'm trying to use dba.contrast and dba.analyze to find differential binding sites within specific regions (ie  promoter region ranging from 1000bp upstream of TSS and 500 bp downstream of TSS for a set of genes). I realized that dba.count can use pre-defined regions with the peaks argument so I did something like the following:

# create dba object

TF.peaks <- dba(sampleSheet = "experiments.csv",peakCaller = "macs",peakFormat = "bed",scoreCol = 5 );

# create a GRanges object with specific region

genes <- read.delim('gene_list.txt', as.is = TRUE);
grange <- makeGRangesFromDataFrame(genes);

promoter <- promoters(grange, upstream = 1000, downstream = 500);

# count the reads in the pre-defined region before running dba.contrast and dba.analyze

TF.counts <- dba.count(DBA = TF.peaks, peaks = promoter, score = DBA_SCORE_RPKM);

I got no error message, however, my TF.counts chromosome regions are beyond the range of some chromosome sizes. For example:
range(TF.counts$peaks[[1]]$Start[TF.counts$peaks[[1]]$Chr == 'chr22'])
[1] 22082728 102264308
I double checked my input TF.peaks and promoter which all have the correct chr22 size
range(TF.peaks$peaks[[1]]$V2[TF.peaks$peaks[[1]]$V1 == 'chr22'])
[1] 17639813 51222046
promoter.sub <- promoter[promoter@seqnames == 'chr22',]
range(promoter.sub@ranges@start)
[1] 17081777 51221591
Any idea what's causing the issue? Or is there a better way to look at a differential binding site for specific regions? (I'm using R v3.3.1 and DiffBind v2.2.12)
Many thanks,

Sylvia

dba.count dba chipseq diffbind • 1.1k views
1
Entering edit mode
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 15 days ago
CRUK, Cambridge, UK

If you can send me the intervals you are passing in as promoter, I can take a look at what is happening. If possible, it would be good to have a copy of the main DBA object TF.peaks -- if this is too large, you can send me a download link (Dropbox or similar).

Cheers-

Rory

rory.stark @ cruk.cam.ac.uk

P.S. It is best to tag DiffBind questions with the DiffBind tag as well...

0
Entering edit mode

Thanks Rory! Just sent.

Best,

Sylvia

1
Entering edit mode
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 15 days ago
CRUK, Cambridge, UK

I've had a look and identified a bug, which should be fixed by the end of the week in DiffBind 2.6.2.

I see you are using an older version, I can provide a workaround. You just need to pass in the intervals as a data.frame, with the chromosome names (first column) as character strings:

> promoter <- promoters(grange, upstream = 1000, downstream = 500)
> promoterDF <- data.frame(promoter)
> promoterDF[,1] <- as.character(promoterDF[,1])
> TF.counts <- dba.count(DBA = TF.peaks, peaks = promoterDF, score = DBA_SCORE_RPKM)

Cheers-

Rory

0
Entering edit mode

Hi Rory,

Thanks for the quick reply, It works now! One more naive question, if I used a pre-defined region is there still a way to re-center each peak around the point of greatest enrichment, like the summit function? so I can use a shorter sequence for motif finding?

Best,

Sylvia

1
Entering edit mode

Yes, the summits parameter should work to re-center intervals around the point of highest pileup.

However I just tested this, and found another bug. I'll check in a fix for DiffBind 2.6.3

I do have a workaround for you:

> TF.counts <- dba.count(TF.peaks, peaks = promoter, score = DBA_SCORE_RPKM, summits=TRUE)
> TF.counts$called <- matrix(1,nrow(TF.counts$merged), length(TF.counts\$peaks))
> TF.counts <- dba.count(TF.peaks, peaks = NULL, summits=100) # re-centered peaks 201bp wide

Also, a suggestion when working with these intervals representing annotated genomic features (eg. promoter regions), you should probably use the filter parameter to remove promoters that don't get mapped reads, otherwise there is the potential for a lot of intervals with zero or very low counts. I'd suggest setting filter=5 in the first call to dba.count().

Cheers-

Rory

0
Entering edit mode

Hi Rory,

For the second TF.counts do you mean

TF.counts <- dba.count("TF.counts", peaks = NULL, summits=100)? or I should still use TF.peaks?

This is really helpful! Thanks a lot.

Best,

Sylvia

0
Entering edit mode

Hi Rory,

Sorry, and one more thing, is it normal that not all the peaks having 201bp w after using summits function?

Best,

Sylvia