DiffBind: reverse absolute fold change in dba.report ie. abs(rep$Fold)
Entering edit mode
Last seen 9 weeks ago
United States

I generated a DBA object with multiple distinct contrasts, all with differentially accessible peaks. I want to generate a data frame with all these peaks and fold changes to plot a fold change x fold change plot for 2 of the contrasts.

dat <- dba.contrast(dat, group1=dat$masks$CON.treated, group2=dat$masks$CON.saline, name1="CON.TRT", name2="CON.SAL")
dat <- dba.contrast(dat, group1=dat$masks$EXP.treated, group2=dat$masks$EXP.saline, name1="EXP.TRT", name2="EXP.SAL")
dat <- dba.analyze(dat, bSubControl = FALSE)
dbSites <- dba.report(dat, bDB=TRUE)

In the dbSites$binding I get the consensus peaks of both contrasts with the fold change. (also in dbSites$peaks)

    1  7395534  7395831               0.82               0.87
    1  9547934  9548310               0.94               1.48
    1 10105145 10105502               0.87               0.00

However, these are the absolute value of the fold change abs(rep$Fold)

    Chr    Start      End abs(rep$Fold)
22 chr1  7395534  7395831          0.82
26 chr1  9547934  9548310          0.94
54 chr1 10105145 10105502          0.87
63 chr1 13153172 13153547          0.63

How can I get the Fold change with sign for direction? I ultimately would like to plot(dbSites$peaks[[1]]$abs(rep$Fold), dbSites$peaks[[2]]$abs(rep$Fold) but with the correct directions.

diffbind atac • 227 views
Entering edit mode
Rory Stark ★ 4.1k
Last seen 17 days ago
CRUK, Cambridge, UK

The report-based DBA object you are getting from dba.report() when you set bDB=TRUE is not really meant to be used in this way. There is no need to access internal, undocumented aspects of the DBA object when the documented interface will provided the information you need.

The preferred way would be to call dba.report() without setting bDB. This will return a GRanges object (or a data.frame if you set DataType=DBA_DATA_FRAME), which includes all the statistics and the 'Fold' with the sign. Of course, you then have to figure out the "consensus" peaks yourself.

Here's some code for doing all this:

# Retrieve stats for all the sites for both contrasts   
allSites <- lapply(1:2,function(contrast){dba.report(dat,contrast, th=1)})

# Filter to include only DB sites and get "consensus"
dbSites <- lapply(allSites,function(x){x[x$FDR<0.05,]})
consensus <- union(dbSites[[1]], dbSites[[2]])

# Get list of consensus sites with statistics
consSites <- lapply(allSites,function(x){sort(x[x %in% consensus,])})

# Plot Fold changes
Entering edit mode

Thanks for your fast reply, Rory! The ordering is not accurate even with the sort function so the plot makes an X shape which is not right. I think it is sorting the GRanges on the seqnames but not consistently between the 2 contrasts. (Contrast 1 starts with chr4 and ends with chr10, Contrast 2 starts with chr1 and ends with chrX). So the peak information on each row isn't the same between the 2 contrasts in the consSites. I found a work around. It is not as elegant as your 5 lines, but I think this works. I forgot about th=1 that made things easier and converted to dataframes with DataType = DBA_DATA_FRAME, and luckily the row.names are consistent across both contrasts. Yeah!

# without bDB=T the dba.plotVenn won't work below
dbSites <- dba.report(dat, bDB = TRUE)

# get the dba.report for each contrast
 df.con <- dba.report(dat, contrast=1, th=1, DataType = DBA_DATA_FRAME)
 df.mut <- dba.report(dat, contrast=2, th=1, DataType = DBA_DATA_FRAME)

# merge by row.names because luckily that is consistent between both dba.reports!!!
data <- merge(df.con, df.mut, by="row.names", all.x=TRUE, all.y=TRUE)
data$Row.names <- as.numeric(data$Row.names)
data <- data[order(data$Row.names),]

# double check the order is correct and clean up
all.equal(data$start.x, data$start.y)
all.equal(data$end.x, data$end.y)
all.equal(data$width.x, data$width.y)
all.equal(data$strand.x, data$strand.y)
data <- data[,c(-11:-13)]

# get out the sig peaks
  for (i in c(1:nrow(data))){
  if (data$FDR.x[i]<0.05 & data$FDR.y[i]<0.05){
    data$GROUP[i]="both" }
  else if (data$FDR.x[i]<0.05 & data$FDR.y[i]>0.05){
  else if (data$FDR.x[i]>0.05 & data$FDR.y[i]<0.05){
  else {
data$GROUP <- factor(data$GROUP)

# compare with dba.plotVenn, was pretty close off by just a few. 
dba.plotVenn(dbSites, 1:2)

ggplot(data[data$GROUP!="none",], aes(x=Fold.x, y=Fold.y, color=GROUP)) + geom_point() + theme_cowplot() + scale_color_manual(values = c("blue","black","red")) + geom_abline(slope = 1, intercept = 0) + xlim(-3,3.2) + ylim(-3,3.2)
Entering edit mode

Looks good! I tested my script on the sample data which only has one chromosome, so I missed this issue. I had considered merging by row.name, but I was trying to do it in fewer lines.

It looks like you have a good idea of what is going on now!


Login before adding your answer.

Traffic: 414 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6