DiffBind for ATAC-Seq count extraction
0
Entering edit mode
Timothy • 0
@tstearns78
Last seen 4 months ago
USA/Bar Harbor/The Jackson Laboratory

I'm currently using esATAC to process a number of ATAC-Seq samples that are done in two biological replicates across four sample groups. I'm interested in using DiffBind to calculate read counts from the resulting PeakCallingFseq.bed and Bowtie2Mapping.bam files. Ultimately, I would like to analyze differential binding using these read counts in edgeR.

I'm using the following information in a dba sample sheet:

• SampleID: a unique ID that is composed of the content from Factor and Replicate columns (8 samples)
• Factor: A combination of content from Condition and Treatment columns (4 sample groups)
• Condition: A cell type (2 conditions)
• Treatment: A genotype (2 treatments)
• Replicate: 1 or 2 (2 replicates per sample group)
• bamReads: file path to the Bowtie2Mapping.bam esATAC file
• bamControl: I'm unsure if this is pertaining to our wild-type control genotype, but I don't believe so. I don't wish to have the control samples directly adjust non-control read counts outside of the necessary normalization procedure; hence I've omitted this column.
• ControlID: I believe I can omit this column, but I saw an example as having this column present with "input" for each sample. I have included this column, but seeing as I don't have a bamControl column perhaps it's not needed.
• Peaks: file path to the PeakCallingFseq.bed esATAC file
• PeakCaller: bed (peak score is in 5th column)

I'm able to extact read counts using the below code (R 3.6.0; DiffBind 2.12.0):

#Create DBA object:
dba.in = dba(sampleSheet="sampleSheet.csv")

#Calculate counts per sample:
dba.count.norm = dba.count(dba.in)

#Export out a dataframe of TMM-normalized reads for input into edgeR:
counts.norm = dba.peakset(dba.count.norm, bRetrieve = T, DataType = DBA_DATA_FRAME)


I believe the steps I have taken are correct, but I've noticed an issue with the counts that I'm wondering may either be attributed to an error made in the steps above or possibly a limitation of TMM normalization. Our project contains samples that have varying library sizes - some samples are twice the size of others. As normalization has already been performed on the counts via the dba.count function, I've omitted normalization in edgeR (with calcNormFactors(method="none")), yet I appear to be seeing bias in our analyses that may be attributed to the vastly different library sizes. If this issue isn't due to an error made in the steps above, I'm wondering if it possible to prevent the dba.count function from applying a default TMM normalization, so that I can use various normalization methods within edgeR, such as TMMwsp or RLE.

Thank you kindly for your feedback and time!

-Tim-

esATAC edgeR DiffBind • 91 views
1
Entering edit mode
Rory Stark ♦ 3.6k
@rory-stark-5741
Last seen 8 hours ago
CRUK, Cambridge, UK

You are fine to leave out the controls, these are meant to be Inputs or non-specific antibody tracks that are mostly relevant for ChIP-seq and less so for ATAC.

If you are extracting the count matrix from DiffBind and using it directly in your own calls to edgeR functions, you should not use normalized counts. You should extract raw counts (score=DBA_SCORE_READS) and do the normalization within edgeR. The returned normalized counts are only useful for plotting and distribution analysis, and should never be used as input to edgeR.

I will note that the TMM normalization in edgeR, which was developed for RNA-seq where most of the reads map to transcripts and most transcripts do not change expression, is frequently inappropriate for ATAC-seq. In ATAC-seq, there are bigger differences in the proportion of reads that map to "peaks", and there can be larger systematic shifts in open chromatin. In this case you may wish to normalize only for library size (by setting the library sizes to the total number of mapped reads and the normalization factors to 1.0. The new vignette for the upcoming release of DiffBind has a detailed section on normalization that may be relevant.