MEDIPS: Number of windows tested in Paired-end data set
1
0
Entering edit mode
@petermcerlean-9754
Last seen 8.2 years ago

Hi Lukas,

I’ve been dealing with some paired-end ChIP-seq data from an inducible-transcription factor experiment.

For each of the samples (x3 treatment & controls) and inputs (x1 each treatment  & control) I’ve done:

MEDIPS.createSet (file = Sample, BSgenome = "BSgenome.Hsapiens.UCSC.hg19", uniq = 1, extend = 120, shift = 0, window_size = 1000, paired =TRUE, bwa=TRUE, chr.select = c("chr1"…."chrY"))

When the samples load, I see that ends which are far apart are bing joined e.g.:

Sample 1

Mean insertion size: 11078.77 nt

SD of the insertion size: 1016181 nt

Max insertion size: 235650514 nt

Min insertion size: 0 nt

I go ahead and run the analysis:

 

MEDIPS.meth (MSet1 = Treatment, MSet2 = Control, ISet1 = Treatment.Input, ISet2 = Control.Input, p.adj = "bonferroni", diff.method = "edgeR", minRowSum=100, MeDIP=F, quantile=TRUE)

I find that almost all the windows are being test for, even if the minRowSum is increased to some crazy level:

minRowsum=100

Total windows: 3,095,689  

Windows Tested: 3,090,305

P<0.05: 247,278

minRowsum=1,000

Total windows: 3,095,689  

Windows Tested: 1,648,789

P<0.05: 153,028

I made extend=0 and removed duplicates but still get this:

minRowsum=100

Total windows: 3,095,689  

Windows Tested: 3,016,374

P<0.05: 243,846

I’ve visualized the BAM files via deeptools and found that there are probably <5,000 discrete regions of enrichment in our treatment samples, consistent with our transcription factor being induced.

I think the mating of pairs over such a large distance in causing this issue but any ideas on how to resolve it?

Kind Regards

Peter

medips r • 2.2k views
ADD COMMENT
0
Entering edit mode
@petermcerlean-9754
Last seen 8.2 years ago

Hi Lukas,

I've just been running the analysis without the paired end parameters and have gotten better results:

Unique

MinRowSum

Paired

BWA

Total Windows 

Windows Tested

P<0.05

Adj P<0.05

1

50

FALSE

FALSE

3,095,689

231,444

41,099

64

1

100

FALSE

FALSE

3,095,689

65,927

18,429

183

1

200

FALSE

FALSE

3,095,689

15,735

6,639

322

I've inspected some of the significant windows visually and the regions that gain enrichment look good. However, those that lose enrichment just seem to be background, which I'm guessing is due to the nature of the induction experiment. 

Can I trust these results over the paired-end approach?

Thanks!

Peter.

ADD COMMENT
0
Entering edit mode
Hi Peter, please find below my response to your first email below (your second email arrived while I was writing). For paired-end reads, the insert size is known. Therefore, MEDIPS does not use the extend parameter but considers the entire fragment between mates for calculating coverage. If you process paired end data in single-end mode, information abt the insert size will be lost and reads will be extended by the extend parameter instead. Moreover, both mates of a Pair will be imported and considered for calculating coverage, therefore the counts are twice the actually sequenced DNA fragments. In my opinion, none of this has a severe negative effect on your results. However, processing paired-end data in paired-end mode is more appropriate. It still puzzle’s me what is going wrong in your case. Bam files are imported by Rsamtools and relevant parameters are: scanFlag = scanBamFlag(isPaired = T, isProperPair = TRUE, hasUnmappedMate = FALSE, isUnmappedQuery = F, isFirstMateRead = T, isSecondMateRead = F) Only the first mate of properly paired mates are supposed to be imported. As the first mate also has the information abt the insert size, MEDIPS will extend the first mate by the insert size prior to calculating coverage at genome wide windows. I am wondering, if e.g. your bwa paired-end bam files have ‘proper Paired’ reads included where both mates span long distances and have long insert sizes? All the best, Lukas Hi Peter, thank you for reporting this observation. Filtering of lowly covered genomic regions happens in the MEDIPS.diffMeth function which is called within the MEDIPS.meth function. Its really straight forward and I currently do not see what can go wrong here. Its roughly: (...) filter = rowSums(values) >= minRowSum (...) cat(paste("Execute edgeR for count data of", sum(filter), "windows...\n", sep = " “)) (...) d <- edgeR::DGEList(counts = values[filter, ], group = edRObj.group) (…) where values is a matrix with rows=genomic windows and columns=all samples at MSet1 and MSet2 (Input is not considered). Moreover, I do see an effect of the minRossum parameter in your example, i.e an increased value results in a decreased number of windows to be tested for differential enrichment. The major effect of reduced number of windows will be on the adjusted p-value. However, I also think its surprising that there remain so many windows which have a coverage > 1000 as a sum over all samples. Can you check if this is true by for example examining the count table for such windows? As you have only 6 ChIP samples, there must be ~166 reads per sample in average for windows surviving a minRowsum parameter of 1000. Do you have such a deep coverage? All the best, Lukas On 21 Apr 2016, at 18:53, peter.mcerlean [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""/> User peter.mcerlean<https: support.bioconductor.org="" u="" 9754=""/> wrote Answer: MEDIPS: Number of windows tested in Paired-end data set<https: support.bioconductor.org="" p="" 81098="" #81117="">: Hi Lukas, I've just been running the analysis without the paired end parameters and have gotten better results: Unique MinRowSum Paired BWA Total Windows Windows Tested P<0.05 Adj P<0.05 1 50 FALSE FALSE 3,095,689 231,444 41,099 64 1 100 FALSE FALSE 3,095,689 65,927 18,429 183 1 200 FALSE FALSE 3,095,689 15,735 6,639 322 I've inspected some of the significant windows visually and are in agreement (i.e. Gain or loss in treatment Vs control). Can I trust these results over the paired-end approach? Thanks! P. ________________________________ Post tags: medips, r You may reply via email or visit A: MEDIPS: Number of windows tested in Paired-end data set
ADD REPLY
0
Entering edit mode

Hi Lukas,

 

I thought the same about there being some ‘proper pairs’ with large inserts so ran Picard-CollectInsertSizeMetrics and sure enough get something like this:

 

MEDIAN_INSERT_SIZE          323

MEDIAN_ABSOLUTE_DEVIATION    75

MIN_INSERT_SIZE     30

MAX_INSERT_SIZE     243946804

MEAN_INSERT_SIZE  332.5694

STANDARD_DEVIATION       115.047712

READ_PAIRS   8731854

 

I filter my BAMs as ‘proper-pair, inserts <=500’ but still loads into MEDIPS with huge insert present:

 

Mean insertion size: 4082.162 nt

SD of the insertion size: 581612.2 nt

Max insertion size: 213644984 nt

 

It would seem that the filter may not be doing the job.

 

The thing is that I can run MACS2 with the BWA aligned BAMs it builds a model no problem with insert size as 300bp?!?!

 

I decided to realign with Bowtie2 (max insert 500bp) and run analysis again (unique=1, paired=T, extend=120) and everything worked fine:

 

MinRowSum

Total Windows

Windows Tested

P<0.05

50

3,095,689

92,626

16,601

100

3,095,689

27,525

8,961

300

3,095,689

1,891

695

 

Not sure what the issue was with the BWA aligned files.  I can send them to you if you’d like?

 

Thanks again!

 

Peter

ADD REPLY
0
Entering edit mode
Dear Peter, thank you for further investigating this issue. Insert sizes appear to be much more reasonable in the bowtie2 case. I have previously observed the 'negative widths are not allowed’ issue for bwa only and not for bowtie. Can you process the bowtie2 alignments by setting paired=T and bwa=T? I urgently need to further investigate the behaviour of RSamtools for the different aligners. I’ll appreciate any comments and hints abt this topic! All the best, Lukas On 27 Apr 2016, at 15:22, peter.mcerlean [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""/> User peter.mcerlean<https: support.bioconductor.org="" u="" 9754=""/> wrote Comment: MEDIPS: Number of windows tested in Paired-end data set<https: support.bioconductor.org="" p="" 81098="" #81985="">: Hi Lukas, I thought the same about there being some ‘proper pairs’ with large inserts so ran Picard-CollectInsertSizeMetrics and sure enough get something like this: MEDIAN_INSERT_SIZE 323 MEDIAN_ABSOLUTE_DEVIATION 75 MIN_INSERT_SIZE 30 MAX_INSERT_SIZE 243946804 MEAN_INSERT_SIZE 332.5694 STANDARD_DEVIATION 115.047712 READ_PAIRS 8731854 I filter my BAMs as ‘proper-pair, inserts <=500’ but still loads into MEDIPS with huge insert present: Mean insertion size: 4082.162 nt SD of the insertion size: 581612.2 nt Max insertion size: 213644984 nt It would seem that the filter may not be doing the job. The thing is that I can run MACS2 with the BWA aligned BAMs it builds a model no problem with insert size as 300bp?!?! I decided to realign with Bowtie2 (max insert 500bp) and run analysis again (exclude ‘bwa’ option, paired=T) but get this: Mean insertion size: 312.011 nt SD of the insertion size: 96.95693 nt Max insertion size: 500 nt Min insertion size: 30 nt Creating GRange Object... Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : solving row 216: negative widths are not allowed Any further ideas? I can send you some of the BAM files if you’d like? Leaning towards analyzing the bwa data as single read and bumping up the minRow Sum to account for the double counts as per your suggestion. Thank again! Peter ________________________________ Post tags: medips, r You may reply via email or visit C: MEDIPS: Number of windows tested in Paired-end data set
ADD REPLY
0
Entering edit mode

 

 

Hi Lukas

I edited my comment above apparently just before your response (always seem to do this!) but when I put the extend=120 I get success with the Bowtie BAMs. 

I'll run them again with extend=0 and bwa/paired options and let you know how it goes. 

Best,

Peter

ADD REPLY
0
Entering edit mode

Hi Lukas,

Everything ran smoothly with the Bowtie BAMs using these various parameters:

Unique MinRowSum Paired BWA EXTEND Total Windows  Windows Tested P<0.05
1 50 TRUE TRUE 0 3,095,689 84,713 14,949
1 100 TRUE TRUE 0 3,095,689 24,796 7,960
1 300 TRUE TRUE 0 3,095,689 1,715 578

Let me know if you'd like to trouble shoot anything else. 

Best, P.

ADD REPLY

Login before adding your answer.

Traffic: 643 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6