Hi,
I'll describe my problem from the very beginning. I have a cluster at my disposal, and so for alignment I chunked up my FASTQ files into 1M read chunks. Using the cluster I aligned all of them with the presplit_map.py from diffHic. Later on while merging for the MarkDuplicates step I found out that the mate information was not in sync, so after a conversation with the author,
I ran all the BAM chunks through the FixMateInformation of Picard, followed by MergeSamFiles from Picard without sorting enabled.
So I had to run SortSam as well, which produced the error "Mapped mate should have reference name" the error that FixMateInformation was supposed to fix. So I went back and used MergeSamFiles with sort enabled, it worked perfectly fine followed by markduplicates which produced approximately 120M mapped read pairs in the Metrics file after accounting for 30M read pairs which it thought to be duplicates.
After running PreparePairs(), i lost 99% of my reads as singletons. So I reasoned that,
1. FixMateInformation was unable to fix the original problem, and MarkDuplicates did not validate beyond checking the flags which told it that read pairs had mapped properly. If I run validatesamfile on the duplicates marked BAM file i still get errors in it.
2. My breakdown fastq into chunks and align chunks altogether might not have worked properly. So I created a script to check the order between two mate files. I created an awk script to reorder one mate file based on the order of the other mate, and then I did a diff between the reordered mate file and original mate file. No differences were found meaning the order of both mates are the same. So alignment was fine.
It is not possible that so many mates do not mate, because alternatively I used TADBit's alignment feature to align my files and on average around 800K reads mapped per chunk. So I do not know what is going wrong here. Anyone have any ideas?
Thanks so much! I thought that was one of the cases that were occuring, but for some reason I coordinate sorted my BAM file, I don't know why....thanks so much!!!!!!