DiffBind "No genome detected"
kyliecode • 0
@kyliecode-14088
Sweden

Hi! I am trying to run DiffBind for differential enrichment analysis of my ChIP-seq datasets.

To start with, I supplied .broadPeak BED files (both ChIP and Input) and .bam files to DiffBind. However, it failed to generate the greylist. Is it something about my BAM files? They use Ensembl chromosome names. Or is it something related to the header of the BAM files?

Thanks in advance!

```
test <- dba.blacklist(test, blacklist=DBA_BLACKLIST_MM10, greylist=TRUE)
No genome detected.

test.greylist <- dba.blacklist(test, Retrieve=DBA_GREYLIST)
Error: No greylist

sessionInfo( )
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.9.6              SummarizedExperiment_1.30.1 Biobase_2.60.0              MatrixGenerics_1.12.0      
 [5] matrixStats_0.63.0          GenomicRanges_1.52.0        GenomeInfoDb_1.36.0         IRanges_2.34.0             
 [9] S4Vectors_0.38.1            BiocGenerics_0.46.0         lubridate_1.9.2             forcats_1.0.0              
[13] stringr_1.5.0               purrr_1.0.1                 readr_2.1.4                 tidyr_1.3.0                
[17] tibble_3.2.1                ggplot2_3.4.2               tidyverse_2.0.0             dplyr_1.1.2                
[21] magrittr_2.0.3             

loaded via a namespace (and not attached):
 [1] bitops_1.0-7             deldir_1.0-9             rlang_1.1.1              compiler_4.3.0          
 [5] png_0.1-8                vctrs_0.6.2              pkgconfig_2.0.3          crayon_1.5.2            
 [9] fastmap_1.1.1            XVector_0.40.0           caTools_1.18.2           utf8_1.2.3              
[13] Rsamtools_2.16.0         tzdb_0.4.0               zlibbioc_1.46.0          DelayedArray_0.26.2     
[17] BiocParallel_1.34.1      jpeg_0.1-10              irlba_2.3.5.1            parallel_4.3.0          
[21] R6_2.5.1                 stringi_1.7.12           RColorBrewer_1.1-3       SQUAREM_2021.1          
[25] limma_3.56.1             rtracklayer_1.60.0       numDeriv_2016.8-1.1      Rcpp_1.0.10             
[29] Matrix_1.5-4.1           timechange_0.2.0         tidyselect_1.2.0         rstudioapi_0.14         
[33] yaml_2.3.7               gplots_3.1.3             codetools_0.2-19         hwriter_1.3.2.1         
[37] lattice_0.21-8           plyr_1.8.8               withr_2.5.0              ShortRead_1.58.0        
[41] coda_0.19-4              Biostrings_2.68.1        pillar_1.9.0             KernSmooth_2.23-21      
[45] generics_0.1.3           invgamma_1.1             RCurl_1.98-1.12          truncnorm_1.0-9         
[49] emdbook_1.3.12           hms_1.1.3                munsell_0.5.0            scales_1.2.1            
[53] ashr_2.2-54              gtools_3.9.4             glue_1.6.2               tools_4.3.0             
[57] apeglm_1.22.1            interp_1.1-4             BiocIO_1.10.0            BSgenome_1.68.0         
[61] locfit_1.5-9.7           GenomicAlignments_1.36.0 systemPipeR_2.6.0        XML_3.99-0.14           
[65] mvtnorm_1.1-3            grid_4.3.0               bbmle_1.0.25             amap_0.8-19             
[69] bdsmatrix_1.3-6          latticeExtra_0.6-30      colorspace_2.1-0         GenomeInfoDbData_1.2.10 
[73] restfulr_0.0.15          cli_3.6.1                GreyListChIP_1.32.0      fansi_1.0.4             
[77] mixsqp_0.3-48            S4Arrays_1.0.4           gtable_0.3.3             digest_0.6.31           
[81] ggrepel_0.9.3            rjson_0.2.21             htmlwidgets_1.6.2        htmltools_0.5.5         
[85] lifecycle_1.0.3          MASS_7.3-60             

```
@james-w-macdonald-5106
United States

You can specify the greylist in the same manner as the blacklist, which should fix your problem.


Hi James,

Thanks for your reply!

For the blacklist, I just used the one provided by DiffBind, which is "DBA_BLACKLIST_MM10". I did not specify a greylist, as I want DiffBind to generate it from the Input BAM files I provided. But I got the error mentioned above.

I would like to know why this happens and how to solve it.


The error you see comes from an internal function called pv.genome that inspects the header of your BAM file and tries to infer the genome from it. It is unable to do so and returns the error you see. I would imagine that it's because you aligned against Ensembl (no "chr" in the chromosome names), but you are then using a UCSC-based BSgenome package for the blacklist. I don't know if there is a simple fix, so hopefully Rory Stark will be along in a while to provide advice.

Rory Stark ★ 5.2k
@rory-stark-5741
Cambridge, UK

In this case, DiffBind is unable to automatically determine the genome from the supplied Input bam file. You should be able to specify greylist=DBA_BLACKLIST_MM10 to use BSgenome.Mmusculus.UCSC.mm10.

I'd need to see your control bam file to understand why the genome is undetected.

I see that if the genome is not detectable from the Input file, but one was supplied explicitly using the blacklist= parameter, it should default to the genome supplied by blacklist= rather than making you specify it twice. I'll file this as a low-priority fix.
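
As a sketch of the suggestion above, the call below names the genome for the greylist explicitly instead of relying on auto-detection. The wrapper name `make_greylist_mm10` and the argument name `dba_obj` are made up for illustration; `dba_obj` stands for the DBA object called `test` in the question. The thread does not confirm that this works with Ensembl-named BAM files, so treat it as something to try rather than a verified fix.

```r
# Sketch only: requires DiffBind (and, for the greylist step, GreyListChIP and
# BSgenome.Mmusculus.UCSC.mm10) to be installed when the function is called.
make_greylist_mm10 <- function(dba_obj) {
  DiffBind::dba.blacklist(
    dba_obj,
    blacklist = DiffBind::DBA_BLACKLIST_MM10,  # built-in mm10 blacklist, as in the question
    greylist  = DiffBind::DBA_BLACKLIST_MM10   # names the genome explicitly instead of greylist=TRUE
  )
}

# Usage (not run here):
# test <- make_greylist_mm10(test)
# test.greylist <- DiffBind::dba.blacklist(test, Retrieve = DiffBind::DBA_GREYLIST)
```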


Hi Dr. Stark,

Thanks very much for your reply!

The header from one of my Input files is as follows:

@HD VN:1.6  SO:coordinate
@SQ SN:10   LN:130694993
@SQ SN:11   LN:122082543
@SQ SN:12   LN:120129022
@SQ SN:13   LN:120421639
@SQ SN:14   LN:124902244
@SQ SN:15   LN:104043685
@SQ SN:16   LN:98207768
@SQ SN:17   LN:94987271
@SQ SN:18   LN:90702639
@SQ SN:19   LN:61431566
@SQ SN:1    LN:195471971
@SQ SN:2    LN:182113224
@SQ SN:3    LN:160039680
@SQ SN:4    LN:156508116
@SQ SN:5    LN:151834684
@SQ SN:6    LN:149736546
@SQ SN:7    LN:145441459
@SQ SN:8    LN:129401213
@SQ SN:9    LN:124595110
@SQ SN:MT   LN:16299
@SQ SN:X    LN:171031299
@SQ SN:Y    LN:91744698
@PG ID:bowtie2  PN:bowtie2  CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -x ./Bowtie2Index/genome --threads 12 --passthrough -1 Input_0hr_Rep3_T1_1_val_1.fq.gz -2 Input_0hr_Rep3_T1_2_val_2.fq.gz" VN:2.4.4
@PG ID:samtools PN:samtools CL:samtools view --threads 12 -o Input_0hr_Rep3_T1.Lb.bam - PP:bowtie2  VN:1.15.1
@PG ID:samtools.1   PN:samtools CL:samtools sort -@ 6 -o Input_0hr_Rep3_T1.Lb.sorted.bam -T Input_0hr_Rep3_T1.Lb.sorted Input_0hr_Rep3_T1.Lb.bam    PP:samtools VN:1.15.1
@PG ID:MarkDuplicates   PN:MarkDuplicates   CL:MarkDuplicates --INPUT Input_0hr_Rep3.mLb.sorted.bam --OUTPUT Input_0hr_Rep3.mLb.mkD.sorted.bam --METRICS_FILE Input_0hr_Rep3.mLb.mkD.sorted.MarkDuplicates.metrics.txt --REMOVE_DUPLICATES false --ASSUME_SORTED true --TMP_DIR tmp --VALIDATION_STRINGENCY LENIENT --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --FLOW_MODE false --FLOW_QUALITY_SUM_STRATEGY false --USE_END_IN_UNPAIRED_READS false --USE_UNPAIRED_CLIPPED_END false --UNPAIRED_END_UNCERTAINTY 0 --FLOW_SKIP_FIRST_N_FLOWS 0 --FLOW_Q_IS_KNOWN_END false --FLOW_EFFECTIVE_QUALITY_THRESHOLD 15 --ADD_PG_TAG_TO_READS true --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false  VN:Version:2.27.4-SNAPSHOT
@PG ID:samtools.2   PN:samtools CL:samtools view -F 0x004 -F 0x0008 -f 0x001 -F 0x0400 -q 1 -L genome.include_regions.bed -b Input_0hr_Rep3.mLb.mkD.sorted.bam  PP:samtools.1   VN:1.15.1
@PG ID:samtools.3   PN:samtools PP:samtools.2   VN:1.15.1   CL:samtools sort -n -@ 6 -o Input_0hr_Rep3.mLb.clN.name.sorted.bam -T Input_0hr_Rep3.mLb.clN.name.sorted Input_0hr_Rep3.mLb.flT.sorted.bam
@PG ID:samtools.4   PN:samtools PP:samtools.3   VN:1.15.1   CL:samtools sort -@ 6 -o Input_0hr_Rep3.mLb.clN.sorted.bam -T Input_0hr_Rep3.mLb.clN.sorted Input_0hr_Rep3.mLb.clN.bam
@PG ID:samtools.5   PN:samtools PP:samtools.4   VN:1.17 CL:samtools view -H /Users/hinmanmak/Documents/NGS/Kylie/ChIP/P28364/Analysis/nf-core/macs2/broad/0.1/output/bowtie2/mergedLibrary/Input_0hr_Rep3.mLb.clN.sorted.bam

As the datasets were aligned vs Ensembl, it seems that I cannot use BSgenome.Mmusculus.UCSC.mm10.
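
The mismatch can be checked directly from the header lines above. The following is a minimal base-R sketch that extracts the SN fields from a few of those @SQ records and compares them with the "chr"-prefixed names a UCSC genome would use. The variable names and the short list of UCSC names are assumptions for illustration; this is not how DiffBind itself performs the detection.

```r
# A few lines copied from the header above (fields are tab-separated in a real BAM header).
header <- c(
  "@HD\tVN:1.6\tSO:coordinate",
  "@SQ\tSN:10\tLN:130694993",
  "@SQ\tSN:1\tLN:195471971",
  "@SQ\tSN:MT\tLN:16299",
  "@SQ\tSN:X\tLN:171031299"
)

# Keep the @SQ records and pull out the SN (sequence name) field.
sq <- header[startsWith(header, "@SQ")]
bam_names <- sub(".*\tSN:([^\t]+).*", "\\1", sq)

# UCSC-style names carry a "chr" prefix; Ensembl/NCBI-style names do not.
has_chr_prefix <- startsWith(bam_names, "chr")
style <- if (all(has_chr_prefix)) {
  "UCSC-like"
} else if (!any(has_chr_prefix)) {
  "Ensembl/NCBI-like"
} else {
  "mixed"
}

# Names a UCSC mm10 genome would use for the same chromosomes.
ucsc_names <- c("chr10", "chr1", "chrM", "chrX")
shared <- intersect(bam_names, ucsc_names)

print(bam_names)
print(style)
print(length(shared))
```

No name is shared between the two sets, which is consistent with the explanation given earlier in the thread.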

In this case, is there anything I can do as a workaround, rather than generating the BAM files again? I can supply my own blacklist instead of using blacklist=DBA_BLACKLIST_MM10. But what should I do to let DiffBind generate the greylist from the Input BAM files I provided?

Or should I generate the master greylist with GreyListChIP, obtaining a greylist from each input.bam using default settings and then merging all regions from the lists by a simple overlap method (e.g. 1 bp overlap)?
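
The merging step described above could look roughly like the sketch below. It is plain base R on data frames with hypothetical columns `chr`, `start` and `end` (1-based, inclusive coordinates assumed), and the example regions are invented. With GRanges objects, `reduce()` from GenomicRanges would be the more usual tool. Whether the merged result can then be passed to dba.blacklist as a pre-computed greylist should be checked against ?dba.blacklist.

```r
# Merge regions that overlap by at least 1 bp on the same chromosome.
merge_regions <- function(regions) {
  if (nrow(regions) == 0) return(regions)
  regions <- regions[order(regions$chr, regions$start), ]
  merged <- list()
  for (i in seq_len(nrow(regions))) {
    r <- regions[i, ]
    n <- length(merged)
    if (n > 0 && merged[[n]]$chr == r$chr && r$start <= merged[[n]]$end) {
      # Overlaps the last merged region: extend it.
      merged[[n]]$end <- max(merged[[n]]$end, r$end)
    } else {
      # No overlap: start a new merged region.
      merged[[n + 1]] <- r
    }
  }
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

# Invented per-input greylists, using Ensembl-style chromosome names.
input1 <- data.frame(chr = c("1", "1", "2"), start = c(100, 500, 50), end = c(200, 600, 80))
input2 <- data.frame(chr = c("1", "2"), start = c(150, 300), end = c(250, 400))

master <- merge_regions(rbind(input1, input2))
print(master)
```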

Thanks a lot!


Hi :)

Were you able to solve the problem? I am experiencing the same problem right now.


What happens if you do this:

library(BSgenome.Mmusculus.UCSC.mm10)
seqlevelsStyle(Mmusculus) <- "NCBI"

And then follow Rory's suggestion for specifying the greylist?
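
To illustrate what that renaming is meant to achieve, here is a base-R sketch that mimics the UCSC-to-NCBI conversion for the primary mouse chromosomes only. The helper `ucsc_to_ncbi` is hypothetical; the real `seqlevelsStyle<-` uses a mapping table and also covers other sequences. The thread does not confirm whether DiffBind picks up the renamed `Mmusculus` object.

```r
# Illustration only: strip the "chr" prefix and rename the mitochondrial chromosome.
ucsc_to_ncbi <- function(x) {
  out <- sub("^chr", "", x)
  out[out == "M"] <- "MT"
  out
}

ucsc <- c("chr1", "chr10", "chrX", "chrY", "chrM")
renamed <- ucsc_to_ncbi(ucsc)

# Sequence names as they appear in the BAM header posted above.
bam_names <- c("1", "10", "X", "Y", "MT")

print(renamed)
print(all(renamed %in% bam_names))
```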
