Hello!
I am trying to generate a motif matrix for my ATAC-Seq data using the mm39 genome for annotations in the Signac R package. My code is below, and using this code, I consistently run into this error message: trying to load regions beyond the boundaries of non-circular sequence "chr1"
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(annotations) <- 'UCSC'
genome(annotations) <- 'mm39'
RNA_integrated <- AddMotifs(
object = RNA_integrated,
genome = BSgenome.Mmusculus.UCSC.mm39,
pfm = pfm
)
The strange thing is that when I revert back to the mm10 mouse genome, I can generate a motif matrix without a single hitch, so I am wondering if there is something I need to change in my code to accommodate for the mm39 genome? Any and all help would be greatly appreciated!
Unfortunately I've updated my code with the newest GRCm39 genome, and yet I am still getting the same error.... Is there anything else I can try to fix this error other than just trying all of the GRCm39 versions one by one?
You will probably have to ask the Signac people. I don't know exactly what is going on under the hood for
AddMotif
, but theEnsDb
seems OK to me.Presumably what they are doing is extracting the sequences, which appears to work (there's nothing off the end of chr1), so I don't think it's an issue with any Bioconductor packages/functions.
This is the traceback I got after running AddMotifs if this is informative?
It's not informative except for me to say I was right that it's
getSeq
that is providing the error. It appears thatAddMotifs
is passing data off tomatchMotifs
in themotifmatchr
package (both not Bioconductor, so we are off topic to a certain extent), and thenmatchMotifs
does stuff and then callsgetSeq
. AndgetSeq
says 'not so fast my friend!'So, things you can do.
1.) set
options(error = recover)
and when it blows up you can check to see what objects are being passed intomatchMotifs
and maybe figure out what the problem is. This works sometimes IMO, but sometimes it's just more confusing. 2.) usetrace(matchMotifs, browser, signature = c(pwms = "PWMatrixList", subject = "GenomicRanges"))
and then runAddMotifs
again. This will drop into a browser if I am correct that you are dispatching on that signature. You can then step through until you get togetSeq
and then inspect what is being used as the genome and the subject. Note that you can use the<<-
function to dump things out to your .GlobalEnv (your workspace), so you could just step through until you get to the relevant step and then dopwms <<- pwms
andsubject <<- subject
, then quit the debugger (usingQ
), and then you can inspect those two objects to figure out what's wrong.And if what's wrong makes sense, and provides you with a way to get things to work, then good on ya! Or maybe that will just give you information that you can provide to either the Sajita or Greenleaf lab if it's an obvious bug. But at this point I think it's up to you or the authors of the code that's not working to figure out.