Question: Genome database and transcript database have different seqlengths
4 weeks ago, sarac20 wrote:

While trying to assign transcript IDs to my tags using the CAGEfightR package:

TSSs <- assignTxID(TSSs, txModels = txdb, swap="thick")


I was met with the following error:

Error: seqlengths(object) not identical to seqlengths(txModels)


Unless I am mistaken, this seems to arise because

length(seqlengths(TSSs))


is 455, since TSSs was built with the quantifyCTSSs function using the BSgenome.Hsapiens.UCSC.hg38 genome, while

length(seqlengths(txdb))


is 595, since txdb is the TxDb.Hsapiens.UCSC.hg38.knownGene database. Unfortunately my R-fu is not advanced enough to solve this. Is there an easy way to fix this, or am I missing something?

Tags: cage, cagefightr
Answer: Genome database and transcript database have different seqlengths
4 weeks ago, maltethodberg130 (Sweden) wrote:

You are correct: CAGEfightR is complaining that the two genomes (as returned by seqinfo()/seqlengths()) are not identical. I have seen a couple of people run into similar problems, so CAGEfightR is probably a bit too strict in enforcing this at the moment. We will probably remove this error in a future version of CAGEfightR and replace it with a warning.

Two ways around this:

1) Simply use the genome from seqinfo(txdb) in quantifyCTSSs.
2) Try to overwrite the seqinfo objects; see for example https://support.bioconductor.org/p/118989/#119085.
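The following is a minimal sketch of why the check fails and what option 2 has to achieve. It uses small, hypothetical Seqinfo objects from GenomeInfoDb instead of the real hg38 data, so the sequence names and lengths are made up for illustration. It also assumes that assignTxID compares the two seqlengths vectors with identical(), as the error message suggests. Under that assumption, both objects must end up with the same sequences, in the same order, with the same lengths. Option 1 avoids the problem entirely: passing genome = seqinfo(txdb) to quantifyCTSSs means both objects share one genome from the start.

```r
library(GenomeInfoDb)

# Hypothetical toy genomes standing in for the BSgenome-derived seqinfo (TSSs)
# and the TxDb-derived seqinfo (txdb); lengths are illustrative, not real hg38 values.
si_tss  <- Seqinfo(seqnames   = c("chr1", "chr2", "chrM"),
                   seqlengths = c(1000L, 800L, 16L))
si_txdb <- Seqinfo(seqnames   = c("chr1", "chr2", "chrM", "chr1_alt"),
                   seqlengths = c(1000L, 800L, 16L, 50L))

# The comparison that fails: the TxDb genome has an extra sequence
identical(seqlengths(si_tss), seqlengths(si_txdb))  # FALSE

# Sequences present in the TxDb genome but missing from the BSgenome-derived one
extra <- setdiff(seqnames(si_txdb), seqnames(si_tss))
print(extra)

# Restricting both genomes to their shared sequences, in the same order,
# makes the seqlengths vectors identical
common <- intersect(seqnames(si_tss), seqnames(si_txdb))
identical(seqlengths(si_tss[common]), seqlengths(si_txdb[common]))  # TRUE
```

With the real objects, the same setdiff()/intersect() calls on seqlevels(TSSs) and seqlevels(txdb) should show which of the 595 TxDb sequences are absent from the 455-sequence BSgenome genome. Whether restricting the objects this way is enough to satisfy assignTxID is an assumption worth checking; option 1 sidesteps the question.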

Hope this helps!


Thank you! Number 1 worked a treat.