transcriptR ChIP-seq RNA-seq bam Error in mergeNamedAtomicVectors
1
0
Entering edit mode
@jasonwilliams-21306
Last seen 17 months ago

Hi,

I'm following this vignette (https://rdrr.io/bioc/transcriptR/f/vignettes/transcriptR.Rmd) for transcriptR, when estimating gap distance I get the following error.

# create a range of gap distances to test
# from 0 bp to 10000 bp with the step of 100 bp
gd <- seq(from = 0, to = 10000, by = 100)
estimateGapDistance(object = tds, annot = knownGene.genes, filter.annot = TRUE,
fpkm.quantile = 0.25, gap.dist.range = gd)
# view the optimal gap distance minimazing the sum of two errors
tds

Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", : sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM have incompatible seqlengths: - in 'x': 249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566, 155270560, 59373566, 16571 - in 'y': 248956422, 242193529, 198295559, 190214555, 181538259, 170805979, 159345973, 145138636, 138394717, 133797422, 135086622, 133275309, 114364328, 107043718, 101991189, 90338345, 83257441, 80373285, 58617616, 64444167, 46709983, 50818468, 156040895, 57227415, 16569


I suspect this has something to do with (https://support.bioconductor.org/p/101694/).

But I am really not sure, any help would be appreciated. I processed these paired end reads using STAR to generate this bam file. There doesn't seem to be any advice for preprocessing to generate bam files as input for transcriptR. I assume reads should be aligned to the genome (in the case of this vignette hg19)?

Best,

Jason

TranscriptR RNA-seq ChIP-seq • 431 views
0
Entering edit mode

When you are writing out your post, there is a box right below where you are typing that shows what your post will eventually look like. If you re-read your post, do you think that is particularly legible? Ideally you would use the preview box to ensure that what you are posting is legible. Part of getting people to help is on you to make it easy for them to do so.

Also, when you post a question there is this sentence right above the box where you type that says

This site uses markdown. See tutorial post for advise on post formatting.*

0
Entering edit mode

Hi James. I have edited it now. The preview actually wasn't the same as what was being posted. When I removed the code chunk title, the post became the same as the preview.

1
Entering edit mode
@james-w-macdonald-5106
Last seen 10 hours ago
United States

So if I do

> library(BSgenome.Hsapiens.UCSC.hg19)
> seqlengths(Hsapiens)[1:24]
chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022
chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
141213431 135534747 135006516 133851895 115169878 107349540 102531392  90354753
chr17     chr18     chr19     chr20     chr21     chr22      chrX      chrY
81195210  78077248  59128983  63025520  48129895  51304566 155270560  59373566
> library(BSgenome.Hsapiens.UCSC.hg38)

Attaching package: ‘BSgenome.Hsapiens.UCSC.hg38’

The following object is masked from ‘package:BSgenome.Hsapiens.UCSC.hg19’:

Hsapiens

> seqlengths(Hsapiens)[1:24]
chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
248956422 242193529 198295559 190214555 181538259 170805979 159345973 145138636
chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
138394717 133797422 135086622 133275309 114364328 107043718 101991189  90338345
chr17     chr18     chr19     chr20     chr21     chr22      chrX      chrY
83257441  80373285  58617616  64444167  46709983  50818468 156040895  57227415


You can see that one of the things you are using is based on GRCh37, and the other is based on GRCh38 I would hazard a guess that you have aligned your data to GRCh38, and are blindly following the vignette, which uses TxDb.Hsapiens.UCSC.hg19.knownGene, where hg19 is the UCSC identifier for GRCh37. You seem to have realized that you have mis-matched your genomes, but I am not sure I agree with your prescription.

You could

A.) Re-align your data to GRCh37 in order to be able to exactly follow what it says in the vignette.

B.) Use the TxDb.Hsapiens.UCSC.hg38.knownGene, which matches your aligned data, and is not based on a ten year old genome build, which (apparently little known fact) is something like 70% based on sequencing from one dude from western New York state? H/T Kevin Blighe

The vignette is intended to be an example of how one could use a package to do a common task, not an edict of how that task must be accomplished, so all things equal I would choose B.

0
Entering edit mode

Thanks James. I did suspect this was the case, but thank you for confirming,I have to wait for permission to load the hg38.knownGene library. I will be able to do this next week. So yes, I will choose B! High risk of diabetes type I 8| although any reference genome should only be used for alignment purposes surely.