Hope you don't mind that I'm cc'ing the list.
On 05/27/2014 04:17 PM, Tarca, Adi wrote:
Dear Hervé,
Should I worry about the warning below?
I just want to overlap some rna seq reads with known genes.
Do you mean "overlap"?
Thanks,
Adi
txdb2=makeTranscriptDbFromUCSC(genome="hg19", tablename="knownGene")
Note that we provide a few "TxDb" packages that contain pre-computed
TranscriptDb objects for a few organisms and tracks:
http://bioconductor.org/packages/release/BiocViews.html#___TranscriptDb
There is one for hg19/knownGene: the TxDb.Hsapiens.UCSC.hg19.knownGene
package.
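As a minimal sketch of using that pre-computed package instead of rebuilding the object from UCSC (assuming the package is already installed; the variable names `txdb` and `exons_by_gene` are only illustrative):

```r
# Assumes TxDb.Hsapiens.UCSC.hg19.knownGene has been installed from Bioconductor.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# The package provides a ready-made annotation object named after the package,
# so no download from UCSC is needed.
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Exons grouped by gene: a common starting point for overlapping reads with genes.
exons_by_gene <- exonsBy(txdb, by = "gene")
```

Since nothing is fetched from UCSC at run time, this should also avoid the table-parsing step that emits the warning discussed below.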
Download the knownGene table ... OK
Download the knownToLocusLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
  UCSC data anomaly in 50638 transcript(s): the cds cumulative length is
  not a multiple of 3 for transcripts
  'uc001aaa.3' 'uc010nxr.1' 'uc009vis.3' 'uc009vjc.1' 'uc009vjd.2'
  'uc009vit.3' 'uc009viu.3' 'uc001aae.4' 'uc001aai.1' 'uc001aah.4'
  'uc009vir.3' 'uc009viq.3' 'uc001aac.4' 'uc009viv.2' 'uc009viw.2'
  'uc009vix.2' 'uc009viy.2' 'uc009viz.2' 'uc010nxs.1' 'uc009vje.2'
  'uc009vjf.2' 'uc009vjb.1' 'uc001aak.3' 'uc021oeg.2' 'uc001aaq.2'
  'uc001aar.2' 'uc021oeh.1' 'uc009vjk.2' 'uc001aau.3' 'uc001aax.1'
  'uc021oej.1' 'uc021oek.1' 'uc021oel.1' 'uc001abb.3' 'uc001abe.4'
  'uc001abi.2' 'uc001abj.3' 'uc009vjm.3' 'uc010nxw.2' 'uc001abl.3'
  'uc002khh.3' 'uc001abm.2' 'uc001abo.3' 'uc031pjj.1' 'uc001abp.2'
  'uc021oem.2' 'uc009vjn.2' 'uc009vjo.2' 'uc031pjk.1' 'uc001abt.4'
  'uc001abu.1'
  [... truncated]
This warning is wrong. It's actually easy to check that all the CDS
have a cumulative length that is a multiple of 3:
> cds_by_tx <- cdsBy(txdb2, by="tx")
> table(sum(width(cds_by_tx)) %% 3L)
0
63691
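To make explicit what that check computes, here is a toy base-R sketch with made-up CDS exon widths (the transcript names `txA`, `txB`, `txC` and all numbers are hypothetical, not taken from the knownGene table). It sums the widths per transcript and takes the remainder modulo 3, the same arithmetic as `sum(width(cds_by_tx)) %% 3L`:

```r
# Hypothetical CDS exon widths per transcript (made-up names and numbers).
cds_widths <- list(
  txA = c(120L, 93L, 60L),  # cumulative length 273
  txB = c(100L, 50L),       # cumulative length 150
  txC = c(100L, 51L)        # cumulative length 151
)

# Cumulative CDS length of each transcript.
cds_total <- vapply(cds_widths, sum, integer(1))

# Remainder modulo 3: 0 means the CDS consists of whole codons.
remainder <- cds_total %% 3L
table(remainder)

# Names of transcripts whose cumulative CDS length is not a multiple of 3.
bad_tx <- names(remainder)[remainder != 0L]
```

In this toy data only `txC` would be flagged. On the real object, the table above has a single column for remainder 0, which is why the warning looks spurious.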
Seems to be a regression introduced in BioC 2.14. Someone in Seattle
will work on a fix and we will notify the list when the fix is
available.
Otherwise, assuming the code in charge of issuing the warning is
working properly, you can get a legitimate warning like this for
some combinations of UCSC organism/track (but AFAIK never for the
knownGene track). If all you want to do is find/count overlaps between
some RNA-seq reads and known genes, then you probably don't care about
CDS at all.
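For illustration only, a minimal sketch of counting read/gene overlaps with GenomicRanges, using made-up coordinates (the `genes` and `reads` objects and the gene names are hypothetical; no CDS information is involved):

```r
library(GenomicRanges)

# Hypothetical gene models: two genes on chr1 (coordinates are made up).
genes <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(100, 500), end = c(200, 600)))
mcols(genes)$gene_id <- c("geneA", "geneB")

# Hypothetical aligned reads, each 50 bases wide (made-up positions).
reads <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(90, 150, 180, 550, 900),
                                  width = 50))

# Number of reads overlapping each gene.
counts <- countOverlaps(genes, reads)
names(counts) <- mcols(genes)$gene_id
counts
```

With real data, the gene ranges would typically come from the TranscriptDb object (for example exons grouped by gene) and the reads from an alignment file; only range overlaps are used, so the CDS warning has no bearing on the counts.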
Cheers,
H.
sessioninfo()
Error: could not find function "sessioninfo"
sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] gplots_2.13.0 RColorBrewer_1.0-5 PADOG_1.4.0
[4] GSA_1.03 nlme_3.1-117 KEGGdzPathwaysGEO_1.1.3
[7] Heatplus_2.8.0 marray_1.40.0 limma_3.18.13
[10] org.Hs.eg.db_2.10.1 preprocessCore_1.24.0 GO.db_2.10.1
[13] SPIA_2.14.0 KEGGgraph_1.20.0 graph_1.40.1
[16] XML_3.98-1.1 KEGG.db_2.10.1 RSQLite_0.11.4
[19] DBI_0.2-7 R2HTML_2.2.1 rtracklayer_1.22.7
[22] Rsamtools_1.14.3 Biostrings_2.30.1 GenomicFeatures_1.14.5
[25] AnnotationDbi_1.24.0 Biobase_2.22.0 GenomicRanges_1.14.4
[28] XVector_0.2.0 IRanges_1.20.7 BiocGenerics_0.8.0
[31] BiocInstaller_1.12.1 multicore_0.2
loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 caTools_1.17
[5] gdata_2.13.3 grid_3.0.3 gtools_3.4.0 KernSmooth_2.23-12
[9] lattice_0.20-29 RCurl_1.95-4.1 stats4_3.0.3 tools_3.0.3
Adi Laurentiu TARCA, Ph.D.
Assistant Professor (Research),
Department of Computer Science & Center for Molecular Medicine and
Genetics, Wayne State University,
Director, Bioinformatics and Computational Biology Unit, Perinatology
Research Branch (NICHD),
3990 John R., Office 4809,
Detroit, Michigan 48201
Tel: 1-313-5775305
