|
> library('GenomicFeatures') |
|
Loading required package: BiocGenerics |
|
Loading required package: parallel |
|
|
|
Attaching package: ‘BiocGenerics’ |
|
|
|
The following objects are masked from ‘package:parallel’: |
|
|
|
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, |
|
parLapplyLB, parRapply, parSapply, parSapplyLB |
|
|
|
The following objects are masked from ‘package:stats’: |
|
|
|
IQR, mad, sd, var, xtabs |
|
|
|
The following objects are masked from ‘package:base’: |
|
|
|
anyDuplicated, append, as.data.frame, cbind, colMeans, colnames, colSums, do.call, duplicated, eval, evalq, Filter, |
|
Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste, pmax, |
|
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort, table, |
|
tapply, union, unique, unsplit, which, which.max, which.min |
|
|
|
Loading required package: S4Vectors |
|
Loading required package: stats4 |
|
|
|
Attaching package: ‘S4Vectors’ |
|
|
|
The following object is masked from ‘package:base’: |
|
|
|
expand.grid |
|
|
|
Loading required package: IRanges |
|
Loading required package: GenomeInfoDb |
|
Loading required package: GenomicRanges |
|
Loading required package: AnnotationDbi |
|
Loading required package: Biobase |
|
Welcome to Bioconductor |
|
|
|
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see |
|
'citation("Biobase")', and for packages 'citation("pkgname")'. |
|
|
|
> library('derfinder') |
|
> library('devtools') |
|
> |
|
> txdb <- makeTxDbFromGFF('ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz') |
|
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz' |
|
ftp data connection made, file length 49757610 bytes |
|
================================================== |
|
downloaded 47.5 MB |
|
|
|
OK |
|
Prepare the 'metadata' data frame ... OK |
|
Make the TxDb object ... OK |
|
Warning messages: |
|
1: RSQLite::dbGetPreparedQuery() is deprecated, please switch to DBI::dbGetQuery(params = bind.data). |
|
2: Named parameters not used in query: internal_chrom_id, chrom, length, is_circular |
|
3: Named parameters not used in query: internal_id, name, type, chrom, strand, start, end |
|
4: Named parameters not used in query: internal_id, name, chrom, strand, start, end |
|
5: Named parameters not used in query: internal_id, name, chrom, strand, start, end |
|
6: Named parameters not used in query: internal_tx_id, exon_rank, internal_exon_id, internal_cds_id |
|
7: Named parameters not used in query: gene_id, internal_tx_id |
|
> |
|
> ## Check chr names |
|
> seqlevels(txdb) |
|
[1] "CHR_HG126_PATCH" "CHR_HG1342_HG2282_PATCH" "CHR_HG1362_PATCH" |
|
[4] "CHR_HG142_HG150_NOVEL_TEST" "CHR_HG151_NOVEL_TEST" "CHR_HG1651_PATCH" |
|
[7] "CHR_HG1832_PATCH" "CHR_HG2021_PATCH" "CHR_HG2030_PATCH" |
|
[10] "CHR_HG2058_PATCH" "CHR_HG2062_PATCH" "CHR_HG2066_PATCH" |
|
[13] "CHR_HG2095_PATCH" "CHR_HG2104_PATCH" "CHR_HG2128_PATCH" |
|
[16] "CHR_HG2191_PATCH" "CHR_HG2217_PATCH" "CHR_HG2232_PATCH" |
|
[19] "CHR_HG2233_PATCH" "CHR_HG2235_PATCH" "CHR_HG2247_PATCH" |
|
[22] "CHR_HG2249_PATCH" "CHR_HG2288_HG2289_PATCH" "CHR_HG2290_PATCH" |
|
[25] "CHR_HG2291_PATCH" "CHR_HG986_PATCH" "CHR_HSCHR10_1_CTG1" |
|
[28] "CHR_HSCHR10_1_CTG2" "CHR_HSCHR10_1_CTG3" "CHR_HSCHR10_1_CTG4" |
|
[31] "CHR_HSCHR11_1_CTG1_2" "CHR_HSCHR11_1_CTG5" "CHR_HSCHR11_1_CTG6" |
|
[34] "CHR_HSCHR11_1_CTG7" "CHR_HSCHR11_1_CTG8" "CHR_HSCHR11_2_CTG1" |
|
[37] "CHR_HSCHR11_2_CTG1_1" "CHR_HSCHR11_3_CTG1" "CHR_HSCHR12_1_CTG1" |
|
[40] "CHR_HSCHR12_1_CTG2_1" "CHR_HSCHR12_2_CTG2" "CHR_HSCHR12_2_CTG2_1" |
|
[43] "CHR_HSCHR12_3_CTG2" "CHR_HSCHR12_3_CTG2_1" "CHR_HSCHR12_4_CTG2" |
|
[46] "CHR_HSCHR12_4_CTG2_1" "CHR_HSCHR12_5_CTG2" "CHR_HSCHR12_5_CTG2_1" |
|
[49] "CHR_HSCHR12_6_CTG2_1" "CHR_HSCHR13_1_CTG1" "CHR_HSCHR13_1_CTG3" |
|
[52] "CHR_HSCHR13_1_CTG5" "CHR_HSCHR14_1_CTG1" "CHR_HSCHR14_2_CTG1" |
|
[55] "CHR_HSCHR14_3_CTG1" "CHR_HSCHR14_7_CTG1" "CHR_HSCHR15_1_CTG1" |
|
[58] "CHR_HSCHR15_1_CTG3" "CHR_HSCHR15_1_CTG8" "CHR_HSCHR15_2_CTG3" |
|
[61] "CHR_HSCHR15_2_CTG8" "CHR_HSCHR15_3_CTG3" "CHR_HSCHR15_3_CTG8" |
|
[64] "CHR_HSCHR15_4_CTG8" "CHR_HSCHR15_5_CTG8" "CHR_HSCHR15_6_CTG8" |
|
[67] "CHR_HSCHR16_1_CTG1" "CHR_HSCHR16_1_CTG3_1" "CHR_HSCHR16_2_CTG3_1" |
|
[70] "CHR_HSCHR16_3_CTG1" "CHR_HSCHR16_4_CTG1" "CHR_HSCHR16_CTG2" |
|
[73] "CHR_HSCHR17_10_CTG4" "CHR_HSCHR17_1_CTG1" "CHR_HSCHR17_1_CTG2" |
|
[76] "CHR_HSCHR17_1_CTG4" "CHR_HSCHR17_1_CTG5" "CHR_HSCHR17_1_CTG9" |
|
[79] "CHR_HSCHR17_2_CTG1" "CHR_HSCHR17_2_CTG2" "CHR_HSCHR17_2_CTG4" |
|
[82] "CHR_HSCHR17_2_CTG5" "CHR_HSCHR17_3_CTG2" "CHR_HSCHR17_3_CTG4" |
|
[85] "CHR_HSCHR17_4_CTG4" "CHR_HSCHR17_5_CTG4" "CHR_HSCHR17_6_CTG4" |
|
[88] "CHR_HSCHR17_7_CTG4" "CHR_HSCHR17_8_CTG4" "CHR_HSCHR17_9_CTG4" |
|
[91] "CHR_HSCHR18_1_CTG1_1" "CHR_HSCHR18_1_CTG2_1" "CHR_HSCHR18_2_CTG1_1" |
|
[94] "CHR_HSCHR18_2_CTG2" "CHR_HSCHR18_2_CTG2_1" "CHR_HSCHR18_3_CTG2_1" |
|
[97] "CHR_HSCHR18_ALT21_CTG2_1" "CHR_HSCHR18_ALT2_CTG2_1" "CHR_HSCHR19KIR_ABC08_A1_HAP_CTG3_1" |
|
[100] "CHR_HSCHR19KIR_ABC08_AB_HAP_C_P_CTG3_1" "CHR_HSCHR19KIR_ABC08_AB_HAP_T_P_CTG3_1" "CHR_HSCHR19KIR_FH05_A_HAP_CTG3_1" |
|
[103] "CHR_HSCHR19KIR_FH05_B_HAP_CTG3_1" "CHR_HSCHR19KIR_FH06_A_HAP_CTG3_1" "CHR_HSCHR19KIR_FH06_BA1_HAP_CTG3_1" |
|
[106] "CHR_HSCHR19KIR_FH08_A_HAP_CTG3_1" "CHR_HSCHR19KIR_FH08_BAX_HAP_CTG3_1" "CHR_HSCHR19KIR_FH13_A_HAP_CTG3_1" |
|
[109] "CHR_HSCHR19KIR_FH13_BA2_HAP_CTG3_1" "CHR_HSCHR19KIR_FH15_A_HAP_CTG3_1" "CHR_HSCHR19KIR_FH15_B_HAP_CTG3_1" |
|
[112] "CHR_HSCHR19KIR_G085_A_HAP_CTG3_1" "CHR_HSCHR19KIR_G085_BA1_HAP_CTG3_1" "CHR_HSCHR19KIR_G248_A_HAP_CTG3_1" |
|
[115] "CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1" "CHR_HSCHR19KIR_GRC212_AB_HAP_CTG3_1" "CHR_HSCHR19KIR_GRC212_BA1_HAP_CTG3_1" |
|
[118] "CHR_HSCHR19KIR_LUCE_A_HAP_CTG3_1" "CHR_HSCHR19KIR_LUCE_BDEL_HAP_CTG3_1" "CHR_HSCHR19KIR_RP5_B_HAP_CTG3_1" |
|
[121] "CHR_HSCHR19KIR_RSH_A_HAP_CTG3_1" "CHR_HSCHR19KIR_RSH_BA2_HAP_CTG3_1" "CHR_HSCHR19KIR_T7526_A_HAP_CTG3_1" |
|
[124] "CHR_HSCHR19KIR_T7526_BDEL_HAP_CTG3_1" "CHR_HSCHR19LRC_COX1_CTG3_1" "CHR_HSCHR19LRC_COX2_CTG3_1" |
|
[127] "CHR_HSCHR19LRC_LRC_I_CTG3_1" "CHR_HSCHR19LRC_LRC_J_CTG3_1" "CHR_HSCHR19LRC_LRC_S_CTG3_1" |
|
[130] "CHR_HSCHR19LRC_LRC_T_CTG3_1" "CHR_HSCHR19LRC_PGF1_CTG3_1" "CHR_HSCHR19LRC_PGF2_CTG3_1" |
|
[133] "CHR_HSCHR19_1_CTG2" "CHR_HSCHR19_1_CTG3_1" "CHR_HSCHR19_2_CTG2" |
|
[136] "CHR_HSCHR19_2_CTG3_1" "CHR_HSCHR19_3_CTG2" "CHR_HSCHR19_3_CTG3_1" |
|
[139] "CHR_HSCHR19_4_CTG2" "CHR_HSCHR19_4_CTG3_1" "CHR_HSCHR19_5_CTG2" |
|
[142] "CHR_HSCHR1_1_CTG11" "CHR_HSCHR1_1_CTG3" "CHR_HSCHR1_1_CTG31" |
|
[145] "CHR_HSCHR1_1_CTG32_1" "CHR_HSCHR1_2_CTG3" "CHR_HSCHR1_2_CTG31" |
|
[148] "CHR_HSCHR1_2_CTG32_1" "CHR_HSCHR1_3_CTG31" "CHR_HSCHR1_3_CTG32_1" |
|
[151] "CHR_HSCHR1_4_CTG31" "CHR_HSCHR1_ALT2_1_CTG32_1" "CHR_HSCHR20_1_CTG1" |
|
[154] "CHR_HSCHR20_1_CTG2" "CHR_HSCHR20_1_CTG3" "CHR_HSCHR20_1_CTG4" |
|
[157] "CHR_HSCHR21_2_CTG1_1" "CHR_HSCHR21_3_CTG1_1" "CHR_HSCHR21_4_CTG1_1" |
|
[160] "CHR_HSCHR21_5_CTG2" "CHR_HSCHR21_6_CTG1_1" "CHR_HSCHR21_8_CTG1_1" |
|
[163] "CHR_HSCHR22_1_CTG1" "CHR_HSCHR22_1_CTG2" "CHR_HSCHR22_1_CTG3" |
|
[166] "CHR_HSCHR22_1_CTG4" "CHR_HSCHR22_1_CTG5" "CHR_HSCHR22_1_CTG6" |
|
[169] "CHR_HSCHR22_1_CTG7" "CHR_HSCHR22_2_CTG1" "CHR_HSCHR22_3_CTG1" |
|
[172] "CHR_HSCHR22_4_CTG1" "CHR_HSCHR22_5_CTG1" "CHR_HSCHR2_1_CTG1" |
|
[175] "CHR_HSCHR2_1_CTG15" "CHR_HSCHR2_1_CTG5" "CHR_HSCHR2_1_CTG7" |
|
[178] "CHR_HSCHR2_1_CTG7_2" "CHR_HSCHR2_2_CTG1" "CHR_HSCHR2_2_CTG15" |
|
[181] "CHR_HSCHR2_2_CTG7" "CHR_HSCHR2_2_CTG7_2" "CHR_HSCHR2_3_CTG1" |
|
[184] "CHR_HSCHR2_3_CTG15" "CHR_HSCHR2_3_CTG7_2" "CHR_HSCHR2_4_CTG1" |
|
[187] "CHR_HSCHR3_1_CTG1" "CHR_HSCHR3_1_CTG2_1" "CHR_HSCHR3_1_CTG3" |
|
[190] "CHR_HSCHR3_2_CTG2_1" "CHR_HSCHR3_2_CTG3" "CHR_HSCHR3_3_CTG1" |
|
[193] "CHR_HSCHR3_3_CTG3" "CHR_HSCHR3_4_CTG2_1" "CHR_HSCHR3_4_CTG3" |
|
[196] "CHR_HSCHR3_5_CTG2_1" "CHR_HSCHR3_5_CTG3" "CHR_HSCHR3_6_CTG3" |
|
[199] "CHR_HSCHR3_7_CTG3" "CHR_HSCHR3_8_CTG3" "CHR_HSCHR3_9_CTG3" |
|
[202] "CHR_HSCHR4_1_CTG12" "CHR_HSCHR4_1_CTG4" "CHR_HSCHR4_1_CTG6" |
|
[205] "CHR_HSCHR4_1_CTG9" "CHR_HSCHR4_2_CTG12" "CHR_HSCHR4_3_CTG12" |
|
[208] "CHR_HSCHR4_4_CTG12" "CHR_HSCHR4_5_CTG12" "CHR_HSCHR4_6_CTG12" |
|
[211] "CHR_HSCHR4_7_CTG12" "CHR_HSCHR5_1_CTG1" "CHR_HSCHR5_1_CTG1_1" |
|
[214] "CHR_HSCHR5_1_CTG5" "CHR_HSCHR5_2_CTG1" "CHR_HSCHR5_2_CTG1_1" |
|
[217] "CHR_HSCHR5_2_CTG5" "CHR_HSCHR5_3_CTG1" "CHR_HSCHR5_3_CTG5" |
|
[220] "CHR_HSCHR5_4_CTG1" "CHR_HSCHR5_4_CTG1_1" "CHR_HSCHR5_5_CTG1" |
|
[223] "CHR_HSCHR5_6_CTG1" "CHR_HSCHR5_7_CTG1" "CHR_HSCHR6_1_CTG2" |
|
[226] "CHR_HSCHR6_1_CTG3" "CHR_HSCHR6_1_CTG4" "CHR_HSCHR6_1_CTG5" |
|
[229] "CHR_HSCHR6_1_CTG6" "CHR_HSCHR6_1_CTG7" "CHR_HSCHR6_1_CTG8" |
|
[232] "CHR_HSCHR6_1_CTG9" "CHR_HSCHR6_8_CTG1" "CHR_HSCHR6_MHC_APD_CTG1" |
|
[235] "CHR_HSCHR6_MHC_COX_CTG1" "CHR_HSCHR6_MHC_DBB_CTG1" "CHR_HSCHR6_MHC_MANN_CTG1" |
|
[238] "CHR_HSCHR6_MHC_MCF_CTG1" "CHR_HSCHR6_MHC_QBL_CTG1" "CHR_HSCHR6_MHC_SSTO_CTG1" |
|
[241] "CHR_HSCHR7_1_CTG1" "CHR_HSCHR7_1_CTG4_4" "CHR_HSCHR7_1_CTG6" |
|
[244] "CHR_HSCHR7_1_CTG7" "CHR_HSCHR7_2_CTG1" "CHR_HSCHR7_2_CTG4_4" |
|
[247] "CHR_HSCHR7_2_CTG6" "CHR_HSCHR7_2_CTG7" "CHR_HSCHR7_3_CTG6" |
|
[250] "CHR_HSCHR8_1_CTG1" "CHR_HSCHR8_1_CTG6" "CHR_HSCHR8_1_CTG7" |
|
[253] "CHR_HSCHR8_2_CTG1" "CHR_HSCHR8_2_CTG7" "CHR_HSCHR8_3_CTG1" |
|
[256] "CHR_HSCHR8_3_CTG7" "CHR_HSCHR8_4_CTG1" "CHR_HSCHR8_4_CTG7" |
|
[259] "CHR_HSCHR8_5_CTG1" "CHR_HSCHR8_5_CTG7" "CHR_HSCHR8_6_CTG1" |
|
[262] "CHR_HSCHR8_7_CTG1" "CHR_HSCHR8_8_CTG1" "CHR_HSCHR8_9_CTG1" |
|
[265] "CHR_HSCHR9_1_CTG1" "CHR_HSCHR9_1_CTG2" "CHR_HSCHR9_1_CTG3" |
|
[268] "CHR_HSCHR9_1_CTG4" "CHR_HSCHR9_1_CTG5" "CHR_HSCHRX_1_CTG3" |
|
[271] "CHR_HSCHRX_2_CTG12" "CHR_HSCHRX_2_CTG3" "1" |
|
[274] "2" "3" "4" |
|
[277] "5" "6" "7" |
|
[280] "8" "9" "10" |
|
[283] "11" "12" "13" |
|
[286] "14" "15" "16" |
|
[289] "17" "18" "19" |
|
[292] "20" "21" "22" |
|
[295] "X" "Y" "MT" |
|
[298] "GL000008.2" "GL000009.2" "GL000194.1" |
|
[301] "GL000195.1" "GL000205.2" "GL000213.1" |
|
[304] "GL000216.2" "GL000218.1" "GL000219.1" |
|
[307] "GL000220.1" "GL000224.1" "GL000225.1" |
|
[310] "KI270442.1" "KI270706.1" "KI270707.1" |
|
[313] "KI270708.1" "KI270711.1" "KI270713.1" |
|
[316] "KI270714.1" "KI270721.1" "KI270722.1" |
|
[319] "KI270723.1" "KI270724.1" "KI270726.1" |
|
[322] "KI270727.1" "KI270728.1" "KI270731.1" |
|
[325] "KI270733.1" "KI270734.1" "KI270741.1" |
|
[328] "KI270743.1" "KI270744.1" "KI270750.1" |
|
[331] "KI270752.1" |
|
> |
|
> ## Keep just chrs 1 to 22, X, Y, and MT |
|
> txdb <- keepSeqlevels(txdb, c(1:22, 'X', 'Y', 'MT')) |
|
> seqlevels(txdb) |
|
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "X" "Y" "MT" |
|
> |
|
> ## For speed testing: just use a portion of the txdb data |
|
> test_txdb <- keepSeqlevels(txdb, c('Y', 'MT')) |
|
> |
|
> ## Attempt to create genomic state object |
|
> gs <- makeGenomicState(test_txdb, style = 'Ensembl', currentStyle = 'Ensembl') |
|
'select()' returned 1:1 mapping between keys and columns |
|
Error: logical subscript contains NAs |
|
> traceback() |
|
10: stop(wmsg(...), call. = FALSE) |
|
9: .subscript_error("logical subscript contains NAs") |
|
8: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, |
|
allow.NAs = allow.NAs) |
|
7: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, |
|
allow.NAs = allow.NAs) |
|
6: normalizeSingleBracketSubscript(i, x) |
|
5: extractROWS(x, i) |
|
4: extractROWS(x, i) |
|
3: x[width(x) != seqlengths(fullInterGR)[i]] |
|
2: x[width(x) != seqlengths(fullInterGR)[i]] |
|
1: makeGenomicState(test_txdb, style = "Ensembl", currentStyle = "Ensembl") |
|
> |
|
> ## Issue is with the seqlengths being missing |
|
> seqlengths(test_txdb) |
|
Y MT |
|
NA NA |
|
> |
|
> ## Adding the sequence lengths doesn't work |
|
> hg38_chrominfo <- fetchExtendedChromInfoFromUCSC("hg38") |
|
> new_info <- hg38_chrominfo$UCSC_seqlength[match(seqlevels(test_txdb), |
|
+ mapSeqlevels(hg38_chrominfo$UCSC_seqlevel, 'Ensembl'))] |
|
> names(new_info) <- names(seqlengths(test_txdb)) |
|
> seqlengths(test_txdb) <- new_info |
|
Error in (function (classes, fdef, mtable) : |
|
unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"TxDb"’ |
|
> |
|
> ## Using the approach described in https://support.bioconductor.org/p/67680/#67766 |
|
> ## That is, loading the GFF as a GRanges object and setting the lengths there |
|
> ## before using makeTxDbFromGRanges() |
|
> library('rtracklayer') |
|
> gr <- import('ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz') |
|
trying URL 'ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz' |
|
ftp data connection made, file length 49757610 bytes |
|
================================================== |
|
downloaded 47.5 MB |
|
|
|
> |
|
> ## Make a small GRanges object for testing |
|
> gr_small <- keepSeqlevels(gr, c('Y', 'MT'), pruning.mode = 'tidy') |
|
> |
|
> ## Set the chromosome lengths, genome |
|
> seqlengths(gr_small) <- new_info |
|
> genome(gr_small) <- 'hg38' |
|
> txdb2 <- makeTxDbFromGRanges(gr_small) |
|
Warning messages: |
|
1: Named parameters not used in query: internal_chrom_id, chrom, length, is_circular |
|
2: Named parameters not used in query: internal_id, name, type, chrom, strand, start, end |
|
3: Named parameters not used in query: internal_id, name, chrom, strand, start, end |
|
4: Named parameters not used in query: internal_id, name, chrom, strand, start, end |
|
5: Named parameters not used in query: internal_tx_id, exon_rank, internal_exon_id, internal_cds_id |
|
6: Named parameters not used in query: gene_id, internal_tx_id |
|
> |
|
> ## Now create the genomic state object |
|
> gs <- makeGenomicState(txdb2, style = 'Ensembl', currentStyle = 'Ensembl') |
|
'select()' returned 1:1 mapping between keys and columns |
|
> gs |
|
GRangesList object of length 2: |
|
$fullGenome |
|
GRanges object with 4180 ranges and 4 metadata columns: |
|
seqnames ranges strand | theRegion tx_id tx_name gene |
|
<Rle> <IRanges> <Rle> | <character> <IntegerList> <CharacterList> <IntegerList> |
|
1 Y [2784749, 2784853] + | exon 1 ENST00000516032 431 |
|
2 Y [2789827, 2790328] + | exon 2 ENST00000454281 377 |
|
3 Y [2827982, 2828218] + | exon 3 ENST00000430735 267 |
|
4 Y [2841486, 2841627] + | exon 4 ENST00000250784 10 |
|
5 Y [2841628, 2841919] + | intron 4,5 ENST00000250784,ENST00000430575 10 |
|
... ... ... ... . ... ... ... ... |
|
4176 Y [26454093, 26508212] * | intergenic |
|
4177 Y [26579691, 26586641] * | intergenic |
|
4178 Y [26591602, 26594850] * | intergenic |
|
4179 Y [26634653, 56855243] * | intergenic |
|
4180 Y [56855489, 57227415] * | intergenic |
|
|
|
... |
|
<1 more element> |
|
------- |
|
seqinfo: 1 sequence from hg38 genome |
|
> |
|
> ## Reproducibility information |
|
> print('Reproducibility information:') |
|
[1] "Reproducibility information:" |
|
> Sys.time() |
|
[1] "2017-03-01 12:12:14 EST" |
|
> options(width = 120) |
|
> session_info() |
|
Session info ----------------------------------------------------------------------------------------------------------- |
|
setting value |
|
version R Under development (unstable) (2016-10-26 r71594) |
|
system x86_64, darwin13.4.0 |
|
ui AQUA |
|
language (EN) |
|
collate en_US.UTF-8 |
|
tz America/New_York |
|
date 2017-03-01 |
|
|
|
Packages --------------------------------------------------------------------------------------------------------------- |
|
package * version date source |
|
acepack 1.4.1 2016-10-29 CRAN (R 3.4.0) |
|
AnnotationDbi * 1.37.3 2017-02-09 Bioconductor |
|
assertthat 0.1 2013-12-06 CRAN (R 3.4.0) |
|
backports 1.0.5 2017-01-18 CRAN (R 3.4.0) |
|
base64enc 0.1-3 2015-07-28 CRAN (R 3.4.0) |
|
Biobase * 2.35.1 2017-02-23 Bioconductor |
|
BiocGenerics * 0.21.3 2017-01-12 Bioconductor |
|
BiocParallel 1.9.5 2017-01-24 Bioconductor |
|
biomaRt 2.31.4 2017-01-13 Bioconductor |
|
Biostrings 2.43.4 2017-02-02 Bioconductor |
|
bitops 1.0-6 2013-08-17 CRAN (R 3.4.0) |
|
BSgenome 1.43.5 2017-02-02 Bioconductor |
|
bumphunter 1.15.0 2016-10-23 Bioconductor |
|
checkmate 1.8.2 2016-11-02 CRAN (R 3.4.0) |
|
cluster 2.0.5 2016-10-08 CRAN (R 3.4.0) |
|
codetools 0.2-15 2016-10-05 CRAN (R 3.4.0) |
|
colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0) |
|
data.table 1.10.4 2017-02-01 CRAN (R 3.4.0) |
|
DBI 0.5-1 2016-09-10 CRAN (R 3.4.0) |
|
DelayedArray 0.1.7 2017-02-17 Bioconductor |
|
derfinder * 1.9.6 2017-01-13 Bioconductor |
|
derfinderHelper 1.9.3 2016-11-29 Bioconductor |
|
devtools * 1.12.0 2016-12-05 CRAN (R 3.4.0) |
|
digest 0.6.12 2017-01-27 CRAN (R 3.4.0) |
|
doRNG 1.6 2014-03-07 CRAN (R 3.4.0) |
|
foreach 1.4.3 2015-10-13 CRAN (R 3.4.0) |
|
foreign 0.8-67 2016-09-13 CRAN (R 3.4.0) |
|
Formula 1.2-1 2015-04-07 CRAN (R 3.4.0) |
|
GenomeInfoDb * 1.11.9 2017-02-08 Bioconductor |
|
GenomeInfoDbData 0.99.0 2017-02-14 Bioconductor |
|
GenomicAlignments 1.11.9 2017-02-02 Bioconductor |
|
GenomicFeatures * 1.27.8 2017-02-11 Bioconductor |
|
GenomicFiles 1.11.3 2016-11-29 Bioconductor |
|
GenomicRanges * 1.27.22 2017-02-02 Bioconductor |
|
ggplot2 2.2.1 2016-12-30 CRAN (R 3.4.0) |
|
gridExtra 2.2.1 2016-02-29 CRAN (R 3.4.0) |
|
gtable 0.2.0 2016-02-26 CRAN (R 3.4.0) |
|
Hmisc 4.0-2 2016-12-31 CRAN (R 3.4.0) |
|
htmlTable 1.9 2017-01-26 CRAN (R 3.4.0) |
|
htmltools 0.3.5 2016-03-21 CRAN (R 3.4.0) |
|
htmlwidgets 0.8 2016-11-09 CRAN (R 3.4.0) |
|
IRanges * 2.9.18 2017-02-02 Bioconductor |
|
iterators 1.0.8 2015-10-13 CRAN (R 3.4.0) |
|
knitr 1.15.1 2016-11-22 CRAN (R 3.4.0) |
|
lattice 0.20-34 2016-09-06 CRAN (R 3.4.0) |
|
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.4.0) |
|
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0) |
|
locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.0) |
|
magrittr 1.5 2014-11-22 CRAN (R 3.4.0) |
|
Matrix 1.2-8 2017-01-20 CRAN (R 3.4.0) |
|
matrixStats 0.51.0 2016-10-09 CRAN (R 3.4.0) |
|
memoise 1.0.0 2016-01-29 CRAN (R 3.4.0) |
|
munsell 0.4.3 2016-02-13 CRAN (R 3.4.0) |
|
nnet 7.3-12 2016-02-02 CRAN (R 3.4.0) |
|
pkgmaker 0.22 2014-05-14 CRAN (R 3.4.0) |
|
plyr 1.8.4 2016-06-08 CRAN (R 3.4.0) |
|
qvalue 2.7.0 2016-10-23 Bioconductor |
|
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.4.0) |
|
Rcpp 0.12.9 2017-01-14 CRAN (R 3.4.0) |
|
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.4.0) |
|
registry 0.3 2015-07-08 CRAN (R 3.4.0) |
|
reshape2 1.4.2 2016-10-22 CRAN (R 3.4.0) |
|
rngtools 1.2.4 2014-03-06 CRAN (R 3.4.0) |
|
rpart 4.1-10 2015-06-29 CRAN (R 3.4.0) |
|
Rsamtools 1.27.12 2017-01-24 Bioconductor |
|
RSQLite 1.1-2 2017-01-08 CRAN (R 3.4.0) |
|
rtracklayer * 1.35.6 2017-02-19 cran (@1.35.6) |
|
S4Vectors * 0.13.15 2017-02-14 cran (@0.13.15) |
|
scales 0.4.1 2016-11-09 CRAN (R 3.4.0) |
|
stringi 1.1.2 2016-10-01 CRAN (R 3.4.0) |
|
stringr 1.2.0 2017-02-18 CRAN (R 3.4.0) |
|
SummarizedExperiment 1.5.7 2017-02-23 Bioconductor |
|
survival 2.40-1 2016-10-30 CRAN (R 3.4.0) |
|
tibble 1.2 2016-08-26 CRAN (R 3.4.0) |
|
VariantAnnotation 1.21.17 2017-02-12 Bioconductor |
|
withr 1.0.2 2016-06-20 CRAN (R 3.4.0) |
|
XML 3.98-1.5 2016-11-10 CRAN (R 3.4.0) |
|
xtable 1.8-2 2016-02-05 CRAN (R 3.4.0) |
|
XVector 0.15.2 2017-02-02 Bioconductor |
|
zlibbioc 1.21.0 2016-10-23 Bioconductor |
Thanks for all your help Leo.
library(GenomicFeatures)
library(derfinder)
library(devtools)
library(rtracklayer)
gr <- import('ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz')
gr_small <- keepSeqlevels(gr, c('Y', 'MT'), pruning.mode = 'tidy')
We don't have the devel version of bioC on our cluster(we're at 3.4), is the pruning mode available without using 3.5?
gr_small <- keepSeqlevels(gr, c('Y', 'MT'))
genome(gr_small) <- 'hg38'
txdb2 <- makeTxDbFromGRanges(gr_small)
gs <- makeGenomicState(txdb2, style = 'Ensembl', currentStyle = 'Ensembl')
I'm going to guess that not using the pruning.mode is why I'm getting an error here?
Hi Andrew,
"pruning.mode" is a recent argument and has to be specified in Bioc-devel, but I meant to say previously that everything would work in Bioc-release if you dropped that part. You were close in your last attempt, but you forgot the crucial step of adding the chromosome lengths. I assume that you were using derfinder 1.8.0 because 1.8.1 would have printed an error related to the missing chromosome lengths. Like I said before, derfinder 1.8.1 will be available via biocLite() in like a day or two. In any case, even with derfinder 1.8.0, you can get everything to run if you specify the chromosome lengths.
Here's the code and output using Bioc-release (3.4).
Best,
Leo
Oh I see, somehow I missed that line.
Thanks!
No problem ^^
Hi Leo, everything above worked, but I'm having another issue now. Using the script that wa sworking before but with the new GRCH38 txdb, the out put of matchGenes isn't quite right. The, columns, name, annotation and geneL, are all NA's. Any Idea what this might be?
annotation_obj <- annotateTranscripts(txdb)
nearest_annotation_df <- matchGenes(der_grange_obj, subject = annotation_obj)
write_tsv(nearest_annotation_df, str_c(output_dir, '/nearest_annotation_df.tsv'))
head -n 2 nearest_annotation_df.tsv
name annotation description region distance subregion insideDistance exonnumber nexons UTR strand geneL codingL Entrez subjectHits
NA NA inside intron inside 14741.0 inside intron 175.0 10.0 11 inside transcription region - 15166.0 NA ENSG00000227232 13715
Hi Andrew,
You should post this as a new question instead of a comment.
I'm not sure, but this would be a bumphunter issue, not a derfinder one. Most likely those functions assume that the TxDb object has a specific structure. For example, that the gene names are Entrez IDs instead of Gencode ids.
Best,
Leo
Thanks! I'll see if the Bumphunter documentation has anything to say.
Yeah, it's definitely a bumphunter issue as you can see at https://github.com/Bioconductor-mirror/bumphunter/blob/515c9c45a6a1b1fa03d1936fb946d8fbea184710/R/matchGenes.R#L182-L202. The current code assumes that you are working with Entrez gene ids. It should be possible to make the code more flexible to work with other types of gene ids.
Check https://github.com/rafalab/bumphunter/issues/15