Leo -- thanks for your recent help with fullCoverage() in derfinder. After your updates, that function began working very well for me. However, I have discovered that analyzeChr() is still having issues with chromosomes > 22. When I give it a chromosome in the human range, it does fine, but once we get up into higher numbers, it breaks.
R 3.1.1
Bioconductor version 3.0 (BiocInstaller 1.16.0)
derfinder 1.0.4
GenomeInfoDb 1.2.2 (version including the Canis familiaris genome update)
My script:
library(derfinder)
library(GenomeInfoDb)
library(GenomicRanges)
args <- commandArgs(trailingOnly = TRUE)
perms = 200 # for alpha = 0.05
cores = 15 # too many -> memory problems
#perms = 1
#cores = 25
chroms = c(args[1])
files <- rawFiles(datadir="tophat", samplepatt="tophat-juncs-*")
# Set up our groups
tameIds <- read.table("/home/hekman2/Dropbox/UIUC/transcriptome/hypothalamus/notes/samples-hyp-tame.txt")
tameFileNames <- lapply(tameIds, function(id) { return(paste("tophat-juncs-", id, sep="")) } )
groups <- as.factor(sapply(names(files), function(filename) { ifelse(grepl(filename, tameFileNames), "tame", "aggr") }))
# Initial data analysis over all chromosomes:
print("Starting fullCoverage...")
fullCov <- fullCoverage(files, chrs = chroms, species = 'canis_familiaris')
sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
models <- makeModels(sampleDepths, testvars = groups)
# Keep only bases with coverage > 2
print("Filtering...")
filteredCov <- lapply(fullCov, filterData, cutoff = 2)
originalWd <- getwd()
setwd(file.path(originalWd, 'derResults'))
# Analyze each chromosome individually (analysis will be parallelized
# and significance will be evaluated by permutations).
for (chrom in chroms) {
print(chrom)
currentChrom <- paste("chr", chrom, sep="")
analyzeChr(chr = currentChrom, filteredCov[[currentChrom]], models, mc.cores=cores, groupInfo = groups, writeOutput = TRUE, cutoffFstat = 0.05, nPermute = perms, seeds = 19731107 + seq_len(perms))
}
print("Chromosome analysis completed.")
The error on chromosomes > 22:
2014-11-02 12:44:36 analyzeChr: Annotating regions Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : subscript contains NAs or out-of-bounds indices Calls: analyzeChr ... extractROWS -> normalizeSingleBracketSubscript -> NSBS -> NSBS In addition: Warning message: In .Seqinfo.mergexy(x, y) : The 2 combined objects have no sequence levels in common. (Use suppressWarnings() to suppress this warning.) Execution halted
I also noticed this output message during execution of analyzeChr():
nearestgene: loading bumphunter hg19 transcript database
I'm guessing we should not be using hg19. In fact, this makes me wonder if the other low-numbered chromosomes which completed correctly have valid results or if they need to be re-run against a different transcript database.
Any help you can give is much appreciated. I suspect that the only solution will involve you editing the code again, for which I'm sorry if it's true!
Jessica

You might also want to check how to create a `TxDB.*` package (or at least the object) for Canis familiaris as I don't see one at http://www.bioconductor.org/packages/release/BiocViews.html#___AnnotationData Check more details at http://www.bioconductor.org/packages/release/bioc/vignettes/GenomicFeatures/inst/doc/GenomicFeatures.pdf
Sure. How would I use such an object (I'm not sure where it fits in to my workflow)?
I think that bumphunter::annotateNearest() could in principle use `TxDb` objects. If that's true, then you'll need one for Canis familiaris.
But well, I don't know if my 1st sentence is true