derfinder: mergeResults() running for three days and still going...
0
0
Entering edit mode
@jessicahekman-6877
Last seen 8.6 years ago
United States

Having successfully run analyzeChr() on chromosomes up through 22 and X (but not higher-numbered chromosomes, see derfinder: analyzeChr() breaks on chromosomes > 22), I ran mergeResults() to move forward with analysis on this first selection of chromosomes. However, the R process has now been running for three days and counting. The last message was "recalculating p-values". It has created a fullNullSummary.Rdata file which is 934M in size.

Is mergeChromosomes() expected to run this long on 23 chromosomes? I have 700 million reads over all my 39 chromosomes, so at a wild guess, perhaps 400 million on the chromosomes that are being merged.

R 3.1.1
Bioconductor version 3.0 (BiocInstaller 1.16.0)
derfinder 1.0.4

Script:

library(derfinder)
chrs <- paste("chr", c(1:22, "X"), sep="")
load("derResults/chr1/optionsStats.Rdata")
mergeResults(chrs=chrs, prefix="derResults",
    genomicState = genomicState$fullGenome,
    optionsStats = optionsStats)


Thank you!

Jessica

 

 

derfinder • 1.4k views
ADD COMMENT
1
Entering edit mode

Hi Jessica,

It seems to me that you are using a low cutoff for your data set, which is then generating a very large amount of candidate DERs. Look at the results for each of the chromosomes and see how many you candidate DERs you have in each one.

Merging the results doesn't have anything to do with how many millions reads you have. It's about how many candidate DERs you have.

As for the other question, I already posted an answer.

Cheers,

Leo

ADD REPLY
0
Entering edit mode

Hi Leo,

Ah, I see. You're right, I have tens of thousands of regions for each chromosome, if I'm looking right ("region$regions" tells me "GRanges object with 17284 ranges and 14 metadata columns").

So when you say I have a low cutoff, I think you mean this line:

filteredCov <- lapply(fullCov, filterData, cutoff = 2)


I don't have a feel for how high I should increase this, but I suspect the answer is to try on a few chromosomes and see what happens. Average coverage for my reads aligned to the dog genome is about 10.6X over the whole genome but more like 27.4X over exons. Since derfinder looks at intron and intra-genic differences as well as exon differences maybe I'll start at a cutoff of 8 and see where that gets me. If you have suggestions, I'd appreciate it!

I see now that you did answer the other question -- my apologies; I didn't get email notification and when I checked it said "0 answers" (but of course now when I scroll down I remember that a post isn't always an answer.) I'm still learning how this interface works, I guess.

Thank you very much!

Jessica

 

ADD REPLY
1
Entering edit mode

Hi Jessica,

I meant the F-statistic cutoff, not the data cutoff. That is, 

analyzeChr(cutoffFstat = 0.05)

You don't have to change the data cutoff (2 in your case).

You seem to be getting a very large amount of null regions with that cutoff, so try increasing it with a small number of chrs (maybe even just one). For example, if you go to chr1, how many null regions do you have now?

$ cd chr1
$ R
> load('regions.Rdata')
## Number of null regions
> length(regions$nullWidths)
## Number of candidate DERs
> length(regions$regions)

The time taken for calculating the p-values from all chrs (at the mergeResults() step) depends on how many candidate DERs you have.

You might be getting more than a million null regions, which would explain the slowness. I'll see if the code can be made more efficient later this week.

As for the other question, it says "1 answer". Did you refresh the page? Anyhow the person who asked the question has to accept the answer: you haven't done so for any of your other questions. That is, anyone could "answer" but you have to decide if it's appropriate or not. There are also "answers" and "comments" (like this one), which are supposed to be used for different things. A "comment" is used to discuss an answer, ask for clarifications, request more details, etc.

As for getting emails, I'm surprised you didn't get one. That might be due to your settings of when you want to get emails.

And don't worry, we are all learning how this interface works =)

Cheers,

Leo

 

--- Edited for clarity and to correct a mistake. The other question has 1 answer, not 0 ---

ADD REPLY

Login before adding your answer.

Traffic: 881 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6