Question

BitSeq transcript IDs

0

Entering edit mode

Peter Glaus ▴ 70

@peter-glaus-5589

Last seen 11.4 years ago

Hi Maayan, if you want to work with files, there should be a file 'A08485_gr4_mini.run1.tr' which has information about transcripts, keeps the same order as .thetaMeans file and the second column contains transcript names. (all BitSeq output keeps the same order of reference sequences) Also, I think the first element of the result 'res_A08485$exp' contains mean expression and variance and should contain the transcript names as well. Try: head(res_A08485$exp) Peter. On 04/04/13 23:12, Maayan Kreitzman wrote: > Hi Peter, > I think I've successfully run BitSeq on a few mini bam files. When I > look in the resulting thetaMeans file, the transcript ID column is > just numbers. Is there a way to have the proper IDs from the > annotation? I'm copying the top of the file: > > # T => Mrows > # M 183985 > # file containing the mean value of theta - realtive abundace of > fragments and counts > # (overall mean, overall counts, mean of saved samples, and mean > from every chain are reported) > # columns: > # <transcriptid> <meanthetaoverall> <meanreadcountoverall> > <meanthetasaved> <chain1mean> <chain2mean> <chain3mean> <chain4mean> > #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00 > 0.000000000e+00 0.000000000e+00 0.000000000e+00 > 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06 > 2.274251443e-06 2.328343218e-06 2.331555935e-06 > 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06 > 4.525049801e-06 4.538165770e-06 4.570372768e-06 > 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06 > 2.208339897e-06 2.389828455e-06 2.475061630e-06 > 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06 > 2.315240398e-06 2.373995323e-06 2.304710595e-06 > 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06 > 2.281075630e-06 2.309094754e-06 2.384859685e-06 > 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06 > 2.522953884e-06 2.443492278e-06 2.242317204e-06 > 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06 > 2.441005711e-06 2.306151611e-06 2.379994405e-06 > 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06 > 2.437629395e-06 2.250309118e-06 2.226272048e-06 > 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06 > 2.509800276e-06 2.263119465e-06 2.445519224e-06 > > > > This is the command that I ran: > >res_A08485 <- getExpression("A08485_gr4_mini.sam", > "/projects/mkreitzman_prj/expression_quantification_testing/testing/ test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.al l.fa", > + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47) > > thanks for your help, > > Maayan > > On 03/26/2013 06:28 AM, Peter Glaus wrote: >> Dear Maayan, >> not sure what is the problem as I was not able to replicate this error. >> Could you please try: >> 1) running with verbose=TRUE option, so there will be a bit more output. >> 2) use option outPrefix. With this option, BitSeq does not use temporary >> directories. >> Examples: assuming your working direcotry in R is: >> /projects/mkreitzman_prj/expression_quantification_testing/testing/ BitSeq >> you could run: >> getExpression("A08485_gr4.sam_mini.sam", >> "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69. cdna.all.fa", >> outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47, verbose=TRUE); >> >> Which should create 3 file in the directory: >> A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans and >> A08485_gr4.sam_mini.run1.mean. >> >> (You can also use relative/absolute paths in the outPrefix, just make >> sure the directories exist.) >> >> If the run fails again, you can have a look into the directory and see >> what kind of files were created. >> >> Regards, >> Peter. >> >> On 25/03/13 22:00, Maayan Kreitzman wrote: >>> Hi Peter, >>> I'm still having some difficulty with the first step of BitSeq. >>> I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing and R script and submitting to the cluster (just until I know what I'm doing) >>> >>> but, I got this error: >>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>> >>> since it seemed to be an issue with the temp directory, I tried to change my TMPDIR in the shell to somewhere with plenty of space: >>> [mkreitzman@xhost09 ~]$ echo $TMPDIR >>> /projects/wtss_scratch/maayan >>> >>> but, this did not make a difference to where the temp files were created. >>> I copied the whole session below. >>> >>> thanks, >>> Maayan >>> >>> >>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quanti fication_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/m kreitzman_prj/expression_quantification_testing/testing/test_data/stra nd_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa", >>> + log = TRUE, seed=47) >>> ## Computing alignment probabilities. >>> [time: +1.283333 m] >>> [time: +0.333333 m] >>> [time: +0.000000 m] >>> [time: +1.200000 m] >>> [time: +0.500000 m] >>> [time: +0.000000 m] >>> ## Estimating transcript expression levels. >>> Mappings: 1606295 >>> Ntotal: 2408007 >>> 10000 [time: +0.000000 s] >>> 100000 [time: +0.000000 s] >>> 1000000 [time: +2.000000 s] >>> Finished Reading! >>> Total hits = 3212590 >>> Isoforms: 183986 >>> Burn in: 1000 DONE. [time: +6.633333 m] >>> >>> Sampling DONE. [time: +9.600000 m] >>> rHat (for 1000 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.0040 ( 1.0040 | 36651 | 0.0000) >>> 1.0040 ( 1.0072 | 178210 | 0.0000) >>> 1.0036 ( 1.0088 | 148680 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.003527 >>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>> >>> Producing 649 final samples. >>> >>> Sampling DONE. [time: +6.833333 m] >>> rHat (for 649 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.0061 ( 1.0051 | 157831 | 0.0000) >>> 1.0059 ( 1.0048 | 104363 | 0.0000) >>> 1.0058 ( 1.0068 | 71659 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.005543 >>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>> >>> Total samples: 6596 >>> ## Computing means. >>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>> ________________________________________ >>> From: Peter Glaus [glaus@cs.man.ac.uk] >>> Sent: Friday, March 22, 2013 5:32 AM >>> To: Maayan Kreitzman >>> Subject: Re: BitSeq getExpression crash >>> >>> Hi Maayan, >>> I believe the error is caused by process running out of memory. I am not >>> 100% sure, but when I saw this kinds of errors before, it was caused by >>> lack of memory. The estimation can be quite CPU and memory intensive, so >>> I advice running it on a computing cluster instead of using regular >>> desktop/notebook machine. >>> >>> Regarding your function call, when running actual analysis (not just >>> testing/trying out), please use higher values for MCMC_burnIn, >>> MCMC_samplesN and MCMC_samplesSave (the default when leaving these blank >>> is 1000 and is usually "good enough"), the computation will take longer, >>> however the estimates will be much more accurate as well. >>> (The values 200, 200, 50 are used in the vignette because the example >>> data is very small, and the vignette has to run within time limit.) >>> >>> Also, for future reference when you have questions regarding >>> Bioconductor packages, please post to Bioconductor user mailing list >>> (and CC package author), as you might sometimes get replies from other >>> users and also your post might help some other users if they encounter >>> similar problem in the future. >>> >>> Best regards, >>> Peter. >>> >>> On 21/03/13 21:46, Maayan Kreitzman wrote: >>>> Dear Peter, >>>> I'm trying to run BitSeq, and am running into a problem after several hours of the getExpression function running. >>>> This same thing happened twice, on different servers. What weird is that not only does the function crash, it actually exits R. >>>> this is the error message: >>>> >>>> terminate called after throwing an instance of 'std::bad_alloc' >>>> what(): St9bad_alloc >>>> Aborted >>>> >>>> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated. >>>> I've copied the whole session below. >>>> >>>> thanks in advance, >>>> Maayan >>>> >>>> >>>>> library("BitSeq") >>>> Loading required package: Rsamtools >>>> Loading required package: IRanges >>>> Loading required package: BiocGenerics >>>> >>>> Attaching package: BiocGenerics >>>> >>>> The following object(s) are masked from package:stats: >>>> >>>> xtabs >>>> >>>> The following object(s) are masked from package:base: >>>> >>>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, >>>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax, >>>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, >>>> rownames, sapply, setdiff, table, tapply, union, unique >>>> >>>> Loading required package: GenomicRanges >>>> Loading required package: Biostrings >>>> Loading required package: zlibbioc >>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quant ification_testing/testing/test_data/strand_specific/transcriptome/bowt ie2transcriptome/A08473_gr4.sam", >>>> + "/projects/mkreitzman_prj/expression_quantification_testing/tes ting/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cd na.all.fa", >>>> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47) >>>> ## Computing alignment probabilities. >>>> [time: +1.866667 m] >>>> [time: +36.400000 m] >>>> [time: +0.000000 m] >>>> [time: +117.050000 m] >>>> [time: +0.500000 m] >>>> [time: +0.000000 m] >>>> ## Estimating transcript expression levels. >>>> Mappings: 71092830 >>>> Ntotal: 123098679 >>>> 10000 [time: +1.000000 s] >>>> 100000 [time: +0.000000 s] >>>> 1000000 [time: +3.000000 s] >>>> 10000000 [time: +25.000000 s] >>>> Read only 14186178 reads. >>>> Finished Reading! >>>> Total hits = 28372355 >>>> Isoforms: 183985 >>>> Burn in: 200 DONE. [time: +12.016667 m] >>>> >>>> Sampling DONE. [time: +12.850000 m] >>>> rHat (for 200 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.0252 ( 1.1173 | 89080 | 0.0000) >>>> 1.0216 ( 1.1351 | 126802 | 0.0000) >>>> 1.0183 ( 1.0151 | 183201 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.018596 >>>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849 >>>> >>>> Producing 33 final samples. >>>> >>>> Sampling DONE. [time: +2.166667 m] >>>> rHat (for 33 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.1193 ( 1.1332 | 117458 | 0.0000) >>>> 1.1181 ( 1.1229 | 158878 | 0.0000) >>>> 1.1074 ( 1.1004 | 43840 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.108279 >>>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849 >>>> >>>> Total samples: 932 >>>> terminate called after throwing an instance of 'std::bad_alloc' >>>> what(): St9bad_alloc >>>> Aborted > [[alternative HTML version deleted]]

Alignment PROcess BitSeq Alignment PROcess BitSeq • 1.7k views

ADD COMMENT • link updated 12.9 years ago by Maayan Kreitzman ▴ 30 • written 12.9 years ago by Peter Glaus ▴ 70

score 0 · Answer 1 · 2013-04-04

Hi Peter, I think I've successfully run BitSeq on a few mini bam files. When I look in the resulting thetaMeans file, the transcript ID column is just numbers. Is there a way to have the proper IDs from the annotation? I'm copying the top of the file: # T => Mrows # M 183985 # file containing the mean value of theta - realtive abundace of fragments and counts # (overall mean, overall counts, mean of saved samples, and mean from every chain are reported) # columns: # <transcriptid> <meanthetaoverall> <meanreadcountoverall> <meanthetasaved> <chain1mean> <chain2mean> <chain3mean> <chain4mean> #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00 0.000000000e+00 0.000000000e+00 0.000000000e+00 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06 2.274251443e-06 2.328343218e-06 2.331555935e-06 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06 4.525049801e-06 4.538165770e-06 4.570372768e-06 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06 2.208339897e-06 2.389828455e-06 2.475061630e-06 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06 2.315240398e-06 2.373995323e-06 2.304710595e-06 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06 2.281075630e-06 2.309094754e-06 2.384859685e-06 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06 2.522953884e-06 2.443492278e-06 2.242317204e-06 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06 2.441005711e-06 2.306151611e-06 2.379994405e-06 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06 2.437629395e-06 2.250309118e-06 2.226272048e-06 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06 2.509800276e-06 2.263119465e-06 2.445519224e-06 This is the command that I ran: >res_A08485 <- getExpression("A08485_gr4_mini.sam", "/projects/mkreitzman_prj/expression_quantification_testing/testing/te st_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all. fa", + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47) thanks for your help, Maayan On 03/26/2013 06:28 AM, Peter Glaus wrote: > Dear Maayan, > not sure what is the problem as I was not able to replicate this error. > Could you please try: > 1) running with verbose=TRUE option, so there will be a bit more output. > 2) use option outPrefix. With this option, BitSeq does not use temporary > directories. > Examples: assuming your working direcotry in R is: > /projects/mkreitzman_prj/expression_quantification_testing/testing/B itSeq > you could run: > getExpression("A08485_gr4.sam_mini.sam", > "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.c dna.all.fa", > outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47, verbose=TRUE); > > Which should create 3 file in the directory: > A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans and > A08485_gr4.sam_mini.run1.mean. > > (You can also use relative/absolute paths in the outPrefix, just make > sure the directories exist.) > > If the run fails again, you can have a look into the directory and see > what kind of files were created. > > Regards, > Peter. > > On 25/03/13 22:00, Maayan Kreitzman wrote: >> Hi Peter, >> I'm still having some difficulty with the first step of BitSeq. >> I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing and R script and submitting to the cluster (just until I know what I'm doing) >> >> but, I got this error: >> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >> >> since it seemed to be an issue with the temp directory, I tried to change my TMPDIR in the shell to somewhere with plenty of space: >> [mkreitzman@xhost09 ~]$ echo $TMPDIR >> /projects/wtss_scratch/maayan >> >> but, this did not make a difference to where the temp files were created. >> I copied the whole session below. >> >> thanks, >> Maayan >> >> >>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quantif ication_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/mk reitzman_prj/expression_quantification_testing/testing/test_data/stran d_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa", >> + log = TRUE, seed=47) >> ## Computing alignment probabilities. >> [time: +1.283333 m] >> [time: +0.333333 m] >> [time: +0.000000 m] >> [time: +1.200000 m] >> [time: +0.500000 m] >> [time: +0.000000 m] >> ## Estimating transcript expression levels. >> Mappings: 1606295 >> Ntotal: 2408007 >> 10000 [time: +0.000000 s] >> 100000 [time: +0.000000 s] >> 1000000 [time: +2.000000 s] >> Finished Reading! >> Total hits = 3212590 >> Isoforms: 183986 >> Burn in: 1000 DONE. [time: +6.633333 m] >> >> Sampling DONE. [time: +9.600000 m] >> rHat (for 1000 samples) >> rHat (rHat from subset | tid | mean theta) >> 1.0040 ( 1.0040 | 36651 | 0.0000) >> 1.0040 ( 1.0072 | 178210 | 0.0000) >> 1.0036 ( 1.0088 | 148680 | 0.0000) >> Mean rHat of worst 10 transcripts: 1.003527 >> Mean C0: (50 50 50 50 ). Nunmap: 801712 >> >> Producing 649 final samples. >> >> Sampling DONE. [time: +6.833333 m] >> rHat (for 649 samples) >> rHat (rHat from subset | tid | mean theta) >> 1.0061 ( 1.0051 | 157831 | 0.0000) >> 1.0059 ( 1.0048 | 104363 | 0.0000) >> 1.0058 ( 1.0068 | 71659 | 0.0000) >> Mean rHat of worst 10 transcripts: 1.005543 >> Mean C0: (50 50 50 50 ). Nunmap: 801712 >> >> Total samples: 6596 >> ## Computing means. >> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >> ________________________________________ >> From: Peter Glaus [glaus@cs.man.ac.uk] >> Sent: Friday, March 22, 2013 5:32 AM >> To: Maayan Kreitzman >> Subject: Re: BitSeq getExpression crash >> >> Hi Maayan, >> I believe the error is caused by process running out of memory. I am not >> 100% sure, but when I saw this kinds of errors before, it was caused by >> lack of memory. The estimation can be quite CPU and memory intensive, so >> I advice running it on a computing cluster instead of using regular >> desktop/notebook machine. >> >> Regarding your function call, when running actual analysis (not just >> testing/trying out), please use higher values for MCMC_burnIn, >> MCMC_samplesN and MCMC_samplesSave (the default when leaving these blank >> is 1000 and is usually "good enough"), the computation will take longer, >> however the estimates will be much more accurate as well. >> (The values 200, 200, 50 are used in the vignette because the example >> data is very small, and the vignette has to run within time limit.) >> >> Also, for future reference when you have questions regarding >> Bioconductor packages, please post to Bioconductor user mailing list >> (and CC package author), as you might sometimes get replies from other >> users and also your post might help some other users if they encounter >> similar problem in the future. >> >> Best regards, >> Peter. >> >> On 21/03/13 21:46, Maayan Kreitzman wrote: >>> Dear Peter, >>> I'm trying to run BitSeq, and am running into a problem after several hours of the getExpression function running. >>> This same thing happened twice, on different servers. What weird is that not only does the function crash, it actually exits R. >>> this is the error message: >>> >>> terminate called after throwing an instance of 'std::bad_alloc' >>> what(): St9bad_alloc >>> Aborted >>> >>> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated. >>> I've copied the whole session below. >>> >>> thanks in advance, >>> Maayan >>> >>> >>>> library("BitSeq") >>> Loading required package: Rsamtools >>> Loading required package: IRanges >>> Loading required package: BiocGenerics >>> >>> Attaching package: BiocGenerics >>> >>> The following object(s) are masked from package:stats: >>> >>> xtabs >>> >>> The following object(s) are masked from package:base: >>> >>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, >>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax, >>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, >>> rownames, sapply, setdiff, table, tapply, union, unique >>> >>> Loading required package: GenomicRanges >>> Loading required package: Biostrings >>> Loading required package: zlibbioc >>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quanti fication_testing/testing/test_data/strand_specific/transcriptome/bowti e2transcriptome/A08473_gr4.sam", >>> + "/projects/mkreitzman_prj/expression_quantification_testing/test ing/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdn a.all.fa", >>> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47) >>> ## Computing alignment probabilities. >>> [time: +1.866667 m] >>> [time: +36.400000 m] >>> [time: +0.000000 m] >>> [time: +117.050000 m] >>> [time: +0.500000 m] >>> [time: +0.000000 m] >>> ## Estimating transcript expression levels. >>> Mappings: 71092830 >>> Ntotal: 123098679 >>> 10000 [time: +1.000000 s] >>> 100000 [time: +0.000000 s] >>> 1000000 [time: +3.000000 s] >>> 10000000 [time: +25.000000 s] >>> Read only 14186178 reads. >>> Finished Reading! >>> Total hits = 28372355 >>> Isoforms: 183985 >>> Burn in: 200 DONE. [time: +12.016667 m] >>> >>> Sampling DONE. [time: +12.850000 m] >>> rHat (for 200 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.0252 ( 1.1173 | 89080 | 0.0000) >>> 1.0216 ( 1.1351 | 126802 | 0.0000) >>> 1.0183 ( 1.0151 | 183201 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.018596 >>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849 >>> >>> Producing 33 final samples. >>> >>> Sampling DONE. [time: +2.166667 m] >>> rHat (for 33 samples) >>> rHat (rHat from subset | tid | mean theta) >>> 1.1193 ( 1.1332 | 117458 | 0.0000) >>> 1.1181 ( 1.1229 | 158878 | 0.0000) >>> 1.1074 ( 1.1004 | 43840 | 0.0000) >>> Mean rHat of worst 10 transcripts: 1.108279 >>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849 >>> >>> Total samples: 932 >>> terminate called after throwing an instance of 'std::bad_alloc' >>> what(): St9bad_alloc >>> Aborted [[alternative HTML version deleted]]

score 0 · Answer 2 · 2013-04-05

Hi Peter, thanks for that. head(res_A08485$exp) is great. BitSeq does produce the .tr files for each of the input libraries. Another question, regarding getDE: which tr file should I provide to getDE? all of them? how do I specify them? Also, I got the error below when running getDE. The libraries are just the first 1 million reads from name-sorted alignments from bowtie2, so they are quite small (I've noticed the beginning of name-sorted files have a lot of unaligned reads in them). Not sure if that may have caused a problem. I looked for " lowess-f" in the manual but didn't find anything. thanks again, maayan > group4Files = c(res_A08473$fn,res_A08485$fn) > group3Files = c(res_A08606$fn,res_A08616$fn) > de1 <- getDE(list(group3Files, group4Files), samples=TRUE) ## No trInfoFile provided, will try using A08606_gr3_mini.run1.tr for result's row names. Computing overall mean. Estimating hyperparameters. [time: +1.000000 s] Error in estimateHyperPar(parFile, conditions, meanFile = meanFile, norm = norm, : Main: Unable to produce smooth beta >0. Try adjusting the parameter lowess-f. On 04/05/2013 03:06 AM, Peter Glaus wrote: > Hi Maayan, > if you want to work with files, there should be a file > 'A08485_gr4_mini.run1.tr' which has information about transcripts, > keeps the same order as .thetaMeans file and the second column > contains transcript names. (all BitSeq output keeps the same order of > reference sequences) > > Also, I think the first element of the result 'res_A08485$exp' > contains mean expression and variance and should contain the > transcript names as well. Try: > head(res_A08485$exp) > > Peter. > > On 04/04/13 23:12, Maayan Kreitzman wrote: >> Hi Peter, >> I think I've successfully run BitSeq on a few mini bam files. When I >> look in the resulting thetaMeans file, the transcript ID column is >> just numbers. Is there a way to have the proper IDs from the >> annotation? I'm copying the top of the file: >> >> # T => Mrows >> # M 183985 >> # file containing the mean value of theta - realtive abundace of >> fragments and counts >> # (overall mean, overall counts, mean of saved samples, and mean >> from every chain are reported) >> # columns: >> # <transcriptid> <meanthetaoverall> <meanreadcountoverall> >> <meanthetasaved> <chain1mean> <chain2mean> <chain3mean> <chain4mean> >> #thetaAct: 0.000000000e+00 0 0.000000000e+00 0.000000000e+00 >> 0.000000000e+00 0.000000000e+00 0.000000000e+00 >> 1 2.329622950e-06 1 2.336304311e-06 2.384341203e-06 >> 2.274251443e-06 2.328343218e-06 2.331555935e-06 >> 2 4.591043926e-06 1 4.591127729e-06 4.730587366e-06 >> 4.525049801e-06 4.538165770e-06 4.570372768e-06 >> 3 2.394661546e-06 1 2.414399085e-06 2.505416202e-06 >> 2.208339897e-06 2.389828455e-06 2.475061630e-06 >> 4 2.328241183e-06 1 2.353625902e-06 2.319018416e-06 >> 2.315240398e-06 2.373995323e-06 2.304710595e-06 >> 5 2.300820499e-06 1 2.362273440e-06 2.228251927e-06 >> 2.281075630e-06 2.309094754e-06 2.384859685e-06 >> 6 2.389348690e-06 1 2.410694464e-06 2.348631392e-06 >> 2.522953884e-06 2.443492278e-06 2.242317204e-06 >> 7 2.400561999e-06 1 2.419266200e-06 2.475096269e-06 >> 2.441005711e-06 2.306151611e-06 2.379994405e-06 >> 8 2.307687449e-06 1 2.195070192e-06 2.316539235e-06 >> 2.437629395e-06 2.250309118e-06 2.226272048e-06 >> 9 2.399691624e-06 1 2.452207949e-06 2.380327529e-06 >> 2.509800276e-06 2.263119465e-06 2.445519224e-06 >> >> >> >> This is the command that I ran: >> >res_A08485 <- getExpression("A08485_gr4_mini.sam", >> "/projects/mkreitzman_prj/expression_quantification_testing/testing /test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.a ll.fa", >> + outPrefix="A08485_gr4_mini.run1", log=TRUE, seed=47) >> >> thanks for your help, >> >> Maayan >> >> On 03/26/2013 06:28 AM, Peter Glaus wrote: >>> Dear Maayan, >>> not sure what is the problem as I was not able to replicate this error. >>> Could you please try: >>> 1) running with verbose=TRUE option, so there will be a bit more output. >>> 2) use option outPrefix. With this option, BitSeq does not use temporary >>> directories. >>> Examples: assuming your working direcotry in R is: >>> /projects/mkreitzman_prj/expression_quantification_testing/testing /BitSeq >>> you could run: >>> getExpression("A08485_gr4.sam_mini.sam", >>> "../test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69 .cdna.all.fa", >>> outPrefix="A08485_gr4.sam_mini.run1", log=TRUE, seed=47, verbose=TRUE); >>> >>> Which should create 3 file in the directory: >>> A08485_gr4.sam_mini.run1.rpkm, A08485_gr4.sam_mini.run1.thetaMeans and >>> A08485_gr4.sam_mini.run1.mean. >>> >>> (You can also use relative/absolute paths in the outPrefix, just make >>> sure the directories exist.) >>> >>> If the run fails again, you can have a look into the directory and see >>> what kind of files were created. >>> >>> Regards, >>> Peter. >>> >>> On 25/03/13 22:00, Maayan Kreitzman wrote: >>>> Hi Peter, >>>> I'm still having some difficulty with the first step of BitSeq. >>>> I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing and R script and submitting to the cluster (just until I know what I'm doing) >>>> >>>> but, I got this error: >>>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>>> >>>> since it seemed to be an issue with the temp directory, I tried to change my TMPDIR in the shell to somewhere with plenty of space: >>>> [mkreitzman@xhost09 ~]$ echo $TMPDIR >>>> /projects/wtss_scratch/maayan >>>> >>>> but, this did not make a difference to where the temp files were created. >>>> I copied the whole session below. >>>> >>>> thanks, >>>> Maayan >>>> >>>> >>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quant ification_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/ mkreitzman_prj/expression_quantification_testing/testing/test_data/str and_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa", >>>> + log = TRUE, seed=47) >>>> ## Computing alignment probabilities. >>>> [time: +1.283333 m] >>>> [time: +0.333333 m] >>>> [time: +0.000000 m] >>>> [time: +1.200000 m] >>>> [time: +0.500000 m] >>>> [time: +0.000000 m] >>>> ## Estimating transcript expression levels. >>>> Mappings: 1606295 >>>> Ntotal: 2408007 >>>> 10000 [time: +0.000000 s] >>>> 100000 [time: +0.000000 s] >>>> 1000000 [time: +2.000000 s] >>>> Finished Reading! >>>> Total hits = 3212590 >>>> Isoforms: 183986 >>>> Burn in: 1000 DONE. [time: +6.633333 m] >>>> >>>> Sampling DONE. [time: +9.600000 m] >>>> rHat (for 1000 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.0040 ( 1.0040 | 36651 | 0.0000) >>>> 1.0040 ( 1.0072 | 178210 | 0.0000) >>>> 1.0036 ( 1.0088 | 148680 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.003527 >>>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>>> >>>> Producing 649 final samples. >>>> >>>> Sampling DONE. [time: +6.833333 m] >>>> rHat (for 649 samples) >>>> rHat (rHat from subset | tid | mean theta) >>>> 1.0061 ( 1.0051 | 157831 | 0.0000) >>>> 1.0059 ( 1.0048 | 104363 | 0.0000) >>>> 1.0058 ( 1.0068 | 71659 | 0.0000) >>>> Mean rHat of worst 10 transcripts: 1.005543 >>>> Mean C0: (50 50 50 50 ). Nunmap: 801712 >>>> >>>> Total samples: 6596 >>>> ## Computing means. >>>> Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : >>>> Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS- 27c66ebf4a21.rpkm failed to open. >>>> ________________________________________ >>>> From: Peter Glaus [glaus@cs.man.ac.uk] >>>> Sent: Friday, March 22, 2013 5:32 AM >>>> To: Maayan Kreitzman >>>> Subject: Re: BitSeq getExpression crash >>>> >>>> Hi Maayan, >>>> I believe the error is caused by process running out of memory. I am not >>>> 100% sure, but when I saw this kinds of errors before, it was caused by >>>> lack of memory. The estimation can be quite CPU and memory intensive, so >>>> I advice running it on a computing cluster instead of using regular >>>> desktop/notebook machine. >>>> >>>> Regarding your function call, when running actual analysis (not just >>>> testing/trying out), please use higher values for MCMC_burnIn, >>>> MCMC_samplesN and MCMC_samplesSave (the default when leaving these blank >>>> is 1000 and is usually "good enough"), the computation will take longer, >>>> however the estimates will be much more accurate as well. >>>> (The values 200, 200, 50 are used in the vignette because the example >>>> data is very small, and the vignette has to run within time limit.) >>>> >>>> Also, for future reference when you have questions regarding >>>> Bioconductor packages, please post to Bioconductor user mailing list >>>> (and CC package author), as you might sometimes get replies from other >>>> users and also your post might help some other users if they encounter >>>> similar problem in the future. >>>> >>>> Best regards, >>>> Peter. >>>> >>>> On 21/03/13 21:46, Maayan Kreitzman wrote: >>>>> Dear Peter, >>>>> I'm trying to run BitSeq, and am running into a problem after several hours of the getExpression function running. >>>>> This same thing happened twice, on different servers. What weird is that not only does the function crash, it actually exits R. >>>>> this is the error message: >>>>> >>>>> terminate called after throwing an instance of 'std::bad_alloc' >>>>> what(): St9bad_alloc >>>>> Aborted >>>>> >>>>> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated. >>>>> I've copied the whole session below. >>>>> >>>>> thanks in advance, >>>>> Maayan >>>>> >>>>> >>>>>> library("BitSeq") >>>>> Loading required package: Rsamtools >>>>> Loading required package: IRanges >>>>> Loading required package: BiocGenerics >>>>> >>>>> Attaching package: BiocGenerics >>>>> >>>>> The following object(s) are masked from package:stats: >>>>> >>>>> xtabs >>>>> >>>>> The following object(s) are masked from package:base: >>>>> >>>>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, >>>>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax, >>>>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, >>>>> rownames, sapply, setdiff, table, tapply, union, unique >>>>> >>>>> Loading required package: GenomicRanges >>>>> Loading required package: Biostrings >>>>> Loading required package: zlibbioc >>>>>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quan tification_testing/testing/test_data/strand_specific/transcriptome/bow tie2transcriptome/A08473_gr4.sam", >>>>> + "/projects/mkreitzman_prj/expression_quantification_testing/te sting/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.c dna.all.fa", >>>>> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47) >>>>> ## Computing alignment probabilities. >>>>> [time: +1.866667 m] >>>>> [time: +36.400000 m] >>>>> [time: +0.000000 m] >>>>> [time: +117.050000 m] >>>>> [time: +0.500000 m] >>>>> [time: +0.000000 m] >>>>> ## Estimating transcript expression levels. >>>>> Mappings: 71092830 >>>>> Ntotal: 123098679 >>>>> 10000 [time: +1.000000 s] >>>>> 100000 [time: +0.000000 s] >>>>> 1000000 [time: +3.000000 s] >>>>> 10000000 [time: +25.000000 s] >>>>> Read only 14186178 reads. >>>>> Finished Reading! >>>>> Total hits = 28372355 >>>>> Isoforms: 183985 >>>>> Burn in: 200 DONE. [time: +12.016667 m] >>>>> >>>>> Sampling DONE. [time: +12.850000 m] >>>>> rHat (for 200 samples) >>>>> rHat (rHat from subset | tid | mean theta) >>>>> 1.0252 ( 1.1173 | 89080 | 0.0000) >>>>> 1.0216 ( 1.1351 | 126802 | 0.0000) >>>>> 1.0183 ( 1.0151 | 183201 | 0.0000) >>>>> Mean rHat of worst 10 transcripts: 1.018596 >>>>> Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849 >>>>> >>>>> Producing 33 final samples. >>>>> >>>>> Sampling DONE. [time: +2.166667 m] >>>>> rHat (for 33 samples) >>>>> rHat (rHat from subset | tid | mean theta) >>>>> 1.1193 ( 1.1332 | 117458 | 0.0000) >>>>> 1.1181 ( 1.1229 | 158878 | 0.0000) >>>>> 1.1074 ( 1.1004 | 43840 | 0.0000) >>>>> Mean rHat of worst 10 transcripts: 1.108279 >>>>> Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849 >>>>> >>>>> Total samples: 932 >>>>> terminate called after throwing an instance of 'std::bad_alloc' >>>>> what(): St9bad_alloc >>>>> Aborted >> > [[alternative HTML version deleted]]