BitSeq input data error
1
0
Entering edit mode
@fatemehsadat-seyednasrollah-5367
Last seen 9.6 years ago
Hi all, I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) and the error: Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : Main: number of transcripts don't match: 25 vs 5927492 > sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 loaded via a namespace (and not attached): [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1
BitSeq BitSeq • 1.7k views
ADD COMMENT
0
Entering edit mode
Peter Glaus ▴ 70
@peter-glaus-5589
Last seen 9.6 years ago
Dear Fatemehsadat, this particular error report in version 1.0.1 of BitSeq is a bit ambiguous, so please try updating to the newest version of BitSeq. Most probably the error is caused by the number of transcripts in BAM file being different from the number in reference Fasta (25 vs 5927492). What kind of reference are you using? BitSeq expects the reference to be assembled transcriptome (one Fasta entry is one entire transcript), and you don't need TopHat to align reads to this kind of reference. Regards, Peter Glaus. On 11/05/12 17:28, Fatemehsadat Seyednasrollah wrote: > Hi all, > > I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. > > res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) > > and the error: > Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : > Main: number of transcripts don't match: 25 vs 5927492 > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 > [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 > [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 > > loaded via a namespace (and not attached): > [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1 > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Dear Peter, So many thanks for the reply. First I changed the version of BitSeq now it is 1.2.0. But maybe I am misunderstood by the input data files. In the BitSeq manual it is recommended to align data by bowtie and obtain the SAM input file, which I have used TopHat instead of bowtie. Is it necessary to align only with bowtie or I am misunderstood by what you mean about the alignment file. And then now that I have run my command with this new version I got this error: (my reference is refseq hg19 the one I have used for the alignment) res1 <- getExpression("accepted_hits.bam", "genome.fa", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) ## Computing alignment probabilities. Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : Main: Transcript info length and sequence length of transcript 0 DO NOT MATCH! (249250621 16571) Many Thanks in advance. Best Regards, Fatemeh ________________________________________ From: Peter Glaus [glaus@cs.man.ac.uk] Sent: Tuesday, November 06, 2012 3:19 AM To: Fatemehsadat Seyednasrollah Cc: bioconductor at r-project.org Subject: Re: [BioC] BitSeq input data error Dear Fatemehsadat, this particular error report in version 1.0.1 of BitSeq is a bit ambiguous, so please try updating to the newest version of BitSeq. Most probably the error is caused by the number of transcripts in BAM file being different from the number in reference Fasta (25 vs 5927492). What kind of reference are you using? BitSeq expects the reference to be assembled transcriptome (one Fasta entry is one entire transcript), and you don't need TopHat to align reads to this kind of reference. Regards, Peter Glaus. On 11/05/12 17:28, Fatemehsadat Seyednasrollah wrote: > Hi all, > > I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. > > res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) > > and the error: > Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : > Main: number of transcripts don't match: 25 vs 5927492 > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 > [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 > [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 > > loaded via a namespace (and not attached): > [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1 > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Dear Fatemeh, TopHat is used for spliced alignment against genome. And from your command I assume that genome.fa contains genomic reference. BitSeq, on the other hand, assumes known genome annotation - known splice variants (or transcripts). So the alignment is done against transcriptome reference. For some genomes with known annotation, you can download transcript reference from e.g. UCSC (I haven't used RefSeq much, but I think you should download table:"refGene" and select output format:"sequence" sequence type: "mrna"), or ensembl (human: ftp://ftp.ensembl.org/pub/release-69/fasta/homo_sapiens/cdna/). Because the reference consists of transcripts, you don't really need spliced alignment and so you should use Bowtie instead of TopHat, and then provide the bam file and "transcriptome.fa" to getExpression function. Regards, Peter. On 11/06/2012 08:49 AM, Fatemehsadat Seyednasrollah wrote: > Dear Peter, > > So many thanks for the reply. First I changed the version of BitSeq now it is 1.2.0. But maybe I am misunderstood by the input data files. In the BitSeq manual it is recommended to align data by bowtie and obtain the SAM input file, which I have used TopHat instead of bowtie. Is it necessary to align only with bowtie or I am misunderstood by what you mean about the alignment file. > And then now that I have run my command with this new version I got this error: > > (my reference is refseq hg19 the one I have used for the alignment) > res1 <- getExpression("accepted_hits.bam", "genome.fa", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) > > ## Computing alignment probabilities. > Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : > Main: Transcript info length and sequence length of transcript 0 DO NOT MATCH! (249250621 16571) > > Many Thanks in advance. > Best Regards, > Fatemeh > > ________________________________________ > From: Peter Glaus [glaus at cs.man.ac.uk] > Sent: Tuesday, November 06, 2012 3:19 AM > To: Fatemehsadat Seyednasrollah > Cc: bioconductor at r-project.org > Subject: Re: [BioC] BitSeq input data error > > Dear Fatemehsadat, > this particular error report in version 1.0.1 of BitSeq is a bit > ambiguous, so please try updating to the newest version of BitSeq. > > Most probably the error is caused by the number of transcripts in BAM > file being different from the number in reference Fasta (25 vs 5927492). > What kind of reference are you using? BitSeq expects the reference to be > assembled transcriptome (one Fasta entry is one entire transcript), and > you don't need TopHat to align reads to this kind of reference. > > Regards, > Peter Glaus. > > On 11/05/12 17:28, Fatemehsadat Seyednasrollah wrote: >> Hi all, >> >> I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. >> >> res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) >> >> and the error: >> Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : >> Main: number of transcripts don't match: 25 vs 5927492 >> >>> sessionInfo() >> R version 2.15.1 (2012-06-22) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 >> [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 >> [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1 >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Dear Peter, Actually I have a dataset and my final goal is to compare DE genes from BitSeq and cuffdiff. I have used TpHat and as you said the genome reference to map my data and then run cuffdiff. Now do you think it would be a fair comparison of differential expression if I use another type of reference and mapper? (I know this is the only way of doing it but I would be grateful if you have any hints to help). What about the idea of making my own transcriptome reference from the genome reference I have used? Do think there are any scripts or tools making it? Sorry I asked a lot. Best Regards, Fatemeh ________________________________________ From: Peter Glaus [glaus@cs.man.ac.uk] Sent: Tuesday, November 06, 2012 1:03 PM To: Fatemehsadat Seyednasrollah Cc: bioconductor at r-project.org Subject: Re: [BioC] BitSeq input data error Dear Fatemeh, TopHat is used for spliced alignment against genome. And from your command I assume that genome.fa contains genomic reference. BitSeq, on the other hand, assumes known genome annotation - known splice variants (or transcripts). So the alignment is done against transcriptome reference. For some genomes with known annotation, you can download transcript reference from e.g. UCSC (I haven't used RefSeq much, but I think you should download table:"refGene" and select output format:"sequence" sequence type: "mrna"), or ensembl (human: ftp://ftp.ensembl.org/pub/release-69/fasta/homo_sapiens/cdna/). Because the reference consists of transcripts, you don't really need spliced alignment and so you should use Bowtie instead of TopHat, and then provide the bam file and "transcriptome.fa" to getExpression function. Regards, Peter. On 11/06/2012 08:49 AM, Fatemehsadat Seyednasrollah wrote: > Dear Peter, > > So many thanks for the reply. First I changed the version of BitSeq now it is 1.2.0. But maybe I am misunderstood by the input data files. In the BitSeq manual it is recommended to align data by bowtie and obtain the SAM input file, which I have used TopHat instead of bowtie. Is it necessary to align only with bowtie or I am misunderstood by what you mean about the alignment file. > And then now that I have run my command with this new version I got this error: > > (my reference is refseq hg19 the one I have used for the alignment) > res1 <- getExpression("accepted_hits.bam", "genome.fa", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) > > ## Computing alignment probabilities. > Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : > Main: Transcript info length and sequence length of transcript 0 DO NOT MATCH! (249250621 16571) > > Many Thanks in advance. > Best Regards, > Fatemeh > > ________________________________________ > From: Peter Glaus [glaus at cs.man.ac.uk] > Sent: Tuesday, November 06, 2012 3:19 AM > To: Fatemehsadat Seyednasrollah > Cc: bioconductor at r-project.org > Subject: Re: [BioC] BitSeq input data error > > Dear Fatemehsadat, > this particular error report in version 1.0.1 of BitSeq is a bit > ambiguous, so please try updating to the newest version of BitSeq. > > Most probably the error is caused by the number of transcripts in BAM > file being different from the number in reference Fasta (25 vs 5927492). > What kind of reference are you using? BitSeq expects the reference to be > assembled transcriptome (one Fasta entry is one entire transcript), and > you don't need TopHat to align reads to this kind of reference. > > Regards, > Peter Glaus. > > On 11/05/12 17:28, Fatemehsadat Seyednasrollah wrote: >> Hi all, >> >> I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. >> >> res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) >> >> and the error: >> Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : >> Main: number of transcripts don't match: 25 vs 5927492 >> >>> sessionInfo() >> R version 2.15.1 (2012-06-22) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 >> [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 >> [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1 >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Dear Fatemeh, as for the mapper, TopHat uses Bowtie internally and they are both suited for slightly different purpose and different scenarios. So using Bowtie is ok as far as I can tell, because BitSeq works in slightly different scenario than Cuffdiff. With the reference, I think you have two options. If there is an existing annotation for your reference, then you can run Cuffdiff with an option which enables it to use the annotation (you provide gtf file of known transcripts), so both programs start with same reference and annotation, just in different format. The other thing you could do is to use the GTF file produced by Cuffdiff (or Cufflinks) and construct transcriptome based no that and genome reference. BitSeq does not provide a tool for doing that, but MMSeq package (http://bgx.org.uk/software/mmseq.html) has a program extract_transcripts, which should be able to do it, but I don't think I ever tried it myself, so you better look at their website for more information about how to use it. Peter. On 11/06/2012 12:09 PM, Fatemehsadat Seyednasrollah wrote: > Dear Peter, > > Actually I have a dataset and my final goal is to compare DE genes from BitSeq and cuffdiff. > I have used TpHat and as you said the genome reference to map my data and then run cuffdiff. Now do you think it would be a fair comparison of differential expression if I use another type of reference and mapper? (I know this is the only way of doing it but I would be grateful if you have any hints to help). > What about the idea of making my own transcriptome reference from the genome reference I have used? Do think there are any scripts or tools making it? > Sorry I asked a lot. > Best Regards, > Fatemeh > ________________________________________ > From: Peter Glaus [glaus at cs.man.ac.uk] > Sent: Tuesday, November 06, 2012 1:03 PM > To: Fatemehsadat Seyednasrollah > Cc: bioconductor at r-project.org > Subject: Re: [BioC] BitSeq input data error > > Dear Fatemeh, > TopHat is used for spliced alignment against genome. And from your > command I assume that genome.fa contains genomic reference. > > BitSeq, on the other hand, assumes known genome annotation - known > splice variants (or transcripts). So the alignment is done against > transcriptome reference. For some genomes with known annotation, you can > download transcript reference from e.g. UCSC (I haven't used RefSeq > much, but I think you should download table:"refGene" and select output > format:"sequence" sequence type: "mrna"), or ensembl (human: > ftp://ftp.ensembl.org/pub/release-69/fasta/homo_sapiens/cdna/). > > Because the reference consists of transcripts, you don't really need > spliced alignment and so you should use Bowtie instead of TopHat, and > then provide the bam file and "transcriptome.fa" to getExpression function. > > Regards, > Peter. > > > On 11/06/2012 08:49 AM, Fatemehsadat Seyednasrollah wrote: >> Dear Peter, >> >> So many thanks for the reply. First I changed the version of BitSeq now it is 1.2.0. But maybe I am misunderstood by the input data files. In the BitSeq manual it is recommended to align data by bowtie and obtain the SAM input file, which I have used TopHat instead of bowtie. Is it necessary to align only with bowtie or I am misunderstood by what you mean about the alignment file. >> And then now that I have run my command with this new version I got this error: >> >> (my reference is refseq hg19 the one I have used for the alignment) >> res1 <- getExpression("accepted_hits.bam", "genome.fa", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) >> >> ## Computing alignment probabilities. >> Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : >> Main: Transcript info length and sequence length of transcript 0 DO NOT MATCH! (249250621 16571) >> >> Many Thanks in advance. >> Best Regards, >> Fatemeh >> >> ________________________________________ >> From: Peter Glaus [glaus at cs.man.ac.uk] >> Sent: Tuesday, November 06, 2012 3:19 AM >> To: Fatemehsadat Seyednasrollah >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] BitSeq input data error >> >> Dear Fatemehsadat, >> this particular error report in version 1.0.1 of BitSeq is a bit >> ambiguous, so please try updating to the newest version of BitSeq. >> >> Most probably the error is caused by the number of transcripts in BAM >> file being different from the number in reference Fasta (25 vs 5927492). >> What kind of reference are you using? BitSeq expects the reference to be >> assembled transcriptome (one Fasta entry is one entire transcript), and >> you don't need TopHat to align reads to this kind of reference. >> >> Regards, >> Peter Glaus. >> >> On 11/05/12 17:28, Fatemehsadat Seyednasrollah wrote: >>> Hi all, >>> >>> I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. >>> >>> res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) >>> >>> and the error: >>> Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : >>> Main: number of transcripts don't match: 25 vs 5927492 >>> >>>> sessionInfo() >>> R version 2.15.1 (2012-06-22) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 >>> [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 >>> [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1 >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Many Many thanks. With Best Regards, Fatemeh ________________________________________ From: Peter Glaus [glaus@cs.man.ac.uk] Sent: Tuesday, November 06, 2012 5:02 PM To: Fatemehsadat Seyednasrollah Cc: bioconductor at r-project.org Subject: Re: [BioC] BitSeq input data error Dear Fatemeh, as for the mapper, TopHat uses Bowtie internally and they are both suited for slightly different purpose and different scenarios. So using Bowtie is ok as far as I can tell, because BitSeq works in slightly different scenario than Cuffdiff. With the reference, I think you have two options. If there is an existing annotation for your reference, then you can run Cuffdiff with an option which enables it to use the annotation (you provide gtf file of known transcripts), so both programs start with same reference and annotation, just in different format. The other thing you could do is to use the GTF file produced by Cuffdiff (or Cufflinks) and construct transcriptome based no that and genome reference. BitSeq does not provide a tool for doing that, but MMSeq package (http://bgx.org.uk/software/mmseq.html) has a program extract_transcripts, which should be able to do it, but I don't think I ever tried it myself, so you better look at their website for more information about how to use it. Peter. On 11/06/2012 12:09 PM, Fatemehsadat Seyednasrollah wrote: > Dear Peter, > > Actually I have a dataset and my final goal is to compare DE genes from BitSeq and cuffdiff. > I have used TpHat and as you said the genome reference to map my data and then run cuffdiff. Now do you think it would be a fair comparison of differential expression if I use another type of reference and mapper? (I know this is the only way of doing it but I would be grateful if you have any hints to help). > What about the idea of making my own transcriptome reference from the genome reference I have used? Do think there are any scripts or tools making it? > Sorry I asked a lot. > Best Regards, > Fatemeh > ________________________________________ > From: Peter Glaus [glaus at cs.man.ac.uk] > Sent: Tuesday, November 06, 2012 1:03 PM > To: Fatemehsadat Seyednasrollah > Cc: bioconductor at r-project.org > Subject: Re: [BioC] BitSeq input data error > > Dear Fatemeh, > TopHat is used for spliced alignment against genome. And from your > command I assume that genome.fa contains genomic reference. > > BitSeq, on the other hand, assumes known genome annotation - known > splice variants (or transcripts). So the alignment is done against > transcriptome reference. For some genomes with known annotation, you can > download transcript reference from e.g. UCSC (I haven't used RefSeq > much, but I think you should download table:"refGene" and select output > format:"sequence" sequence type: "mrna"), or ensembl (human: > ftp://ftp.ensembl.org/pub/release-69/fasta/homo_sapiens/cdna/). > > Because the reference consists of transcripts, you don't really need > spliced alignment and so you should use Bowtie instead of TopHat, and > then provide the bam file and "transcriptome.fa" to getExpression function. > > Regards, > Peter. > > > On 11/06/2012 08:49 AM, Fatemehsadat Seyednasrollah wrote: >> Dear Peter, >> >> So many thanks for the reply. First I changed the version of BitSeq now it is 1.2.0. But maybe I am misunderstood by the input data files. In the BitSeq manual it is recommended to align data by bowtie and obtain the SAM input file, which I have used TopHat instead of bowtie. Is it necessary to align only with bowtie or I am misunderstood by what you mean about the alignment file. >> And then now that I have run my command with this new version I got this error: >> >> (my reference is refseq hg19 the one I have used for the alignment) >> res1 <- getExpression("accepted_hits.bam", "genome.fa", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) >> >> ## Computing alignment probabilities. >> Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : >> Main: Transcript info length and sequence length of transcript 0 DO NOT MATCH! (249250621 16571) >> >> Many Thanks in advance. >> Best Regards, >> Fatemeh >> >> ________________________________________ >> From: Peter Glaus [glaus at cs.man.ac.uk] >> Sent: Tuesday, November 06, 2012 3:19 AM >> To: Fatemehsadat Seyednasrollah >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] BitSeq input data error >> >> Dear Fatemehsadat, >> this particular error report in version 1.0.1 of BitSeq is a bit >> ambiguous, so please try updating to the newest version of BitSeq. >> >> Most probably the error is caused by the number of transcripts in BAM >> file being different from the number in reference Fasta (25 vs 5927492). >> What kind of reference are you using? BitSeq expects the reference to be >> assembled transcriptome (one Fasta entry is one entire transcript), and >> you don't need TopHat to align reads to this kind of reference. >> >> Regards, >> Peter Glaus. >> >> On 11/05/12 17:28, Fatemehsadat Seyednasrollah wrote: >>> Hi all, >>> >>> I want to use SamBit to get the DE genes of my dataset. I have used TopHat to map my data. As far as I understand BitSeq needs to input files: The BAM/SAM file and the fasta file. I used the accepted_hits.bam file from TopHat and the fasta file I have used to map but I got error in the first line of my code. Below is the code and the error I received. Any suggestions to fix it? many thanks in advance. >>> >>> res1 <- getExpression("accepted_hits.bam", "sample.fasta", log=TRUE,MCMC_burnIn=200, MCMC_samplesN=200, MCMC_samplesSave=50) >>> >>> and the error: >>> Error in parseAlignment(alignFile, probF, trSeqFile, trInfoFile = trF, : >>> Main: number of transcripts don't match: 25 vs 5927492 >>> >>>> sessionInfo() >>> R version 2.15.1 (2012-06-22) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] BitSeq_1.0.1 zlibbioc_1.2.0 Rsamtools_1.8.6 >>> [4] Biostrings_2.24.1 GenomicRanges_1.8.13 IRanges_1.14.4 >>> [7] BiocGenerics_0.2.0 BiocInstaller_1.4.9 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-4.2 stats4_2.15.1 tools_2.15.1 >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 772 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6