Question: deqseq_count and BWA-based SAM files
0
gravatar for Guest User
7.9 years ago by
Guest User12k
Guest User12k wrote:
Hello all, I'm trying to use DEXSeq to identify and quantitate alternative splicing events following transgene expression. I've used BWA to map my reads and now have the resulting SAM files. However, when I use dexseq_count (a Python script) to convert this to a ExonCountSet, I get the following error message: "/common/groups/dac/Hawaii/my_new_env/lib/python2.6/site- packages/HTSeq/__init__.py:592: UserWarning: Read HWUSI- EAS381R:1:10:12324:1388#CGATGT claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)" I can see that my SAM file is not "sorted" properly. Does anyone have a workaround for this? I've searched all over and all I can find is people writing, "I used a perl script as a workaround." Thanks in advance, Wyatt -- output of sessionInfo(): R version 2.14.0 (2011-10-31) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] DEXSeq_1.0.2 Biobase_2.14.0 loaded via a namespace (and not attached): [1] hwriter_1.3 plyr_1.6 statmod_1.4.14 stringr_0.6 -- Sent via the guest posting facility at bioconductor.org.
convert dexseq • 872 views
ADD COMMENTlink modified 7.9 years ago by Shona Wood70 • written 7.9 years ago by Guest User12k
Answer: deqseq_count and BWA-based SAM files
0
gravatar for wang peter
7.9 years ago by
wang peter2.0k
wang peter2.0k wrote:
you can sort them by linux command sort -T . -S 2G -k 1,1 -k 3,3 $target_fa.psl.sam > $target_fa.psl.nameSorted.sam -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253
ADD COMMENTlink written 7.9 years ago by wang peter2.0k
I'd rather use `samtools sort ...` You probably want to (1) convert your sam to bam, (2) then sort, (3) then index, (4) then use. >From the command line: ## convert to bam samtools view -bS your.sam > your-tmp.bam ## sort bam samtools sort your-tmp.bam your ## index bam samtools index your.bam Now you can junk your sam file and save mucho HD space to boot. -steve On Tue, Dec 13, 2011 at 10:41 AM, wang peter <wng.peter at="" gmail.com=""> wrote: > you can sort them by linux command > > sort -T . -S 2G -k 1,1 -k 3,3 $target_fa.psl.sam > $target_fa.psl.nameSorted.sam > -- > shan gao > Room 231(Dr.Fei lab) > Boyce Thompson Institute > Cornell University > Tower Road, Ithaca, NY 14853-1801 > Office phone: 1-607-254-1267(day) > Official email:sg839 at cornell.edu > Facebook:http://www.facebook.com/profile.php?id=100001986532253 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLYlink written 7.9 years ago by Steve Lianoglou12k
Thanks to everyone who has replied to this. I really appreciate it! Unfortunately, none of these has worked. I've used both Shan's script as well as samtools and am still having the same problem. Despite everything being very nicely sorted, I still getting the same error message. I'm not sure what to do. I may write the developer of the dexseq_count script. . . . Thanks so much, Wyatt ----- Original Message ----- From: "Steve Lianoglou" <mailinglist.honeypot@gmail.com> To: "wang peter" <wng.peter at="" gmail.com=""> Cc: "Wyatt McMahon [guest]" <guest at="" bioconductor.org="">, bioconductor at r-project.org, wmcmahon at vbi.vt.edu Sent: Tuesday, December 13, 2011 4:02:01 PM Subject: Re: [BioC] deqseq_count and BWA-based SAM files I'd rather use `samtools sort ...` You probably want to (1) convert your sam to bam, (2) then sort, (3) then index, (4) then use. >From the command line: ## convert to bam samtools view -bS your.sam > your-tmp.bam ## sort bam samtools sort your-tmp.bam your ## index bam samtools index your.bam Now you can junk your sam file and save mucho HD space to boot. -steve On Tue, Dec 13, 2011 at 10:41 AM, wang peter <wng.peter at="" gmail.com=""> wrote: > you can sort them by linux command > > sort -T . -S 2G -k 1,1 -k 3,3 $target_fa.psl.sam > $target_fa.psl.nameSorted.sam > -- > shan gao > Room 231(Dr.Fei lab) > Boyce Thompson Institute > Cornell University > Tower Road, Ithaca, NY 14853-1801 > Office phone: 1-607-254-1267(day) > Official email:sg839 at cornell.edu > Facebook:http://www.facebook.com/profile.php?id=100001986532253 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLYlink written 7.9 years ago by Wyatt McMahon20
Hi Wyatt On 2011-12-13 22:06, Wyatt McMahon wrote: > Unfortunately, none of these has worked. I've used both Shan's > script as well as samtools and am still having the same problem. > Despite everything being very nicely sorted, I still getting the same > error message. Do you get the error for every read or only for some. The latter is typically harmless. To explain: The way how the SAM format stores paired-end reads is, IMO, rather unfortunate. Each mate gets its own SAM line, and the two SAM lines can be at rather different places in the file. Once you sort by name, the mates will be close to each other (even though they may still be mixed up in case there is more than one alignment for the pair). HTSeq takes a chunk of adjacent lines with the same read ID and arranges them into matching pairs (by using the MRNM and MPOS (or RNEXT and PNEXT in the new terminology) columns). If this does not work, the warning is displayed. Often, if you do some filtering, you might remove a SAM line for a read but leave in the line for its mate. HTSeq will simply skip such reads but display the warning you saw. You can silence the warnings (but also all others) with the '-q' option if they bother you. Simon
ADD REPLYlink written 7.9 years ago by Simon Anders3.6k
Answer: deqseq_count and BWA-based SAM files
0
gravatar for Shona Wood
7.9 years ago by
Shona Wood70
Shona Wood70 wrote:
Hi Wyatt, I use SAMTOOLS to sort sam files: http://samtools.sourceforge.net/samtools.shtml ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] on behalf of Wyatt McMahon [guest] [guest@bioconductor.org] Sent: 13 December 2011 14:42 To: bioconductor at r-project.org; wmcmahon at vbi.vt.edu Subject: [BioC] deqseq_count and BWA-based SAM files Hello all, I'm trying to use DEXSeq to identify and quantitate alternative splicing events following transgene expression. I've used BWA to map my reads and now have the resulting SAM files. However, when I use dexseq_count (a Python script) to convert this to a ExonCountSet, I get the following error message: "/common/groups/dac/Hawaii/my_new_env/lib/python2.6/site- packages/HTSeq/__init__.py:592: UserWarning: Read HWUSI- EAS381R:1:10:12324:1388#CGATGT claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)" I can see that my SAM file is not "sorted" properly. Does anyone have a workaround for this? I've searched all over and all I can find is people writing, "I used a perl script as a workaround." Thanks in advance, Wyatt -- output of sessionInfo(): R version 2.14.0 (2011-10-31) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] DEXSeq_1.0.2 Biobase_2.14.0 loaded via a namespace (and not attached): [1] hwriter_1.3 plyr_1.6 statmod_1.4.14 stringr_0.6 -- Sent via the guest posting facility at bioconductor.org. _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 7.9 years ago by Shona Wood70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 188 users visited in the last hour