all reads in ambiguous_readpair_position, from dexseq_count.py in DEXseq
0
0
Entering edit mode
Wu, Di ▴ 120
@wu-di-4945
Last seen 8.8 years ago
United States

 

Hello, 

I am trying to use DEXseq for splicing in RNAseq data.

 

I met a problem in the step to use Python code "dexseq_count.py", when following from 

 

Section 2.5 Counting reads in http://bioconductor.org/packages/release/bioc/vignettes/DEXSeq/inst/doc/DEXSeq.pdf

 

I used:

python /nas02/apps/r-3.1.1/lib64/R/library/DEXSeq/python_scripts/dexseq_count.py -p yes -r pos -s no Homo_sapiens.GRCh38.80.DEXSeq.chr.gff sam/BC13.sam txt/BC13.txt

 

I get all reads as  "_ambiguous_readpair_position 15208796"

 

I know my data is from pair-end. I checked the order, it seems as by position. There are many reads with good quality scores, 50. Not sure about "-s" as yes or no, but I tried both and got the same results.

 

These are the versions of software I am using.

module load samtools/1.2

module load python/2.7.6

module load r/3.1.1

 

I’ve also seen the python code from website "https://github.com/olgabot/rna-seq-diff-exprn/blob/master/scripts/external/dexseq_count.py"

 

The sam file was converted from bam file. I also tried using bam file directly in dexseq_count.py, that gives same results.

 

According to sam file, it seems tophat and hg19 were used. I wonder whether one reason of my wrong result is from the different reference genome.  Both “Homo_sapiens.GRCh37.75.gtf” and “Homo_sapiens.GRCh38.80.gtf” gave same results here. Not sure if other gtf file available for hg19.

 

I suspect my problem is the "sam" format I have doesn't correspond well with the setup in the python code.

 

The top of one of the one of the SAM file are as following.

@HD     VN:1.0  SO:coordinate

@SQ     SN:chr1 LN:249250621

@SQ     SN:chr10        LN:135534747

@SQ     SN:chr11        LN:135006516

@SQ     SN:chr12        LN:133851895

@SQ     SN:chr13        LN:115169878

@SQ     SN:chr14        LN:107349540

@SQ     SN:chr15        LN:102531392

@SQ     SN:chr16        LN:90354753

@SQ     SN:chr17        LN:81195210

@SQ     SN:chr18        LN:78077248

@SQ     SN:chr19        LN:59128983

@SQ     SN:chr2 LN:243199373

@SQ     SN:chr20        LN:63025520

@SQ     SN:chr21        LN:48129895

@SQ     SN:chr22        LN:51304566

@SQ     SN:chr3 LN:198022430

@SQ     SN:chr4 LN:191154276

@SQ     SN:chr5 LN:180915260

@SQ     SN:chr6 LN:171115067

@SQ     SN:chr7 LN:159138663

@SQ     SN:chr8 LN:146364022

@SQ     SN:chr9 LN:141213431

@SQ     SN:chrM LN:16571

@SQ     SN:chrX LN:155270560

@SQ     SN:chrY LN:59373566

@PG     ID:TopHat       VN:2.0.9        CL:/opt/applications/tophat/2.0.9/gnu/bin/tophat -p 12 --color --quals -r 75 --mate-std-dev 70 --report-secondary-alignments --library-type fr-secondstrand -N 4 --read-edit-dist 4 -o /gpfs/home/cdesai/Projects/SOLiD_Runs/Run_8_2_2013/Mapping/BC13 /gpfs/group/databases/Homo_sapiens/UCSC/hg19/Sequence/BowtieIndex_c/genome /gpfs/home/cdesai/Projects/SOLiD_Runs/Run_8_2_2013/Data/Processed_Data/BC13/BC13_F3_30.csfasta /gpfs/home/cdesai/Projects/SOLiD_Runs/Run_8_2_2013/Data/Processed_Data/BC13/BC13_F5_30.csfasta /gpfs/home/cdesai/Projects/SOLiD_Runs/Run_8_2_2013/Data/Processed_Data/BC13/BC13_F3_30.qual /gpfs/home/cdesai/Projects/SOLiD_Runs/Run_8_2_2013/Data/Processed_Data/BC13/BC13_F5_30.qual

922_166_1011_F3 321     chr1    10098   0       29M     chr3    141427968       0       AACCCTAACCCAACCCTAACCCTAACCCT   8RTVWP:?EDA8OPQP@5;QF%!!'@D>'   XA:i:3  MD:Z:29 NM:i:0  CM:i:3  NH:i:20 CC:Z:chr11      CP:i:175342     XS:A:+  HI:i:5

922_166_1011_F3 353     chr1    10098   0       29M     chrX    71380063        0       AACCCTAACCCAACCCTAACCCTAACCCT   8RTVWP:?EDA8OPQP@5;QF%!!'@D>'   XA:i:3  MD:Z:29 NM:i:0  CM:i:3  NH:i:20 XS:A:+  HI:i:19

2016_216_51_F3  345     chr1    10544   0       29M     *       0       0       AAATCTGTGCAGAGGACAACGCAGGTCCG   >>77>JVY\HKXQOPMRR9>RSJKQUXVV   NM:i:1  NH:i:7  CC:Z:chr12      CP:i:95051      XS:A:-  HI:i:0

443_343_1981_F3 89      chr1    10559   0       29M     *       0       0       ACAACGCAGCTCCGCCCTCGCGGTGCTCT   !!IA*-7I5!!BL?:46E?D]G+@C2/**   NM:i:0  NH:i:7  CC:Z:chr12      CP:i:95036      XS:A:-  HI:i:0

1319_1128_1427_F3       337     chr1    10568   0       29M     chr12   6705281 0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTG   1?NabR@L_^EGC2L[QUba\]```a__B   XA:i:0  MD:Z:29 NM:i:0  CM:i:0  NH:i:19 CC:Z:chr2       CP:i:133110172  XS:A:-  HI:i:9

Thank you for your help!

Di

 

 

dexseq • 1.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 875 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6