rsubread versus BWA ( Burrows-Wheeler Alignmener) for whole genome alignment
3
0
Entering edit mode
@tarekmohamed-9489
Last seen 5.9 years ago

Hi All,

I have some whole genome sequencing data that were generated from illumina hiseq platform.

as a first step, I will align the fastq file to a reference genome. Can I use Rsubread for that?  I have used it for RNAseq mapping before and it was very good, I got 90% of the read were mapped to the ref genome. Does BWA have any advantage as compared to Rsubread.

Thanks

Tarek

rsubread BWA NGS • 3.0k views
ADD COMMENT
1
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 3 days ago
Australia/Melbourne

Yes you can use the Rsubread align() function to align DNA-seq reads. You will need to set 'type="dna"' for this mapping.

For the comparison between Rsubread and BWA, you may take a look at this paper:

http://www.ncbi.nlm.nih.gov/pubmed/23558742

ADD COMMENT
0
Entering edit mode

Hi Wei,

Thanks for your reply.

I am processing whole genome sequencing data, following to alignment, I want to call SNPs within different samples (100 samples). SNP data will be used to perform a downstream GWAS.

DO you think I can use exactSNP() to call SNPs or shall I stick to GATK.

 

Thanks

Tarek

ADD REPLY
0
Entering edit mode

I cant see why you can not use exactSNP.
 

ADD REPLY
0
Entering edit mode

Hi Wei,

I tried this with the unzipped file and it returns

 *** caught segfault ***
address 0x7fff9a688000, cause 'memory not mapped'

Thanks

Tarek

ADD REPLY
0
Entering edit mode

Can you provide the full screen output?

ADD REPLY
0
Entering edit mode

 >library(Rsubread)
> setwd("/projects/b1042/BurridgeLab/rnaseq_rarg")
>exactSNP(SNPAnnotationFile="All_20151104.vcf",readFile="subread_jg4.sam",isBAM=FALSE,refGenomeFile="BSgenome.Hsapiens.NCBI.GRCh38.fasta",outputFile="subread_jg4.sam.vcf",nthreads=40)             

//====================== Running (08-Sep-2016 11:52:33) ======================\\
||                                                                            ||
|| WARNING format issue in file 'subread_jg4.sam':                            ||
||         The required format is : SAM                                       ||
||         The real format seems to be : BAM                                  ||
|| A wrong format may result in wrong results or crash the program.           ||
|| Please refer to the manual for file format options.                        ||
|| If the file is in the correct format, please ignore this message.          ||
||                                                                            ||
|| WARNING format issue in file 'BSgenome.Hsapiens.NCBI.GRCh38.fasta':        ||
||         The required format is : SAM                                       ||
||         The file format is unknown.                                        ||
|| A wrong format may result in wrong results or crash the program.           ||
|| Please refer to the manual for file format options.                        ||
|| If the file is in the correct format, please ignore this message.          ||
||                                                                            ||
|| Split SAM file into ./temp-snps-003575-58311A9F-* ...                      ||

 *** caught segfault ***
address 0x7fff9598f000, cause 'memory not mapped'

 

ADD REPLY
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 3 days ago
Australia/Melbourne

Apparently exactSNP complains about the format of your input data. You need to specify 'isBAM = TRUE' if your input data are in BAM format.

Also please refer to the posting guide (http://www.bioconductor.org/help/support/posting-guide/) when you post questions. You should provide output of sessionInfo() and other info.

ADD COMMENT
0
Entering edit mode

Hi Wei,

I have already specified isBAM =FALSE. I have added here the sessionInfo() 

>exactSNP(SNPAnnotationFile="All_20151104.vcf",readFile="CH1_0uM_subjunc.sam",isBAM=FALSE,refGenomeFile="BSgenome.Hsapiens.NCBI.GRCh38.fasta",outputFile="subread_jg4.sam.vcf",nthreads=40)

 

> sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.4 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 

ADD REPLY
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 3 days ago
Australia/Melbourne

Sorry you should specify 'isBAM=TRUE' if you provide a BAM input.

Your R version is 1 year old. You will need to update both your R and Rsubread to the latest.

ADD COMMENT
0
Entering edit mode

Hi wie,

One more questions.

I tried subread to align whole genome sequence from one sample (paired end, 150 bp reads) and I used 'thread=64' and 216 core (9 nodes, 24 core each). but after two days it processed only 15% and I had to kill the job.

I am using the university quest, so basically I can get as many nodes/cores as I want. Is there any way by which I can accelerate the alignment.

>align(index="my_index",readfile1=c("141023_H0E72_113-JG4_L003_R1.fastq.gz"),readfile2=c("141023_H0E72_113-JG4_L003_R2.fastq.gz"), output_file=c("subread_jg4.sam"),nthreads=64,unique=TRUE,type=1)

 

Thanks

Tarek

 

ADD REPLY
0
Entering edit mode

Hi Tarek, are you using the latest version of Rsubread? The session info you provided above showed that your Rsubread and R are at least 1 year old.

Also as I mentioned above, please include all your commands and the full screen output when you post. This will help you to get help from others.

ADD REPLY

Login before adding your answer.

Traffic: 494 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6