alignShortReads
1
0
Entering edit mode
@sonja-althammer-6105
Last seen 9.6 years ago
Hi! I am having trouble aligning a DNAStringSet to a BSgenome. The example in the help works for me.... Could anyone explain to me the error I am getting, please and how I can make it work? Actually I would like to align these reads to a specific gene... Is there maybe a better way? Thanks! Sonja > alignShortReads(reads, Hsapiens, seqNames="chr1") Error in .Call2("ACtree2_build", tb, pp_exclude, base_codes, nodebuf_ptr, : non base DNA letter found in Trusted Band for pattern 248 > class(reads) [1] "DNAStringSet" attr(,"package") [1] "Biostrings" > class(Hsapiens) [1] "BSgenome" attr(,"package") [1] "BSgenome" [[alternative HTML version deleted]]
BSgenome BSgenome BSgenome BSgenome • 725 views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 2 days ago
United States
On 08/21/2013 02:54 PM, Sonja Althammer wrote: > Hi! > I am having trouble aligning a DNAStringSet to a BSgenome. The example in > the help works for me.... > Could anyone explain to me the error I am getting, please and how I can > make it work? Actually I would like to align these reads to a specific > gene... Is there maybe a better way? > Thanks! > Sonja > > >> alignShortReads(reads, Hsapiens, seqNames="chr1") > Error in .Call2("ACtree2_build", tb, pp_exclude, base_codes, nodebuf_ptr, alignShortReads is from R453Plus1Toolbox. DNAStringSet can contain ambiguity letters such as 'M', to indicate that the nucleotide could be either A or C > IUPAC_CODE_MAP A C G T M R W S Y K V "A" "C" "G" "T" "AC" "AG" "AT" "CG" "CT" "GT" "ACG" H D B N "ACT" "AGT" "CGT" "ACGT" but the aligner in alignShortReads expects only > DNA_BASES [1] "A" "C" "G" "T" If your sequences do contain just bases, it would make sense to run Biostrings::matchPDict directly on the sequence of the gene of interest. You could obtained the sequence of your gene(s) of interest by, e.g., using the function ?select and the Homo.sapiens package to get coordinates of the gene, ?getSeq to extract the sequence from the Hsapiens BSgenome object you already have, and matchPDict to do the alignment. Probably in a way that requires more care, I did the following to arrive at the sequence of the transcripts of BRCA1 library(BSgenome.Hsapiens.UCSC.hg19) library(Homo.sapiens) Discover what I can do: ## 'keytypes' I can query with... keytypes(Homo.sapiens) ## discover the correct way to specify a gene symbol I'm interested in grep("BRCA", keys(Homo.sapiens, "SYMBOL")) ## discover what info I can extract cols(Homo.sapiens) Now create ranges for transcripts ## select genomic coordinates for BRCA1 xx = select(Homo.sapiens, "BRCA1", c("TXID", "TXCHROM", "TXSTRAND", "TXSTART", "TXEND"), "SYMBOL") ## turn coordinates into a GRanges yy = with(xx, GRanges(TXCHROM, IRanges(TXSTART, TXEND), TXSTRAND)) or alternatively map to the central 'Entrez' identifier and... ## figure out Entrez ID corresponding to symbol eid = select(Homo.sapiens, "BRCA1", "ENTREZID", "SYMBOL")$ENTREZID ## extract transcript coordinates from TxDb yy = transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")[eid] What's a gene? Maybe it's just the range of transcripts: gn = range(yy) Finally retrieve the relevant sequence(s) seq = getSeq(Hsapiens, yy) ## or getSeq(Hsapiens, gn) Judging by the package you're using, you might have longer 454 reads, where the ungapped high fidelity matches of matchPDict are not appropriate. If you're interested in more general alignments, or if the sequences you're trying to align do have ambiguity letters, there is Biostrings::pairwiseAlignment. It might still be that this general purpose aligner is not appropriate for the task you're trying to accomplish, so some detail (and perhaps input from others on the list!) might be needed. Hope that provides some guidance. Martin > : > non base DNA letter found in Trusted Band for pattern 248 >> class(reads) > [1] "DNAStringSet" > attr(,"package") > [1] "Biostrings" > >> class(Hsapiens) > [1] "BSgenome" > attr(,"package") > [1] "BSgenome" > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT

Login before adding your answer.

Traffic: 569 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6