Aligning short reads to a single gene
1
1
Entering edit mode
Ugo Borello ▴ 340
@ugo-borello-5753
Last seen 5.8 years ago
France
Dear all, a sanity check of my RNA-Seq experiment is driving me crazy. I want to verify that the GFP reporter gene is expressed in my control samples. This reporter gene, of course, is not present in the mouse reference genome but only in my transgenic mice. Therefore, I thought to align the short reads of my RNA-Seq fastq files to the GFP gene sequence to verify its mRNA presence in my samples. I came out with this na?ve solution: library(ShortRead) ## read fastq file reads<- readFastq('filename.fastq') ## get the sequences seq<- sread(reads) ## keep only the sequences without 'N' alf <- alphabetFrequency( sread(reads)) clean<- seq[alf[,"N"] == 0] ## make a dictionary of those sequences pdict0<- PDict(clean) ## match the dictionary of my sequences to the GFP gene (~800 bp) myseq<-DNAString("ATG...TAA") ## here the GFP gene sequence mysearch <- matchPDict(pdict0, myseq, max.mismatch=0) ## exact match count_index <- sum(countIndex(mysearch)) count_index ## the total number of alignments Any suggestion for a nicer and more efficient solution? Thank you in advance for your help Ugo
• 3.3k views
ADD COMMENT
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 17 days ago
Australia/Melbourne/Olivia Newton-John …
An alternative way will be to use an aligner to align your reads to your reporter gene. You'll need to build an index first using the sequence of your reporter gene and then perform the alignment. But the alignment should be very fast because your 'reference sequence' is very short. The alignment will look after sequencing errors in your reads as well. Wei On Feb 8, 2013, at 7:30 PM, Ugo Borello wrote: > Dear all, > a sanity check of my RNA-Seq experiment is driving me crazy. > > I want to verify that the GFP reporter gene is expressed in my control > samples. This reporter gene, of course, is not present in the mouse > reference genome but only in my transgenic mice. > Therefore, I thought to align the short reads of my RNA-Seq fastq files to > the GFP gene sequence to verify its mRNA presence in my samples. > > > I came out with this na?ve solution: > > library(ShortRead) > > ## read fastq file > reads<- readFastq('filename.fastq') > > ## get the sequences > seq<- sread(reads) > > ## keep only the sequences without 'N' > alf <- alphabetFrequency( sread(reads)) > clean<- seq[alf[,"N"] == 0] > > ## make a dictionary of those sequences > pdict0<- PDict(clean) > > ## match the dictionary of my sequences to the GFP gene (~800 bp) > myseq<-DNAString("ATG...TAA") ## here the GFP gene sequence > mysearch <- matchPDict(pdict0, myseq, max.mismatch=0) ## exact match > count_index <- sum(countIndex(mysearch)) > count_index ## the total number of alignments > > > > Any suggestion for a nicer and more efficient solution? > > Thank you in advance for your help > > Ugo > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENT

Login before adding your answer.

Traffic: 979 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6