Aligning short reads to a single gene

1

Entering edit mode

Ugo Borello ▴ 340

@ugo-borello-5753

Last seen 5.8 years ago

France

Dear all, a sanity check of my RNA-Seq experiment is driving me crazy. I want to verify that the GFP reporter gene is expressed in my control samples. This reporter gene, of course, is not present in the mouse reference genome but only in my transgenic mice. Therefore, I thought to align the short reads of my RNA-Seq fastq files to the GFP gene sequence to verify its mRNA presence in my samples. I came out with this na?ve solution: library(ShortRead) ## read fastq file reads<- readFastq('filename.fastq') ## get the sequences seq<- sread(reads) ## keep only the sequences without 'N' alf <- alphabetFrequency( sread(reads)) clean<- seq[alf[,"N"] == 0] ## make a dictionary of those sequences pdict0<- PDict(clean) ## match the dictionary of my sequences to the GFP gene (~800 bp) myseq<-DNAString("ATG...TAA") ## here the GFP gene sequence mysearch <- matchPDict(pdict0, myseq, max.mismatch=0) ## exact match count_index <- sum(countIndex(mysearch)) count_index ## the total number of alignments Any suggestion for a nicer and more efficient solution? Thank you in advance for your help Ugo

• 3.3k views

ADD COMMENT • link updated 11.2 years ago by Wei Shi ★ 3.6k • written 11.2 years ago by Ugo Borello ▴ 340

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 17 days ago

Australia/Melbourne/Olivia Newton-John …

An alternative way will be to use an aligner to align your reads to your reporter gene. You'll need to build an index first using the sequence of your reporter gene and then perform the alignment. But the alignment should be very fast because your 'reference sequence' is very short. The alignment will look after sequencing errors in your reads as well. Wei On Feb 8, 2013, at 7:30 PM, Ugo Borello wrote: > Dear all, > a sanity check of my RNA-Seq experiment is driving me crazy. > > I want to verify that the GFP reporter gene is expressed in my control > samples. This reporter gene, of course, is not present in the mouse > reference genome but only in my transgenic mice. > Therefore, I thought to align the short reads of my RNA-Seq fastq files to > the GFP gene sequence to verify its mRNA presence in my samples. > > > I came out with this na?ve solution: > > library(ShortRead) > > ## read fastq file > reads<- readFastq('filename.fastq') > > ## get the sequences > seq<- sread(reads) > > ## keep only the sequences without 'N' > alf <- alphabetFrequency( sread(reads)) > clean<- seq[alf[,"N"] == 0] > > ## make a dictionary of those sequences > pdict0<- PDict(clean) > > ## match the dictionary of my sequences to the GFP gene (~800 bp) > myseq<-DNAString("ATG...TAA") ## here the GFP gene sequence > mysearch <- matchPDict(pdict0, myseq, max.mismatch=0) ## exact match > count_index <- sum(countIndex(mysearch)) > count_index ## the total number of alignments > > > > Any suggestion for a nicer and more efficient solution? > > Thank you in advance for your help > > Ugo > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD COMMENT • link 11.2 years ago Wei Shi ★ 3.6k

Login before adding your answer.