Search
Question: mapping FASTQ files to single gene (fluorescent reporter)
0
gravatar for A
12 days ago by
A10
A10 wrote:

Hi all, 

I have recently posted an Rsubread featurecounts related question, but I have another one, based on a quality control analysis I would like to do: 

I am attempting to identify a fluorescent reporter that is driven of a specific promoter in our cells of interest which we then sort (based on fluorescence) and sequence. The fluorescent reporter is ZsG. I cannot find a dedicated FASTA file for this on ensembl or anywhere else for that matter apart from the ebi. The ebi contains two sequences (presumably two isoforms of the protein) that I used as the FASTA file for the reference genome. The following code was used which succesful

ref <- "AAF03372.fa"
buildindex(basename="reference_index",reference=ref)

Reference index builds successfully with no errors... 

I then use the code as per the vignette for paired-end data:

reads1<-"lane1_001_.......R1.fastq.gz"
reads2<-"ane1_001_.......R2.fastq.gz"

align(index="reference_index",readfile1=reads1,readfile2=reads2,
output_file="alignResultsPE_1.BAM",phredOffset=33 )

This also run succesfully, however, I get 0 mapping! And am abit concerned because I have no reason to believe out of ~300 samples that cell sorting has not worked correctly.. Especially as some organs express alot of this reporter based on a bigger population of cells of interest, and the gating is stringent based on negative control samples... so I have no reason to doubt that these cells are not green!!.. 

Are there any tips that you could give for why this may not be working? could it be the sequence in the fasta file? or any parameters that i should be tweaking in the align function (I have increased the number of mismatches to 6 to see if that helped but it didn't!)

Total fragments : 16,782,941                                           ||

||              Mapped : 0 (0.0%)                                             ||

||     Uniquely mapped : 0                                                    ||

||       Multi-mapping : 0                                                    ||

||                                                                            ||

||            Unmapped : 16,782,941                                           ||

||                                                                            || etc.......

Is there anything I am clearly doing wrong? What are the potential problems that could cause this, is compiling a library of one gene sufficient for this type of analysis? should this be added on to a FASTA file from the mouse genome etc... Any advice or tips would be greatly appreciated! Many thanks!!

ADD COMMENTlink written 12 days ago by A10

As per the posting guide, you need to provide the session info so we can see what versions of software packages you use. Please also provide the full screen output of buildindex() and align() functions. How long are the two reference sequences?

ADD REPLYlink written 11 days ago by Wei Shi2.9k

Thank you!! please see below! Apologies this was not done initially: Please see information requested below including all outputs:

Session Info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ShortRead_1.40.0            GenomicAlignments_1.18.0    SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [5] matrixStats_0.54.0          Biobase_2.42.0              Rsamtools_1.34.0            GenomicRanges_1.34.0       
 [9] GenomeInfoDb_1.18.1         Biostrings_2.50.1           XVector_0.22.0              IRanges_2.16.0             
[13] S4Vectors_0.20.1            BiocParallel_1.16.0         BiocGenerics_0.28.0         Rsubread_1.32.1            

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3          assertthat_0.2.0      
 [5] BiocManager_1.30.4     latticeExtra_0.6-28    blob_1.1.1             GenomeInfoDbData_1.2.0
 [9] pillar_1.3.0           RSQLite_2.1.1          backports_1.1.2        lattice_0.20-38       
[13] glue_1.3.0             digest_0.6.18          RColorBrewer_1.1-2     checkmate_1.8.5       
[17] colorspace_1.3-2       htmltools_0.3.6        Matrix_1.2-15          plyr_1.8.4            
[21] DESeq2_1.22.1          XML_3.98-1.16          pkgconfig_2.0.2        genefilter_1.64.0     
[25] zlibbioc_1.28.0        purrr_0.2.5            xtable_1.8-3           scales_1.0.0          
[29] htmlTable_1.12         tibble_1.4.2           annotate_1.60.0        ggplot2_3.1.0         
[33] nnet_7.3-12            lazyeval_0.2.1         survival_2.43-1        magrittr_1.5          
[37] crayon_1.3.4           memoise_1.1.0          hwriter_1.3.2          foreign_0.8-71        
[41] tools_3.5.1            data.table_1.11.8      stringr_1.3.1          locfit_1.5-9.1        
[45] munsell_0.5.0          cluster_2.0.7-1        AnnotationDbi_1.44.0   bindrcpp_0.2.2        
[49] compiler_3.5.1         rlang_0.3.0.1          grid_3.5.1             RCurl_1.95-4.11       
[53] rstudioapi_0.8         htmlwidgets_1.3        bitops_1.0-6           base64enc_0.1-3       
[57] gtable_0.2.0           DBI_1.0.0              R6_2.3.0               gridExtra_2.3         
[61] knitr_1.20             dplyr_0.7.8            bit_1.1-14             bindr_0.1.1           
[65] Hmisc_4.1-1            stringi_1.2.4          Rcpp_1.0.0             geneplotter_1.60.0    
[69] rpart_4.1-13           acepack_1.4.1          tidyselect_0.2.5      

 

 

ADD REPLYlink written 11 days ago by A10

Submittign differenet replies because of word count: Build index output:

BuildIndex output:

Rsubread 1.32.1

//=========================== indexBuilder setting ===========================\

|| ||

|| Index name : reference_index ||

|| Index space : base-space ||

|| One block index : yes ||

|| Repeat threshold : 100 repeats ||

|| Distance to next subread : 1 ||

|| ||

|| Input files : 1 file in total ||

|| o AAF03372.fasta ||

|| ||

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\

|| ||

|| Check the integrity of provided reference sequences ... ||

|| No format issues were found ||

|| Scan uninformative subreads in reference sequences ... ||

|| 8%, 0 mins elapsed, rate=0.0k bps/s, total=0m ||

|| 16%, 0 mins elapsed, rate=0.1k bps/s, total=0m ||

|| 24%, 0 mins elapsed, rate=0.1k bps/s, total=0m ||

|| 33%, 0 mins elapsed, rate=0.2k bps/s, total=0m ||

|| 41%, 0 mins elapsed, rate=0.2k bps/s, total=0m ||

|| 49%, 0 mins elapsed, rate=0.3k bps/s, total=0m ||

|| 58%, 0 mins elapsed, rate=0.3k bps/s, total=0m ||

|| 66%, 0 mins elapsed, rate=0.4k bps/s, total=0m ||

|| 74%, 0 mins elapsed, rate=0.4k bps/s, total=0m ||

|| 83%, 0 mins elapsed, rate=0.5k bps/s, total=0m ||

|| 91%, 0 mins elapsed, rate=0.5k bps/s, total=0m ||

|| 99%, 0 mins elapsed, rate=0.6k bps/s, total=0m ||

|| 107%, 0 mins elapsed, rate=0.6k bps/s, total=0m ||

|| Build the index... ||

|| 199%, 0 mins elapsed, rate=612.0k bps/s, total=0m ||

|| 207%, 0 mins elapsed, rate=318.8k bps/s, total=0m ||

|| 215%, 0 mins elapsed, rate=265.2k bps/s, total=0m ||

|| 224%, 0 mins elapsed, rate=275.4k bps/s, total=0m ||

|| 232%, 0 mins elapsed, rate=238.0k bps/s, total=0m ||

|| 240%, 0 mins elapsed, rate=246.5k bps/s, total=0m ||

|| 249%, 0 mins elapsed, rate=255.0k bps/s, total=0m ||

|| 257%, 0 mins elapsed, rate=158.1k bps/s, total=0m ||

|| 265%, 0 mins elapsed, rate=136.0k bps/s, total=0m ||

|| 274%, 0 mins elapsed, rate=120.2k bps/s, total=0m ||

|| 282%, 0 mins elapsed, rate=115.6k bps/s, total=0m ||

|| 290%, 0 mins elapsed, rate=119.0k bps/s, total=0m ||

|| 299%, 0 mins elapsed, rate=114.7k bps/s, total=0m ||

|| 307%, 0 mins elapsed, rate=117.9k bps/s, total=0m ||

|| Save current index block... ||

|| [ 0.0% finished ] ||

|| [ 10.0% finished ] ||

|| [ 20.0% finished ] ||

|| [ 30.0% finished ] ||

|| [ 40.0% finished ] ||

|| [ 50.0% finished ] ||

|| [ 60.0% finished ] ||

|| [ 70.0% finished ] ||

|| [ 80.0% finished ] ||

|| [ 90.0% finished ] ||

|| [ 100.0% finished ] ||

|| ||

|| Total running time: 0.1 minutes. ||

|| Index reference_index was successfully built! ||

|| ||

 

ADD REPLYlink written 11 days ago by A10

aligh() output:

|Function : Read alignment (RNA-Seq) || || Input file 1 : lane1_020_Ki_GTCTTGGC_L001_4198STDY7571463_R1.fastq.gz || || Input file 2 : lane1_020_Ki_GTCTTGGC_L001_4198STDY7571463_R2.fastq.gz || || Output file : alignResultsPE_1.BAM (BAM) || || Index name : reference_index || || || || ------------------------------------ || || || || Threads : 1 || || Phred offset : 33 || || # of extracted subreads : 10 || || Min read1 vote : 3 || || Min read2 vote : 1 || || Max fragment size : 600 || || Min fragment size : 50 || || Max mismatches : 3 || || Max indel length : 5 || || Report multi-mapping reads : yes || || Max alignments per multi-mapping read : 1 || || || \===================== http://subread.sourceforge.net/ ======================//

//================= Running (06-Dec-2018 07:53:10, pid=3681) =================\ || || || The input file contains base space reads. || || The range of Phred scores observed in the data is [0,37] || || Load the 1-th index block... || || 0% completed, 0.0 mins elapsed, rate=36.6k fragments per second || || 6% completed, 0.6 mins elapsed, rate=57.5k fragments per second || ...... 99% completed, 10 mins elapsed, rate=49.3k fragments per second || || || || Completed successfully. || || || \============================================================================//

//================================= Summary ==================================\ || || || Total fragments : 30,661,522 || || Mapped : 0 (0.0%) || || Uniquely mapped : 0 || || Multi-mapping : 0 || || || || Unmapped : 30,661,522 || || || || Correctly paired : 0 || || Not mapped in pairs : 0 || || Only one end mapped : 0 || || Multi-chromosomes : 0 || || Different strands : 0 || || Not in PE distance : 0 || || Abnormal order : 0 || || || || Indels : 0 || || || || Running time : 10.4 minutes ||

ADD REPLYlink modified 11 days ago • written 11 days ago by A10

apologies, align() output pasted correctly but posted as above, hope thats ok to read?

>ENA|AAF03373|AAF03373.1 Zoanthus sp. fluorescent protein FP538
ATGGCTCATTCAAAGCACGGTCTAAAAGAAGAAATGACAATGAAATACCACATGGAAGGG
TGCGTCAACGGACATAAATTTGTGATCACGGGCGAAGGCATTGGATATCCGTTCAAAGGG
AAACAGACTATTAATCTGTGTGTGATCGAAGGGGGACCATTGCCATTTTCCGAAGACATA
TTGTCAGCTGGCTTTAAGTACGGAGACAGGATTTTCACTGAATATCCTCAAGACATAGTA
GACTATTTCAAGAACTCGTGTCCTGCTGGATATACATGGGGCAGGTCTTTTCTCTTTGAG
GATGGAGCAGTCTGCATATGCAATGTAGATATAACAGTGAGTGTCAAAGAAAACTGCATT
TATCATAAGAGCATATTTAATGGAATGAATTTTCCTGCTGATGGACCTGTGATGAAAAAG
ATGACAACTAACTGGGAAGCATCCTGCGAGAAGATCATGCCAGTACCTAAGCAGGGGATA
CTGAAAGGGGATGTCTCCATGTACCTCCTTCTGAAGGATGGTGGGCGTTACCGGTGCCAG
TTCGACACAGTTTACAAAGCAAAGTCTGTGCCAAGTAAGATGCCGGAGTGGCACTTCATC
CAGCATAAGCTCCTCCGTGAAGACCGCAGCGATGCTAAGAATCAGAAGTGGCAGCTGACA
GAGCATGCTATTGCATTCCCTTCTGCCTTGGCCTGA

sequences as above, that is one of the sequences.. other sequences similar in length and I have also tried the complete CDS... Thank you!

ADD REPLYlink written 11 days ago by A10

You commands look fine. I have no idea why you got no mapped reads at all. You may try to Blast some reads to all nucleotide databases to see if you could get any hits to other species:

https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

 

 

ADD REPLYlink written 7 days ago by Wei Shi2.9k

Ok thank you! Yes I am trying to manually blast the sequence to see if its there!

ADD REPLYlink written 7 days ago by A10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 410 users visited in the last hour