how do i read a .seq file?
1
0
Entering edit mode
Luis • 0
@7dbba98c
Last seen 2.1 years ago
Brazil

Hello, I am trying to read a multiples .seq file resulting from sequencing to then find ORFs and translate and align them. I don't know how to read the .seq file, I tried read.table() but it imports it in multiple rows however the entire file contains a single sequence. thanks for your help

Bioconductor • 1.7k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

It's probably a FASTA file, so you can use readDNAStringSet from Biostrings.

ADD COMMENT
0
Entering edit mode

I already tried to import it as fasta, but it gives me that error:

Error in .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, : reading FASTA file Luis1_001_E06.seq: ">" expected at beginning of line 1

the file contains only the bases, as follows: GTCCCCGCCGAAATTATTACGACTCACTATAGGGGATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAA CTTTAAGAAGGAGATATACCATGGGCAGCAGCCATCACCATCATCACCACAGCCAGGATCCAATCGTGCGCCGCAAGCTG

I could manually add a ">NAME" But I have many files and I don't know how to do that with so many files, manually it would take a lot of time

ADD REPLY
0
Entering edit mode

So here's a contrived example

> mapper <- c("A","C","T","G")
> makeSeq <- function(mapper, n) paste(mapper[sample(1:4, n, TRUE)], collapse = "")

## make a fake .seq file
> notFASTA <- sapply(5:20, function(x) makeSeq(mapper, x))
## and write it out
> cat(notFASTA, file = "notFASTA.seq", sep = "\n")
## now here's what you can do. First read it in
> z <- scan("notFASTA.seq", "c")
Read 16 items
## and convert to a DNAStringSet
> zz <- DNAStringSet(z)
> zz
DNAStringSet object of length 16:
     width seq
 [1]     5 ATCCT
 [2]     6 GTCACT
 [3]     7 CCTCCAA
 [4]     8 GGGAGCAT
 [5]     9 TGTGGATAA
 ...   ... ...
[12]    16 CGTTGACCTAAGAGAA
[13]    17 AACTGCGTAGTTCGGAG
[14]    18 CATCGCCGCGCTCGCCAT
[15]    19 GCAGGCAGTAGGGTGTGGA
[16]    20 GATGCTTAGCTAAGCTGAAC
>
ADD REPLY
0
Entering edit mode

Hello, thank you very much, I did not know the function scan() Before your answer I had found an option to import the file: F1 <- readLines("file.seq") F1 <- paste(F1, collapse = "") F1 <- DNAStringSet(F1) F1 The result was this:

[1] 1267 GTCCCCGCCGAAATTATTACGACTCACTATAGGGGAT...GCGCTTCTTTCTTGTGGTATAGGCATAATGATGATGC

With the code you shared, I got this result:

DNAStringSet object of length 16: width seq

[1] 80 GTCCCCGCCGAAATTATTACGACTCACTATAGGGGAT...GGATAACAATTCCCCTCTAGAAATAATTTTGTTTAA

[2] 80 CTTTAAGAAGGAGATATACCATGGGCAGCAGCCATCA...CACCACAGCCAGGATCCAATCGTGCGCCGCAAGCTG

[3] 80 ACCGGTTATGTTGGCTTCGCTAACCTGCCGAACCAGT...CAAATCCGTGCGCAAAGGTTTCAATTTCAACGTCAT

... ... ...

[16] 67 TGGCGAGGGCGGGAATTAAACAATAGTTTTGCGCTTCTTTCTTGTGGTATAGGCATAATGATGATGC

So I think that way it reads each line as if it were a different sequence, I guess it's because the file contains line breaks. Then I modified the code you shared with me like this:

z <- scan("notFASTA.seq", "c") zz<- paste(zz, collapse = "") zz <- DNAStringSet(z) zz

Thank you very much for your help, a big hug!

*Excuse me, one more question, what would be the difference between scan() and readLines()

ADD REPLY

Login before adding your answer.

Traffic: 784 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6