Question

How do I read a .seq file?


Luis (Brazil)

Hello, I am trying to read multiple .seq files produced by sequencing so that I can then find ORFs, translate them and align them. I don't know how to read a .seq file: I tried read.table(), but it imports the data as multiple rows, even though each file contains a single sequence. Thanks for your help.

Bioconductor • 3.0k views

Luis, 3.2 years ago

score 0 · Answer 1 · 2022-10-19


James W. MacDonald (United States)

It's probably a FASTA file, so you can use readDNAStringSet from Biostrings.
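For reference, readDNAStringSet() expects FASTA: every record starts with a ">" header line, followed by one or more lines of sequence. A minimal sketch, assuming Biostrings is installed; the temporary file and the two short sequences are made up for illustration:

```r
# Minimal sketch: readDNAStringSet() reads FASTA, where each record
# begins with a ">" header line followed by its sequence line(s).
library(Biostrings)

# Write a tiny two-record FASTA file to a temporary location (made-up data)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">sample_1",
             "GTCCCCGCCGAAATTATTACGACT",
             ">sample_2",
             "CTTTAAGAAGGAGATATACCATGG"),
           fasta_path)

seqs <- readDNAStringSet(fasta_path)  # one element per ">" record
seqs                                  # names come from the header lines
```

A file without any ">" line is not valid FASTA, which is what triggers the "fasta_index" error reported in the next reply.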

James W. MacDonald, 3.2 years ago


I already tried to import it as FASTA, but I get this error:

Error in .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, : reading FASTA file Luis1_001_E06.seq: ">" expected at beginning of line 1

The file contains only the bases, as follows:

GTCCCCGCCGAAATTATTACGACTCACTATAGGGGATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAA
CTTTAAGAAGGAGATATACCATGGGCAGCAGCCATCACCATCATCACCACAGCCAGGATCCAATCGTGCGCCGCAAGCTG

I could manually add a ">NAME" header line, but I have many files, and doing that by hand would take a lot of time.
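Adding the ">NAME" header can be automated in base R. The sketch below is one possible approach, not code from this thread: `seq_dir_to_fasta()` is a hypothetical helper that assumes all the bare .seq files sit in one directory, and it uses each file name (minus ".seq") as the header. The demo file names and contents are made up.

```r
# Hypothetical helper (not part of any package): turn every bare .seq file in
# `seq_dir` into one multi-record FASTA file. Each record's header is the file
# name without its extension, e.g. "Luis1_001_E06.seq" -> ">Luis1_001_E06".
seq_dir_to_fasta <- function(seq_dir, fasta_out) {
  seq_files <- list.files(seq_dir, pattern = "\\.seq$", full.names = TRUE)
  fasta_lines <- unlist(lapply(seq_files, function(f) {
    header <- paste0(">", tools::file_path_sans_ext(basename(f)))
    # Keep the file's own line breaks: FASTA allows a sequence to span lines
    c(header, readLines(f, warn = FALSE))
  }))
  writeLines(fasta_lines, fasta_out)
  invisible(fasta_out)
}

# Usage with two throw-away example files (hypothetical names and contents)
demo_dir <- file.path(tempdir(), "seq_to_fasta_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("GTCCCCGCCG", "AAATTATTAC"), file.path(demo_dir, "sample_A.seq"))
writeLines("CTTTAAGAAG", file.path(demo_dir, "sample_B.seq"))

fasta_out <- file.path(tempdir(), "all_sequences.fasta")
seq_dir_to_fasta(demo_dir, fasta_out)
readLines(fasta_out)
# The combined file could then be read with Biostrings:
# seqs <- Biostrings::readDNAStringSet(fasta_out)
```

Writing one combined FASTA keeps every read in a single file that readDNAStringSet() can load in one call; writing one FASTA per input file would work just as well.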

Luis, 3.2 years ago


So here's a contrived example:

> library(Biostrings)
> mapper <- c("A","C","T","G")
> makeSeq <- function(mapper, n) paste(mapper[sample(1:4, n, TRUE)], collapse = "")

## make a fake .seq file
> notFASTA <- sapply(5:20, function(x) makeSeq(mapper, x))
## and write it out
> cat(notFASTA, file = "notFASTA.seq", sep = "\n")
## now here's what you can do. First read it in
> z <- scan("notFASTA.seq", "c")
Read 16 items
## and convert to a DNAStringSet
> zz <- DNAStringSet(z)
> zz
DNAStringSet object of length 16:
     width seq
 [1]     5 ATCCT
 [2]     6 GTCACT
 [3]     7 CCTCCAA
 [4]     8 GGGAGCAT
 [5]     9 TGTGGATAA
 ...   ... ...
[12]    16 CGTTGACCTAAGAGAA
[13]    17 AACTGCGTAGTTCGGAG
[14]    18 CATCGCCGCGCTCGCCAT
[15]    19 GCAGGCAGTAGGGTGTGGA
[16]    20 GATGCTTAGCTAAGCTGAAC
>

James W. MacDonald, 3.2 years ago


Hello, thank you very much; I did not know the scan() function. Before your answer I had found a way to import the file:

F1 <- readLines("file.seq")      # one element per line of the file
F1 <- paste(F1, collapse = "")   # join the lines into a single string
F1 <- DNAStringSet(F1)           # needs library(Biostrings)
F1

The result was this:

[1] 1267 GTCCCCGCCGAAATTATTACGACTCACTATAGGGGAT...GCGCTTCTTTCTTGTGGTATAGGCATAATGATGATGC

With the code you shared, I got this result:

DNAStringSet object of length 16:
     width seq
 [1]    80 GTCCCCGCCGAAATTATTACGACTCACTATAGGGGAT...GGATAACAATTCCCCTCTAGAAATAATTTTGTTTAA
 [2]    80 CTTTAAGAAGGAGATATACCATGGGCAGCAGCCATCA...CACCACAGCCAGGATCCAATCGTGCGCCGCAAGCTG
 [3]    80 ACCGGTTATGTTGGCTTCGCTAACCTGCCGAACCAGT...CAAATCCGTGCGCAAAGGTTTCAATTTCAACGTCAT
 ...   ... ...
[16]    67 TGGCGAGGGCGGGAATTAAACAATAGTTTTGCGCTTCTTTCTTGTGGTATAGGCATAATGATGATGC

So with that approach each line is read as a separate sequence, I guess because the file contains line breaks. I then modified the code you shared like this:

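z <- scan("notFASTA.seq", "c")   # one element per line (these lines contain no spaces)
zz <- paste(z, collapse = "")    # join all lines into one string
zz <- DNAStringSet(zz)           # a single sequence
zz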
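The same join-the-lines idea can be applied to a whole folder of .seq files at once, without writing FASTA first. This is only a sketch: `read_seq_dir()` is a hypothetical helper that assumes the files are in one directory and each holds exactly one sequence. It also strips stray whitespace, since DNAStringSet() rejects spaces. Demo file names and contents are made up.

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): read every .seq file in
# `seq_dir` as ONE sequence by joining its lines, and name each sequence
# after its file (extension dropped).
read_seq_dir <- function(seq_dir) {
  seq_files <- list.files(seq_dir, pattern = "\\.seq$", full.names = TRUE)
  seqs <- vapply(seq_files, function(f) {
    joined <- paste(readLines(f, warn = FALSE), collapse = "")
    gsub("[[:space:]]", "", joined)  # drop stray spaces, tabs, carriage returns
  }, character(1))
  names(seqs) <- tools::file_path_sans_ext(basename(seq_files))
  DNAStringSet(seqs)
}

# Usage with two throw-away example files (hypothetical names and contents)
demo_dir <- file.path(tempdir(), "read_seq_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("GTCCCCGCCG", "AAATTATTAC"), file.path(demo_dir, "sample_A.seq"))
writeLines("CTTTAAGAAG", file.path(demo_dir, "sample_B.seq"))

all_seqs <- read_seq_dir(demo_dir)
all_seqs  # 2 sequences: sample_A (width 20) and sample_B (width 10)
```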

Thank you very much for your help, a big hug!

*Excuse me, one more question: what is the difference between scan() and readLines()?
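One visible difference, shown in a small base-R sketch with a made-up temporary file: readLines() returns each line verbatim as one element, whereas scan(what = "character") with its default separator splits on any whitespace, so spaces inside a line also start a new element. For these .seq files, which have no spaces within lines, both give one element per line. That is why the earlier result had 16 separate sequences.

```r
# Write a small text file whose second line contains a space (made-up data)
txt <- tempfile(fileext = ".seq")
writeLines(c("GTCCCCGCCG", "AAATT ATTAC"), txt)

by_line  <- readLines(txt)                               # one element per line, verbatim
by_token <- scan(txt, what = "character", quiet = TRUE)  # splits on any whitespace

by_line   # "GTCCCCGCCG"  "AAATT ATTAC"
by_token  # "GTCCCCGCCG"  "AAATT"  "ATTAC"
```

scan() is also the more general reader: through `what` and `sep` it can parse numeric or delimited fields. readLines() only ever returns raw lines of text.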

Luis, 3.2 years ago