Working with Arabidopsis Databases
2
1
Entering edit mode
oliwindram ▴ 10
@oliwindram-8536
Last seen 5.3 years ago
United Kingdom

Hi 

I have been trying to follow this tut (http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/#sequence-search) and apply it to working with some gene in Arabidopsis.

I try this:

library(MotifDb)
library(seqLogo)
library(motifStack)
library(Biostrings)
library(GenomicFeatures)
library(org.At.tair.db)
library(BSgenome.Athaliana.TAIR.TAIR9)
library(TxDb.Athaliana.BioMart.plantsmart25)

myb108 <- "AT3G06490"
chrommyb108.loc <- transcriptsBy(TxDb.Athaliana.BioMart.plantsmart25, by="gene") [myb108]
promoter.myb108 <- getPromoterSeq(chrommyb108.loc , Athaliana, upstream=1000, downstream=0)

Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width,  : 
  sequence 3 not found
In addition: Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

So I had a look at the seqinfo for a few objects namelyTxDb.Athaliana.BioMart.plantsmart25 and Athaliana and I noticed that the chromosome names differ along with the isCircular statement for the last chromosome:

> seqinfo(TxDb.Athaliana.BioMart.plantsmart25)
Seqinfo object with 7 sequences (1 circular) from an unspecified genome:
  seqnames seqlengths isCircular genome
  1          30427671      FALSE   <NA>
  2          19698289      FALSE   <NA>
  3          23459830      FALSE   <NA>
  4          18585056      FALSE   <NA>
  5          26975502      FALSE   <NA>
  Mt           366924       TRUE   <NA>
  Pt           154478      FALSE   <NA>
> seqinfo(Athaliana)
Seqinfo object with 7 sequences (2 circular) from TAIR9 genome:
  seqnames seqlengths isCircular genome
  Chr1       30427671      FALSE  TAIR9
  Chr2       19698289      FALSE  TAIR9
  Chr3       23459830      FALSE  TAIR9
  Chr4       18585056      FALSE  TAIR9
  Chr5       26975502      FALSE  TAIR9
  ChrM         366924       TRUE  TAIR9
  ChrC         154478       TRUE  TAIR9

I wondered if I am perhaps using the wrong TxDb? Are there any others I should be using?

Anyway I could get this to work?

 

bsgenome biomart getPromoterSeq txdb • 1.7k views
ADD COMMENT
1
Entering edit mode
@herve-pages-1542
Last seen 14 hours ago
Seattle, WA, United States

Hi,

You're using the right TxDb. It's just that the data in TxDb.Athaliana.BioMart.plantsmart25 is coming from the Ensembl Plants database and the Ensembl Plants folks use different chromosome names than the TAIR people. Normally switching from one chromosome naming convention to the other should be easy, it's just a matter of doing something like:

seqlevelsStyle(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevelsStyle(BSgenome.Athaliana.TAIR.TAIR9)

However this doesn't seem to work properly here so we'll need to look into it a little bit more. In the meantime you can do:

seqlevels(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevels(BSgenome.Athaliana.TAIR.TAIR9)

instead, which is usually not the right way of doing it but it happens to work fine here.

Cheers,

H.

ADD COMMENT
0
Entering edit mode
oliwindram ▴ 10
@oliwindram-8536
Last seen 5.3 years ago
United Kingdom

 

Hi Hervé,

Thanks for your comment,  this does help to a degree it changes the seqnames. However the isCircular statement for ChrC which I imagine is the plastid chromosome still differs between the two? I think this is what is causing the error I am getting now which is:

> promoter.myb108 <- getPromoterSeq(chrommyb108.loc , Athaliana, upstream=1000, downstream=0)
Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = c("sequence",  : 
  sequence ChrC has incompatible circularity flags:
  - in 'x': TRUE
  - in 'y': FALSE

I tried a few commands like these:

> isCircular(TxDb.Athaliana.BioMart.plantsmart25) <- isCircular(BSgenome.Athaliana.TAIR.TAIR9)
Error in .seqinfo.TxDbReplace(x, new2old = new2old, force = force, value) : 
  'new2old' must be specified when replacing the 'seqinfo' of a TxDb object

> seqinfo(TxDb.Athaliana.BioMart.plantsmart25) <- seqinfo(BSgenome.Athaliana.TAIR.TAIR9)
Error in .seqinfo.TxDbReplace(x, new2old = new2old, force = force, value) : 
  'new2old' must be specified when replacing the 'seqinfo' of a TxDb object

but they throw the above errors, also I am also unsure what structure the new2old argument should have?

Any more suggestions are welcome.

 

ADD COMMENT
0
Entering edit mode

Hi,

Unfortunately the isCircular() setter is currently broken on TxDb objects so I'm also going to look at this one. In the meantime, the workaround is to use it on any GRanges or GRangesList object you extract from the TxDb object (e.g. chrommyb108.loc) before you do anything with the object (like calling getPromoterSeq()).

Cheers,

H.

ADD REPLY
0
Entering edit mode

This seems to have solved the problem:

seqlevels(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevels(BSgenome.Athaliana.TAIR.TAIR9)
chrommyb108.loc <- transcriptsBy(TxDb.Athaliana.BioMart.plantsmart25, by="gene") [myb108]
isCircular(chrommyb108.loc) <- isCircular(BSgenome.Athaliana.TAIR.TAIR9)
promoter.myb108 <- getPromoterSeq(chrommyb108.loc, Athaliana, upstream=1000, downstream=0)

 

ADD REPLY
0
Entering edit mode

Hi oliwindram,

Just to let you know that in BioC 3.2 (released 2 days ago),

seqlevelsStyle(TxDb.Athaliana.BioMart.plantsmart25) <- seqlevelsStyle(BSgenome.Athaliana.TAIR.TAIR9)

works and does the right thing.

Modifying the sequence circularity flags of a TxDb object with e.g.

isCircular(TxDb.Athaliana.BioMart.plantsmart25) <- isCircular(BSgenome.Athaliana.TAIR.TAIR9)

is still not supported though but will be soon.

Cheers,

H.

ADD REPLY
0
Entering edit mode

I meant seqlevelsStyle instead of seqlevels in the above code. I edited the code.

H.

ADD REPLY

Login before adding your answer.

Traffic: 177 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6