How to get the pig genome and run a miRNA target scan


wang peter ★ 2.0k


Dear all,

I have some miRNA sequences like these:

    AAAUCUCUGCAGGCAAAUGUGA
    AACAUUCAACCUGUCGGUGAGU
    AACAUUCAACCUGUCGGUGAGUU
    AACAUUCAACCUGUCGGUGAGUUU
    AACCACACAACCUACUACCUCA

First, I want to download the pig genome (it should include the 3' UTRs). Second, I want to use an R package to run a miRNA target scan. If anyone has a pipeline or code for this, please help me.

Thank you in advance.

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253
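For the target-scan step, a very rough first pass is to look in each 3' UTR for sites complementary to the miRNA seed (nucleotides 2 to 8). The base R sketch below does only that. It is a simplified seed-match heuristic, not the method of TargetScan or of any Bioconductor package, and the function names and the toy UTR are made up for illustration.

```r
# Reverse-complement a DNA string using base R only.
revcomp <- function(dna) {
  comp <- chartr("ACGT", "TGCA", dna)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# DNA site that pairs with miRNA nucleotides 2-8 (the "seed").
seed_site <- function(mirna) {
  seed <- substr(toupper(mirna), 2, 8)
  revcomp(chartr("U", "T", seed))
}

# 1-based start positions of the seed site in a 3' UTR (DNA, 5'->3').
# Note: gregexpr() reports non-overlapping matches only.
find_seed_sites <- function(mirna, utr) {
  hits <- gregexpr(seed_site(mirna), toupper(utr), fixed = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

mirna <- "AAAUCUCUGCAGGCAAAUGUGA"                  # first miRNA from the post
utr   <- paste0("GGGG", seed_site(mirna), "CCCC")  # made-up toy UTR
seed_site(mirna)            # "AGAGATT"
find_seed_sites(mirna, utr) # 5
```

A seed match alone produces many false positives, so hits from a scan like this are best treated as candidates for a dedicated target-prediction tool rather than as final predictions.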

miRNA miRNA • 1.8k views

ADD COMMENT • link 12.1 years ago wang peter ★ 2.0k


wang peter ★ 2.0k


Dear Dr. José Afonso,

Thank you for your reply, but I just found this:

    seq = getSequence(id="BRCA1", type="hugo", seqType="peptide", mart = mart)

getSequence can be used to get one specific sequence. If I want to get all of the 3' UTRs from the pig genome, which function should I use?

Thank you very much,
shan

ADD COMMENT • link 12.1 years ago wang peter ★ 2.0k


2012/3/14 wang peter:
> if i want to get all of 3'UTR from pig genome
> which function should i use?

I think you have answered your own question. Please read the help page for getSequence. You can specify seqType='3utr' and give the IDs for your genes of interest after connecting to the correct mart.

Sean
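Sean's suggestion could look roughly like the sketch below. It assumes the biomaRt package is installed and that the Ensembl server is reachable. `chunk_ids` and `fetch_pig_3utr` are hypothetical helper names, and the chunk size of 500 is an arbitrary choice meant to keep each query small; nothing in the thread prescribes it.

```r
# Split a vector of IDs into chunks so that each query stays small.
chunk_ids <- function(ids, chunk_size = 500) {
  unname(split(ids, ceiling(seq_along(ids) / chunk_size)))
}

# Fetch 3' UTR sequences for all pig genes.
# Needs the biomaRt package and network access; nothing runs until it is called.
fetch_pig_3utr <- function(chunk_size = 500) {
  mart <- biomaRt::useMart("ensembl", dataset = "sscrofa_gene_ensembl")
  ids  <- biomaRt::getBM(attributes = "ensembl_gene_id",
                         mart = mart)$ensembl_gene_id
  parts <- lapply(chunk_ids(ids, chunk_size), function(chunk) {
    biomaRt::getSequence(id = chunk, type = "ensembl_gene_id",
                         seqType = "3utr", mart = mart)
  })
  do.call(rbind, parts)
}

# utr <- fetch_pig_3utr()   # not run here: queries the Ensembl server
```

The result should be a data frame with one column of sequences and one column of gene IDs, which is the shape shown later in this thread.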

ADD REPLY • link 12.1 years ago Sean Davis 21k


Hi guys,

I am trying to use the ComBat function in the sva package to correct batch effects in my gene expression dataset, but I keep getting error messages. I think my problem may be that the model.matrix is not set up correctly.

I compiled several breast cancer datasets together and ended up with a big ExpressionSet. The pheno data looks like this:

               Batch  Mol_type  ER   PR   Her2   TN   Histology
    arr01.cel  GSE01  Basal     ER-  PR-  Her2-  Yes  Breast_tumor_NOS
    arr02.cel  GSE02  LumA      ER+  NA   NA     No   Breast_tumor_NOS
    ...

There are many NAs. The only columns that do not have a single NA are Batch and Histology.

I tried to set up model.matrix in three ways, and all of them gave errors.

    > pheno <- pData(MyBreastExpressionSet)
    > edata <- exprs(MyBreastExpressionSet)

    # 1. use two variables of interest: Mol_type and Histology
    > mod01 <- model.matrix(~as.factor(pheno$Mol_type)+as.factor(pheno$Histology))
    > batch <- pheno$Batch
    > combat_edata <- ComBat(dat=edata,batch=batch,mod=mod01,numCovs=NULL,par.prior=TRUE)
    Found 21 batches
    Found 48 categorical covariate(s)
    Error in tmp[, i] <- vec == levels(vec)[i + start - 1] :
      subscript out of bounds
    In addition: Warning message:
    In cbind(mod, batch) :
      number of rows of result is not a multiple of vector length (arg 2)

    # 2. use one variable of interest: Mol_type
    > mod02 <- model.matrix(~as.factor(pheno$Mol_type))
    > combat_edata <- ComBat(dat=edata,batch=batch,mod=mod02,numCovs=NULL,par.prior=TRUE)
    Found 21 batches
    Found 8 categorical covariate(s)
    Standardizing Data across genes
    Error in solve(t(design) %*% design) %*% t(design) %*% t(as.matrix(dat)) :
      non-conformable arguments
    In addition: Warning message:
    In cbind(mod, batch) :
      number of rows of result is not a multiple of vector length (arg 2)

    # 3. use one variable of interest: Histology
    > mod03 <- model.matrix(~as.factor(pheno$Histology))
    > combat_edata <- ComBat(dat=edata,batch=batch,mod=mod03,numCovs=NULL,par.prior=TRUE)
    Found 47 batches
    Found 40 categorical covariate(s)
    Standardizing Data across genes
    Error in solve.default(t(design) %*% design) :
      Lapack routine dgesv: system is exactly singular

Only during the last try did the program find the correct number of batches (47). I think it may be because of the NAs in the Mol_type column, but I am not sure. What did I do wrong? Any suggestion is highly appreciated.

Thanks,
Ying

    > sessionInfo()
    R version 2.14.2 (2012-02-29)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] splines stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] pamr_1.54 survival_2.36-12 cluster_1.14.2 sva_3.0.3
    [5] mgcv_1.7-13 corpcor_1.6.2 Biobase_2.14.0

    loaded via a namespace (and not attached):
    [1] grid_2.14.2 lattice_0.20-6 Matrix_1.0-4 nlme_3.1-103

ADD REPLY • link 12.1 years ago ying chen ▴ 340


Hi Ying,

The problem is that model.matrix uses only the complete cases when building the model matrix, so any sample that has an NA for even one of the model variables is removed from that matrix. The batch variable, on the other hand, has no NA values, so none of its samples are removed. The model matrix and the batch vector therefore end up covering different numbers of samples.

One way to fix this problem is to choose a subset of variables for which you have a set of samples with no NAs and run ComBat or SVA on only those samples.

An alternative is to remove the batch effect using a linear model (see the relevant section of the sva vignette for an example of how to do this), then analyze the residuals after subtracting out the batch effect. You should exercise caution when using this approach, though, since you will be analyzing a set of data on which you have already done one regression. This reduces variability in the expression values and can lead to overestimated significance.

Best,
Jeff

On Thu, Mar 15, 2012 at 3:24 AM, ying chen wrote:
> I think it may because of the NAs in Mol_type column. But I am not sure.
> What did I do wrong?
> Any suggestion is highly appreciated.

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
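The row mismatch Jeff describes can be reproduced in base R. The sketch below uses a made-up four-sample phenotype table (not Ying's data) to show model.matrix silently dropping the NA row, and then one possible way to restrict everything to the complete cases first. The object names ending in `_cc` are hypothetical.

```r
# Toy phenotype table (made-up values) with one missing Mol_type.
pheno <- data.frame(
  Batch    = c("GSE01", "GSE01", "GSE02", "GSE02"),
  Mol_type = c("Basal", "LumA", NA, "Basal"),
  stringsAsFactors = TRUE
)

mod <- model.matrix(~ Mol_type, data = pheno)
nrow(mod)            # 3: the sample with the NA was dropped
length(pheno$Batch)  # 4: batch still covers every sample

# Keep only samples with no NA in the variables that enter the analysis.
keep     <- complete.cases(pheno[, c("Batch", "Mol_type")])
pheno_cc <- droplevels(pheno[keep, ])
mod_cc   <- model.matrix(~ Mol_type, data = pheno_cc)
batch_cc <- pheno_cc$Batch
nrow(mod_cc) == length(batch_cc)  # TRUE
# The expression matrix would need the same subset, e.g. edata[, keep].
```

Calling `droplevels()` after subsetting matters because factor levels that no longer occur would otherwise still be counted.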

ADD REPLY • link 12.1 years ago Jeff Leek ▴ 650


Hi Jeff,

Thanks a lot for the help!

I realized the NA problem, so I made a simple pheno table that has no NAs.

                Batch  Histology
    array1.cel  GSE01  IDC
    array2.cel  GSE03  Mixed
    ........
    ........

I thought this would be simple, but I still got errors:

    > combat_edata <- ComBat(dat=edata,batch=batch,mod=mod,numCovs=NULL,par.prior=TRUE)
    Found 47 batches
    Found 37 categorical covariate(s)
    Standardizing Data across genes
    Error in solve.default(t(design) %*% design) :
      Lapack routine dgesv: system is exactly singular

I compared my pheno table to the pheno file in your bladderbatch dataset. I think they are very similar, except that the bladderbatch pheno has one extra column for sample name/count. I really do not know what this error message means. In my pheno file, $Batch has 47 levels and $Histology has 38 levels. I am not sure whether that has anything to do with the error.

Thanks,
Ying

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] splines stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] pamr_1.54 survival_2.36-12 cluster_1.14.2 BiocInstaller_1.2.1
    [5] sva_3.0.2 mgcv_1.7-13 corpcor_1.6.2 Biobase_2.14.0

From: Jeff Leek
Date: Thu, 15 Mar 2012 08:05:19 -0400
Subject: Re: [BioC] how to set model.matrix for SVA/ComBat

> One way to fix this problem is to choose a subset of variables for which you have a set of samples with no NAs and run ComBat or SVA on only those samples.

ADD REPLY • link 12.1 years ago ying chen ▴ 340


Ying,

I'm not sure how many samples you have, but with 47 batches and 37 categorical covariates, one problem could be that you have more variables than samples. This would lead to the error you are reporting. You may need to choose a smaller number of variables to include in your analysis.

An alternative would be to include only the variable you will be testing and run sva instead of ComBat. SVA is designed to identify surrogates for the most important covariates and adjust for them.

Best,
Jeff

On Thu, Mar 15, 2012 at 10:36 AM, ying chen wrote:
> I really do not know what this error message means. For my pheno file, $Batch has 47 levels and $Histology has 38 levels. I am not sure if it has anything to do with the error.
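One way to check for the situation Jeff describes before calling ComBat is to compare the rank of a design matrix with its number of columns. The sketch below uses made-up data and a design layout assumed purely for illustration (batch indicator columns next to the covariate columns); it is not ComBat's internal code. A design with more columns than samples fails this check, and so does one in which batch and covariate are confounded, which is the case shown here.

```r
# Toy data (made up): each histology occurs in exactly one batch,
# so batch and histology carry the same information.
pheno <- data.frame(
  Batch     = factor(c("GSE01", "GSE01", "GSE02", "GSE02")),
  Histology = factor(c("IDC", "IDC", "Mixed", "Mixed"))
)

# One indicator column per batch, plus the covariate columns
# (the covariate intercept is dropped because the batch columns span it).
design <- cbind(
  model.matrix(~ 0 + Batch, data = pheno),
  model.matrix(~ Histology, data = pheno)[, -1, drop = FALSE]
)

# A design is usable for least squares only if its columns are independent.
is_full_rank <- function(x) qr(x)$rank == ncol(x)

ncol(design)          # 3
qr(design)$rank       # 2: one column is a combination of the others
is_full_rank(design)  # FALSE

# A cross-tabulation shows which levels never occur outside a single batch.
table(pheno$Batch, pheno$Histology)
```

With 47 batch levels and 38 histology levels, a cross-tabulation like the last line is a quick way to see whether some histologies appear in only one batch.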

ADD REPLY • link 12.1 years ago Jeff Leek ▴ 650


Dear Sean,

The 3' UTR data can be retrieved, but some sequences are missing, and I do not know why.

    dataset1 <- useDataset(dataset="sscrofa_gene_ensembl", mart=mart)
    attributes <- listAttributes(dataset1)
    seq = getSequence(chromosome=1, start=1, end=50000, type="ensembl_gene_id", seqType="3utr", mart = dataset1)

         "3utr"  "ensembl_gene_id"
    "1"  "Sequence unavailable"  "ENSSSCG00000004013"
    "2"  "Sequence unavailable"  "ENSSSCG00000004017"
    "3"  "CTTGTTCCACGTACCAAAGAGTGCTTGCTTCTCACCTTTTCAGAAACACACGCCATTATATTATAGAGTTGGAAACATCACA"  "ENSSSCG00000004009"
    "4"  "CCTGAGCCGGGGACTGTGACAGGCTCCTCCTCCTGCCCGTGGCTGGCTGTACCTACGCCCCCTCCTGCTGGTGGCCTTCCTGCTCCCTTCAAGGGTCCCCCTTATTGCGAAAGGAGAAATGAATGGCATCCGGGACTTCTGCACAGAATTTGGTAAAGTTAAAATAAAGAAAAAAAAACAA"  "ENSSSCG00000004008"
    "5"  "GGT"  "ENSSSCG00000004019"
    "6"  "Sequence unavailable"  "ENSSSCG00000004014"
    "7"  "Sequence unavailable"  "ENSSSCG00000004018"
    "8"  "Sequence unavailable"  "ENSSSCG00000004012"
    "9"  "Sequence unavailable"  "ENSSSCG00000004015"
    "10" "Sequence unavailable"  "ENSSSCG00000004007"

--
shan gao

ADD REPLY • link 12.1 years ago wang peter ★ 2.0k


Not all transcripts have defined UTRs; this is normal.

Regards,
A.

On Thu, Mar 15, 2012 at 12:59, wang peter wrote:
> The 3' UTR data can be retrieved
> but some are missing, i donot know why
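Since entries without an annotated UTR come back as the literal string "Sequence unavailable", they can be filtered out before any target scan. The base R sketch below rebuilds three rows of the output posted above by hand (rows 1, 3 and 5). The 7-nucleotide minimum length is an assumption tied to a 7-mer seed site, not something stated in the thread.

```r
# Three rows copied from the posted output; column names as in the post.
utr <- data.frame(
  `3utr` = c(
    "Sequence unavailable",
    "CTTGTTCCACGTACCAAAGAGTGCTTGCTTCTCACCTTTTCAGAAACACACGCCATTATATTATAGAGTTGGAAACATCACA",
    "GGT"
  ),
  ensembl_gene_id = c("ENSSSCG00000004013", "ENSSSCG00000004009",
                      "ENSSSCG00000004019"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Drop genes with no annotated 3' UTR.
has_utr  <- utr[["3utr"]] != "Sequence unavailable"
utr_only <- utr[has_utr, ]

# Optionally drop UTRs too short to contain a 7-nt seed site
# (the threshold is an assumption for illustration).
min_len    <- 7
utr_usable <- utr_only[nchar(utr_only[["3utr"]]) >= min_len, ]
utr_usable$ensembl_gene_id  # "ENSSSCG00000004009"
```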

ADD REPLY • link 12.1 years ago José Afonso Guerra-Assunção ▴ 10
