Bioconductor Forum

how do I take into account "strand" when retrieving the sequences? Thanks again, Eric > peaks = RangedData(IRanges(start=c(SeqTest$start), end=c(SeqTest$end), names=c(SeqTest$peakID)), space...AM, Zhu, Julie <julie.zhu@umassmed.edu> wrote: Hi Eric, Please try the following code. You had the gene names in space, which needs to contain chromosome names. peaks = RangedData(IRanges(st…

Biostrings ChIPpeakAnno

updated 15.8 years ago • Julie Zhu

Dear all: I'm trying to do a mapping from probe sequence to genome using Biostrings. The array platform I used is arabidopsis tiling array 1.0R. However, I have problems to…

probe

updated 16.5 years ago • zhen tao

> e.coli <- readAAStringSet(filepath = "uniprot-proteome_UP000000625.fasta") > summary(e.coli) Length Class Mode 4306 AAStringSet S4 > some.peptide <- AAString("DYWRALQNRIREGHVEDVYAYRRRQ") > summary(some.peptide) Length Class Mode 25 AAString S4 > x <- matchPDict…

biostrings AAStringset

updated 9.1 years ago • tobias.kockmann

A. I already have a count table, and would like to use rpkm() in edgeR, but first I have to get a gene length vector. My question is how to calculate gene length from an "Ensembl.gtf" file by taking into account the following: 1...Gene 1 is much longer than Gene 2 if including both exon and intron. But Gene 1 only has 3 exons, and Gene 2 has 10 exons --> for the...transcripts, Gene2>Ge…
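One common convention for the gene length passed to `rpkm()`, sketched here as an assumption rather than something edgeR imposes, is the number of bases covered by the union of all annotated exons of the gene. Overlapping exons from different transcripts are then counted once and introns are excluded, so a gene with few long exons and a gene with many short exons are compared on exonic bases only:

$$
\ell_g \;=\; \Bigl|\, \bigcup_{e \in E_g} [\,s_e,\; t_e\,] \,\Bigr|
$$

Here $E_g$ is the set of exons annotated to gene $g$ in the GTF, $[s_e, t_e]$ are the 1-based inclusive coordinates of exon $e$, and $|\cdot|$ counts bases. For example, two overlapping exons $[1, 100]$ and $[51, 150]$ give $\ell_g = 150$, not $200$.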

RNASeq edgeR rpkm

updated 11.2 years ago • shirley zhang

Hi, I'm working on RNA-Seq analysis to get differentially expressed genes between two sample conditions. I'm following the new Tuxedo pipeline (HISAT, StringTie, Ballgown). My concern is about using...don't und…

r normalization

updated 5.4 years ago • lakshmi9c

Dear List, Can anyone advise me how to add a list of significant genes onto a gene ontology table so that I can see which of my differentially expressed genes belong to a given GO group? I would...this GO pathway. Having read the vignettes I have been able to generate most of this table but not the last column containing the Affy_Ids (or ideally gene symbols). I wou…

GO affy

updated 19.9 years ago • Quentin Anstee

samples are either RNA, or RNA with selective depletion of some forms of RNA. In short, the relative abundance in the second group of samples should always be equal to or smaller than that in the control, but never higher. The...difference in abundance might concern a substantial fraction of mRNAs (10-50%). Naturally, when the samples are normalised, since the total...transcript abundance in the…

qPCR limma

updated 14.8 years ago • January Weiner

Hi, when I type the following in edgeR: > # Process raw sequences from fastq file > x = processAmplicons("Index2.Plate_10.fastqsanger", barcodefile="Samples1.txt", + ...Index2.Plate_10.fastqsanger", barcodefile = "Samples1.txt", : Barcode sequence length is set to 5, there are barcode sequence not with specified length. …

processamplicons edger barcode

updated 9.7 years ago • lucia.caceres

Hello! I'm interested in plotting the expression values of my samples for a certain gene. I'm a little confused on how to do this. In my pipeline, I used kallisto to estimate transcript abundance and then used tximport...and the counts(dds,normalized=TRUE) function in DESeq2 is that TPM just corrects for transcript length, and counts(dds,normalized=TRUE) corrects for transcript length and librar…

DESeq2

updated 24 months ago • mp52226

am interested in clustering my single cell data to identify clusters and then figure out if certain genes differ in expression profiles and then try to link them to developmental stages. 1. So, to be able to compare gene to...gene expression, I suppose the expression scores should be corrected for gene length. Does SC3 do this somehow? I don't pass in...a gene length argument at any p…

sc3 scater single-cell rna-seq biclustering

updated 8.4 years ago • rmf

Following the vignette "Example using Negative Binomial in Microbiome Differential Abundance Testing" - my test results table, however, has the sequence rather than an OTU number in the first column. How would I

deseq2

updated 7.7 years ago • irene.yang

each), compare each protein's amino acid sequence in SeqData1 to each amino acid sequence from SeqData2, compute an alignment score, and if the score is >90% I concatenate...a list of the protein names that match and the sequence of the SeqData2 protein. Is there a more efficient solution to my currently-projected runtime...of 1 month? Maybe a solution involving a growing data.table? I'm n…

Alignment genomes

updated 12.1 years ago • Guest User

Hi, I have performed RNAseq analysis (filtering, normalization, and comparisons between groups of interest) using the `edgeR` package. Further, I am also interested in performing an unsupervised hierarchical clustering `heatmap` (maybe using `ComplexHeatmap` or `coolmap`). Here, unsupervised refers to employing a list of genes that is not identified through group comparisons (i.e. not informed by grouping labels)…

supervised edgeR variance RNASeq heatmaps

updated 3.1 years ago • mohammedtoufiq91

RNAseq experiments as a summarized experiment. Every row is one specific triplet on a certain gene. What I want to do is calculate the distance of that position to the next GGUC sequence. So: how far is the certain triplet...away from the next GGUC sequence? I have no idea how to start. I do know how to get the sequence of the certain gene but I thought there might be a smart shortcut
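The quantity asked for here can be pinned down before choosing any tool. Assuming the gene sequence $S = S_1 S_2 \dots S_L$ is indexed from 1 and the triplet starts at position $p$, one possible definition of the distance to the next downstream GGUC is:

$$
d(p) \;=\; \min\,\bigl\{\, q - p \;:\; q \ge p,\;\; S_q S_{q+1} S_{q+2} S_{q+3} = \text{GGUC} \,\bigr\}
$$

with $d(p)$ left undefined (NA) when no GGUC starts at or after $p$. Whether a motif starting at $p$ itself counts, and whether distance is measured to the start or the end of the motif, are convention choices the question does not fix.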

SummarizedExperiment Genetics

updated 2.8 years ago • Marcus

I'm using `tximport` to combine RSEM `.genes.results` output for downstream DEG analysis, and I get `abundance` and `counts` in the tximport object. So I'm wondering how `tximport` calculates `abundance`: is it similar to `FPKM`? Any suggestions?
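As far as I understand tximport's defaults (worth checking against its documentation), for RSEM input it does not compute `abundance` itself but reads RSEM's own `TPM` column, with `counts` taken from `expected_count`. TPM and FPKM are related by a per-sample rescaling:

$$
\mathrm{TPM}_i \;=\; \frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j}\times 10^{6}
$$

so within one sample the two order genes identically; the difference is that TPM values of every sample sum to $10^6$, which makes them more comparable across samples than FPKM.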

tximport abundance

updated 5.5 years ago • chendianyu

When I did RNA-Seq analysis, the GTF file I used was from NCBI. The output of cuffdiff replaced the Gene symbol (official gene symbol) with XLOC's such as: LOC110534079 LOC110534540 LOC110537830 LOC110485322 LOC110487655...LOC110500236 LOC110502506 Example : LOC110537830 (ID) = mknk1 (Gene symbol) Is the…

RNA-seq / XLOC cuffdiff

updated 5.7 years ago • mg.mahabad1365

I tried using the processAmplicons function from edgeR where the hairpin sequence is at start in the fastq file and the barcode towards the end. While there are 100% matches with barcodes, I'm getting...0% for the hairpins. However the hairpin sequences are present in the fastq file. As the updated version of this function now allows both structured and variable...I am not able to specify hairpin…

edgeR

updated 3.5 years ago • Claire.Prince

UniProt.ws, keys, columns, kt) res >UNIPROTKB >1 Q9UNQ0 >FEATURES >1 Alternative sequence (2); Chain (1); Disulfide bond (2); Domain (2); Frameshift (2); Glycosylation (1); Mutagenesis (11); Natural variant (18); Nucleotide binding...1); Sequence conflict (9); Site (2); Topological domain (7); Transmembrane (6) And I would like to have all the information for every…

UniProt.ws

updated 12.0 years ago • Guest User

Dear altruists, the following code is running properly when I'm trying to compare AXT (from UCSC) sequence pairs and keep only [A, T, C, G] uppercases and compute the matched sequence lengths. But somehow, this function is producing…

DNASeq CNEr

updated 4.1 years ago • Md Abrar

for bulk RNA-seq of the whole brain of drosophila (Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf). I did the annotation using Rsubread and got a file with gene symbol. However, there are some genes of one spelling...but the first letter is either uppercase or lowercase. They are with different gene ID (e.g. Crc and crc). However, when I search NCBI's homepage for "Crc", I am …

Drosophila Rsubread dm6

updated 3.7 years ago • Chise

Hi all, I'm now working on a method to plot a protein sequence after digestion with one (or more) restriction enzyme(s). The goal is to use specific cleavage points and cut the protein...sequence into snippets. After filtering out the too-long and too-short snippets, I would like to plot the remaining peptides and...plotRanges(subseqs.filt) AAcovered <- sum(as.data.frame(subseqs.f…

biostrings iranges sequence alignment protein

updated 9.0 years ago • Assa Yeroslaviz

Dear all, I have a list of peaks from ChIPseq experiments. Now I am trying to find which genes these peaks overlap (and extract the gene name). I'm sure this should be pretty easy, but I am just starting with bioconductor...quietly = TRUE) txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene ee <- exonsBy(txdb, "gene") # Load a subset of my peaks for the sake of the example finalP…

ChIPSeq chipseq

updated 12.5 years ago • Patrick Schorderet

in dataset is Ensembl_ID. You could use getBM function in biomaRt package to convert ensembl_ID to gene name or other IDs if needed. Best regards, Julie On 1/19/11 2:10 PM, "Pablo Echeverria" <pablo.echeverria at unige.ch> wrote: > ...that are described in your paper (those are already > working), but also I need to retrieve gene names associated to my peaks. > …

Annotation annotate convert ChIPpeakAnno

updated 15.0 years ago • Julie Zhu

Hi group, I am interested in retrieving about 2000 sequences with the specific chromosome number, start and end site. I was thinking of using the BSgenome package for this. >source...Hsapiens,full.df$chromosome,start=10000,end=10020) #but then when I use start=full.df$Start. It naturally throws an error saying 'start' must be a vector of integers Questions: How D…

BSgenome

updated 14.4 years ago • viritha kaza

I was following a RUVseq protocol for the RUVg method. After performing a first pass of edgeR differential analysis to identify the least differentially expressed genes, I looked at the top of my table and found that I had only 7 genes with FDR < 0.9 and all other genes have an FDR of >0.999. The concept of RUVg is to take the most undifferent…

ruvseq

updated 10.1 years ago • tonja.r

Hello, I used DESeq2 to see which ASVs were differentially abundant on 16S metabarcoding data. I now want to plot the relative abundance (in %) of those ASVs. However, I am unsure which data

DESeq2

updated 4.5 years ago • Marion

AnnotationDbi::select() on org.Ss.eg.db returns two unique matches for UNIPROT P59083, (PHP14_PIG, MAMDC4_PIG). I expect one. Crosschecking at uniprot.org: P59083 comes up as PHPT1 (PHP14_PIG), and...at https://www.ncbi.nlm.nih.gov/search/ the ENTREZID's shown below map identically. So the UNIPROT mapping appears wrong, but does not appear to come from Uniprot.org. A BLAST of P59083 Fasta …

MAMDC4 org.Ss.eg.db AnnotationDbi

updated 4 months ago • munhalla

of nucleic acid string sets): > extraemm_DNA_untrim1 A DNAStringSet instance of length 5 width seq …

biostrings decipher

updated 7.9 years ago • reubenmcgregor88

Hi, I was trying to map gene symbols to gene names using the org.Hs.eg.db package. I first convert the gene symbol to an entrez id, and then convert...that to a gene name (example code below). However, during this process I can't get the gene names for some of the genes: library(org.Hs.eg.db...First two genes are ok... symbols <- c(…

convert

updated 11.7 years ago • Tim Smith

> library("KEGGREST") > keggGet(c("hsa:10458"),"ntseq") A DNAStringSet instance of length 1 width seq names [1] 1659 ATGTCTCTGTCTCGCTCAGAGGA...CCCGCACCCTGGCTGGAAGATGA hsa:10458 K05627 ... However, how can I remove...the ellipsis and get the complete sequence and results? Thank you

keggrest biostrings

updated 8.0 years ago • xieduo

__Edit__: One of my bam files has size 0 on disk! I get the following error when I call summarizeOverlaps: Error in names(res) <- nms : 'names' attribute [15] must be the same length as the vector [2] Calls: summarizeOverlaps ... .dispatchBamFiles -> bplapply -> bplapply -> bplapply -> bplapply …

software error summarizeoverlaps biocParallel

updated 9.4 years ago • chriad

I am trying to use the PAPi package but I get the same error all the time: 'names' attribute [1] must be the same length as the vector [0] In addition: Warning messages: 1: In if (getpath != 0) { : the condition has length...> 1 and only the first element will be used 2: In if (getpath != 0) { : the condition has length > 1 and only the first element will…

PAPi metabolomics pathway analysis raphael

updated 7.5 years ago • maryke.wijma

1) Is RNA-Seq data even appropriate for "standard" cluster analysis due to its discrete nature? What normalization should be done beforehand? We tend to perform length and TMM normalization of our data. 2) If we perform...some sort of clustering of RNA-Seq data, and then obtain a gene list from a cluster (e.g. all genes in a cluster) and then want to perform gene set enrichment analysis on thi…

Normalization Clustering

updated 13.8 years ago • Julie Leonard

Hello, I performed an analysis of differential expression with the SAM method, obtaining the genes that are up and down. I used these scripts: > samfit = SAM(exprdc, group, resp.type="Two class unpaired", fdr.output=.01…

annotation software error microarray oligo annotationdbi

updated 10.5 years ago • santi.cabellos

pubmed EC:2.5.1.3 1 Camiener, G.W.; Brown, G.M.: The biosynthesis of thiamine. 2. Fractionation of enzyme system and identification of thiazole monophosphate and thiamine monophosphate as intermediates. J. Biol. Chem...A.P.; Sousa, M.C.: Crystal structure of Escherichia coli ArnA (PmrI) decarboxylase domain. A key enzyme for lipid A modification with 4-amino-4-deoxy-L-arabinose and polymyxin resi…

brendaDb

updated 3.5 years ago • Lluís Revilla Sancho

with the baseMean, log2 fold change, and pvalues, except that the rows are numbered rather than named by gene. I would assume that the genes are listed in the same order that they appeared in the countData, i.e. I could annotate...the results file with gene names by simply assigning it the countData row names. I hesitate to do this, however, in case some other ordering occurred...entirel…

deseq2

updated 8.5 years ago • kalaga

I would like to use my protein coding genes as the background sequences (i.e. ExBackgroundSeqs.fasta) and ribosomal proteins as my query sequences (e.g. ExSeqs.fasta...> cT_all = codonTable(x=all_proteins) # There are 10 ribosomal sequences > length(cT_ribosomal@len) [1] 10 # There are 500 background sequences > length(cT_all@len) …

software codon usage bias dna

updated 6.1 years ago • jol.espinoz

Hi there I have a question about the appropriate salmon-generated gene counts, which take into account gene length, for the purpose of running eqtl analysis. I would much appreciate your suggestions...on this. I used tximport to obtain gene-level counts/abundance estimates for our RNAseq data (estimated using default countsFromAbundance=no). The downstream...eqtl analysis. With …

salmon tximport gene length eqtl TMM

updated 7.7 years ago • VSM

file is on NCBI and my questions are below. 1. How can I trace back to the original source of raw sequence data on NCBI based on the information on metadata ? (Ex: If I want to know the exact raw sequence data of AsnicarF_2017...source on NCBI). 2. Where can I find more detailed information about processing steps from raw sequence data to usable data (e.g. relative abundance, gene counts, …

curatedMetagenomicData

updated 20 months ago • Edward

training set, I then used this model for prediction as follows. The RSS data set is contained in gene segments, typically one or two RSS per gene segment. The gene segments are often much larger than the RSS. These are 12RSS...so each RSS is of length 28. I took all the gene segments I could find that contained an RSS, and selected from them all contiguous sequences of...length 28. The current t…
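The windowing step described here is plain arithmetic. Assuming a gene segment of length $L$ and windows of length $k = 28$ taken at every offset, the number of contiguous windows scanned is:

$$
n_{\text{windows}} \;=\; L - k + 1, \qquad \text{e.g. a hypothetical } L = 1000,\; k = 28 \;\Rightarrow\; 973
$$

So the number of candidate windows grows linearly with segment length. When segments are much larger than the RSS, almost all of the scanned windows are non-RSS, which is worth keeping in mind when interpreting the prediction results.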

updated 13.0 years ago • Faheem Mitha

The mock community exists in 2 basic versions - one in which each taxon is (supposed to be) equally abundant, and one in which one of the species has inflated abundance (at several different levels).  We would like to use the...datasets to test our wet lab and bioinformatics pipeline. Our idea was to construct the OTU abundance table (this is, IMHO, the equivalent of the gene counts in …

edger deseq2

updated 10.3 years ago • Nick N

Apologies, I know this is not the most appropriate forum to post this as it's a community ecology question more than anything else, but I'm having trouble even...to see if there are any pairs of OTUs that tend NOT to be found in the same sample or whose relative abundance in a sample is negatively correlated (leading to the hypothesis that one OTU competitively excludes the other

phyloseq

updated 9.5 years ago • am39

Hello, I used getSeq to retrieve human genomic sequences from the BSgenome data package: library(BSgenome.Hsapiens.UCSC.hg18) # cpg is GTF file chr6 <- cpg[grep("6", cpg[[5]]),] # chrom...6 length(unique(chr6[,6])) # 10683 genomic positions # loop to match "CG" and print 100bp to file for (i in 1:10683) { if (getSeq(Hsapiens, "chr6...getSeq(Hsapiens, "chr6",…

updated 8.8 years ago • Tiandao Li

How do I get gene names for the probe IDs using pd.hta.2.0?

bioconductor

updated 7.3 years ago • sunandinisharma

I have a huge list of gene names, and I'd like to map corresponding gene IDs to each name. I've tried using this R library: `org.Hs.eg.db`, but it creates...more IDs than names, making it hard to map the results together, especially if the list is long. Example of an input file (7 gene names): RPS6KB2...read input file GeneCol = as.character(input$Gene.name) #a…

r bioinformatics org.Hs.eg.db genetics

updated 7.6 years ago • Bayram Sarilmaz

When using DESeq2, I encountered a hurdle. The gene I knocked down has a global influence on mRNA degradation. According to previous studies based on microarrays, about 15...of all genes were up-regulated after it was knocked down. So, I think the default parameters DESeq2 uses may not be suitable. I think if...DESeq2 could normalize counts based on the fact that most genes were not up-regulated or down-…
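The assumption that most genes are not differentially expressed enters through DESeq2's default median-of-ratios size factors. For sample $j$ out of $m$ samples, with $k_{ij}$ the count of gene $i$ in sample $j$:

$$
\hat{s}_j \;=\; \mathrm{median}_{i}\; \frac{k_{ij}}{\left(\prod_{v=1}^{m} k_{iv}\right)^{1/m}}
$$

where the median runs over genes with nonzero counts in every sample. If a large fraction of genes shift in the same direction, the median ratio shifts with them and part of the global change is normalized away. One option, assuming a set of genes (or spike-ins) known to be unaffected by the knockdown exists, is to compute size factors from that set only via the `controlGenes` argument of `estimateSizeFactors()`.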

deseq2

updated 6.0 years ago • zhechen

Dear list, I'm trying to do a rank products analysis on an expression set and I cannot get the gene names onto the results table. The error is 'gene.names should have the same length as the gene vector' but I'm giving topGene...column 0 of the data matrix I gave RP so how can they have different lengths? What exactly is the 'gene vector' mentioned in the error? My working below. Many thanks fo…

updated 15.7 years ago • John Coulthard

using MSGFplus, MSnBase and MSnID. I now have a final combined MSnExp. When running MSGFplus, I used UniProt data, which brings in accession numbers etc. Separately, I've used UniProt to download EntrezID, symbol and gene names

bioconductor proteomics msnset

updated 7.9 years ago • lolli.langan

Hi all, I have a dataset of 120 Affy arrays, 60 males and 60 females. The expression profiles of the 2 groups differ dramatically, i.e. if I run a standard RMA + limma, I have ~90% of the genes differentially expressed. Also, downregulated genes are twice as many as upregulated genes, although if I impose a cutoff of two-fold difference in expression, they are al…

Normalization affy limma

updated 16.7 years ago • Paolo Innocenti

Hi everyone, Does anyone know how to go from gene name to ENSEMBL ID? I'm using lumi to analyze my microarray data, however the names get changed from NuID to gene name when

Microarray GO lumi

updated 12.2 years ago • Kripa R

ended, contextual, and difficult to phrase. I am looking for clarity on what statistical test is most appropriate for my biological question. I suspect that a chi-square test is most suitable, but I am not certain. I would like...125 upstream genomic ranges of all tx's (n=10,897) in a genome) have reliable differences in A,T,C,G abundances compared to A,T,C,G abundances from the entire genome…
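A minimal version of the chi-square idea, assuming the upstream bases are pooled into counts $O_b$ for $b \in \{A, C, G, T\}$ over $n$ bases in total, and the genome-wide base frequencies $\pi_b$ are treated as fixed expected proportions, is a goodness-of-fit test:

$$
\chi^2 \;=\; \sum_{b \in \{A,C,G,T\}} \frac{\left(O_b - n\pi_b\right)^2}{n\pi_b}, \qquad \text{df} = 3
$$

Two caveats apply. With upstream ranges pooled from all ~10,897 transcripts, $n$ is large enough that even tiny composition differences will be significant, and neighboring bases are not independent observations. A per-base effect size such as $\log_2\!\left(O_b / n\pi_b\right)$ is therefore arguably more informative than the p-value alone.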

GenomicRanges

updated 23 months ago • mat149

Hi all, I would like to compare, statistically, the expression levels of orthologous genes across different species. The main issue that arises with this is that TPM may be influenced by gene length and, because…

normalization

updated 5.8 years ago • cossardg

that was what I want to know from > RNA-capture sequencing of 83 genes. The sequencing depth could be > normalized by RPKM, the traditional RNA-seq gene expression normalization...> method which normalize gene expression by dividing gene length and total > reads number. > You can go ahead with a differential expression analysis...83 candidate >> gen…
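The RPKM normalization described in the quoted text, dividing by gene length and by the total number of reads, can be written as:

$$
\mathrm{RPKM}_g \;=\; \frac{C_g}{\left(\ell_g / 10^{3}\right)\left(N / 10^{6}\right)} \;=\; \frac{10^{9}\, C_g}{\ell_g\, N}
$$

where $C_g$ is the number of reads mapped to gene $g$, $\ell_g$ its length in bases, and $N$ the total number of mapped reads in the sample. When the same gene is compared across samples, $\ell_g$ is a constant factor, which is one reason count-based tools such as DESeq2 take the raw $C_g$ as input rather than RPKM.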

Sequencing Normalization GO DESeq

updated 12.4 years ago • Michael Love

samples, and it may further cause the mis-quantification of transcripts, for example, in the same gene, but we obtained a seemingly correct but actually erroneous expression level for transcripts with high sequence similarity...and assigned non-existent transcripts in each sample to 0. Second, I loaded all the transcript abundance from lr-kallisto into R using tximport, and was trying to condu…

DESeq2

updated 5 months ago • 泓朴

The method is similar to standard eukaryotic transcriptomics approaches (map reads to genome/genes), but humann2 calculates gene family abundances as weighted sums of the alignments from each read, normalized by gene...length and alignment quality. There's a lot of posts stating that EdgeR and DESeq2 should (usually) only be used on raw counts...In the context of metagenomics, would it be appr…

edger deseq2

updated 8.1 years ago • nyoungb2

> described in: > *Stem cell transcriptome profiling via massive-scale mRNA sequencing* > *Nicole Cloonan et al* > *NATURE METHODS | VOL.5 NO.7 | JULY 2008 | 613* > http://www.nature.com/nmeth/journal/v5/n7/abs...> To calculate differential expression of SQRL tag data we analyzed the > normalized gene signals (tags per Refseq transcrip…

HapMap limma

updated 14.1 years ago • Gordon Smyth

path:map00620 4.1.1.32 path:map01100 4.1.1.32 path:map01110 4.1.1.32 df_db$enzyme<-gsub("ec:","",df_db$enzyme) db_final<-df_db %>% dlply( "pathway", `[[`, "enzyme" ) %>% c database_pathway <- db_final[!duplicated(names...df_unique) df1 <- filter(df_na, log2FoldChange != 0) geneList <- df1[,2] name…

R fgsea

updated 4.7 years ago • julie.hardy

rownames(V16V8filt_lfc_up)),] I want to subsequently perform GOseq analysis on this set (65 genes). Questions: 1. For GOseq, should my set of background assayed genes be all the genes assayed in __both__ contrasts? or something...V16G16[union(rownames(V16G16),rownames(V16V8)),] 2. As far as I know HTseq can’t generate a gene length file. I thus created my gene length fr…

deseq2 goseq htseqcounts rnaseq plant

updated 9.6 years ago • Ben Mansfeld

and the total edit distance between original barcode and error-corrected barcode. When using the 'Sequence-Levenshtein' distance, the barcode that is in the original read may have a different length than the error-corrected...barcode (since Sequence-Levenshtein allows for correction of insertions and deletions up to a certain edit distance). Is there an easy way...other than doing a Smith-Waterm…
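For reference, the plain Levenshtein distance between an original barcode $a = a_1 \dots a_m$ and a corrected barcode $b = b_1 \dots b_n$ is given by the standard dynamic-programming recurrence below. As I understand it, Sequence-Levenshtein changes how the end of the barcode is scored (to allow for the sequence that follows it in the read); that modification is not shown here.

$$
D_{i,0} = i,\qquad D_{0,j} = j,\qquad
D_{i,j} = \min\begin{cases}
D_{i-1,j} + 1 & \text{(deletion)}\\
D_{i,j-1} + 1 & \text{(insertion)}\\
D_{i-1,j-1} + [\,a_i \neq b_j\,] & \text{(match or substitution)}
\end{cases}
$$

The distance is $D_{m,n}$. Keeping a traceback pointer to whichever case achieved the minimum in each cell recovers the alignment itself, and hence where the insertions and deletions fall, from the same table used to compute the distance.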

single-cell sequencing barcodes

updated 6.7 years ago • Philip Lijnzaad

I have a kallisto-created abundance.h5 file but the filename (ES_1_high_28881_CGATGT_abundance.h5) includes information about the sample. When I try to import this file using tximport v1.10.0 I get an error. tximport thinks that the file is a .tsv file. However, if I create a symlink named abundance.h5, I can load that perfectly well. Is there a way to tell…

tximport kallisto

updated 7.0 years ago • Lucas Carey