Bioconductor Forum

samples and between samples. So far I have come up with the following plan: 1. Using CPM to compare gene/transcript expression within each sample sequenced with nanopore. For example, comparing if gene.X transcripts are...a good option since our nanopore runs do not have transcript length bias. Does this sound like a good strategy? 2. Using TPM to compare gene/transcript expression within each…

NanoporeRNASeq LongRead Normalization ShortRead IlluminaRNASeq

updated 3.0 years ago • Bernardo
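
For reference, CPM rescales each sample only by its library size, with no length correction. A minimal base-R sketch, assuming a made-up count matrix and a hypothetical helper name `counts_to_cpm` (neither comes from the post):

```r
# Toy count matrix (made-up values): rows are genes, columns are samples.
counts <- matrix(
  c(10, 20, 70,
    30, 30, 40),
  nrow = 3,
  dimnames = list(c("gene.X", "gene.Y", "gene.Z"), c("sample1", "sample2"))
)

# Hypothetical helper: counts per million.
# Each column is divided by its library size (column sum) and scaled by 1e6.
counts_to_cpm <- function(counts) {
  lib_size <- colSums(counts)
  sweep(counts, 2, lib_size, "/") * 1e6
}

cpm_values <- counts_to_cpm(counts)
```

Every column of `cpm_values` sums to one million, so values are comparable across samples with different sequencing depth; whether skipping the length correction is appropriate for a given nanopore protocol is the question the post itself raises.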

Hello, Sorry for that very basic question: I have raw RNA-seq count data. The row names are genes, the column names are short sequences (e.g., AAACCTGCAATCTACG.1). Aren't these supposed to be sample names? What...is the name of such a file format? (Couldn't find anything online, though don't know what I have to search for.) The ultimate goal is to have...a count matrix of genes vs. samples.…

RNASeq

updated 4.4 years ago • Anne

Hello, I aim to correlate taxa-specific relative/total abundances (derived from 16S Illumina MiSeq Sequencing) of soil microbes with metadata along an environmental soil gradient...sequencing depth (i.e. 18k reads). These ISS-normalized abundances I had used as input for subsequent correlation analysis...DESeq2 (function varianceStabilizingTransformation()) as a precursor for subsequent differ…

vst relativeabundance Normalization DESeq2

updated 2.6 years ago • dwas

What is the best way to convert uniprot accessions to entrez gene identifiers? What is the best way to reverse the map org.Hs.eg.db::org.Hs.egUNIPROT ? Is

org.hs.eg.db uniprot accessions entrez gene identifiers

updated 10.4 years ago • Aditya

Dear all, I have a problem with the PSICQUIC package while mapping gene name synonyms. Here are the command lines: # data fetching tbl.big <- PSICQUIC::interactions(psicquic, species="9606",provider...However, the last step now fails with the message: Error in getBM(filters = filter, values = uniprots, attributes = columns, : Invalid attribute(s): uni…

psicquic

updated 10.6 years ago • anais.baudot

I'm performing an analysis of MRE-Seq (methylation-sensitive restriction enzyme digest + sequencing) data and would like to perform a Pileup count of the first 3 nucleotides (left or 5') of each mapped fastq...read. The fastq reads all start with 'CGG', as this is the site where the enzyme(s) cut. By performing a pileup of the first 3 nucleotides, we can get a measure of methylation at this site.…

pileup rsamtools bam genomicalignments

updated 10.6 years ago • Sam Buckberry

or answering a simple question please : would you please advise, what is the simplest and most reliable way to extract the mRNA sequences of the canonical RefSeq genes in human or mouse genomes ? thanks a lot, -- bogdan

Biostrings

updated 5.1 years ago • Bogdan

Hi, This is my first time posting here. I have sequenced 16s rRNA from faecal samples. I am looking at differential abundance analysis (DAA) using DESeq2. I wanted to know...if there are significant DAA between older and younger aged people. I got over 80 OTU sequences as being significantly DAA (P<0.05). Can I use baseMean to select out OTU which are most prevalent/ abundant? I.e.…

DESeq2 diferentialabundance 16srRNA baseMean

updated 2.8 years ago • ms.roshanipatel

How can I convert entry names such as LYSC_HUMAN into HUGO gene symbols, such as LYZ? UNIPROTKB seems to have another format of identifiers. ```r head(keys...Q5TD94" "Q9HA92" "Q9UHA2" ``` These are [accessions][1], whereas I want to convert [entry names][2]. `keytypes(up)` does not list anything that looks like entry name as a possible key to use. Can any other Bioconductor package

Proteomics ProteomicsWorkflow UniProt.ws

updated 5.0 years ago • Dario Strbenac

I am using `rtracklayer` package to access rs ID and restriction enzyme sites in a genomic interval so that I can make genotyping decisions. library(rtracklayer) session <- browserSession...genome(session) <- "mm9" trackNames(session) ## list the track names # for Restr Enzymes use "cutters" query <- ucscTableQuery(session,"cutters" , …

rtracklayer

updated 5.9 years ago • prabin.dm

in treatment B). These samples were the same type of tissue and were prepared for RNA-Seq. Prior to sequencing, an external spike-in was added to each sample (the same amount per sample). The tissue mass was measured before processing...However, is it possible to use the spike-in information to scale the TMM factor by an absolute abundance factor, so that we are working on 'absolute' counts? What…

edgeR SpikeIn RNASeq

updated 4.9 years ago • robert.chen

Hey! I have a set of sequences that I got from differential ChIP-seq analysis. I would like to use MEME-ChIP for motif discovery on this set of sequences...as input it requires sequences of the same length, however, for this trimming I would like to take the 400bp on the center of each of the sequences...from my list of different length sequences (ranging from 450 to 1400 approx). Could anybody…

Biostrings

updated 5.2 years ago • ferbecneu
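
The trimming step described in this post (keep the central 400 bp of each sequence) comes down to computing a start position per sequence. A minimal base-R sketch on plain character strings, where the helper name `center_window` and the toy sequences are hypothetical:

```r
# Hypothetical helper: extract a fixed-width window centred on each sequence.
# `seqs` is a character vector; every sequence must be at least `width` long.
center_window <- function(seqs, width = 400L) {
  len <- nchar(seqs)
  stopifnot(all(len >= width))
  # 1-based start so that the window sits in the middle
  # (one extra base is left on the right when len - width is odd).
  start <- (len - width) %/% 2L + 1L
  substr(seqs, start, start + width - 1L)
}

# Toy example with a 4 bp window instead of 400 bp.
toy <- c("AAACCCGGGT", "ACGTACGT")
trimmed <- center_window(toy, width = 4L)
```

For a Biostrings `DNAStringSet`, the same start positions could presumably be passed to `subseq()` with `width = 400`; that call is not tested here.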

Homo sapiens human HOXA5 NM_019102.2 Homo sapiens hsa-miR-130a It looks like miRNAs naming convention is the same for BioMart and miRecords databases My problem is the apparently different genes naming convention...Sean Davis Cc: bioconductor@stat.math.ethz.ch Subject: [BioC] how to find the VALIDATED pair (miRNA, gene-3'UTR- sequence) Thank you very much. I believe I can use biomaRt …

miRNA Homo sapiens biomaRt

updated 16.6 years ago • mauede@alice.it

Hi, I am doing the following to get the tximport count matrix with gene names in the first column: txdf <- transcripts(EnsDb.Mmusculus.v79, return.type = "DataFrame") txdf$symbol <- mapIds...salmon", tx2gene=tx2gene, ignoreTxVersion=TRUE,dropInfReps=TRUE) However, when I do head(txi$abundance) …

tximport

updated 8.2 years ago • tanyabioinfo

Hello I am having trouble retrieving FASTA sequences for a some uniprot identifiers. It seems that in most cases this is due to the accession number now being a 'secondary...accession number'. Is there a way to retrieve sequences using these secondary accession numbers with uniprot.ws? Thanks -Brett

uniprot.ws

updated 10.8 years ago • bengelmann

Thank you very much. I believe I can use biomaRt functions to get the 3'UTR sequences through providing the chromosome name and start/end sequence coordinates. However I am not sure that the text file...VALIDATED or computationally PREDICTED ? At the time being I definitely need the (miRNA,gene-3'UTR-sequences) experimentally VALIDATED pairs. Please, correct me if I am …

miRNA Homo sapiens biomaRt

updated 16.6 years ago • mauede@alice.it

intron retention events. To test for functional enrichment I build a dataframe that contains gene_id, the length of all retained introns in a gene, the sum of baseMeans for those exons and whether or not at least one of those...as the bias data: genes <- retained_genes$sig names(genes) <- retained_genes$gene_id bias.data <- retained_genes$length …

goseq dexseq

updated 8.4 years ago • i.sudbery

and index creation are time- and memory-consuming steps. Best, Daniel ``` # Download uniprot trembl fasta sequences # to server with ~100GB memory # dl_link <- "ftp://ftp.uniprot.org//pub/databases/uniprot/current_release...human protein # and retrieve "recno" index fai_trembl[grepl("Q5HYB6_HUMAN", desc)] # read the sequence of this protein from file # using the precompu…

Biostrings

updated 6.3 years ago • daniel.magnus.bader

Hi All, I am trying to extract promoter sequences for a few Entrez IDs. The problem I am having is that there exist multiple transcripts for the same gene. So this gives...me multiple promoter sequences for the same gene. Can I filter out the redundant promoter sequences? Here is my code: ids.ok = c("67665" ,"13198" ,"110196","15368...coordinates of transcript #####>…

updated 11.4 years ago • deepti anand

Hi all, I am working on the evolutionary aspects of metabolic networks (enzyme-centric). This network will consist of nodes that are enzymes, and any two enzymes are linked if they share a metabolite

Network KEGGSOAP

updated 16.8 years ago • anupam sinha

Davis Cc: bioconductor at stat.math.ethz.ch Subject: [BioC] how to find the VALIDATED pair (miRNA, gene-3'UTR- sequence) Thank you very much. I believe I can use biomaRt functions to get the 3'UTR sequences through providing the...Cc: bioconductor at stat.math.ethz.ch Oggetto: Re: [BioC] how to find the validated pair (miRNA, gene-3'UTR- sequence) On Wed, Jun 24, 2009 at 11:45 AM, <mauede al…

miRNA Homo sapiens biomaRt

updated 16.6 years ago • michael watson IAH-C

https://support.bioconductor.org/p/44366/). Like the OP I'm seeing a large number of intronic sequences in the most differentially expressed genes. Using toptable to get the top 100 results (not correcting for multiple...testing) I get between 65 and 81 of the 100 being intronic sequences, depending on which pairwise comparison I'm doing. As an explanation of the experiment, I'm looking at the ef…

limma microarray normalization oligo differential gene expression

updated 10.3 years ago • ben_cossins

I have a question about tximport. I have used tximport to import transcript-level abundances from Salmon and summarize them to gene level. Now I want to choose some of the significant genes for confirmation by RT-PCR, but I am not sure...whether I should choose the gene or its transcripts, because each gene has a number of transcripts. Does tximport summarize all transcripts of one gene...to gene level or even one …

tximport

updated 7.2 years ago • lkianmehr

In metatranscriptomics, we have to do the regular normalizations found in RNA-Seq (i.e., gene sequence length and sample differences in total expression). However, there is an additional normalization required...to account for changes in species abundance. I would like to analyze differences in species-level transcript abundance, so I've done all these normalizations

deseq2

updated 8.4 years ago • jspmccain

Atlas Enables Mapping of Homeostatic Cellular Shifts in the Adult Human Breast][1], [Figure 3][2], *Nature Genetics*, which is neighbourhood group A always differentially abundant but group N never is? Both groups consist

miloR

updated 18 months ago • Dario Strbenac

I was discussing some preliminary data analysis and observations, and I came across a question about gene length. Does edgeR's trimmed mean of M values (TMM) account for gene length along with the sequencing depth and RNA composition...resources (links below): - List item [EdgeR trimmed mean of M values (TMM)][1] - accounts for sequencing depth, RNA composition, and gene length, - List it…

Normalization edgeR RNASeq

updated 24 months ago • Sabiha

Hello, I have a question about DESeq2 count normalization in the context of 3'-end RNA sequencing. I understand that the median of ratios normalization used by DESeq2 is not suitable for within-sample comparison...

DESeq2

updated 3.2 years ago • theophile

gene_name in the TxDb objects from the GenomicFeatures package. Is the decision of not including gene names something that could be brought up again? I think we can all agree that in the end the majority of end users would like...gene names associated with their analysis as these ids are what supply the link to biological knowledge for most people. This...main advantages of Bioconductor is…

Bioconductor GenomicFeatures TxDB

updated 5.2 years ago • k.vitting.seerup

mapped to respective genomes with STAR, and then obtained read counts with featureCounts, using the most recent GTF gene annotations for the 3 species. I then obtained the lists of orthologue genes from ENSEMBL, and overlapped...how to perform the next steps properly, and would welcome any advice. I realize that the gene lengths (sum of exon lengths) of orthologous genes can be different in dif…

deseq2 normalization

updated 7.7 years ago • akozlenkov

ask questions and learn from scratch. 1) We wonder if you could consider adding Zhang et al 2015 Nature Medicine study on RA (https://www.nature.com/articles/nm.3914) for inclusion in the next release? Metagenomic sequencing...per https://waldronlab.io/curatedMetagenomicData/index.html. Therefore, I did not see those most recent curated studies. Could you advise how could I access the most re…

curatedMetagenomicData

updated 2.9 years ago • yingliu3

I'm running DESeq2_1.6.3 in R_3.1.2 to compare expression levels between two conditions, with a dozen individuals in each condition. The basic code looks like this: complete_table <- merge(control_table_trimmed, exp_table_trimmed, by=0, all = TRUE) completeCondition <- data.frame(condition=factor(c(rep("control", length(control_files)), rep("experimental",…

deseq2 normalization fitType parametric

updated 9.7 years ago • stwestreich

I have a file with a DNA sequence of length 900 characters. I use the Biostrings package to read the sequence from the file into R: library(Biostrings...filepath) I get something like: A DNAStringSet instance of length 1 width seq names [1] 900 AACTGGTTACCTGCCGTGAGTAAATTAAAATT...GACGCAACGGTT…

r biostrings DNAsequence

updated 9.8 years ago • Agaz Hussain Wani

from UCSC. In quant.sf I have the RefSeq transcript IDs. I now want to use tximport to aggregate to gene level. However, "TxDb.Hsapiens.UCSC.hg19.knownGene" is of little use, as it contains neither RefSeq IDs nor gene symbols...tx2gene2_clean, reader = read_tsv) reading in files 1 Parsed with column specification: cols( Name = col_character(), Length = col_integer(), EffectiveLength …

salmon hg38 tximport tx2gene.csv

updated 8.9 years ago • seb.boegel

For my metagenomics dataset, I would like to retain only genes that are >0.1% in abundance, for plotting and for subsetting differentially abundant genes. I applied VST on the...raw counts. Is it OK to convert variance-stabilised counts to relative abundances? Or is it better to do this filtering by first transforming the raw counts matrix to relative abundances? Edit. Since...I trans…

deseq2

updated 9.3 years ago • adityabandla
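
One of the two options mentioned in the post, filtering on relative abundances computed from the raw counts, can be sketched in base R. The count matrix is made up, and the rule "above 0.1% in at least one sample" is an assumption, since the post does not say how the threshold is applied across samples:

```r
# Toy raw count matrix (made-up values): rows are genes, columns are samples.
counts <- matrix(
  c(6000, 3980, 15, 5,
    7000, 2990,  8, 2),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2"))
)

# Relative abundance in percent, per sample.
rel_abund <- sweep(counts, 2, colSums(counts), "/") * 100

# Assumed rule: keep a gene if it exceeds 0.1% in at least one sample.
keep <- rowSums(rel_abund > 0.1) > 0
filtered_counts <- counts[keep, , drop = FALSE]
```

The filter is applied to the raw counts here, so the retained rows can still be passed to a count-based method afterwards; this sketch does not address whether relative abundances of VST values are meaningful.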

Hi BioC List from {sunny}San Diego, CA! [Question]: * How do you map KEGG gene IDs to textual gene names, gene descriptions via BioC? For example, I am interested in knowing which genes are involved...pathway in Rattus norvegicus, so I did: > library(KEGG) > # map pathway id to pathway name > KEGGPATHID2NAME$"04020" [1] "Calcium signaling pat…

Rattus norvegicus

updated 18.1 years ago • Elliot Kleiman

Hi Everyone!! I was trying to convert the Plasmodium Uniprot IDs to Gene Symbols (or Gene IDs) but my `Converted` object is coming empty. What am I doing wrong? Is there an alternative

biomaRt

updated 3.5 years ago • rohitsatyam102

Hi, We have some arrays where most of the genes are turned on under certain conditions. This violates the assumption that most normalization methods make...I'm wondering if LOESS or quantile normalization can be used on a (small) subset of invariant genes and expanded to the whole array? If so, is there such a tool in BioC? Thank you very much! Yiwen He DCB/CIT/NIH …

updated 16.7 years ago • He, Yiwen NIH/CIT

Hi, I'm trying to perform a Differential Transcript Usage (DTU) analysis, in order to find genes that are alternatively spliced differently between my two conditions. The package RATs seems to allow this and I saw...that two methods were used to find DTU genes: > At the gene level, RATs compares the set of each gene’s isoform abundances between the two conditions to identify if...…

RATs RNA-Seq Isoforms DTU expression

updated 6.5 years ago • UserAnonyme

**Or "Everything You Always Wanted to Know About RNA-Seq (But Were Afraid to Ask) Part 2"** In RNA-Seq, it is common practice to compare the abundance of transcripts within the same sample after some form of intrasample normalizations (e.g., TPM) that take into account both transcript length and sequencing depth (although only the former is strictly necessary as long as no other samples are co…

RNA-Seq normalization PCR_efficiency bias intrasample_comparison

updated 2.7 years ago • FedeXander
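
The intrasample normalization this post refers to (TPM) divides counts by transcript length first and then rescales within each sample. A minimal base-R sketch, assuming made-up counts and lengths and a hypothetical helper name `counts_to_tpm`:

```r
# Toy data (made-up): two transcripts, two samples, lengths in base pairs.
counts <- matrix(
  c(100, 100,
     30, 120),
  nrow = 2,
  dimnames = list(c("txA", "txB"), c("s1", "s2"))
)
lengths_bp <- c(txA = 1000, txB = 2000)

# Hypothetical helper: transcripts per million.
counts_to_tpm <- function(counts, lengths_bp) {
  stopifnot(length(lengths_bp) == nrow(counts))
  # Reads per kilobase: the length vector is recycled down the rows.
  rate <- counts / (lengths_bp / 1000)
  # Rescale each sample so that its rates sum to one million.
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

tpm <- counts_to_tpm(counts, lengths_bp)
```

In sample `s1` both transcripts have the same count, but `txA` is half as long, so its TPM is twice that of `txB`. Effective lengths and other biases raised in the post (such as PCR efficiency) are not modelled in this sketch.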

Hi All, Why don't the numbers from summary(hgu95av2probe): 199084 and length(pbn): 201800 match?!! There are more sequences than probe names?! I am including a code snippet. Thanks, Hrishi data(hgu95av2probe...> summary(hgu95av2probe) sequence x y Probe.Set.Name Length:199084 Min. : 1.0 Min. : 1.0 Length:199084 Cla…

probe

updated 20.8 years ago • hrishikesh deshmukh

Hi, I have a problem when working with goseq. There is support for the mm10 genome but not for the gene ID geneSymbol. I am trying to get length information by following the goseq manual, but I still don't understand. So, could...you please show me in a little detail how to get the length information for the mm10 genome and the gene ID geneSymbol? > genes = as.integer(all.genes %in% F.genes) > names(genes)…

goseq

updated 12.2 years ago • Thanh Hoang

Hi, I have performed the abundance estimation using RSEM, which output the genes.results and isoforms.results files. I would like to import the isoforms...I followed the tximport pipeline for RSEM as follows, but when I checked the rownames, it gave me gene names instead of transcript names: rsem.files=list.files(".","*.isoforms.result") txi.rsem=tximport(rsem.files

tximport rsem

updated 8.3 years ago • deena

Hi, in my lab we have captured and sequenced L1 and ALU retrotransposons from many tissue samples from different donors/conditions. We're now running GOstats using the list of detected somatic insertions within RefSeq genes +/- 1Kb in…

GO gene ontology bias retrotransposition

updated 8.9 years ago • mujupas

hub, c("OrgDb","Homo sapiens")) AnnotationHub with 1 record # snapshotDate(): 2016-08-15 # names(): AH49582 # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ # $species: Homo sapiens # $rdataclass: OrgDb # $title: org.Hs.eg.db.sqlite...ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/pub/current_fasta # $sourcelastmodifieddate: NA # $source…

hub annotationhub

updated 5.0 years ago • wssdandan2009

annotation of a species I am working on, I need to use a large FASTA file (>150,000 protein sequences). I have split this FASTA file up into 20 individual FASTA files, and am trying to figure out how to write a loop over the initial...and then call the next FASTA file and run through the full list of files again while outputting relatable names? The idea would be that parameter 1 would concur with fasta…

MSGF+ Proteomics r

updated 6.3 years ago • laural710

Hi, I want to compare expression levels of genes within the same samples and across species (different gene lengths). So I needed a way to normalize both within sample (similar...> assays(dds)[["avgTxLength"]] <- length.mat dds <- DESeq(dds) Can I then compare genes within the same sample like I could do with Transcripts Per Million? E.g. high count of a gene will mean high exp…

RNA-seq normalization deseq2

updated 6.0 years ago • urjaswita

goseq up to date**? I've tried to make my own genome/object for use with goseq. I have a vector of gene names and lengths, but I cannot figure out how to incorporate GO Terms and then make a suitable 'object' that can be fed into...assigned properly, so a bit concerned. 1) Is `org.Sc.sgd.db` going to assign GO Terms to all of my genes - or do I need to use a different naming convention? 2) Ar…

clusterProfiler AnnotationDbi OrgDb goseq

updated 5.0 years ago • vanbelj

result looks great. There's just one thing I am having difficulties fine-tuning, which is the **length/dimension** of the **connectors** of gene labels. I've searched around and I found a couple posts on this but the functions...are not working. Could you please help me understand what I might be doing wrong? When I set the length of connectors (lengthConnectors = unit(1, 'npc')), the only thin…

Transcriptomics EnhancedVolcano

updated 2.8 years ago • Peter

geneB -6.33 -5.32 -5.6 -4.88 -5.39 geneC -6.15 -6.07 -5.6 -4.88 -5.9 geneD -6.57 -6.11 -6.36 -5.36 -5.96 geneD -6.74 -6.2 -5.49 -5.35 -5.95 geneE -6.75 -6.24 -5.73 -5.63 -6.02 Created as follows: geneA<-c(-6.19...geneB<-c(-6.33, -5.32, -5.6, -4.88, -5.39) geneC<-c(-6.15, -6.07, -5.6, -4.88, -…

updated 16.3 years ago • David

into the sub-compartment. Due to the nature of the sample prep, and there will inevitably be some contamination of total extract sequences in the sub-compartment...sequences. There are also some sRNA sequences that are natively present in the samples and are highly concentrated relative...to the total extract (which also contain them). The focus is *not* on these sequences. The intention is to no…

Sequencing edgeR

updated 12.4 years ago • Kenlee Nakasugi

paths <- list.dirs(path = "/SampleData/TestData/", recursive = FALSE) for (i in 1:length(paths)) assign(paste0("sce_",i), loadSCE(paths[i])) sce=0 for (i in 1:length(paths)) sce[i]<-print(noquote(paste0("sce_",i))) t_list <- list...<- mget(ls(pattern="sce_\d+")) for(i in seq_along(t_list)) { metadata(t_list[[i]])["name"] <- …

Scater LoomR SCopeloomR

updated 7.1 years ago • Abhishek Singh

Hello, In working with HTqPCR lately I noticed that the gene/feature names are not produced in the output of limmaCtData, even if they are present in the input. This isn't an issue with...puzzled by how I would annotate multiple comparisons, so it would definitely be helpful if the gene names appeared in the output. Unless I'm doing something wrong, but that doesn't se…

annotate ddCt HTqPCR

updated 11.9 years ago • Cornwell, Adam

Given a very simple DNAStringSet, built like this: afastafile <- DNAStringSet(c("GCAAATGGG", "CCCGGGTT", "AAAGGGTT", "TTTGGGCC")) names(afastafile) <- c("ABC1_1", "ABC2_1", "ABC3_1", "ABC1_2") I would get a DNAStringSetList where the list elements are grouped by a...

dnastringset dnastringsetlist seqnames

updated 7.2 years ago • s.ghignone

The param files are not the same for different enzyme libraries. How can I combine two replicates of two enzyme libraries into pairs? And how do I normalize between different...enzyme libraries and test the difference with diffHic?

diffhic

updated 8.6 years ago • 516356412

I’m experimenting with the ability of fry/mroast functions to include gene weights. I have two use cases: 1) Comparing the result of a previous experiment in mouse, with a current experiment...sure which is best: * Set gene.weights to be the observed logFCs in mouse. This will result in most genes having a weight set. * Only set weights for genes that were DE (FDR < 0.0…

limma mroast fry

updated 7.3 years ago • maltethodberg

I have saliva derived WGS data that I'm trying to remove *all* non-human contamination from. Two tools that I've found for this are [DeconSeq][1] and [DecontaMiner][2]. Both tools require known reference genomes for which you build a BWA database for alignment with BWA-SW. To begin with, I used the Human Oral Microbiome Database (FASTA); however, my PI's suggestion was to do a more exhaustive …

biostrings

updated 5.6 years ago • moldach

Hello everyone, I started using the Bioconductor package for R 3.6 (BiocManager). I want to use the package MSGFplus to identify proteins from mass spectrometer data (.mzXML), and to do so you can specify parameters for the search via msgfPar(). One parameter defines the enzyme which was used to digest the proteins. Here comes my question. Is it possible to specify more than one protein which w…

ms gf plus enzyme R

updated 6.5 years ago • ro4175ko-s

I would like to get the Arabidopsis thaliana protein functional relations extracted by standard sequence analysis techniques such as Phylogenetic profile, Rosetta Stone, Gene neighbor and Gene cluster. There is such information...database. In case there isn't, I don't know if there are Bioconductor packages that implement such sequence analysis techniques to get the most up-to-date information fo…

Arabidopsis thaliana

updated 15.7 years ago • Javier Pérez Florido

Hello, I would like to use EDASeq for gene length normalization for RNA-Seq analysis. However, I have a basic question: According to the manual, the "getGeneLengthAndGCContent...help us retrieve the gene length and GC content of our gene of interest but, how can I use it to download all the genes information instead of some...genes for human? Thanks

edaseq r rna-seq

updated 5.7 years ago • karla.ruizce30

but what if I want to make comparisons (with some measure of confidence) between isoform abundances *within conditions*. Could I just use the estimated abundance of each transcript from Salmon quantification corresponding...to a gene of interest? If so, what statistical test would be appropriate for the comparison? The null hypothesis, here, is that all...expressed (vs. annotated?) isoforms contr…

rnaseqDTU

updated 6.4 years ago • nicog