The miRvestigator framework takes a list of co-expressed genes as input and returns the miRNA most likely to regulate them. It does this by using Weeder to search for an over-represented sequence motif in the 3' untranslated regions (UTRs) of the genes, and then comparing that motif to the miRNA seed sequences in miRBase using our custom-built miRvestigator hidden Markov model (HMM).
Post-transcriptional regulation by a miRNA is mediated through binding to complementary sequences in the 3' untranslated regions (UTRs) of transcripts. This generates a signature of down-regulation for these transcripts that is specific to the biological context, which is called a co-expression signature. Co-expression signatures observed in transcriptome studies are thus assumed to be of biological origin, whereby the expression of the genes is regulated by shared factors (transcription factors, miRNAs, genetic or environmental factors). There are many methods available to identify co-expression signatures, and the miRvestigator framework is not biased toward any particular method.
The input for miRvestigator is a gene list (e.g. 79073, 900, 29957, 11167, 4154, 84270, 84061, 4001, 26503, 4086, 51585, 4734, 5873), so quantitative expression data are not required as input. However, it is assumed that some other information was used to identify that set of genes, such as gene co-expression, perturbation of a specific miRNA, etc.
miRvestigator currently supports:
For those who study the effects of viral miRNAs on host gene expression, we have added the ability to include viral miRNAs. When the Yes radio button is selected, the viral miRNAs are searched along with the host species miRNAs. Note that a viral miRNA whose seed sequence perfectly overlaps that of a host miRNA will be concatenated with it, as we cannot separate these influences. (NOTE: All viral miRNAs are included, so you will have to filter them for those that you are interested in. Please consult miRBase for the species codes.)
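The concatenation of host and viral miRNAs that share a seed can be pictured with a minimal sketch. It assumes the seed is bases 2-8 of the mature miRNA and that merged names are joined with an underscore; both choices, the function name `merge_by_seed`, and the miRNA names and sequences below are hypothetical and are not taken from miRvestigator or miRBase.

```python
from collections import defaultdict

def merge_by_seed(mirnas, start=2, end=8):
    """Group miRNA names whose seed (1-based positions start..end) is identical.

    mirnas: dict mapping miRNA name -> mature sequence (5' to 3').
    Returns a dict mapping seed -> concatenated names of all miRNAs sharing it.
    """
    groups = defaultdict(list)
    for name, seq in mirnas.items():
        seed = seq[start - 1:end]  # convert 1-based inclusive positions to a slice
        groups[seed].append(name)
    # Joining with "_" is an assumption made only for this illustration.
    return {seed: "_".join(sorted(names)) for seed, names in groups.items()}

# Toy input: names and sequences are made up for the example.
toy = {
    "host-miR-A": "UUAAUGCUAAUCGUGAUAGGGG",
    "viral-miR-X": "UUAAUGCUUAGCCUGUGUCCGA",
    "host-miR-B": "UGGAAUGUAAAGAAGUAUGUAU",
}
merged = merge_by_seed(toy)
print(merged)
```

In this toy input the first two entries share the same 7-base seed and therefore end up under a single combined label, which mirrors why their influences cannot be separated.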
Currently the miRvestigator framework accepts Entrez gene IDs, Ensembl gene IDs, RefSeq transcript IDs and official gene symbols as input. To convert any other type of identifier to Entrez gene IDs, please use the DAVID bioinformatics resource gene ID conversion tool. The DAVID gene ID conversion tool has a helpful help page; please visit it if you have any questions.
You can enter the gene IDs separated by commas, spaces, tabs or newlines.
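As an illustration of the accepted separators, the following minimal sketch splits free-form input on commas, spaces, tabs and newlines. The function name `parse_gene_ids` is hypothetical, and dropping duplicate IDs is an assumption of this sketch rather than documented miRvestigator behavior.

```python
import re

def parse_gene_ids(text):
    """Split input on commas and any whitespace, keeping first occurrences in order."""
    tokens = re.split(r"[,\s]+", text.strip())
    seen = set()
    ids = []
    for tok in tokens:
        # Skip empty tokens (e.g. from a leading comma) and repeated IDs (assumption).
        if tok and tok not in seen:
            seen.add(tok)
            ids.append(tok)
    return ids

raw = "79073, 900 29957\t11167\n4154,84270"
print(parse_gene_ids(raw))
```

Mixed separators in the same input are handled by the single regular expression, so a list pasted from a spreadsheet column or a comma-separated line is parsed the same way.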
Load the data by clicking the button labeled with the corresponding miRNA, which will load the data and set up any parameters. Then click the submit button at the bottom of the page.
The H. sapiens (human) sample data are co-expression signatures, reduced to lists of Entrez gene IDs, from studies in which a miRNA was experimentally perturbed and the resulting effect was measured by transcriptome profiling (e.g. gene expression microarray). To provide a working sample dataset for other species, for which experimentally perturbed datasets were not yet available, we used TargetScan miRNA target predictions for miR-1.
Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1):15-20 (2005).
Hendrickson, D.G., Hogan, D.J., Herschlag, D., Ferrell, J.E. & Brown, P.O. Systematic identification of mRNAs recruited to argonaute 2 by specific microRNAs and corresponding changes in transcript abundance. PLoS ONE 3, e2126 (2008).
Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91-105 (2007).
Linsley, P.S. et al. Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol. Cell. Biol 27, 2240-2252 (2007).
Malzkorn, B. et al. Identification and functional characterization of microRNAs involved in the malignant progression of gliomas. Brain Pathol 20, 539-550 (2010).
Krützfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M. Silencing of microRNAs in vivo with 'antagomirs'. Nature. 2005 Dec 1;438(7068):685-9.
Laganà A, Forte S, Russo F, Giugno R, Pulvirenti A, Ferro A. Prediction of human targets for viral-encoded microRNAs by thermodynamics and empirical constraints. J RNAi Gene Silencing. 2010 May 24;6(1):379-85.
Yang R, Dai Z, Chen S, Chen L. MicroRNA-mediated gene regulation plays a minor role in the transcriptomic plasticity of cold-acclimated zebrafish brain tissue. BMC Genomics. 2011 Dec 14;12:605.
Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ, Schier AF. Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science. 2006 Apr 7;312(5770):75-9.
Weeder is an enumerative algorithm used to identify over-represented sequence motifs in the miRvestigator framework.
Weeder identifies both 6 base-pair (bp) and 8bp motifs in each run. This option sets which motif size is subsequently used in the miRvestigator HMM. Picking only one of the motif sizes will make the run faster. By default we have chosen to use 8bp motifs. (Note: It is possible that the 6bp motif and the 8bp motif will be different.)
Weeder uses the background model to determine whether or not an oligo is enriched above background. Each species has its own background model, based upon the sequence composition of the specified region in that species. The Weeder documentation states that the "Default Weeder Model" should be sufficient for 3' UTRs even though it was built from sequences upstream of the start codon, and this was consistent with our findings. In certain cases, however, a model built upon the annotated 3' UTR sequences from UCSC was helpful. Therefore the "Default Weeder Model" is chosen by default, although it may be worth trying the "3' UTR Specific Model" as well. (Note: It is likely that the 3' UTR background model would improve given more complete 3' UTR annotations.)
As set up in the miRvestigator framework, Weeder identifies motifs while allowing at most 2bp of mismatch. The canonical view in the miRNA field is that 7mer and 8mer matches to the seed sequence are the most efficacious. If the target site quality threshold is set to 100% identity, then only perfect 8mer or 6mer matches (depending on the motif size) are returned; if it is set to 95% identity, this typically retrieves matches with 1bp discrepancy to the motif. Therefore, to be consistent with the canonical view of miRNA activity, we set the target site quality threshold to 95% identity by default. (Note: When using 6bp motifs the target site quality threshold ought to be set to 100% identity, and the efficacy of the 6bp predicted binding sites would need to be extensively tested experimentally.)
The miRvestigator hidden Markov model (HMM) is a method designed to take an over-represented sequence motif (in this case from Weeder) and compare it to all the miRNA seed sequences from miRBase using the Viterbi algorithm. The over-represented sequence motif is turned into a profile HMM and each seed sequence in miRBase is aligned and a probability computed using the Viterbi algorithm. Then a Complementarity p-value is calculated for each miRNA by comparing it to an exhaustive distribution of Viterbi probabilities.
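The idea of scoring a seed against a motif and then ranking that score within an exhaustive distribution can be sketched as follows. This is a deliberately simplified toy, not the miRvestigator HMM itself: it assumes one match state per motif column, a leading and a trailing non-match state that emit each base with probability 0.25, and unweighted transitions. The function names, the toy motif and the 0.85 consensus frequency are all hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}

def reverse_complement(seq):
    """Reverse complement of an RNA or DNA string, returned in the DNA alphabet."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))

def make_pssm(consensus, hit=0.85):
    """Toy motif: the consensus base gets `hit`, the other three share the rest."""
    return [{b: hit if b == c else (1 - hit) / 3 for b in BASES} for c in consensus]

def viterbi_score(pssm, seq):
    """Probability of the best state path for seq through the toy profile HMM."""
    k = len(pssm)
    v_nm1 = 1.0            # start / leading non-match state
    v_match = [0.0] * k    # best path ending in each motif column
    v_nm2 = 0.0            # trailing non-match state
    for base in seq:
        new_match = []
        for j in range(k):
            # A column is entered from the leading state or from the previous column.
            prev = v_nm1 if j == 0 else max(v_nm1, v_match[j - 1])
            new_match.append(prev * pssm[j][base])
        new_nm2 = max(v_nm2, max(v_match)) * 0.25
        v_nm1 = v_nm1 * 0.25
        v_match, v_nm2 = new_match, new_nm2
    return max(v_nm1, v_nm2, max(v_match))

def complementarity_p_value(pssm, seed):
    """Fraction of all sequences of the seed's length scoring at least as well."""
    target = reverse_complement(seed)  # site a perfectly complementary seed would bind
    observed = viterbi_score(pssm, target)
    scores = [viterbi_score(pssm, "".join(s)) for s in product(BASES, repeat=len(target))]
    return sum(s >= observed for s in scores) / len(scores)

motif = make_pssm("ACATTCCA")  # hypothetical 8bp motif
p_match = complementarity_p_value(motif, "GGAAUGU")      # seed complementary to the motif
p_unrelated = complementarity_p_value(motif, "AAAAAAA")  # seed with little complementarity
print(p_match, p_unrelated)
```

Because every possible sequence of the seed's length is enumerated, the p-value here is exact for the toy model rather than estimated by sampling, which is the sense in which the distribution is exhaustive.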
What are the seed models?
Base-pairing models for the seed regions of a miRNA to the 3' UTR of target transcripts. The 8mer, 7mer-m8, and 7mer-a1 models are the canonical models of miRNA to mRNA base-pairing. The 6mer models are considered marginal models as they typically have a reduced efficacy and are more likely to occur by chance alone. By default all of the seed models are used. The seed models are described in this figure:
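To make the seed models concrete, the sketch below derives the 3' UTR site complementary to each model from a mature miRNA sequence. The position windows (8mer: bases 1-8, 7mer-m8: 2-8, 7mer-a1: 1-7, 6mer: 2-7) are assumptions based on common convention, and only one 6mer window is shown even though the text refers to 6mer models in the plural. The sequence is assumed to be mature miR-1 and should be checked against miRBase. The sketch uses plain complementarity; some definitions instead require an adenosine opposite miRNA position 1.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}

# Assumed 1-based, inclusive position windows for each seed model.
SEED_WINDOWS = {
    "8mer": (1, 8),
    "7mer-m8": (2, 8),
    "7mer-a1": (1, 7),
    "6mer": (2, 7),
}

def target_sites(mirna):
    """Return the 3' UTR site (DNA alphabet, 5' to 3') complementary to each seed model."""
    sites = {}
    for model, (start, end) in SEED_WINDOWS.items():
        seed = mirna[start - 1:end]
        # The site is the reverse complement of the miRNA seed.
        sites[model] = "".join(COMPLEMENT[b] for b in reversed(seed))
    return sites

mir1 = "UGGAAUGUAAAGAAGUAUGUAU"  # assumed miR-1 mature sequence; verify in miRBase
sites = target_sites(mir1)
for model, site in sites.items():
    print(model, site)
```

The shorter models are substrings of the 8mer site, which is one way to see why 6mer matches are more likely to occur by chance alone.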
The miRvestigator HMM can also model G:U wobble base-pairing, which has been observed in miRNA to target transcript 3' UTR complementarity. The sequence motif is unlikely to be composed entirely of instances where a G:U wobble was used in exactly the same position. However, this may occur and we do not want to exclude the possibility. Thus the miRvestigator framework can be set to model these G:U base pairings once the G or U nucleotide frequency in a sequence motif column passes a specified threshold. This threshold is set with the "Min. Freq. of G or U" option. By default wobble base-pairing is not modeled, but if it is enabled we recommend a threshold of 0.25 for "Min. Freq. of G or U".
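One way to picture the threshold is the following minimal sketch, which lists the miRNA bases that could pair with a single motif column. It assumes that the Watson-Crick partner of the most frequent base is always allowed and that "passes" means greater than or equal to the threshold; the function name `pairing_bases` and the example frequencies are hypothetical, and the actual HMM may handle wobble differently.

```python
WATSON_CRICK = {"A": "U", "C": "G", "G": "C", "U": "A"}
WOBBLE = {"G": "U", "U": "G"}  # G in the target pairs with U in the miRNA, and vice versa

def pairing_bases(column, wobble=False, min_freq=0.25):
    """Return the set of miRNA bases able to pair with a motif column.

    column: dict mapping each 3' UTR base (RNA alphabet) to its frequency.
    """
    consensus = max(column, key=column.get)
    allowed = {WATSON_CRICK[consensus]}
    if wobble:
        for base, partner in WOBBLE.items():
            # Add the wobble partner only when G or U is frequent enough (assumed >=).
            if column.get(base, 0.0) >= min_freq:
                allowed.add(partner)
    return allowed

col = {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05}  # made-up column dominated by G
strict = pairing_bases(col)
relaxed = pairing_bases(col, wobble=True)
print(strict, relaxed)
```

With wobble disabled only the Watson-Crick partner is accepted; enabling it adds U as a partner for this G-rich column, while the rare U in the column stays below the threshold and contributes nothing.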