Tuesday, July 31, 2007

learning note about ncRNA

  • siRNA and miRNA
Both are kinds of ncRNA (non-coding RNA);
Both are short, ~20 nt in length;
Both are involved in the RNA interference (RNAi) pathway, in which the small RNA interferes with the expression of a specific gene;
Difference in structure:
siRNA is a short (usually 21-nt) double-stranded RNA (dsRNA) with 2-nt 3' overhangs on either end,
while miRNA is a single-stranded RNA molecule of about 21-23 nucleotides, processed from a precursor with a stem-loop secondary structure.
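
To make the siRNA geometry concrete, here is a minimal Python sketch that builds a 21-nt duplex with 2-nt 3' overhangs from a 23-nt target site. The function names and the example sequence are made up for illustration, and real siRNA design weighs other factors (e.g. GC content and off-target matches) that are ignored here.

```python
# Map each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def make_sirna_duplex(target):
    """Build (sense, antisense) 21-nt strands from a 23-nt mRNA target site.

    Both strands are returned 5'->3'. They pair over a 19-bp core, and each
    strand keeps 2 unpaired nucleotides at its 3' end.
    """
    if len(target) != 23 or set(target) - set("ACGU"):
        raise ValueError("target must be 23 nt of A/C/G/U")
    sense = target[2:]                           # target positions 3..23
    antisense = reverse_complement(target[:21])  # pairs with positions 1..21
    return sense, antisense


# Hypothetical target site, not a real gene.
sense, antisense = make_sirna_duplex("AACGAUGCUAGCUAGGAUCCGUU")
print("sense    ", sense)
print("antisense", antisense)
```

The 19-nt core of the sense strand is the reverse complement of the 19-nt core of the antisense strand, so the last 2 nt of each strand are left hanging at its 3' end.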

Another essential difference, I guess, is that siRNA is exogenous while miRNA is endogenous. So when people use this technology (I think siRNA is more of a technology than an RNA that occurs naturally in cells, like miRNA) to do RNAi, two issues (innate immunity and off-targeting) are challenging.
Direct transfection of an exogenous siRNA can be problematic, since the gene knockdown effect is only transient, particularly in rapidly dividing cells. One way of overcoming this challenge is to modify the siRNA in such a way as to allow it to be expressed by an appropriate vector, e.g. a plasmid. This is done by introducing a loop between the two strands (like a miRNA in form), thus producing a single transcript, which can be processed into a functional siRNA. People also use this approach to avoid the non-specific effects of siRNA, by converting the siRNA into a miRNA. MicroRNAs occur naturally, and by harnessing this endogenous pathway it should be possible to achieve similar gene knockdown at comparatively low concentrations of the resulting siRNAs. This should minimize non-specific effects.
  • about miRNA
miRNAs are first transcribed as part of a primary microRNA (pri-miRNA). This is then processed by Drosha with the help of Pasha/DGCR8 (together, the Microprocessor complex) into pre-miRNAs. The ~75-nt pre-miRNA is then exported to the cytoplasm by exportin-5, where it is diced into 21-23-nt siRNA-like molecules by Dicer. In some cases, multiple miRNAs can be found on one pri-miRNA.
It's hard to know the length and original form of a pri-miRNA on the genome, since it is not stable after transcription. It could be very long, several kb, with many stem-loop structures strung together. (Could it contain introns, too? I guess it's possible. And alternative transcription as well, why not?)
The pri-miRNA is then processed by Drosha into the pre-miRNA. The pre-miRNA is ~60-100 nt long with a hairpin structure (only one loop?). It is then exported to the cytoplasm, where Dicer cuts it into a sequence-specific, single-stranded mature miRNA.
The mature miRNA then binds to a complex called the RNA-Induced Silencing Complex (RISC). The RISC-bound miRNA then binds to a specific mRNA through significant but incomplete complementarity to it.
miRNAs inhibit gene expression in different ways in plants and animals. In plants, the double-stranded RNA (dsRNA) formed when the miRNA binds triggers degradation of the mRNA transcript through a process similar to RNA interference (RNAi), while in animals, miRNA binding prevents translation without causing the mRNA to be degraded. Animal miRNAs are usually (partially) complementary to a site in the 3' UTR, whereas plant miRNAs are usually (highly) complementary to coding regions (sites are also found in the 5' UTR and 3' UTR) of mRNAs.
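
The "partially" vs. "highly" complementary distinction can be sketched as a simple score: the fraction of miRNA bases that pair with a candidate site. This is only a toy illustration. The sequence is made up, only Watson-Crick pairs are counted (G:U wobble is ignored), and real target prediction uses much more than this one number.

```python
# Watson-Crick partners only; G:U wobble pairs are deliberately ignored here.
WC_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(WC_PARTNER[base] for base in reversed(rna))


def paired_fraction(mirna, site):
    """Fraction of miRNA bases that Watson-Crick pair with an mRNA site.

    Both strings are written 5'->3' and must have equal length. The strands
    are antiparallel, so miRNA position i faces site position len-1-i.
    """
    if not mirna or len(mirna) != len(site):
        raise ValueError("miRNA and site must be non-empty and equally long")
    pairs = sum(WC_PARTNER[m] == s for m, s in zip(mirna, reversed(site)))
    return pairs / len(mirna)


toy_mirna = "UGAGGUAGUAGGUUGUAUAGU"           # made-up 21-nt sequence
perfect_site = reverse_complement(toy_mirna)  # plant-like: fully complementary

# Animal-like: break the pairing at three central site positions by
# swapping each base for a different one.
swap = str.maketrans("ACGU", "CAUG")
partial_site = (perfect_site[:9]
                + perfect_site[9:12].translate(swap)
                + perfect_site[12:])

print(paired_fraction(toy_mirna, perfect_site))
print(paired_fraction(toy_mirna, partial_site))
```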

Monday, July 30, 2007

ISMB2007 Summary

==============================
SIG-AS
==============================
1. Alan Zahler from UCSC
uses evolutionary conservation of sequences in introns flanking alternatively spliced exons to identify splicing regulatory elements;
statistical analysis to find significant pentamers and hexamers;
uses microarrays to detect 400 high-confidence exon-skipping events (developmental stage-specific splicing regulation)
Browser: the Intronerator
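
The talk did not go into the statistics in my notes, so the following is only a minimal sketch of one way to look for over-represented pentamers/hexamers: count k-mers in a foreground set (e.g. introns flanking alternative exons) and a background set, then compute a two-proportion z-score per k-mer. The function names, the choice of test and the toy sequences are my assumptions, not the speaker's method. A real analysis would also need multiple-testing correction.

```python
from collections import Counter
from math import sqrt


def count_kmers(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment_z(fg_seqs, bg_seqs, k):
    """Two-proportion z-score for each k-mer seen in the foreground.

    Positive scores mean the k-mer is more frequent in the foreground
    than in the background.
    """
    fg, bg = count_kmers(fg_seqs, k), count_kmers(bg_seqs, k)
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    if n_fg == 0 or n_bg == 0:
        raise ValueError("both sets need at least one k-mer")
    scores = {}
    for kmer, c_fg in fg.items():
        c_bg = bg[kmer]  # Counter returns 0 for unseen k-mers
        pooled = (c_fg + c_bg) / (n_fg + n_bg)
        se = sqrt(pooled * (1 - pooled) * (1 / n_fg + 1 / n_bg))
        scores[kmer] = (c_fg / n_fg - c_bg / n_bg) / se if se > 0 else 0.0
    return scores


# Toy sequences, not real intron data.
foreground = ["TGCATGTGCATGTGCATG"]
background = ["ACGTACGTACGTACGTAC"]
scores = enrichment_z(foreground, background, k=5)
for kmer, z in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(kmer, round(z, 2))
```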

2. Jamal Tazi from Montpellier II (France)
An interesting story.
uses small chemical molecules that target splicing factors to correct the aberrant splicing caused by disease-causing mutations.
examples:
AS and Disease (ASF/SF2 on SR protein factor)
AS and development (eye/ey2a)

3. Uwe Ohler @ Duke University
transcript diversity on the 5' and 3' ends
for 5': use EST libraries, CAGE data?
for 3': use PolyA_DB

4. Mihaela Zavolan from Erik van Nimwegen's group
Title: Computational evidence for the association between transcription initiation and internal splicing
use FANTOM3 (mouse cDNA) and H-Invitational (human cDNA)

5. Erik van Nimwegen @ Micro-SIG (at the same time)
Comparative genomic inference of bacterial regulatory systems
Interesting point: he developed a method to quantify selection at non-coding positions genome-wide from multiple alignments of clades of related bacterial genomes.
uses a simplified Halpern-Bruno model of site-specific evolution (shown quickly, without much detail)
found: whereas silent sites evolve according to a neutral background model, intergenic regions show significant evidence of selection in all clades, with consistently more selection upstream than downstream of genes.
strong avoidance of RNA secondary structure in the region immediately around the translation start site (probably due to selection for translation initiation efficiency)

==============================
Tutorial
==============================
1. Phylogenetic workflow using BioPerl by Jason
multiple alignment using BioPerl/EnsEMBL; tree construction using PHYLIP; molecular evolution analysis using PAML/HyPhy; building gene families using MCL / OrthoMCL (for orthologous families); gene family size change (Computational Analysis of gene Family Evolution: CAFE)

2. Genome Browser and database by Peter Schattner
http://genome-test.cse.ucsc.edu

==============================
Main Meeting
==============================
1. RNA special session
Michael Zhang: Insulator (CTCF / BORIS)

Q: Is it possible to observe insulators like CTCF around breakpoints in GRBs?
The two parts of a GRB could be separated because the pressure holding them together is loosened by CTCF. We could screen the evolution of GRBs to find such cases/events.

2. RNA keynote from John Mattick (?)
One of the shocking (at least for me) points is "~98% of the transcript output is non-coding RNA";
He showed lots of RNA papers, including some big or interesting ones:
- Rapid evolution of noncoding RNA, K.C. Pang, 2006
- Widely distributed noncoding purifying selection in the human genome, PNAS, July 2007, Saurabh Asthana... John A. Stamatoyannopoulos (@ University of Washington)
He also mentioned that, according to some paper, 1300/1600 ncRNAs are expressed in the brain.

Q: Can we see ncRNAs in GRBs?
The hypothesis is that an ncRNA overlapping (or antisense to) the regulator gene turns that gene into a 'bystander' gene.

3. Keynote from Michael Eisen (@Berkeley Lab)
For me, one important message from his talk is to use high-level features linked to function to redefine conservation, not just simple sequence similarity.
BTW, I just learned that the director of Berkeley Lab is the famous Chinese scientist Steve Chu (朱棣文).

Friday, July 27, 2007

papers to read

  • Recent papers (June, July) about CNEs
1. Adaptive evolution of conserved non-coding elements in mammals. Su Yeon Kim, Jonathan K Pritchard. PLoS Genetics

Su developed a statistical method called the 'shared rates test' (SRT) to identify CNCs that show significant variation in substitution rates across branches of a phylogenetic tree, and applied it to 98,910 CNCs from a human:chimp:dog:mouse:rat alignment. 68% of them evolve under consistent constraint, while the remaining 32% depart from it, including some fast-evolving ones. The authors claim this as evidence of adaptive evolution in these CNCs.

2. Comprehensive characterization of the cis-regulatory code responsible for the spatio-temporal expression of olSix3.2 in the developing medaka forebrain. Ivan Conte and Paola Bovolenta from Spain, Genome Biology

Ivan investigated the CNEs around the Six3 gene in a fish alignment: 10 CNE blocks in the 5' flank of the gene, including 2 enhancers (D, I), 2 silencers (A, G) and 2 silencer blockers (E, H). They demonstrated that the entire expression of the newly identified olSix3.2 is orchestrated by the combinatorial use of seven different cis-regulatory modules, and that at least part of this regulation is conserved in the Six3 locus of vertebrates other than fishes.

I guess it's important to show the regulatory code in a spatio-temporal AND combinatorial way. As the paper said, “one limitation of previous studies that have used transgenic analysis to test the function of highly conserved non-coding sequences is the identification of single enhancers uprooted from possible interactions with the remaining regulatory elements”.

3. Statistical information characterization of conserved non-coding elements in vertebrates.
From Greg Elgar's group.

I cannot open the full text, so judging only from the abstract, I don't expect very surprising results. I guess this paper could be grouped with one of their previous papers in Trends in Genetics: Striking nucleotide frequency pattern at the borders of highly conserved vertebrate non-coding sequences.

4. A large family of ancient repeat elements in the human genome is under strong selection. PNAS, 2006. Michael Kamal, Xiaohui Xie, Eric S. Lander (@ Harvard)

I guess the paper mainly offered two messages useful for me.
  1. The discovery that a large CNE family falls into the MER121 repeat class (with 1/4 of the 115 perfectly conserved 50-mer instances). Given the exceptional conservation properties of MER121, it is clear that it must have an important function that has been under purifying selection for 200 million years. That is how the claim in the title is supported. This idea of observing purifying selection on ARs deposited in CNEs was applied and extended by David Haussler (@ UCSC) and Gill Bejerano (@ Stanford) in 2007. Their PNAS paper shows that "thousands of human mobile element fragments undergo strong purifying selection near developmental genes".
  2. The other thing I could learn from the paper is the pipeline for extracting Ancient Repeat (AR) sequences or, more generally, neutrally evolving sequences. The first paper obtained the AR sequences with the method from the mouse genome sequencing paper in Nature (method). The second one uses a model of neutral evolution computed by PhyloP from 4-fold degenerate sites in the ENCODE regions. But I cannot get hold of the PhyloP program (published at RECOMB 2006).
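
Since 4-fold degenerate sites are the raw material for that neutral model, here is a minimal sketch of how such sites can be picked out of a coding sequence. It assumes the standard genetic code and an in-frame CDS. The names are mine, and this is not the PhyloP procedure itself, only the site-selection step.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# A codon's third position is 4-fold degenerate when all four codons that
# share its first two bases encode the same amino acid.
FOURFOLD_PREFIXES = {
    prefix
    for prefix in ("".join(p) for p in product(BASES, repeat=2))
    if len({CODON_TABLE[prefix + b] for b in BASES}) == 1
}


def fourfold_sites(cds):
    """Return 0-based indices of 4-fold degenerate sites in an in-frame CDS."""
    cds = cds.upper()
    sites = []
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 2] in FOURFOLD_PREFIXES:
            sites.append(start + 2)  # third position of this codon
    return sites


# Toy CDS: ATG GCT TTT GGA (Met Ala Phe Gly).
print(fourfold_sites("ATGGCTTTTGGA"))
```

In the toy CDS only the Ala (GCT) and Gly (GGA) codons qualify, so the third positions of those two codons are reported.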
Information about repeat sequences
  1. RepBase / Repeat Masking / Repeat Map @ http://www.girinst.org/ with username of xianjun
  2. RepeatMasker, a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences, @ http://repeatmasker.org
  3. A good article "DNA repeat sequence and disease" @ http://www.neuro.wustl.edu/NEUROMUSCULAR/mother/dnarep.htm
5. Widely distributed noncoding purifying selection in the human genome, PNAS, July 2007, Saurabh Asthana... John A. Stamatoyannopoulos (@ University of Washington)

This paper sets out to answer the question of "to what extent noncoding sites outside of CNEs are functionally significant in modern humans", using SNP data and CNE data. They conclude that noncoding purifying selection is widely distributed across the genome rather than concentrated in CNEs. From the figure in the paper, we can see that most of the four-genome conserved bases (up to 96.5%) occur outside of CNEs.

The authors validated this partition method (conserved vs. non-conserved) by testing the selection pressure in coding exons.
They then partition the regions into 3 classes: coding, non-CNE noncoding, and CNE, and test the difference in SNP diversity (allele frequency) between the groups, using a subsampling method to check reliability. Additionally, they check that the selective effect is independent of CNE definition, population demographic history, heterogeneity in mutation rate, local G+C content, 4GCB density, and substitution type. (Very strong!!)
They then estimated the proportion of noncoding bases in the human genome under selection, using a model named the "infinite number of sites model" (see two papers [1, 2] and a book). For the neutral theory of molecular evolution, see the wiki page. The result: a minimum of 18.5% of nucleotide positions conserved across four genomes must be under negative selection. "Our results indicate that at a minimum 3.5-fold more noncoding nucleotides (2.8% of nucleotides) are under selection than estimates based on CNSs, and that 71.4% of positions under selection (2% of nucleotides) lie outside CNSs."
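
As a quick sanity check on how the quoted percentages fit together, here is a small worked calculation. It uses only the numbers in the quote; the variable names are mine.

```python
# Numbers taken directly from the quoted sentence.
frac_under_selection = 0.028   # 2.8% of nucleotides under selection
fold_over_cns = 3.5            # "3.5-fold more" than CNS-based estimates
frac_outside_cns = 0.02        # 2% of nucleotides, outside CNSs

# CNS-based estimate implied by the 3.5-fold statement: about 0.8%.
cns_based_estimate = frac_under_selection / fold_over_cns

# Share of selected positions lying outside CNSs: about 71.4%.
share_outside = frac_outside_cns / frac_under_selection

# What is left inside CNSs should match the CNS-based estimate.
frac_inside_cns = frac_under_selection - frac_outside_cns

print(f"CNS-based estimate: {cns_based_estimate:.1%}")
print(f"share outside CNSs: {share_outside:.1%}")
print(f"inside CNSs:        {frac_inside_cns:.1%}")
```

The three figures are mutually consistent: 2.8% / 3.5 and 2.8% - 2% both give about 0.8%, and 2% / 2.8% gives 71.4%.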

6. Purifying Selection Maintains Highly Conserved Noncoding Sequences in Drosophila, by Bergman CM. MBE 2007.

The paper tests predictions of the mutational cold-spot model of CNE evolution in the genus Drosophila. Some models/data in this paper are similar to those in the paper above.
  • News from RNA world
1. The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures.

From the University of Vienna; it seems related to one of the best posters. I am not sure whether they offer an API for access.

2. Promoter-associated RNA is required for RNA-directed transcriptional gene silencing in human cells

I guess it's an important paper for understanding the mechanism by which siRNA silences gene expression at the transcriptional stage in human cells. Similar to yeast (?), the siRNA recognizes promoter-associated RNAs transcribed through RNAPII promoters; these promoter RNAs function as a recognition motif that directs epigenetic silencing complexes to the corresponding targeted promoters to mediate transcriptional silencing in human cells.
  • Interesting story
1. Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function, PNAS, Nekrutenko A
A fun story showing the function of a region shared by two overlapping proteins, possibly an intrinsic property of the dual-coding exon. They also mention 90 newly identified genes with a similar dual-coding structure. It would be interesting to see the common features of these genes... which reminds me of similar cases I observed in the GRB study.

There are some related papers about dual-coding genes from Nekrutenko's group.

2. Identification of a locus control region for quadruplicated green-sensitive opsin genes in zebrafish, PNAS

Shoji's group shows that a 0.5-kb region located 15 kb upstream of the RH2 gene array (RH2-1, RH2-2, RH2-3, and RH2-4) is an essential regulator of their expression. Lots of experimental data, but they did not say much about orthologous cases, i.e. whether a similar or different story happens in other fish or mammals. Today's Science Editors' Choice put this on its list.

3. Global analysis of patterns of gene expression during Drosophila embryogenesis
4. The new mutation theory of phenotypic evolution
5. Nucleosome positioning signals in genomic DNA
6. Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs)
7. Housekeeping genes tend to show reduced upstream sequence conservation