Wednesday, October 10, 2007

miRNA learning note (1)

MicroRNA-143 and -145 in colon cancer. [DNA Cell Biol. 2007]
MicroRNAs (miRNAs) are endogenous, small non-coding RNAs (20-22 nucleotides) that negatively regulate gene expression at the translational level by base pairing to the 3' untranslated region of target messenger RNAs.
"It is predicted that 30% of protein-encoding genes are regulated by miRNAs."

Principles of microRNA regulation of a human cellular signaling network : Article : Molecular Systems Biology: "By analyzing the interactions between miRNAs and a human cellular signaling network, we found that miRNAs predominantly target positive regulatory motifs, highly connected scaffolds and most downstream network components such as signaling transcription factors, but less frequently target negative regulatory motifs, common components of basic cellular machines and most upstream network components such as ligands."

Global analysis of microRNA target gene expression reveals that miRNA targets are lower expressed in mature mouse and Drosophila tissues than in the embryos -- Yu et al. 35 (1): 152 -- Nucle: "We found that the expression levels of miRNA targets are lower in all mouse and Drosophila tissues than in the embryos. We also found miRNAs more preferentially target ubiquitously expressed genes than tissue-specifically expressed genes. These results support the current suggestion that miRNAs are likely to be largely involved in embryo development and maintaining of tissue identity."

NB: This kind of expression survey at different ontogenetic stages is very important, because it covers a blind spot in analyses that depend on functional categories. For example GO analyses include categories for "development", but as Yu and colleagues point out, many genes change in expression during development that are not part of the "developmental" categories. (from John Hawks's weblog)

Identification of specific sequence motifs in the ...[Comput Biol Chem. 2007] - PubMed Result: "The significantly reduced frequency of occurrence of all 20 motifs in the regions 2000 bp upstream of 23,570 human RefSeq genes demonstrated that these motifs were specific to the upstream miRNA sequences. The most frequently observed motif M1 (GTGCTTMTAGTGCAG), with a MEME E-value of 3.8e-57 was distributed within 500 bp upstream of stem-loop sequences and was also miRNA-specific."
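
To make the M1 motif concrete: the "M" in GTGCTTMTAGTGCAG is the IUPAC code for A or C, so scanning an upstream region for the motif amounts to a degenerate string match. Below is a minimal sketch of such a scan, not the paper's MEME-based procedure. The function name `find_motif` and the toy upstream sequence are my own assumptions, and only the forward strand is searched.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_motif(motif, sequence):
    """Return 0-based start positions of all matches (overlaps included).

    Hypothetical helper for illustration; forward strand only.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead group lets finditer report overlapping occurrences.
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

M1 = "GTGCTTMTAGTGCAG"  # motif quoted in the abstract above
# Toy upstream sequence (made up) carrying one A-variant and one C-variant.
upstream = "AAGTGCTTATAGTGCAGTTTGTGCTTCTAGTGCAGCC"
hits = find_motif(M1, upstream)
print(hits)
```

To restrict the search to the 500 bp window mentioned in the abstract, one would pass only that slice of the upstream sequence; scanning the reverse strand would need a reverse-complemented copy of the sequence.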

Regulatory circuit of human microRNA biogenesis. [PLoS Comput Biol. 2007] - PubMed Result: "Newly identified regulatory motifs occur frequently and in multiple copies upstream of miRNAs. The motifs are highly enriched in G and C nucleotides, in comparison with the nucleotide composition of miRNA upstream sequences. Although the motifs were predicted using sequences that are upstream of miRNAs, we find that 99% of the top-predicted motifs preferentially occur within the first 500 nucleotides upstream of the transcription start sites of protein-coding genes; the observed preference in location underscores the validity and importance of the motifs identified in this study. Our study also raises the possibility that a considerable number of well-characterized, disease-associated transcription factors (TFs) of protein-coding genes contribute to the abnormal miRNA expression in diseases such as cancer."

"Further analysis of predicted miRNA-protein interactions lead us to hypothesize that TFs that include c-Myb, NF-Y, Sp-1, MTF-1, and AP-2alpha are master-regulators of miRNA expression."

Spatial regulation of microRNA gene expression in the Drosophila embryo: "we investigate the possibility that localized expression is mediated by tissue-specific enhancers, comparable to those seen for protein-coding genes."

mir-309–6 polycistron (8-miR): An 800-bp 5′ enhancer was identified that recapitulates this complex pattern when attached to an RNA polymerase II core promoter fused to a lacZ-reporter gene.

mir-1 gene: a mesoderm-specific enhancer located ≈5 kb 5′ of the miR-1 transcription unit.

Evidence is presented that the 8-miR enhancer is regulated by the localized Huckebein repressor, whereas miR-1 is activated by Dorsal and Twist. These results provide evidence that restricted activities of the 8-miR and miR-1 miRNAs are mediated by classical tissue-specific enhancers.

Monday, October 8, 2007

Purifying Selection Maintains Highly Conserved Noncoding Sequences in Drosophila -- Casillas et al. 24 (10): 2222 -- Molecular Biology and Evolution

Purifying Selection Maintains Highly Conserved Noncoding Sequences in Drosophila -- Casillas et al. 24 (10): 2222 -- Molecular Biology and Evolution: "We find that point mutations in intronic and intergenic CNSs exhibit a significant reduction in levels of divergence relative to levels of polymorphism, as well as a significant excess of rare derived alleles, compared with either the nonconserved spacer regions between CNSs or with 4-fold silent sites in coding regions"

TODO: add more about the methods.

//Long time, no reading~

Tuesday, September 11, 2007

assembly error or additional rearrangement?

1. Rearrangement Rate following the Whole-Genome Duplication in Teleosts -- Sémon and Wolfe 24 (3): 860 -- Molecular Biology and Evolution: "Rearrangement Rate following the Whole-Genome Duplication in Teleosts"

Check whether zebrafish is expected, in theory, to have a higher rearrangement rate (RR).

2. ScienceDirect - Genomics : Phylogenetic analysis of three complete gap junction gene families reveals lineage-specific duplications and highly supported gene classes: "Note that in one of the zebrafish clusters the orientation of the two genes is inverted (cx41.8, cx44.1), suggesting an additional chromosomal rearrangement on the zebrafish chromosome."



3. Ancient duplicated conserved noncoding elements in vertebrates: A genomic and functional analysis -- McEwen et al. 16 (4): 451 -- Genome Research: "In only two cases were all dCNE family members found to be located in the introns of paralogous genes (NBEA and LRBA) that were not the likely target genes. In these two specific cases the predicted target genes, MAB21L1 and MAB21L2, are also located in introns of NBEA and LRBA, respectively."

A very interesting and important case! A new paper in PLoS One, "Ancient Origin of the New Developmental Superfamily DANGER", also mentions Mab21L1/2. Check it out!

Sunday, September 9, 2007

papers about fish evolution

1. The Evolutionary Fate and Consequences of Duplicate Genes, Science 10 November 2000:
http://www.sciencemag.org/cgi/content/full/290/5494/1151

2. Genome evolution and biodiversity in teleost fish, Heredity (2005) 94, 280–294.
http://www.nature.com/hdy/journal/v94/n3/full/6800635a.html

a review from J-N Volff

3. Functional Divergence of Two Zebrafish Midkine Growth Factors Following Fish-Specific Gene Duplication, Genome Res. 13:1067-1081, 2003
http://www.genome.org/cgi/content/full/13/6a/1067

An example in which "mdka and mdkb underwent functional divergence after duplication".

4. Comparative genomics of ParaHox clusters of teleost fishes: gene cluster breakup and the retention of gene sets following whole genome duplications, BMC Genomics 2007, 8:312
http://www.biomedcentral.com/1471-2164/8/312

A new paper from Axel Meyer's group. It could offer a kind of explanation for GRB evolution.

Tuesday, August 7, 2007

new papers summary

New papers to read from the last week

1. Non-coding RNAs in Ciona intestinalis.
Bioinformatics. 2005 Sep 1;21 Suppl 2:ii77-8.
PMID: 16204130 [PubMed - in process]

2. Into the heart of darkness: large-scale clustering of human non-coding DNA.
Gill Bejerano, David Haussler and Mathieu Blanchette
Bioinformatics.
2004 Aug 4;20 Suppl 1:i40-8.
PMID: 15262779 [PubMed - in process]

* I don't know why myNCBI only sent me this paper now; it is a paper from 2004. My gosh~ But it is definitely a good paper, which is obvious from the author list.

3. Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i334-41.
PMID: 15262817 [PubMed - in process]


4. Regulation of the Gene Encoding GPR40, a Fatty Acid Receptor Expressed Selectively in Pancreatic beta Cells.
JBC
Could be a story about subfunctionalization by tissue/cell type.

5. Ultraconserved non-coding sequence element controls a subset of spatiotemporal GLI3 expression.
Dev Growth Differ. 2007 Aug;49(6):543-53.
PMID: 17661744 [PubMed - in process]

6. Dissecting the action of an evolutionary conserved non-coding region on renin promoter activity.
Nucleic Acids Res. 2007 Jul 26; [Epub ahead of print]
PMID: 17660193 [PubMed - as supplied by publisher]

Tuesday, July 31, 2007

learning note about ncRNA

  • siRNA and miRNA
Both are kinds of ncRNA (non-coding RNA);
Both are short, ~20 nt in length;
Both are involved in the RNA interference (RNAi) pathway, in which the small RNA interferes with the expression of a specific gene;
Difference in structure:
siRNA is a short (usually 21-nt) double-stranded RNA (dsRNA) with 2-nt 3' overhangs on either end,
whereas a mature miRNA is a single-stranded RNA molecule of about 21-23 nucleotides, processed from a precursor with a stem-loop secondary structure.

Another essential difference, I guess, is that siRNA is exogenous while miRNA is endogenous. So when people use this technology to do RNAi (I think of siRNA as mainly a technique, not an RNA that occurs naturally in the cell like miRNA), two issues are challenging us: innate immunity and off-targeting.
Direct transfection of an exogenous siRNA can be problematic, since the gene knockdown effect is only transient, particularly in rapidly dividing cells. One way of overcoming this challenge is to modify the siRNA in such a way as to allow it to be expressed by an appropriate vector, e.g. a plasmid. This is done by the introduction of a loop between the two strands (like a miRNA in form), thus producing a single transcript, which can be processed into a functional siRNA. People also use this approach to avoid the non-specific effects of siRNA, by converting the siRNA into a miRNA. MicroRNAs occur naturally, and by harnessing this endogenous pathway it should be possible to achieve similar gene knockdown at comparatively low concentrations of the resulting siRNAs. This should minimize non-specific effects.
  • about miRNA
miRNAs are first transcribed as part of a primary microRNA (pri-miRNA). This is then processed by Drosha, with the help of Pasha/DGCR8 (together the Microprocessor complex), into pre-miRNAs. The ~75-nt pre-miRNA is then exported to the cytoplasm by exportin-5, where it is diced into 21-23-nt siRNA-like molecules by Dicer. In some cases, multiple miRNAs can be found on one pri-miRNA.
It's hard to know the length and the original form of a pri-miRNA on the genome, since it is not stable after being transcribed. It could be very long, ~several kb, with many stem-loop structures connected together. (Could it also contain introns? I guess it's possible. And alternative transcription too, why not?)
The pri-miRNA is then processed by Drosha into pre-miRNA. A pre-miRNA is ~60-100 nt long with a hairpin structure (only one loop?). It is then exported to the cytoplasm, where it is cut by Dicer into a sequence-specific, single-stranded mature miRNA.
The mature miRNA then binds to a complex called the RNA-Induced Silencing Complex (RISC). The RISC-bound miRNA then binds to specific mRNAs to which it is significantly, but not completely, complementary.
The ways that miRNAs inhibit gene expression differ between plants and animals. In plants, the formation of double-stranded RNA (dsRNA) through the binding of the miRNA triggers the degradation of the mRNA transcript through a process similar to RNA interference (RNAi), while in animals it typically prevents translation without causing the mRNA to be degraded. Animal miRNAs are usually (partially) complementary to a site in the 3' UTR, whereas plant miRNAs are usually (highly) complementary to coding regions (also found in the 5' UTR and 3' UTR) of mRNAs.
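
The partial complementarity between an animal miRNA and a 3' UTR site can be illustrated with a toy scan. This is only a sketch under assumptions that are not part of my notes above: it uses the common simplification that pairing is anchored on the "seed" (miRNA positions 2-8), it requires perfect Watson-Crick pairing there, and the miRNA and UTR sequences are toy examples. The function names are hypothetical.

```python
# Watson-Crick complements for RNA (G:U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_match_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    Assumption: the seed is miRNA positions seed_start..seed_end (1-based,
    inclusive); this is a simplification, not a full target predictor.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # what the mRNA must contain, read 5'->3'
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]

toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # toy 22-nt miRNA for illustration
toy_utr = "AAACUACCUCAGGGCUACCUCUU"    # toy 3' UTR with two seed-match sites
sites = seed_match_sites(toy_mirna, toy_utr)
print(sites)
```

A plant-style search would instead require near-complete complementarity over the whole miRNA, which could be sketched by matching the full reverse complement while allowing a few mismatches.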

Monday, July 30, 2007

ISMB2007 Summary

==============================
SIG-AS
==============================
1. Alan Zahler from UCSC
uses evolutionary conservation of sequences in introns flanking alternatively spliced exons to identify splicing regulatory elements;
statistical analysis to find significant pentamers and hexamers;
uses microarrays to detect 400 high-confidence exon-skipping events (developmental stage-specific splicing regulation)
Browser: the Intronerator
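
The pentamer/hexamer search in this talk can be pictured as a k-mer enrichment comparison between conserved intron flanks and a background set. I did not record the actual statistics used, so the sketch below substitutes a simple pseudocount-smoothed frequency ratio; the function names, the pseudocount, and the toy sequences are all my assumptions.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Ratio of smoothed k-mer frequencies, foreground over background.

    Assumption: a plain frequency ratio stands in for the unspecified
    statistical test; values above 1 mean over-representation.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudocount * 4 ** k
    bg_total = sum(bg.values()) + pseudocount * 4 ** k
    return {
        kmer: ((fg[kmer] + pseudocount) / fg_total)
        / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in fg
    }

# Toy data: "TGCATG" occurs twice in the foreground, never in the background.
conserved_flanks = ["TGCATGTGCATG"]
background_introns = ["AAAAAAAAAAAA"]
scores = kmer_enrichment(conserved_flanks, background_introns, k=6)
top = max(scores, key=scores.get)
print(top, scores[top])
```

A real analysis would need a significance test and multiple-testing correction on top of the ratio, since 4^6 = 4096 hexamers are scored at once.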

2. Jamal Tazi from Montpellier II (France)
An interesting story.
Uses small chemical molecules that target splicing factors to correct aberrant splicing caused by disease-causing mutations.
examples:
AS and Disease (ASF/SF2 on SR protein factor)
AS and development (eye/ey2a)

3. Uwe Ohler @ Duke University
transcript diversity on the 5' and 3' ends
for 5': use EST libraries, CAGE data?
for 3': use PolyA_DB

4. Mihaela Zavolan from Erik van Nimwegen's group
Title: Computational evidence for the association between transcription initiation and internal splicing
use FANTOM3 (mouse cDNA) and H-Invitational (human cDNA)

5. Erik @ Micro-SIG (held at the same time)
Comparative genomic inference of bacterial regulatory systems
Interesting point: they developed a method to quantify selection at non-coding positions genome-wide from multiple alignments of clades of related bacterial genomes.
Uses a simplified Halpern-Bruno model of site-specific evolution (a quick flash, not much detail).
Found: whereas silent sites evolve according to a neutral background model, intergenic regions show significant evidence of selection in all clades, with consistently more selection upstream than downstream of genes.
Strong avoidance of RNA secondary structure in the region immediately around the translation start site (probably due to selection for translation initiation efficiency).

==============================
Tutorial
==============================
1. Phylogenetic workflow using BioPerl by Jason
multiple alignment using bioperl/EnsEMBL; tree-construction using PhyLip; Molecular Evolution analysis using PAML/HyPHY; build gene family using MCL / OrthoMCL (for orthologous family); gene family size change (Computational Analysis of gene Family Evolution: CAFE)

2. Genome Browser and database by Peter Schattner
http://genome-test.cse.ucsc.edu

==============================
Main Meeting
==============================
1. RNA special session
Michael Zhang: Insulator (CTCF / BORIS)

Q: Is it possible to observe insulators such as CTCF around breakpoints in GRBs?
The two parts of a GRB could become separated because of the weaker pressure caused by CTCF; we could screen the evolution of GRBs to find such cases/events.

2. RNA keynote from John Mattick (?)
One of the shocking (at least for me) points is that "~98% of transcriptional output is non-coding RNA";
He showed lots of RNA papers, including some BIG or interesting ones:
-Rapid evolution of noncoding RNA, K.C.Pang, 2006
-Widely distributed noncoding purifying selection in the human genome, PNAS, 2007 July, Saurabh Asthana... John A. Stamatoyannopoulos (@University of Washington)
He also mentioned a paper in which 1300/1600 ncRNAs are expressed in brain.

Q: Do we see ncRNAs in GRBs?
The hypothesis is that an ncRNA overlapping (or antisense to) the regulator gene makes that gene a 'bystander' gene.

3. Keynote from Michael Eisen (@Berkeley Lab)
For me, one important message from his talk is to use high-level features linked to function to redefine conservation, not just simple sequence similarity.
BTW, I just learned that the director of Berkeley Lab is the famous Chinese scientist Steven Chu (朱棣文).

Friday, July 27, 2007

papers to read

  • Recent papers(June, July) about CNEs
1. Adaptive evolution of conserved non-coding elements in mammals. Su Yeon Kim, Jonathan K Pritchard. PLoS Genetics

Su developed a statistical method called the 'shared rates test' (SRT) to identify CNCs that show significant variation in substitution rates across branches of a phylogenetic tree, and applied it to 98910 CNEs from a human:chimp:dog:mouse:rat alignment. 68% of them evolve under constraint, while the rest (32%) depart from this pattern, including some fast-evolving ones. The authors present this as evidence of adaptive evolution in these CNEs.

2. Comprehensive characterization of the cis-regulatory code responsible for the spatio-temporal expression of olSix3.2 in the developing medaka forebrain. Ivan Conte and Paola Bovolenta from Spain, Genome Biology

Ivan investigated the CNEs around the Six3 gene in a fish alignment: 10 CNE blocks in the 5' flank of the gene, with 2 enhancers (D, I), 2 silencers (A, G) and 2 silencer blockers (E, H). They demonstrated that the entire expression of the newly identified olSix3.2 is orchestrated by the combinatorial use of seven different cis-regulatory modules, and that at least part of this regulation is conserved in the Six3 locus of vertebrates other than fishes.

I guess it's important to show the regulatory code in a spatio-temporal AND combinatorial way. As the paper said, “one limitation of previous studies that have used transgenic analysis to test the function of highly conserved non-coding sequences is the identification of single enhancers uprooted from possible interactions with the remaining regulatory elements”.

3. Statistical information characterization of conserved non-coding elements in vertebrates.
From Greg Elgar's group.

I cannot open the full text; judging from the abstract alone, I do not expect very surprising results. I guess this paper could be grouped with one of their previous papers in Trends in Genetics: Striking nucleotide frequency pattern at the borders of highly conserved vertebrate non-coding sequences.

4. A large family of ancient repeat elements in the human genome is under strong selection. PNAS, 2006. Michael Kamal, Xiaohui Xie, Eric S. Lander (@ Harvard)

I guess the paper mainly offers two messages useful for me.
  1. The discovery that a large CNE family falls into the MER121 repeat class (with 1/4 of 115 perfectly conserved 50-mer instances). And given the exceptional conservation properties of MER121, it is clear that it must have an important function that has been under purifying selection for 200 million years. That is how the claim in the title is supported. This idea of observing purifying selection on ARs that reside in CNEs was applied/amplified by Gill Bejerano (@ Stanford) and David Haussler (@ UCSC) in 2007. Their PNAS paper shows "thousands of human mobile element fragments undergo strong purifying selection near developmental genes".
  2. The other thing that I could learn from the paper is the pipeline for extracting the Ancient Repeat (AR) sequences, or more generally speaking the neutrally evolving sequences. For the first paper, they got the AR sequences with the method from the mouse sequencing Nature paper (method). The 2nd one uses a model of neutral evolution computed by PhyloP from 4-fold degenerate sites in the ENCODE regions. But I cannot get the PhyloP application (published at RECOMB 2006).
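
Since 4-fold degenerate sites keep coming up as the neutral reference, here is a minimal sketch of how they can be located in a coding sequence: a third codon position is 4-fold degenerate when all four bases there give the same amino acid. This illustrates the definition only, not the PhyloP pipeline; it assumes the standard genetic code and an in-frame, gap-free CDS, and the helper name `fourfold_sites` is my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Two-base prefixes for which any third base encodes the same amino acid.
FOURFOLD_PREFIXES = {
    a + b
    for a, b in product(BASES, repeat=2)
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
}

def fourfold_sites(cds):
    """Return 0-based indices of 4-fold degenerate third codon positions.

    Assumption: cds starts in frame and contains no gaps or ambiguity codes.
    """
    cds = cds.upper()
    return [
        start + 2
        for start in range(0, len(cds) - 2, 3)
        if cds[start:start + 2] in FOURFOLD_PREFIXES
    ]

print(sorted(FOURFOLD_PREFIXES))
print(fourfold_sites("ATGGCTTAA"))  # codons: ATG, GCT, TAA
```

Alignment columns at these indices could then be pooled to fit a neutral substitution model, which is the role these sites play in the papers above.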
Information about RepeatSequence
  1. RepBase / Repeat Masking / Repeat Map @ http://www.girinst.org/ with username of xianjun
  2. Repeatmasker, a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences, @ http://repeatmasker.org
  3. A good article "DNA repeat sequence and disease" @ http://www.neuro.wustl.edu/NEUROMUSCULAR/mother/dnarep.htm
5. Widely distributed noncoding purifying selection in the human genome, PNAS, 2007 July, Saurabh Asthana... John A. Stamatoyannopoulos (@University of Washington)

This paper sets out to answer the question "to what extent noncoding sites outside of CNEs are functionally significant in modern humans", using SNP data and CNE data. They conclude that noncoding purifying selection is widely distributed across the genome rather than concentrated in CNEs. From the following figure, we can see that most of the four-genome conserved bases (up to 96.5%) occur outside of CNEs.

The authors validated this partition method (conserved or non-conserved) by testing the selection pressure in coding exons.
They then partition the regions into 3 classes (coding, non-CNE noncoding, and CNE) and test for differences in SNP diversity (allele frequency) between the groups, using a subsampling method to check reliability. Additionally, they check that the selective effect is independent of CNE definition, population demographic history, heterogeneity in mutation rate, local G+C content, 4GCB density, and substitution type. (Very strong!!)
They then estimated the proportion of noncoding bases in the human genome under selection, using a model named the "infinite number of sites model" (ref. to two papers[1, 2] and a book). About the neutral theory of molecular evolution, ref. to this wiki page. The result: a minimum of 18.5% of nucleotide positions conserved across four genomes must be under negative selection. "Our results indicate that at a minimum 3.5-fold more noncoding nucleotides (2.8% of nucleotides) are under selection than estimates based on CNSs, and that 71.4% of positions under selection (2% of nucleotides) lie outside CNSs."
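
The percentages in that quote fit together arithmetically, which is a useful sanity check when citing them. The short calculation below uses only the two quoted values (2.8% and 2%); the within-CNS share of about 0.8% is my own inference from their difference, assuming that the "estimates based on CNSs" correspond to the selected positions lying inside CNSs.

```python
# Percentages of all nucleotides, taken from the quoted sentence.
pct_noncoding_selected = 2.8     # noncoding nucleotides under selection
pct_selected_outside_cns = 2.0   # the part of those lying outside CNSs

# Inferred here, not quoted: selected noncoding positions inside CNSs.
pct_selected_inside_cns = pct_noncoding_selected - pct_selected_outside_cns

# Share of selected positions outside CNSs, and fold increase over the
# CNS-only estimate (assumed equal to the inside-CNS share).
share_outside = pct_selected_outside_cns / pct_noncoding_selected
fold_over_cns = pct_noncoding_selected / pct_selected_inside_cns

print(f"inside CNSs: {pct_selected_inside_cns:.1f}% of nucleotides")
print(f"share outside CNSs: {share_outside:.1%}")
print(f"fold over CNS-only estimate: {fold_over_cns:.1f}x")
```

Under that assumption the numbers reproduce the quoted 71.4% and 3.5-fold figures, with about 0.8% of nucleotides selected inside CNSs.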

6. Purifying Selection Maintains Highly Conserved Noncoding Sequences in Drosophila, by Bergman CM. MBE 2007.

The paper tests predictions of the mutational cold spot model of CNE evolution in the genus Drosophila. Some models/data in this paper are similar to those in the paper above.
  • News from RNA world
1. The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures.

From the University of Vienna; it seems related to one of the best posters. I am not sure whether they offer an API for access.

2. Promoter-associated RNA is required for RNA-directed transcriptional gene silencing in human cells

I guess this is an important paper for understanding how siRNA silences gene expression at the transcriptional level in human cells. Similar to yeast(?), the siRNA recognizes promoter-associated RNAs transcribed through RNAPII promoters; these promoter RNAs function as a recognition motif to direct epigenetic silencing complexes to the corresponding targeted promoters and mediate transcriptional silencing in human cells.
  • Interesting story
1. Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function, PNAS, Nekrutenko A
An amusing story showing the function of an overlapping region that encodes two proteins, possibly an intrinsic property of the dual-coding exon. They also mention 90 newly identified genes with a similar dual-coding structure. It would be interesting to see the common features of these genes ... which reminds me of the similar cases I observed in the GRB study.

There are some related papers about dual-coding genes from Nekrutenko's group.

2. Identification of a locus control region for quadruplicated green-sensitive opsin genes in zebrafish, PNAS

The Shoji group shows that a 0.5-kb region located 15 kb upstream of the RH2 gene array (RH2-1, RH2-2, RH2-3, and RH2-4) is an essential regulator of their expression. Lots of experimental data, but the paper does not say much about the orthologous case, i.e. whether a similar or different story holds in other fish/mammals. Today's Science Editors' Choice put this in the list.

3. Global analysis of patterns of gene expression during Drosophila embryogenesis
4. The new mutation theory of phenotypic evolution
5. Nucleosome positioning signals in genomic DNA
6. Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs)
7. Housekeeping genes tend to show reduced upstream sequence conservation

Thursday, June 14, 2007

Functional diversification of shh paralog enhancers

Genome Biology | Abstract | gb-2007-8-6-r106 | Functional diversification of sonic hedgehog paralog enhancers identified by phylogenomic reconstruction: "We demonstrate that the sonic hedgehog a (shha) paralogs sonic hedgehog b (tiggy winkle hedgehog; shhb) genes of fishes have a modified ar-C enhancer which specifies a diverged function at the embryonic midline. We have identified several conserved motifs indicative of putative transcription factor binding sites by a local alignment of ar-C enhancers of numerous vertebrate sequences. To trace the evolutionary changes among paralog enhancers, phylogenomic reconstruction was carried out and lineage-specific motif changes were identified. The relevance of the motif composition to observed developmental differences was studied through transgenic functional analyses. Altering and exchanging motifs between paralog enhancers resulted in the reversal of enhancer specificity in the floor plate and notochord. A model reconstructing enhancer divergence during vertebrate evolution was developed."
Comments (by sterding):
1. The ar-C enhancers of the two zebrafish shh paralogs differ in sequence conservation (fig. 1) and in function (fig. 3); ar-C from shha drives reporter expression in the notochord, while a reporter with ar-C from shhb additionally shows expression in the floor plate.

2. Local alignment of ar-C shows 4 conserved motifs (C1, C2, C3, C4) in shha, while only 2 of them (C1, C3) are present in the shhb branch (fig. 4).

3. Experiments mutating specific motifs show that C1 is critical for notochord specificity, C3 is not so important in the current results, and C2 and C4 are floor plate repressors (fig. 5, 6).

Thursday, May 24, 2007

diversity of Pcdh genes among teleosts

  • In mammals, the protocadherins are encoded by three closely-linked clusters (α, β and γ) of tandem genes and are hypothesized to provide a molecular code for specifying the remarkably-diverse neural connections in the central nervous system.
  • Like mammals, the coelacanth, a lobe-finned fish, contains a single protocadherin locus, also arranged into α, β and γ clusters.
  • Zebrafish, however, possesses two protocadherin loci that contain more than twice the number of genes as the coelacanth, but arranged only into α and γ clusters.
  • Fugu contains two unlinked protocadherin loci, Pcdh1 and Pcdh2, that collectively consist of at least 77 genes. The fugu Pcdh1 locus has been subject to extensive degeneration, resulting in the complete loss of Pcdh1γ cluster. The fugu Pcdh genes have undergone lineage-specific regional gene conversion processes that have resulted in a remarkable regional sequence homogenization among paralogs in the same subcluster.
  • Besides the 'fish-specific' whole genome duplication, the evolution of protocadherin genes in teleost fishes is influenced by lineage-specific gene losses, tandem gene duplications and regional sequence homogenization.
Comparison of the fugu (FrPcdh1 and FrPcdh2), zebrafish (DrPcdh1 and DrPcdh2)and coelacanth (LmPcdh) protocadherin clusters. Variable exons in each paralog group are shown in different colors. Orthologs between fugu and zebrafish as well as the inter-locus paralogs between the two Pcdh loci in fugu or zebrafish are shown in the same colors. 'Teleost Pcdh1' and 'Teleost Pcdh2' are the Pcdh clusters predicted in the common ancestor of fugu and zebrafish, and 'Fish Pcdh ancestor' is the single Pcdh cluster predicted in the ray-finned fish prior to the 'fish-specific' whole genome duplication. The corresponding exons in the 'Fish Pcdh ancestor' and the inter-locus paralogs between 'Teleost Pcdh1' and 'Teleost Pcdh2' are shown in the same color except the 'αIV', which represents a common ancestor for fugu FrPcdh2α8–25 and zebrafish DrPcdh2α8–25. Among the exons predicted in the 'Fish Pcdh ancestor', those present in the Pcdh loci of both fugu and zebrafish are labeled with an asterisk.

a -22k CNE Controls Ifng Gene Expression by T Cells and NK Cells

"Chromatin dynamics that regulate Ifng gene expression are incompletely understood. By using cross-species comparative sequence analyses, we have identified conserved noncoding sequences (CNSs) upstream of the Ifng gene, one of which, located −22 kb from the transcriptional start site, contains clustered consensus binding sequences of transcription factors that function in T cell differentiation. CNS−22 was uniquely associated with histone modifications typical of accessible chromatin in both T helper 1 (Th1) and Th2 cells and demonstrated significant and selective T-bet (T-box transcription factor expressed in T cells, Tbx21)-dependent binding and enhancer activity in Th1 cells. Deletion of CNS−22 in the context of an Ifng reporter transgene ablated T cell receptor-dependent and -independent Ifng expression in Th1 effectors and similarly blocked expression by cytotoxic T lymphocytes and natural killer cells. Thus, a single distal element may be essential for Ifng gene expression by both innate and adaptive immune effector cell lineages."

Molecular biology: RNA in control

Cheah et al.[Nature] show that expression of the NMT1 gene is regulated at the level of pre-mRNA alternative splicing by a riboswitch that binds to thiamine pyrophosphate (TPP). a, At low concentrations of TPP, the TPP-binding (aptamer) region of the riboswitch base-pairs with sequences surrounding a splice site (red blocking line) in a nearby non-coding sequence, and prevents its selection by the splicing machinery. A distal splice site (green arrow) is selected, however, resulting in the generation of a shorter NMT1 mRNA with a coding sequence, or open reading frame (ORF), that translates into a functional NMT1-encoded protein (green signal). b, At high TPP levels, the aptamer undergoes a conformational rearrangement so that the region that was previously bound to the nearby splice site is now used to bind to TPP. This and other conformational changes (not shown) generate a longer mRNA splice variant that contains short, 'decoy' ORFs (red signal), preventing functional NMT1 expression.

Wednesday, May 23, 2007

Evolution of hydra, a recently evolved testis-expressed gene with nine alternative first exons in Drosophila melanogaster

"We describe here the Drosophila gene hydra that appears to have originated de novo in the melanogaster subgroup and subsequently evolved in both structure and expression level in Drosophila melanogaster and its sibling species. D. melanogasterhydra encodes a predicted protein of ~300 amino acids with no apparent similarity to any previously known proteins. The syntenic region flanking hydra on both sides is found in both D. ananassae and D. pseudoobscura, but hydra is found only in melanogaster subgroup species, suggesting that it originated less than ~13 million years ago. Exon 1 of hydra has undergone recurrent duplications, leading to the formation of nine tandem alternative exon 1s in D. melanogaster. Seven of these alternative exons are flanked on their 3' side by the transposon DINE-1 (Drosophila Interspersed Element-1). We demonstrate that at least four of the nine duplicated exon 1s can function as alternative transcription start sites. The entire hydra locus has also duplicated in D. simulans and D. sechellia. D. melanogasterhydra is expressed most intensely in the proximal testis, suggesting a role in late-stage spermatogenesis. The coding region of hydra has a relatively high Ka/Ks ratio between species but the ratio is less than one in all comparisons, suggesting that hydra is subject to functional constraint. Analysis of sequence polymorphism and divergence of hydra shows that it has evolved under positive selection in the lineage leading to D. melanogaster. The dramatic structural changes surrounding the first exons do not affect the tissue specificity of gene expression: hydra is expressed predominantly in the testes in D. melanogaster, D. simulans and D. yakuba. However, we have found that expression level changed dramatically (~>20 fold) between D. melanogaster and D. simulans. 
While hydra initially evolved in the absence of nearby transposable element insertions, we suggest that the subsequent accumulation of repetitive sequences in the hydra region may have contributed to structural and expression-level evolution by inducing rearrangements and causing local heterochromatinization. Our analysis further shows that recurrent evolution of both gene structure and expression level may be characteristics of newly evolved genes. We also suggest that late-stage spermatogenesis is the functional target for newly evolved and rapidly evolving male-specific genes."

Using the transition/transversion (κ) ratio test to detect functional polypeptides

"To confirm our results using an independent nucleotide-based approach (as opposed to the codon-based test described earlier), we applied the transition/transversion (κ) ratio test to make inferences about biological significance of ARFs. The test is based on the following reasoning: in most standard protein-coding regions (with only one reading frame), κ at the third codon position (κ3) is significantly different (higher) than at the first and second codon positions (κ12), so that κ12 < κ3 [15]. This is because most substitutions at the third codon position are synonymous, whereas in the first codon position all but eight substitutions are nonsynonymous, and all substitutions in the second codon position are nonsynonymous. By contrast, in overlapping reading frames, codon positions are codependent. For example, in a +1 ARF, the third codon positions correspond to the first codon positions of the canonical frame. Thus, almost every change in the third codon position of the ARF is guaranteed to change amino acids encoded in the canonical frame. However, if the ARF encodes a truly functional product, purifying selection would resist such changes, and the condition κ12 < κ3 would not hold. This gives us the opportunity to test functionality of ARF in our dataset by contrasting two hypotheses: H0: κ12 = κ3 (ARF does encode functional polypeptide) and HA: κ12 < κ3 (ARF does not encode functional polypeptide). To perform this test, we used a maximum likelihood framework to test κ12 and κ3 for equality [16]. Application of the test to our list of dual-coding genes identified 18 candidates" PLoS Computational Biology - A First Look at ARFome: Dual-Coding Genes in Mammalian Genomes
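The idea behind the κ12 versus κ3 comparison can be illustrated with a rough sketch. This is not the maximum likelihood test used in the paper: it only counts the observed transitions and transversions between two aligned, in-frame sequences and takes their raw ratio, with no correction for multiple hits. The function name kappa_by_codon_position and the toy sequences are hypothetical, made up for illustration.

```python
# Raw transition/transversion ratios at codon positions 1+2 versus position 3.
# Assumption: seq1 and seq2 are aligned, gap-free, and start in frame.
# This is a simple count-based illustration, not the ML test from the paper.

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def kappa_by_codon_position(seq1, seq2):
    """Return (kappa12, kappa3) as raw transition/transversion count ratios.

    A ratio is None when no transversions were observed at those positions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if len(seq1) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")

    # counts[key] = [transitions, transversions]
    counts = {"12": [0, 0], "3": [0, 0]}
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip identical sites and ambiguous bases
        key = "3" if i % 3 == 2 else "12"
        kind = 0 if (a, b) in TRANSITIONS else 1
        counts[key][kind] += 1

    def ratio(ts, tv):
        return ts / tv if tv else None

    return ratio(*counts["12"]), ratio(*counts["3"])


# Toy example (hypothetical sequences).
seq_a = "ATGGCCAAATTTCGAGGA"
seq_b = "ATGGCTAAGTTATCCGGG"
kappa12, kappa3 = kappa_by_codon_position(seq_a, seq_b)
print(kappa12, kappa3)
```

In an ordinary single-frame coding region one would expect kappa12 to be lower than kappa3; in a functional overlapping frame the quoted reasoning predicts the two to be similar. With real data the counts are small and noisy, which is why the authors use a likelihood framework instead.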

300 bp intronic regulatory element in the hand gene

"Herein we describe the identification of a regulatory region in the hand gene essential and sufficient for the expression in the visceral mesoderm during embryogenesis. We found that hand expression in the circular visceral mesoderm is abolished in embryos mutant for the FoxF domain containing transcription factor Biniou. Furthermore we demonstrate that Biniou regulates hand expression by direct binding to a 300 bp sequence element, located within the 3rd intron of the hand gene. This regulatory element is highly conserved in different Drosophila species. In addition, we provide evidence that Hand is dispensable for the initial differentiation of the embryonic visceral mesoderm." (Hand is a direct target of the forkhead transcription factor Biniou during Drosophila visceral mesoderm differentiation.)

Regulatory conservation of protein coding and miRNA genes in vertebrates: lessons from the opossum genome

"Analysis of 145 intergenic microRNA and all protein coding genes revealed that the upstream sequences of the former are up to twice as conserved as the latter amongst mammals, except in the first 500 bp where the conservation is similar. Comparison of the promoter conservation in 513 protein coding genes and related transcription factor binding sites (TFBSs) showed that 41% of the known human TFBSs are located in the 6.7% of promoter regions that are human-opossum conserved. Some core biological processes showed significantly smaller number of conserved TFBSs in human-opossum comparisons, suggesting greater functional divergence. A new measure of efficiency in multi-genome phylogenetic footprinting (BRPR) shows that including human-opossum conservation increases the specificity in finding human TFBSs."

Friday, May 18, 2007

The first marsupial genome sequence [Nature Rev Gen]

"One surprising finding from comparing the opossum and human genomes is that most sequence innovation in the human genome following the eutherian split from the metatherian lineage has occurred in non-coding sequences (20% being lineage-specific in eutherians), rather than in coding sequences (only 1% are absent in metatherians). Many of these non-coding sequences are in regions surrounding important developmental genes, indicating that they are functional regulatory elements. The authors found a high degree of overlap between eutherian-specific sequences and transposable elements (16%), which might have served as a driving force in the evolution of the eutherian genome." The first marsupial genome sequence : Article : Nature Reviews Genetics

Open my new Blog

Only for Research!