The ability to regulate gene expression is of central importance for the adaptability of living organisms to changes in their external and internal environment. At the transcriptional level, binding of transcription factors (TFs) in the promoter region can modulate the transcription rate, making TFs central players in gene regulation. For some model organisms, information about the locations and identities of discovered TF binding sites has been collected in continually updated databases, such as RegulonDB for the well-studied case of E. coli. To reveal the general principles behind the binding-site arrangement and function of these regulatory architectures, we propose a random promoter architecture model that preserves the overall abundance of binding sites and use it to identify overrepresented binding-site configurations. This model is analogous to the random network model used in the study of genetic network motifs, where regulatory motifs are identified through their overrepresentation with respect to a “randomly connected” genetic network. Using our model, we identify TF pairs that coregulate operons in an overrepresented fashion, as well as individual TFs that act at multiple binding sites per promoter through, for example, cooperative binding, DNA looping, or multiple binding domains. We furthermore explore the relationship between promoter architecture and gene expression using three different genome-wide protein copy number censuses. Perhaps surprisingly, we find no systematic correlation between the number of activator and repressor binding sites regulating a gene and the level of gene expression. A position-weight-matrix model used to estimate the binding affinity of RNA polymerase (RNAP) to the promoters of activated and repressed genes suggests that this lack of correlation might in part be due to differences in basal transcription levels, with repressed genes having a higher basal activity level.
This quantitative catalogue relating promoter architecture and function provides a first step towards genome-wide predictive models of regulatory function.
The influence of promoter architectures and regulatory motifs on gene expression in Escherichia coli
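
As an illustration of the kind of null model described in the abstract, the following minimal Python sketch tests whether a TF pair co-occurs on more promoters than expected when binding sites are assigned at random. It is not necessarily the authors' exact model: it assumes that randomization means shuffling TF labels across all sites, which preserves each TF's total number of sites and (an additional assumption) the number of sites per promoter. The function names, the toy promoters, and the TF labels "A", "B", and "C" are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def pair_counts(promoters):
    """For each unordered TF pair, count the promoters where both TFs have a site."""
    counts = Counter()
    for sites in promoters:
        for pair in combinations(sorted(set(sites)), 2):
            counts[pair] += 1
    return counts


def shuffle_sites(promoters, rng):
    """Reassign TF labels to sites at random (assumed null model).

    Each TF keeps its total number of sites and each promoter keeps
    its number of sites; only the assignment of TFs to promoters changes.
    """
    labels = [tf for sites in promoters for tf in sites]
    rng.shuffle(labels)
    shuffled, start = [], 0
    for sites in promoters:
        shuffled.append(labels[start:start + len(sites)])
        start += len(sites)
    return shuffled


def pair_p_value(promoters, pair, n_samples=1000, seed=0):
    """Return the observed co-occurrence count and an empirical one-sided p-value."""
    rng = random.Random(seed)
    pair = tuple(sorted(pair))
    observed = pair_counts(promoters)[pair]
    # Count random architectures with at least as many co-occurrences as observed.
    hits = sum(
        pair_counts(shuffle_sites(promoters, rng))[pair] >= observed
        for _ in range(n_samples)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_samples + 1)


# Hypothetical toy data: each inner list holds the TFs bound at one promoter.
toy_promoters = [["A", "B"], ["A", "B"], ["A", "B"],
                 ["C"], ["C"], ["C", "C"], ["C"], ["C"]]
observed, p_value = pair_p_value(toy_promoters, ("A", "B"))
print(observed, p_value)
```

A small p-value for a pair indicates that its co-occurrence is overrepresented relative to this particular randomization; a different choice of which quantities to preserve would give a different null distribution.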