Wittkopp, P. J. & Kalay, G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2011).
Hill, M. S., Vande Zande, P. & Wittkopp, P. J. Molecular and evolutionary processes generating variation in gene expression. Nat. Rev. Genet. 22, 203–215 (2021).
Fuqua, T. et al. Dense and pleiotropic regulatory information in a developmental enhancer. Nature 587, 235–239 (2020).
de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014).
Kondrashov, D. A. & Kondrashov, F. A. Topological features of rugged fitness landscapes in sequence space. Trends Genet. 31, 24–33 (2015).
de Visser, J. A. G. M., Elena, S. F., Fragata, I. & Matuszewski, S. The utility of fitness landscapes and big data for predicting evolution. Heredity 121, 401–405 (2018).
Weirauch, M. T. & Hughes, T. R. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet. 26, 66–74 (2010).
Orr, H. A. The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 6, 119–127 (2005).
Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013).
Venkataram, S. et al. Development of a comprehensive genotype-to-fitness map of adaptation-driving mutations in yeast. Cell 166, 1585–1596 (2016).
Keren, L. et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell 166, 1282–1294 (2016).
Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
Ogden, P. J., Kelsic, E. D., Sinai, S. & Church, G. M. Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science 366, 1139–1143 (2019).
Pitt, J. N. & Ferré-D’Amaré, A. R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010).
Shultzaberger, R. K., Malashock, D. S., Kirsch, J. F. & Eisen, M. B. The fitness landscapes of cis-acting binding sites in different promoter and environmental contexts. PLoS Genet. 6, e1001042 (2010).
Mustonen, V., Kinney, J., Callan, C. G. & Lässig, M. Energy-dependent fitness: a quantitative model for the evolution of yeast transcription factor binding sites. Proc. Natl Acad. Sci. USA 105, 12376–12381 (2008).
Hartl, D. L. What can we learn from fitness landscapes? Curr. Opin. Microbiol. 21, 51–57 (2014).
Otwinowski, J. & Nemenman, I. Genotype to phenotype mapping and the fitness landscape of the E. coli lac promoter. PLoS ONE 8, e61570 (2013).
Sinai, S. & Kelsic, E. D. A primer on model-guided exploration of fitness landscapes for biological sequence design. Preprint at https://arxiv.org/abs/2010.10614 (2020).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. Proc. 34th International Conference on Machine Learning 3145–3153 (2017).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Fragata, I., Blanckaert, A., Louro, M. A. D., Liberles, D. A. & Bank, C. Evolution in the light of fitness landscape theory. Trends Ecol. Evol. 34, 69–82 (2019).
Payne, J. L. & Wagner, A. The causes of evolvability and their evolution. Nat. Rev. Genet. 20, 24–38 (2019).
de Boer, C. G. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 38, 56–65 (2020).
Crocker, J. et al. Low affinity binding site clusters confer Hox specificity and regulatory robustness. Cell 160, 191–203 (2015).
Habib, N., Wapinski, I., Margalit, H., Regev, A. & Friedman, N. A functional selection model explains evolutionary robustness despite plasticity in regulatory networks. Mol. Syst. Biol. 8, 619 (2012).
Gillespie, J. H. Molecular evolution over the mutational landscape. Evolution 38, 1116–1129 (1984).
Jerison, E. R. & Desai, M. M. Genomic investigations of evolutionary dynamics and epistasis in microbial evolution experiments. Curr. Opin. Genet. Dev. 35, 33–39 (2015).
Sæther, B.-E. & Engen, S. The concept of fitness in fluctuating environments. Trends Ecol. Evol. 30, 273–281 (2015).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (eds. Guyon, I. et al.) 5998–6008 (Curran Associates, 2017).
Weirauch, M. T. et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 31, 126–134 (2013).
Yang, Z. & Bielawski, J. P. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503 (2000).
Moses, A. M. Statistical tests for natural selection on regulatory regions based on the strength of transcription factor binding sites. BMC Evol. Biol. 9, 286 (2009).
Rifkin, S. A., Houle, D., Kim, J. & White, K. P. A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature 438, 220–223 (2005).
Peter, J. et al. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature 556, 339–344 (2018).
Erb, I. & van Nimwegen, E. Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters. PLoS ONE 6, e24279 (2011).
Gilad, Y., Oshlack, A. & Rifkin, S. A. Natural selection on gene expression. Trends Genet. 22, 456–461 (2006).
Alhusaini, N. & Coller, J. The deadenylase components Not2p, Not3p, and Not5p promote mRNA decapping. RNA 22, 709–721 (2016).
Yang, J.-R., Maclean, C. J., Park, C., Zhao, H. & Zhang, J. Intra and interspecific variations of gene expression levels in yeast are largely neutral: (Nei Lecture, SMBE 2016, Gold Coast). Mol. Biol. Evol. 34, 2125–2139 (2017).
Chen, J. et al. A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Res. 29, 53–63 (2019).
Payne, J. L. & Wagner, A. Mechanisms of mutational robustness in transcriptional regulation. Front. Genet. 6, 322 (2015).
Shoval, O. et al. Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science 336, 1157–1160 (2012).
van Dijk, D. et al. Finding archetypal spaces using neural networks. IEEE International Conference on Big Data 2634–2643 (2019).
He, X., Duque, T. S. P. C. & Sinha, S. Evolutionary origins of transcription factor binding site clusters. Mol. Biol. Evol. 29, 1059–1070 (2012).
Cliften, P. F. et al. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res. 11, 1175–1186 (2001).
Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).
Lehner, B. Selection to minimise noise in living systems and its implications for the evolution of gene expression. Mol. Syst. Biol. 4, 170 (2008).
Metzger, B. P. H., Yuan, D. C., Gruber, J. D., Duveau, F. & Wittkopp, P. J. Selection on noise constrains variation in a eukaryotic promoter. Nature 521, 344–347 (2015).
Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
Shalem, O. et al. Systematic dissection of the sequence determinants of gene 3’ end mediated expression control. PLoS Genet. 11, e1005147 (2015).
Kinney, J. B., Murugan, A., Callan, C. G. Jr & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).
Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
Kwasnieski, J. C., Mogno, I., Myers, C. A., Corbo, J. C. & Cohen, B. A. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc. Natl Acad. Sci. USA 109, 19498–19503 (2012).
Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019).
Townsley, K. G., Brennand, K. J. & Huckins, L. M. Massively parallel techniques for cataloguing the regulome of the human brain. Nat. Neurosci. 23, 1509–1521 (2020).
Renganaath, K. et al. Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 9, e62669 (2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).
Avsec, Ž. et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592–600 (2019).
Quang, D. & Xie, X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods 166, 40–47 (2019).
Zhou, H. et al. Towards a better understanding of reverse-complement equivariance for deep learning models in genomics. Proc. 16th Machine Learning in Computational Biology meeting 165, 1–33 (2022).
Morrow, A. et al. Convolutional kitchen sinks for transcription factor binding site prediction. Preprint at https://arxiv.org/abs/1706.00125 (2017).
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
Koo, P. K., Majdandzic, A., Ploenzke, M., Anand, P. & Paul, S. B. Global importance analysis: an interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput. Biol. 17, e1008925 (2021).
Quang, D. & Xie, X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. International Conference on Learning Representations (Poster) (2015).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. Software available from https://www.tensorflow.org/ (2015).
Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In Proc. 44th Annual International Symposium on Computer Architecture 1–12 (2017).
Li, J., Pu, Y., Tang, J., Zou, Q. & Guo, F. DeepATT: a hybrid category attention neural network for identifying functional effects of DNA sequences. Brief. Bioinform. 22, bbaa159 (2020).
Ullah, F. & Ben-Hur, A. A self-attention model for inferring cooperativity between regulatory features. Nucleic Acids Res. 49, e77 (2021).
Clauwaert, J., Menschaert, G. & Waegeman, W. Explainability in transformer models for functional genomics. Brief. Bioinform. 22, bbab060 (2021).
Hinton, G. & Tieleman, T. Lecture 6.5—RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 26–31 (2012).
Sinai, S. et al. AdaLead: a simple and robust adaptive greedy search algorithm for sequence design. Preprint at https://arxiv.org/abs/2010.02141 (2020).
Linder, J., Bogard, N., Rosenberg, A. B. & Seelig, G. A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Syst. 11, 49–62 (2020).
Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. Proc. Mach. Learn. Res. 97, 773–782 (2019).
Killoran, N., Lee, L. J., Delong, A., Duvenaud, D. & Frey, B. J. Generating and designing DNA with deep generative models. NeurIPS Computational Biology Workshop (2017).
Fortin, F.-A., Rainville, F.-M. D., Gardner, M.-A., Parizeau, M. & Gagné, C. DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012).
Jaeger, S. A. et al. Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics 95, 185–195 (2010).
Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972 (2006).
Sniegowski, P. D. & Gerrish, P. J. Beneficial mutations and the dynamics of adaptation in asexual populations. Phil. Trans. R. Soc. B 365, 1255–1263 (2010).
Szendro, I. G., Franke, J., de Visser, J. A. G. M. & Krug, J. Predictability of evolution depends nonmonotonically on population size. Proc. Natl Acad. Sci. USA 110, 571–576 (2013).
Orr, H. A. The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56, 1317–1330 (2002).
Bailey, T. L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
de Boer, C. G. & Hughes, T. R. YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 40, D169–D179 (2012).
Kent, W. J. BLAT—the BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002).
Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2012).
Smith, J. D., McManus, K. F. & Fraser, H. B. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol. Biol. Evol. 30, 2509–2518 (2013).
Liu, J. & Robinson-Rechavi, M. Robust inference of positive selection on regulatory sequences in the human brain. Sci. Adv. 6, eabc9863 (2020).
Rice, D. P. & Townsend, J. P. A test for selection employing quantitative trait locus and mutation accumulation data. Genetics 190, 1533–1545 (2012).
Denver, D. R., Morris, K., Lynch, M. & Thomas, W. K. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430, 679–682 (2004).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Thompson, D. A. et al. Evolutionary principles of modular gene regulation in yeasts. eLife 2, e00603 (2013).
Yassour, M. et al. Strand-specific RNA sequencing reveals extensive regulated long antisense transcripts that are conserved across yeast species. Genome Biol. 11, R87 (2010).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Wapinski, I., Pfeffer, A., Friedman, N. & Regev, A. Natural history and evolutionary principles of gene duplication in fungi. Nature 449, 54–61 (2007).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res. 48, D682–D688 (2020).
DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR–Cas systems. Nucleic Acids Res. 41, 4336–4343 (2013).
Fleiss, A. et al. Reshuffling yeast chromosomes with CRISPR/Cas9. PLoS Genet. 15, e1008332 (2019).
Horwitz, A. A. et al. Efficient multiplexed integration of synergistic alleles and metabolic pathways in yeasts via CRISPR–Cas. Cell Syst. 1, 88–96 (2015).
Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods 25, 402–408 (2001).
Vandesompele, J. et al. Accurate normalization of real-time quantitative RT–PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, research0034.1 (2002).
Teste, M.-A., Duquenne, M., François, J. M. & Parrou, J.-L. Validation of reference genes for quantitative expression analysis by real-time RT–PCR in Saccharomyces cerevisiae. BMC Mol. Biol. 10, 99 (2009).
Mardones, W. et al. Rapid selection response to ethanol in Saccharomyces eubayanus emulates the domestication process under brewing conditions. Microb. Biotechnol. https://doi.org/10.1111/1751-7915.13803 (2021).
Ibstedt, S. et al. Concerted evolution of life stage performances signals recent selection on yeast nitrogen use. Mol. Biol. Evol. 32, 153–161 (2015).
Rich, M. S. et al. Comprehensive analysis of the SUL1 promoter of Saccharomyces cerevisiae. Genetics 203, 191–202 (2016).
Rest, J. S. et al. Nonlinear fitness consequences of variation in expression level of a eukaryotic gene. Mol. Biol. Evol. 30, 448–456 (2013).
Bergen, A. C., Olsen, G. M. & Fay, J. C. Divergent MLS1 promoters lie on a fitness plateau for gene expression. Mol. Biol. Evol. 33, 1270–1279 (2016).
Alstott, J., Bullmore, E. & Plenz, D. Powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE 9, e85777 (2014).