Unraveling the Complex Genetics of Living Past 105

Published in SingularityNET · Oct 8, 2019 · 16 min read

By Ben Goertzel and Michael Duncan

Why do some people live healthy and satisfying lives well past 100, whereas others die much younger?

Aging is a complex matter and a question like this will have no single answer. Lifestyle, circumstances, personal relationships, diet, exercise and numerous other factors are all likely to play a role in any particular person’s lifespan, long or short, healthy or not. But it is tempting to conjecture that, alongside other factors, the extremely long-lived may have some genetic “secret sauce” — something special in their DNA that helps them flourish past the century mark.

Studies of identical twins separated at birth demonstrate that there is a significant but not overwhelming correlation between their lifespans. This shows that, generally speaking, there is a genomic contribution to longevity, and also that the genome is only part of the story.

We know that aging, in general, is caused by a great variety of different factors. Longevity pioneer Aubrey de Grey speaks of the Seven Pillars of Aging. To the extent that living into one’s second century is a matter of dodging these seven species of bullets, one would expect it to involve a variety of different factors. The genetic, environmental and bodily-self-organizational factors that make your body good at avoiding mitochondrial damage might well be different from the ones that make your body good at avoiding cardiovascular trouble, etc. On the other hand, there are also both obvious and subtle interdependencies between the multiple factors, which relate to the well-known old-age dynamic of rapid systemic deterioration in which “one thing goes, and then suddenly a bunch of other things start going….”

As bio-AI researchers, whenever we see a property of humans or other organisms that depends on a complex combination of a variety of different genetic factors, we also see something different — we see an opportunity for AI to play a role. Teasing out complex combinations of multiple factors that together produce a complex result — this is one of the things AI tends to be good at, if one can supply the AI with the right type and quantity of data.

Toward this end, we have been working for more than a year now with a dataset of whole-genome sequences from 27 subjects of European ancestry documented to have reached ages between 105 and 117 (mean 110.3 years). The DNA data regarding these supercentenarians was provided by supercentenarians.com, and was gathered over years by a team led by James Clement and involving many others including biology visionary George Church.

This cohort data was generously provided to us by James Clement, who put in great effort to gather it from the individuals involved and has also analyzed it himself using various methodologies.

Our work on the genomics of supercentenarians is an ongoing endeavor and we are discovering new things each month. Later this fall we will begin writing up our results so far for formal academic publication. However, at this stage things have gotten interesting enough that it seems worthwhile to write up some of our preliminary results in this informal blog format, and to describe some of the research directions we’re exploring that we expect to yield even more exciting results in the fairly near future.

AI Against Aging: The Broader Quest

James Clement, like us, has a passion for healthspan extension: for using science and technology to help, in the phrase of our old friend and colleague Bruce Klein, “abolish the plague of involuntary death.” We are part of a rapidly growing global community of individuals who tremendously enjoy living and would prefer to continue doing so indefinitely, and who would like everyone else who shares this desire to be able to keep on living as long as they want to.

Like James Clement and Aubrey de Grey, we have been involved in the quest for radical human longevity since well before it began enjoying its current relative popularity. Now with Google Calico, Human Longevity Incorporated and other such ventures, the quest for a scientific “fountain of youth” has become mainstream. There are multiple conferences on the topic each year, such as the SENS events that dig deep into the science and the Raadfest (RAAD = Revolution Against Aging and Death) that mixes science with practical therapy and the motivational feel of a religious revival.

Much of the recent progress toward human lifespan extension doesn’t use AI at all. Greg Fahy recently published a landmark study showing that a small group of individuals, taking a carefully selected combination of therapies over a period, was able to “roll back aging” by years, in the sense that at the end of the experiment their biomarkers indicated their “biological age” was 1.5 years younger than at the start.

However, it has seemed to us for a long time that progress can be massively accelerated by bringing AI more fully into the picture. It’s toward this end that SingularityNET is launching a spinoff project, Rejuve.io, aimed specifically at the application of decentralized AI technologies to longevity data, and the use of a tokenomic model to incent individuals and clinics to provide health data to feed the SingularityNET longevity-AI analysis engines. And it’s toward this end that we’ve been working to apply SingularityNET-based AI to the supercentenarian data gathered by James Clement and his colleagues.

When we started applying AI to genomic data in 2001, there was already significant academic literature on the topic, but it was hard to interest pharma companies or government labs in something so cutting-edge. Nowadays it’s different and there are hundreds of startups applying AI to various sorts of biomedical data including genomics, and every major biomedical research institution has machine learning researchers in-house.

However, in some ways, we are still in very primitive times regarding the application of AI to complex biomedical phenomena like aging and lifespan. Most work applying AI to biological datasets is in the vein of “supervised pattern recognition” using single algorithms: basically you take a number of data-packages corresponding to people with a certain condition, and then a number of similar data-packages corresponding to “controls” (people without that condition), and ask an AI algorithm to learn a “model” that predicts whether a person has that condition or not by looking at their data-package. In the case of DNA analysis, the data-package corresponding to a person may be a list of the SNPs (Single-Nucleotide Polymorphisms, i.e. mutations or other genetic variations) in their personal genome.

This sort of analysis can be extremely valuable, but also has its limits. It’s very weak at uncovering the why underlying a disease or other biomedical condition. It’s not good at unraveling the complex multi-scale network dynamics that drive most of what’s interesting in biological systems. Just as modern neural machine translation systems can translate from English to Spanish without understanding the meaning of what they’re translating, so modern bio-AI systems can predict who is more likely to develop a certain disease based on DNA data without understanding how and why particular DNA factors lead to the advent of the disease.

For AI to play a large role in curing diseases and conquering aging, it needs to go beyond predictive modeling and go further toward understanding the complex biological processes underlying its predictions. We believe the path toward achieving this lies in the integration of multiple AI techniques, including some that are able to represent and manipulate knowledge at high levels of abstraction.

This is what motivates us to work on the analysis of biological data using the OpenCog AI system, which integrates multiple AI algorithms in a common representational and dynamical framework; and the SingularityNET platform, which enables heterogeneous AI tools to interact flexibly in a common “Society of Minds” type marketplace. Some of this work has involved the bio-AI tools created by the Hong Kong firm Mozi AI Health, which leverage both OpenCog and SingularityNET toward AI-driven systems bioinformatics — and are fully in the spirit of the “integrative AI” philosophy.

Supercentenarians, Like Methuselah Flies, Display an Unusual Number of Genetic Variants

We have compared the 27 supercentenarian genomes from James Clement’s dataset with a control group of 503 subjects from the European superpopulation of the 1000 Genomes cohort.

The first major thing one notices when studying this dataset is that there’s no simple silver bullet for understanding it. I.e. there is no single genetic variation — no single SNP, no single gene — that is glaringly different in the supercentenarians from the controls.

Put formally: a genome-wide association study (GWAS) of single-nucleotide polymorphisms in this dataset shows no hits of genome-wide significance (association p-value < 5 × 10⁻⁸), which is unsurprising for a dataset this small and unbalanced (n_case ≪ n_control) and a phenotype this complex.
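To make the threshold concrete, here is a minimal sketch of a single-SNP case/control test checked against the conventional genome-wide cutoff. The carrier counts are invented for illustration and are not values from the dataset, and a one-sided Fisher exact test is used only as a simple stand-in; the text above does not say which association test was actually run.

```python
from math import comb

# Conventional GWAS cutoff, roughly 0.05 corrected for about a million tests.
GENOME_WIDE_ALPHA = 5e-8


def fisher_one_sided(case_carriers, n_case, control_carriers, n_control):
    """One-sided Fisher exact p-value: the probability of seeing at least
    `case_carriers` carriers among the cases if carrier status were unrelated
    to case/control status (upper tail of the hypergeometric distribution)."""
    total = n_case + n_control
    carriers = case_carriers + control_carriers
    denom = comb(total, n_case)
    upper = min(n_case, carriers)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_case - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# Hypothetical SNP: carried by 12 of 27 cases and 120 of 503 controls.
p = fisher_one_sided(12, 27, 120, 503)
print(f"p = {p:.3g}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

With only 27 cases, a variant has to be very strongly enriched before a test like this can clear the cutoff, which is one way to see why a small, unbalanced cohort yields no single-SNP hits.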

However, there are still some dramatic differences between the supercentenarians and the controls. For instance, a comparison of the mean number of SNPs per sample reveals significantly more in the supercentenarians than in the controls (21% larger, p < 2.2e-16). That is, on average the supercentenarians carry 21% more genetic variations than the controls. This strongly suggests there is something special going on with their genomes; it’s just not something centered in one mutation or one gene.
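A burden comparison of this kind can be sketched as follows. The per-sample counts are toy numbers chosen only so that the arithmetic is easy to follow, and Welch's t statistic is an assumption on our part: the text does not state which test produced the reported p-value.

```python
from math import sqrt
from statistics import mean, variance


def burden_comparison(case_counts, control_counts):
    """Return (relative excess of the case mean over the control mean,
    Welch t statistic for the difference in means)."""
    m_case, m_control = mean(case_counts), mean(control_counts)
    standard_error = sqrt(
        variance(case_counts) / len(case_counts)
        + variance(control_counts) / len(control_counts)
    )
    diff = m_case - m_control
    return diff / m_control, diff / standard_error


# Toy per-sample SNP counts, invented for illustration only.
cases = [121, 118, 125, 120]
controls = [100, 98, 103, 99, 100]

excess, t_stat = burden_comparison(cases, controls)
print(f"cases carry {excess:.0%} more SNPs on average (Welch t = {t_stat:.1f})")
```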

This observation excited us because it reminded us of our earlier work with Genescient Corp. on the genomics of Michael Rose’s long-lived fruit flies, which were evolved via experimental evolution to live 4–5x longer than their ancestor flies. The Methuselah flies showed 23% larger (p < 2.2e-16) mutational burden than corresponding controls.

The following graph shows the number of SNPs (genetic variations) per chromosome for supercentenarians versus controls, and Methuselah flies vs. controls.

Further, these SNPs are distributed throughout the genome. This corroborates the “omnigenic” model of complex diseases:

…gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways.[1]

According to this way of thinking, most genes expressed in cell types relevant to a certain complex phenotype (such as aging, or longevity) are involved in its generation. This, in turn, explains why genetic variants found by GWAS for complex diseases and traits have low reproducibility and account for so little of the observed phenotypic variation, longevity included (the missing heritability problem).

Anyone who has worked with Michael Rose has heard him say time and time again words to the effect that: There is no single gene or pathway, and no small set of genes or pathways, that explains aging or longevity or any other complex disease or trait. Biological networks don’t work like Hollywood movies with a small set of protagonists and antagonists, easy to follow plots and story arcs. There is no narrative. There is just a complex network with a huge number of different biological actors on different scales, coordinating in nonlinear, self-organizing and sometimes chaotic patterns, ongoingly shaped and tuned by evolutionary processes. It’s not an easy sort of thing for the human mind to understand, but so it goes.

Since aging and extreme longevity manifest in all tissues, this also implies that “housekeeping genes” active in all cell types should be affected. Known longevity-associated genes are heavily represented in general cell maintenance-related pathways involved in proteostasis, the maintenance of appropriate types and quantities of proteins and removal of damaged ones, and autophagy, the removal of large multi-protein complexes and organelles. This also agrees with the hypothesis that accumulation of repair-resistant extracellular matrix protein cross-linking is the limiting process, or at least one of the limiting processes, determining maximum human lifespan.

Aubrey de Grey’s Seven Pillars provide a simple way to think about the diversity of factors and dynamics underlying aging. However, one wouldn’t expect the genetic variations in the supercentenarians to neatly partition into seven subgroups corresponding to the seven issues Aubrey has identified — or the nine issues identified in the 2013 paper “Hallmarks of Aging”, which follows and elaborates Aubrey’s analysis. When one digs into the underlying biological networks, one sees all sorts of interconnections between these high-level issues. To a substantial extent, they are going to involve overlapping sets of genes, pathways and processes — according to complex patterns that defy simplistic understanding in terms of narratives and protagonists.

In sum: The genetic differences are there, but they’re not focused in any one place and they’re not simple. So where can we go to explore the genomic underpinnings of longevity further, using this supercentenarian data? Is it all hopelessly complex for our petty human minds, so that we need to wait for a transhuman AGI to crunch the data for us?

Fortunately not. There are many kinds of further and deeper questions we can meaningfully ask of this data using current AI and bioinformatics technologies, including the one we have been pursuing lately: searching for meaningful subnetworks of the overall genomic network that play important roles in the supercentenarians’ longevity.

That is, ask:

Can we find subsets of interacting SNPs over-represented in the genomes of extremely long-lived subjects that could affect the gene networks of longevity associated pathways to produce the phenotype?

To find meaningful biological subnetworks relevant to longevity, one could proceed in a fully automated way, but it’s also interesting to explore a more biology-driven approach. One can look at specific hypotheses regarding what kinds of subnetworks might underlie aging, and see whether subnetworks matching these hypotheses seem to be significant in the supercentenarians’ longevity according to the data available.

Supercentenarians May Have a Significantly Different “Midlife Longevity Switch”

When the paper “Longevity‐related molecular pathways are subject to midlife “switch” in humans” by James Timmons, Claes Wahlestedt and colleagues came out in April of this year, we were intrigued. What they found was that, in the sixth decade of a typical human life, a certain linear protein‐coding RNA response to aging is essentially switched off. That is, the expression of these RNAs changes linearly with age up until about age 60, and after that the linear dependence goes away.

Specifically, looking at RNA in human muscle and brain tissue, Timmons, Wahlestedt et al identify ~800 RNA transcripts that are expressed at levels proportional to age in people 30–60 years, but not in people aged 60–90 years.[2]
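As a rough illustration of what such a “switch” looks like in data, the sketch below computes the correlation between age and the expression of one transcript separately in two age windows. The expression values are synthetic and the windowed Pearson correlation is our own simplification, not the statistical procedure used in the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def windowed_age_correlation(ages, expression, lo, hi):
    """Correlation between age and expression for subjects with lo <= age < hi."""
    pairs = [(a, e) for a, e in zip(ages, expression) if lo <= a < hi]
    return pearson_r([a for a, _ in pairs], [e for _, e in pairs])


# Synthetic transcript: rises steadily with age until 60, then hovers on a plateau.
ages = [30, 36, 42, 48, 54, 60, 66, 72, 78, 84]
expr = [1.0, 1.3, 1.6, 1.9, 2.2, 2.6, 2.4, 2.5, 2.4, 2.6]

r_before = windowed_age_correlation(ages, expr, 30, 60)
r_after = windowed_age_correlation(ages, expr, 60, 90)
print(f"r (30-60) = {r_before:.2f}, r (60-90) = {r_after:.2f}")
```

A transcript in the switch set would show a strong age dependence in the first window and little or none in the second.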

Furthermore, this change — this “midlife switch” — is mediated by a large number of genes that are well-known as regulators of aging and longevity in various non-human animals. Analysis shows that the action of lifespan-extension drugs like Rapamycin on animals directly impacts the animal orthologues of many of these midlife longevity switch genes. There also seems to be a significant role for non-coding DNA (DNA that doesn’t code for genes directly, which was called “junk DNA” back when we were in university, but is now known to have huge importance for cell regulation, function and growth) in this process.

Conjecturing just a bit, these results suggest a coherent “module” of interacting gene networks, including the known longevity-related IGF1/PI3K/mTOR pathway, is disrupted after a cumulative level of age-related damage is reached. Could this module play a large role in precipitating the accelerated functional decline and frailty characteristic of old age?

The midlife switch is surely not the whole story underlying human aging, or anywhere near. But a complex network can have subnetworks with particular causal force in specific contexts. Could this midlife switch subnetwork be especially impactful as regards aging?

This paper suggested to us a natural hypothesis: Maybe there is something unusual in our supercentenarians, relative to this midlife switch?

To explore this, we looked at a set of 38,000 SNPs taken from the exons of genes corresponding to the RNA in the midlife switch RNA transcript set.

We were interested not in the impact of any of these SNPs individually, but rather in combinational effects — in ways that multiple of these genetic variations might combine together to help the supercentenarians live so long.

Given these 38,000 SNPs, we created 2 binary variables for each variant, heterozygous and homozygous for the alternate allele, resulting in 76,000 features to be studied across the supercentenarians and controls. We then used this feature-set as input to OpenCog’s MOSES automated program learning engine, using MOSES to learn Boolean classification trees distinguishing supercentenarians from controls based on combinations of these midlife switch related SNPs. MOSES has been used for genomic data analysis in a number of prior studies, and has proven especially effective at learning complex, statistically valid models on imbalanced datasets where one category is very small.
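The encoding step can be sketched like this. The variant names, the input format (alternate-allele count per variant) and the example Boolean model are all hypothetical; the actual feature files and the models MOSES learned are not reproduced here.

```python
def encode_genotypes(alt_allele_counts):
    """Map each variant's alternate-allele count (0, 1 or 2) to two Boolean
    features: heterozygous (exactly one alternate allele) and homozygous for
    the alternate allele (two copies)."""
    features = {}
    for snp_id, count in alt_allele_counts.items():
        features[f"{snp_id}_het"] = count == 1
        features[f"{snp_id}_hom"] = count == 2
    return features


# Hypothetical sample with three variants; the identifiers are placeholders.
sample = {"snp_A": 0, "snp_B": 1, "snp_C": 2}
features = encode_genotypes(sample)


def example_model(f):
    """An invented Boolean rule of the general shape a program learner might
    return: a combination of features rather than any single variant."""
    return (f["snp_B_het"] and not f["snp_A_hom"]) or f["snp_C_het"]


print(features)
print("classified as case:", example_model(features))
```

Applying the same mapping to 38,000 variants gives the 76,000 features mentioned above, two per variant.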

Doing 10 cross-validated runs of the MOSES algorithm yielded 70 models, involving 57 SNPs spread across 68 genes. Out-of-sample accuracy statistics for the models were highly significant:

· balanced accuracy 0.73–1.0, F1 > 0.98

· positive predictive value 0.1 – 1.0

· negative predictive value > 0.9986
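For readers less familiar with these statistics, here is a minimal sketch of how they follow from a confusion matrix. The counts are a made-up held-out fold, not results from the study, and the sketch treats the supercentenarian class as the positive class; the text does not say which class or averaging scheme its F1 figure refers to.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts. Assumes each denominator is
    non-zero, i.e. the fold contains both classes and at least one
    prediction of each kind."""
    sensitivity = tp / (tp + fn)  # fraction of true cases recovered
    specificity = tn / (tn + fp)  # fraction of true controls recovered
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),  # positive predictive value (precision)
        "npv": tn / (tn + fn),  # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical fold: 3 supercentenarians and 50 controls held out.
metrics = classification_metrics(tp=2, fp=1, tn=49, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the per-class recall, so it is not inflated by the controls outnumbering the cases. Negative predictive value, by contrast, is close to 1 almost automatically when nearly every sample is a control.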

Two example models from the ensemble are shown below:

Qualitatively, what these results suggest is: There may be something unusual going on in supercentenarians, as related to the midlife switch and associated processes.

Possible Super-Fly Experiments

Given that both supercentenarians and Methuselah flies display generally increased SNP density, and that we already know there are functionally important age “switches” in Drosophila (e.g. selective midlife induction of mitochondrial fission via over‐expression of the GTPase gene, drp1 (DNM1L in humans), extends lifespan), it is natural to hypothesize that maybe

· a general “linear RNA” midlife switch exists in flies as well as in humans

· in Methuselah flies, perhaps this linear dependence extends much longer in life

This would be relatively easy to explore via gathering time-series gene expression data from existing populations of Methuselah flies.

The corresponding hypothesis regarding human supercentenarians would also be feasible to explore over a period of years, via gathering blood samples and doing gene expression analysis on a population of, say, supercentenarians and centenarians and nonagenarians.

It would also be very interesting to learn classification models for Methuselah fly vs. control fly, on features derived from fly midlife switch related genes, and then use OpenCog inference techniques to explore transferring these models to the human case. This would be a measure of how biologically similar the midlife switches are in the two organisms, beyond what one can tell by counting orthologous genes.

Next Steps for AI Analytics: Creative Interpretation

What we are working on now is using additional AI methods from the OpenCog, SingularityNET and Mozi toolsets to interpret these results and understand their meaning and implications.

The Mozi Gene Annotation Service, available as an AI service on the SingularityNET Beta V2 network, automatically annotates the SNPs appearing in the models found by MOSES with their effects on gene expression and the resulting protein interaction networks as represented in OpenCog’s Atomspace knowledge hypergraph. This is enabled via advance preparation of a special “bio-Atomspace” stocked with carefully-integrated knowledge from multiple biology databases. What the Atomspace allows is simultaneous integrated cross-referencing of MOSES AI analysis results with multiple cross-referenced databases in a common framework.

Screenshot of the Gene Annotation service being used to explore the connections among genes found relevant in an experiment

Once the results of MOSES analysis of a genomic dataset are represented in the Atomspace along with an abundance of biological background knowledge, these results can then be interpreted and post-analyzed using more sophisticated OpenCog tools such as the Probabilistic Logic Networks inference engine and the Hypergraph Pattern Miner.

The Pattern Miner scans through an Atomspace full of general biology knowledge and knowledge of the results from particular experiments, and tries to find combinational patterns that seem surprising or interesting in some way. Some patterns it flags as surprising turn out not to be very surprising:

X is in the nucleolus, Y is in the nucleolus, and X interacts with Y

Others point to combinations of potential real biomedical interest:

X interacts with Genes:

AKT1 and MAP2K3 and STK11 and LATS2

Why would there be a lot of genes that, in the supercentenarian population, interact with these 4 other genes in particular? Well, this is a question one could put to a human biologist, or instead (the approach we are currently taking) to the Probabilistic Logic Networks reasoning engine, running on the bio-Atomspace.
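The second pattern above has a simple shape: find every gene X whose interaction partners include all four named genes. The sketch below expresses that as a query over a plain dictionary. It only illustrates the shape of the pattern and is not how the Pattern Miner or the Atomspace work internally; the GENE_* names and their partner sets are invented, not real biology.

```python
def common_interactors(interactions, targets):
    """Return, in sorted order, the genes whose partner set contains every
    gene in `targets` (the targets themselves are excluded)."""
    targets = set(targets)
    return sorted(
        gene
        for gene, partners in interactions.items()
        if gene not in targets and targets <= partners
    )


# Toy interaction map; partner sets are invented for illustration.
interactions = {
    "GENE_A": {"AKT1", "MAP2K3", "STK11", "LATS2"},
    "GENE_B": {"AKT1", "STK11"},
    "GENE_C": {"AKT1", "MAP2K3", "STK11", "LATS2", "GENE_A"},
}

hits = common_interactors(interactions, ["AKT1", "MAP2K3", "STK11", "LATS2"])
print(hits)
```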

The Pattern Miner is being integrated into the SingularityNET Beta V2 portal as I write this, and it should be available for public use on the platform within a week. PLN will follow not too long after. One of our goals with the SingularityNET platform is to make tools like this as widely and easily available as possible. There is still a long way to go before our OpenCog / SingularityNET / Mozi software ecosystem is a really slick pipeline for biomedical data analysis, but we’re getting there step by step, and we feel this is at least as critical as our work solving specific problems like the genomics of supercentenarians.

There are many, many datasets out there relevant to healthy long life, and many AI and bioinformatics tools to be applied and integrated to understand all this data in a comprehensive way. Data analytics needs to be closely coupled with simulation modeling and automated operation of lab equipment — integration that can naturally be done in SingularityNET and OpenCog as well. But such integration is most effectively done in the context of concrete projects rather than in the abstract. We will be building out this toolset ongoingly, with the help of the SingularityNET community, in the context of our ongoing work using it to unravel the complex genomics and broader biology of healthy human longevity.

[1] Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169(7):1177–1186

[2] Timmons JA, Volmar CH, et al. Longevity‐related molecular pathways are subject to midlife “switch” in humans. Aging Cell. 2019;18:e12970
