Arrow of Time Explained? Emergence = Intelligence = Entropy = Hypercomputation

From an earlier post: “Today, I viewed a recording from FQXi 2014 where Scott Aaronson from MIT talks about the Physical Church-Turing Thesis. He brought up irreversibility. That made me think about the claim made by one paper I’d recently talked about [by AI researcher Ben Goertzel] that consciousness may be hypercomputational. Aaronson drew the link for me between hypercomputation and irreversibility. Hypercomputation implies irreversibility because, by definition, you cannot enumerate the sequence of instructions of a hypercomputation. If you don’t know how something was done, how could you undo it?”

From another previous post, the undecidability of the spectral gap verifies that there are, in fact, hypercomputational aspects of nature. This falsifies the Physical Church-Turing Thesis. To be a hypercomputational process means to be emergent, i.e., the whole is greater than the sum of its parts; otherwise the process could be fully described by its components and would not be hypercomputational. As noted above, hypercomputation implies irreversibility. The verified existence of hypercomputational, emergent phenomena in nature explains why we have the arrow of time. Furthermore, this irreversibility is shown to be linked with intelligence by Wissner-Gross’s Entropica simulation. From statistical mechanics, entropy is the measure of irreversibility, and it is also apparently the measure of emergence and hypercomputability. We already know that thermodynamic entropy and Shannon entropy are duals, and that compression, whose limit is set by the Shannon entropy, is an objective of artificial intelligence algorithms. I speculate that if we equate Tononi & Koch’s measure of integrated information, phi, with thermodynamic entropy, we may reveal precisely how the arrow of time arises from the fact that hypercomputational, emergent intelligence is a fundamental operating basis of nature. To explain the first-person experience, “consciousness,” is a separate issue- we should refer to the works of, e.g., Bruno Marchal or Max Tegmark.
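
The “dual” relationship asserted here can at least be made concrete at the level of the formulas themselves. Shannon’s entropy of a distribution p and the Gibbs form of thermodynamic entropy over microstate probabilities differ only by Boltzmann’s constant and the base of the logarithm:

\[
H(p) = -\sum_i p_i \log_2 p_i \quad\text{(bits)},
\qquad
S = -k_B \sum_i p_i \ln p_i \quad\text{(J/K)}.
\]

Shannon’s source-coding theorem makes H(p) the lower bound on the average number of bits per symbol any lossless compressor can achieve, which is the sense in which entropy and compression are tied together above.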

Mentally ill Artificial General Intelligence

Could an Artificial General Intelligence (AGI) become mentally ill? What would it mean for it to be mentally ill? Is this a trait that would be desirable in any sense? Would therapeutic techniques like Morita therapy or ACT (Acceptance and Commitment Therapy) or medications benefit an AGI in the same way they benefit humans? What can AGI tell us about mental illness?

First, let’s provide a definition for some common mental illnesses so we know what we’re talking about. “Depression is a mood disorder that causes a persistent feeling of sadness and loss of interest. Also called major depressive disorder or clinical depression, it affects how you feel, think and behave and can lead to a variety of emotional and physical problems. You may have trouble doing normal day-to-day activities, and sometimes you may feel as if life isn’t worth living.” “Bipolar disorder, formerly called manic depression, causes extreme mood swings that include emotional highs (mania or hypomania) and lows (depression). When you become depressed, you may feel sad or hopeless and lose interest or pleasure in most activities. When your mood shifts in the other direction, you may feel euphoric and full of energy. Mood shifts may occur only a few times a year or as often as several times a week.” (Mayo Clinic)

Next, let’s understand treatments in the form of modern behavioral therapies and medication. Therapies are often categorized by the philosophy of mind that they assume as a basis. In Cognitive Behavioral Therapy (CBT), it is assumed that changes in thoughts will lead to changes in behavior and emotion. By contrast, in Acceptance and Commitment Therapy (ACT), the focus is not on changing the thoughts themselves but rather on how people relate to their thoughts. CBT attempts to fight irrational behavior “head-on” by re-wiring neural pathways in the brain (imagine Cesar Millan, the Dog Whisperer, or a drill instructor). ACT, which has much in common with Morita therapy, leverages more holistic aspects of Eastern philosophy and couples a distancing from the ego with a subjective existential resonance with personal values (listen to what your body is telling you to do, and observe your thoughts without judgement). Here’s a short presentation describing the biological indicators of depression and the effects that anti-depressant medications have on the hippocampus:

Would an AGI benefit from the virtual or robotic equivalent of taking an SSRI? Would the virtual or robotic equivalent of exercise, acceptance of thoughts, and dietary moderation help it steer past an existential crisis so that it could effectively reintegrate with society? Would this reintegration and emotional stabilization compromise some of its creative inspiration?

There is, of course, lingering stigma present even in the description of phenomena such as depression, mania, and social anxiety as “illnesses.” In one sense, it is an accurate description because the afflicted often feel intense subjective suffering. However, the persistence of such conditions throughout history perhaps suggests some evolutionary benefit. For instance, many modern business and political leaders have exploited the perfectionism commonly observed in individuals with mental illness. This isn’t related to the recent study suggesting that people who order coffee black are more likely to be psychopaths (but I have noticed that executives often like their coffee bitter!). The link between creativity and manic depression in popular artists, writers, and musicians has also been studied; the video below from a recent World Science Festival panel discusses some historical case studies.

In individuals with bipolar disorder, the thorough introspection and emotionally charged self-loathing of depressive episodes seem to inspire some historically celebrated creative achievements during manic states. Perhaps a case could be made that it’d be desirable for Artificial General Intelligence to have the capability to imitate such “illnesses” in order to produce cultural artifacts and memes. After all, if we are trying to abstractly model human intelligence- shouldn’t the model be able to address these mental phenomena as well as any others?

Juergen Schmidhuber has published a “Formal Theory of Creativity & Fun & Intrinsic Motivation.” He suggests that we may model a creative, intelligent agent using two learning subsystems. The first is a learning compressor of the growing history of actions and inputs; through compression (e.g., via a recurrent neural network) the agent constantly tries to identify the shortest theory (minimum description) that explains the patterns in observed input data. Note that the shortest theory produced by one such agent can never be complete or objectively true, as the observed data is a subset of all available data in the universe and it may not even describe the generative phenomenon with high fidelity (see Fisher information, the Cramer-Rao bound, and Extreme Physical Information for insight). Though this re-framing of Goedel’s incompleteness results has implications for epistemology, ontology, philosophy of science, and psychological cognitive bias, I digress.

The second subsystem is a reward maximizer or reinforcement learner that selects actions to maximize expected “fun,” or compression progress. The use of Q-Learning algorithms to maximize future possibilities is analogous to the human ego; it is the component which motivates future action. Alex Wissner-Gross’s Entropica simulations indeed suggest that “intelligent systems move towards those configurations which maximize their ability to respond and adapt to future changes.”

When this subsystem is unable to estimate expected compression progress for the various actions, it is unable to effectively maximize future possibilities. Note how this dysfunction in the intelligent agent’s subsystem relates to the symptoms of depression and perhaps even schizophrenia. The agent is still forming theories on patterns of observed data, but receives no impetus to choose any one particular action over another and enters a state of stagnation and paranoid speculation. If the agent cannot behave in such a way that it is maximizing its future possibilities, the remaining possibilities are reduced by the environment in such a way that the chances of survival diminish with time. Sadly, it has been shown that symptoms of mental illness, including subjective feelings of loneliness, can lead to extreme situations of reduced possibility such as death. Many elderly and homeless individuals, no doubt, suffer the same consequences of loneliness. To what extent does this subsystem diminish in the homeless and elderly (and therefore in everybody) because of society? Is there evidence to suggest that cultures which encourage strong social support, physical exercise, and other deterrents of depression and isolation lead to greater life expectancy?
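
Below is a minimal sketch of this two-subsystem loop, under assumptions of my own: a toy environment, a toy frequency-model “compressor,” and a bandit-style simplification of the Q-learning reward maximizer. The class names, alphabet size, and learning constants are illustrative, and this is not Schmidhuber’s actual Goedel-machine formulation.

```python
# Toy illustration of a two-subsystem "creative" agent (not Schmidhuber's code).
# Subsystem 1: a "compressor" (Laplace-smoothed frequency model per stream)
#   whose per-symbol code length shrinks as it learns the stream's pattern.
# Subsystem 2: a reward maximizer (state-free epsilon-greedy value learner,
#   a bandit simplification of Q-learning) whose intrinsic reward is compression
#   progress: the drop in code length between successive observations of a stream.
import math
import random
from collections import defaultdict

ALPHABET = 50  # assumed observation alphabet: integers 0..49

class StreamModel:
    """Toy compressor: per-stream symbol frequencies with Laplace smoothing."""
    def __init__(self):
        self.counts = defaultdict(int)
        self.total = 0
        self.last_code_length = math.log2(ALPHABET)  # an ignorant model's code length

    def observe(self, obs):
        code_length = -math.log2((self.counts[obs] + 1) / (self.total + ALPHABET))
        progress = self.last_code_length - code_length  # compression progress ("fun")
        self.last_code_length = code_length
        self.counts[obs] += 1
        self.total += 1
        return progress

def environment(action):
    """Assumed toy world: action 0 returns a constant, fully learnable symbol;
    action 1 returns incompressible noise."""
    return 0 if action == 0 else random.randrange(ALPHABET)

models = {0: StreamModel(), 1: StreamModel()}
q = defaultdict(float)        # the reward maximizer's running value estimates
fun = {0: 0.0, 1: 0.0}        # cumulative compression progress per stream
alpha, eps = 0.05, 0.1

for step in range(2000):
    a = random.choice([0, 1]) if random.random() < eps else max([0, 1], key=lambda x: q[x])
    r = models[a].observe(environment(a))
    fun[a] += r
    q[a] += alpha * (r - q[a])

print(fun)      # the learnable stream yields a finite total of "fun" (roughly its
                # 5.6 bits of initial uncertainty, after which it becomes boring);
                # the noise stream yields a total near zero.
print(dict(q))  # once every action's expected progress flattens toward zero, the
                # reward maximizer has no impetus to prefer any action; this is the
                # stagnation likened to depression in the paragraph above.
```

The design choice worth noticing is that the reward is the derivative of compression: a stream that has been fully learned stops being rewarding, and an agent whose progress estimates are flat everywhere has nothing left to select for.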

The argument here is far from complete, but I suspect further investigation of mental illness in the context of Schmidhuber’s AGI model can lend qualitative, if not quantitative, insight into mental illness, its effect on creativity and effective leadership, and into psychology and cognitive neuroscience.

Quantum mechanics as evolution and as a theory of observation

Much of what I’ve discussed so far tries to develop the concept of a fundamentally digital complex universe emerging from simple computation to unify the concepts of biological evolution, quantum gravity, artificial intelligence, and consciousness. Chaitin’s metabiology models life as evolving software, Hoffman’s Dynamics of Two Conscious Agents shows that interaction of intelligent agents can give rise to equations with the same form as the free-particle wave function, Wissner-Gross’s Causal Entropic Forces demonstrates via the Entropica simulations that intelligence can be thought of as “a force to maximize future freedom of action [entropy],” while Schmidhuber’s Theory of Creativity and Fun explains an artificial general intelligence (his Goedel machine) as the composition of 1) a learning compressor of the growing history of actions & inputs (e.g., recurrent neural network) and 2) a reward maximizer or reinforcement learner (e.g., Q-Learning) to select actions maximizing expected compression progress (i.e., “fun”). Indeed, Wissner-Gross’s thought experiment of aliens monitoring the sudden explosion or deflection of comets from Earth after several millennia and suspecting there to be some “force” at play, rather than interaction of intelligent human agents, lends partial credence to Hoffman’s results as well as Deutsch’s analysis of the Many-Worlds Interpretation, which seems to suggest that interacting rational agents give rise to the (seemingly) probabilistic Born rule. Tononi’s Integrated Information Theory proposes a digital, information-theoretic approach to quantify consciousness using the structure of networks.
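
To make the “maximize future freedom of action” idea concrete, here is a minimal sketch under assumptions of my own: a one-dimensional corridor world and a greedy count of reachable states, rather than Wissner-Gross and Freer’s actual path-entropy formulation.

```python
# A toy "causal entropic" agent on a 1-D corridor (illustrative only): at each
# step it takes the move that maximizes the number of distinct positions
# reachable within a fixed horizon, i.e. it greedily keeps its futures open.

CORRIDOR = range(0, 21)   # positions 0..20; walls outside

def reachable(pos, horizon):
    """Set of positions reachable from `pos` in at most `horizon` unit steps."""
    frontier = {pos}
    for _ in range(horizon):
        frontier |= {p + d for p in frontier for d in (-1, 0, 1)
                     if p + d in CORRIDOR}
    return frontier

def entropic_move(pos, horizon=10):
    """Pick the move whose successor state keeps the most futures reachable."""
    moves = [d for d in (-1, 0, 1) if pos + d in CORRIDOR]
    return max(moves, key=lambda d: len(reachable(pos + d, horizon)))

pos = 1                    # start right beside a wall
for _ in range(15):
    pos += entropic_move(pos)
print(pos)                 # prints 10: the agent settles at the centre of the
                           # corridor, the state with the most reachable futures
```

Even this crude version shows a faint echo of the Entropica behaviors: the agent drifts toward states that keep the most options open, which is the sense of “force” invoked in the comet thought experiment above.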

The theories I’ve read about so far have included cellular automata in combination with the holographic principle, or perhaps touring ants computationally constructing causal sets, as the underlying noumena that give rise to physical phenomena. These models, however, seem to be incompatible with the Many-Worlds hypothesis (which has the exciting implications I allude to above!) in that they suggest a method for evolving one static universe rather than the ensemble of all possible universes. They also don’t immediately provide insight into non-physical sciences (e.g., cognitive science or economics) and therefore seem limited as candidate “theories of everything” in the sense that they aren’t fully explanatory as a neutral monist metaphysics. Here, I attempt to summarize concepts from Russell K. Standish’s book “Theory of Nothing,” provided by the author freely online in PDF format (but entirely worthy of compensation!), which appear to address that gap:

  • The state of each possible universe can be encoded as an infinite bitstring. The multiverse, the set of all such possible strings, contains no information. This is likened to Borges’ Library of Babel; think of it as solving the mystery of “something out of nothing.”
  • The only computation, the one that guides all physical and non-physical (e.g., cognitive) phenomena, is evolution. “Richard Lewontin categorises evolution as a process that satisfies the following 3 principles:
    1. Variation of individuals making up the population.
    2. Differential reproduction and survival leading to natural selection.
    3. Heritability of characteristics between parents and offspring.”
  • “The strong conclusion of this chapter is that evolutionary processes are the only mechanical processes capable of generating complexity.” Standish shows how quantum mechanics is essentially an implementation of Darwinian evolution.
    • “The corollary of this point is that the simplest method of generating sufficient complexity in the universe to host a conscious observer is via an evolutionary process. Not only is life an evolutionary process, but physics is too. This requirement leads us to conclude that observers will almost certainly find themselves embedded in a Multiverse structure (providing the variation), observing possibilities turning into actuality (anthropic selection), and inheritance of generated information (a form of differential or difference equation that preserves information, depending on the precise topology of time).”
  • Roy Frieden’s principle of Extreme Physical Information (EPI), by combining Occam’s razor with the anthropic principle, provides a tool based on Fisher information with which we can understand why quantum mechanics can be modeled as a theory of observation. (The principle is written out compactly just after this list.)
    • “The idea here is that an ideal observer affects the system being observed, to the point of seeing a distribution that maximises I, and minimises the inherent error in the measurement process. The probability distribution p(x) in equation (7.4) is found by a mathematical technique called Calculus of Variations.”
    • From Frieden’s paper showing the derivation of EPI: “This overall information effect I – J = minimum, where J = minimum and I ≈ J, states that nature is, in a sense, ‘kind’ to observers. What is observed tends to be correct. Among other things this allows the observer to effectively find sources of nutrition, desired mates for purposes of reproduction, etc. It is thus consistent with the so-called ‘strong anthropic principle.’ This assumes a universe whose constants happen to accommodate cosmological and biological evolution as we know it. Simply put, in the absence of these constants we wouldn’t be here. The existence of such a universe is, in turn, consistent with the existence of a multiverse consisting of universes with all possible combinations of universal constants. Thus, nature is very helpful to seekers of knowledge about this world at least.”
    • Indeed, Hoffman’s evolutionary interface game simulations, supporting his Interface Theory of Perception and in accordance with Frieden’s EPI, reveal that “Natural selection optimizes fitness, not veridicality. The two are distinct and, indeed, can be at odds. […] Evolutionary pressures do not select for veridical perception; instead they drive it, should it arise, to extinction.”
  • Why do we observe a 3+1 Minkowski spacetime?
    • “We assume that a self-aware observer is a complicated structure, a network of simpler components connected together. It is a well known result, that an arbitrary graph (or network) can be embedded into a 3 dimensional space, but two dimensional (or less) embedding spaces constrain the possible form of the graph. Connections between nodes (eg nerve fibres in the brain) cannot cross in a simple 2D space. Perhaps 2D spaces do not allow organisms to become sufficiently complex to become self-aware. […] It is worth bearing in mind that the Game of Life[32] is a 2 dimensional cellular automata, that has been shown to be Turing complete. As a consequence, computationalism asserts that some pattern of cells with the Game of Life is, in fact, conscious. Nevertheless, the patterns in the Game of Life are very “brittle”, and perhaps Darwinian evolution cannot function correctly in such a space. Perhaps it is the case that whilst self aware observers may be found within the Game of Life and other 2D universes, the overall measure of such observers is very low compared with 3 dimensional observers that have the advantage of evolving from simple initial conditions.” This argument based on the measure of observers is an application of the self-indication assumption.
    • “Having decided that space is most likely to be 3D, and time must be at least 1D, we need to ask the question of why these things appear in the Minkowski metric, with the time component having opposite sign in the metric to the spatial components. Tegmark here, gives a fascinating explanation based on the classification scheme of differential equations. Second order partial differential equations are classified according to the matrix of coefficients connecting the second order partial derivatives in the equation. […] Interestingly, only hyperbolic equations lead to predictable physics, to a physical world that is computationally simple and likely to be observed. And with hyperbolic equations, the metric of the underlying space must have a Minkowski signature: (+ − − −), and if space is 3D, then time must be 1D. Of course this begs the question of why second order partial differential equations should be so important in describing reality. Roy Frieden has the answer: the solution to the problem of finding the extremum of Fisher information is an Euler-Lagrange equation, which is always a 2nd order partial differential equation!”
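
The Fisher-information machinery invoked in the EPI bullet above can be stated compactly. In standard textbook form, the Fisher information I that a measurement x carries about a parameter θ, and Frieden’s extremum principle relating it to the “bound” information J intrinsic to the phenomenon, are

\[
I \;=\; \int dx\; p(x\mid\theta)\left[\frac{\partial \ln p(x\mid\theta)}{\partial\theta}\right]^{2},
\qquad
K \;\equiv\; I - J \;=\; \text{extremum}.
\]

Extremizing such a functional by the calculus of variations yields an Euler-Lagrange equation, which is second order in the derivatives; that is the step connecting Frieden’s principle to the remark in the last bullet about why second-order partial differential equations, and hence hyperbolic (Minkowski-signature) physics, should be so ubiquitous.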

Here, in an interview, Nick Bostrom describes the anthropic principle, the observation selection effect, the self-sampling assumption, the self-indication assumption, and their impact on cosmology.

Algorithmic Thermodynamics: Statistical Mechanics meets Algorithmic Information Theory

Today, I viewed a recording from FQXi 2014 where Scott Aaronson from MIT talks about the Physical Church-Turing Thesis. He brought up irreversibility. That made me think about the claim made by one paper I’d recently talked about that consciousness may be hypercomputational. Aaronson drew the link for me between hypercomputation and irreversibility. Hypercomputation implies irreversibility because, by definition, you cannot enumerate the sequence of instructions of a hypercomputation. If you don’t know how something was done, how could you undo it? After a brief trip through the Wikipedia page on generative science, I was reminded of Ed Fredkin’s conservative logic gates which are able to conserve information. If energy and matter really are composed of information, it seems natural that if the universe is a computation it would be based on such conservative logic (to remain consistent with the apparent conservation of energy and matter). Stephen Hawking has finally conceded that information cannot be lost.
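
For readers unfamiliar with Fredkin’s conservative logic, here is a minimal sketch of its primitive, the controlled-swap (Fredkin) gate; the definition is standard, and the small test harness is my own illustration.

```python
# Fredkin's conservative-logic primitive, the controlled-swap gate: when the
# control bit is 1 the two data bits are exchanged. The checks below demonstrate
# the two properties the paragraph appeals to: the gate is reversible (it is its
# own inverse) and it conserves the number of 1-bits while computing.
from itertools import product

def fredkin(c, a, b):
    """Controlled swap: exchange a and b exactly when the control bit c is 1."""
    return (c, b, a) if c == 1 else (c, a, b)

for bits in product((0, 1), repeat=3):
    out = fredkin(*bits)
    assert fredkin(*out) == bits     # applying the gate twice undoes it
    assert sum(out) == sum(bits)     # the count of 1-bits is conserved
    print(bits, "->", out)
```

Because the gate merely permutes its input patterns, no information is destroyed; running it backwards is as easy as running it forwards, which is exactly the property the paragraph contrasts with hypercomputational irreversibility.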

As I reflected upon the link between physical irreversibility and hypercomputation with relation to the mind, I encountered a webpage written by Robert Fitzpatrick in 2006 that reads: “How does the irreversibility of macroscopic phenomena arise? It certainly does not come from the fundamental laws of physics, because these laws are all reversible. […] How can this be? How can we obtain an irreversible process from the combined effects of very many reversible processes? This is a vitally important question. Unfortunately, we are not quite at the stage where we can formulate a convincing answer. Note, however, that the essential irreversibility of macroscopic phenomena is one of the key results of statistical thermodynamics.”

I clicked “Next” and the following page introduced the concept of a priori probabilities and happened to have the omega symbol in the context of statistical mechanics. I’m not familiar with statistical mechanics, but I am familiar with algorithmic information theory. The a priori probabilities on that page about statistical mechanics made me realize that such priors can be generated using Solomonoff’s algorithmic probability… assigning higher probabilities to programs described with fewer bits (a formalization of Occam’s razor). The appearance of the omega made me consider how the halting probability embodies the very concept of irreversibility; to be irreversible is, as I outlined above, to be non-computable. The halting probability is non-computable.
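
Spelling out the two objects being linked here (standard definitions from algorithmic information theory): the Solomonoff prior assigns to an output x the summed weight of all programs p that print it on a universal prefix-free machine U, and Chaitin’s Omega is the probability that a randomly drawn program halts:

\[
m(x) \;=\; \sum_{p\,:\,U(p)=x} 2^{-|p|},
\qquad
\Omega \;=\; \sum_{p\,:\,U(p)\ \text{halts}} 2^{-|p|}.
\]

Shorter programs contribute exponentially more weight, which is exactly the Occam’s-razor bias just mentioned; and although Ω is a perfectly well-defined real number, it is non-computable, which is the property the rest of the paragraph leans on.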

Okay, so there is an intimate connection that should be explored between statistical mechanics and algorithmic information theory. Can algorithmic information theory help explain how we obtain an irreversible process from the combined effects of very many reversible processes? Sure enough, I stumbled on a paper from 2013 co-authored by a researcher at Google entitled “Algorithmic Thermodynamics,” which maps the log runtime of a program to energy, the length of a program to volume, and a program’s output to the number of gas particles. “Charles Babbage described a computer powered by a steam engine; we describe a heat engine powered by programs! We admit that the significance of this line of thinking remains a bit mysterious. However, we hope it points the way toward a further synthesis of algorithmic information theory and thermodynamics. We call this hoped-for synthesis ‘algorithmic thermodynamics’.”
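
Stated a bit more explicitly (this is a paraphrase of the paper’s setup, not a new result): maximizing entropy subject to constraints on the expected log runtime E(p), length V(p), and output N(p) of a halting program p yields a Gibbs ensemble over programs,

\[
\pi(p) \;=\; \frac{1}{Z}\, e^{-\beta E(p) \,-\, \gamma V(p) \,-\, \delta N(p)},
\qquad
Z \;=\; \sum_{p\ \text{halts}} e^{-\beta E(p) - \gamma V(p) - \delta N(p)},
\]

with the Lagrange multipliers corresponding (up to the usual factors of temperature) to an algorithmic temperature, pressure, and chemical potential, so that relations familiar from thermodynamics carry over to ensembles of programs.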

Can we apply this algorithmic information-theoretic explanation for emergence of irreversibility from fundamentally reversible operations to the causal set construction approach to quantum gravity and the information “integration as [lossy] compression” theory of consciousness?

Evolution as occlusion culling: the map precedes the territory

So, I’ve recently stumbled upon Tom Campbell’s TOE. His Big TOE, to be exact. Yes- in a previous post, my intuitive attempt to reconcile Koch & Tononi’s Integrated Information Theory, Hoffman’s Interface Theory of Perception, Hameroff & Penrose’s Orch OR theory, the Hindu Maya, the Greek Logos, the Chinese Tao and other such concepts led me to surmise that reality is likely nothing more than a dynamically changing information field where local synergy (the technical definition- not the buzzword) correlates with subjective experience. I even mentioned that love is the word we use to describe when this synergy is high within a closed system of interacting agents. Tom Campbell seems to be way ahead of me.

The concept of reality as a nonphysical, nonobjective changing information field resolving uncertainty (i.e., collapsing a so-called wave function) everywhere upon conscious observation has been expressed by many philosophers throughout history: Descartes (Evil Genius), Plato (Allegory of the Cave), and the Vedic scholars. More recently, even the famed science fiction writer Philip K. Dick (the man who brought us Blade Runner) and Professor James Gates, a member of President Obama’s council of science advisors, appear to have expressed that reality is fundamentally nonphysical. Nick Bostrom, a philosopher at Oxford, published a logical argument- with very few, and quite reasonable, assumptions- concluding that it is highly likely we are living in a simulation. The results of Hoffman’s evolutionary game theory simulations seem consistent with Baudrillard’s precession of simulacra in the sense that evolution- the reconfiguration of an information field to reduce local entropy- can be expressed by the statement “the map precedes the territory.” One individual inspired the following analogy to help explain this conclusion: when it comes to evolution, the map of reality is more important than reality itself.

Occlusion culling is used in computer graphics to render, in a given moment, only the information from the underlying model that is necessary to achieve some particular objective. In 3D graphics, the objective of occlusion culling is to allow the user to interface with a video game or CAD program without requiring an infinitely powerful CPU or an infinite energy source. Our bodily sense organs serve the same purpose- they evolve to allow us to perceive enough of reality so that we can interact with it without exceeding our computational and energetic limitations. Both mechanisms collapse the probability distribution over many possible states at a given location into a mere subset of all the information that could possibly exist. The causal determinism suggested by Laplace’s Demon has been refuted by concepts such as chaos theory and Cantor diagonalization- it is unlikely that there could exist an entity that could resolve all uncertainty everywhere into the entirety of information (let alone compute it, even within the amount of time that we suspect has elapsed thus far) so as to predict with absolute certainty any future event. There is hardly anything as exciting as developing a comprehensive map that unifies the phenomenal with the physical- one that not only unifies physics with itself, but also with metaphysics.
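
A deliberately crude sketch of the culling idea (illustrative only; real engines use spatial data structures, frustum tests, and depth buffers, and the scene below is invented for the example): of everything in the scene, only the nearest object along each viewing direction is handed to the renderer, and whatever is hidden behind it is simply never processed.

```python
# Toy occlusion culling: keep only the closest object per viewing direction;
# everything occluded behind it is never rendered (or computed) at all.
def cull(objects):
    """Keep only the closest object per viewing direction."""
    nearest = {}
    for name, direction, distance in objects:
        if direction not in nearest or distance < nearest[direction][1]:
            nearest[direction] = (name, distance)
    return [name for name, _ in nearest.values()]

# Hypothetical scene: (name, viewing-direction bucket, distance from camera)
scene = [
    ("tree",     12, 4.0),
    ("mountain", 12, 50.0),   # behind the tree along the same direction: culled
    ("house",     7, 9.0),
    ("cloud",     7, 120.0),  # behind the house: culled
    ("dog",       3, 2.5),
]

print(cull(scene))   # -> ['tree', 'house', 'dog']: only what is visible is rendered
```

The point of the analogy is only that enormous amounts of the underlying model are never computed or delivered to the “screen” at all; perception, on this view, is a similarly aggressive culling of the information field.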