However, most deep learning installation tutorials are lacking in several respects, and the rest don't really work (or skip steps). Hence this anti-tutorial, which simply gives you the magic one-liner (à la installing brew) to get all the common libraries set up and configured on the latest P2-class machines running Ubuntu 16.04 in Amazon's EC2.

The incantation below will give you everything you could possibly need, including

- Latest CUDA 8
- Latest cuDNN 5.1
- MXNet
- Torch
- Theano
- TensorFlow
- Keras
- OpenCV & DLib

For the impatient, I've made a public image (ami-be391da9) available on AWS. Alright (drum roll please), here's the single-line install command; simply paste it into the terminal:

`wget http://public-fileshare.s3.amazonaws.com/install.gz; tar xzf install.gz; rm install.gz; cd scripts; chmod +x master_installer.sh; ./master_installer.sh`

It should take about two hours to complete. Once it's done, you can check that everything works:
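The original post showed a screenshot of the checks here. As a sketch, one quick way to verify the Python side is to try importing each framework; the module names below are the usual ones for these libraries, and `check_imports` is a helper invented for this example:

```python
# Sanity-check the Python deep learning stack by attempting each import.
import importlib

def check_imports(module_names):
    """Return a dict mapping each module name to True/False for import success."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results

if __name__ == "__main__":
    frameworks = ["mxnet", "theano", "tensorflow", "keras", "cv2", "dlib"]
    for name, ok in sorted(check_imports(frameworks).items()):
        print("{:12s} {}".format(name, "OK" if ok else "FAILED"))
```

(Torch itself is Lua-based, so check it separately by launching the `th` REPL.)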

As you can see, Torch and all the Python imports work, but please let me know in the comments if anything breaks for you. Happy coding!

Every year since 2003, Stephen Wolfram and a group of instructors from Wolfram Research have organized a summer school; I participated as a student in 2010 and as an instructor in 2011. Since 2003, more than 250 people have taken part, some of whom continued developing their 3-week research projects into Master's or Ph.D. theses or publications.

The basic subject of NKS is the study of simple abstract rules, that is, elementary computer programs. In the early 1980s Wolfram discovered a class of one-dimensional cellular automata which he called the elementary cellular automata (ECA). Simple programs are capable of a remarkable range of behavior, and NKS argues that this is evidence that simple programs are enough to capture the essence of almost any complex system.
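As a sketch of how compactly an ECA can be implemented (this helper is my own illustration, not code from NKS), the rule number's eight bits can be read directly as the lookup table over the eight possible three-cell neighborhoods:

```python
def eca_step(cells, rule):
    """Advance an elementary cellular automaton one step.

    `rule` is the Wolfram rule number (0-255); bit k of the number gives the
    new value for a cell whose (left, center, right) neighborhood spells out
    k in binary. Boundaries are treated as cyclic.
    """
    n = len(cells)
    return [(rule >> ((cells[i - 1] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
            for i in range(n)]

# Evolve Rule 90 from a single black cell: the familiar Sierpinski triangle.
row = [0] * 15
row[7] = 1
for _ in range(5):
    print("".join("#" if c else "." for c in row))
    row = eca_step(row, 90)
```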

Generally speaking, NKS has three forms:

- The experimental science and enumeration of simple programs
- The technological application of interesting simple programs
- A razor for thinking of the universe in computational terms

The central thesis of NKS posits two ideas: that the nature of computation must be explored experimentally, and that the results of these experiments have great relevance to understanding the natural world (which is assumed to be digital).

Edward Fredkin and Konrad Zuse pioneered the idea of a **computable universe**. In physics and cosmology, digital physics is a collection of theoretical perspectives based on the premise that the universe is, at heart, describable by information, and is therefore computable. The computational universe is the space of all computer programs.

In order to study simple rules and their often complex behavior, Wolfram believes it is necessary to systematically explore all of these computational systems and document what they do. Traditional wisdom holds that the real world is complex but that in formal systems on paper everything is simple. Rule 30 and other systems show that this is not the case: the richness of the world can be faithfully reproduced in formal systems, and in simple programs at that.

The randomness displayed in Rule 30 is completely deterministic. Indeed, in NKS randomness almost always refers to the outcome of a deterministic process; a synonym is 'complexity'. Now, there exist three mechanisms for randomness:

- Randomness from initial conditions
- Randomness from the environment
- Intrinsic randomness generation

Sensitive dependence on initial conditions in deterministic nonlinear systems has been well studied since Henri Poincaré's work on the three-body problem in 1890. Randomness from the environment is not a true source of randomness, because the environment is just another system. That is, the external environment of one system is the internal environment of a larger system. The third type of randomness was investigated by Wolfram and encountered on numerous occasions in his exploration of the computational universe.

A common question is "why wasn't intrinsic randomness discovered earlier?" Page 48 replies: "Indeed, even very early in the history of traditional mathematics there were already signs of the basic phenomenon of complexity. One example known for well over two thousand years concerns the distribution of prime numbers: the rules for generating primes are simple, yet their distribution seems in many respects random. However almost without exception mathematical work on primes has concentrated not on this randomness, but rather on proving the presence of various regularities in the distribution."

The NKS strategy for exploring the computational universe can be summarized as follows:

- Identify a computational system to explore
- Devise an enumeration scheme for your system
- Construct an efficient implementation of your system
- Carry out a visual exploration of the space of system behavior
- Use machine learning to build detectors and filters for programmatic searching
- Investigate in more detail specific features and properties of selected rules

Loosely speaking, a process is computationally irreducible [p 737-750] if it cannot be computed by a faster process. An example of reducibility is Rule 90. Given the initial condition and the coordinates of some cell, there is a quick procedure [p 608-609] to determine whether the cell will be black or white. On the other hand, irreducibility means that there can be no way to predict how a system will behave except by going through almost as many steps of computation as the evolution of the system itself. [p 739] Rule 150 is computationally reducible because there exists a program that computes the color of a cell on row T faster than the automaton itself can compute it. Rule 30 is computationally irreducible because (as far as anyone knows) there is no such "shortcut" program. But we must ask whether intrinsic randomness and computational irreducibility are the same thing. I think that computational irreducibility implies intrinsic randomness generation. But intrinsic randomness generation does not imply computational irreducibility, because one can imagine modifying Rule 30 to slow it down. Wolfram posits a conjecture on computational irreducibility:

**Conjecture** - Let WorstTime(T, n) be the longest time that Turing machine T takes to halt over all inputs of length ≤ n on which it does halt. (See computations of this, pp. 761, 763.) Call T reducible if there is some Turing machine U with the same inputs and outputs such that

WorstTime(U, n)/WorstTime(T, n) → 0 as n → ∞.

Then, as the number of states grows to infinity, the fraction of reducible Turing machines vanishes.
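To make the reducible case concrete, here is a sketch (my own helpers, not Wolfram's code) comparing a brute-force evolution of Rule 90 with one such shortcut: from a single black cell, the Rule 90 pattern is Pascal's triangle mod 2, so the cell at step t and offset x equals C(t, (t+x)/2) mod 2, which Lucas' theorem reduces to a single bitwise test.

```python
def rule90_simulate(t, x):
    """Brute force: evolve Rule 90 (new cell = left XOR right) for t steps
    from a single black cell, then read the cell at offset x from center."""
    width = 2 * t + 5
    cells = [0] * width
    center = width // 2
    cells[center] = 1
    for _ in range(t):
        cells = [cells[i - 1] ^ cells[(i + 1) % width] for i in range(width)]
    return cells[center + x]

def rule90_shortcut(t, x):
    """Shortcut: the cell equals C(t, (t+x)/2) mod 2. By Lucas' theorem,
    C(t, k) is odd exactly when the binary digits of k are a subset of t's."""
    if abs(x) > t or (t + x) % 2 != 0:
        return 0
    k = (t + x) // 2
    return 1 if (t & k) == k else 0

# The shortcut answers in O(1) bit operations what the simulation answers
# in O(t^2) cell updates; no analogous shortcut is known for Rule 30.
for t in range(8):
    assert all(rule90_simulate(t, x) == rule90_shortcut(t, x)
               for x in range(-t, t + 1))
```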

The previous conjecture is an instance of the more general Principle of Computational Equivalence (PCE). The principle states that systems found in the natural world can perform computations up to a maximal universal level of computational power. Most systems can attain this level and in principle can compute the same things as a universal Turing machine.

**Weak PCE**: Almost every rule is either trivial or universal.

**Strong PCE**: Almost every computation is either trivial or universal.

Irreducibility is present within the theory of graphs; for instance, the theory of finite cubic planar graphs is undecidable. Consider a Turing machine with a two-way tape, a single head, and states s1, s2, · · · , sn. Let the tape have a certain initial configuration and allow the machine to run. Assume the machine performs a finite computation and halts. The main idea is to code the printouts of all such computations as finite cubic planar graphs. Notice that in order to do this for a given machine we need only a fixed number of configurations to code our information, i.e. black, s1, s2, · · · , sn, head, left end of the tape, right end of the tape, and halt. The requirement that the graphs be cubic appears to offer no more difficulty than that of guaranteeing this fixed number of unambiguous configurations. Thus, we can identify certain vertices with squares of the tape while the others exist in configurations describing what is in those squares and how the machine moves.

**Theorem** - The theory of finite cubic planar graphs is undecidable.

Proof: Consider a Turing machine as above. For each possible beginning position of the tape consider the sentence in the language of graphs which says

[there exist (x1, ..., xn) where the xi are in the configurations which represent this initial position] implies that [there exist (z1, ..., zm) where the zi are in the halt configuration]. (*)

A solution to the decision problem for finite planar cubic graphs would imply a solution to the collection of all sentences (*); but these simply express the halting problem for the machine, thus proving the theorem. QED

Note that the PCE is not about functions or sets but rather concerns repeated application of some updating procedure to some state, and the history of states that results. In these terms, PCE states: for any updating procedure, either that procedure always yields simple results, or that procedure can be made to emulate any computation via an appropriate choice of initial conditions. Computation is therefore simply a question of translating inputs and outputs from one system to another. Consequently, most systems are computationally equivalent. Proposed examples of such systems are the workings of the human brain and the evolution of weather systems.

Originally, in 1936, Alan Turing proposed that all functions which could be effectively computed by a human could also be computed by a Turing Machine. (Of course, the converse is obviously true: anything computed by a Turing Machine can be computed by a human.) Since that time, Turing Machines have become widely accepted as the formal definition of computation, and Turing's claim that this definition conforms to our intuitive understanding of the concept of computation is known as the Church-Turing Thesis. Note that the Church-Turing Thesis cannot be definitively proved. It is an empirical claim supported by observation, like Newton's law of gravity or the laws of quantum mechanics. Just as it is conceivable that those laws could be proved wrong by future observations, so too could the Church-Turing Thesis be proved wrong by future observations.

In every system there is a threshold for complexity. And although it is often low, there are many things that fall below it and can therefore be reduced in one form or another. Traditional mathematics has very successfully explored what lies below this threshold. But most systems from traditional mathematics are in fact capable of showing irreducible complexity: in particular, the digits of real numbers, continued fractions, Diophantine equations, differential and difference equations, solvability by radicals, etc. Yet phenomena such as the complexity of the digits of π were just considered meaningless rather than indicative of a general class of behavior. The reasoning must have been something like this: “There is no reducible pattern in the digits of π, therefore it cannot be understood, therefore it is meaningless, so let's ignore it.” Instead, NKS says “But isn't that interesting!”

So given a particular computation, how can we determine whether the computation halts? We know there is no general procedure that can answer all such questions. Nevertheless, there is a partial solution: if a computation does halt, then we can always verify that it halts by explicitly performing the computation and waiting for it to halt. (Of course, if the computation never halts, then we will be waiting forever.) But could there be a more efficient procedure of which we are unaware? The ideas in A New Kind Of Science suggest that the answer is “no”. One of the central claims of A New Kind Of Science is that there exist computational systems whose behavior is computationally irreducible, meaning that there is no shorter way to determine the system's behavior than by explicitly following its computation. If computationally irreducible systems do indeed exist, then there is, in general, no faster way to determine whether an arbitrary computation halts than to explicitly perform that computation.

Now, Cellular Automata are the primary systems studied in A New Kind Of Science, so it is important for us to show that Cellular Automata are equivalent to Turing Machines in their computational power.

**Definition** - Whenever a system is equivalent to Turing machines in its computational power, we say that it is **computationally universal**.

The name universal derives from the Church-Turing Thesis, which claims that Turing Machines can compute all that can in principle be computed. The technique that we use to show that one system is equivalent in computational power to another system is called emulation.

Above is a graphic that shows a Cellular Automaton emulating a particular Turing Machine. The essential idea behind the emulation is to:

- Identify the Turing Machine's tape with the cellular automaton's line of cells.
- Identify the white and black cells in the Turing Machine with white and light-gray cells in the cellular automaton.
- Identify the Turing Machine's head with a single cell in the cellular automaton, where the color of the cell encodes the state of the head together with the color that the head is reading.

It is clear that the behavior of this cellular automaton corresponds directly to the behavior of the Turing machine. For example, if we conventionally declare that the Turing Machine halts when the head moves right of its initial position, then the corresponding cellular automaton should be considered to halt whenever the cell which simulates the head moves right of its initial position. Of course, there is nothing special about this particular Turing Machine. Any Turing Machine can be emulated by a cellular automaton using similar methods.
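The construction above can be sketched in code. The cell alphabet is the tape symbols plus compound "head" values that package the head's state with the symbol under it; everything below (the names and the toy machine) is my own illustration, not the specific automaton from the book.

```python
def ca_step(cells, delta):
    """One synchronous step of a CA emulating a Turing machine.

    A plain cell holds a tape symbol; the head cell holds ('H', state, symbol).
    delta maps (state, symbol) -> (new_state, symbol_to_write, 'L' or 'R').
    Each cell's new value depends only on itself and its two neighbors, so
    this really is a radius-1 cellular automaton over a larger alphabet.
    """
    n = len(cells)
    out = []
    for i in range(n):
        c = cells[i]
        left = cells[i - 1] if i > 0 else None
        right = cells[i + 1] if i < n - 1 else None
        if isinstance(c, tuple):
            # The head leaves this cell behind, writing its output symbol.
            _, write, _ = delta[(c[1], c[2])]
            out.append(write)
        elif isinstance(left, tuple) and delta[(left[1], left[2])][2] == 'R':
            out.append(('H', delta[(left[1], left[2])][0], c))
        elif isinstance(right, tuple) and delta[(right[1], right[2])][2] == 'L':
            out.append(('H', delta[(right[1], right[2])][0], c))
        else:
            out.append(c)
    return out

# Toy machine: in state 'A', flip every bit and keep moving right.
# (If the head walks off the edge, it disappears, which we can read as halting.)
delta = {('A', 0): ('A', 1, 'R'), ('A', 1): ('A', 0, 'R')}
tape = [('H', 'A', 1), 1, 0, 0]
for _ in range(3):
    tape = ca_step(tape, delta)
print(tape)
```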
