Measuring the intelligence of intelligent machines has been an obsession of computer scientists since the dawn of the computer era. The pioneering genius of Alan Turing bequeathed us a test based on perception. His famous "imitation game" sets a human judge in conversation with some "thing" (or some "one") hidden behind a wall. If, following a question-and-answer session, the judge is unable to tell whether whoever is on the other side is human or not, then that machine should be deemed as "intelligent" as a human.
The Turing Test is lousy for many reasons. For a start, it is subjective and therefore unscientific. A person's opinion - the human judge's - cannot substitute for an objective and impartial measurement of a system's property. Philosophers have attacked the Turing Test from every angle. The philosopher of mind John Searle argued, in his "Chinese Room" thought experiment, that a computer can "appear" to be intelligent by simply manipulating symbols according to an algorithm, without having any understanding of what those symbols may actually "mean". Therefore, according to Searle, a computer's output is not an indication of the internal property that we consider as "intelligence" - and hold so dear in our human societies and cultures. And yet we need to find a way to compare the intelligence of computers to our own, as well as to compare the intelligence of machines with that of their mechanical peers. This need is becoming ever more urgent as Artificial Intelligence and Machine Learning systems come of age.
Perhaps one way of addressing the measurement of machine intelligence would be to mimic the way we measure human intelligence. Although the debate around what "intelligence" actually means has by no means been resolved, most psychologists would agree that standard IQ tests measure "something" that has predictive power when comparing cognitive results across a human population, or when predicting an individual's future performance in a range of cognitive tasks. Perhaps then one could devise a test that explores a number of agreed areas of cognition - for example knowledge, memory, comprehension, vocabulary, etc. - and then draw up a set of questions that measure "machine IQ". But there are several problems with this approach. Firstly, it is too anthropocentric. Secondly, an IQ test is a snapshot of a subject's cognitive ability. In the case of humans that can be considered adequate because our brains are not augmented, and do not get an "upgrade", during our lifespan. But this is not true of computers. Machine intelligence can increase its power by many orders of magnitude in a relatively short time - compared to a human life - thanks to technological developments in hardware and software engineering. Thirdly, a human IQ score is always an indication of cognitive ability relative to some larger group. When it comes to computers this is problematic, since there is a wide spectrum of performance depending on a computer's power. Unlike people, computers are not created "equal".
So to measure "machine IQ" we need a new definition of intelligence that goes beyond the human - let's call it "universal intelligence". Universal intelligence could be defined in very general terms. AI researchers Shane Legg and Marcus Hutter have defined it, roughly, as a measure of an agent's ability to achieve meaningful goals in a wide range of environments. A meaningful goal would be a goal that bears some significance to the agent's survival, purpose or well-being. The environmental dimension is important to include in the definition because an intelligent agent should be able to interact with a given environment and create the appropriate strategies to achieve its goal. If we accept such a general definition it follows that human intelligence is a subset of universal intelligence, and therefore machine intelligence can one day become greater than human intelligence - a very profound, and disquieting, conclusion indeed. Nevertheless, on the basis of such a definition we can begin to think of dynamic IQ measurement, instead of the static, human-purposed kind. This dynamic testing should reflect the potential of an intelligent machine to scale its cognitive ability as it becomes more powerful, and should also tell us how a given machine compares to others. Finally, we need to factor into our measurement the degree of complexity of the environments in which the intelligent machine agent will build strategies, solve problems and achieve meaningful goals. This measure of complexity is crucial because it represents the degree of autonomy of a system. After all, we humans pride ourselves that we can be innovative, i.e. find solutions to new problems; we call this ability "creativity" and claim that computers are useless when it comes to that. Thankfully, there is a beautiful mathematical concept that measures the complexity of environments. It is called "Kolmogorov complexity": roughly, the length of the shortest computer program that can reproduce a given object. Problem solved?
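For readers who want to see the idea in symbols, Legg and Hutter's formal measure is usually written along the following lines. The notation here is a summary for illustration, not a derivation: \pi stands for the agent, E for the set of computable environments being considered, K(\mu) for the Kolmogorov complexity of environment \mu, and V_\mu^\pi for the expected total reward the agent earns in that environment.

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

The weight 2^{-K(\mu)} is where Kolmogorov complexity enters: every environment counts, but simpler ones (those with shorter descriptions) count for more. On paper, that would seem to settle the matter.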
Not quite. Kolmogorov complexity may be beautiful but it is also "non-computable". This loosely means that there does not exist a computer (or an algorithm) that can compute the Kolmogorov complexity of every possible input. In other words, we cannot build a computer that can measure the intelligence of other computers conclusively, for every possible environment. We can only get an approximation, which may be good enough to begin with. After all, should machine intelligence surpass human intelligence in the near future, estimating the Kolmogorov complexity would become meaningless to us. Only superintelligent machines will be able to appreciate the problem of non-computability - if "appreciation" is something that a universally intelligent agent actually needs to "feel" in order to achieve its goal. Perhaps then, our treasured self-awareness, the highest level of human consciousness, will have become a relic of biological evolution surpassed by electro-mechanical agents capable of some new level of consciousness which will be simply impossible for us to fathom, comprehend, or measure.
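To make "approximation" concrete, here is a minimal Python sketch of one common stand-in: the size of the data after running it through an off-the-shelf compressor. This is an assumption chosen for illustration, not a method the researchers mentioned above prescribe. The function name `compressed_size` is hypothetical, and zlib is simply a convenient compressor from the standard library. The result is at best a crude upper-bound proxy; the true Kolmogorov complexity could be far smaller.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed form of `data`.

    Assumption for illustration: the compressed length serves as a rough,
    computable proxy for Kolmogorov complexity. It can overestimate badly,
    because zlib only finds some kinds of regularity.
    """
    return len(zlib.compress(data, 9))


# 10,000 bytes with an obvious repeating pattern: a short program could print it.
structured = b"01" * 5000

# 10,000 bytes from the operating system's random source: no pattern to exploit.
random_like = os.urandom(10000)

print("structured :", compressed_size(structured), "bytes after compression")
print("random-like:", compressed_size(random_like), "bytes after compression")
```

The patterned input shrinks to a tiny fraction of its original size while the random one barely shrinks at all, which matches the intuition that the first is "simple" and the second "complex". What no such tool can do is certify that a shorter description does not exist, and that is exactly the non-computable part.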