
Using AI to increase the accuracy of assessments in random tests of intellect

Abstract

A definition of a random intellect test (a test that assesses the intellect of AIs, humans or non-human testees) is given, and a method is described for reducing the assessment errors caused by variations in the complexity of its tasks.

Definition and example of a random intellect test

A random intellect test (RIT) is a test of intellect whose tasks, all of approximately equal complexity, are generated by an almost perfect random number generator (RNG) and, optionally, by the testee's moves.

For example, a RIT could be the computer puzzle game Color Lines https://en.wikipedia.org/wiki/Color_Lines with the following changes: no new balls are added except the five random balls that are added each time a five-ball line disappears, so the number of balls on the board (8x8) is always constant and equal to the given initial number. The test may consist of several independent games called rounds. The modified game, i.e. the RIT, stops after a given number of five-ball lines have disappeared, not after a time limit. The more moves the testee makes to solve the RIT tasks, the worse her/his/its result (intellect assessment).
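A minimal Python sketch of the modified refill rule, assuming the board is stored as a dict from (row, col) to a colour index. The number of colours (COLORS = 7), the helper names find_lines and remove_and_refill, and refilling with as many balls as were removed (exactly five for a single five-ball line, which keeps the count constant even if a longer line forms) are assumptions, not part of the original description. Ball movement and path checks are omitted.

```python
import random

SIZE = 8       # 8x8 board, as in the modified game
LINE_LEN = 5   # a line of five balls of one colour disappears
COLORS = 7     # assumption: number of ball colours (not given in the text)

# Horizontal, vertical and both diagonal directions.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def find_lines(board):
    """Return all cells lying in a run of at least LINE_LEN balls of one colour.

    `board` maps (row, col) -> colour index for occupied cells only.
    """
    to_remove = set()
    for (r, c), colour in board.items():
        for dr, dc in DIRECTIONS:
            # Start counting only at the first cell of a run.
            if board.get((r - dr, c - dc)) == colour:
                continue
            run = [(r, c)]
            nr, nc = r + dr, c + dc
            while board.get((nr, nc)) == colour:
                run.append((nr, nc))
                nr, nc = nr + dr, nc + dc
            if len(run) >= LINE_LEN:
                to_remove.update(run)
    return to_remove


def remove_and_refill(board, rng):
    """Remove completed lines, then place as many random balls on empty cells.

    Returns the number of balls removed; the ball count on the board is unchanged.
    """
    removed = find_lines(board)
    for cell in removed:
        del board[cell]
    empty = [(r, c) for r in range(SIZE) for c in range(SIZE) if (r, c) not in board]
    for cell in rng.sample(empty, len(removed)):
        board[cell] = rng.randrange(COLORS)
    return len(removed)
```

Passing a seeded random.Random makes a round reproducible, which is convenient when the same board states must later be given to another solver.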

Variations in task complexity

Different RIT tasks have approximately, but not exactly, equal complexities. This introduces an error into the measured result, i.e. into the intellect assessment. To reduce the error, the complexity of RIT tasks should be evaluated using AI and taken into account optimally in the assessment.

Ideal RIT tasks solver

An ideal RIT tasks solver (IRITTS) is an abstraction of an AI that solves RIT tasks: an abstract machine https://en.wikipedia.org/wiki/Abstract_machine (with a specified time for every operation, for example 1 second) that always achieves the best results in RITs among all AI, human and non-human testees.

To determine which AI is the best (smartest) IRITTS candidate (BC), the AIs must solve many RIT tasks. The fewer moves an AI makes while solving them, the smarter it is.

Time as a measure of complexity of a RIT task

Given a RIT task, the elected AI finds the best move among all possible moves. To find it, the AI, i.e. the abstract machine, needs a certain time, or more precisely a certain number of machine operations. The more complex the task, the more time (operations) the machine needs. So the time t, or the value of some function of it, F(t), can serve as a measure of the complexity of the given task and is called objective complexity (OC).
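The idea of measuring complexity by the machine's own effort can be illustrated with a toy sketch, assuming a solver that scores every candidate move exhaustively and counts each scoring as one operation lasting SECONDS_PER_OP = 1 second (the example duration from the IRITTS definition). The names choose_move and objective_complexity and the exhaustive search are hypothetical stand-ins for the BC's real search; F defaults to the identity, so OC = t.

```python
import math

SECONDS_PER_OP = 1.0  # the text's example: every machine operation takes 1 second


def choose_move(moves, score):
    """Toy exhaustive solver: scores every candidate move, one operation per score.

    Returns the best move and the number of operations spent finding it.
    """
    best, best_score, ops = None, -math.inf, 0
    for move in moves:
        ops += 1
        s = score(move)
        if s > best_score:
            best, best_score = move, s
    return best, ops


def objective_complexity(moves, score, F=lambda t: t):
    """Return (best move, OC), where OC = F(t) and t = operations * SECONDS_PER_OP."""
    best, ops = choose_move(moves, score)
    t = ops * SECONDS_PER_OP
    return best, F(t)
```

For example, objective_complexity(range(10), lambda m: -abs(m - 3)) picks move 3 after 10 operations, so OC = 10.0; with F=math.log1p the same task gets OC = log(11). A task with more candidate moves costs more operations and therefore gets a higher OC.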

Formula for measuring the intellect

Taking the complexities of RIT tasks into account, the intellect (I) of a testee (an AI, a human or a non-human) should be measured with the formula:

I = k * T / M (1)

where
M - the total number of moves the testee has made;
T - the sum of t, or of the values of F(t), over all test tasks as solved by the BC.
To assess a testee, the BC starts from each board state (bs) that the testee sees after an RNG intervention (RNGI), plays its own moves from that state, sums the OC of the board states reached by those of its moves that do not lead to an RNGI, and adds the OC of the starting bs to the sum. For example, suppose the testee sees bs1 and makes moves that give bs2, bs3, bs4 (after an RNGI), bs5, ... The BC takes bs1 and makes moves up to the RNGI that give bsbc1 and bsbc2, sums their OC and adds the OC of bs1 to the sum; it then takes bs4 and makes moves that give bsbc3, ..., and so on;
k - a coefficient.
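The bookkeeping for T described above can be sketched in Python, assuming the testee's game is recorded as a list of (board state seen, whether the move made from it triggered an RNGI) pairs. The functions segment_starts, total_complexity and intellect, and the callbacks bc_play (the states the BC reaches from a segment start by its own moves that do not trigger an RNGI) and oc (the OC of a state), are hypothetical placeholders for a real BC.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

State = Hashable  # hypothetical board-state type (any hashable value)


def segment_starts(trajectory: Sequence[Tuple[State, bool]]) -> List[State]:
    """States the testee sees after an RNG intervention, plus the initial state.

    Each entry is (state seen by the testee, whether the move made from it
    triggered an RNGI).
    """
    starts = [trajectory[0][0]]
    for (_, triggered), (next_state, _) in zip(trajectory, trajectory[1:]):
        if triggered:
            starts.append(next_state)
    return starts


def total_complexity(starts: Sequence[State],
                     bc_play: Callable[[State], List[State]],
                     oc: Callable[[State], float]) -> float:
    """T: OC of each segment start plus OC of every state the BC reaches
    from it by moves that do not trigger an RNGI."""
    return sum(oc(s) + sum(oc(x) for x in bc_play(s)) for s in starts)


def intellect(k: float, T: float, M: int) -> float:
    """Formula (1): I = k * T / M, where M is the testee's total number of moves."""
    return k * T / M
```

With the example from the text, the trajectory bs1, bs2, bs3, bs4 (after an RNGI), bs5 yields the segment starts bs1 and bs4; if bc_play returns [bsbc1, bsbc2] for bs1 and [bsbc3] for bs4, T is the sum of the OC of these five states.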

If task complexity is ignored, i.e. T = 1, the formula becomes:

I = k / M (2)

This formula should be used to determine the BC, by having the candidate AIs solve many RIT tasks.
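Selecting the BC with formula (2) can be sketched as below, assuming each candidate's results are stored as a list of move counts M, one per RIT run, and that a candidate's score is the mean of its per-run values k / M. The candidate names, the default k = 1, and averaging per-run scores (rather than applying formula (2) to the average M) are assumptions.

```python
from statistics import mean
from typing import Dict, List


def intellect_simple(k: float, M: int) -> float:
    """Formula (2): I = k / M, i.e. formula (1) with T = 1."""
    return k / M


def best_candidate(moves_per_run: Dict[str, List[int]], k: float = 1.0) -> str:
    """Return the AI with the highest average formula-(2) score over its RIT runs."""
    return max(
        moves_per_run,
        key=lambda ai: mean(intellect_simple(k, M) for M in moves_per_run[ai]),
    )
```

For example, with hypothetical runs {"A": [100, 120, 110], "B": [90, 95, 130]}, candidate B is selected. Giving all candidates the same RNG seeds, so that they face identical tasks, would keep task-to-task variation out of this comparison; that is a suggestion, not part of the original method.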

Reduction of intellect measurement errors

Suppose we have selected the BC and also have data (moves and timestamps) on many RITs it has passed. Using formula (2), we calculate the candidate's average intellect value (IAV), called IAVS (where S stands for standard); this value does not depend on any F(t). Calculating the intellect with formula (1) and some F1(t) gives another IAV, IAV1. Taking IAVS as the standard, we adjust the coefficient k so that IAV1 = IAVS. We can then recalculate, with formula (1) and the new k, the intellect value for every passed test and obtain the set of values {I1}, whose mean is IAVS. From {I1} we find its standard deviation (SD), SD1. In the same way, we can find the set {I} and its SD for every F(t). Because all these sets share the same mean IAVS, the function F.best(t) whose SD.best is the minimum among all the SDs minimizes errors better than any other F(t).
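The calibration and comparison step can be sketched in Python, assuming that for every test the BC passed we know its move count M and the decision times t of its tasks (for instance, derived from the recorded timestamps). The functions calibrated_scores and best_F, the candidate functions passed in, k = 1 in formula (2), and the use of the population SD (statistics.pstdev) are assumptions.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

# One passed test: (M = moves made, decision times t of the BC's tasks in that test).
TestRecord = Tuple[int, Sequence[float]]


def calibrated_scores(tests: Sequence[TestRecord],
                      F: Callable[[float], float],
                      iav_s: float) -> List[float]:
    """Per-test formula-(1) scores, with k chosen so that their mean equals IAVS."""
    raw = [sum(F(t) for t in times) / M for M, times in tests]  # T / M per test
    k = iav_s / mean(raw)  # the adjusted coefficient k
    return [k * r for r in raw]


def best_F(tests: Sequence[TestRecord],
           candidates: Dict[str, Callable[[float], float]],
           k2: float = 1.0) -> Tuple[str, float]:
    """Return (name, SD) of the candidate F(t) whose calibrated scores vary least."""
    iav_s = mean(k2 / M for M, _ in tests)  # IAVS from formula (2)
    sds = {name: pstdev(calibrated_scores(tests, F, iav_s))
           for name, F in candidates.items()}
    name = min(sds, key=sds.get)
    return name, sds[name]
```

As a toy check, if the move counts are exactly proportional to the summed times, as in tests = [(10, [2.0, 3.0]), (20, [4.0, 6.0]), (30, [5.0, 10.0])], the identity F(t) = t gives identical scores and an SD of zero, so it is selected over, say, F(t) = t ** 0.5 or the constant F(t) = 1.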

https://groups.google.com/forum/#!topic/si.comp.ai/ptHYr4G-Bz4

Oleg Goryunov