1st International Planning Systems Competition 1998

Link to the International Planning Competition

The original Planning Competition Web page as maintained by Drew McDermott. There is also a very interesting article in the AI Magazine, Winter 1999, summarizing the competition.

5 Systems participated in the First Planning Systems Competition:

  1. Blackbox (STRIPS)
    by Henry Kautz (at this time at ATT) and Bart Selman ( at this time at ATT & Cornell)

  2. HSP (STRIPS)
    by Blai Bonet and Hector Geffner (at this time at Simon Bolivar University)

  3. SGP (ADL)
    by Corin Anderson and Dan Weld (at this time at University of Washington)

  4. STAN (STRIPS)
    bye Derek Long and Maria Fox (at this time at Durham University)

  5. IPP (STRIPS/ADL)
    by Jana Koehler and the ../ipp.html">IPP team (at this time at Freiburg University)

IPP source code and results

The source code provides the exact configuration of the system as it ran in the competition (but with some stupid bugs removed - though the computation of the DNF is still faulty). In the first round, IPP ran mainly in the basic mode without any RIFO search space reduction, which was occassionally invoked by hand and tested on unsolvable examples. Only the STRIPS gripper examples were solved using RIFO, which allowed IPP to scale to 12 balls, but also made the plans non-optimal because one gripper was removed by RIFO as irrelevant.

In the second round, IPP ran with the RIFO meta strategy doing the following: If more than 3500 ground instantiated actions and 35 objects are contained in the domain, RIFO is activated with its most stringent pruning strategy (corresponding to IPP switches -rr 1 -rm 4 -rl 3). If the planner cannot solve the planning problem within the pruned search space, the space is enlarged (corresponding to RIFO switches -rr 1 -rm 4 -rl 1). If the planning problem is still unsolvable, IPP tries to find a solution in the original search space. To prune search spaces by hand edit file ipp.c, set DO_META to FALSE, and follow instructions on RIFO options from the RIFO homepage.

Round 1 ADL

(170 test problems)

Note that several planners used a termination test and were able to prove problems from the mystery and mprime test sets as unsolvable. These were not counted as solved in the competition, i.e. solved only refers to generated plans for solvable problems. IPP could prove several problems as unsolvable, which we verified by repetition of the tests (they cannot be extracted from the competition output). We plan to add some information about the search space size and number of possible actions for each problem.

Av. time is the total time that was spent to generate all plans divided by the number of plans that was found. Fastest means that no other planner found the same plan in shorter time and shortest means that no other planner found a shorter plan, but there can be planners that found a plan of equal length. In this case, all planners receive a point. All times are given in seconds.

Planner Total Time Solved Fastest Shortest
IPP 22 (1421) 69 68 68
SGP 575 38 1 35

For ADL we list the total time spent on the problems. IPP solved the 38 problems that solved SGP plus 31 more difficult problems from the gripper, mystery, and mprime domains. No system solved any of the assembly problems, where IPP ran out of space because the ADL preprocessing generated several millions of instantiated actions. We show the total runtime for both systems for the 38 problems (where IPP is more than 25 times faster than SGP) and in braces the total runtime IPP needed to solve the other 31 problems. In total, IPP generated 662 actions with an average search time of 0.45 seconds per generated action. Even if IPP is written in C and SGP is written in LISP, IPP 3.3 is still significantly faster than SGP. Click on the figure above to see how IPP and SPG compare in the movie domain. Only on movie5, SPG could outperform IPP by 10 milliseconds. But the 30 movie runtimes of both systems are too small to allow for useful comparison.

The following table compares SGP and IPP on the remaining problems in the ADL set that both solved showing planning time in seconds and plan length (in braces):

Problem SGP IPP
mystery1 2.2 (4) 0.1 (4)
mystery2 203.3 (14) 12.2 (9)
mystery3 3.5 (4) 1.0 (4)
mprime1 7.1 (9) 1.1 (11)
mprime2 266.2 (12) 1.5 (9)
mprime3 22.9 (5) 3.2 (5)
mprime4 39.7 (17) 1.9 (11)
gripper1 0.4 (11) 0.04 (11)

IPP was also able to prove 6 problems in the mystery domain as unsolvable when repeating the competition tests on our machines under 10 minutes runtime restriction and 120 Mb memory restriction.

Round 1 STRIPS

(140 test problems)

Planner Av. Time Solved Fastest Shortest
IPP 7.4 63 29 49
Blackbox 1.5 63 16 55
HSP 35.5 82 19 61
Stan 55.4 64 24 47

This picture shows the runtime behavior of the four planners in the STRIPS version of the Gripper domain. The Y-axis lists the number of balls that have to be transported by a robot with two gripper hands. The columns show how far each planner scaled and how much runtime was spent. This domain was originally designed to motivate the use of resources in IPP because the STRIPS representation leads to a combinatorial explosion in the search space. All planner obviously suffer from this explotion, with the exception of HSP that does quite well. A plot of its runtime data seems to indicate non-exponential runtime (quadratic?). Interestingly, HSP plans only transport one ball at a time leading to lots of unnecessary move actions, while STAN and BLACKBOX generate optimal solutions. IPP has been run with RIFO switched on, which excludes one gripper as irrelevant, ie. non-optimal plans are found using only the left gripper. The plot of STAN's, IPP's, and BLACKBOX' runtime shows how hard STAN hits the combinatorial explosion, but IPP and BLACKBOX don't even provide data on the harder problem instances.

The movie domain was no challenge for any of the planners running in the STRIPS track. Since the goals remain constant over all 30 test problems, but only the number of objects to choose from changed, planning time was not an issue in this test set, but the systems just spent a slightly increasing amount of time on generating more action instances and building larger planning graphs. In fact, this domain contains lots of irrelevant information. RIFO detects that in all problems only 9 of the up to 174 actions and 5 of the up to 170 original initial facts are relevant.

In the picture, we had to divide HSP's runtime by 100 to allow for a meaningful plot. Even if the system is much slower than the others, it seems to scale almost linearly on the problem instances. Closer inspection of IPP's solution file revealed that the system was not run on the movie8 problem, which was probably accidentally skipped because the shellscript was not used all the time.

The logistics domain was quite difficult for all planners as the following table shows listing runtimes in seconds and in braces the length of the generated solution. Out of the 30 test problems, only between 3 and 5 problems were solved by the planners.

Problem BB HSP STAN IPP
log1 2.0 (27) 79.7 (43) 0.8 (27) 0.9 (26)
log2 6.4 (32) 97.1 (44) 4.3 (32) -
log5 - 14.4 (27) 364.9 (29) 2.4 (24)
log7 - 788.9 (112) - -
log11 6.5 (32) 86.2 (30) 12.8 (34) 6.9 (33)

The mystery and mprime domains were slightly easier for all planners. We show the number of plans that IPP found in a domain, the total runtime to generate them, the total number of actions that were generated, and the number of problems that can be proven as unsolvable by IPP when repeating the results on a SPARC I Ultra (which is almost exactly twice as slow as the PCs used in the competition.) Unsolvability could be proven during graph construction because the goals remained exclusive after the graph has leveled off and no search was necessary.

Domain solved total time # actions unsolvable proven
mprime 13 132.8 111 1
mystery 13 66.1 87 8

Round 2 STRIPS

(15 test problems, no ADL track)

Planner Av. Time Solved Fastest Shortest
IPP 17.3 11 3 8
Blackbox 2.5 8 3 6
HSP 25.8 9 1 5
Stan 1.3 7 5 4

Note that IPP could solve more problems than any other planner, but was never the fastest planner. The 3 points in the fastest category result from the fact that no other planner solved these problems. This is caused by the RIFO strategy to prune the search space that is currrently quite time consuming. We show again runtime and plan length (in braces) on the various problems:

Problem BB HSP STAN IPP
log1 0.5 (13) 4.3 (13) 0.1 (13) 0.2 (13)
log2 0.6 (20) 5.2 (27) 0.2 (20) 0.3 (20)
log3 0.9 (27) 10.5 (32) 0.4 (27) 0.6 (27)
log4 - 170.4 (59) - -
log5 - - - 66.5 (31)
mprime1 0.9 (5) 7.2 (4) 0.9 (5) 4.1 (5)
mprime2 2.5 (7) 12.8 (8) 4.8 (9) 2.6 (8)
mprime4 1.7 (5) 8.3 (4) 0.4 (5) 11.4 (4)
mprime5 0.5 (6) 4.6 (6) - 2.9 (6)
grid1 12.1 (14) 9.3 (24) 2.5 (14) 4.6 (20)
grid2 - - - 13.9 (31)
grid4 - - - 84.3 (47)

The influence of the RIFO strategy on the search space reduction is shown in the following table. It lists the number of ground actions and objects in the original problem description and in braces following the length of the plan if IPP could find one given a 10 minutes cpu time limit and a 120 Mb memory limit as in the competition, but on a SPARC 1/170 (which is approx. twice as slow.) Then we list the number of selected ground actions and objects after the strong and weaker RIFO selection process. (#) means the planning problem became unsolvable, - means RIFO was inactive, (-) means no plan was found because the planner either exceeded the cpu time or memory limit. The weaker RIFO heuristic is activated when either the strong heuristic makes a planning problem unsolvable as e.g. in the case of the grid3 problem or when the number of ground actions remains below the threshold parameter of 3500 and only the number of objects exceeds 35 as in the grid1 problem. A number in braces means shows the number of actions in the generated plan.

problem original RIFO strong RIFO weak
log1 571/25 (13) - -
log2 502/21 (20) - -
log3 958/26 (27) - -
log4 3561/42 (-) 189/30 (-) -
log5 4985/53 (-) 119/26 (31) -
mprime1 7809/36 (5) 7/9 (#) 11/9 (#)
mprime2 3281/32 (8) - -
mprime3 97259/68 (-) - -
mprime4 8485/42 (5) 7/10 (4) -
mprime5 6773/22 (6) - -
grid1 2610/38 (14) - 80/11 (20)
grid2 4501/50 (-) 69/19 (#) 260/19 (31)
grid3 7256/64 (-) 315/21 (#) 557/21 (-)
grid4 11151/80 (-) 135/24 (47) -
grid5 16240/98 (-) 1481/49(-) -

As the analysis shows, only 8 problems can be solved without RIFO, but in using the meta strategy 3 more solutions are found. The meta strategy combines only 2 out of the 12 selection heuristics. Which combinations work for which examples has to be found out by experimentation. In the competition, no such experimentation was possible and therefore the selection of the heuristics was done long before the competition simply following the intuition that one should try the strongest possible heuristic first because it leads to the smallest search space, but then relax it by allowing more actions when incompleteness occurs. Only the threshold parameters that decided whether RIFO is activated at all have been set to 3500 actions and 35 objects after a few trials on some of the competition problems.


Jana Koehler, 9/1/1998