Search                        Top                                  Index
TEACH MATCHES                                 Revised A.Sloman Oct 1996

<list> matches <pattern> -> <boolean>

This file introduces the use of the Pop-11 pattern matcher. In Sept 1996
it was extended to include the use of the pattern prefix "!" which
allows matcher variables to be lvars. Readers who are familiar with
pattern matching will find a more complete and concise summary in
HELP * MATCHES, but not including the use of "!" described below.


                        PATTERN MATCHING IN POP11
                        =========================

         CONTENTS - (Use <ENTER> g to access required sections)

 -- Introduction
 -- What is a list?
 -- Patterns are lists used as templates for matching
 -- Some examples
 -- How MATCHES works: a first approximation
 -- Matching arbitrary elements
 -- Digging out elements of a list using ??
 -- Getting practice with pattern variables
 -- Repeated pattern variables
 -- Constraining a variable to match only one item, using "?"
 -- Revision questions
 -- Note on "!" with patterns
 -- Further reading

-- Introduction -------------------------------------------------------

POP11 has a built in 'pattern matcher' which can be used for checking
the correspondence of a list with a pattern. The pattern matcher
provides a powerful tool for operating on lists, and makes it much
easier to write list-processing programs than if you use the "lower
level" facilities such as "hd" and "tl", sometimes referred to as
"first" and "rest", or "car" and "cdr".

This TEACH file introduces the use of the matcher, but without
mentioning all of its capabilities, since that would be too confusing.
At the end of this file there are pointers to additional things to read,
giving information about the more sophisticated facilities. If you look
at TEACH RESPOND after this you will see how to use the MATCHER to write
an Eliza-like program.

In AI programs it is very common to use lists to represent information.
The pattern matcher, and other procedures defined in terms of it, can be
used for checking whether some information is present in a list or list
of lists, and for extracting components from memory stores. It can also
be used for comparing a sentence typed in by the user with various
templates and choosing an action depending on which template the input
matches.

-- What is a list? ----------------------------------------------------

In POP-11 lists are created using square brackets. Thus the following is
a list of words:
    [the cat sat on the mat]

this is a list of numbers:

    [1 2 3 99 98 97]

and this is a list containing three strings:

    ['the cat' 'my dog' 'your boat']

To be more precise, those are list expressions, which instruct Pop-11 to
create the appropriate lists, which are special structures that can be
stored in the computer's memory, then combined, searched, changed, etc.

Pop-11 allows lists that contain a mixture of items of different sorts.
The following list contains words, numbers, lists of words and a list of
lists of numbers.

    [1 cat 2 dog [a b c] [d e f] [[66 67 68] [200 205 210 215]] 99]

In other words, Pop-11 allows lists to contain any mixture of objects of
arbitrary types, unlike some languages that require all elements of a
list to be of the same type.

A list expression can extend over several lines: spaces and newlines are
treated as equivalent. E.g. this is one list:

    [a list
        expression containing
        several words]

-- Patterns are lists used as templates for matching ------------------

A pattern is itself a list, which may contain 'pattern elements'. An
example of a pattern is:

    [== is a ==]

This can be used as a template which would match the lists:

    [fred is a student]
    [the oldest person in the romm is a very happy woman]
    [[joe bloggs] is a [champion golfer]

or even:

    [ sad asdf gobbledigook is a junkrubbish asdfasd sadf]

The pattern element "==" will "match" any sequence of words in a list, a
bit like a joker, or "wild card" in a card game, which can be used to
stand for any card. The difference is that "==" can stand for any
subsequence of list elements, even an empty subsequence, so that these
lists will also match the above pattern:
    [fred bloggs is a]
    [is a merry old soul]
    [is a]

The normal format for using the matcher is:

    <list> matches <pattern>

That is a POP-11 instruction that produces a result that is either TRUE
or FALSE, and may have additional useful side effects, as will be shown
below.


-- Some examples ------------------------------------------------------

To get a feel for how you use the matcher, try out the commands below,
using the Load This Line or Mark and load commands. The result will be
printed in your 'output.p' file. Before loading each line see if you can
tell whether the result will be TRUE or FALSE.

    [ 1 2 3] matches [1 2 3] =>
    [1 2 3] matches [3 2 1] =>
    [steve is a teacher ] matches [ ==  is a == ] =>
    [fred is not a computer] matches [ == is a == ] =>

Experiment with different lists and different patterns to see which ones
produce TRUE and which FALSE. Insert and delete words and pattern
elements in the above lists and patterns.

Compare the last one with

    [fred is not a computer] matches [ == is == a == ] =>


-- How MATCHES works: a first approximation ---------------------------

The operation called "matches" takes as its inputs two lists, one
written on each side. The first is sometimes called a "datum", the
second a "pattern".

It decides whether the list on the left (the datum) matches the pattern
on the right, by comparing the two element by element. If it does match
the operation produces the result TRUE, otherwise the result FALSE. It
always produces a "BOOLEAN" result, so it can be used in conditional
expressions in the form

    if <datum> matches <pattern> then ....

If it finds an embedded list on one side there must be an embedded list
in the corresponding place on the other side, and it will compare them,
as in this example:

    [[the old mayor] of [new york]] matches [[the ==] of [== york]] =>

If the pattern element "==" is found on the right hand side, the matcher
will try matching different numbers of items from the list on the left
hand side with it, to see if it can find a way to make everything else
match. There are other sorts of pattern elements, which are interpreted
differently, described below.

This sort of ability can be used in programs like Eliza to sort
sentences typed by the user into different categories, to be treated
differently, as explained in TEACH RESPOND. For example you can write
programs of the form:

    if      <input> matches <pattern1> then <output1>
    elseif  <input> matches <pattern2> then <output2>
    elseif  <input> matches <pattern3> then <output3>
    elseif  <input> matches <pattern4> then <output4>
    .... etc ...


-- Matching arbitrary elements ----------------------------------------

You can use '==' to represent any number of 'don't care' elements,
including NO elements. Try the following and see which are true, which
false (in each case, try to work out the result for yourself before you
get POP-11 to obey the command, using the Load This Line command.

    [1 2 3 4] matches [1 == 4] =>
    [1 2 3 4] matches [1 == ] =>
    [1 2 3 4] matches [2 == 4] =>
    [1 2 3 4] matches [== 4] =>
    [1 2 3 4] matches [1 == 2 3 4] =>
    [1 and also 2 3 4] matches [1 == 2 3 4] =>
    [1 2 3] matches [== 2 ==] =>

In each case where the result is FALSE try changing either the list on
the left or the pattern to make it TRUE, and vice versa.

Notice the different ways in which == will match an empty sublist
    [ 1 2 3] matches [== 1 2 3] =>
    [ 1 2]  matches [ 1 == 2] =>
    [ 1 2 3] matches [== 3 ==] =>

Will these come out true or false?

    [1 2 3] matches [== 1 == 2 == 3 == ] =>
    [1 2 3 4] matches [ == ] =>

Work out your answer then try them. Try constructing a list containing
the numbers 1 to 6 that will match this pattern:

    [== 3 == 2 == 1 == ]

Then create a list that will not match the pattern. Test your answers.

"==" is called a "segment" pattern element because it will match a
segment (i.e. a sublist) of arbitrary length in the datum. It is an
anonymous segment pattern element because it has no name, and you
therefore cannot refer to the segment of the list that was matched.
Later we'll see how to use ?? with a name, to achieve that.

"=" is another type of anonymous pattern element, which will match only
ONE item at a time in the datum. See which of these is true, and which
false:

    [1 2 3 4 5] matches [ 1 = 3 = 5] =>

    [1 2 3 4 5 6] matches [ = 2 = = 5 =] =>

    [1 2 3 4 5 6] matches [ = 2 = 5 =] =>

    [the cat is on the mat] matches [= is on =] =>

    [[the cat] is on [the mat]] matches [= is on =] =>

To understand the last example you have to realise that an embedded
list, like [the cat], is treated as ONE element of the list on the list,
so it can match an occurrence of "=".

-- Digging out elements of a list using ?? ----------------------------

Often you don't merely want to see if something is recognised as fitting
a pattern. You also want to get at the components of the list, and use
them in a subsequent command.

For instance, consider a conversational program with the following
strategy:

        user types                      program responds
    FRED IS VERY HAPPY          SUPPOSE FRED WERE NOT VERY HAPPY
    SMOKING IS BAD FOR YOU      SUPPOSE SMOKING WERE NOT BAD FOR YOU
    EVERY COMPUTER IS STUPID    SUPPOSE EVERY COMPUTER WERE NOT STUPID.

The rule can be expressed in English something like this.

Rule1 (English):

    If the input contains some words followed by "is" followed by some
    more words, then the output must contain the word "suppose" followed
    by the first set of words, then "were not" followed by the second
    set of words.

In order to express that in Pop-11 we shall need to use a conditional
instruction of the form:

    IF <expression1> THEN <expression2> ELSE <expression3> ENDIF

This says test the result of <expression1>. If it is true, then
perform the actions in <expression2> otherwise perform the actions in
<expression3>

We can use that in the following translation (notice that we are using
four variables "input", "first", "second" and "output", which have
not yet been declared):

Rule1 (Pop-11):

    if  input matches [??first is ??second] then
        [suppose ^^first were not ^^second] -> output;
    else
        [Sorry I do not understand] -> output;
    endif;

(Don't try typing that in yet. We have not defined INPUT so it will not
work.)

Here "??" is used to define a PATTERN ELEMENT used in matching and
remembering what is matched, while "^^" is used as a LIST CONSTRUCTOR
ELEMENT to produce a new list starting with the word "suppose", that is
assigned to the variable "output".

The above rule would sometimes behave rather stupidly, for instance
responding to:

    I DONT THINK THAT IS VERY SENSIBLE
with
    SUPPOSE I DONT THINK THAT WERE NOT VERY SENSIBLE

Never mind the stupidity for now. Let's look at the important
programming concepts.

Let's look more closely at the new pattern elements used in the first
line

    if  input matches [??first is ??second] then

Instead of the anonymous pattern element "==" we now have two new
pattern elements ??first and ??second. These ensure that if "matches"
returns the result TRUE, i.e. the match succeeds, then the variables
first and second will remember which segments of the input list were
matched against the two pattern elements before and after "is". The
variable first will then contain a list of items from the input which
were before "is", and the variable second will contain a list of the
items that came after "is". (Either or both could be empty lists).

If the value of input were the list [fred is very happy], matches would
do the following assignments:

    [fred] -> first;
    [very happy] -> second;

Test that as follows

    vars first, second;
    [fred is very happy] matches [ ??first is ??second ] =>

Then find out the values of first and second thus:

    first =>
    second =>

The new values of the variables FIRST and SECOND can then be used in
conjunction with the double-up-arrow symbol '^^' to build up
the response

    [suppose ^^first were not ^^second]

which in this case would evaluate to

    [suppose fred were not very happy]

Try it:
    [suppose ^^first were not ^^second] =>

You can think of "^^" as meaning, roughly, 'use the value of the
following variable to insert elements into the list'. If the value of
the variable following "^^" is not a list, a mishap will occur.

For now, we shall not pursue the use of "^^". (For more on "^^" see
TEACH ARROW, and TEACH RESPOND).

To see how the matcher assigns lists to pattern variables in the above
examples, try getting POP-11 to obey the following. Note that we start
by telling POP11 that we are going to use FIRST and SECOND as variables:

    vars first, second;
    [fred is sad] matches [ ??first is ??second] =>
    first =>
    second =>

Then try:
    [father christmas is a very old man] matches [ ??first is ??second] =>
    first =>
    second =>

Try further variations of your own. You can use more than two "pattern
variables", e.g.

    vars a, b, c;
    [alice andrews stood between bertha butlin and cathy carter]
        matches
    [??a stood between ??b and ??c ] =>

    a=>
    b=>
    c=>

Notice that an instruction using matches can extend over more than one
line, as can other Pop-11 expressions such as an addition:

    3
    +
    4 =>

-- Getting practice with pattern variables ----------------------------

What will be the values of X and Y after the following matches tests?
First try to work out the answers, then get POP-11 to tell you.

    vars x, y;
    [a b c] matches [a ??x] =>
    x =>

    [a b c] matches [??x c] =>
    x =>

    [a b c] matches [??x ??y] =>
    x =>
    y =>

    [1 2 3 4] matches [??y] =>
    y =>

Use the Load This Line command to find out the answers.

Notice that the items not preceded by '??' are 'constants', i.e. they
only match themselves.

A pattern variable may get the empty list as its value if there are no
corresponding elements in the other list. Try:

    [a b] matches [a ??x b] =>
    x =>

    [1 2 3] matches [??x 1 2 3 ??y] =>
    x =>
    y =>

When a match fails, the new values of the variables are unpredictable. Try

    undef -> x; undef -> y;
    [1 2 3 4] matches [??x 3 5 ??y] =>
    x =>
    y =>

-- Repeated pattern variables -----------------------------------------

If you use the same variable twice over in a pattern, the match will be
TRUE ONLY if there is a corresponding repetition in the list on the
left. Try the following:

    vars x, y;
    [war is war] matches [??x is ??x] =>
    x=>

    [war is very nasty] matches [??x is ??x] =>

    [war is very nasty] matches [??x is ??y] =>
    x =>
    y =>


In some cases you may find it a little unobvious how the use of repeated
variables affects the matcher. Try the following, and see if you can
anticipate how the match will work.

    [1 2 3 1 2 3] matches [??x ??x] =>
    x=>

    [1 2 3 4 5 6] matches [??x ??x] =>
contrast:
    [1 2 3 4 5 6] matches [== ==] =>

Thus, repeating a variable makes the pattern more restrictive than
repeating the "joker" (or "wild card") symbol '=='.

Repeated matching of the empty list is allowed. Try:

    [1 2 3 4 5 6] matches [??x ??y ??x] =>
    x=>
    y =>

-- Constraining a variable to match only one item, using "?" ----------

So far we have prefixed variables with the double query symbol '??',
which tells POP-11 to match the variable with any number of elements in
the left hand list. We sometimes want to make sure we get only one item
from the list on the left, and assign that item to a variable. Try:

    vars first rest;
    [a b c d] matches [?first ??rest] =>
    first=>
    rest =>

Notice how the use of '?' restricts ?FIRST to match ONE item which is
assigned to it (the head of the list), whereas '??' allows ??REST to
match any number of items (the tail of the list). The items matching
REST are put together into a new list and assigned to the variable REST.

Compare the following:
    [a b c d ] matches [?first ?rest] =>
    [a b c d] matches [??first ?rest] =>
    first =>
    rest =>

In the second case it might have been better to use "last" rather than
"rest".

Try writing a matches command that will assign the second element of a
list to the variable second, e.g. complete the following instruction in
such a way as to assign "b" to "second"

    [a b c d e]  matches ........  =>

Try making it assign the second last element to "x".

I.e. You can use the single query "?" to ensure that the matcher digs
out just the first or second or or the third or .... the last element of
the list.

As explained above you can use "=" as an "anonymous" pattern element to
match exactly one item, to contrast with "==" which will match any
number. Thus if we are interested only in the second, or second last
element of a list, we can use a combination of "=", "==" and "?" to
extract it, thus

    vars x;
    [a b c d e] matches [ = ?x ==] =>
    x =>

    [a b c d e] matches [ == ?x =] =>
    x =>

Try those out with different lists on the left hand side.

How would you use the matcher to dig out the third last element of a
list? Try it out and make sure it works.

You can mix 'queried' variables, with the 'anonymous' pattern elements
'=' and '=='. So the following will make x stand for the object
occurring immediately after c :
    [a b c d e] matches [== c ?x ==] =>
    x =>

    [a 1 b 2 c 3 d 4] matches [ == c ?x ==] =>
    x =>

Having done that, try a version which will set x to be whatever comes
just before the number 3, in various lists. E.g. complete the following
so as to make x get the value 2.

    [a 1 b 2 c 3 d 4 ] matches .......

-- Revision questions -------------------------------------------------

    1. What does MATCHES do? What inputs does it take, and what sort of
        result does it produce?

    2. Explain how '==' works with MATCHES.

    3. Explain how '??' works with MATCHES.

    4. What is the difference between '??' and '?'.

    5. What is the difference between '==' and '='.

    6. What are anonymous pattern elements?

    7. What are segment pattern elements?


-- Note on "!" with patterns ------------------------------------------

There is an extension to Pop-11 available at Birmingham which allows
patterns to be preceded by "!" which has the effect that if the pattern
is used inside a procedure, then the pattern variables preceded by "?"
or "??" can be made "lvars", so that they do not need to be declared as
"vars".

This reduces bugs in programs due to unintended interactions between
global variables, and also allows variables declared as "lvars" to be
used in patterns. Example:

define next_after(item, list) -> next;
    ;;; given item and list, return the first thing after item in list
    ;;; or false if there isn't one.
    ;;; Note: input and output variables (item, list and next) are
    ;;;  automatically declared as lvars.

    lvars found;    ;;; the pattern variable
    if list matches ! [ == ^item ?found == ] then
        found -> next
    else
        false -> next
    endif
enddefine;

/*
;;; Test that:

next_after(3, [1 2 3 4 5]) =>
** 4

next_after(6, [1 2 3 4 5]) =>
** <false>

next_after("cat", [the cat [did sit] on mat] ) =>
** [did sit]

*/

That procedure would not work without the "!" list prefix. The space
between "!" and "[" is not required. It was inserted for clarity.

Strictly speaking, preceding a pattern with "!" inside a procedure does
TWO things. First it treats all the pattern variables in the list,
i.e. all those following "?" or "??", as lvars local to the current
procedure. If you have not declared those variables as lvars you will
get a warning message. Try removing the lvars declaration from the above
procedure and recompiling it.

Secondly it changes the pattern so that instead of words following "?"
or "??" it contains identifier records, which the pattern matcher can
use to "communicate with" the local lvars in the procedure. This is
possible only since Poplog Version 15.0

The change to the pattern contents may show up if you use the Pop-11
tracer. This is how patterns print out without and with "!".

    [ == cat ?next == ] =>
    ** [== cat ? next ==]

    ! [ == cat ?next == ] =>
    ** [== cat ? <ID next <undef next>> ==]

The <ID ...> tells you that a word has been replaced with an identifier,
and the <undef next> tells you that this is an undefined global
variable.

If you give the identifier a value, its value shows when the pattern is
printed out:
    "dog" -> next;
    ! [ == cat ?next == ] =>
    ** [== cat ? <ID next dog> ==]

Inside a procedure, the default undefined value is likely to be 0,
though that depends on the implementation.

-- Further reading ----------------------------------------------------

TEACH * RESPOND shows how to make use of the matcher in designing an
Eliza-like program.

TEACH * ARROW gives more instruction in using '^^' and other constructs
to build lists in POP-11.

If you'd like some additional exercises giving more features of the
matcher try TEACH * MATCHES2

The Pop-11 PRIMER gives more information about the matcher, with
examples.

TEACH * LISTS introduces some of the other procedures provided in POP-11
for operating on lists.

For more experienced users, HELP * MATCHES gives a summary of the
facilities of the matcher, including several more sophisticated than
those shown here.

This is a modified version of the Poplog TEACH file.

--- C.all/teach/matches ---------------------------------------------------
--- Copyright University of Sussex 1987. All rights reserved. ----------

--- $poplocal/local/teach/matches
--- Copyright University of Birmingham 1996. All rights reserved. ------