Poplog pop11 tutorial on fastgram2

TEACH FASTGRAM2                                Aaron Sloman October 2011
                                                    Based on older teach
                                                files by several authors


               GENERATING AND ANALYSING SENTENCES (PART 2)
               ------------------------------------------

           [NB THIS IS A FIRST DRAFT -- USE AT YOUR OWN RISK!]

This is a sequel to TEACH FASTGRAM1, which introduced the idea of a
formal grammar as a sort of program and showed how it could be used to
control generation of linguistic structures (phrases, clauses,
sentences).

This sequel goes into more detail than I originally intended. It may be
better to split it into smaller separate TEACH FILES. Please provide
feedback.

There will be further sequels, combining the grammar ideas with other
ideas, e.g. planning, communicating, perceiving.

Please look at the mini introduction to editing commands if you have not
previously done so or need revision:

    <ENTER> teach minived <RETURN>


CONTENTS - (Use <ENTER> g to access required sections)

 -- Motivation: why we need to be able to analyse sentences
 -- A blocks-world grammar and lexicon
 -- -- EXERCISE 1: Extending the examples given
 -- -- EXERCISE 2: Create test examples and devise a notation
 -- A very incomplete grammar and lexicon
 -- -- EXERCISE 3: extend and enrich blocks_gram blocks_lex
 -- -- EXERCISE 4: use generate with your grammar and lexicon
 -- -- EXERCISE 5: using setup to create parsers, to test sentences
 -- Using showtree
 -- EXERCISE: Make a summary of what you have learnt from this file
 -- TO BE ADDED

=======================================================================

-- Motivation: why we need to be able to analyse sentences

If someone says

    Please put the box next to the green chair on the table

that request could have two very different interpretations.

Both share these assumptions:

    There is a box
    There is a green chair
    There is a table

On one interpretation the box is next to the green chair, and the
request is to move it from there to a location on the table.

On the other interpretation there is a green chair on the table (perhaps
a model chair?) and the request is to move the box on the table next to
the green chair.

How could a robot deal with such requests?

It is impossible to work out simply from grammatical structures and the
meanings of the words which interpretation is correct. But at least it
is possible to work out that there are those two interpretations and
then use some aspect of the context to decide which interpretation is
correct. This teach file introduces ways in which a grammar can guide
the interpretation of sentences by breaking them up into meaningful
parts, in this case

    Please
    put
    the box
    next to
    the green chair
    on the table

where some of those parts also have parts that contribute to the
meaning, e.g. the determiner "the", the adjective "green" and the noun
"chair".

-- A blocks-world grammar and lexicon

This section could lead into a major project. For now just skim it,
and come back later (or dwell on it) if you are interested.

Using the ideas in the previous teach file we could try to construct a
grammar that can be used to give simple commands and then later extend
it to ask simple questions, and then show how the Pop11 grammar library
can parse those commands and questions, or in some cases why a more
powerful library (called TPARSE, introduced later) is needed.

Suppose we want to allow questions and commands, and we want both to be
able to refer to things with properties (e.g. size, colour, shape),
spatial relations (e.g. next to, inside, on) and actions that can be
requested or commanded (e.g. put, place, pick up). We may allow some
objects to have unique names (e.g. Box2, Block3, Table1).

Using the ideas of the previous teach file, and the above comments we
can allow things to be referred to by different sorts of constructs
using:

    names:      Box2, Block3, Table1, Fred, Mary ...

    size adjectives:  big, small, tiny ...

    colour adjectives: blue, brown, yellow, ...

    material adjectives: wooden, glass, plastic, ...

    "determiner" words that can be followed by a noun
    or a qualified noun phrase: the, a, some, any, that ...

    prepositions that indicate spatial relations
        next to, inside, above, ...

    action words
        put, place, fetch, ...

and for questions

    thing query word: which, what

    person query word: who

    place query word: where


-- -- EXERCISE 1: Extending the examples given

    Try extending those lists. E.g. what other query words could be
    required in an intelligent domestic robot?

    If you wish to work on this topic it may be a good idea to copy the
    above into a new file owned by you. First mark the range of text
    you want to copy (F1 or ESC m to mark the start, F2 or ESC M to
    mark the end).

    Then start a new file e.g.
        ENTER ved my_gram2.p RETURN
    (NB: don't try to use spaces in file names: they cause trouble
    in linux and Ved will be confused by them. Underscores are fine).

    Then, with the editor cursor in the new file, you can invoke
    Ved's "Transcribe In" command, abbreviated to 'ti':
        ENTER ti RETURN

    Then save the new file, to be safe:
        ENTER w1 RETURN


-- -- EXERCISE 2: Create test examples and devise a notation

Using the lists of components in Exercise 1, create some examples of
questions, commands and assertions, and devise a notation for showing
how your sentences are constructed from the components specified.

Tip 1:
It is useful to separate the lexicon, which contains lists of words
of the various allowed types, from the grammar that specifies ways in
which larger structures can be composed from smaller structures.

The previous teach file TEACH * FASTGRAM1 illustrated that separation.

    (you can go back to it by putting the ved cursor where the
    above asterisk is and typing: ESC h)

Tip 2:
Use abbreviations, similar to those used in the previous teach file.
E.g.
    S for sentence
    QS for question sentence
    CS for command sentence

    SizeAdj, ColourAdj, ...

    V (for verbs)

    TQ for Thing query word
    PQ for place query word

    Det for determiner.
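
One possible way to write such word lists down is as a Pop11 list of
lists, each starting with a category abbreviation followed by its words.
This is only a sketch: the variable name sketch_lex and the choice of
words are illustrative assumptions, and you may well devise a better
notation. Mark the range and compile it (CTRL-d):

    vars sketch_lex;

    [
        ;;; each entry: category abbreviation, then example words
        [SizeAdj    big small tiny]
        [ColourAdj  blue brown yellow]
        [V          put place fetch]
        [TQ         which what]
        [PQ         where]
        [Det        the a some any that]
    ] -> sketch_lex;

    ;;; print the list to check that it was stored as intended
    sketch_lex ==>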

You probably won't have time to do a complete job of specifying such a
grammar and lexicon. But even starting the task can be very
illuminating. It helps us to think not only about the problems of
modelling linguistic communication, but also about the problems of
giving a machine the ability to think about or perceive its environment.

-- A very incomplete grammar and lexicon ------------------------------

Here is a fairly simple version of a grammar and lexicon for a tiny
subset of the blocks world. You can try extending it, after playing with
it.

vars blocks_gram blocks_lex;

[
    ;;; A sentence is a command or a question
    ;;; LIB GRAMMAR requires the grammar to start with 's'
    [s [COM] [QUEST]]
    ;;; A question asks about
    [QUEST
        ;;; the location of an object
        [WH_LOC VBE NP]
        ;;; what is in some spatial relation
        [WH_THING VBE PP]
        ;;; which member of a category is in some relationship
        [WH_SELECT SNP VBE PP]
        ;;; whether a specified object is in some relationship
        [VBE NP PP]
    ]
    ;;; Two sorts of commands
    [COM [V NP PP] [V NP ONTO_PP]]
    ;;; Noun phrases can be
    [NP
        ;;; a pronoun
        [PN]
        ;;; determiner and simple NP
        [DET SNP]
        ;;; determiner, simple NP and prepositional phrase
        [DET SNP PP]
    ]
    ;;; Simple NP can be a noun or adjective phrase + noun
    [SNP [NOUN] [AP NOUN]]
    ;;; adjective phrase is one or more adjectives
    [AP [ADJ] [ADJ AP]]
    ;;; prepositional phrase is preposition + NP
    [PP [PREP NP]]
    ;;; Target prepositional phrase indicating a location
    [ONTO_PP [ONTO_PREP NP]]
] -> blocks_gram;

;;; Now the lexicon. Try adding some words where you think more could fit.
[
    [NOUN       block box table one]
    [PN         it]
    ;;; we cheat by inventing compound words
    [V          put move pickup putdown]
    [VBE        is]
    ;;; question words location, identity, selection
    [WH_LOC     where]
    [WH_THING    what]
    [WH_SELECT    which]
    ;;; It might be better to divide this into different kinds of
    ;;; adjectives, e.g. COLOUR_ADJ SIZE_ADJ
    [ADJ        white red blue green big small large little]
    [PREP       on above over]
    [ONTO_PREP  onto]
    [DET        each every the a some]
] -> blocks_lex;

Copy that into a new file:

        ENTER ved blocks_gram.p RETURN

And then edit it, as you wish, as suggested below.
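
Because the grammar and the lexicon are ordinary Pop11 lists, you can
inspect them with ordinary list procedures. The following is a minimal
sketch: show_rules is a made-up helper, not part of LIB GRAMMAR. It
assumes you have already compiled the definitions of blocks_gram and
blocks_lex above. Mark the range and compile it (CTRL-d):

    define show_rules(category, rules);
        ;;; print the tail of every entry whose first item is category
        lvars rule;
        for rule in rules do
            if hd(rule) == category then
                tl(rule) ==>
            endif;
        endfor;
    enddefine;

    ;;; the alternative expansions of NP in the grammar
    show_rules("NP", blocks_gram);

    ;;; the words listed as adjectives in the lexicon
    show_rules("ADJ", blocks_lex);

The first call should print the three alternatives given for NP, and
the second the words in the ADJ entry.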

-- -- EXERCISE 3: extend and enrich blocks_gram blocks_lex

Try extending the above grammar and lexicon, and perhaps splitting some
of the categories into sub-categories that bring out important
differences between similar types of verbal expression.

-- -- EXERCISE 4: use generate with your grammar and lexicon

Use the generate command illustrated in TEACH FASTGRAM1 to generate
sentences from the above grammar and lexicon.

First compile the grammar library (use ESC d on the command):

    uses grammar

Try a few test runs, and see if you get any surprises:

    generate(blocks_gram, blocks_lex) ==>

or mark this range (F1, F2) then compile it (CTRL-d), after increasing
the maximum recursion level from its default (10) to 15:

    15 -> maxlevel;

    repeat 5 times
        generate(blocks_gram, blocks_lex) ==>
    endrepeat;

Try to explain the difference between values 10 and 15 for maxlevel.
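
To help with the comparison, you could print the number of words in each
generated sentence alongside the maxlevel value used. This is a rough
sketch: the variable names level and sentence are arbitrary, and it
assumes that generate returns a flat list of words, as in the examples
shown in this file. Mark the range and compile it (CTRL-d):

    vars level, sentence;

    for level in [10 15] do
        level -> maxlevel;
        repeat 5 times
            generate(blocks_gram, blocks_lex) -> sentence;
            ;;; show the setting and the sentence length, then the sentence
            [maxlevel ^level words ^(length(sentence))] =>
            sentence =>
        endrepeat;
    endfor;

Note that this leaves maxlevel set to 15 when it finishes.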

Examples with maxlevel set to 10

** [which blue table is on it]
** [is it on it]
** [which little table is above it]
** [putdown it onto it]
** [move some one above it]
** [move it over it]

Examples with maxlevel set to 15

** [put it onto a blue table above a box]
** [which little big table is above some big block]
** [put each one over every small one above it over each one]
** [where is every large box on the one]
** [is the table on some white blue one on each table]

Are there some sentences here, or in your output, which you think
should be excluded by the grammar? Could you modify the grammar so
as to exclude them?
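
As a hint, here is one possible way (among many) of stopping odd
commands such as "pickup it onto ...": give each kind of verb its own
category, so that only some verbs can be followed by an ONTO_PP. The
names com_gram, com_lex, V_MOVE and V_TAKE are invented for this sketch,
and the grammar is deliberately tiny. Mark the range and compile it
(CTRL-d):

    uses grammar

    vars com_gram, com_lex;

    [
        [s [COM]]
        ;;; each kind of verb has its own command pattern
        [COM [V_MOVE NP ONTO_PP] [V_TAKE NP]]
        [NP [PN] [DET NOUN]]
        [ONTO_PP [ONTO_PREP NP]]
    ] -> com_gram;

    [
        [V_MOVE     put move]
        [V_TAKE     pickup]
        [PN         it]
        [DET        the a]
        [NOUN       block box table]
        [ONTO_PREP  onto]
    ] -> com_lex;

    repeat 5 times
        generate(com_gram, com_lex) ==>
    endrepeat;

Check whether "pickup" is now ever followed by an "onto" phrase.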

Note: if you wish to copy examples of output from the output.p file into
your file blocks_gram.p you can edit the output file

    ENTER ved output.p RETURN

Then using F1 and F2 mark the lines of output you wish to save.
Then using ESC x (or an ENTER ved .... command) return to your previous
file.
Select the location where you want the output to go.
Create an empty comment using Pop11 comment brackets, like this

/*

*/

Put the VED cursor in the blank space. Then do this to Move the text In:

    ENTER mi RETURN

That will give you commented out examples of output, which will be
ignored if ever you compile the whole file.

You can also type some extra text in at the top of the comment saying
how the examples were generated, e.g.

/*
Using 'generate', defined in LIB GRAMMAR, with maxlevel set to 15,
blocks_gram and blocks_lex produced the following:

** [pickup it onto each white little block on a block]

...etc...

*/

-- -- EXERCISE 5: using setup to create parsers, to test sentences

Try using the parser generator in lib grammar to test sentences that you
think your grammar and lexicon can handle.

If you have not compiled your grammar and lexicon since this editing
session started, go to your file containing them, mark the whole range,
from the 'vars' declaration to the end of the second assignment, to
blocks_lex. Then compile that range (CTRL-d). Test that you have
compiled both by printing out the grammar and the lexicon (ESC d on each
line):

    blocks_gram ==>
    blocks_lex ==>

Now make sure that the grammar library is loaded ( ESC d ):

    uses grammar

A procedure you have not yet met is provided to create a collection of
parsers from your grammar and your lexicon. The procedure is called
setup. You can ask Pop11 what setup is (use ESC d):

    setup =>

that prints out:
    ** <procedure setup>

NB in Pop11 procedures are objects just as lists, numbers, words,
strings, and arrays are (among other things).

We use setup to create parsers by applying the procedure to the grammar
and the lexicon (ESC d):

    setup(blocks_gram, blocks_lex);

That command does not print anything out, but it creates a collection of
procedures for recognising items that the grammar deals with: one
procedure for each sentence component named in the grammar (s and the
capitalised names such as NP and PP), and one for each type of word in
the lexicon.

You can ask Pop11 to print out some of the procedures:

    s =>
    ** <procedure s>

    QUEST =>
    ** <procedure QUEST>

    ONTO_PP =>
    ** <procedure ONTO_PP>

    NOUN =>
    ** <procedure NOUN>

    DET =>
    ** <procedure DET>

(You can think of these procedures as compiled versions of the grammar
rules. But they are procedures for recognition, whereas the grammar
rules can also be used for generation, as we saw previously.)

Try using the recognisers, and see if they do what they should do. The
lexical recognisers can only recognise words (represented in double
quotes in Pop11, if not in a list expression).

E.g. notice that DET returns a description of what it recognised:

    DET("the") =>
    ** [DET the]

If the word is not a determiner, DET returns false, one of Pop11's two
booleans (true and false):

    DET("blue") =>
    ** <false>

Copy and edit the DET command and try it with other words from the
lexicon. Do the same with some of the other newly created procedures
for recognising lexical entries, e.g.

    ADJ("red") =>
    ** [ADJ red]

    ADJ("the") =>
    ** <false>
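
Instead of editing the command by hand for each word, you could loop
over a list of words. This is just a sketch: the variable name candidate
and the test words are arbitrary, and it assumes you have already run
setup(blocks_gram, blocks_lex) so that DET exists. Mark the range and
compile it (CTRL-d):

    vars candidate;

    for candidate in [the a some blue box it] do
        ;;; print each word next to whatever DET returns for it
        [^candidate ^(DET(candidate))] =>
    endfor;

Try replacing DET with ADJ, NOUN or PN and compare the results.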

Now try some recognisers for complex constructs.

Reminder: these are the constructs (unless you changed the grammar):
    s QUEST COM NP SNP AP PP ONTO_PP

Each now is the name of a procedure, e.g. (use ESC d):

    QUEST, COM, PP =>
    ** <procedure QUEST> <procedure COM> <procedure PP>

So they can also be run. But because they come from the grammar, not the
lexicon, they must be applied to lists of words, not individual words.

E.g. run these (using ESC d on each), and try editing them to see what
results you get:

    AP([big red]) ==>

    ** [AP [ADJ big] [AP [ADJ red]]]

The result describes an adjectival phrase starting with an adjective,
indicated by

    [ADJ big]

and followed by an adjectival phrase indicated by

    [AP [ADJ red]]

Compare these:

    AP([the red]) ==>
    ** <false>

    AP([big red box]) ==>
    ** <false>

But compare

    NP([the big red box]) ==>
    ** [NP [DET the] [SNP [AP [ADJ big] [AP [ADJ red]]] [NOUN box]]]

This is a noun phrase because it contains a determiner:

    [DET the]

followed by a simple noun phrase SNP, which is an AP followed by a NOUN:

    [SNP [AP [ADJ big] [AP [ADJ red]]] [NOUN box]]
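
Since each description is a nested list whose first item is a category
name, a short recursive procedure can walk over it. The following sketch
collects the words at the tips of the tree. The procedure name leaves is
invented for this illustration and is not part of LIB GRAMMAR. It
assumes setup has been run and that the parse succeeds. Mark the range
and compile it (CTRL-d):

    define leaves(tree);
        lvars node;
        if islist(tree) then
            ;;; skip the category label, collect words from each sub-tree
            [% for node in tl(tree) do dl(leaves(node)) endfor %]
        else
            ;;; a single word
            [^tree]
        endif
    enddefine;

    leaves(NP([the big red box])) ==>

If the description is built as shown above, this should give back the
original list of words.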

-- Using showtree -----------------------------------------------------

The library program showtree can be used to display the NP description
in a graphical format. (This will work in XVed, and also in Ved if you
are using an 'xterm' window, or using PuTTY in Windows to access Poplog
remotely.)

Compile the library (ESC d):

    uses showtree

Apply the procedure to the full NP description (using ESC d):

    showtree([NP [DET the] [SNP [AP [ADJ big] [AP [ADJ red]]] [NOUN box]]])

If it works on your terminal it will look something like this (only
prettier):

      |NP|
  -----------
|DET|     |SNP|
  |     ---------
  |     |       |
 the  |AP|    |NOUN|
    -------     |
  |ADJ|  |AP|  box
    |     |
    |     |
   big  |ADJ|
          |
          |
         red

Now try testing sentences that you construct to see if the "top level"
parsing procedure "s" recognises them:

Examples:

    s([which big box is above it]) ==>
** [s [QUEST [WH_SELECT which]
             [SNP [AP [ADJ big]] [NOUN box]]
             [VBE is]
             [PP [PREP above] [NP [PN it]]]]]

Examine that closely to work out where all the descriptions come from.

Why doesn't this one work:

    s([which large box is below it]) ==>
    ** <false>

Can you fix either the grammar or the lexicon, then recompile it, then
re-run the setup procedure so that that example is accepted?

The showtree library includes a pop11 macro "---" defined so that

    showtree([ word word word ..... ] ) ==>

can be abbreviated like this and compiled as before.

    --- word word word .....

e.g. try

   ---  is some box above it

** [s [QUEST [VBE is]
             [NP [DET some] [SNP [NOUN box]]]
             [PP [PREP above] [NP [PN it]]]]]

    --- where are all the large boxes
    ** <false>

    --- where is every large green box
    ** [s [QUEST [WH_LOC where]
                 [VBE is]
                 [NP [DET every] [SNP [AP [ADJ large] [AP [ADJ green]]] [NOUN box]]]]]

Try using some of the output of the generate(<grammar>, <lexicon>)
command, and see if the s procedure recognises everything generated,
and parses it correctly into sub-structures.
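
One way to automate that check is to generate several sentences and
count how many the parser accepts. This is a minimal sketch: the
variable names sentence, tree and accepted are arbitrary, and it assumes
that the grammar library is loaded and that setup(blocks_gram,
blocks_lex) has been run. Mark the range and compile it (CTRL-d):

    vars sentence, tree, accepted;

    0 -> accepted;
    repeat 10 times
        generate(blocks_gram, blocks_lex) -> sentence;
        s(sentence) -> tree;
        if tree then
            accepted + 1 -> accepted;
        else
            ;;; show any generated sentence that the parser rejects
            [rejected ^^sentence] =>
        endif;
    endrepeat;

    [accepted ^accepted out of 10] =>

If any sentences are rejected, try to work out why from the grammar
rules.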

-- EXERCISE: Make a summary of what you have learnt from this file ----

-- TO BE ADDED --------------------------------------------------------

Further developments of these ideas

Designing and implementing a parser, instead of using lib grammar

Combining parsing with other components in a complete architecture, e.g.
for a chatbot or simple expert system, or travel adviser.

Combine the above ideas with an image analyser to generate descriptions
of what is in various images.

Combine the above ideas with a program for making pictures, and make it
draw a picture when given a sentence describing the desired contents.

What other applications can you think of?

Further reading
    TEACH * ISASENT
    TEACH * PARSESENT
    TEACH * PARSING

--- $usepop/pop/teach/fastgram2
--- University of Birmingham 2011. All rights reserved. ------