Introduction
The parsing and extraction of data from
text files, an ODBC database or email messages is
controlled by a parsing definition. A parsing definition
consists of a series of settings that together describe what
actions are to be taken. The following is a
mainly visual description of how to create and process a
parsing definition to process incoming email from the OS X
Mail email client.
If you want to save yourself a few mouse clicks and keystrokes,
you can download the requisite files from http://www.b-bsoftware.com/downloads/MTPExamples.dmg.
The downloaded folder 'Email Client Example' contains three
files. The first file, 'Internet Sales.pdfn', is the parsing
definition; it can be imported into MTP using the menu item
Definitions/Import Definition. The second file is an OS X Mail
client mailbox file containing the sample email. It can
be imported into Mail using the mailbox import function
(File/Import Mailboxes). Make sure
that the resultant mailbox is located at the highest level
in the mailbox hierarchy, otherwise MTP will not recognise
it. The final file is a FileMaker Pro database. To
access this file from MTP
you must first create an ODBC data source (called
'Customers') that points to the database. Information about
this process may be found at http://www.filemaker.com/support/technologies/odbc.html.
Step 1
MTP processes incoming text according to the settings contained in a
parsing definition. MTP’s main window displays a list of all known
parsing definitions. When MTP is first launched the list is empty, so a
parsing definition must be created before any text can be processed. To
do so, click the ‘New’ button and enter a name for the new
definition; a new definition editing window will appear.
Step 2
The editing window contains six panels, each containing a series of
settings that instruct MTP on how to read text, extract data and then
construct output data. Each panel must be completed before
progressing to the next, and all panels must be completed before
live data can be processed. The bottom left of each
panel contains an 'Info' button that will take you to the
appropriate page of the online user manual.
The first panel defines
the source of the text to be processed. Clicking any one of the
four radio buttons at the top of the window will display a
sub-panel allowing the input text type to be further defined. For
this example the 'Email Client' option is selected, which
specifies that incoming emails are to be read via OS X’s Mail
application. The only required setting for this option is the
name of the mailbox to be read. This is selected from the
'mailbox' popup menu. The popup menu is initially empty.
Clicking the ‘Refresh’ button will launch Mail (if not already open),
read the list of available mailboxes and populate the menu.
Select the ‘Sales’ mailbox, which contains email notifications of
internet sales.
To check the contents of
the mailbox, simply click the ‘Test Mail Client’ button; Mail is
launched (if not already open) and the emails in the mailbox are
read and displayed in a mini email browser.
Note that each email contains the formatted details of an individual
internet sale.
Step 3
The next panel
allows the creation and editing of one or more data extraction
'variables'. A variable is an entity, identified by a unique name,
that is associated with a set of location and extraction rules that
result in the extraction of data from incoming text. A variable
thus represents extracted data and can be referred to by name in
subsequent parts of the process, such as the mapping of data to
particular database fields or the placement of extracted data
into an outgoing email.
The panel initially contains an empty list of variables. To
create a new variable, for example to extract the name of the
purchaser from an email, click the 'New' button. The panel will
be expanded to display a variable definition sub-panel. The
screen shot below shows how such a variable may be defined. The final element in the
sub-panel specifies a unique name for the variable (Purchaser);
it is this name that will appear in the list of available
variables in the main panel.
Two important points to note
here concern the string location fields and the 'process' popup
menu. First, both of the string location fields may contain regular
expressions, allowing for more powerful and sophisticated searches.
Right-clicking on a field
reveals a contextual menu of the more commonly used expressions.
Second, the process popup menu contains the list of available
BASIC scripts that may be used to further process the extracted
data. The application comes with a number of standard example
scripts. For example, selecting 'Extract-Last-Word' from the menu
would result in the last name of the purchaser being extracted
(in which case changing the variable name to 'Purchaser_Surname'
might be more appropriate). New BASIC scripts can
be added via the BASIC editing window.
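The idea of a variable, text located between two search strings and then
optionally post-processed, can be sketched outside MTP. The following is
only an illustration in Python, not MTP's implementation: the function
names, the sample text and its layout are all assumptions, and
extract_last_word is a hypothetical stand-in for the 'Extract-Last-Word'
script.

```python
import re


def extract_variable(text, start_pattern, end_pattern, process=None):
    """Return the text between start_pattern and end_pattern, or None."""
    # Both location strings are treated as regular expressions. The
    # non-greedy group stops at the first occurrence of the end pattern.
    match = re.search(
        f"(?:{start_pattern})(.*?)(?:{end_pattern})", text, re.DOTALL
    )
    if match is None:
        return None
    value = match.group(1).strip()
    # Optional post-processing step, standing in for a 'process' script.
    return process(value) if process else value


def extract_last_word(value):
    """Hypothetical equivalent of the 'Extract-Last-Word' example script."""
    words = value.split()
    return words[-1] if words else ""


# Invented sample text; the layout of the real sample email is not assumed.
sample = "Purchaser: Donald Duck\nProduct: Example Widget\n"

purchaser = extract_variable(sample, r"Purchaser:\s*", r"\n")
surname = extract_variable(sample, r"Purchaser:\s*", r"\n", extract_last_word)
```

Here purchaser holds the full name and surname holds only its last word,
which mirrors the 'Purchaser' and 'Purchaser_Surname' variables discussed
above.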
Once a variable has been defined it may be
tested by clicking the 'Test Extraction' button. A separate
dialog window appears containing the results of the variable
data extraction for each of the emails in the mailbox.
In the same way, further variables may be built up to
represent the important data elements of the email, such as
the product name, email address of the purchaser, date of
sale, purchase price etc.
Step 4
When at least
one variable has been defined you may proceed to the next panel.
This panel defines which emails are to be
selected and thus processed. For instance, the following screen
shot specifies that only emails with subject headings containing
the phrase 'Thanks for your payment' will be selected for
processing; all other emails will be ignored. As with the
variable definition, the search string input field may contain
regular expressions. If,
on the other hand, all emails from the selected mailbox are to be
processed, select 'All' from the 'process' popup menu.
As with all the
definition panels, it is possible to test the settings, in this case by
clicking the 'Test Selection' button. This will generate a
dialog window with the selection results for each of the emails
in the mailbox.
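The selection rule amounts to a regular expression test on each subject
heading. A minimal sketch in Python follows; it is not MTP's own logic.
The function name and the sample subjects are assumptions, passing no
pattern stands in for choosing 'All', and the match is assumed to be
case sensitive.

```python
import re


def is_selected(subject, pattern=None):
    """Return True if an email with this subject should be processed."""
    # No pattern stands in for the 'All' option: every email is selected.
    if pattern is None:
        return True
    # re.search succeeds if the pattern occurs anywhere in the subject.
    return re.search(pattern, subject) is not None


# Invented subject headings, for illustration only.
subjects = [
    "Thanks for your payment",
    "Re: Thanks for your payment (order update)",
    "Newsletter",
]

selected = [s for s in subjects if is_selected(s, r"Thanks for your payment")]
```

With these subjects the first two are selected and 'Newsletter' is
ignored.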
Now that the input source, data extraction and text selection have
been defined, the remaining three panels allow definition of what
output is to be generated.
Step 5
The next (fourth) panel specifies whether database output is required
and, if so, how the extracted data (variables) are to be combined to
produce the database data. Clicking the 'Write to a database'
check-box enables all of the other panel elements, allowing
the database details to be entered.
Once the data source name, and possibly a user
name and password, have been entered, the name of the table
must be defined. If a valid table name is supplied
then clicking the 'Column Names' button will populate the
left hand side of the 'Column mappings' table with all of
the column names for the table. You may then map variable
names against one or more of those column names. This is
done by either double-clicking the column name or
highlighting the column name and clicking the 'Edit
Mapping' button. In either case the following dialog
window is displayed.
When one or more mappings have been specified the output can
be tested by clicking the 'Test Database Output'
button; this displays the 'Definition Test Window'.
This dialog window
allows the stepping through of the input text (in this case,
each email in the mailbox), selecting or rejecting each block of
text (email). If the text is accepted it is then processed and
database output is generated and displayed; however, it is
not actually written to the database.
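The column mappings and the 'display but do not write' behaviour of the
test can be illustrated with a short Python sketch. This is not how MTP
talks to ODBC: the standard sqlite3 module is used only so that the
example is self-contained, and the table name, column names and values
are all hypothetical.

```python
import sqlite3


def build_insert(table, mappings, variables):
    """Build a parameterised INSERT from column -> variable-name mappings."""
    columns = list(mappings)
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'
    # Look up the extracted value for the variable mapped to each column.
    params = [variables[mappings[c]] for c in columns]
    return sql, params


# Hypothetical table, columns and extracted values.
mappings = {"Name": "Purchaser", "Product": "Product_Name"}
variables = {"Purchaser": "Donald Duck", "Product_Name": "Example Widget"}
sql, params = build_insert("sales", mappings, variables)

# Dry run: execute the INSERT, read back what would be stored, then roll
# the transaction back so that nothing is actually written.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "sales" ("Name" TEXT, "Product" TEXT)')
conn.commit()
conn.execute(sql, params)
preview = conn.execute('SELECT "Name", "Product" FROM "sales"').fetchall()
conn.rollback()
remaining = conn.execute('SELECT COUNT(*) FROM "sales"').fetchone()[0]
conn.close()
```

The preview list shows the row that a real run would insert, while the
final count confirms that the table is still empty after the test.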
Step 6
The next panel is very similar to the database output panel, in that
it specifies whether text file output is required and, if so, how the
extracted data (variables) are to be combined to produce the text file
output.
Clicking the 'Write to
an output file' check-box enables all of the other panel
elements, allowing the text file to be specified. The settings
include the name of the file to be created. N.B.
If the text file already exists then data will be appended to
it. The construction of the output data may then be
defined by building a list of variables in the left hand list of
variables. The remaining panel elements specify the character to
be used to separate the variables and the character to be used
to terminate the line (record). When one or more variables
have been specified the output may be tested by clicking the
'Test Text File Output' button; this displays the 'Definition
Test Window'.
This dialog window allows the stepping
through of the input text (in this case, each email
in the mailbox), selecting or rejecting each block of text
(email). If the text is accepted it is then processed and file
output is generated and displayed; however, it is not actually
written to the text file.
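The shape of the text file output, an ordered list of variables joined
by a separator character and ended by a terminator character, with
records appended to any existing file, can be sketched as follows. This
is an illustrative Python sketch only; the function names, file name and
values are assumptions, and no escaping is applied if a value happens to
contain the separator.

```python
import os
import tempfile


def format_record(variables, names, separator="\t", terminator="\n"):
    """Join the named variables, in order, into one output record."""
    return separator.join(variables[name] for name in names) + terminator


def append_record(path, record):
    """Append a record; the file is created if it does not yet exist."""
    # newline="" writes the terminator exactly as given.
    with open(path, "a", encoding="utf-8", newline="") as handle:
        handle.write(record)


# Hypothetical extracted values and output order.
variables = {"Purchaser": "Donald Duck", "Product_Name": "Example Widget"}
names = ["Purchaser", "Product_Name"]
record = format_record(variables, names, separator=",")

# Writing twice shows that existing content is kept and extended.
path = os.path.join(tempfile.mkdtemp(), "sales.txt")
append_record(path, record)
append_record(path, record)
with open(path, encoding="utf-8", newline="") as handle:
    contents = handle.read()
```

After the two calls the file holds two identical comma-separated lines,
one per processed email.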
Step 7
The final panel allows the specification of an output email template
that will be sent (via OS X's Mail email client) for each selected
text block. In a similar manner to the previous two panels, clicking
the 'Send an email' check-box enables all of the other panel elements
and also displays a floating window that contains the names of all the
available variables.
A reference to a variable
may be inserted into each of the panel elements by
dragging a variable name from the floating window and dropping it
into one of the panel elements. This will result in a construct
of the type '{Variable Name}' appearing in the element.
This construct is replaced by the associated extracted data when
the email is processed. For example, the construct '{Purchaser}'
would be replaced by 'Donald Duck' when the email is
processed. The email message editor allows for creation of rich
text via differing fonts, sizes, stylings and colours. HTML tags
may also be embedded into the message (via the contextual menu),
allowing for construction of more complex messages.
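The '{Variable Name}' substitution can be illustrated with a few lines
of Python. This is a sketch of the general technique rather than MTP's
own code: the function name and the template text are assumptions, and
leaving unknown names untouched is a design choice made for this example
only.

```python
import re


def fill_template(template, variables):
    """Replace each '{Name}' construct with the matching extracted value."""

    def replace(match):
        name = match.group(1)
        # Unknown names are left as they are rather than guessed at.
        return variables.get(name, match.group(0))

    return re.sub(r"\{([^{}]+)\}", replace, template)


# Hypothetical template; only 'Purchaser' has an extracted value here.
template = "Dear {Purchaser},\nThank you for buying {Product_Name}."
message = fill_template(template, {"Purchaser": "Donald Duck"})
```

In the resulting message '{Purchaser}' has become 'Donald Duck', while
'{Product_Name}' remains visible because no value was supplied for it.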
When all the necessary settings have been provided the output can be
tested by clicking the 'Test Email Output' button; this displays the
'Definition Test Window'.
This dialog window allows the stepping
through of the input text (in this case, each email
in the mailbox), selecting or rejecting each block of text
(email). If the text is accepted it is then processed and the
outgoing email is displayed; however, it is not actually
transmitted.
Now What?
The parsing
definition is now complete. Click the 'OK' button at the bottom
of the window; the editing window will disappear, leaving the
parsing definition control window. The definition can now be
'processed' using live data. To do so, highlight the definition
name in the list of definitions, which enables the 'Process
Definition' button. Clicking the button will initiate a processing
run.
The first action undertaken is validation of the definition.
That is, each setting in the definition is checked. If no errors are
detected then processing will proceed, otherwise it will be
terminated. Because text file output has been requested, MTP then
asks for the location of the folder that is to contain the
output file. If this is
provided then processing will continue, otherwise processing
will be terminated. Processing continues
with the display of the 'Definition Process' window;
the source text is then read and processed.
As each text block is
processed a progress/status message is displayed in the window
and an entry is made in the processing report. When all the text
has been processed the progress window will be closed. The
processing report may then be displayed by selecting the menu
item File/Processing Reports.