Executable English / Internet Business Logic Tutorial, Part 4

Theo Walker    internet.business.logic@gmail.com    

 

The previous tutorials have covered using the Executable English / Internet Business Logic system (IBL) to write and run agents that have up to a few thousand lines of data. This tutorial shows how to use the IBL to work with networked SQL databases that may contain very large tables. The IBL can automatically generate and run SQL queries, using just the information that you write in the rules. You can also use the IBL to create and drop tables, and to add and delete rows in tables.

The SQL queries and transactions that the IBL generates can be very complex, even if they are based on a few simple rules [1]. However, the SQL is generated and run behind the scenes, so it not usually necessary for anyone to look at it. When SQL is generated and run, the IBL can explain the results, in English, at the business or scientific level.

Section 1 - Connecting to a Database

Suppose you have a table called Employees in a SQL database, and that it looks like this:

EmpID EmpName ManagerName
1 Curly Moe
2 Larry Charlie
3 Carl Lenny

Suppose further that the table is in a MySQL database management system at www.executable-english.com. To make the Employees table available to the IBL, you can write a rule like this.


 url:www.executable-english.com  dbms:mysql  dbname:mysql tablename:Employees port:3306  id:mysql  password:mysqlmysql
 ------------------------------------------------------------------------------------------------------------------- 
 table Employees has a row some-id some-employee some-manager


The rule takes values from the three columns from the database table (EmpID, EmpName, ManagerName) and puts them into some-id, some-employee, and some-manager. If you wrote an agent with just this rule in it, the Question Menu would contain:


 table Employees has a row some-id some-employee some-manager 


Provided that the table exists, has three columns, and the correct link information is given in the first line of the rule, asking this question will yield the rows in the database table. (If there were many rows, the IBL would show the first 50, then the first 100, and so on). You can see a working example in the agent called "IdEmployeeManager1":
  1. Log in to the IBL using the User Id "demo" and the password "author".
  2. Select the IdEmployeeManager1 agent from the list, and pick "Choose and agent and go to its Question Menu..." from the menu at the top.
  3. Click on the question: table Employees has a row some-id some-ename some-mgrname, and click Ask.
  4. You will see the rows in the Employees table displayed. (If the table is empty, a "Sorry, no answers were found" message will appear. If the table does not exist, you will see a message to that effect.)
You can look at the agent using "Go to View or Change the Agent". You will see the rule used to link to MySQL and access the Employees table at the top. The other rules were used to create the Employees table through the IBL, which we will explore next. Before moving on to the next section, you are welcome to access your own online database via the IBL. To do this, add a rule like this one to the top of the IdEmployeeManager1 agent:



 url:[database URL]  dbms:[mysql, oracle, etc.]  dbname:[database] tablename:[name] port:[####]  id:[user id]  password:[password]
 --------------------------------------------------------------------------------------------------------------------------------- 
 table [table name] has a row [some-column1 some-column2... and so on for each column in the table]


(The first line of the rule is fixed format. The last line can be any English sentence you like, so long as it mentions the right number of column place holders.)

Note that, if you want to access multiple tables, you will need a separate rule for each table. Also, you will probably need to ask your system and database administrators to allow queries sent from www.executable-english.com, and to allow the results to be sent back. (It is also possible to install the IBL system in your organization -- please send email to internet.business.logic@gmail.com)



Section 2 - Managing a Database, and ETL

Section 1 showed how to use any database as a source of knowledge for the IBL. This is very powerful; the IBL can connect to a database and gather information needed to answer any question about the data it contains. Another aspect is that the IBL can be used to create tables, remove them, join tables, and insert and delete rows.

One example of where this behavior is useful is in reorganizing a data model. An existing database with thousands or millions of entries can be linked to the IBL, the data can be transformed, and the result can be loaded into a new database with data rearranged into a new set of tables. (This kind of process is sometimes called "Extract, Transfer and Load", or ETL for short). Further, information from multiple different databases can be combined, since the data's meaning will always be preserved in the IBL. For example, medical information from different organizations can be combined to fuel new discoveries. Suppose we have (fictitious) population data in tables from two different databases:

Disease Data
Disease CountryCode Percentage
Sickle-cell Anemia DE 0.0002
Sickle-cell Anemia IS 0.000001
Sickle-cell Anemia JP 0.00001
Sickle-cell Anemia IN 0.00008
Phenylketonuria DE 0.00012
Phenylketonuria IS 0.00161
Phenylketonuria JP 0.00012
Phenylketonuria IN 0.00011
Porphyria DE 0.000022
Porphyria IS 0.000031
Porphyria JP 0.000021
Porphyria IN 0.000035


Haplotype Data
HaplotypeID Country Percentage
1 Germany 78
2 Germany 10
3 Germany 30
1 Iceland 65
2 Iceland 74
3 Iceland 32
1 Japan 3
2 Japan 13
3 Japan 26
1 India 44
2 India 9
3 India 25


We will work through an ETL example using this data. Although this is a specific example, the approach is general, and applicable to any SQL databases. You may like to use the approach, step by step, with your own task and data as you read the following.

The above table contains artificial data about Haplotypes. A Haplotype is a set of individual differences on a chromosome which are inherited together. Researchers attempt to link haplotypes with expressed characteristics, especially diseases, based on population statistics and other data. Merging these two data sets by country into a single table is a logical step towards finding useful correlations. We shall see how to do this using the agent called Haplotype1 on the IBL demo id. The agent contains the rules that define how to bridge the two tables, via a third table, Countrycode, that translates between country names and their abbreviations. (More data would be needed in order to draw useful conclusions that would be useful to scientists, of course.)

The above data is a simplification we use to indicate what is needed to solve this kind of problem. There are several haplotype databases in existence. There are clinical studies into the frequency of various diseases within populations, which are also available in database form - and their records are of course more thorough than the artificial data above. Data mining techniques have been applied on joins of such information, especially in the prediction of cancer susceptibility based on sets of haplotypes. Medical applications of the IBL would therefore likely be focused on a specific disease. Still, a wide-scale mining of data sets such as these could certainly provide leads for such research; if a country had exceptionally high percentages for a given haplotype, and exceptionally high percentages for a disease that had a possibility of being associated with that haplotype (e.g. the haplotype was on the right chromosome) some useful correlations could emerge.

The Haplotype1 agent contains all the rules needed to create the three tables in the database, insert the data into them, extract and join the data, and create a new table from the result. There are also sample rules for deleting rows and removing tables. You can open the Haplotype1 agent by logging into the demo id with password "author".

Viewing the tables

In the Question Menu of the Haplotype1 agent, you will see a set of questions including:


 for some-haplotype in some-country the occurrence is some-percent2

 the code some-country-code corresponds to some-country

 for some-disease in the country with code some-country-code the occurrence is some-percent1

 6. in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2 


(The "6." in front of the last line just a label for the sixth step in a suggested demo you can run.) Asking one of these questions will show the rows of data in the corresponding table. If the table is empty, no rows will show. If the table does not exist, or cannot be reached over the network, the IBL will show an error message.

This tutorial section will go through the process of creating three tables -- Haplotype, Countrycode and Disease, plus a fourth Result table. Then we shall see how to load data into the first three from premade IBL-style tables. Finally, we shall extract data from three tables. Along the way, the rules in the Haplotype1 Agent used in each step will be highlighted. That way, when you want to make your own database-linked IBL agent, you can simply copy and modify this agent's rules rather than starting from scratch.

Removing tables

To begin, it would be best to remove the existing tables and re-create them. That way, you can be sure that no other user has left data that might interfere. Click on this question from the Haplotype1 agent's question menu.


 1. REMOVE the the sql table -- some-table 


Click on Ask, without specifying any values. The IBL will show the English headings corresponding to four SQL tables -- Haplotype, Countrycode, Disease, and Saved Result -- that can be removed. The action bar at the top of the page should now show "Make these Changes to the Agent Data", with a Go button to the right. Clicking on the Go button will remove all four tables and return you to the Question Menu. In the Agent, the rule that was used to make this step was:


 some-table should have a column some-number some-name some-type
 drop sql table : that-table
 --------------------------------------------------------------- 
 1. REMOVE the the sql table -- that-table


The last line of the rule, 1. REMOVE the the sql table -- that-table, creates the question you asked from the Question Menu. drop sql table : that-table is applied to the four tables, generating a SQL Drop command for each. some-table should have a column some-number some-name some-type tells the IBL which tables this agent has access to and what their columns are. It refers to the following table in the agent file:


 this-table should have a column this-number this-name this-type
 ============================================================================================================================ 
 for this-disease in the country with code this-country-code the occurrence is this-percent       1   Disease      CHAR(30)
 for this-disease in the country with code this-country-code the occurrence is this-percent       2   CountryCode  CHAR(3)
 for this-disease in the country with code this-country-code the occurrence is this-percent       3   Percent      FLOAT(30)
 the code this-country-code corresponds to this-country                                           1   Countrycode  CHAR(3)
 the code this-country-code corresponds to this-country                                           2   Country      CHAR(50)
 for this-haplotype in this-country the occurrence is this-percent                                1   Haplotype    INT(8)
 for this-haplotype in this-country the occurrence is this-percent                                2   Country      CHAR(30)
 for this-haplotype in this-country the occurrence is this-percent                                3   Percent      FLOAT(30)
 8. SAVED RESULT in this-country the occurrences are this-disease this-percent1 and this-haplotype this-percent2  1   Country     CHAR(30)
 8. SAVED RESULT in this-country the occurrences are this-disease this-percent1 and this-haplotype this-percent2  2   Disease     CHAR(30)
 8. SAVED RESULT in this-country the occurrences are this-disease this-percent1 and this-haplotype this-percent2  3   DisPercent  FLOAT(30)
 8. SAVED RESULT in this-country the occurrences are this-disease this-percent1 and this-haplotype this-percent2  4   Haplotype    CHAR(30)
 8. SAVED RESULT in this-country the occurrences are this-disease this-percent1 and this-haplotype this-percent2  5   HapPercent   CHAR(30) 


This IBL table contains information that is needed to manage the SQL tables that the IBL is connected to. In addition to the three tables of source data, there is also a schema above for a SQL table that will hold the Saved Result of the Extract,Transfer and Load operation. Inspecting the result will be step 8. of the demo that you can run.

In this example, we are using the MySQL database management system. For compatibility with the Oracle DBMS, there is also a table in the agent to indicate the Oracle tablespace to use.

If you are managing your own tables, you will need to write an IBL table like the one above, containing the information on the schema of each of your SQL tables.


Creating the tables

Now that any old tables have been removed, fresh ones can be created. To do this, use the Question Menu to ask:


 2. CREATE a table with heading -- some-table -- in tablespace some-tablespace 


Click on Ask, and the IBL will show the English headings for four tables that will be created. Click on the Go button to make it happen, and you will be back at the Question Menu again.

The rule in the agent that was used to create the tables (see below) is quite similar to the rule for removing tables. It references the column information from the same IBL table as the removal rule.


 some-table should be in tablespace some-tablespace
 that-table should have a column some-colnumber some-colname some-coltype
 create sql table : that-table that-colnumber that-colname that-coltype that-tablespace
 -------------------------------------------------------------------------------------- 
 2. CREATE a table with heading -- that-table -- in tablespace that-tablespace
     that-colnumber that-colname that-coltype



Inserting the data

Now that the four tables have been created, the data for Haplotype, Countrycode, and Disease can be inserted. Much like the last two steps, all you need to do to insert the data is to go to the Question Menu and ask these questions, with no specified values, and click on Go.


 3. ADD for some-haplotype in some-country the occurrence is some-percent
 4. ADD the entry some-country-code corresponds to some-country
 5. ADD for some-disease in the country with code some-country-code the occurrence is some-percent 


Once you have asked the three questions and the tables have been updated, you may want to check the contents of the tables using the questions:


 for some-haplotype in some-country the occurrence is some-percent2
 the code some-country-code corresponds to some-country
 for some-disease in the country with code some-country-code the occurrence is some-percent1 


These questions are just like the questions from Section 1. They use the following SQL linking rules:


 url:www.executable-english.com  dbms:mysql  dbname:mysql tablename:disease  port:3306  id:mysql  password:mysqlmysql
 ------------------------------------------------------------------------------------------------------------------ 
 for this-disease in the country with code this-country-code the occurrence is this-percent




 url:www.executable-english.com  dbms:mysql  dbname:mysql tablename:Countrycode port:3306  id:mysql  password:mysqlmysql
 --------------------------------------------------------------------------------------------------------------------- 
 the code this-countrycode corresponds to this-country




 url:www.executable-english.com  dbms:mysql  dbname:mysql tablename:haplotype port:3306  id:mysql  password:mysqlmysql
 ------------------------------------------------------------------------------------------------------------------- 
 for this-haplotype in this-country the occurrence is this-percent


The three questions that you used to insert the data obtained their data from the following IBL-style tables inside the agent:


 table Disease should contain an entry this-disease this-countrycode this-percentage
 ======================================================================================= 
                                   Sickle-cell             DE              0.0002
                                   Sickle-cell             IS              0.000001
                                   Sickle-cell             JP              0.00001
                                   Sickle-cell             IN              0.00008
                                   Phenylketonuria         DE              0.00012
                                   Phenylketonuria         IS              0.00161
                                   Phenylketonuria         JP              0.00012
                                   Phenylketonuria         IN              0.00011
                                   Porphyria               DE              0.000022
                                   Porphyria               IS              0.000031
                                   Porphyria               JP              0.000021
                                   Porphyria               IN              0.000035



 table Countrycode should contain an entry this-country-code corresponds to this-country
 ======================================================================================= 
                                            DE                              Germany
                                            IN                              India
                                            IS                              Iceland
                                            JP                              Japan



 table Haplotype should contain an entry this-haplotype this-country this-percentage
 ===================================================================================== 
                                         1                 Germany         78
                                         2                 Germany         10
                                         3                 Germany         30
                                         1                 Iceland         65
                                         2                 Iceland         74
                                         3                 Iceland         32
                                         1                 Japan           3
                                         2                 Japan           13
                                         3                 Japan           26
                                         1                 India           44
                                         2                 India           9
                                         3                 India           25



The rules used to take that data from the IBL table and put it into the database were:


 table Haplotype should contain an entry some-haplotype some-country some-percent 
 add : for that-haplotype in that-country the occurrence is that-percent
 -------------------------------------------------------------------------------
 3. ADD for that-haplotype in that-country the occurrence is that-percent




 table Countrycode should contain an entry some-country-code corresponds to some-country
 add : the code that-country-code corresponds to that-country
 ---------------------------------------------------------------------------------------
 4. ADD the entry that-country-code corresponds to that-country




 table Disease should contain an entry some-disease some-country-code some-percent
 add : for that-disease in the country with code that-country-code the occurrence is that-percent
 -------------------------------------------------------------------------------------------------
 5. ADD for that-disease in the country with code that-country-code the occurrence is that-percent


You may notice that the rules for transferring data into a database table are quite simple; they work just like the rules for adding and removing tables. IBL-style tables are easy to create as well; you can copy and paste data from a spreadsheet or a web page, put a sentence above it and add equals signs for the double underline, and the table is done. The process of loading data into a database through the IBL is thus quite straightforward. This works well for up to a few hundred lines of data. To load larger tables into SQL, it is best to use a utility, such as the one included with your MySQL or other DBMS.

Combining the tables

Now that the Haplotype, Countrycode and Disease tables contain the data, you can ask the following queston to combine them:


 6. in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2 


You should see an answer table with 36 rows. (Of course, in a real ETL example, there may be a very large number of rows. We will look later in this tutorial at the rule numbered 10. for dealing with that kind of situation.)

The system found the 36 rows of the answer using this rule in the agent.


 for some-disease in the country with code some-country-code the occurrence is some-percent1
 the code that-country-code corresponds to some-country
 for some-haplotype in that-country the occurrence is some-percent2
 --------------------------------------------------------------------------------------------------
 6. in that-country the occurrences are that-disease that-percent1 and that-haplotype that-percent2 


To get the answer, the system used the information in the rule to automatically generate and run the a SQL query like this:


 select distinct 
  Haplotype.Country,Disease.Disease,Disease.Percent,Haplotype.Haplotype,Haplotype.Percent 
   from 
    Disease,Countrycode,Haplotype 
   where 
    Countrycode.Country=Haplotype.Country and Countrycode.Countrycode=Disease.CountryCode 
   order by 
    Haplotype.Country,Disease.Disease,Disease.Percent,Haplotype.Haplotype,Haplotype.Percent 


(In the above, the prefix "mysql." is omitted for readability.) In real use, the system can generate SQL queries that would be too complicated to write reliably by hand, such as those in [1].

The next step is to actually save the result. Ask the following question from the Question Menu:


 7. ADD SAVED RESULT in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2 


As before, no parameters are needed, and clicking on Go will perform the action.

The rule that is used is


 6. in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2
 add : 8. SAVED RESULT in that-country the occurrences are that-disease that-percent1 and that-haplotype that-percent2
 --------------------------------------------------------------------------------------------------------------------- 
 7. ADD SAVED RESULT in that-country the occurrences are that-disease that-percent1 and that-haplotype that-percent2


and the result is saved in the SQL table called "Hapresult".

Next, you can ask to see the saved result by asking the question


 8. SAVED RESULT in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2 


You should again see the 36 rows of the answer. This time there was very little for the system to do. It simply fetched the rows from the "Hapresult" SQL table, using the following rule:


 url:www.executable-english.com  dbms:mysql  dbname:mysql tablename:Hapresult  port:3306  id:mysql  password:mysqlmysql
 --------------------------------------------------------------------------------------------------------------------
 8. SAVED RESULT in this-country the occurrences are this-disease this-percent1 and this-haplotype this-percent2 


We turn now to a way of making the extract-transfer-load in the case when there are many rows in the result, so that we do not wish to try to display all those rows in a browser. First, you can remove the rows from the "Hapresult" SQL table by asking the question


 9. DELETE SAVED RESULT ROWS in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2 


and click on Go. The SQL table is still there, but it now has no rows of data. Now ask the question


 10. ADD some-number rows to the SAVED RESULT TABLE 


You should see that the answer is just 36, meaning that the system is ready to add 36 rows for you. As before, clicking on the Go button will actually add the rows.

In this case, the rule that was used was


 count : 7. ADD SAVED RESULT in some-country the occurrences are some-disease some-percent1 and some-haplotype some-percent2 = a-number 
 --------------------------------------------------------------------------------------------------------------------------------------
 10. ADD that-number rows to the SAVED RESULT TABLE




This completes the running of the demo.

In the example, Haplotype1, the three source tables for the extract-transfer-load task are all in SQL in the same DBMS at www.executable-english.com. You may like to try variations on this. For example, the Countrycode data could be stored as a non-SQL IBL table in an agent that you write, since it will never have a large number of rows. The SQL tables of source data, and the saved result, could all be in different DBMSs on different servers on the Web. In this way, translation information such as the Countrycodes, can be used to combine existing data from the Web. As usual, English explanations, at the business or scientific level, will be available.

However, this kind of ETL task, in which retrievals from SQL tables are interleaved with fetches from data in the IBL agent file, is likely to run slowly. For speed, it may be better in some cases to accumulate copies of all of the source data (even small tables) in one DBMS, so that the IBL system can generate and run a one shot ETL SQL query. For intermediate efficiency, it may be best to have all the data in SQL, but in several DBMSs on different servers.

This concludes this tutorial. You now have examples to work with of when you use data from your own databases, and you know how to manage a database through the IBL. Please let us know your comments by email to internet.business.logic@gmail.com, and we hope that you enjoy working with the IBL!




Section 3 - References


[1] Oil Industry Supply Chain Management Using English Business Rules Over SQL