Normalisation – Cloudy DATA Blog

MS Access Function : Automate Normalisation or De-concatenate a field (MS Outlook export example)

I was contemplating a better way of storing our old emails and unhappy with some of the systems in place I started considering whether I could dump them into a database.

Thing is when you export from Outlook some of the standard fields in particular To and From are concatenated. Wouldn’t it be nice to separate those fields into their own table of names and addresses and reference them back to a main table of messages.

This is what I came up with.

For demonstrations purposes I will use two tables

t001parent pkid - autonumber primary key ccaddresses - memo or long text

and the child table

t002newchildren pkid - autonumber primary key ccaddress - string(150 should do it) pkidt001 - number

and here is the blank second table

Next we create a user defined function

Public Function CreateChildTable()

    Dim db          As DAO.Database
    Dim rs          As DAO.Recordset
    Dim rsTarget    As DAO.Recordset

    Dim strSQL      As String
    Dim strField1   As String
    Dim strField2   As String
    Dim varData     As Variant
    Dim i           As Integer
    Dim intPKID     As Integer

    Set db = CurrentDb

    'Select all the ccaddresses from the parent table
    strSQL = "SELECT pkid,ccaddresses FROM t001parent"

    Set rsTarget = db.OpenRecordset("t002newchildren", dbOpenDynaset, dbAppendOnly)

    Set rs = db.OpenRecordset(strSQL, dbOpenDynaset)
    With rs
        While Not .EOF
            intPKID = !pkid
            varData = Split(!ccaddresses, ";") ' Get all semi colon delimited fields
               
            'add email addresses if there is only one email address there
            With rsTarget
            .AddNew
            !ccaddresss = Trim(varData(0)) ' remove spaces before writing new fields
            !pkidt001 = intPKID
            .Update
            End With
            

            'loop through addtional email addresses and add them as children to table
            For i = 1 To UBound(varData)
                With rsTarget
                    .AddNew
                    !ccaddresss = Trim(varData(i)) ' remove spaces before writing new fields
                    !pkidt001 = intPKID
                    .Update
                End With
            Next
            .MoveNext
        Wend

        .Close
        rsTarget.Close

    End With

    Set rsTarget = Nothing
    Set rs = Nothing
    db.Close
    Set db = Nothing

End Function

After running this we should

We’ve just nicely split the parent table ccaddesses field into constituent emails and normalised it as a child of the parent table.

Pivoting Relationships from Many to Many – One to Many – One to One – and then Key Value – Relationships and Schemas

There are a whole host of applications in which you may wish to record the biological or legal relationships between individuals. Here is a short investigation of some of the subtleties related to the options you have to model this data.

The actual biological parent to biological child relationship is somewhat more subtle than the classic description in books would have you believe (usually these equate the parent child relationship as a one to many relationship.

Strictly speaking biological parent to biological child is a many to many relationship (which in MS Access is a recursion of 2 one to many relationships) but it is a specific version of the many to many relationship. One parent can have many (read infinite) children but one child can only ever have 2 parents – one biological mother linked to one biological father (ignoring latest developments in science). Of course a one to two relationship is still a type of of one to many relationship it is a more specific type of one to many relationship but one to many nonetheless – so to model all relationships you still need two one to many links. If you consider the unique combination of mother and father as a group in itself then you can model all relationships again as only consisting of one group of one to many relationships rather than as two one to many relationships (which we presently have defined as one one to many and one one to two relationship). To do this you would have to make a mother and father a unique pair who can have infinite number of children but a child can ONLY have one mother father pair.

Many to many relationship

To a certain extent this looks like a one to many relationship as there appears to be only two tables (albeit one being aliased). This is correct but to record a dependents relationship with both parents two separate records with separate PKIDs need to be input into the persons junction table actually making it a many to many relationship.

By viewing the parents PKIDs as a group we can convert this many to many relationship back to a one to many relationship by adding a further field into the persons junction table. Now the relationship between a dependent and its mother and father can be recorded in one record in the personsjunctiontable.

It is not initially clear but a one to many relationship where the many is a finite number (ie not infinite – in our example 2) – can be re-modelled as a one to one relationship as follows.

In fact one thing that I learnt when I started reading about relational databases is that relational not only relates to the relationship between tables but the relationship within tables.

Thus we can separate a table of columns back out into a one to many relationship by pivoting out the column names and making them a value within a field themselves linking them to a field and then adding the ability to add a value. This works because the number of columns is finite. I believe this is the thinking behind a key value database.

Looking at the tables resulting from data input you would get

What makes this particularly good is that a lot of attributes can be stored against an individual and if you forget a column name you simply add it to the field name table and it becomes an additional option in the KeyValueData table. This can be particularly good if you are not sure of the set of column names that your users may wish to use. It has the disadvantage that validation will become harder as the values are often a mix of value types and cannot be so easily tied back to a particular list as the source of the field will change according to the value of its relative field name. Here for example I could easily add Mother and Father to the table of field names but a user would simply type these in rather than having them validated as per the first three relationship structures.

Towards the best table design for Addresses

Floor Reference or Flat No
Building Name or House Name
House/Building/Unit Number
Street Name
Through Street
Locality
Town or City
Country
Postcode

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.

Antoine de Saint-Exupery

OK so I didn’t know who he was either until I did a search on simplicity but I wanted to record my best field structure for the storage of addresses which although important is on a par with watching paint dry in terms of excitement. Despite this addresses are central to a heck of a lot of applications so it is important to get it right. This copes with pretty much most addresses. I mainly use locality in larger towns or cities to denote a sub area specifically for users as you can guarantee they will be unfamiliar with the majority of specific street names and particularly for users who are doing business in the same city they are located in it is very useful for them to have a quick understanding of where things are in relation to the assets of the operation involved. Such “context” can be added by using maps but will come for free with intelligent staff if you give them opportunity. I long ago dropped counties and districts but I could see if you were in a larger country such as Russia or America could be useful to note state / oblast etc.. Additionally in the UK town is really postal town – which is often not obvious at all – it can be overkill but you can organise addresses so that postal town is referenced from the town automatically. Probably overkill.

Information on BS7666
National Land and Property Gazeteer