Why joining against Experian Mosaic is easy

One issue that has arisen in the debate about the release of HES hospital data to the IFoA, by either the HSCIC or its predecessor the NHSIC, is the joining of that data against Mosaic demographic data.

The join would have been done by NHSIC. Once they had made the basic decision to release the data in the first place (a separate discussion), this was the _right_ thing to do, and it would be the correct way to do a similar task for a less controversial research project.

Mosaic data maps very small areas to demographic tags. Let's assume that the data goes down to full postcode level (I believe that in some cases it's slightly less granular than that).

The Mosaic data would look like this:

X12 3YZ Demographic Description 1 

X12 3YY Demographic Description 2 

X12 3YX Demographic Description 1 

X12 3YW Demographic Description 1

There are a lot of full postcodes in the country (I'm guessing, but around 2 million: 20 million houses at ten per code). There are a few hundred Mosaic descriptions, if that.

So the process will have been something like this:

IFoA take the Mosaic data and, with Experian's agreement, pass it to the NHSIC for this specific purpose (this is a standard thing to do with this sort of data).

NHSIC join the HES data against the Mosaic data using the postcode as the key, so that each HES record is extended by a demographic description.

NHSIC then truncate the postcodes to the agreed length (probably just the initial letters like "B" or "SW" would be enough) and hand over the records. All that IFoA see against each patient is therefore a very low resolution postcode, which will match an entire city or county, plus a demographic tag, which will be shared amongst tens of thousands of postcodes.
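To make the join-then-truncate step concrete, here is a minimal Python sketch of the sort of thing I mean. It is not what NHSIC actually ran: the field names, the sample records and the helper functions are all invented for illustration, and I'm assuming truncation to just the leading letters of the postcode.

```python
import re

# Hypothetical Mosaic lookup: full postcode -> demographic description.
MOSAIC = {
    "X12 3YZ": "Demographic Description 1",
    "X12 3YY": "Demographic Description 2",
    "X12 3YX": "Demographic Description 1",
    "X12 3YW": "Demographic Description 1",
}

# Hypothetical HES-style records; the field names are made up.
HES = [
    {"episode": 1, "postcode": "X12 3YZ"},
    {"episode": 2, "postcode": "X12 3YY"},
    {"episode": 3, "postcode": "SW9 9ZZ"},  # invented code, not in the lookup
]


def postcode_area(postcode):
    """Return only the leading letters of a postcode, e.g. 'SW9 9ZZ' -> 'SW'."""
    match = re.match(r"[A-Za-z]+", postcode.strip())
    return match.group(0).upper() if match else ""


def join_and_truncate(records, lookup):
    """Attach a demographic tag by full postcode, then drop the full postcode."""
    released = []
    for record in records:
        full = record["postcode"]
        released.append({
            "episode": record["episode"],
            "area": postcode_area(full),       # low-resolution location only
            "demographic": lookup.get(full),   # None when there is no match
        })
    return released


released = join_and_truncate(HES, MOSAIC)
for row in released:
    print(row)
```

The point is the order of operations: the join needs the full postcode, so it has to happen before truncation, and on the side that already holds the full postcodes. The recipient only ever gets the `area` and `demographic` fields.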

The basic agreement to release data to the IFoA is something that there is a lot of dispute about, and I think it was a very, very bad thing. But once you've made the decision to do it, what was done with Mosaic tags was the right thing: the IFoA got the data they could use, and the level of resolution in it was appropriately reduced.

ian