Nick Robinson Misses the Point


Home Secretary letter to Michael Gove on extremism in schools - News stories - GOV.UK

Nick Robinson had the ten-past-eight slot, and outlined what had happened.

Gove and May had been at a meeting of the "Extremism Task Force", where Gove had lost the argument over whether you should wait for people to actually threaten violence before you try to de-radicalise them.  The context was discussion of a voluntary code of practice for "supplementary schools", which is code for after-hours madrassas.

I think, as a free-speech advocate, that it's perfectly reasonable to say that not only should it be legal to advance unpopular ideas, but that in general arguing for violence should not be illegal.  After all, plenty of vanguard parties of the hard left argue for non-democratic revolution, and they are not proscribed organisations, nor should they be.  However, even if you think it should be legal to call for the execution of the Prime Minister, that doesn't make it wrong for the state to attempt to argue you out of it: there's a massive, massive gulf between the state locking people up for advancing vanguard ideas and the state putting "how democracy permits you to change the government without an AK" lessons into schools.

Whatever, it turns out that Gove believes that the Home Office has been reluctant to tackle extremism, and argued for a lower threshold for action, while May went for the current "there has to be evidence of plausibly threatened violence" threshold.  Gove apparently lost the argument.  So far, I'd incline towards Gove's position as outlined: waiting until people are actually radicalised before intervening seems a high-risk strategy, and even if you do manage to catch the people with AKs, you're left with a lot of people who aren't actually dangerous, but who form an ecosystem within which those who are dangerous go unchallenged.

However, Gove then went to Cameron, having lost, and attempted to re-open the debate.  May found out, was unhappy, and went nuclear.  She writes:

The allegations relating to schools in Birmingham raise serious questions about the quality of school governance and oversight arrangements in the maintained sector, not just the supplementary schools that would be signatories to this Code of Practice. How did it come to pass, for example, that one of the governors at Park View was the chairman of the education committee of the Muslim Council of Britain? Is it true that Birmingham City Council was warned about these allegations in 2008? Is it true that the Department for Education was warned in 2010? If so, why did nobody act?

The first question is inane: the MCB's a legal organisation, and there is absolutely no reason why someone should not be a school governor and a member of a pressure group.   If you want to proscribe the MCB, say so, but retrospectively arguing that membership makes you unfit to be appointed as a governor is simply silly.  But the rest is toxic, and for those that don't follow Birmingham education, the intervention of the head of Queensbridge is a massive thing, because Tim Boyes is one of the most respected heads in the area, and will have some of the primaries involved as feeders.   He has cast-iron evidence that he presented the issues in 2010, which sounds about right, and that leaves the DfE in a very exposed position.  It's less clear that BCC were warned in 2008, but problems at Moseley around then were common knowledge; it went into special measures for other reasons, but it seems unlikely that under the governance arrangements of the time the IEB didn't know what had been going on.

Robinson went on with some gossip about Charles Farr, who has the counter-extremism brief in the Home Office, having had an affair with May's SPAD who is now leaking about him, which is vaguely amusing.  But he appeared to completely miss the real story: the two key candidates for the Tory leadership, should they lose next year, having a massive, public row which boils down to accusing each other of being soft on terrorism.

"Soft on immigration" would be toxic enough, but "soft on terrorism?"  I love it when the Tories tear themselves apart.

Apache Password Storage

Updating my password on an SVN server, I happened to forget the parameters to htpasswd and actually looked at the usage message.  It contains this rather interesting line buried at the bottom (Solaris 11.1, derived from Apache 2.22):

The SHA algorithm does not use a salt and is less secure than the MD5 algorithm.

Obviously, by "X algorithm" they mean "the overall process of taking a password and running it through our password hashing procedure, which incorporates hash algorithm X".

Indeed, it's true: there's no salt.  Generate an Apache password file entry for user myName, password myPassword:

igb@mail:~$ htpasswd -nbs myName myPassword
myName:{SHA}VBPuJHI7uixaa6LQGWx4s+5GKNE=

and repeat the same task using simple hashing:

igb@mail:~$ echo -n myPassword | openssl dgst -sha1 -binary | openssl enc -base64
VBPuJHI7uixaa6LQGWx4s+5GKNE=

to see that they're the same.  I wonder how many sites quickly thought "MD5 is a bit broken, SHA1 is better"?  In fact, a dictionary search against a file of 1000 unsalted SHA1 hashes is at least 1000 times easier than against 1000 salted hashes, because each candidate word need only be hashed once and checked against every entry.
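
A minimal sketch, in Python, of why that matters (the wordlist and second user are made up): because there is no salt, one hashing pass over the dictionary serves every entry in a stolen file, whereas a salted scheme forces the attacker to start again for each entry.

import base64, hashlib

def apache_sha(password):
    # Reproduce htpasswd -s: unsalted SHA1 of the password, base64-encoded, "{SHA}" prefix.
    return "{SHA}" + base64.b64encode(hashlib.sha1(password.encode()).digest()).decode()

# A hypothetical stolen password file: the entry generated above, plus one more
# made the same way.
stolen = {
    "myName": "{SHA}VBPuJHI7uixaa6LQGWx4s+5GKNE=",
    "otherUser": apache_sha("letmein"),
}

wordlist = ["password", "letmein", "myPassword", "123456"]

# No salt, so each candidate word is hashed exactly once and the result is
# checked against every entry in the file; with salts, the whole wordlist
# would have to be re-hashed for each entry.
table = {apache_sha(w): w for w in wordlist}

for user, entry in stolen.items():
    if entry in table:
        print(user, "cracks to", table[entry])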


The iterated MD5 that is used if you select that option is here: 


It does have good salting, and is clearly the better option.  It at least generates different outputs each time it is called with the same input!

igb@mail:~$ for i in 1 2 3 4; do htpasswd -nbm myName myPassword | head -1; done
myName:$apr1$8wUtj8FR$m2OfIoNqjJVkNYjAhwZ25.
myName:$apr1$3qS0CJOD$ONOeHyqTIUPgnMBKH9XCW0
myName:$apr1$g4igyO/H$XJIx4OHNIDxDD4m3Q5vNj1
myName:$apr1$95GXXu0V$wPM3zi/BLMVgpGJI4KNAC/
igb@mail:~$ 

However, that too makes one wonder.  It uses a sort-of iterated MD5: it doesn't repeat the whole algorithm, complete with finalisation; rather, it iterates repeatedly over the password, the salt and some fixed strings, calling the hash update function each time.  Unless I'm missing something, the way the code is written means that running apr_md5_update repeatedly is equivalent to building a buffer containing the catenation of the successive strings and hashing that once: that is ripe for hardware acceleration.
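
Python's hashlib illustrates the point about update calls (standing in here for apr_md5_update, and with made-up strings): streaming the pieces into successive update() calls gives exactly the same digest as a single call over their concatenation.

import hashlib

# Password, magic string and salt, as illustrative stand-ins for the strings
# the apr1 code feeds to its update calls.
pieces = [b"myPassword", b"$apr1$", b"8wUtj8FR"]

# Streaming the pieces into successive update() calls...
incremental = hashlib.md5()
for p in pieces:
    incremental.update(p)

# ...produces exactly the same digest as one call over the concatenated buffer.
one_shot = hashlib.md5(b"".join(pieces))

assert incremental.hexdigest() == one_shot.hexdigest()
print(incremental.hexdigest())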
   
It's not parameterised (by contrast, see the SHA256- and SHA512-based hashes now used for the password file on recent Linuxes and Solarises, described at http://pythonhosted.org/passlib/lib/passlib.hash.sha256_crypt.html: they have a parameter for how many iterations to use, allowing scaling over time).  The comment in the code shows how old the decisions are:

/*
 * And now, just to make sure things don't run too fast..
 * On a 60 Mhz Pentium this takes 34 msec, so you would
 * need 30 seconds to build a 1000 entry dictionary...
 */

My laptop (a three-year-old Air with an i5 processor) can compute 240000 such hashes in 30s without even going to the effort of writing dedicated code:

ians-macbook-air:~ igb$ time (for i in 1 2 3 4; do head -60000 /usr/share/dict/words | openssl passwd -apr1 -salt ayS1/GqV -stdin > /dev/null & done; wait)

real 0m28.363s
user 1m46.752s
sys 0m0.227s
ians-macbook-air:~ igb$ 

and a more modern i7 Air can do about 400000.   Being able to perform hashes 400x faster than the 60 MHz Pentium of that comment, using a very naive approach, with presumably much more performance available via GPUs and other hardware tweaks, makes the loss of a password hash pretty serious.  ~10k/sec means that you could run the top 10000 entries from the RockYou database (say) against 86400 users in a day using a laptop, which would be a pretty devastating attack against a large stolen hash file.
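
By contrast, a parameterised scheme like the sha256_crypt linked above lets the work factor be raised over time.  A minimal sketch using the passlib library (assuming it is installed; the rounds values are illustrative):

from passlib.hash import sha256_crypt   # assumes the passlib package is installed

password = "myPassword"

# The work factor is a parameter, so it can be raised as hardware gets faster.
cheap  = sha256_crypt.using(rounds=5000).hash(password)
costly = sha256_crypt.using(rounds=500000).hash(password)

print(cheap)    # $5$...              (5000 is the default, so it isn't spelled out)
print(costly)   # $5$rounds=500000$...

# Verification reads the rounds back out of the stored hash, so old and new
# entries can coexist in the same password file while the parameter is raised.
assert sha256_crypt.verify(password, cheap)
assert sha256_crypt.verify(password, costly)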

Fwd: "Evil Twin" WiFi


The Today programme at the moment (so you can pull it from iPlayer later today, starting 1h20 in) is talking about the security of public WiFi, with the usual claim that using an untrusted WiFi network is risky for, specifically, accessing banking, and that people should not access banking websites from public locations.

I'm very sceptical about this claim.  OK, we have recently had a case in which a major browser had a flaw which permitted the use of fake certificates, but only under specific circumstances, and there's no suggestion I've seen that it had been exploited on a long-term or widespread basis.  And there is a roughly plausible mechanism that can be used:

* Hi-jack insecure connections to Google

* Look for searches for BankCo UK Banking

* Inject a fake URL pointing to https://BankC0.co.uk 

* Look for DNS lookups for BankC0.co.uk, and return IP number of attacker's system

* Present a certificate for BankC0.co.uk, rather than BankCo.co.uk.

Alternatively, you can hijack requests destined for http://BankCo.co.uk and redirect them to https://BankC0.co.uk, possibly with the help of fiddling with Google.
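
For what it's worth, the reason the attacker needs a plausible look-alike domain with its own valid certificate, rather than simply presenting some certificate for the real name, is that the client checks the certificate it is offered against the hostname it believes it is connecting to.  A rough sketch of that check in Python (the domain is made up, of course):

import socket, ssl

def check(hostname):
    # Let the default context verify the certificate chain and the hostname.
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            print(hostname, "presented", tls.getpeercert()["subject"])

# A certificate issued for bankc0.co.uk fails this check on a connection the
# client believes is to bankco.co.uk, so the attack only works if the victim
# can be steered to the look-alike name in the first place.
check("bankco.co.uk")   # hypothetical domain, for illustration only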

This attack would work even in the face of certificate pinning and, arguably, certificate transparency.  Transparency would allow BankCo to patrol the logs looking for "similar" names being issued, but if the claim is that users don't check URLs then the attacker could pass back an arbitrary string as the URL and hope the victim doesn't notice.

But is this actually happening?  Yes, in principle we should worry about capability and potential rather than execution, but looking at the attacks that are actually "in the wild" helps us prioritise.  Wikipedia's entry for "Evil Twin" networks is old and has very little hard information about actual exploitation.  The pages that come up towards the top of the Google search are getting on for ten years old.  The line between the good guys and the bad guys is blurred and the tools used by the bad guys rarely remain secret for long; it defies belief that "Evil Twin" attacks could have been happening for ten years and yet no example of code or hardware has emerged.

So why do I think the risk is overstated?

Firstly, the attack is quite narrow.  If you have a bookmark for your online banking, it doesn't work.  If your web browser's search bar uses https to access Google, as is increasingly common, it's narrower still (you'd have to hope that users bookmarked the http:// version of their bank's website, and use a redirect).  Alternatively, you'd have to hope that people override the many warnings about fake certificates while no-one in the location complains to the owners.

Secondly, the attack doesn't yield money.   Unlike skimming PINs and card details, which you can monetise immediately, all you get from this attack is a set of login credentials.  If the bank uses one of those funky card machines or some other second factor, then you will not be able to log in afterwards.  If the bank uses "select letters 2 and 5" type authentication you might be able to guess, but even the banks that do that require a phone call in order to set up a new payment.  And even if you intend to make transactions within the login session the user has established, rather than afterwards, you will again hit the problem that setting up a new recipient requires out-of-band authentication.  There is far more advantage in having a credit card number, CV2 and address than in having online banking details, and far easier ways to obtain them.

Thirdly, the attack leaves virtual fingerprints everywhere.  Ross Anderson has written that the massive step forward in credit card fraud was the realisation by attackers that if they didn't put through the transaction that was hijacked by the skimmer, the banks couldn't cross-match accounts that had been the victim of fraud in order to find out where it had happened.  But here, the attacker doesn't control other payments, so if the victim has paid by debit card (it's a reasonable bet in Starbucks, a near-certainty in hotels) there will be common transactions between victims.  The attacker is probably able to avoid just using a single IP number for the transactions, but given the detail of the logs that are kept it's likely that there will be timing or format similarities between attacked sessions that allow the man-in-the-middle to be retrospectively identified.  And, of course, because there is no way to obtain cash from this attack, you have the problem of money laundering: you have to have a destination where you can transfer the money to, where you can subsequently draw it out, without getting caught.    Even if you manage this, the account(s) will again be an obvious common factor between the victims.

Fourthly, the attack leaves actual fingerprints.  You're going to have to carry the equipment into the building, or nearby.  The equipment has a history.  If you abandon the equipment, it will be found, and the police will have your software to analyse and the hardware to match for fingerprints and DNA.  If you stay near the equipment, you risk observation by CCTV.  There's a reason why ATM skimming is done in petrol station forecourts and out of the way street corners, and that is for an attack which can immediately be converted into cash, rather than requiring laundering.

And finally, the bandwidth of the attack is very low.  You might be able to obtain login details for bank accounts, but the subset of those where you can set up a payment to a new destination will be vanishingly small.  What can you do with access to a bank account where you can't transfer money?

If you were going to conduct these sorts of attacks on online banking, you would do so at far lower risk by malware infection, installing keyloggers on victim machines.  There's little evidence that, in the UK at least, keyloggers are today a significant risk for online banking either; the phone call to set up new recipients is the key defence.  If you want Gmail account details (or similar) then malware is an infinitely more effective attack, and that clearly is circulating in the wild.  And if you want to steal money and goods, credit card details are the best thing to have, which again doesn't require an elaborate attack on WiFi.

So I think this is another of the cyber-crime industry's bogeymen: an attack which is theoretically possible, but which only actually works against a tiny, possibly null, subset of potential victims, at high risk and expense to the attacker.

ian

Why Gaol Doesn't Work, and an Alternative

One of the common threads in discussions about Caredata and other large databases is the idea that there should be gaol terms for those that transgress.  See, for example, Ben Goldacre's two columns on the topic in the Guardian.

I don't think that this can work, and I don't think it's an effective penalty.  Worse, I think it's a distraction.

In the aftermath of the Herald of Free Enterprise disaster, there was a massive call for the introduction of an effective charge of Corporate Manslaughter.  Such legislation has now been on the books for seven years.   There have been very few prosecutions, even fewer convictions, and I believe (I would welcome correction) no gaol terms.

The problem is that the threshold for showing that the company either sanctioned, or was reckless about, the behaviour that led to the death is extremely high.  Courts are ill-placed to determine who said what to whom in corridors and meeting rooms, and the standard of "beyond a reasonable doubt" means that lack of evidence is lack of conviction.

And at least in the case of health and safety (the main area where corporate manslaughter is likely to arise) there is widespread public awareness of the legislation, and the endpoint --- a corpse --- is fairly unambiguous.  

That's not remotely the case in data protection.  Firstly, the legislation surrounding data protection is far from unambiguous, and there is very little case law.  Actually demonstrating that an individual in senior management grossly breached it, or was reckless as to a breach, will be virtually impossible.  Consider Caredata today: ministers and senior directors are unable to agree on what was released, under what provisions, and what the law actually says.  This would not even get over civil standards of proof, never mind criminal.  Courts require a very high threshold to gaol people for acts committed as officers of companies, and this would not get close to that level.

Secondly, if the claim were that it would deter individuals from misusing data they have access to, it would be even less effective.  Courts will be very reluctant to support the contention that employees have a wide-ranging obligation to check the orders they are given by their employer for lawfulness except when the act is so manifestly unlawful as to fail the "reasonable man" test, or when the employee is a qualified professional being asked to knowingly breach their professional obligations (for example, an accountant being asked to file misleading accounts).  Actually pinning down someone in the chain involved in releasing data who can be reasonably expected to realise that the act they were asked to commit was part of any unlawful scheme would be very difficult in a civil case; again, it is fanciful to think it would pass a "beyond a reasonable doubt" test.  In the case of the release of HES information to the IFoA, assume for one moment that it actually is, in terms, illegal: where would you place the responsibility, and whom would you propose to prosecute?

It is possible that the threat might be useful against individuals who, of their own volition, access, release or otherwise mis-use data they are not entitled to handle in this way.  However, that is where the "distraction" argument comes in.  Data controllers should put in place controls and processes such that individuals cannot release data they are not entitled to.  By having "oh, but they'll go to gaol" lying around as a rusty blunderbuss, a data controller can put in place inadequate controls and defend them with the argument that the staff are incentivised to behave by the threat of gaol.  But that's true of frauds carried out by staff against either their employer or their employer's customers: it's straightforwardly illegal, and you can go to gaol.  People still do it, because they (accurately) regard the risk of detection as low, the risk of prosecution as even lower (employers are very reluctant to admit to fraud in their operation) and the risk of serious sanction almost infinitesimal.  

And in any event, none of this consoles the victims.  If your medical record is leaked, that someone went to gaol does not get your privacy back.  And until a significant number of people have been gaoled pour encourager les autres (ie, a significant number of offences have been committed) the threat is hollow anyway.  So in the meantime, data controllers will deploy inadequate controls backed by implausible threats, and everything will go on much as it already does.

For sanctions to be effective, they have to be usable and they have to deter.  Data protection failures are unlikely, other than in the most egregious cases, to leave a detailed enough trail to sustain a criminal prosecution, still less one ending in gaol time for individuals.  It's a hollow threat, which makes the threatener look weak.

No, far more effective is a civil regime as follows: 

As a data controller, you are responsible for the data you handle.  If it leaks, you have breached that responsibility.  We do not care why it happened: you are responsible for implementing controls sufficient for the material at hand.  After one leak of government-supplied data you will be subject to a one-year suspension from the processing of any government-supplied data for any purpose, including existing contracts.  This will probably bankrupt you.  A second offence will result in a ten-year ban, which will bankrupt you.  If you have any doubts about your data protection regime, please seek advice from the ICO or CESG, who will be only too happy to help.  Boards, hear this: just as you are still liable to repay money to your customers that was stolen by rogue staff, yes, we are making you responsible for your staff.  We are not joking.

This would also incentivise other staff to keep an eye on their colleagues: knowledge that everyone will lose their jobs in the event of a failure will focus everyone's minds wonderfully.  The fear of this will put a massive premium on the willingness of private sector companies to take on risky contracts, which will make government much more careful about issuing them.  Everyone wins.

Caredata Governance

Part of the reason why Caredata has become such a hot topic is the revelation that patient-level data was sold to actuaries, for a study into which factors are meaningful when assessing premiums. And that when this was revealed, no-one appears quite clear who approved it, and under what rules. There is now some significant debate as to whether this sale was wrong, whether it was permissible under the rules at the time, whether it would be permissible now (ie, under the Caredata rules as planned for the now-delayed spring 2014 launch) and whether it will be permissible under the hypothetical rules Jeremy Hunt is proposing in the aftermath of Friday's announcement of new legislation.

The problem seems to be a governance structure that is so complex that actual responsibility and accountability have been diffused to the point of invisibility. There is a complex mesh of advisory groups, boards and executives --- has anyone seen a diagram? --- but, when an actual case is challenged, no-one appears able to point to who took the decision, and under what rules. Even if the people who agreed the release to the IFoA can be identified, it's not at all clear what rules they were operating under and whether those rules were followed. The failure of the HSCIC to produce a code of practice exacerbates this.

The governance should have three clear components.

First, there should be a set of rules setting down the purposes for which data can be released, and in what form. The rules are owned by a group of people, with a named chairman, who sign off successive releases of the document. If the rules are found to be inadequate, either because they do not cover some case or because public opinion challenges the contents, that group of people are tasked with re-writing it. Those people are appointed by a minister who is democratically accountable to parliament (or, more probably, a select committee); it is likely that the process and policy for these appointments would be the subject of secondary legislation or the schedule to primary legislation. This is strategy.

Secondly, there should be another group of people who consider requests for access and evaluate them in the context of the rules. These decisions should be uncontentious, and if there is disagreement between reasonably informed people then that is more likely to reflect a problem with the rules than anything else. These people will probably need to be employees of the agency handling the data, as the decisions will need to be made relatively quickly, but as they wield relatively little power this is not of itself dangerous. This is tactics.

And finally, there needs to be oversight that decisions are being made correctly and that the process is fit for purpose. This could be done by a select committee directly, but is more commonly done by appointing a retired judge or similar to act as a regulator. This person does not make decisions or policy, but confirms that the process is being followed, samples decisions to check them in detail, and reports annually. This is audit. For all that the legislation has many problems and there has been a lot of dispute, the role of the Interception of Communications Commissioner is a good model.

One committee, named and appointed by a minister who is democratically accountable, sets detailed policy. A second committee executes it. A commissioner checks the process is being followed.

That way, when things go wrong, people can be held to account. Democratically.

Opting Out Is Always Rational

One of the most common memes used in support of mass health data projects is that the data supports important research. Whether it is disease causation, effective treatment, epidemiology or drug side-effects, researchers need large amounts of data, so your data matters.

But from the perspective of a patient, ie you, your data doesn't matter.

Your data would only matter if a study which looked at the whole dataset would have a different outcome with or without your participation. But in a dataset covering 47m people (the size of the Hospital Episode Statistics database) or around 53m people (the number of people registered with general practitioners in England, assuming everyone is), the chances of your individual record being anything other than statistical noise are infinitesimal. In order for that to be the case, you would have to be very unlike the rest of the dataset, but mass population studies rarely identify things that affect only one person. So there will always be sufficient people who look like you to fill your place in the analysis. And of course, the chances of a medical breakthrough hinging on your personal data, _and_ being related to a condition you have, _and_ producing a change in treatment quickly enough to benefit you are similarly small. An infinitesimal chance of a very small benefit has a net present value of zero, for practical purposes.

On the other hand, the risk of the data being leaked, re-identified or otherwise mis-used is greater than zero. We don't know how much greater, and without a code of practice we can't calculate it. But if, for example, your health record in which you talk to your GP about your depression were leaked to your ex-spouse in a contested custody battle, the effect would be immediately harmful. That's an immediate risk: a small chance times a very large disbenefit has a net present value considerably greater than zero.
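
To put an illustrative shape on that calculation (every number below is invented for the sake of the arithmetic, not a claim about the real probabilities):

# Every figure below is purely illustrative: neither probability is actually known.
p_benefit = 1e-7    # chance a breakthrough hinges on *your* record and reaches you in time
benefit   = 1_000   # value to you of that outcome, in arbitrary units
p_harm    = 1e-3    # chance of a leak or re-identification that affects you
harm      = 100_000 # cost to you of, say, a sensitive record reaching the wrong person

print("expected benefit:", p_benefit * benefit)   # 0.0001, effectively zero
print("expected harm:   ", p_harm * harm)         # 100.0, clearly not zero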

Now the problem with this, of course, is that if everyone thinks like this, there is no data. But of course, they won't; Germany's scheme is opt-in, and yet has reasonable numbers of participants. But shouting yet more loudly about potential benefits doesn't work, because that has already been written down to zero. What needs to happen is calm, rational discussion about why people are over-estimating the potential harm such a project can cause. And without transparent, accountable organisations handling the data, that will never happen.

ian

Why joining against Experian Mosaic is easy

One issue that has arisen in the debate about the release of HES data by either the HSCIC or its predecessor, the NHSIC, is the joining of the HES hospital data against Experian's Mosaic demographic data.

This would have been done by NHSIC. And once they had made the basic decision to release the data in the first place (a separate discussion) this was the _right_ thing to do, and it would be the correct way to do a similar task for a less controversial research project.

Mosaic data maps very small areas to demographic tags. Let's assume that the data goes down to full postcode level (I believe that in some cases it's slightly less granular than that).

The Mosaic data would look like this:

X12 3YZ   Demographic Description 1
X12 3YY   Demographic Description 2
X12 3YX   Demographic Description 1
X12 3YW   Demographic Description 1

There are a lot of full postcodes in the country (I'm guessing, but around 2m --- 20 million houses, ten per code). There are a few hundred Mosaic descriptions, if that.

So the process will have been something like this:

IFoA take the Mosaic data and, with Experian's agreement, pass it to the NHSIC for this specific purpose (this is a standard thing to do with this sort of data).

NHSIC join the HES data against the Mosaic data using the postcode as the key, so that each HES record is extended by a demographic description.

NHSIC then truncate the postcodes to the agreed length (probably just the initial letters like "B" or "SW" would be enough) and hand over the records. All that IFoA see against each patient is therefore a very low resolution postcode, which will match an entire city or county, plus a demographic tag, which will be shared amongst tens of thousands of postcodes.
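
A minimal sketch of that process in Python/pandas, with made-up column names and records, just to show why the order matters: join on the full postcode, then truncate before anything is handed over.

import pandas as pd

# Illustrative stand-ins for the real extracts; all column names and records are made up.
hes = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "postcode":   ["X12 3YZ", "X12 3YY", "X12 3YX"],
    "episode":    ["hip replacement", "fracture", "angioplasty"],
})
mosaic = pd.DataFrame({
    "postcode":    ["X12 3YZ", "X12 3YY", "X12 3YX", "X12 3YW"],
    "demographic": ["Description 1", "Description 2", "Description 1", "Description 1"],
})

# Join on the full postcode first, then truncate before release: only the area
# prefix (e.g. "B" or "SW") and the demographic tag leave the building.
joined = hes.merge(mosaic, on="postcode", how="left")
joined["postcode"] = joined["postcode"].str.extract(r"^([A-Z]+)", expand=False)
print(joined)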

The basic agreement to release data to the IFoA is something that there is a lot of dispute about, and I think it was a very, very bad thing. But once you've made the decision to do it, what was done with Mosaic tags was the right thing: the IFoA got the data they could use, and the level of resolution in it was appropriately reduced.

ian

When Hubris Meets Over-Promotion

Yesterday, the Health Select Committee met to discuss the Caredata project.  It was a shocking thing to watch.


The first part was interesting but unexciting.  Phil Booth (Medconfidential, ex-No2ID) and Nick Pickles (Big Brother Watch) outlined the concerns about the identifiability of data when combined with other data sets, issues of consent, and issues of safe processing and transparent policies around release.  Sharmila Nebhrajani (AMRC) and Peter Weissberg (BHF) made a strong case for the benefits of processing data for public health while admitting that the execution of this project left a lot to be desired.  Chand Nagpaul for the RCGP presented the issues confronting GPs, particularly confidentiality with patients and responsibilities as data controllers, while again making it clear that the project as a whole has massive potential benefits.

There were attempts to get Phil and Nick to condemn the processing of the data in any circumstances, which was rightly seen as the straw man it was, but in general terms there was nothing to surprise those familiar with the saga.  The Committee showed impatience with presentations of benefits as though those of themselves negated risks, but in general proceedings showed a broad agreement.

This was not true of the second part.  Daniel Poulter (undersecretary of state for health), Tim Kelsey (NHS England, Director for Patients and Information) and Max Jones (Director, HSCIC) were under-briefed and unimpressive.  The committee drew harsh inferences from the fact that they had not attended the first part, and Poulter, in particular, was clearly not on top of his brief.  All three appeared to assume that the committee would roll over in the face of a presentation of benefits, and at several points Kelsey seemed to think that the meeting was a platform for him to set the agenda, rather than answer questions.  Jones relied on the defence that the HSCIC was a new body and therefore the actions of its predecessors were neither relevant nor knowable, which is an extraordinary legal theory.   

A massive backlog of points to be confirmed in writing later built up, as Jones blustered and repeatedly claimed not to know about the key operations of the body he is director of: for example, the code of practice for the HSCIC's processing of Caredata assets has yet to be written, but he could not provide a timescale for its production.   Sarah Wollaston and Charlotte Leslie, who clearly _were_ on top of their briefs, picked away at the inconsistencies, and got very little hard information for their pains.  The threat of the HSC summoning the staff of HSCIC's predecessor bodies hangs in the air: I don't have Erskine May by heart, but I would have thought that the summoning of Mark Thompson in his guise as former DG of the BBC sets a precedent.

What did we learn?  Firstly, we learnt that the NHS has a habit of promoting middle-managers to leadership roles without getting leaders: Kelsey and, particularly, Jones looked under-prepared, under-briefed and under-rehearsed.  Their endless recourse to "we don't know, we'll write to you" was completely unacceptable for senior managers of major NHS functions: they should know, or have it in their briefing pack in front of them.  Secondly, we learnt that treating a select committee with contempt, by assuming that an invitation to appear in front of them is a platform to make statements, goes down very badly.  Thirdly, we saw how shallow the talent pool in the Tory party is, given Poulter's hesitant, blustering and uncertain performance: his civil servants will be very cross, I suspect.

But to my mind, the most shocking revelation was that the HSCIC is collecting data, and releasing it to consumers, without having a code of practice in place.  Everything I've ever done in information governance --- I've run an ISO 27001 accredited operation --- says that this is insane.  Couple that with the HSCIC's claim that it does not hold the records of its predecessors (a claim I intend to test with some FoI requests) and you are left with the obvious conclusion that information governance in the NHS is a Potemkin village, thin sheets of painted board concealing a swamp of poor practice.

It's to be hoped that the HSC follow through.   If they do not, Jones and Kelsey will be able to get away with not knowing and not telling.  But if these are the best people the NHS can put up to make their case, and the best arguments, then Caredata is dead in the water, either because the HSC will stop it, or because the rate of opt-out will render the scheme useless.

And Dan Poulter?  I think he can forget his ambitions, to be honest.

ian

NHS Email



Begin forwarded message:

From: MAILER-DAEMON@nhs-pd1e-esg001.ad1.nhs.net (Mail Delivery System)
Subject: Undelivered Mail Returned to Sender
Date: Mon 24 Feb 2014 07:34:54 GMT

This is the mail system at host nhs-pd1e-esg001.ad1.nhs.net.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to <postmaster>

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

                  The mail system

<england.cdo@nhs.uk>: mail for nhs.uk loops back to myself
Reporting-MTA: dns; nhs-pd1e-esg001.ad1.nhs.net
X-Postfix-Queue-ID: 8195544916D
X-Postfix-Sender: rfc822; igb@batten.eu.org
Arrival-Date: Mon, 24 Feb 2014 07:34:53 +0000 (GMT)

Final-Recipient: rfc822; england.cdo@nhs.uk
Original-Recipient: rfc822;england.cdo@nhs.uk
Action: failed
Status: 5.4.6
Diagnostic-Code: X-Postfix; mail for nhs.uk loops back to myself

From: Ian Batten <igb@batten.eu.org>
Subject: F.A.O. Information Governance Compliance Team
Date: Mon 24 Feb 2014 07:34:50 GMT


The Staple Inn Actuarial Society processed a large volume of Hospital Episode Statistics, which they also joined to Experian credit reference data.  

Please supply:

* The submission made by SIAS in support of obtaining this data.  This may take the form of a Privacy Impact Assessment, a Research Proposal, or some other document.

* The minutes of meetings at which this proposal was discussed.

* Details of the financial settlement between HSCIC and SIAS.

* Details of any agreement between HSCIC and SIAS which permits the combining of HES data with Experian data

It has been clearly stated by Geraint Lewis, NHS Chief Data Officer, that insurance companies are not able to purchase HSCIC data for commercial use, and that HSCIC does not sell data on a commercial basis; it only recovers costs.  I therefore give you advance notice that any refusal on the basis of "commercial confidence" will be the immediate subject of an appeal to the ICO.

Ian Batten
XXX
Birmingham
XXX

Why I am opting out of #caredata

Today, I sent a letter to my GP confirming that I am opting out of the Caredata scheme, and do not want my data uploaded in any form to either the secondary uses databases (codes 9Nu0, 9Nu4), or to other record systems.  I am already opted out of the Summary Care Record Scheme (code 93C3); I have pre-emptively also added 93C1 to opt out of upload to local record keeping systems.

I opted out of the SCR scheme because, for me, the risks were entirely disproportionate to any benefits.   I am not allergic to anything, I am not taking long-term medication, there is no particular information in the SCR extract that would assist a doctor in the extremely implausible scenario where I am unable to communicate but my identity can be established with sufficient certainty as to make use of my records without further confirmation safe and good practice.  As there is no benefit to me, no risk, no matter how small, is worth taking, and my general objection to large government databases applies: data leaks, data is repurposed, data is wrong.

I am opting out of the nascent Midland Care Records scheme as it appears to be run by people outside the NHS.  There is a website, but such contact details as it contains refer to "powered by Central Midlands Commissioning Support Unit".   The link takes you to www.experiencecounts.org.uk, which purports to be in some way affiliated with the NHS, but provides no evidence for it.  Even its domain name is outside the NHS domain, and therefore it presumably operates outside the NHS information governance framework.  It looks like a commercial operation, and therefore not a safe place for my records.

However, the Caredata scheme opens up a different moral conundrum: I am asked to provide my records not for my own benefit, but for the common good.  Although I might benefit directly in some rather unlikely circumstances, it is far more likely that the benefit will be more diffuse; drug research, epidemiology, treatment standards on a national basis.  I am broadly receptive to these aims, even to the point of overcoming my needle fear to have samples taken for the Biobank project.  I have participated in several followup questionnaires and even worn an activity monitor for a week for them.

However, Biobank is a model of ethics and information governance.  I was approached, sent information, gave consent after exploring their objectives and structure, and was at all times in control of the process of providing data.  Contrast Caredata: after a lengthy period in which the project was veiled in secrecy, the NHS rather reluctantly agreed to a minimum-cost information campaign using the Post Office junk mail channel.  I didn't receive the leaflet, and I don't know anyone who did.  As it happens, I am not opted out from receiving unaddressed mail, but it is interesting to contrast Daniel Poulter MP's statement to parliament that it was delivered even to those people who have opted out of unaddressed mail with the NHS's flat contradiction.  Misleading parliament is usually seen as a bad thing, but apparently not when it's done to mislead people about NHS projects.

Had I received the leaflet, it is not clear I would have been any the wiser.  The leaflet is deeply misleading; as well as only mentioning the name of the project in the URL at the foot of the last page, it is vague to the point of obfuscation about what data will be uploaded, and for what purposes.  Couple that with the NHS's decision to use a bizarre interpretation of "selling" in its assurances that the data will not be sold while publishing a price list, and Tim Kelsey's bland claim that data re-identification is not possible, or if it's possible it's very difficult, or if you do it it's illegal (although it's not clear under what legislation), and you are left in a position of not really knowing what the project is actually going to do.

Finally, Geraint Lewis published a blog which, although not answering the question as to whether a blog is an official statement of NHS policy, did remove some mystery from the proceedings.  But not terribly reassuringly.   For example, having defined red data as data which is straightforwardly personally identifiable, he writes that it may be released if there is "legal approval [from] the Secretary of State for Health or the Health Research Authority following independent advice from the Confidentiality Advisory Group (CAG)".  So that boils down to "the NHS can't release your personal data without telling you unless the NHS decides to release your personal data without telling you".  Recent minutes of the CAG show them agreeing to the release of identifiable data, in the absence of consent or opt-out, for a project (risk stratification) whose value they have reservations about, so they are hardly a fierce guardian of privacy; and in any event "advice" is not a veto, so the Secretary of State can ignore them anyway.

As to the benefits, well, obviously insurance companies can benefit from this sort of data.  The NHS again is totally confused as to whether insurance companies will be prevented from buying it, or will be able to buy it but will be given a stiff telling-off if they do the wrong things with it, or what.   Similarly drug companies: research, marketing, what is permitted?  In an outbreak of black farce, the original timescale was for the uploads to start before the committee met to agree the permitted purposes; in any event, the purposes can be changed at any time, so they provide little solace.

There is now a six month pause while the NHS tells us why we are wrong; already there have been outbreaks of "you little people don't understand, and we doctors know best" from Clare Gerada, which is a priceless demonstration of why doctors should keep off the telly.  Sorry, Clare, but speaking slowly and being patronising doesn't convince when you're so obviously contradicting yourself: "commercial entities can't have access, unless they can have access".

So I'm opted out.  If the NHS can make a case, I'll change my mind.  But it will take something better than this car-crash of a publicity programme to convince me.