WTF – Tim O’Reilly – Lightbulbs On!

What's the Future - Tim O'Reilly

Best Read of the Year, not just for high technology, but for a reasoned explanation of political events over the last two years, both in the UK and the USA. I can relate it straight back to some of the prescient statements made by Jeff Bezos about Amazon “Day 1” disciplines: the best defence against an organisation’s path to oblivion being:

  1. customer obsession
  2. a skeptical view of proxies
  3. the eager adoption of external trends, and
  4. high-velocity decision making

Things go off course when interests divide in a zero-sum way between different customer groups that you serve, and where proxies indicating “success” diverge from a clearly defined “desired outcome”.

The normal path is to start with your “customer” and give an analogue of what indicates “success” for them in what you do; a clear understanding of the desired outcome. Then the measures to track progress toward that goal, the path you follow to get there (adjusting as you go), and a frequent review that steps still serve the intended objective. 

Fake News on Social Media, Finance Industry Meltdowns, unfettered slavery to “the market” and to “shareholder value” have all been central to recent political events in both the UK and the USA. Politicians of all colours were complicit in letting proxies for “success” dissociate fair balance of both wealth and future prospects from a vast majority of the customers they were elected to serve. In the face of that, the electorate in the UK bit back – as they did for Trump in the US too.

Part 3 of the book, entitled “A World Ruled by Algorithms” – pages 153-252 – is brilliant writing on our current state and injustices. Part 4 (pages 255-350) entitled “It’s up to us” maps a path to brighter times for us and our descendants.

Tim says:

The barriers to fresh thinking are even higher in politics than in business. The Overton Window, a term introduced by Joseph P. Overton of the Mackinac Center for Public Policy, says that an idea’s political viability falls within a window framing a range of policies considered politically acceptable in the current climate of public opinion. There are ideas that a politician simply cannot recommend without being considered too extreme to gain or keep public office.

In the 2016 US presidential election, Donald Trump didn’t just push the Overton Window far to the right, he shattered it, making statement after statement that would have been disqualifying for any previous candidate. Fortunately, once the window has come unstuck, it is possible to move it in radically new directions.

He then says that when such things happen, as they did at the time of the Great Depression, the scene is set to do radical things to change course for the ultimate greater good. So, things may well get better the other side of Trump’s outrageous pandering to the excesses of the right, and indeed after we see the result of our electorate’s division over Brexit played out in the next 18 months.

One final thing that struck me was how one political “hot potato” issue involving Uber in Taiwan got very divided and extreme opinions split 50/50 – but nevertheless got reconciled to everyone’s satisfaction in the end. This was done using a technique called Principal Component Analysis (PCA) and a piece of software called “Pol.is”. It allows folks to publish assertions, vote and see how the filter bubbles evolve through many iterations over a 4 week period. “I think Passenger Liability Insurance should be mandatory for riders on UberX private vehicles” (heavy split votes, 33% both ends of the spectrum) evolved to 95% agreeing with “The Government should leverage this opportunity to challenge the taxi industry to improve their management and quality control system, so that drivers and riders would enjoy the same quality service as Uber”. The licensing authority in Taipei duly followed up for the citizens and all sides of that industry.

I wonder what the Brexit “demand on parliament” would have looked like if we’d followed that process, and if indeed any of our politicians could have encapsulated the benefits to us all on either side of that question. I suspect we’d have a much clearer picture than we do right now.

In summary, a superb book. Highly recommended.

Your DNA – a Self Testing 101

Your DNA is a string of base pairs that encapsulate your “build” instructions, as inherited from your birth parents. While copies of it are packed tightly into every cell in, and being given off, your body, it is of considerable size; a machine representation of it is some 2.6GB in length – enough to fill more than half of a single-layer DVD.

The total entity – the human genome – is a string of C-G and A-T base pairs. The exact “reference” structure, given the way in which strands are structured and subsections decoded, was first successfully concluded in 2003. Its accuracy has improved steadily as more DNA samples have been analysed down the years since.

A sequencing machine will typically read short lengths of DNA chopped up into pieces (in a random pile, like separate pieces of a jigsaw), and by comparison against a known reference genome, gradually piece together which bit fits where; there are known ‘start’ and ‘end’ segment patterns along the way. To add a bit of complexity, the chopped read may get scanned backwards, so it takes a lot of compute effort to piece a DNA sample into what it would look like if we were able to read it uninterrupted from beginning to end.
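To make the “scanned backwards” point concrete, here is a minimal sketch in Python. It assumes exact matches only and uses a made-up 14-letter reference; the function names are my own, and real aligners use indexes and tolerate mismatches, so treat this as an illustration of the idea rather than how any sequencing pipeline actually works. A read taken from the opposite strand shows up as the reverse complement of the reference text, so each read has to be tried both ways.

```python
# Toy example: place short reads on a reference, allowing for reads that
# were scanned from the opposite strand (the reverse complement).
# The reference and reads are invented for illustration.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the sequence as it would read from the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def place_read(reference: str, read: str):
    """Return (position, strand) of the first exact match, or None.

    Plain substring search, purely for illustration.
    """
    pos = reference.find(read)
    if pos != -1:
        return pos, "+"
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return pos, "-"
    return None


reference = "ATGCGTACGTTAGC"
reads = ["TGCGTA", "GCTAAC", "GGGGGG"]
placements = {read: place_read(reference, read) for read in reads}

for read, placement in placements.items():
    print(read, placement)
```

The second read only fits once it has been flipped and complemented, and the third does not fit anywhere, which is the jigsaw problem in miniature.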

At the time of writing (July 2017), we’re up to version 38 of the reference human genome. 23andMe currently use version 37 for their data to surface inherited medical traits. Most of the DNA sampling industry trace family history reliably using version 36, and hence most exports to work with common DNA databases automatically “downgrade” to that version for best consistency.

DNA Structure

DNA has 46 sections (known as Chromosomes); 23 of them come from your birth father, 23 from your birth mother. While all humans have over 99% commonality, the 1% difference makes every one of us (or a pair of identical twins) statistically unique.

The cost to sample your own DNA – or that of a relative – is these days in the range of £79-£149. The primary one looking for inherited medical traits is 23andMe. The biggest volume for family tree use is AncestryDNA. That said, there are other vendors such as Family Tree DNA (FTDNA) and MyHeritage that also offer low cost testing kits.

The AncestryDNA database has some 4 million DNA samples to match against, 23andMe over 1 million. The one annoyance is that you can’t export your own data from one of these two and then insert it in the other for matching purposes (neither has import capabilities). However, all the major vendors do allow exports, so you can upload your data from AncestryDNA or 23andMe into FTDNA, MyHeritage and to the industry leading cross-platform GEDmatch DNA databases very simply.

Exports create a ZIP file. With FTDNA, MyHeritage and GEDmatch, you request an import, and these prompt for the name of that ZIP file itself; you have no need to break it open first at all.

On receipt of the testing kit, register the code on the provided sample bottle on their website. Just avoid eating/drinking for 30 minutes, spit into the provided tube up to the level mark, seal, put back in the box, seal it and pop it in a postbox. Results will follow in your account on their website in 2-4 weeks.

Family Tree matching

Once you receive your results, Ancestry and 23andMe will give you details of any suggested family matches on their own databases. The primary warning here is that matches will be against your birth mother and whoever made her pregnant; given historical unavailability of effective birth control mechanisms and the secrecy of adoption processes, this has been known to surface unexpected home truths. Relatives trace up and down the family tree from those two reference points. A quick gander at self help forums on social media can be entertaining, or a litany of horror stories – alongside others of raw delight. Take care, be ready for the unexpected:

My first social media experience was seeing someone confirm a doctor as her birth father. Her introductory note to him said that he may remember her Mum, as she used to be his nursing assistant.

Another was to a man who, once identified, admitted to knowing her birth mother in his earlier years – but said it couldn’t be him “as he’d never make love with someone that ugly”.

Outside of those, there are fairly frequent outright denials questioning the reliability of the science behind DNA testing, none of which stand up to any factual scrutiny. But among the stories, there are also stories of delight in all parties when long lost, separated or adopted kids locate, and successfully reconnect with, one or both birth parents and their families.

Loading into other databases, such as GEDmatch

In order to escape the confines of vendor specific DNA databases, you can export data from almost any of the common DNA databases and reload the resulting ZIP file into GEDmatch. Once imported, there’s quite a range of analysis tools sitting behind a fairly clunky user interface.

The key discovery tool is the “one to many” listing, which does a comparison of your DNA against everyone else’s in the GEDmatch database – and lists partial matches in order of closeness to your own data. It does this using a unit of measure called “centiMorgans”, abbreviated “cM”. Segments that show long exact matches are totted up, giving a total proportion of DNA you share. If you matched yourself or an identical twin, you’d match a total of circa 6800cM. Half your DNA comes from each birth parent, so they’d show as circa 3400cM. From your grandparents, half again. As your family tree extends both upwards and sideways (to uncles, aunts, cousins, their kids, etc), the numbers will increasingly dilute by half each step; you’ll likely be in the thousands of potential matches 4 or 5 steps away from your own data.

If you want to surface birth parent, child, sibling, half sibling, uncle, aunt, niece, nephew, grandparent and grandchild relationships reliably, then only matches of greater than 1300cM are likely to have statistical significance. Any lower than that is an increasingly difficult struggle to fill out a family tree, usually pursued by asking other family members to get their DNA tested. It is fairly common for GEDmatch to give you details (including email addresses) of your 1,000-2,000 closest matches, albeit sorted in descending order of closeness for you.

As one example from GEDmatch, the highlighted line shows a match against one of the subject’s parents (their screen name and email address cropped off this picture):
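The halving arithmetic is easy to tabulate. The sketch below is my own illustration in Python and assumes nothing beyond the rough figures quoted above (circa 6800cM for a self or identical-twin match, halving with each step); actual shared totals vary around these averages, so the output is a rule of thumb and not what GEDmatch will report for any given relative.

```python
# Rule-of-thumb table: expected shared centiMorgans after N halvings.
# TOTAL_CM and THRESHOLD_CM are the approximate figures quoted in the text.

TOTAL_CM = 6800
THRESHOLD_CM = 1300


def expected_shared_cm(steps: int, total_cm: float = TOTAL_CM) -> float:
    """Halve the self-match total once per step away from your own data."""
    return total_cm / (2 ** steps)


table = [(steps, expected_shared_cm(steps)) for steps in range(6)]

for steps, cm in table:
    marker = "above" if cm > THRESHOLD_CM else "below"
    print(f"{steps} step(s): ~{cm:.1f} cM ({marker} {THRESHOLD_CM} cM)")
```

On these averages only the first three rows (self, one step, two steps) clear the 1300cM line, which is why anything further out becomes a struggle.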

GEDmatch parent

There are more advanced techniques to use a Chromosome browser to pinpoint whether a match comes down a male line or not (to help understand which side of the tree relationships a match is more likely to reside on), but these are currently outside my own knowledge (and current personal need).

Future – take care

One of the central tenets of the insurance industry is to spread societal costs equitably across a large base of folks who may, at random, have to take benefits from the funding pool. Specifically, the aim is not to prejudice anyone whose DNA may give indications of inherited risks or pre-conditions that may otherwise jeopardise their inclusion in cost effective health insurance or medical coverage.

Current UK law specifically makes it illegal for any commercial company or healthcare enterprise to solicit data, including DNA samples, where such provision may prejudice the financial cost, or service provision, to the owner of that data. Hence, please exercise due care with your DNA data, and with any entity that can associate that data with you as a uniquely identifiable individual. Wherever possible, only have that data stored in locations in which local laws, and the organisations holding your data, carry due weight or agreed safe harbour provisions.

Country/Federal Law Enforcement DNA records.

The largest DNA databases in many countries are held, and administered, for police and criminal justice use. They are a combination of crime scene samples, DNA of known convicted individuals, and samples to help locate missing people. The big issue at the time of writing is that there’s no ability to volunteer any submission for matching against missing person or police held samples, even though those data sets are fairly huge.

Access to such data assets is jealously guarded, and there is no current capability to volunteer your own readings for potential matches to be exposed to any case officer; intervention is at the discretion of the police, and they usually do their own custom sampling process and custom lab work. Personally, I think this is a great shame, particularly for individuals searching for a missing relative and seeking to help enquiries should their data help identify a match at some stage.

I’d personally gladly volunteer if there were appropriate safeguards to keep my physical identity well away from any third party organisation; only to bring the match to the attention of a case officer, and to leave any feedback to interested relatives only at their professional discretion.

I’d propose that any matches over 1300 cM (CentiMorgans) get fed back to both parties where possible, or at least allow cases to get closed. That would surface birth parent, child, sibling, half sibling, uncle, aunt, niece, nephew, grandparent and grandchild relationships reliably.

At the moment, police typically won’t take volunteer samples unless a missing person is vulnerable. Unfortunately not yet for tracing purposes.

Come join in – £99 is all you need to start

Whether for medical traits knowledge, or to help round out your family trees, now is a good time to get involved cost effectively. Ancestry currently add £20 postage to their £79 testing kit, hence £99 total. 23andMe do ancestry matching, Ethnicity and medical analyses too for £149 or so all in. However, Superdrug are currently selling their remaining stock of 23andMe testing kits (bought when the US dollar rate was better than it now is) for £99. So – quick, before stock runs out!

Either will permit you to load the raw data, once analysed, onto FTDNA, MyHeritage and GEDmatch when done too.

Never a better time to join in.

The Next Explosion – the Eyes have it

Crossing the Chasm Diagram

Crossing the Chasm – on one sheet of A4

One of the early lessons you pick up looking at product lifecycles is that some people hold out buying any new technology product or service longer than anyone else. You make it past the techies, the visionaries, the early majority, late majority and finally meet the laggards at the very right of the diagram (PDF version here). The normal way of selling at that end of the bell curve is to embed your product in something else; the person who swore they’d never buy a Microprocessor unknowingly has one inside the controls on their Microwave, or 50-100 ticking away in their car.

In 2016, Google started releasing access to its Vision API. They had been routinely using their own Neural networks for several years; one typical application was taking the video footage from their Google Maps Street View cars, and correlating house numbers from video footage onto GPS locations within each street. They even started to train their own models to pick out objects in photographs, and to be able to annotate a picture with a description of its contents – without any human interaction. They have also begun an effort to do likewise describing the story contained in hundreds of thousands of YouTube videos.

One example was to ask it to differentiate muffins and dogs:

This it does with aplomb, usually with much better than human performance. So, what’s next?

One notable time in Natural History was the explosion in the number of species on earth that occurred in the Cambrian period, some 534 million years ago. This was the time when it appears life forms first developed useful eyes, which led to an arms race between predators and prey. Eyes everywhere, and brains very sensitive to signals that come that way; if something or someone looks like they’re staring at you, sirens in your conscience will be at full volume.

Once a neural network is taught (you show it 1000s of images, and tell it which contain what, then it works out a model to fit), the resulting learning can be loaded down into a small device. It usually then needs no further training or connection to a bigger computer nor cloud service. It can just sit there, and report back what it sees, when it sees it; the target of the message can be a person or a computer program anywhere else.

While Google have been doing the heavy lifting on building the learning models in the cloud, Apple have slipped in with their own Core ML data format, a sort of PDF for the resulting machine learning models. They then use the Graphics Processing Units on their iPhone and iPad devices to run the resulting models on the user’s device. They also have their ARKit libraries (as in “Augmented Reality”) to sense surfaces and boundaries live on the embedded camera – and to superimpose objects in the field of view.

With iOS 11 coming in the autumn, any handwritten notes get automatically OCR’d and indexed – and added to local search. When a document on your desk is photo’d from an angle, it can automatically flatten it to look like a hi res scan of the original – and which you can then annotate. There are probably many like features which will be in place by the time the new iPhone models arrive in September/October.

However, tip of the iceberg. When I drive out of the car park in the local shopping centre here, the barrier automatically raises given the person with the ticket issued to my car number plate has already paid. And I guess we’re going to see a Cambrian explosion as inexpensive “eyes” get embedded in everything around us in our service.

With that, one example of what Amazon are experimenting with in their “Amazon Go” shop in Seattle. Every visitor a shoplifter: https://youtu.be/NrmMk1Myrxc

Lots more to follow.

PS: as a footnote, an example drawing a ruler on a real object. This is 3 weeks after ARKit got released. Next: personalised shoe and clothes measurements, and mail order supply to size: http://www.madewitharkit.com/post/162250399073/another-ar-measurement-app-demo-this-time

CloudKit – now that’s how to do a secure Database for users

Data Breach Hand Brick Wall Computer

One of the big controversies here relates to the appetite of the current UK government to release personal data with only the most basic understanding of what constitutes personally identifiable information. The lessons are there in history, but I fear that, without knowing the context of the infamous AOL Data Leak, we are destined to repeat it. With it goes personal information that we typically hold close to our chests, and whose release may cause personal, social or (in the final analysis) financial prejudice.

When plans were first announced to release NHS records to third parties, and in the absence of what I thought were appropriate controls, I sought (with a heavy heart) to opt out of sharing my medical history with any third party – and instructed my GP accordingly. I’d gladly share everything with satisfactory controls in place (medical research is really important and should be encouraged), but I felt that insufficient care was being exercised. That said, we’re more than happy for my wife’s Genome to be stored in the USA by 23andMe – a company that demonstrably satisfied our privacy concerns.

It therefore came as quite a shock to find that a report, highlighting which third parties had already been granted access to health data with Government mandated approval, ran to a total of 459 data releases to 160 organisations (last time I looked, that was 47 pages of PDF). See this and the associated PDFs on that page. Given the level of controls, I felt this was outrageous. Likewise the plans to release HMRC related personal financial data, again with soothing words from ministers who, given the NHS data implications, appear to have no empathy for the gross injustices likely to result from their actions.

The simple fact is that what constitutes individually identifiable information needs to be framed not only by what data fields are shared with a third party, but also by the resulting application of that data by the processing party. Not least if there is any suggestion that data is to be combined with other data sources, which could in turn triangulate back to make seemingly “anonymous” records traceable back to a specific individual. Which is precisely what happened in the AOL Data Leak example cited.

With that, and on a somewhat unrelated technical/programmer orientated journey, I set out to learn how Apple had architected its new CloudKit API announced this last week. This articulates the way in which applications running on your iPhone handset, iPad or Mac have a trusted way of accessing personal data stored (and synchronised between all of a user’s Apple devices) “in the Cloud”.
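Triangulation of this kind needs very little machinery. The Python sketch below uses entirely made-up records and field names of my own choosing: a “released” data set with no names in it, and a second, public data set that shares a few ordinary fields. It is not drawn from any real release; it only shows how a join on shared fields can put a name back on a record.

```python
from collections import defaultdict

# Entirely made-up records, for illustration only.
released = [
    {"postcode": "AB1 2CD", "birth_year": 1957, "gender": "M", "condition": "asthma"},
    {"postcode": "AB1 2CD", "birth_year": 1984, "gender": "F", "condition": "diabetes"},
    {"postcode": "EF3 4GH", "birth_year": 1984, "gender": "F", "condition": "migraine"},
]
public_register = [
    {"name": "Person A", "postcode": "AB1 2CD", "birth_year": 1957, "gender": "M"},
    {"name": "Person B", "postcode": "EF3 4GH", "birth_year": 1984, "gender": "F"},
    {"name": "Person C", "postcode": "EF3 4GH", "birth_year": 1984, "gender": "F"},
]
KEYS = ("postcode", "birth_year", "gender")


def link(released_rows, register_rows, keys=KEYS):
    """Return {name: condition} for records matching exactly one named person."""
    candidates = defaultdict(list)
    for person in register_rows:
        candidates[tuple(person[k] for k in keys)].append(person["name"])

    linked = {}
    for record in released_rows:
        names = candidates.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match re-identifies the record
            linked[names[0]] = record["condition"]
    return linked


reidentified = link(released, public_register)
print(reidentified)
```

Only the record whose combination of fields is shared by a single named person gets linked; the others stay hidden because nobody, or more than one person, matches.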

The central identifier that Apple associate with you, as a customer, is your Apple ID – typically an email address. In the Cloud, they give you access to two databases on their cloud infrastructure; one a public one, the other private. However, the second you try to create or access a table in either, the API accepts your iCloud identity and spits back a hash unique to your identity and the application on the iPhone asking to process that data. Different application, different hash. And everyone’s data is in there, so it’s immediately unable to permit any triangulation of disparate data that can trace back to uniquely identify a single user.
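I don’t know how Apple actually derive that per-application identifier, so the following is only a sketch of the general idea in Python, under my own assumptions: a keyed hash (HMAC-SHA256 from the standard library) over the account identity and the application identity, using a secret that never leaves the server. The names `SERVER_SECRET` and `scoped_user_id` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the service; not anything Apple document.
SERVER_SECRET = b"hypothetical-server-side-secret"


def scoped_user_id(account_id: str, app_id: str, secret: bytes = SERVER_SECRET) -> str:
    """Derive an identifier that is stable per (account, app) pair.

    The same account yields unrelated identifiers in different apps, so two
    apps cannot compare notes to work out that they share a user.
    """
    message = f"{account_id}\x00{app_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


diary_id = scoped_user_id("someone@example.com", "com.example.diary")
health_id = scoped_user_id("someone@example.com", "com.example.health")
print(diary_id)
print(health_id)
```

Different application, different hash: neither value contains the email address, and without the secret there is no practical way to tie the two together.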

Apple take this one stage further, in that any application that asks for any personally identifiable data (like an email address, age, postcode, etc) from any table has to have access to that information specifically approved by the handset’s owner; no explicit permission (on a per application basis), no data.

The data maintained by Apple, which includes personal information, health data (with HealthKit), details of home automation kit in your house (with HomeKit), and not least your credit card data stored to buy Music, Books and Apps, makes full use of this security model. And they’ve dogfooded it so that third party application providers use exactly the same model, and the same back end infrastructure. Which is also very, very inexpensive (data volumes go into Petabytes before you spend much money).

There are still some nuances I need to work out. I’m used to SQL databases and to some NoSQL database structures (I’m MongoDB certified), but it’s not clear, based on looking at the way the database works, which engine is being used behind the scenes. It appears to be a key:value store with some garbage collection mechanics that look like a hybrid file system. It also has the capability to store “subscriptions”, so if specific criteria appear in the data store, specific messages can be dispatched to the user’s devices over the network automatically. Hence things like new diary appointments in a calendar can be synced across a user’s iPhone, iPad and Mac transparently, without the need for each to waste battery power polling the large database on the server waiting for events that are likely to arrive infrequently.

The final piece of the puzzle I’ve not worked out yet is, if you have a large database already (say of the calories, carbs, protein, fat and weights of thousands of foods in a nutrition database), how you’d get that loaded into an instance of the public database in Apple’s Cloud. Other than writing custom loading code of course!

That apart, I’m really impressed by how Apple have designed the datastore to ensure the security of users’ personal data, and to ensure an inability to triangulate data between information stored by different applications. And that if any personally identifiable data is requested by an application, the user of the handset has to specifically authorise its disclosure for that application only. And without the app being able to sense if the data is actually present at all ahead of that release permission (so, for example, if a Health App wants to gain access to your blood sampling data, it doesn’t know if that data is even present or not before the permission is given – so the app can’t draw inferences on your probably having diabetes, which would be possible if it could deduce that you were recording glucose readings at all).
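The “can’t even tell whether the data exists” property is worth spelling out. Below is a toy model in Python, with class and method names I have made up; it is not the HealthKit or CloudKit API. The point is simply that an unauthorised read gets the same answer whether or not any readings are stored, so the refusal itself leaks nothing.

```python
class HealthStore:
    """Toy store: reads reveal nothing until the user grants access."""

    def __init__(self, samples=None):
        self._samples = dict(samples or {})  # data type -> list of readings
        self._grants = set()                 # (app, data type) pairs the user approved

    def grant(self, app, data_type):
        """Record the user's explicit approval for one app and one data type."""
        self._grants.add((app, data_type))

    def read(self, app, data_type):
        if (app, data_type) not in self._grants:
            return []  # identical answer whether or not any data exists
        return list(self._samples.get(data_type, []))


with_data = HealthStore({"blood_glucose": [5.4, 6.1]})
without_data = HealthStore()

# Before permission: the app cannot distinguish the two handsets.
before = (
    with_data.read("DietApp", "blood_glucose"),
    without_data.read("DietApp", "blood_glucose"),
)

with_data.grant("DietApp", "blood_glucose")
after = with_data.read("DietApp", "blood_glucose")
print(before, after)
```

Had the store raised a “permission denied” error only when readings were present, the app could have inferred the glucose monitoring from the error alone.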

In summary, impressive design and a model that deserves our total respect. The more difficult job will be to get the same mindset in the folks looking to release our most personal data that we shared privately with our public sector servants. They owe us nothing less.

A first look at Apple HomeKit

Apple HomeKit Logo

Today’s viewing from the Apple Worldwide Developers Conference videos concerned HomeKit, which is the integration platform to control household appliances from your iPhone. Apple have defined a common set of Accessory Profiles, which are configured into a Home > Zone > Room hierarchy (you can define several ‘home’ locations, but one of them is normally selected as the primary one). Native devices include:

  • Garage Door Openers (with associated lighting)
  • Lights
  • Door locks
  • Thermostats
  • IP (Internet Protocol) Cameras
  • Switches

Currently, there are a myriad of different per vendor standards to control home automation products, but Apple are providing functionality to enable hardware (or software) bridges between disparate protocols and their own. Once a bridge has been discovered, the iPhone sees all the devices sitting the other side of the bridge as if they were directly connected to the iPhone and using the Apple provided interface protocols.

Every device type has a set of characteristics, such as:

  • Power State
  • Lock State
  • Target State
  • Brightness
  • Model Number
  • Current Temperature
  • etc

When devices are first defined, each has a compulsory “identify me” action. Hence if you’re sitting on the floor, trying to work out which of twelve identical-looking lightbulbs in the room to give an appropriate name, the “identify me” action on the iPhone pick list will result in the matching bulb blinking twice; for a security camera, blinking a colour LED, and so forth.

Each device, its room name, zone (like “upstairs”, “back garden”) and home name, plus the common characteristic actions, are encoded and enacted using Siri – Apple’s voice control on the iPhone. “Switch on all downstairs lights”, “Set the room temperature to 20 degrees C” and so forth are spoken into your iPhone handset. That is the default User Interface for the whole Home Automation Setup. The HomeKit resident database is in turn also available for use by vendor specific products via the HomeKit API, should a custom application be desirable.
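To picture how a phrase like “Switch on all downstairs lights” maps onto the Home > Zone > Room hierarchy and the per-device characteristics, here is a small data-structure sketch in Python. It is not the HomeKit API (which is Objective-C on the handset); the class, method and characteristic names are hypothetical, and I have modelled a zone simply as a named group of rooms.

```python
from dataclasses import dataclass, field


@dataclass
class Accessory:
    name: str
    kind: str  # e.g. "light", "thermostat"
    characteristics: dict = field(default_factory=dict)


@dataclass
class Room:
    name: str
    accessories: list = field(default_factory=list)


@dataclass
class Home:
    name: str
    rooms: dict = field(default_factory=dict)  # room name -> Room
    zones: dict = field(default_factory=dict)  # zone name -> list of room names

    def accessories_in_zone(self, zone, kind):
        """Return every accessory of one kind in the rooms of a zone."""
        return [
            acc
            for room_name in self.zones.get(zone, [])
            for acc in self.rooms[room_name].accessories
            if acc.kind == kind
        ]

    def set_characteristic(self, zone, kind, key, value):
        """Apply one characteristic to all matching accessories; return their names."""
        targets = self.accessories_in_zone(zone, kind)
        for acc in targets:
            acc.characteristics[key] = value
        return [acc.name for acc in targets]


home = Home(
    name="Main house",
    rooms={
        "kitchen": Room("kitchen", [Accessory("kitchen lamp", "light")]),
        "lounge": Room("lounge", [Accessory("lounge lamp", "light"),
                                  Accessory("lounge stat", "thermostat")]),
        "bedroom": Room("bedroom", [Accessory("bedside lamp", "light")]),
    },
    zones={"downstairs": ["kitchen", "lounge"], "upstairs": ["bedroom"]},
)

# "Switch on all downstairs lights"
switched = home.set_characteristic("downstairs", "light", "power_state", True)
print(switched)
```

The spoken command only has to resolve to a zone, a device type and a characteristic; everything else follows from the names given at setup time.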

There are of course extensive security controls to frustrate any attempt for anyone to be able to do “man in the middle” attacks, or to subvert the security of your device connections. For developers, Apple provide a software simulator so that you can test your software against a wide range of device types, even before the hardware is made available to you.

Most of the supporting detail to build compliant devices is found in the MFi (Made for iPhone/iPod/iPad) Guidelines, which are only available the other side of a license agreement with Apple here. The full WWDC presentation on HomeKit (just under an hour long) is called “Introduction to HomeKit” and is present in the list of video sessions from WWDC here.

Overall, very impressive. That’s the home stuff largely queued up, just awaiting news of a bridge I think. Knowing how simple the voice setup is on Android Jelly Bean for a programmer (voice enabling an app is circa 20 lines of JavaScript), I’m sure a Google equivalent is eminently possible; if Google haven’t done their own API, then a bridge to Apple’s ecosystem (if the licensing allows it) should not be a major endeavour.

So, the only missing thing was talk of iBeacon support. However, that is a different use case. There are already pilots that sense presence of a Bluetooth Low Energy beacon, and bring specific applications onto the lock screen. Examples include the Starbucks payment card app coming forward to make itself immediately available when you’re close to a Starbucks counter, or the Virgin Atlantic app making your boarding card available when you approach the check-in desk at an airport. Both are features of Apple’s Passbook loyalty card app – which is already used by hundreds of retailers, supermarkets and airlines.

The one thing about iBeacon is that you can enable your iPhone 5S to be a low energy beacon in its own right. You have full control over this and your presence is not made available to anything but applications on your own iPhone handset – over which, in the final analysis, you have total control. One use case already is pairing your Pebble Smartwatch with your iPhone 5S handset, so that if your phone leaves your immediate location by a specified short distance (say, 2 meters), you’re aggressively told immediately.

So, lots to look forward to in the Autumn. Quite a measured approach compared to the “Internet of Things” which other vendors are hyping with impunity (and quoting staggering revenue numbers which I find difficult to map onto any reality – starting with what folks seem to suggest is even a current huge market size already).

My next piece of homework will be to look at CloudKit, now that Apple are dogfooding its use in their own products while releasing it to third party developers. Hopefully, a good sign that Apple are now providing cloud services that match the resilience of competitive offerings for the first time – even if they are specific to Apple’s own platforms. But that’s all the other side of finishing my company’s end of year tax return prep work first!

Data Sharing: who do you trust?

Loose Lips Might Sink Ships Poster

Yesterday I posted my full approval for folks like Apple and Google to know a lot of data about me, specifically from the devices I usually carry around with me. This is in the full knowledge that the full extent of data sharing is open, transparent and that I get notified (at least by Google) if any application on my Android handset is seeking to solicit more data from me, or changing their data sharing policy in any way. With that, I have full confidence that I can opt out if I ever feel the level of intrusion exceeds my comfort levels with the data use; I’m generally very happy if it does improve the level of service delivered to me without downsides.

I’ve only really baulked at one such update, which was a request by LinkedIn to be able to mine the call records of who I contacted, and who I received calls from, on my mobile phone. I felt this was a violation of the use I put their application to, so elected to remove the application from my Nexus 5 instead.

After I posted my note, I had a reply on Facebook from Bruce Stidston, that read:

You’re right, IMHO, up to a point when you say “what’s not to like?”. For me, the bit that’s not to like is scope creep. The NHS, for example, accumulates data on each patient, and that’s (potentially) cool when it’s used to improve patient outcomes by sharing within the NHS. The problem is that as we move into maturity in IT and data collection technologies, we’re not even in infancy when it comes to concepts of privacy. So when some bright spark reckons it’s cool to dish out “aggregated and individually unidentifiable” data to Big Pharma to shore up NHS finances, I need to be right there on the ball to say yay or nay — and that’s in the best-case situation. The real-case situation is they’ll do it anyway and seek forgiveness afterwards. That’s what’s not to like.

I think of this generalised problem as “the tragedy of the techno-morons”. Smart people did amazing things to make impossible things happen — think just for a moment of the layers of wonderful intricacy that make GPS work, which all of us now depend on — and then some Tim Nice-But-Dim (like my MP) who have only just worked out how a bicycle works are entrusted with the powers to sign off huge snowballs of potentially invasive applications for those technologies. I never forget that the guys at BT who decided that deep-packet inspection of private IP datastreams was fine for advertising purposes, have yet to be hauled before the courts.

I think Bruce is 100% correct. It was with some horror that I saw some plans to share my NHS data with commercial organisations, data which was claimed in the headlines to be anonymised but which appeared to contain my date of birth and postcode. The missing cluestick is that a UK postcode routinely covers an average of circa 10 households, and I’m pretty sure I’m the only one in my postcode of my age and gender, and that’s even before my day and month of birth get served up. This is a textbook example of history about to repeat itself, given the people looking at this process are obviously unaware of what happened when AOL released ‘anonymised data’ a few years ago. You only have to Google “AOL data leak” and you’ll probably find top of the list is this Wikipedia article.

The sad fact is that anonymising a data set relies on making it impossible to triangulate between disparate data sources and so trace the records provided back to specific named individuals. The proposals drove a bus straight through this without apparent due care and attention. The side effect is that a commercial entity could then actively discriminate against me for the purposes of insurance (which should be an evenly spread levy across a policy holding population), or undermine my human rights to privacy, freedom of expression or freedom of movement without unwanted side effects.
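To make the triangulation risk concrete, here is a minimal sketch in Python. It counts how many records share each combination of "quasi-identifiers" (postcode, birth year, gender); any combination that occurs only once points at a single person, with or without a name attached. The records, field names and helper functions are all invented for illustration and say nothing about the layout of any real NHS data set.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def unique_records(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination occurs exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]

# Entirely made-up "anonymised" records: no names, but still risky.
records = [
    {"postcode": "AB1 2CD", "birth_year": 1958, "gender": "M", "condition": "asthma"},
    {"postcode": "AB1 2CD", "birth_year": 1985, "gender": "F", "condition": "diabetes"},
    {"postcode": "AB1 2CD", "birth_year": 1985, "gender": "F", "condition": "migraine"},
    {"postcode": "EF3 4GH", "birth_year": 1958, "gender": "M", "condition": "eczema"},
]

QUASI = ("postcode", "birth_year", "gender")

# Records that one external source (an electoral roll, say) could pin to a person.
exposed = unique_records(records, QUASI)

# The smallest group size is the "k" in k-anonymity; k == 1 means no protection.
k = min(group_sizes(records, QUASI).values())

print(f"{len(exposed)} of {len(records)} records are unique on {QUASI}; k = {k}")
```

A release is usually only described as k-anonymous when every combination is shared by at least k people, which typically means coarsening the fields first (a postcode district rather than a full postcode, an age band rather than a date of birth).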

The meme of “Crisis in the NHS” is not an appropriate one in my view, in that the UK health service is well funded and very efficient compared to the health systems in virtually every major economy. It appears to be being subverted in support of introducing American-style structural changes, where costs per head of population are around double ours, coverage is not universal, and the system is stuffed with inefficiencies we should have no wish to copy here. With that in mind, and having seen the consultation about data sharing delayed, it came as rather a shock to see this list of data sharing activity that had already taken place without consultation:

Ministers have gone against the findings of their own information governance review and allowed patient-identifiable data from GP records to be used in the NHS outside of the ‘safe havens’ recommended by the Caldicott report for six months.

Health secretary Jeremy Hunt has approved plans for NHS England to waive common confidentiality laws for six months under a legal exemption called section 251, allowing patient identifiable data to be passed to commissioners and support units.

This is despite the safe havens for potentially identifiable patient data recommended by the Government’s own Caldicott2 report published earlier this year not being in operation.

The extent of this sharing is documented here. At the time I first looked at the document of already approved data releases, it ran to 40 pages of A4. It’s currently 459 releases over 48 pages (latest up-to-date here). I fear Bruce’s “Tim Nice-But-Dim” goes by the name of Jeremy Hunt and the damage has been in full flow, despite previous assurances, for some time now. This is an appalling travesty and an apparent violation of the whole basis of UK Data Protection Acts. The Minister should be thoroughly ashamed and, if justice were to be served, should be up in front of the European Court for a fundamental violation of Article 8 of the European Convention on Human Rights (the right to privacy).

It’s with an equal level of concern that I see Ministers of the UK Government also suggesting that tax records should be released in a publicly accessible form by HMRC.

I’m all for data to be shared for Medical Research purposes (as suggested by Larry Page), or in support of Government initiatives to undertake projects for the common good of the UK population. My wife Jane already has all her genome stored at 23andMe, as we have full confidence in their data sharing policy and our ability to reverse out if we feel at all uncomfortable in the future. In doing so here in the UK, the folks releasing data should be fully cognizant of the need to ensure the privacy of individuals that may otherwise be subjected to personal or commercial discrimination as a result of provision of data, either directly or from being complicit in allowing triangulation from other sources to the same end result.

Those who don’t learn from history are, as always, destined to repeat it. We should by now know better than that, and have politicians that know likewise.