Future Health: DNA is one thing, but 90% of you is not you

One of my pet hates is seeing my wife visit the doctor, getting hunches of what may be afflicting her health, and this leading to a succession of “oh, that didn’t work – try this instead” visits for several weeks. I just wonder how much cost could be squeezed out of the process – and lack of secondary conditions occurring – if the root causes were much easier to identify reliably. I then wonder if there is a process to achieve that, especially in the context of new sensors coming to market and their connectivity to databases via mobile phone handsets – or indeed WiFi enabled, low end Bluetooth sensor hubs aka the Apple Watch.

I’ve personally kept a record of what I’ve eaten, down to fat, protein and carb content (plus my Monday 7am weight and daily calorie intake) every day since June 2002. A precursor to the future where devices can keep track of a wide variety of health signals, feeding a trend (in conjunction with “big data” and “machine learning” analyses) toward self service health. My Apple Watch has a year’s worth of heart rate data. But which signals, if they were available, would be far more compelling for identifying the root causes of a wider variety of health problems?

There is currently a lot of focus on Genetics, where the Human Genome can betray many characteristics or pre-dispositions to some health conditions that are inherited. My wife Jane got a complete 23andMe statistical assessment several years ago, and has also been tested for the BRCA2 (pronounced ‘bracca-2’) gene – a marker for inherited pre-disposition to risk of Breast Cancer – which she fortunately did not inherit from her afflicted father.

A lot of effort is underway to collect and sequence the complete Genome sequences from the DNA of hundreds of thousands of people, building them into a significant “Open Data” asset for ongoing research. One gotcha is that such data is being collected by numerous organisations around the world, and the size of each individual’s DNA (assuming one byte to each nucleotide component – A/T or C/G combinations) runs to 3GB of base pairs. You can’t do research by throwing an SQL query (let alone thousands of machine learning attempts) over that data when samples are stored in many different organisations’ databases, hence the existence of an API (courtesy of the GA4GH Data Working Group) to permit distributed queries between co-operating research organisations. Notable that there are Amazon Web Services and Google employees participating in this effort.

However, I wonder if we’re missing a big and potentially just as important data asset; that of the profile of bacteria that everyone is dependent on. We are each home to approx. 10 trillion human cells among the 100 trillion microbial cells in and on our own bodies; you are 90% not you.

While our human DNA is 99.9% identical to that of any person next to us, the profiles of our MicroBiomes are typically only 10% similar; our age, diet, genetics, physiology and use of antibiotics are also heavy influences. Our DNA is our blueprint; the profile of the bacteria we carry is an ever changing set of weather conditions that either influence our health – or are leading indicators of something being wrong – or both. Far from being inert passengers, these little organisms play essential roles in the most fundamental processes of our lives, including digestion, immune responses and even behaviour.

Different MicroBiome ecosystems are present in different areas of our body, from our skin, mouth, stomach, intestines and genitals; most promise is currently derived from the analysis of stool samples. Further, our gut is only second to our brain in the number of nerve endings present, many of them able to enact activity independently from decisions upstairs. In other areas, there are very active hotlines between the two nerve cities.

Research is emerging that suggests previously unknown links between our microbes and numerous diseases, including obesity, arthritis, autism, depression and a litany of auto-immune conditions. Everyone knows someone who eats like a horse but is skinny thin; the composition of microbes in their gut is a significant factor.

Meanwhile, costs of DNA sequencing and compute power have dropped to a level where analysis of our microbe ecosystems has gone from costing $100M a decade ago to some $100 today. It should continue on that downward path to a level where personal regular sampling could become available to all – if access to the needed sequencing equipment plus compute resources were more accessible and had much shorter total turnaround times. Not least to provide a rich Open Data corpus of samples that we can use for research purposes (and to feed back discoveries to the folks providing samples). So, what’s stopping us?

Data Corpus for Research Projects

To date, significant resources are being expended on Human DNA Genetics and comparatively little on MicroBiome ecosystems; the largest research projects are custom built and have sampling populations of fewer than 4000 individuals. This results in insufficient population sizes and sample frequency on which to easily and quickly conduct wholesale analyses; analyses needed to understand the components of health afflictions, to track changes to the mix over time and to isolate root causes.

There are open data efforts underway with the American Gut Project (based out of the Knight Lab at the University of California, San Diego) plus a feeder “British Gut Project” (involving Tim Spector and staff at King’s College London). The main gotcha is that the service is one-shot and takes several months to turn around. My own sample, submitted in January, may take up to 6 months to work through their sequencing then compute batch process.

In parallel, VC funded company uBiome provide the sampling with a 6-8 week turnaround (at least for the gut samples; slower for the other 4 area samples we’ve submitted), though they are currently not sharing the captured data to the best of my knowledge. That said, the analysis gives an indication of the names, types and quantities of bacteria present (with a league table of those over and under represented compared to all samples they’ve received to date), but does not currently communicate any health related findings.

My own uBiome measures suggest my gut ecosystem is more diverse than 83% of folks they’ve sampled to date, which is a proxy for being more healthy than most; those bacteria that are over represented – one up to 67x more than is usual – are of the type that orally administered probiotics attempt to get to your gut. So a life of avoiding antibiotics whenever possible appears to have helped me.

However, the gut ecosystem can flex quite dramatically. As an example, see what happened when one person contracted Salmonella over a three day period (the green in the top of this picture; x-axis is days); you can see an aggressive killing spree where 30% of the gut bacteria population are displaced, followed by a gradual fight back to normality:

Salmonella affecting MicroBiome Population

Under usual circumstances, the US/UK Gut Projects and indeed uBiome take a single measure and report back many weeks later. The only extra feature that may be deduced is the delta between counts of genome start and end sequences, as this will give an indication of the relative species population growth rates from otherwise static data.

I am not aware of anyone offering a faster turnaround service, nor one that can map several successively time gapped samples, let alone one that can convey health afflictions that can be deduced from the mix – or indeed from progressive weather patterns – based on the profile of bacteria populations found.

My questions include:

  1. Is there demand for a fast turnaround, wholesale profile of a bacterial population to assist medical professionals isolating indicators – or the root cause – of ill health with impressive accuracy?
  2. How useful would a large corpus of bacterial “open data” be to research teams, to support their own analysis hunches and indeed to support enough data to make use of machine learning inferences? Could we routinely take samples donated by patients or hospitals to incorporate into this research corpus? Do we need the extensive questionnaires that the various Gut Projects and uBiome issue completed alongside every sample?
  3. What are the steps in the analysis pipeline that are slowing the end to end process? Does increased sample size (beyond a small stain on a cotton bud) remove the need to enhance/copy the sample, with its associated need for nitrogen-based lab environments (many types of bacteria are happy as Larry in the Nitrogen of the gut, but perish with exposure to oxygen)?
  4. Is there any work active to make the QIIME (pronounced “Chime”) pattern matching code take advantage of cloud spot instances, inc Hadoop or Spark, to speed the turnaround time from Sequencing reads to the resulting species type:volume value pairs?
  5. What’s the most effective delivery mechanism for providing “Open Data” exposure to researchers, while retaining the privacy (protection from financial or reputational prejudice) for those providing samples?
  6. How do we feed research discoveries back (in English) to the folks who’ve provided samples and their associated medical professionals?

Next Generation Sequencing works by splitting DNA/RNA strands into relatively short read lengths, which then need to be reassembled against known patterns. Taking a poop sample which contains thousands of different bacteria is akin to throwing the pieces of many thousand puzzles into one pile and then having to reconstruct them back – and count the number of each. As an illustration, a single HiSeq run may generate up to 6 x 10^9 sequences; these then need reassembling and the count of 16S rDNA type:quantity value pairs deduced. I’ve seen estimates of six thousand CPU hours to do the associated analysis to end up with statistically valid type and count pairs. This is a possible use case for otherwise unused spot instance capacity at large cloud vendors if the data volumes could be ingested and processed cost effectively.
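To make the "puzzle pieces" idea a little more concrete, here is a toy sketch in Python of reducing a pile of reads to type:quantity pairs. The reference signatures, taxon choices and read strings are all invented for illustration, and real pipelines such as QIIME match against curated 16S databases with far more sophisticated alignment; this only shows the shape of the counting step.

```python
from collections import Counter

# Hypothetical signature sequences, made up for this sketch.
# Real pipelines compare reads against curated 16S reference databases.
REFERENCE = {
    "Bacteroides": "ACGTTGCA",
    "Lactobacillus": "GGATCCTA",
    "Escherichia": "TTGACCGA",
}


def classify(read, reference=REFERENCE):
    """Return the first taxon whose signature occurs in the read, else None."""
    for taxon, signature in reference.items():
        if signature in read:
            return taxon
    return None


def count_taxa(reads):
    """Reduce a pile of reads to taxon -> count pairs."""
    counts = Counter()
    for read in reads:
        taxon = classify(read)
        counts[taxon if taxon is not None else "unclassified"] += 1
    return counts


reads = [
    "NNACGTTGCANN",
    "GGATCCTAAAAA",
    "CCCCACGTTGCA",
    "AAAAAAAAAAAA",
]

# Count everything in one go ...
profile = count_taxa(reads)

# ... or count two halves separately and merge the partial results.
merged = count_taxa(reads[:2]) + count_taxa(reads[2:])
```

Because each read is classified independently, the pile can be split into chunks, counted on separate machines and the partial counts added together afterwards, which is what makes cheap spot capacity look like a plausible fit.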

Nanopore sequencing is another route, which has much longer read lengths but is much more error prone (1% for NGS, typically up to 30% for portable Nanopore devices), which probably limits its utility for analysing bacteria samples in our use case. Much more useful if you’re testing for particular types of RNA or DNA, rather than the wholesale profiling exercise we need. Hence for the time being, we’re reliant on trying to make an industrial scale, lab based batch process turn around data as fast as we are able – but having a network accessible data corpus and research findings feedback process in place if and when sampling technology gets to be low cost and distributed to the point of use.

The elephant in the room is in working out how to fund the build of the service, to map its likely cost profile as technology/process improvements feed through, and to know to what extent its diagnosis of health root causes will improve its commercial attractiveness as a paid service over time. That is what I’m trying to assess while on the bench between work contracts.

Other approaches

Nature has its way of providing short cuts. Dogs have been trained to be amazingly prescient at assessing whether someone has Parkinson’s just by smelling their skin. There are other techniques where a pocket sized spectrometer can assess the existence of 23 specific health disorders. There may well be other techniques that come to market that don’t require a thorough picture of a bacterial population profile to give medical professionals the identity of the root causes of someone’s ill health. That said, a thorough analysis may at least be of utility to the research community, even if we get to only eliminate ever rarer edge cases as we go.

Coming full circle

One thing that’s become eerily apparent to date is some of the common terminology between MicroBiome conditions and terms I once heard used in Chinese Herbal Medicine (my wife’s psoriasis was cured after seeing a practitioner in Newbury for several weeks nearly 20 years ago). The concept of “balance” and the existence of “heat” (betraying the inflammation as your bacterial population of different species ebbs and flows in reaction to different conditions). Then consumption or application of specific plant matter that puts the body’s bacterial population back to operating norms.

Lingzhi Mushroom

Wild mushroom “Lingzhi” in China: cultivated in the far east, found to reduce Obesity

We’ve started to discover that some of the plants and herbs used in Chinese Medicine do have symbiotic effects on your bacterial population for the conditions they are reckoned to help cure. With that, we are starting to see some statistically valid evidence that Chinese and Western medicine may well meet in the future, and be part of the same process in our future health management.

Until then, still work to do on the business plan.

Crossing the Chasm on One Page of A4 … and Wardley Maps

Crossing the Chasm Diagram

Crossing the Chasm – on one sheet of A4

The core essence of most management books I read can be boiled down to occupy a sheet of A4. There have also been a few big mistakes along the way, such as what were considered at the time to be seminal works, like Tom Peters’ “In Search of Excellence” – which in retrospect is best summarised as “even the most successful companies possess DNA that also breeds the seeds of their own destruction”.

I have much simpler business dynamics mapped out that I can explain to fast track employees – and demonstrate – inside an hour; there are usually four graphs that, once drawn, will betray the dynamics (or points of failure) afflicting any business. A very useful lesson I learnt from Microsoft when I used to distribute their software. But I digress.

Among my many Business books, I thought the insights in Geoffrey Moore’s book “Crossing the Chasm” were brilliant – and useful for helping grow some of the product businesses I’ve run. The only gotcha is that I found myself constantly cross referencing different parts of the book when trying to build a go-to-market plan for DEC Alpha AXP Servers (my first use of his work) back in the mid-1990s – the time I worked for one of DEC’s Distributors.

So, suitably bored when my wife was watching J.R. Ewing being mischievous in the first UK run of “Dallas” on TV, I sat on the living room floor and penned this one page summary of the book’s major points. Just click it to download the PDF with my compliments. Or watch the author himself describe the model in under 14 minutes at an O’Reilly Strata Conference here. Or alternatively, go buy the latest edition of his book: Crossing the Chasm

My PA (when I ran Marketing Services at Demon Internet) redrew my hand-drawn sheet of A4 into the Microsoft Publisher document that output the one page PDF, and that I’ve referred to ever since. If you want a copy of the source file, please let me know – drop a request to: ian.waring@software-enabled.com.

That said, I’ve been far more inspired by the recent work of Simon Wardley. He effectively breaks a service into its individual components and positions each on a 2D map; the x-axis shows the stage of the component’s evolution as it goes through a Chasm-style lifecycle; the y-axis symbolises the value chain from raw materials to end user experience. You then place all the individual components and their linkages as part of an end-to-end service on the result. Having seen the landscape in this map form, you can then assess how each component evolves/moves from custom build to commodity status over time. Even the newest components evolve from chaotic genesis (where standards are not defined and/or features incomplete) to becoming well understood utilities in time.

The result highlights which service components need Agile, fast iterating discovery and which are becoming industrialised, six-sigma commodities. And once you see your map, you can focus teams and their measures on the important changes needed without breeding any contradictory or conflict-ridden behaviours. You end up with a well understood map and – once you overlay competitive offerings – can also assess the positions of other organisations that you may be competing with.

The only gotcha in all of this approach is that Simon hasn’t written the book yet. However, I notice he’s just provided a summary of his work on his Bits n Pieces Blog yesterday. See: Wardley Maps – set of useful Posts. That will keep anyone out of mischief for a very long time, but the end result is a well articulated, compelling strategy and the basis for a well thought out, go to market plan.

In the meantime, the basics on what is and isn’t working, and sussing out the important things to focus on, are core skills I can bring to bear for any software, channel-based or internet related business. I’m also technically literate enough to drag the supporting data out of IT systems for you where needed. Whether your business is an Internet-based startup or an established B2C or B2B Enterprise focussed IT business, I’d be delighted to assist.

Mobile Phone User Interfaces and Chinese Genius

Most of my interactions with the online world use my iPhone 6S Plus, Apple Watch, iPad Pro or MacBook – but with one eye on next big things from the US West Coast. The current Venture Capital fads are Conversational Bots, Virtual Reality and Augmented Reality. I bought a Google Cardboard kit for my grandson to have a first glimpse of VR on his iPhone 5C, though spent most of the time trying to work out why his handset was too full to install any of the Cardboard demo apps; 8GB, 2 apps, 20 songs and a storage list that only added up to 5GB in use. Hence having to borrow his Dad’s iPhone 6 while we tried to sort out what was eating up 3GB. Very impressive nonetheless.

The one device I’m waiting to buy is an Amazon Echo (currently USA only). It’s a speaker with six directional microphones, an Internet connection and some voice control smarts; these are extendable by use of an application programming interface and database residing in their US East Datacentre. Out of the box, you can ask its nom de plume “Alexa” to play a music single, album or wish list. To read back an audio book from where you last left off. To add an item to a shopping or to-do list. To ask about local outside weather over the next 24 hours. And so on.

Its real beauty is that you can define your own voice keywords into what Amazon term a “Skill”, and provide your own plumbing to your own applications using what Amazon term their “Alexa Skill Kit”, aka “ASK”. There is already one UK Bank that have prototyped a Skill for the device to enquire their users’ bank balance, primarily as an assist to the visually impaired. More in the USA to control home lighting and heating by voice controls (and I guess very simple to give commands to change TV channels or to record for later viewing). The only missing bit is that of identity; the person speaking can be anyone in proximity to the device, or indeed any device emitting sound in the room; a radio presenter saying “Alexa – turn the heating up to full power” would not be appreciated by most listeners.
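As a rough idea of what the "plumbing" behind a Skill looks like, here is a minimal Python sketch of a handler that takes the JSON request a Skill receives and returns a spoken reply. The intent name GetBalanceIntent, the BALANCES lookup and the account id are all hypothetical, and the request/response layout follows my understanding of the Alexa Skills Kit JSON format rather than any bank’s real implementation.

```python
# Hypothetical account data; a real skill would call the bank's own systems.
BALANCES = {"account-123": 42.50}


def speak(text, end_session=True):
    """Wrap plain text in the JSON envelope a skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handler(event, context=None):
    """Route an incoming request to a reply (Lambda-style entry point)."""
    request = event.get("request", {})
    # This identifies the linked account, not the person actually speaking.
    user_id = event.get("session", {}).get("user", {}).get("userId")

    if request.get("type") == "LaunchRequest":
        return speak("Welcome. Ask me for your balance.", end_session=False)

    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {}).get("name")
        if intent == "GetBalanceIntent":  # hypothetical intent name
            balance = BALANCES.get(user_id)
            if balance is None:
                return speak("Sorry, I could not find your account.")
            return speak("Your balance is {:.2f} pounds.".format(balance))

    return speak("Sorry, I did not understand that.")


sample_event = {
    "session": {"user": {"userId": "account-123"}},
    "request": {"type": "IntentRequest", "intent": {"name": "GetBalanceIntent"}},
}
reply = handler(sample_event)
```

Note that the only identity in the request is the account the device is linked to, which is exactly the gap described above: the handler cannot tell the account holder from a voice on the radio.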

For further details on Amazon Echo and Alexa, see this post.

However, the mind wanders over to my mobile phone, and the disjointed experience it exposes to me when I’m trying to accomplish various tasks end to end. Data is stored in application silos. Enterprise apps quite often stop at a Citrix client turning your pocket supercomputer into a dumb (but secured) Windows terminal, where the UI turns into normal Enterprise app silo soup to go navigate.

Some simple client-side workflows can be managed by software like IFTTT – aka “IF This, Then That” – so I can get a new Photo automatically posted to Facebook or Instagram, or notifications issued to me when an external event occurs. But nothing that integrates a complete buying experience. The current fad for conversational bots still falls well short; imagine the workflow asking Alexa to order some flowers, as there are no visual cues to help that discussion and buying experience along.

For that, we’d really need to do one of the Jeff Bezos edicts – of wiping the slate clean, to imagine the best experience from a user perspective and work back. But the lessons have already been learnt in China, where desktop apps weren’t a path on the evolution of mobile deployments in society. An article that runs deep on this – and what folks can achieve within WeChat in China – is impressive. See: http://dangrover.com/blog/2016/04/20/bots-wont-replace-apps.html

I wonder if Android or iOS – with the appropriate enterprise APIs – could move our experience on mobile handsets to a similar next level of compelling personal servant. I hope the Advanced Development teams at both Apple and Google – or a startup – are already prototyping such a revolutionary, notifications baked in, mobile user interface.

On the unusability of internal systems. Ugh!

Enterprise Apps - Notes Needed


Saw this picture alongside an excellent blog post today. Does this look familiar?

The company have probably spent many millions buying software to automate their business processes or to fulfil all manner of other objectives. But the User Interface and Operating Nuances are so involved, the poor user has to keep a notebook to hand to help navigate around the mess served to them. And they have to interact with their ultimate customers with a smile on their face, protecting them from the mess behind the scenes.

If that was served up on a phone handset, no consumer would touch it with the longest bargepole known to man. One of the things that plays on my mind is how to disrupt these vendors. Or the companies whose directors decide to buy this stuff and inflict this (and the associated costs) on their downstream customers.

Jon Barrett had a lot of the glue to sort this phenomenon with Digital’s Jabberwocky project back in the early 1990s, with what amounted to an Enterprise Software Bus with some basic screen scraping functionality. At least pilot users could string together some business process interactions atop those disparate applications that behaved in a way that today’s mobile phone users might have found a bit more palatable. It’s been a long time since, and little apparent progress.

In the meantime, the blog post by Leisa Reichelt is here. Well worth a read.

Footnote: within 12 hours of posting this, I read an excellent article here on the failure of a “Choose and Book” system on which over £300m was spent. Reading the drains up, it looks like a set of top level objectives were being pursued, but with no appreciation of the unwanted constraints being placed on the users of the resulting service, so the whole thing fell into disrepute. Like the old Dutch proverb: “a ship on a beach is a lighthouse to the sea”.

Nadella: Heard what he said, knew what he meant

Satya Nadella

That’s a variation of an old “Two Ronnies” song in the guise of “Jehosaphat & Jones” entitled “I heard what she said, but knew what she meant” (words or three minutes into this video). Having read Satya Nadella’s Open Letter to employees issued at the start of Microsoft’s new fiscal year, I did think it was long. However, the real delight was reading Jean-Louis Gassee – previously the CTO of Apple – not only pulling it apart, but then having a crack at showing how it should have been written:


This is the beginning of our new FY 2015 – and of a new era at Microsoft. I have good news and bad news. The bad news is the old Devices and Services mantra won’t work. For example: I’ve determined we’ll never make money in tablets or smartphones.

So, do we continue to pretend we’re “all in” or do we face reality and make the painful decision to pull out so we can use our resources – including our integrity – to fight winnable battles? With the support of the Microsoft Board, I’ve chosen the latter.

We’ll do our utmost to minimize the pain that will naturally arise from this change. Specifically, we’ll offer generous transitions arrangements in and out of the company to concerned Microsoftians and former Nokians.

The good news is we have immense resources to be a major player in the new world of Cloud services and Native Apps for mobile devices.

We let the first innings of that game go by, but the sting energizes us. An example of such commitment is the rapid spread of Office applications – and related Cloud services – on any and all mobile devices. All Microsoft Enterprise and Consumer products/services will follow, including Xbox properties.

I realize this will disrupt the status quo and apologize for the pain to come. We have a choice: change or be changed.

Stay tuned.


Jean-Louis Gassee’s full take-home on the original is provided here. Satya Nadella should hire him.

The Moving Target that is Enterprise IT infrastructures

Docker Logo

A flurry of recent Open Source Enterprise announcements, one relating to Docker – allowing Linux containers containing all their needed components to be built, distributed and then run atop Linux based servers. With this came the inference that Virtualisation was likely to get relegated to legacy application loads. Docker appears to have support right across the board – at least for Linux workloads – covering all the major public cloud vendors. I’m still unsure where that leaves the other niche that is Windows apps.

The next announcement was that of Apache Mesos, which is the software originally built by ex-Google Twitter engineers – largely to replicate the Google Borg software used to fire up multi-server workloads across Google’s internal infrastructure. This was used to good effect to manage Twitter’s internal infrastructure and to consign their “Fail Whale” to much rarer appearances. At the same time, Google open sourced a version of their software – I’ve not yet made out if it’s derived from the 10+ year old Borg or more recent Omega projects – to do likewise, albeit at smaller scale than Google achieve inhouse. The one thing that bugs me is that I can never remember its name (I’m off trying to find reference to it again – and now I return 15 minutes later!).

“Google announced Kubernetes, a lean yet powerful open-source container manager that deploys containers into a fleet of machines, provides health management and replication capabilities, and makes it easy for containers to connect to one another and the outside world. (For the curious, Kubernetes (koo-ber-nay’-tace) is Greek for “helmsman” of a ship)”.

That took some finding. Koo-ber-nay-tace. Not exactly memorable.

However, it looks like it’ll be a while before these packaging, deployment and associated management technologies get ingrained in Enterprise IT workloads. A lot of legacy systems out there are simply not architected to run on scale-out infrastructures yet, and it’s a source of wonder what the major Enterprise software vendors are running in their own labs. If indeed they have an appetite to disrupt themselves before others attempt to.

I still cringe with how one ERP system I used to use had the cost collection mechanisms running as a background batch process, and the margins of the running business went all over the place like a skidding car as orders were loaded. Particularly at end of quarter customer spend spikes, where the complexity of relational table joins had a replicated mirror copy of the transaction system consistently running 20-25 minutes behind the live system. I should probably cringe even more given there’s no obvious attempt by startups to fundamentally redesign an ERP system from the ground up using modern techniques. At least yet.

Startups appear to be much more heavily focussed on much lighter mobile based applications – of which there are a million different bets chasing VC money. Moving Enterprise IT workloads into much more cost effective (but loosely coupled) public cloud based infrastructure – and that take full advantage of its economics – is likely to take a little longer. I sometimes agonise over what change(s) would precipitate that transition – and whether that’s a monolith app, or a network of simple ones daisy chained together.

I think we need a 2014 networked version of Silicon Office or Hypercard to trigger some progress. Certainly their abject simplicity is no more, and we’re consigned to the lower level, piecemeal building bricks – like JavaScript – which is what life was like in assembler before high level languages liberated us. Some way to go.

Death of the Web Home Page. What replaces it??

Go Back You Are Going Wrong Way Sign

One of the gold nuggets on the “This week in Google” podcast this week was that some US News sites historically had 20% of their web traffic coming in through their front door home page, with the other 80% arriving from links elsewhere that landed on individual articles deep inside their site. More recently, the home page share has dropped to 10%.

If they’re anything like my site, only a small proportion of these “deep links” will come from search engine traffic (for me, search sources account for around 20% of traffic most days). Of those that do, many arrive searching for something more basic than what I have for them here. By far my most popular “accident” is my post about “Google: where did I park my car?”. This is a feature of Google Now on my Nexus 5 handset, but I guess many folks are just tapping that query into Google’s search box absolutely raw (and raw Google will be clueless – you need a handset reporting your GPS location and the fact it sensed your transition from driving to walking for this to work). My second common one is people trying to see if Tesco sell the Google Chromecast, which invariably lands on me giving a demo of Chromecast working with a Tesco Hudl tablet.

My major boosts in traffic come when someone famous spots a suitably tagged Twitter or LinkedIn article that appears topical. My biggest surge ever was when Geoffrey Moore, author of “Crossing the Chasm”, mentioned my one page PDF that summarised his whole book on LinkedIn. The second largest came when my post congratulating Apple on the security depth in their CloudKit API – a refreshing change from the sort of shenanigans seen in several UK public sector data releases – appeared on the O’Reilly Radar blog. Outside of those two, I bump along at between 50 and 200 reads per day, driven primarily by my (in)ability to tag posts on social networks well enough to get flashes of attention.

10% coming through home pages though; that haunts me a bit. Is that indicative of a sea change to single, simple task completion by a mobile app? Or that content is being littered around in small, single article chunks, much like the music industry is seeing a transition from Album Compilations to Singles? I guess one example is this week’s purchase of Songza by Google – and indeed Beats by Apple – giving both companies access to curated playlists. Medium is one literary equivalent, as is Longreads. However, I can’t imagine their existence explains the delta between searches and targeted landing directly into your web site.

So, if a home page is no longer a valid thing to have, what takes its place? Ideas or answers on a postcard (or comment here) please!

Explaining Distributed Data Consistency to IT novices? Well, …

Greek Shepherd

it’s all greek to me. Bruce Stidston cited a post on Google+ where Yonatan Zunger, Chief Architect of Google+, tried to explain Data Consistency by way of Greeks enacting laws onto statute books on disparate islands. Very long post here. It highlights the challenges of maintaining data consistency when pieces of your data are distributed over many locations, and the logistics of trying to keep them all in sync – in a way that should be understandable to the lay – albeit patient – reader.

The treatise missed out the concept of two-phase commit, which is a way of doing handshakes between two identical copies of a database to ensure a transaction gets played successfully on both the master and the replica sited elsewhere on a network. So, if you get some sort of failure mid transaction, both sides get returned to a consistent state without anything falling through the cracks. Important if that data is monetary balance transfers between bank accounts, for example.
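For the curious, here is a minimal sketch of that handshake in Python. It is an illustration only: the Participant class, the transfer function and the in-memory balances are all invented for this example, and a real database also has to cope with the coordinator itself failing between the two phases.

```python
class Participant:
    """One copy of the database (hypothetical, held in memory)."""

    def __init__(self, name, balances, fail_on_prepare=False):
        self.name = name
        self.balances = dict(balances)
        self.fail_on_prepare = fail_on_prepare  # simulates a mid-transaction fault
        self.staged = None

    def prepare(self, debit, credit, amount):
        # Phase 1: check the change is possible and stage it, without applying it.
        if self.fail_on_prepare or self.balances.get(debit, 0) < amount:
            return False
        self.staged = (debit, credit, amount)
        return True

    def commit(self):
        # Phase 2, happy path: apply the staged change.
        debit, credit, amount = self.staged
        self.balances[debit] -= amount
        self.balances[credit] = self.balances.get(credit, 0) + amount
        self.staged = None

    def rollback(self):
        # Phase 2, failure path: discard the staged change; balances are untouched.
        self.staged = None


def transfer(participants, debit, credit, amount):
    """Coordinator: commit on every copy only if every copy voted yes."""
    votes = [p.prepare(debit, credit, amount) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

If either the master or the replica says no in the first phase, neither applies the transfer, so both copies stay in step.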

The thing that impressed me most – and which I’d largely taken for granted – is how MongoDB (the most popular Open Source NoSQL Database in the world) can handle virtually all the use cases cited in the article out of the box, with no add-ons. You can specify whether a write is confirmed “happy go lucky”, once a majority of replicas are consistent, or only when all of them are. And if a definitive “Tyrant” fails, there’s an automatic vote among the surviving instances for which secondary copy becomes the new primary (and on rejoining, the changes are journaled back to consistency). And those instances can be distributed in different locations on the internet.
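A toy simulation of those two ideas, in plain Python rather than against a real MongoDB server. The replica dictionaries, the write and elect_primary functions and the "all" label are this sketch's own inventions, and the tie-break in the vote is an assumption made for illustration, not MongoDB's actual election rules.

```python
def write(replicas, key, value, w="majority"):
    """Return True once enough reachable replicas have stored the value.

    w=0 mimics "happy go lucky" (no acknowledgement is waited for),
    w="majority" needs more than half of the set, w="all" needs every member.
    """
    acks = 0
    for replica in replicas:
        if replica["up"]:
            replica["data"][key] = value
            acks += 1
    needed = {0: 0, "majority": len(replicas) // 2 + 1, "all": len(replicas)}[w]
    return acks >= needed


def elect_primary(replicas):
    """Pick a new primary only if a majority of the set is still reachable."""
    voters = [r for r in replicas if r["up"]]
    if len(voters) <= len(replicas) // 2:
        return None  # no majority, so nobody may take over
    # Assumption for this sketch: the survivor holding the most writes wins.
    return max(voters, key=lambda r: len(r["data"]))["name"]
```

With three members and one down, a majority write still succeeds and an "all" write does not; with two down, no new primary can be elected.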

Bruce contended that Google may not like its blocking mechanics (which will slow down access while data is written) to retain consistency on its own search database. However, I think Google will be very read heavy, and it won’t usually be a disaster if changes are journaled onto new Google search results to its readers. No money to go between the cracks in their case, any changes just appear the next time you enact the same search; one very big moving target.

Money going down the cracks is what Blockchains design out (a majority vote, after which any attempt to change the agreed record is declined). That’s why it typically takes around 10 minutes for a Bitcoin transaction to get verified. I wrote introductory pieces about Bitcoin and potential Blockchain applications some time back if those are of interest.
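The reason an agreed record can’t be quietly changed afterwards comes down to chained hashes, and the same trick underpins Git. A minimal sketch follows, with invented function names and none of Bitcoin’s mining or voting: each block’s hash covers the hash of the block before it, so editing an old entry breaks every link after it.

```python
import hashlib
import json


def block_hash(index, payload, previous_hash):
    """Hash a block's contents together with its predecessor's hash."""
    body = json.dumps([index, payload, previous_hash])
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link each payload to the one before it via its hash."""
    chain, previous = [], "0" * 64  # placeholder hash for the first block
    for index, payload in enumerate(payloads):
        digest = block_hash(index, payload, previous)
        chain.append({"index": index, "payload": payload,
                      "previous": previous, "hash": digest})
        previous = digest
    return chain


def is_valid(chain):
    """Recompute every hash; any edited block makes the check fail."""
    previous = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["payload"], previous)
        if block["previous"] != previous or block["hash"] != expected:
            return False
        previous = block["hash"]
    return True
```

Build a chain from a couple of payments, alter the first one, and is_valid reports the tampering.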

So, I’m sure there must be a more pithy summary someone could draw, but it would add blockchains to the discussion, and probably relate some of the artistry behind hashes and Git/Github to manage large, multiuser, multiple location code, data and writing projects. However, that’s for the IT guys. They should know this stuff, and know what to apply in any given business context.

Footnote: I’ve related MongoDB as that is the one NoSQL database I have accreditations in, having completed two excellent online courses with them (while I’m typically a senior manager, I like to dip into new technologies to understand their capabilities – and to act as a bullshit repellent!). Details of said courses here. The same functionality may well be available with other NoSQL databases.

Starting with the end in mind: IT Management Heat vs Light

One source of constant bemusement to me is the habit of intelligent people to pee in the industry market research bathwater, and then to pay handsomely to drink a hybrid mix of the result collected across their peers.

Perhaps I was betrayed by an early experience: one research company came in to present to the management of the vendor I was working at, and in the rehearsal I found their conjecture that sales of specific machine sizes had badly dipped in the preceding quarter. Except they hadn’t; we’d had the biggest growth in sales of the highlighted machines in our history in that timeframe. When I mentioned my concern, the appropriate slides were corrected in short order, and no doubt the receiving audience was impressed with the skill of an analysis that built a forecast starting with an amazingly accurate, perceptive (and otherwise publicly unreported) recent history.

I’ve been doubly nervous ever since – always relating back to the old “Deep Throat” hint given in “All the President’s Men” – that of, in every case, “follow the money”.

Earlier today, I was having some banter on one of the boards of “The Motley Fool” which referenced the ways certain institutions were imposing measures on staff – well away from any business use that positively supported better results for their customers. Well, except for providing sound bites to politicians. I can sense that in Education, in some elements of Health provision, and rather fundamentally in the Police service. I’ve even done a drains-up some time ago that reflected on the way UK Police are measured, and tried to trace the rationale back to source – which was a senior politician imploring them to reduce crime; blog post here. The subtlety of this was rather lost; the only control placed in their hands was that of compiling the associated statistics, so their behaviours on the ground aligned to support that data collection, rather than going back to core principles of why they were there, and what their customers wanted of them.

Jeff Bezos (CEO of Amazon) has the right idea; everything they do aligns with the ultimate end customer, and everything else works back from there. Competition is something to be conscious of, but only to the extent of understanding how you can serve your own customers better. Something that’s also the central model that W. Edwards Deming used to help transform Japanese Industry, and in being disciplined to methodically improve “the system” without unnecessary distractions. Distractions which are extremely apparent to anyone who’s been subjected to his “Red Beads” experiment. But the central task is always “To start with the end in mind”.

With that, I saw a post by Simon Wardley today about Gartner releasing the results of a survey on “Top 10 Challenges for I&O Leaders”, which I guess is some analogue of what used to be referred to as “CIOs”. Most of the challenges felt to me like a herd mentality – and divorced from the sort of issues I’d have expected to be present. In fact, a complete reenactment of this sort of dialogue Simon had mentioned before.

Simon then cited the first 5 things he thought they should be focussed on (around Corrective Action), leaving the remaining “Positive Action” points to be mapped based on what appeared upon that foundation. This on the assumption that those actions would likely be unique to each organisation performing the initial framing exercise.

Simon’s excellent blog post is: My list vs Gartner, shortly followed by On Capabilities. I think it’s a great read. My only regret is that, while I understand his model (I think!), I’ve not had to work on the final piece between his final strategic map (for any business I’m active in) and articulating a pithy & prioritised list of actions based on the diagram created. And I wish he’d get the bandwidth to turn his Wardley Maps into a Book.

Until then, I recommend his Bits & Pieces Blog; it’s a quality read that deserves good prominence on every IT Manager’s (and IT vendor’s!) RSS feed.

CloudKit – now that’s how to do a secure Database for users

One of the big controversies here relates to the appetite of the current UK government to release personal data with only the most basic understanding of what constitutes personally identifiable information. The lessons are there in history, but I fear that without knowing the context of the infamous AOL Data Leak, we are destined to repeat it. With it goes personal information that we typically hold close to our chests, which may otherwise cause personal, social or (in the final analysis) financial prejudice.

When plans were first announced to release NHS records to third parties, and in the absence of what I thought were appropriate controls, I sought (with a heavy heart) to opt out of sharing my medical history with any third party – and instructed my GP accordingly. I’d gladly share everything with satisfactory controls in place (medical research is really important and should be encouraged), but I felt that insufficient care was being exercised. That said, we’re more than happy for my wife’s Genome to be stored in the USA by 23andMe – a company that demonstrably satisfied our privacy concerns.

It therefore came as quite a shock to find that a report, highlighting which third parties had already been granted access to health data with Government mandated approval, ran to a total of 459 data releases to 160 organisations (last time I looked, that was 47 pages of PDF). See this and the associated PDFs on that page. Given the level of controls, I felt this was outrageous. Likewise the plans to release HMRC related personal financial data, again with soothing words from ministers who, given the NHS data implications, appear to have no empathy for the gross injustices likely to result from their actions.

The simple fact is that what constitutes individually identifiable information needs to be framed not only by what data fields are shared with a third party, but also by the resulting application of that data by the processing party. Not least if there is any suggestion that data is to be combined with other data sources, which could in turn triangulate back to make seemingly “anonymous” records traceable back to a specific individual. Which is precisely what happened in the AOL Data Leak example cited.

With that, and on a somewhat unrelated technical/programmer orientated journey, I set out to learn how Apple had architected its new CloudKit API announced this last week. This articulates the way in which applications running on your iPhone handset, iPad or Mac have a trusted way of accessing personal data stored (and synchronised between all of a user’s Apple devices) “in the Cloud”.

The central identifier that Apple associate with you, as a customer, is your Apple ID – typically an email address. In the Cloud, they give you access to two databases on their cloud infrastructure; one a public one, the other private. However, the second you try to create or access a table in either, the API accepts your iCloud identity and spits back a hash unique to your identity and the application on the iPhone asking to process that data. Different application, different hash. Everyone’s data is in there, yet this immediately rules out any triangulation of disparate data that could trace back to uniquely identify a single user.
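To make the “different application, different hash” idea concrete, here is a sketch of one way such an identifier could be derived. This is my assumption of the general technique (a keyed hash over the application and user identity), not Apple’s actual scheme; the function name, the secret and the example identifiers are all hypothetical.

```python
import hashlib
import hmac


def app_scoped_id(user_identity, app_id, server_secret):
    """Derive an opaque identifier unique to one (user, application) pair.

    server_secret is a hypothetical key held only by the service, so an
    application cannot recompute or reverse the mapping for itself.
    """
    message = f"{app_id}\x00{user_identity}".encode("utf-8")
    return hmac.new(server_secret, message, hashlib.sha256).hexdigest()
```

The same person seen through two different applications gets two unrelated identifiers, so records held by those applications can’t be joined on the user field.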

Apple take this one stage further, in that any application that asks for any personally identifiable data (like an email address, age, postcode, etc) from any table has to have access to that information specifically approved by the handset’s owner; no explicit permission (on a per application basis), no data.

Apple’s own data, which besides personal information includes health data (with HealthKit), details of home automation kit in your house (with HomeKit), and not least the credit card data stored to buy Music, Books and Apps, makes full use of this security model. And they’ve dogfooded it so that third party application providers use exactly the same model, and the same back end infrastructure. Which is also very, very inexpensive (data volumes go into Petabytes before you spend much money).

There are still some nuances I need to work through. I’m used to SQL databases and to some NoSQL database structures (I’m MongoDB certified), but it’s not clear, based on looking at the way the database works, which engine is being used behind the scenes. It appears to be a key:value store with some garbage collection mechanics that look like a hybrid file system. It also has the capability to store “subscriptions”, so if specific criteria appear in the data store, specific messages can be dispatched to the user’s devices over the network automatically. Hence things like new diary appointments in a calendar can be synced across a user’s iPhone, iPad and Mac transparently, without the need for each to waste battery power polling the large database on the server waiting for events that are likely to arrive infrequently.
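The subscription idea is easier to see in code than in words. This sketch is a hypothetical in-memory store, not the CloudKit API: a device registers its criteria once, and the store calls it back whenever a matching record is saved, so nothing has to poll.

```python
class RecordStore:
    """Hypothetical in-memory store that pushes matching saves to subscribers."""

    def __init__(self):
        self.records = []
        self.subscriptions = []  # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        # Register interest once; the device does no polling afterwards.
        self.subscriptions.append((predicate, callback))

    def save(self, record):
        self.records.append(record)
        # Push the new record to every subscriber whose criteria it meets.
        for predicate, callback in self.subscriptions:
            if predicate(record):
                callback(record)
```

A calendar app would subscribe with a predicate such as “type is appointment” and a callback that updates the local diary.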

The final piece of the puzzle I’ve not worked out yet is, if you have a large database already (say of the calories, carbs, protein, fat and weights of thousands of foods in a nutrition database), how you’d get that loaded into an instance of the public database in Apple’s Cloud. Other than writing custom loading code of course!

That apart, I’m really impressed with how Apple have designed the datastore to ensure the security of users’ personal data, and to ensure an inability to triangulate data between information stored by different applications. And that if any personally identifiable data is requested by an application, the user of the handset has to specifically authorise its disclosure for that application only. And without the app being able to sense if the data is actually present at all ahead of that release permission (so, for example, if a Health App wants to gain access to your blood sampling data, it doesn’t know if that data is even present or not before the permission is given – so the app can’t draw inferences on your probably having diabetes, which would be possible if it could deduce that you were recording glucose readings at all).
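That last property is subtle, so here is a small sketch of the behaviour as I understand it. The HealthStore class and its methods are invented for illustration and are not HealthKit’s API; the point is simply that a read without permission returns the same answer whether or not any readings exist.

```python
class HealthStore:
    """Hypothetical store: unauthorised reads look identical to empty ones."""

    def __init__(self):
        self._samples = {}     # data type -> list of readings
        self._granted = set()  # (app_id, data type) pairs approved by the user

    def record(self, data_type, value):
        self._samples.setdefault(data_type, []).append(value)

    def grant(self, app_id, data_type):
        # The user approves one data type for one application only.
        self._granted.add((app_id, data_type))

    def read(self, app_id, data_type):
        if (app_id, data_type) not in self._granted:
            return []  # same answer whether or not readings exist
        return list(self._samples.get(data_type, []))
```

Before permission is given, an app asking for glucose readings gets an empty list, exactly as it would for a user who has never recorded any.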

In summary, impressive design and a model that deserves our total respect. The more difficult job will be to get the same mindset in the folks looking to release our most personal data that we shared privately with our public sector servants. They owe us nothing less.