Future Health: DNA is one thing, but 90% of you is not you


One of my pet hates is seeing my wife visit the doctor, be offered hunches about what may be afflicting her health, and then endure a succession of “oh, that didn’t work – try this instead” visits over several weeks. I just wonder how much cost could be squeezed out of the process – and how many secondary conditions avoided – if the root causes were much easier to identify reliably. I then wonder if there is a process to achieve that, especially in the context of new sensors coming to market and their connectivity to databases via mobile phone handsets – or indeed WiFi enabled, low end Bluetooth sensor hubs such as the Apple Watch.

I’ve personally kept a record of what I’ve eaten, down to fat, protein and carb content (plus my Monday 7am weight and daily calorie intake), every day since June 2002 – a precursor to a future where devices keep track of a wide variety of health signals, feeding a trend (in conjunction with “big data” and “machine learning” analyses) toward self-service health. My Apple Watch has a year’s worth of heart rate data. But which signals, if they were available, would be far more compelling for identifying the root causes of a much wider variety of health conditions?

There is currently a lot of focus on Genetics, where the Human Genome can betray many characteristics or pre-dispositions to some health conditions that are inherited. My wife Jane got a complete 23andMe statistical assessment several years ago, and has also been tested for the BRCA2 (pronounced ‘bracca-2’) gene – a marker for inherited pre-disposition to risk of Breast Cancer – which she fortunately did not inherit from her afflicted father.

A lot of effort is underway to collect and sequence the complete Genome sequences from the DNA of hundreds of thousands of people, building them into a significant “Open Data” asset for ongoing research. One gotcha is that such data is being collected by numerous organisations around the world, and the size of each individual’s DNA (assuming one byte per base pair – an A/T or C/G combination) runs to some 3GB. You can’t do research by throwing an SQL query (let alone thousands of machine learning attempts) over that data when samples are stored in many different organisations’ databases, hence the existence of an API (courtesy of the GA4GH Data Working Group) to permit distributed queries between co-operating research organisations. It is notable that Amazon Web Services and Google employees are participating in this effort.
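
To make that distributed-query idea concrete, here is a minimal sketch of what asking several co-operating organisations the same question might look like. The endpoint path, request fields and host URLs below are illustrative assumptions rather than the actual GA4GH schema; the point is simply that the query travels to the data, rather than 3GB genomes travelling to the researcher.

```python
# Illustrative sketch only: the endpoint path, request fields and hosts are
# assumptions for the example, not the real GA4GH schema. A small query is
# fanned out to each organisation holding samples, and only matching variants
# come back.
import requests

def search_variants(base_url, reference_name, start, end, variant_set_id):
    """Ask one organisation for variants in a given genomic region."""
    payload = {
        "referenceName": reference_name,
        "start": start,
        "end": end,
        "variantSetIds": [variant_set_id],
    }
    response = requests.post(f"{base_url}/variants/search", json=payload, timeout=30)
    response.raise_for_status()
    return response.json().get("variants", [])

hosts = [
    "https://genomics.example-lab-a.org",   # hypothetical co-operating labs
    "https://genomics.example-lab-b.org",
]
all_variants = []
for host in hosts:
    all_variants.extend(search_variants(host, "13", 32_315_000, 32_400_000, "demo-set"))
print(f"{len(all_variants)} variants returned across {len(hosts)} organisations")
```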

However, I wonder if we’re missing a big and potentially just as important data asset; that of the profile of bacteria that everyone is dependent on. We are each home to approx. 10 trillion human cells among the 100 trillion microbial cells in and on our own bodies; you are 90% not you.

While our human DNA is 99.9% identical to that of the person next to us, the profile of our MicroBiome is typically only 10% similar; our age, diet, genetics, physiology and use of antibiotics are all heavy influencing factors. Our DNA is our blueprint; the profile of the bacteria we carry is an ever-changing set of weather conditions that either influence our health, or are leading indicators of something being wrong – or both. Far from being inert passengers, these little organisms play essential roles in the most fundamental processes of our lives, including digestion, immune responses and even behaviour.

Different MicroBiome ecosystems are present in different areas of our body, from our skin, mouth, stomach and intestines to our genitals; most promise is currently derived from the analysis of stool samples. Further, our gut is second only to our brain in the number of nerve endings present, many of them able to enact activity independently of decisions taken upstairs. In other areas, there are very active hotlines between the two nerve cities.

Research is emerging that suggests previously unknown links between our microbes and numerous diseases, including obesity, arthritis, autism, depression and a litany of auto-immune conditions. Everyone knows someone who eats like a horse but stays rake thin; the composition of microbes in their gut is a significant factor.

Meanwhile, the cost of DNA sequencing and the compute power to process it have dropped to the point where analysing our microbial ecosystems has fallen from around $100M a decade ago to some $100 today. It should continue on that downward path to a level where regular personal sampling could become available to all – if the needed sequencing equipment plus compute resources were more accessible and total turnaround times much shorter. Not least to provide a rich Open Data corpus of samples that we can use for research purposes (and to feed discoveries back to the folks providing samples). So, what’s stopping us?

Data Corpus for Research Projects

To date, significant resources are being expended on Human DNA Genetics and comparatively little on MicroBiome ecosystems; the largest research projects are custom built and have sampling populations of fewer than 4,000 individuals. This results in insufficient population sizes and sampling frequency on which to easily and quickly conduct wholesale analyses – to understand the components of health afflictions, track changes to the mix over time and isolate root causes.

There are open data efforts underway with the American Gut Project (based out of the Knight Lab at the University of California, San Diego) plus a feeder “British Gut Project” (involving Tim Spector and staff at King’s College London). The main gotcha is that the service is one-shot and takes several months to turn around. My own sample, submitted in January, may take up to 6 months to work through their sequencing and compute batch process.

In parallel, VC funded company uBiome provide the sampling with a 6-8 week turnaround (at least for the gut samples; slower for the other 4 area samples we’ve submitted), though they are currently not sharing the captured data to the best of my knowledge. That said, the analysis gives an indication of the names, types and quantities of bacteria present (with a league table of those over and under represented compared to all samples they’ve received to date), but does not currently communicate any health related findings.

My own uBiome measures suggest my gut ecosystem is more diverse than 83% of the folks they’ve sampled to date, which is a proxy for being healthier than most; the bacteria that are over-represented – one up to 67x more than is usual – are of the type that orally administered probiotics attempt to get to your gut. So a life of avoiding antibiotics whenever possible appears to have helped me.

However, the gut ecosystem can flex quite dramatically. As an example, see what happened when one person contracted Salmonella (the green at the top of this picture; the x-axis is days); you can see an aggressive killing spree in which around 30% of the gut bacteria population is displaced, followed by a gradual fight back to normality:

Salmonella affecting MicroBiome Population

Under usual circumstances, the US/UK Gut Projects and indeed uBiome take a single measure and report back many weeks later. The only extra feature that may be deduced is the delta between the counts of genome start and end sequences, as this gives an indication of relative species population growth rates from otherwise static data.
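
The reasoning behind that start/end delta can be sketched in a few lines. This is purely a toy illustration of the principle – the coverage numbers below are invented, and a real analysis needs careful normalisation – but it shows why a single static sample can still hint at growth: a rapidly dividing species has more copies of the origin end of its genome in flight than of the terminus end.

```python
# Toy illustration: bacteria replicate their chromosome from an origin towards
# a terminus, so a rapidly dividing species shows deeper read coverage near the
# start of its genome than near the end. The coverage arrays are invented.

def origin_terminus_ratio(coverage, window=50):
    """Mean coverage over the first `window` positions divided by the last."""
    origin = sum(coverage[:window]) / window
    terminus = sum(coverage[-window:]) / window
    return origin / terminus

fast_grower = [120] * 50 + [90] * 400 + [60] * 50   # deep at origin, shallow at terminus
steady_state = [70] * 500                            # flat coverage: little net growth

print(origin_terminus_ratio(fast_grower))    # ~2.0 -> actively dividing
print(origin_terminus_ratio(steady_state))   # 1.0 -> roughly static population
```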

I am not aware of anyone offering a faster turnaround service, nor one that can map several successive, time-gapped samples, let alone one that can convey health afflictions deduced from the mix – or indeed from progressive weather patterns – based on the profile of bacteria populations found.

My questions include:

  1. Is there demand for a fast turnaround, wholesale profile of a bacterial population to assist medical professionals in isolating the indicators – or the root cause – of ill health with impressive accuracy?
  2. How useful would a large corpus of bacterial “open data” be to research teams, to support their own analysis hunches and indeed to provide enough data to make use of machine learning inferences? Could we routinely take samples donated by patients or hospitals to incorporate into this research corpus? Do we need the extensive questionnaires that the various Gut Projects and uBiome issue to be completed alongside every sample?
  3. What are the steps in the analysis pipeline that are slowing the end to end process? Does an increased sample size (beyond a small stain on a cotton bud) remove the need to enhance/copy the sample, with its associated need for nitrogen-based lab environments (many types of gut bacteria are happy as Larry in a nitrogen environment, but perish on exposure to oxygen)?
  4. Is there any work active to make the QIIME (pronounced “Chime”) pattern matching code take advantage of cloud spot instances, including Hadoop or Spark, to speed the turnaround time from sequencing reads to the resulting species type:volume value pairs?
  5. What’s the most effective delivery mechanism for providing “Open Data” exposure to researchers, while retaining the privacy (protection from financial or reputational prejudice) for those providing samples?
  6. How do we feed research discoveries back (in English) to the folks who’ve provided samples and their associated medical professionals?

New Generation Sequencing works by splitting DNA/RNA strands into relatively short read lengths, which then need to be reassembled against known patterns. Taking a poop sample which contains thousands of different bacteria is akin to throwing the pieces of many thousands of puzzles into one pile and then having to reconstruct each of them – and count the number of each. As an illustration, a single HiSeq run may generate up to 6 x 10^9 sequences; these then need reassembling and the count of 16S rDNA type:quantity value pairs deduced. I’ve seen estimates of six thousand CPU hours to do the associated analysis and end up with statistically valid type and count pairs. This is a possible use case for otherwise unused spot instance capacity at the large cloud vendors, if the data volumes could be ingested and processed cost effectively.
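
To give a feel for the shape of that job, below is a heavily simplified sketch of the final “classify each read, then tally per species” stage, using Python’s multiprocessing as a stand-in for a fleet of spot instances. The classifier is a stub – in reality that step is the expensive alignment of each read against 16S reference databases inside tools like QIIME – but the fan-out-and-merge structure is what makes the workload a plausible fit for cheap, interruptible compute.

```python
# Minimal sketch of the counting stage only. classify_read() is a stub: the
# real cost is aligning each read against a 16S reference database, which is
# where the thousands of CPU hours go.
from collections import Counter
from multiprocessing import Pool

REFERENCE = {"ACGT": "Bacteroides", "TTGA": "Faecalibacterium"}  # toy "database"

def classify_read(read):
    """Map one short read to a species name, or None if unclassified."""
    return REFERENCE.get(read[:4])  # stub: real classifiers align the whole read

def count_species(reads, workers=4):
    """Fan reads out across worker processes, then merge the per-species tallies."""
    with Pool(workers) as pool:
        labels = pool.map(classify_read, reads, chunksize=10_000)
    return Counter(label for label in labels if label)

if __name__ == "__main__":
    sample_reads = ["ACGTGGCA", "TTGACCTA", "ACGTTTAA", "GGGGCCCC"]
    print(count_species(sample_reads))  # Counter({'Bacteroides': 2, 'Faecalibacterium': 1})
```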

Nanopore sequencing is another route, which has much longer read lengths but is much more error prone (1% for NGS, typically up to 30% for portable Nanopore devices), which probably limits their utility for analysing bacteria samples in our use case. They are much more useful if you’re testing for particular types of RNA or DNA, rather than the wholesale profiling exercise we need. Hence, for the time being, we’re reliant on trying to make an industrial scale, lab based batch process turn around data as fast as we are able – while having a network accessible data corpus and research findings feedback process in place if and when sampling technology gets to be low cost and distributed to the point of use.

The elephant in the room is working out how to fund the build of the service, to map its likely cost profile as technology/process improvements feed through, and to know to what extent its diagnosis of health root causes will improve its commercial attractiveness as a paid service over time. That is what I’m trying to assess while on the bench between work contracts.

Other approaches

Nature has its way of providing short cuts. Dogs have been trained to be amazingly prescient at assessing whether someone has Parkinson’s just by smelling their skin. There are other techniques where a pocket sized spectrometer can assess the existence of 23 specific health disorders. There may well be other techniques coming to market that don’t require a thorough picture of a bacterial population profile to give medical professionals the identity of the root causes of someone’s ill health. That said, a thorough analysis may at least be of utility to the research community, even if we only get to eliminate ever rarer edge cases as we go.

Coming full circle

One thing that’s become eerily apparent is the common terminology between MicroBiome conditions and terms I once heard used in Chinese Herbal Medicine (my wife’s psoriasis was cured after seeing a practitioner in Newbury for several weeks nearly 20 years ago): the concept of “balance” and the existence of “heat” (betraying the inflammation as your bacterial populations of different species ebb and flow in reaction to different conditions), followed by consumption or application of specific plant matter that puts the body’s bacterial population back to its operating norms.

Lingzhi Mushroom

Wild mushroom “Lingzhi” in China: cultivated in the far east, found to reduce Obesity

We’ve started to discover that some of the plants and herbs used in Chinese Medicine do have measurable effects on the bacterial populations associated with the conditions they are reckoned to help cure. With that, we are starting to see some statistically valid evidence that Chinese and Western medicine may well meet in the future, and become part of the same process in our health management.

Until then, still work to do on the business plan.

Hey (you): Keep it short, use a name, profit

Bang!

Seeing various bits and bobs about writing better emails today (or rather, getting attention for your words among the surroundings of a typical email inbox). One from KissMetrics advises keeping subject lines to 35 characters (which means the full text of the subject fits on an Apple iPhone screen) and starting off with the recipient’s name. More (in multiple subjects squeezed together and a few sample short bits of email subject line click bait) here.

I had an account manager from AWS ask me where the follow facility was on my blog, and I’ve realised there is no easy link – so I’m currently building one using the very impressive Mailchimp. This has an associated WordPress Plugin which appears to have many 5* reviews and quick answers to the vast majority of support posts. So, a small project to finish this weekend, with a side use to boot: shooting a very short message to some of my LinkedIn contacts using the same facility.

That apart, I’ve done my share of reading to try to improve my own writing. All the way from “Write Like the Pros: Using the Secrets of Ad Writers and Journalists in Business” by Mark Bacon, to revising from “How to Write Sales Letters that Sell” by Drayton Bird. Even to buying and listening to the three videos in Drayton Bird’s “How to Write (and Persuade) Better“. I hope it shows!

In the meantime, I notice Drayton’s off on a rant about a menu he’s been subjected to at a Restaurant up the Shard today. Typical Drayton, though he’s got a lot milder since he phoned up the CEO’s office at Thus (ScottishTelecom as was, before being subsumed into Cable & Wireless, then Vodafone). I wouldn’t dare repeat what he said, but it caused some immediate impact, and he got some business out of being so explicit at the time. I just cringed.

Officially Certified: AWS Business Professional

AWS Business Professional Certification

That’s added another badge, albeit the primary reason was to understand AWS’s products and services in order to suss out how to build volumes for them via resellers – just in case I get the opportunity to be asked how I’d do it. However, looking over the fence at some of the technical accreditation exams, I appear to know around half of the answers already – but I need to work through the material properly and take notes before attempting them.

(One of my old party tricks used to be that I could make it past the entrance exam required for entry into the technical streams at Linux related conferences – a rare thing for a senior manager running large Software Business Operations or Product Marketing teams. Being an ex-programmer who occasionally fiddles under the bonnet of modern development tools is a useful thing – not least to feed an ability to spot bullshit from quite a distance).

The only AWS module I had any difficulty with was the pricing. One of the things most managers value is simplicity and predictability, but a lot of the core services have pricing dependencies where you need to know data sizes, I/O rates or the way your demand peaks and troughs in order to arrive at an approximate monthly price. While most of the case studies amply demonstrate that you do make significant savings compared to running workloads on your own in-house infrastructure, I suspect typical values for common use cases would be useful. For example, if I’m running a SAP installation of specific data and access dimensions, what are the typical operational running costs – without needing to insert probes all over a running example to estimate them using the provided calculator?
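
To illustrate why the estimate is fiddly, even a back-of-envelope model needs at least instance hours, storage and data transfer out as inputs; the rates below are placeholder numbers to show the shape of the calculation, not current AWS list prices.

```python
# Back-of-envelope monthly estimate. The rates are placeholders to show the
# shape of the sum, not real AWS prices; a real bill also depends on request
# counts, IOPS, snapshots, support tier and region.
HOURS_PER_MONTH = 730

def estimate_monthly_cost(instance_rate_per_hour, instance_count,
                          storage_gb, storage_rate_per_gb,
                          transfer_out_gb, transfer_rate_per_gb):
    compute = instance_rate_per_hour * instance_count * HOURS_PER_MONTH
    storage = storage_gb * storage_rate_per_gb
    transfer = transfer_out_gb * transfer_rate_per_gb
    return round(compute + storage + transfer, 2)

# e.g. two mid-sized instances, 500GB of block storage, 1TB out to the internet
print(estimate_monthly_cost(0.10, 2, 500, 0.10, 1000, 0.09))  # 286.0
```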

I’d come back from a 7am gym session fairly tired and made the mistake of stepping through the pricing slides without making copious notes. I duly did all that module again and did things properly the next time around – and passed it to complete my certification.

The lego bricks you snap together to design an application infrastructure are simple in principle and loosely connected, and what Amazon have built is very impressive. The only thing not provided out of the box is the sort of simple developer bundle of an EC2 instance, some S3 and a MySQL based database, plus some open source AMIs preconfigured to run WordPress, Joomla, Node.js, LAMP or similar – with a simple weekly automatic backup. That’s what Digital Ocean provide for a virtual machine instance, with specific storage and high Internet Transfer Out limits for a fixed price/month. In the case of the WordPress network on which my customers and this blog run, that’s a 2-CPU server instance, 40GB of disk space and 4TB/month data traffic for $20/month all in. That sort of simplicity is why many startup developers have done an exit stage left from Rackspace and their ilk, and moved to Digital Ocean in their thousands; it’s predictable and good enough as an experimental sandpit.

The ceiling at AWS is much higher when the application slips into production – which is probably reason enough to put the development work there in the first place.

I have deployed an Amazon Workspace to complete my 12 years of Nutrition Data Analytics work using the Windows-only Tableau Desktop Professional – in an environment where I have no Windows PCs available to me. I’ve just used it on my MacBook Air and on my iPad Mini to good effect. That will cost me just north of £21 ($35) for the month.

I think there’s a lot that can be done to accelerate adoption rates of AWS services in Enterprise IT shops, both in terms of direct engagement and with channels to market properly engaged. My real challenge is getting air time with anyone to show them how – and in the interim, getting some examples ready in case I can make it in to do so.

That said, I recommend the AWS training to anyone. There is some training made available the other side of applying to be a member of the Amazon Partner Network, but there are equally some great technical courses that anyone can take online. See http://aws.amazon.com/training/ for further details.

Coding for Young Kids: two weeks, only £10,000 to go

ScratchJr Screenshot

ScratchJr Logo

I’m one backer on Kickstarter of a project to bring the programming language Scratch to 5-7 year olds. Called ScratchJr, it’s already showing great promise on iPads in schools in Massachusetts. The project has already surpassed its original $25,000 goal to finish its development for the iPad, and last week made it over the $55,000 stretch goal to release an Android version too. With two weeks to go, we are some $15,000 short of the last remaining stretch target ($80,000) needed to fund the associated curriculum and teaching notes.

The one danger of tablets is that they tend to be used for “lean back” applications, primarily as media consumption devices. Hence a fear amongst some teachers that we’re facing a “Disneyfication” of use – almost like teaching people to read, but not to write. ScratchJr will give young students their first exposure to the joy of programming; not only useful for a future in IT, but also providing logic and design skills useful for many other fields that may stimulate their interest. I thought the 7-year old kids in this video were brilliant and authoritative on what they’d achieved to date:

I opted to pledge $45 to contribute and to buy a branded project t-shirt for my 2 year old granddaughter; there are a range of other funding options:

  • $5 for an email from the ScratchJr Cat
  • $10 for your name in the credits
  • $20 for a ScratchJr Colouring Book
  • $35 for some ScratchJr Stickers
  • $40 (+$5 for outside USA delivery) ScratchJr T-Shirt (Kid or Adult sizes)
  • $50 for an invite to a post launch webinar
  • $100 for a pre-launch webinar with the 2 project leaders
  • $300 to receive a beta version ahead of the public launch
  • $500 for a post-launch workshop in the Boston, Mass area
  • $1000+ for a pre-launch workshop in the Boston, Mass area
  • $2000+ to be named as a Platinum Sponsor in the Credits
  • $5000+ for lunch for up to 4 people with the 2 Project Leaders

I once had a project earlier in my career where we managed to get branded teaching materials (about “The Internet”) professionally produced and used in over 95% of UK secondary schools for an investment of £50,000 – plus a further £10,000 to pay for individual and school prizes. In that context, the price of this program is an absolute steal, and I feel it is well worth every penny. Being able to use this across the full spectrum of Primary Schools in the UK would be fantastic if teachers here could take full advantage of this great work.

So, why not join the backers? Deadline for pledges is 30th April, so please be quick! If you’d like to do so, contributions can be pledged here on Kickstarter.

ScratchJr Logo

Footnote: a TED video that covers Project Leader Mitch Resnick’s earlier work on Scratch (taught to slightly older kids) lasts 15 minutes and can be found here. Scratch is also available for the Raspberry Pi; for a 10 minute course on how to code in it, I’d recommend this from Miles Berry of Roehampton University.

Gute Fahrt – 3 simple tips to make your drivers safer

Gute Fahrt - Safe Journey

That’s German for “Safe Journey”. Not directly related to computers or the IT industry, but a symptom of the sort of characters it attracts to the Sales ranks. A population of relatively young drivers of fairly expensive and quite powerful cars. In our case, one female manager in Welwyn who took it as a personal affront to be overtaken in her BMW. Another Salesman in Bristol, driving his boss to a meeting in Devon in his new Ford Capri 2.8 Injection; the mistake was to leave very late and to be told by his Manager to “step on it”. I think he’s still trying to remove the stain from the passenger seat.

With that came the rather inevitable bad accident statistics, not least as statistics suggest that 90% of drivers think they are better than average. As a result, every driver in the company got put on a mandatory one day course, an attempt to stem that tide. The first thing that surprised me was that the whole one day course was spent in a classroom, and not a single minute driving a car. But the end result of attending that one class was very compelling.

As with the business change example given previously (in http://www.ianwaring.com/2014/03/24/jean-louis-gassee-priorities-targets-and-aims/), there were only three priorities that everyone had to follow to enact major changes – and to lower the accident rate considerably. Even my wife noticed subtle changes the very next time she rode in our car with me (a four hour family trip to Cornwall).

The three fundamentals were:

  1. Stay at least 2 seconds behind the car in front, independent of your speed. Just pick any fixed roadside object that the car in front goes past, and recite “only a fool breaks the two second rule”. As long as you haven’t passed the same object by the time you’ve finished reciting that in your mind, you’re in good shape. In rain, make that 4 seconds.
  2. If you’re stationary and waiting to turn right in the UK (or turning left in countries that drive on the right hand side of the road), keep the front wheels of your car facing directly forward. Resist all urges to point the wheels toward the road you’re turning into. A big cause of accidents is being rear-ended by the car behind you. If your front wheels are straight, you will just roll straight down the road; if they are turned, you’ll most likely find yourself colliding head on with fast oncoming traffic.
  3. Chill. Totally. Keep well away from other drivers who are behaving aggressively or taking unnecessary risks. Let them pass, keep out of the way and let them have their own accidents without your involvement. If you feel aggrieved, do not give chase; it’s an unnecessary risk to you, and if you catch them, you’ll only likely embarrass yourself and others. And never, ever seek solace in being able to prove blame; if you get to the stage where you’re trying to argue whose fault it is, you’ve lost already. Avoid having the accident in the first place.

There were supplementary videos to prove the points, including the customary “spot the lorry driver’s cab after the one at the back ran into another in front”. But the points themselves were easy to remember. After the initial running of the course in the branch office with the worst accident statistics, they found:

  • The accident rate effectively went to zero in the first three months since that course was run
  • The number of “unattended” accidents – such as those alleged in car parks when the driver was not present – also dropped like a stone. Someone telling porkie pies before!
  • As a result, overall costs reduced at the same time as staff could spend more face time with customers

That got replicated right across the company. If in doubt, try it. I bet everyone else who rides with you will notice – and feel more relaxed and comfortable by you doing so.

Focus on End Users: a flash of the bleeding obvious

Lightbulb

I’ve been re-reading Terry Leahy’s “Management in 10 Words”; Sir Terry was the CEO of Tesco until recently. I think the piece in the book introduction relating to sitting in front of some Government officials was quite funny – if it weren’t a blinding dose of the obvious that most IT organisations miss:

He was asked “What was it that turned Tesco from being a struggling supermarket, number three retail chain in the UK, into the third largest retailer in the World?”. He said: “It’s quite simple. We focussed on delivering for customers. We set ourselves some simple aims, and some basic values to live by. And we then created a process to achieve them, making sure that everyone knew what they were responsible for”.

Silence. Polite coughing. Someone poured out some water. More silence. “Was that it?” an official finally asked. And the answer to that was ‘yes’.

The book is a good read and one we can all learn from. Not least as many vendors in the IT and associated services industry are going in exactly the opposite direction to the one he took.

I was listening to a discussion contrasting the different business models of Google, Facebook, Microsoft and Apple a few days back. The piece I hadn’t rationalised before is that of this list, only Apple have a sole focus on the end user of their products. Google and Facebook’s current revenue streams are in monetising purchase intents to advertisers, while trying to not dissuade end users from feeding them the attention and activity/interest/location signals to feed their business engines. Microsoft’s business volumes are heavily skewed towards selling software to Enterprise IT departments, and not the end users of their products.

One side effect of this is an insatiable need to focus on competition rather than on the user of your products or services. In times of old, it became something of a relentless joke that no marketing plan would be complete without the customary “IBM”, “HP” or “Sun” attack campaign in play. And they all did it to each other. You could ask where the users’ needs made it into these efforts, but of the many I saw, I don’t remember a single one featuring them at all. Every IT vendor was playing “follow the leader” (and ignoring the cliffs they might drive over while doing so), when all focus should have been on their customers instead.

The first object lesson I had was with the original IBM PC. One of the biggest assets IBM had was the late Philip “Don” Estridge, who went into the job running IBM’s first foray into selling PCs having had personal experience of running an Apple ][ personal computer at home. The rest of the industry was an outgrowth of a hobbyist movement trying to sell to businesses, and business owners craved “sorting their business problems” simply and without unnecessary surprises. Their use of Charlie Chaplin ads in their early years was a masterstroke. As an example, spot the competitive knockoff in this:

There isn’t one! It’s a focus on the needs of any overworked small business owner, where the precious assets are time and business survival. Trading blows trying to sell one computer over another is completely missing.

I still see this everywhere. I’m a subscriber to “Seeking Alpha“, which has a collection of both buy-side and sell-side analysts commentating on the shares of companies I’ve chosen to watch. More often than not, it’s a bit like sitting in an umpire’s chair during a tennis match; lots of noise, lots of to-and-fro, discussions on each move and never far away from comparing companies against each other.

One of the most prescient things I’ve heard a technology CEO say came from Steve Jobs, when he told an audience in 1997 that “We have to get away from the notion that for Apple to win, Microsoft have to lose”. Certainly, from the time the first iPhone shipped onwards, Apple have had a relentless focus on the end user of their products.

Enterprise IT is still driven largely by vendor inspired fads and with little reference to end user results (one silly data point I carry in my head is waiting to hear someone at a Big Data conference mention a compelling business impact of one of their Hadoop deployments that isn’t related to log file or Twitter sentiment analyses. I’ve seen the same software vendor platform folks float into Big Data conferences for around 3 years now, and have not heard one yet).

One of the best courses I ever went on was given to us by Citrix, specifically on selling to CxO/board level in large organisations. A lot of it is being able to relate small snippets of things you discover around the industry (or in other industries) that may help influence their business success. One example that I unashamedly stole from Martin Clarkson was that of a new Tesco store in South Korea that he once showed to me:

I passed this on to the team in my last company that sold to big retailers. At least four board level teams in large UK retailers got to see that video and to agonise over whether they could replicate Tesco’s work in their own local operations. And I dare say the salespeople bringing it to their attention gained a good reputation for delivering interesting ideas that may help their client organisations’ future. That’s a great position to be in.

With that, I’ve come full circle back to Tesco. Consultative Selling is a good thing to do, and folks like IBM are complete masters at it; if you’re ever in an IBM facility, be sure to steal one of their current “Institute for Business Value” booklets (or visit their associated group on LinkedIn). They are normally brim full of surveys and ideas to stimulate the thought processes of the most senior people running businesses.

We’d do a better job in the IT industry if we could replicate that focus on our end users from top to bottom – and not to spend time elbowing competitors instead. In the meantime, I suspect those rare places that do focus on end users will continue to reap a disproportionate share of the future business out there.

12 years, Google Fusion Tables then Gold Nuggets

Making Sense of Data Course Logo

I’ve had a secret project going since June 2002, entering every component and portion size of my food intake – and exercise – religiously into the web site www.weightlossresources.co.uk. Hence when Google decided to run an online course on “Making Sense of Data”, I asked Rebecca Walton at the company if she would be able to get a daily intake summary for me in machine readable form: Calorie Intake, Carbs, Protein and Fat weights in grams, plus Exercise calories, for every day since I started. Some 3,500 days’ worth of data. She emailed the spreadsheet to me less than two hours later – brilliant service.

WLR Food Diary

Over that time, I’ve also religiously weighed myself almost every Monday morning, and entered that into the site too. I managed to scrape those readings off the site and, after a few hours’ work, combined all the data into a single Google Spreadsheet; that’s a free product these days, and has come on in leaps and bounds in the last year (I’d not used Excel in anger at all since late 2012).

Google Spreadsheets Example Sheet - Ian's Weight Loss Stats

With that, I then used the data for the final project of the course, loading the data into Google’s new Fusion Tables Analytics tool on which the course was based.

I’m currently in a 12 week competition at my local gym, based on a course of personal training and bi-weekly progress measures on a Boditrax machine – effectively a special set of bathroom scales that can shoot electrical signals up one foot and down the other to indicate your fat, muscle and water content. The one thing I’ve found strange is that a lot of the work I’m given is on weights, resulting in a muscle build up and a drop in fat – but at the same time, minimal weight loss. I’m usually reminded that muscle weighs much more than fat; my trainer tells me that the muscle will up my metabolism and contribute to more effective weight loss in future weeks.

Nevertheless, I wanted to analyse all my data and see if I could draw any historical inferences from it that could assist my mission to lose weight this side of the end of the competition (at the end of April). My main questions were:

  1. Is my weekly weight loss directly proportional to the number of calories I consume?
  2. Does the level of exercise I undertake likewise have a direct effect on my weight loss?
  3. Are there any other (nutritional) factors that directly influence my weekly weight loss?

Using the techniques taught in this course, I managed to work out answers to these. I ended up throwing scatter plots like this:

Ian Intake vs Weight Change Scatter Plot

Looking at it, you could infer there was a trend. Sticking a ruler on it sort of suggests that I should be keeping my nett calorie consumption around the 2,300 mark to achieve a 2lb/week loss, which is some 200 calories under what I’d been running at with the www.weightlossresources.co.uk site. So, one change to make.

Unlike Tableau Desktop Professional, the current iteration of Google Fusion Tables can’t throw a straight trend line through a scatter chart. You instead have to do a bit of a hop, skip and jump in the spreadsheet you feed in first, using the Google Spreadsheet trend() function – and then you end up with something that looks like this:

Nett Calorie Intake vs Weight Change Chart

The main gotcha there is that every data element in the source data has to be used to draw the trend line. In my case, there were some days when I’d recorded my breakfast food intake and then been maxed out with work all day – creating some outliers I needed to filter out before throwing the trend line. Having the outliers present made the line much shallower than it should have been. Hence one enhancement request for Fusion Tables – please add a “draw a trend line” option that I can invoke to draw a straight line through the data after filtering out the unwanted points. That said, the ability of Fusion Tables to draw data using Maps is fantastic – just not applicable in this, my first, use case.
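
Outside Fusion Tables, that filter-then-fit step is only a few lines. Here is a minimal sketch using numpy, with invented intake/weight-change pairs standing in for my real diary data, which drops the half-logged days before fitting the straight line:

```python
# Minimal sketch of "filter outliers, then fit a straight line". The
# intake/weight-change pairs are invented stand-ins for the real diary data.
import numpy as np

nett_calories = np.array([2100, 2250, 2400, 2600, 2800, 900])   # 900 = half-logged day
weight_change_lbs = np.array([-2.5, -2.0, -1.5, -0.8, -0.2, -0.1])

# Drop implausibly low intake days (breakfast logged, then maxed out with work).
keep = nett_calories > 1200
slope, intercept = np.polyfit(nett_calories[keep], weight_change_lbs[keep], 1)

# Intake implied by the fitted line for a 2lb/week loss.
target_intake = (-2.0 - intercept) / slope
print(f"trend: change = {slope:.4f} * intake + {intercept:.1f}")
print(f"intake for a 2lb/week loss is roughly {target_intake:.0f} kcal")
```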

Some kinks, but it is a fantastic, easy to use analytics tool – and available as a free add-on to anyone using Google Drive. But the real kudos has to go to Google Spreadsheets; it’s come on in leaps and bounds and I no longer routinely need Excel at all – indeed it already does a lot more. It simply rocks.

The end results of the exercise were:

  1. I need to drop my daily nett calorie intake from 2,511 to 2,300 or so to maintain a 2lb/week loss.
  2. Exercise cals by themselves do not directly influence weight loss performance; there is no direct correlation here at all.
  3. Protein and Fat intake from food have no discernible effect on changes to my weight. However, the level of Carbs I consume has a very material effect; fewer carbs really help. Reducing the proportion of my carbs intake from the recommended 50% (vs Protein at 20% and Fat at 30%) has a direct correlation to more consistent 2lbs/week losses.

One other learning (from reverse engineering the pie charts on the www.weightlossresources.co.uk web site) was that 1g of carbs contains approx 3.75 cals, 1g of Protein maps to 4.0 cals, and 1g of fat to 9.0 cals – hence why the 30% of a balanced diet attributable to fat consumption is, at face value, quite high.
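
Converting those per-gram figures into grams for a notional 2,300-calorie day (the target suggested by the trend analysis above) shows why the 30% fat share looks generous in calories but modest in grams – a quick worked sketch:

```python
# Convert a 50/20/30 carbs/protein/fat calorie split into grams, using the
# per-gram figures reverse engineered from the site's pie charts. The 2,300
# daily total is the notional target from the trend analysis above.
CALS_PER_GRAM = {"carbs": 3.75, "protein": 4.0, "fat": 9.0}
SPLIT = {"carbs": 0.50, "protein": 0.20, "fat": 0.30}

daily_calories = 2300
for nutrient, share in SPLIT.items():
    kcal = daily_calories * share
    grams = kcal / CALS_PER_GRAM[nutrient]
    print(f"{nutrient:>8}: {kcal:5.0f} kcal = {grams:4.0f} g")
# fat's 30% of calories works out at only ~77g, because fat is so calorie dense
```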

And then I got certified:

Google Making Sense of Data Course Completion Certificate

So, job done. One more little exercise to test a theory that dieting one week most often gives the most solid results a week or more later, but that can wait for another day (it has no material effect if I’m being good every week!). Overall, I’m happy that I can use Google’s tools to do ad-hoc data analysis whenever useful in the future. And a big thank you to Rebecca Walton and her staff at www.weightlossresources.co.uk, and to Amit, Max and the rest of the staff at Google for an excellent course.

Now, back to learning the structure and nuances of Amazon and Google public Cloud services – a completely different personal simplification project.

-ends-

Footnote: If you ever need to throw a trend line in Google Spreadsheets – at least until that one missing capability makes it into the core product – the process using a simplified sheet is as follows:

Trend Line through Scatter Plot Step 1

Scatter plot initially looks like this:

Trend Line through Scatter Plot Step 2

Add an “=trend()” function to the top empty cell only:

Trend Line through Scatter Plot Step 3

That then writes all the trend line y positions for all x co-ordinates, right down all entries in one take:

Trend Line through Scatter Plot Step 4

which then, when replotted, looks like this. The red dots represent the trend line:

Trend Line through Scatter Plot Step 5

Done!

“Big Data” is really (not so big) Data-based story telling

Aircraft Cockpit

I’m me. My key skill is splicing together data from disparate sources into a compelling, graphical and actionable story that prioritises the way(s) to improve a business. When can I start? Eh, Hello, is anyone there??

One characteristic of the IT industry is its penchant for picking snappy sounding themes, usually illustrative of a future perceived need that their customers may wish to aspire to – and to keep buying stuff toward that destination. Two of the terms de rigueur at the moment are “Big Data” and “Analytics”. They are attached to many (vendor) job adverts and (vendor) materials, though many are still searching for the first green shoots of demand from most commercial organisations. Or at least taking a leap of faith that their technology will smooth the path to a future quantifiable outcome.

I’m sure there will be applications aplenty in the future. There are plenty of use cases where sensors will start dribbling out what becomes a tidal wave of raw information, be it on you personally, in your mobile handset, in low energy Bluetooth beacons, or indeed plugged into the “On Board Diagnostics Bus” in your car – and aggregated up from there. Or in the rare case that a company has enough data locked down in one place to get some useful insights already, and has the IT hardware to crack the nut.

I often see desired needs for “Hadoop”, but know of few companies who have the hardware to run it, let alone the Java software smarts to MapReduce anything effectively against a business problem with it. If you do press a vendor, you often end up with a use case of “Twitter sentiment analysis” (which, for most B2B and B2C companies, covers a small single digit percentage of their customers), or of consolidating and analysing machine generated log files (which is what Splunk does, out of the box).

Historically, the real problem is data sitting in silos and an inability (for a largely non-IT literate user) to do efficient cross tabulations to eke a useful story out. Where they can, the normal result is locking in on a small number of priorities to make a fundamental difference to a business. Fortunately for me, that’s a thread that runs through a lot of the work I’ve done down the years. Usually in an environment where all hell is breaking loose, where everyone is working long hours, and high priority CEO or Customer initiated “fire drill” interruptions are legion. Excel, Text, SQLserver, MySQL or MongoDB resident data – no problem here. A few samples, mostly done using Tableau Desktop Professional:

  1. Mixing a year’s worth of Complex Quotes data with a Customer Sales database. Finding that one Sales Region was consuming 60% of the team’s Cisco Configuration resources, while at the same time selling 10% of the associated products. Digging deeper, finding that one customer was routinely asking our experts to configure their needs, but their purchasing department was buying all the products elsewhere. The Account Manager was duly equipped to have a discussion and initiate corrective actions. Whichever way that went, we made more money and/or gained better efficiency.
  2. Joining data from Sales Transactions and from Accounts Receivable Query logs, producing daily updated graphs on Daily Sales Outstanding (DSO) debt for each sales region, by customer, by vendor product, and by invoices in priority order. The target was to reduce DSO from over 60 days to 30; each Internal Sales Manager had the data at their fingertips to prioritise their daily actions for maximum reduction – and to know when key potential icebergs were floating towards key due dates. Along the way, we also identified one customer who had instituted a policy of querying every single invoice, raising our cost to serve and extending DSO artificially. Again, Account Manager equipped to address this.
  3. I was given the Microsoft Business to manage at Metrologie, where we were transacting £1 million per month, not growing, but with 60% of the business through one retail customer, and overall margins of 1%. There are two key things you do in a price war (as learnt when I’d done John Winkler Pricing Strategy Training back in 1992), which need a quick run around customer and per-product analyses. Having instituted staff licensing training, we made the appropriate adjustments to our go-to-market based on the Winkler work. Within four months, we were trading at £5 million/month and had, at the same time, doubled gross margins, without any growth from that largest customer.
  4. In several instances that demonstrated 7/8-figure Software revenue and profit growth, using a model to identify what the key challenges (or reasons for exceptional performance) were in the business. Every product and subscription business has four key components that, mapped over time, expose what is working and what is an area where corrections are needed. You then have the tools to ask the right questions, assign the right priorities and to ensure that the business delivers its objectives. This has worked from my time in DECdirect (0-$100m in 18 months), through Computacenter’s Software Business Unit’s growth from £80-£250m in 3 years, to when I was asked to manage a team of 4, working with products from 1,072 different vendors (and delivering our profit goals consistently every quarter). In the latter case, our market share with the largest vendor of the 1,072 went from 7% UK share to 21% in 2 years, winning their Worldwide Solution Provider of the Year Award.
  5. Correlating Subscription Data at Demon against the list of people we’d sent Internet trial CDs to, per advertisement. Having found that the inbound phone people were randomly picking the first “this is where I saw the advert” choice on their logging system, we started using different 0800 numbers for each advert placement, and took the readings off the switch instead. Given that, we could track customer acquisition cost per publication, and spot trends; one was that ads in “The Sun” gave nominally low acquisition costs per customer up front, but very high churn within 3 months. By regularly looking at this data – and feeding results to our external media buyers weekly to help their price negotiations – we managed to keep per retained customer landing costs at £30 each, versus £180 for our main competitor at the time.

I have many other examples. Mostly simple, and not in the same league as the Hans Rosling or Edward Tufte examples I’ve seen. That said, the analysis and graphing was largely done out of hours, during days filled with more customer focussed and internal management actions – to ensure our customer experience was as simple and consistent as possible, that the personal aspirations of the team members were fulfilled, and that we delivered all our revenue and profit objectives. I’m good at that stuff, too (ask any previous employer or employee).

With that, I’m off writing some Python code to extract some data ahead of my Google “Making Sense of Data” course next week. That is to extend my 5 years of Tableau Desktop experience with the use of some excellent looking Google hosted tools. And to agonise over how to get to someone who’ll employ me to help them, without HR dissing my chances of interview airtime for my lack of practical Hadoop or MapR experience.

The related Business and People Management smarts don’t appear to make it onto most “Requirements” sheets. Yet. A savvy Manager is all I need air time with…

The “M” in MOOC shouldn’t stand for “Maddening”

Mad man pulling his hair out in Frustration

There was a post in Read/Write yesterday entitled “I failed my online course – but learned a lot about Education”: full story here. The short version is that on her Massive Open Online Course, the instructor had delegated the marking of essays to fellow students on the course, 4/5 of whom had unjustifiably marked an essay of hers below the pass mark. With that, the chance of completing the course successfully evaporated, and she left it.

Talking to companies that run these courses for over a thousand (sometimes over 100,000) participants, she cites a statistic that only 6.8% of those registering make it through to the end of the course. That said, my own personal exposure to these things suggests completion comes down to a number of factors:

  1. If the course is inexpensive or free, there will be a significant drop between the number of registrants and the number of people who even invoke the first lesson. Charges (or availability of an otherwise unobtainable useful skill) will dictate a position in each person’s time priorities.
  2. The course must go through a worked example of a task before expecting participants to have the skills to complete a test.
  3. Subjective or ambiguous answers demotivate people and should be avoided at all costs. Further, course professors or assistants should be active on the associated forums to ensure students aren’t frustrated by omissions in the course material. That way you keep students engaged and gain some pointers on how to improve the course the next time it’s run.
  4. Above all, participants need to have a sense that they are learning something which they can later apply, and any tests that prove that do add weight to their willingness to plough on.
  5. The final test is meaty, aspirational (at least when the course has started) and proves that the certificate at the end is a worthwhile accomplishment to be personally proud of, and for your peers to respect.

I did two courses on MongoDB a year ago, one “MongoDB for Python Programmers”, the other “MongoDB for DBAs” (that’s Database Administrators for those not familiar with the acronym). Their churn waterfall looked to be much less dramatic than the 6.8% completion rate reported in the post; they started with 6,600 and 6,400 registrants respectively in the courses I participated in, and appear to get completion rates in the region of 19-24%, then and ever since. Hence there are a lot of people out there with the skills to evangelise and use their software.

The only time any of the above hit me was on Week 2 of the Programmers course, which said on the prerequisites that you didn’t need to have experience in Python to complete the course – given it is easy to learn. In the event, we were asked to write a Python program from scratch to perform some queries on a provided dataset – but before any code that did any interaction with a MongoDB database had been shown.

Besides building loop constructs in Python, the biggest gap was how the namespace of variables in Python mapped onto field names within MongoDB. After several frustrating hours, I put an appeal on the course forum for just one small example that showed how things interacted – and duly received a small example the next morning. Armed with that, I wrote my code, found it came out with one of the multiple choice answers, and all done.
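
For anyone hitting the same wall, the mapping turned out to be straightforward: the keys of a Python dict become the field names MongoDB stores and queries against. A minimal sketch using pymongo follows – the database, collection and field names are invented for the example:

```python
# Minimal pymongo sketch: Python dict keys map directly onto MongoDB field
# names, both when inserting documents and when querying them. The database,
# collection and field names are invented for the example.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
grades = client["school"]["grades"]

grades.insert_one({"student_id": 42, "type": "exam", "score": 87.5})

# Find exam scores of 65 or better, lowest first.
for doc in grades.find({"type": "exam", "score": {"$gte": 65}}).sort("score", 1):
    print(doc["student_id"], doc["score"])
```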

I ended up getting 100% passes with distinction in both courses, and could routinely show a database built, sharded and replicated across several running instances on my Mac. The very sort of thing you’d have to provide in a work setting, having had zero experience of NoSQL databases when the course had started 7 weeks earlier. If you are interested in how they set their courses up, there’s plenty of meat to chew at their Education Blog.

MongoDB for Developers Course Certificate
MongoDB for DBAs Course Certificate

I did register for a Mobile Web Engineering course with iversity, but gave that up 2 weeks in. This was the first course I’d attended where fellow students marked my work (and I theirs – I had to mark 7 other students’ work each week). The downfall there was vague questions on exercises that weren’t covered in the course materials, and where nuances were only outlined in lectures given in German. Having found that fellow students were virtually universally confused, that there was an absence of explanation from the course professors or assistants in response to our cries for guidance, and that everyone appeared to spend inordinate, frustrating hours trying to reverse engineer what the requested answers should look like, I started thinking. What had I learnt so far?

Answer: how to deploy a virtual machine on my Mac. How to get German language Firefox running in English. What a basic HTML5/CSS3 mobile template looked like. And that I’d spent 6 hours or so getting frustrated trying to reverse engineer the JavaScript calls from a German language Courseware Authoring System, without any idea of what level of detail from the function calling hierarchy was needed for a correct answer in our test. In summary, a lot of work that reading a book could have covered in its first few pages. With that, I completed my assignment that week as best I could, marked the 7 other students’ work as per my commitments that week, and once done, deregistered from the course. I’ve bought some O’Reilly books instead to cover Mobile App Development, so am sure I’ll have a body of expertise to build from soon.

Next week I will be starting the Google “Making Sense of Data” course, which looks very impressive and should improve some of my analytics and display skills. I’m really looking forward to it. And given the right content, well engineered like the MongoDB courses, I’m sure Massive Open Online Courses will continue to enhance the skills of people, like me, who are keen to keep learning.

What do you call a good version of “scarred for life”?

Pricing for Results Front Cover

There was an experiment some time ago where students of a University were asked which lecturers had the most profound effect on their learning experience – 10 years after they’d left higher education. The names cited were rarely the ones who earnt the most, nor the ones recognised for that achievement. I think I can relate to this in a couple of ways.

One is from my education at Theale Grammar School – situated in the village of Theale, just west of Reading, whose previous status as the halfway stopping point on the two-day London to Bath stagecoach run blessed it with more pubs per head of population than any other village in the country. These days they call that route the A4, supplanted in most use by the M4 motorway in 1971 or so. I recall four morning assemblies of the hundreds served in my 7 years there (more detail in the footnotes if those are of any interest).

Secondly, during my 17 years at Digital, I was blessed with many experiences that have had a material effect on several businesses since. The one standout has got to be two days spent in the company of John Winkler, whom Paul Mears retained to take 30 of us through the art of Pricing; we did this in the surroundings of Newbury Racecourse in (I believe) 1992.

The first day ended with an overnight exercise to come up with a list of ways we could treble our retained profits based on the learnings so far. Instead of going straight home, I took a detour back into my office in DECpark Reading, sat in front of my 19″ VAXstation, and started hacking around our recent sales transactions. I got home late, but brimming with ideas for Day 2. There looked to be countless ways of doing it.

The next day everyone gave their ideas, had some training on how to negotiate pricing, and were given guidance on how to behave in a price war. With that, the course finished, we thanked John and disappeared into the night, armed with the training notes.

Fast forward to when I worked for IT Distributor Metrologie. They had bought Olivetti Software Distribution a year or so earlier, and moved their staff into the HQ office in High Wycombe. They were, at that point, one of Microsoft’s five distributors in the UK, all of whom were conscious of the vendor’s desire to reduce their Distributor line up. The guy brought in to run the Microsoft Business elected to leave, and in January 1997, Metrologie got slapped with what’s termed a “Productivity Improvement Plan”; basic Microsoft parlance for the path to the Firing Squad. Well, that’s what the Directors knew – I wasn’t told.

I was asked to park my other work and to go fix the Microsoft Business, and given a Purchasing Person who had ambitions to be a Product Manager, plus one buyer. We were doing around £1 million per month, 60% of the business through Dixons Stores Group, and (like most Microsoft distributors) tracking along at 1% gross margin.

The first few days were me just asking questions of reseller staff who bought Microsoft products from several distributors. Distribution staff turnover, a lack of consistent, knowledgeable licensing expertise, and price inconsistency across several phone calls were consistent concerns. We also had the concern of having one very large customer, who consumed people’s time like there was no tomorrow and had a few unfortunate ways of doing business:

  • obeying edicts from the top not to pay suppliers to agreed terms at certain points
  • routinely doing “reverse ram raids” at the Warehouse door, sending 40 ton trucks full of returned products for credit close to the end of a trading month
  • bad mouthing our performance in pursuit of a goal to trade directly with the vendor

We employed the learnings from John Winkler’s course – in particular the guidelines on how to behave in a price war – and it worked with a vengeance. While the DSG business didn’t grow, the overall business went from £1m/month to £5m/month in four months, and at doubled margins. Due to the dynamics of how a Distribution business works, and major suppliers being very strict on payment terms, I learnt how “overtrading” feels at close range. By that point I’d been headhunted for a role at Demon Internet, but extended my notice by 4 weeks in an attempt to avert the Firing Squad, which I’d since learnt about along the way. We had already found we weren’t being invited to the Microsoft social events that our status should have conferred on us.

I went into a meeting at Microsoft with my Chairman and the Group Marketing Director from France; their body language going into the meeting was all wrong, and we were told that, despite our recent performance, Microsoft were going to lose us as a Distributor. We lodged an appeal, and I left for my new role at Demon Internet; Product Manager Tracy left shortly afterwards for a Software Business Manager role at reseller BSG, having doubled her salary in the four months we’d worked together. A week in, I got a phone call from my immediate ex-boss, Bob Grindley, to be told that the Microsoft Contract had in fact been retained.

I learnt one set of very useful guidelines on how to measure and improve any business from our then Microsoft Account Manager, Edward Hyde, early in my time in that role – the core ones I still use to this day. That apart, the work on pricing I learnt from John Winkler made a material difference; I can think of no other explanation for growing the business 5x in revenue and 2x in margin within four months, in the middle of a 5-way price war.

The dirty secret is that the 2-day course is condensed into a paperback book entitled “Pricing for Results” by John Winkler himself. Now out of print, but if you’re quick, available for the princely sum of 1p plus postage from several third party sellers on Amazon. A real steal. Or you can hire me to assist with any business improvement project!

Pricing for Results - Back Cover Text

Footnote: John Winkler still appears to be running his Pricing Courses: www.winklers.co.uk/business
