IT Trends into 2017 – or the delusions of Ian Waring

Bowling Ball and Pins

My perception is as follows. I’m also happy to be told I’m mad, or delusional, or both – but here goes. Most reflect changes well past the industry move from CapEx led investments to Opex subscriptions of several years past, and indeed the wholesale growth in use of Open Source Software across the industry over the last 10 years. Your own Mileage, or that of your Organisation, May Vary:

  1. If anyone says the words “private cloud”, run for the hills. Or make them watch https://youtu.be/URvWSsAgtJE. There is also an equivalent showing how to build a toaster for $15,000. Being in the business of building your own datacentre infrastructure is now an economic fallacy. My Amazon AWS bill last month (where I’ve been developing code – and have a one page site saying what the result will look like) was 3p. My Digital Ocean server instance (that runs a network of WordPress sites) with 30GB flash storage and more bandwidth than I can shake a stick at, plus backups, is $24/month. Apart from that, all I have is subscriptions to Microsoft, Github and Google for various point services.
  2. Most large IT vendors have approached cloud vendors as “sell to”, and sacrificed their own future by not mapping customer landscapes properly. That’s why OpenStack is painting itself into a small corner of the future market – aimed at enterprises that run their own data centres and pay support costs on a per software instance basis. That’s Banking, Finance and Telco land. Everyone else is on (or headed to) the public cloud, for both economic reasons and “where the experts to manage infrastructure and its security live” at scale.
  3. The War stage of Infrastructure cloud is over. Network effects are consolidating around a small number of large players (AWS, Google Cloud Platform, Microsoft Azure) and more niche players with scale (Digital Ocean among SME developers, Softlayer in IBM customers of old, Heroku with Salesforce, probably a few hosting providers).
  4. Industry move to scale out open source, NoSQL (key:value document orientated) databases, and components folks can wire together. Having been brought up on MySQL, it was surprisingly easy to set up a MongoDB cluster with shards (to spread the read load, scaled out based on index key ranges) and to have slave replicas backing data up on the fly across a wide area network. For wiring up discrete cloud services, the ground is still rough in places (I spent a couple of months trying to get an authentication/login workflow working between a single page JavaScript web app, Amazon Cognito and IAM). As is the case across the cloud industry, the documentation struggles to keep up with the speed of change; developers have to be happy to routinely dip into Github to see how to make things work.
  5. There is a lot of focus on using Containers as a delivery mechanism for scale out infrastructure, and on management tools to orchestrate their environment: Go, Chef, Jenkins and Kubernetes, none of which I have operational experience with (as the new apps I’m building have fewer dependencies on legacy code and data than most). Continuous Integration and DevOps are often cited in environments where custom code needs to be deployed, with Slack as the ultimate communications tool to warn of regular incoming updates. Having been at one startup for a while, it often reminded me of the sort of military infantry call of “incoming!” from the DevOps team.
  6. There are some laudable efforts to abstract code to be able to run on multiple cloud providers. FOG in the Ruby ecosystem. CloudFoundry (termed BlueMix in IBM) is executing particularly well in large Enterprises with investments in Java code. Amazon are trying pretty hard to make their partners use functionality only available on AWS, in traditional lock-in strategy (to avoid their services becoming a price led commodity).
  7. The bleeding edge is currently “Function as a Service”, “Backend as a Service” or “Serverless apps” typified with Amazon Lambda. There are actually two different entities in the mix; one to provide code and to pay per invocation against external events, the other to be able to scale (or contract) a service in real time as demand flexes. You abstract all knowledge of the environment away.
  8. Google, Azure and to a lesser extent AWS are packaging up API calls for various core services and machine learning facilities. Eg: I can call Google’s Vision API with a JPEG image file, and it can give me the location of every face (top of nose) in the picture, the face bounds, and whether each is smiling or not. Another can describe what’s in the picture. There’s also a link into machine learning training to say “does this picture show a cookie” or “extract the invoice number off this image of a picture of an invoice”. There is an excellent 35 minute discussion on the evolving API landscape (including the 8 stages of API lifecycle, the need for honeypots to offset an emergent security threat and an insight to one impressive Uber API) on a recent edition of the Google Cloud Platform Podcast: see http://feedproxy.google.com/~r/GcpPodcast/~3/LiXCEub0LFo/
  9. Microsoft and Google (with PowerApps and App Maker respectively) are trying to remove the queue of IT requests for small custom business apps based on company data (though so far, only on internal intranet type apps, not exposed outside the organisation). This is also an antithesis of the desire for “big data”, which is really the domain of folks with massive data sets and the emergent “Internet of Things” sensor networks – where cloud vendor efforts on machine learning APIs can provide real business value. But for a lot of commercial organisations, getting data consolidated into a “single version of the truth” and accessible to the folks who need it day to day is where PowerApps and App Maker can really help.
  10. Mobile apps are currently dogged by “winner take all” app stores, with a typical user using 5 apps for almost all of their mobile activity. With new enhancements added by all the major browser manufacturers, web components will finally come to the fore for mobile app delivery (not least as they have all the benefits of the web and all of those of mobile apps – off a single code base). Look to hear a lot more about Polymer in the coming months (which I’m using for my own app in conjunction with Google Firebase – to develop a compelling Progressive Web app). For an introduction, see: https://www.youtube.com/watch?v=VBbejeKHrjg
  11. Overall, the thing most large vendors and SIs have missed is to map their customer needs against available project components. To map user needs against axes of product life cycle and value chains – and to suss the likely movement of components (which also tells you where to apply six sigma and where agile techniques within the same organisation). But more eloquently explained by Simon Wardley: https://youtu.be/Ty6pOVEc3bA
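As an aside on point 4, the idea of spreading data across shards by index key range can be sketched in a few lines of Python. This is a toy illustration of the routing idea only, not how MongoDB implements it; the shard names and split points are made up for the example.

```python
from bisect import bisect_right

# Hypothetical split points: keys below "h" go to the first shard,
# "h" up to (but not including) "p" to the second, the rest to the third.
SPLIT_POINTS = ["h", "p"]
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(key: str) -> str:
    """Return the shard whose key range contains `key`."""
    return SHARDS[bisect_right(SPLIT_POINTS, key)]
```

Adding capacity then means adding a split point and a shard name, and moving the keys that fall into the new range.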

There are quite a range of “end of 2016” surveys I’ve seen that reflect quite a few of these trends, albeit from different perspectives (even one that mentioned the end of Java as a legacy language). You can also add overlays with security challenges and trends. But – what have I missed, or what have I got wrong? I’d love to know your views.

Future Health: DNA is one thing, but 90% of you is not you


One of my pet hates is seeing my wife visit the doctor, getting hunches of what may be afflicting her health, and this leading to a succession of “oh, that didn’t work – try this instead” visits for several weeks. I just wonder how much cost could be squeezed out of the process – and lack of secondary conditions occurring – if the root causes were much easier to identify reliably. I then wonder if there is a process to achieve that, especially in the context of new sensors coming to market and their connectivity to databases via mobile phone handsets – or indeed WiFi enabled, low end Bluetooth sensor hubs aka the Apple Watch.

I’ve personally kept a record of what I’ve eaten, down to fat, protein and carb content (plus my Monday 7am weight and daily calorie intake) every day since June 2002. A precursor to the future where devices can keep track of a wide variety of health signals, feeding a trend (in conjunction with “big data” and “machine learning” analyses) toward self service health. My Apple Watch has a year’s worth of heart rate data. But what signals would be far more compelling to a wider variety of (lack of) health root cause identification if they were available?

There is currently a lot of focus on Genetics, where the Human Genome can betray many characteristics or pre-dispositions to some health conditions that are inherited. My wife Jane got a complete 23andMe statistical assessment several years ago, and has also been tested for the BRCA2 (pronounced ‘bracca-2’) gene – a marker for inherited pre-disposition to risk of Breast Cancer – which she fortunately did not inherit from her afflicted father.

A lot of effort is underway to collect and sequence the complete Genome sequences from the DNA of hundreds of thousands of people, building them into a significant “Open Data” asset for ongoing research. One gotcha is that such data is being collected by numerous organisations around the world, and the size of each individual’s DNA (assuming one byte to each nucleotide component – A/T or C/G combinations) runs to 3GB of base pairs. You can’t do research by throwing an SQL query (let alone thousands of machine learning attempts) over that data when samples are stored in many different organisations’ databases, hence the existence of an API (courtesy of the GA4GH Data Working Group) to permit distributed queries between co-operating research organisations. It is notable that there are Amazon Web Services and Google employees participating in this effort.
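The 3GB figure is easy to check, and it shows why nobody wants to ship this data around. A rough sketch, assuming a round 3 billion base pairs per genome and a hypothetical cohort of 100,000 people; the two-bit case is just the theoretical minimum for a four-letter alphabet, not a claim about any particular file format.

```python
BASE_PAIRS = 3_000_000_000  # roughly 3 billion base pairs per human genome


def genome_bytes(bits_per_base: int) -> int:
    """Storage needed for one genome at a given encoding density."""
    return BASE_PAIRS * bits_per_base // 8


one_byte_each = genome_bytes(8)  # 3,000,000,000 bytes: the 3GB quoted above
two_bits_each = genome_bytes(2)  # 750,000,000 bytes: four bases fit in 2 bits

COHORT = 100_000  # hypothetical number of people sequenced
cohort_terabytes = COHORT * one_byte_each / 1e12  # 300 TB at one byte per base
```

Even tightly packed, a cohort that size runs to tens of terabytes per organisation, which is the case for sending the query to the data rather than the data to the query.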

However, I wonder if we’re missing a big and potentially just as important data asset; that of the profile of bacteria that everyone is dependent on. We are each home to approx. 10 trillion human cells among the 100 trillion microbial cells in and on our own bodies; you are 90% not you.

While our human DNA is 99.9% identical to any person next to us, the profiles of our MicroBiomes are typically only 10% similar; our age, diet, genetics, physiology and use of antibiotics are also heavy influencing factors. Our DNA is our blueprint; the profile of the bacteria we carry is an ever changing set of weather conditions that either influence our health – or are leading indicators of something being wrong – or both. Far from being inert passengers, these little organisms play essential roles in the most fundamental processes of our lives, including digestion, immune responses and even behaviour.

Different MicroBiome ecosystems are present in different areas of our body, from our skin, mouth, stomach, intestines and genitals; most promise is currently derived from the analysis of stool samples. Further, our gut is only second to our brain in the number of nerve endings present, many of them able to enact activity independently from decisions upstairs. In other areas, there are very active hotlines between the two nerve cities.

Research is emerging that suggests previously unknown links between our microbes and numerous diseases, including obesity, arthritis, autism, depression and a litany of auto-immune conditions. Everyone knows someone who eats like a horse but is skinny thin; the composition of microbes in their gut is a significant factor.

Meanwhile, costs of DNA sequencing and compute power have dropped to a level where the cost of analysing our microbe ecosystems has fallen from $100M a decade ago to some $100 today. It should continue on that downward path to a level where personal regular sampling could become available to all – if access to the needed sequencing equipment plus compute resources were more accessible and had much shorter total turnaround times. Not least to provide a rich Open Data corpus of samples that we can use for research purposes (and to feed back discoveries to the folks providing samples). So, what’s stopping us?
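As a back-of-envelope check on that cost curve, taking the two figures quoted above at face value (they are rough, so treat the result as indicative only):

```python
import math

cost_then = 100_000_000  # dollars, a decade ago (figure quoted above)
cost_now = 100           # dollars, today (figure quoted above)
years = 10

fold_drop = cost_then / cost_now             # a million-fold reduction
halvings = math.log2(fold_drop)              # just under 20 halvings
halving_time_months = 12 * years / halvings  # roughly one halving every 6 months
```

A price that halves about every six months is a much steeper curve than the 18 to 24 months usually quoted for compute.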

Data Corpus for Research Projects

To date, significant resources are being expended on Human DNA Genetics and comparatively little on MicroBiome ecosystems; the largest research projects are custom built and have sampling populations of fewer than 4000 individuals. This results in insufficient population sizes and sample frequency on which to easily and quickly conduct wholesale analyses to understand the components of health afflictions, to track changes to the mix over time and to isolate root causes.

There are open data efforts underway with the American Gut Project (based out of the Knight Lab at the University of California, San Diego) plus a feeder “British Gut Project” (involving Tim Spector and staff at King’s College London). The main gotcha is that the service is one-shot and takes several months to turn around. My own sample, submitted in January, may take up to 6 months to work through their sequencing then compute batch process.

In parallel, VC funded company uBiome provide the sampling with a 6-8 week turnaround (at least for the gut samples; slower for the other 4 area samples we’ve submitted), though to the best of my knowledge they are currently not sharing the captured data. That said, the analysis gives an indication of the names, types and quantities of bacteria present (with a league table of those over and under represented compared to all samples they’ve received to date), but does not currently communicate any health related findings.

My own uBiome measures suggest my gut ecosystem is more diverse than 83% of folks they’ve sampled to date, which is an analogue for being more healthy than most; those bacteria that are over represented – one up to 67x more than is usual – are of the type that orally administered probiotics attempt to get to your gut. So a life of avoiding antibiotics whenever possible appears to have helped me.
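A league table of this kind boils down to comparing each organism’s share of one sample with its typical share across everyone sampled. A minimal sketch with invented numbers; I don’t know how uBiome actually computes theirs, so the function names and figures below are illustrative assumptions only.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts into each organism's share of the sample."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def fold_change(sample: dict[str, int], reference: dict[str, float]) -> dict[str, float]:
    """Ratio of each organism's share in the sample to its share in the reference."""
    shares = relative_abundance(sample)
    return {name: shares[name] / reference[name]
            for name in shares if reference.get(name)}


# Illustrative numbers only, not real measurements
my_counts = {"Bifidobacterium": 670, "Bacteroides": 330}
typical_share = {"Bifidobacterium": 0.01, "Bacteroides": 0.66}

# Most over-represented first
league_table = sorted(fold_change(my_counts, typical_share).items(),
                      key=lambda item: item[1], reverse=True)
```

With these made-up inputs the first entry comes out at roughly 67 times the typical share and the second at half of it.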

However, the gut ecosystem can flex quite dramatically. As an example, see what happened when one person contracted Salmonella over a three day period (the green in the top of this picture; x-axis is days); you can see an aggressive killing spree where 30% of the gut bacteria population are displaced, followed by a gradual fight back to normality:

Salmonella affecting MicroBiome Population

Under usual circumstances, the US/UK Gut Projects and indeed uBiome take a single measure and report back many weeks later. The only extra feature that may be deduced is the delta between counts of genome start and end sequences, as this will give an indication of the relative species population growth rates from otherwise static data.

I am not aware of anyone offering a faster turnaround service, nor one that can map several successively time gapped samples, let alone one that can convey health afflictions that can be deduced from the mix – or indeed from progressive weather patterns – based on the profile of bacteria populations found.

My questions include:

  1. Is there demand for a fast turnaround, wholesale profile of a bacterial population to assist medical professionals in isolating indicators – or the root cause – of ill health with impressive accuracy?
  2. How useful would a large corpus of bacterial “open data” be to research teams, to support their own analysis hunches and indeed to support enough data to make use of machine learning inferences? Could we routinely take samples donated by patients or hospitals to incorporate into this research corpus? Do we need the extensive questionnaires the various Gut Projects and uBiome issue completed alongside every sample?
  3. What are the steps in the analysis pipeline that are slowing the end to end process? Does increased sample size (beyond a small stain on a cotton bud) remove the need to enhance/copy the sample, with its associated need for nitrogen-based lab environments (many types of bacteria are happy as Larry in the Nitrogen of the gut, but perish with exposure to oxygen)?
  4. Is there any work active to make the QIIME (pronounced “Chime”) pattern matching code take advantage of cloud spot instances, inc Hadoop or Spark, to speed the turnaround time from Sequencing reads to the resulting species type:volume value pairs?
  5. What’s the most effective delivery mechanism for providing “Open Data” exposure to researchers, while retaining the privacy (protection from financial or reputational prejudice) for those providing samples?
  6. How do we feed research discoveries back (in English) to the folks who’ve provided samples and their associated medical professionals?

Next Generation Sequencing (NGS) works by splitting DNA/RNA strands into relatively short read lengths, which then need to be reassembled against known patterns. Taking a poop sample which contains thousands of different bacteria is akin to throwing the pieces of many thousand puzzles into one pile and then having to reconstruct them back – and count the number of each. As an illustration, a single HiSeq run may generate up to 6 x 10^9 sequences; these then need reassembling and the count of 16S rDNA type:quantity value pairs deduced. I’ve seen estimates of six thousand CPU hours to do the associated analysis to end up with statistically valid type and count pairs. This is a possible use case for otherwise unused spot instance capacity at large cloud vendors if the data volumes could be ingested and processed cost effectively.
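The counting step at the end of that pipeline can be illustrated with a toy. It assumes each read can be matched to a type by exact lookup of a short signature; real pipelines such as QIIME use far more sophisticated clustering and alignment, and the sequences and type names below are invented.

```python
from collections import Counter

# Hypothetical signature -> type table (invented sequences, for illustration only)
SIGNATURES = {
    "ACGTAC": "type_A",
    "TTGACC": "type_B",
}


def classify(read: str) -> str:
    """Return the first type whose signature appears in the read."""
    for signature, name in SIGNATURES.items():
        if signature in read:
            return name
    return "unclassified"


def profile(reads) -> Counter:
    """Reduce a pile of reads to type:quantity pairs."""
    return Counter(classify(read) for read in reads)


reads = ["GGACGTACTT", "ACGTACAA", "CCTTGACCGG", "AAAAAAAA"]
counts = profile(reads)
```

Each read is classified independently, which is what makes the job a candidate for spot capacity: on the estimate above, 6,000 CPU hours spread over 1,000 cores would be around six hours of wall-clock time, assuming the work parallelises perfectly, which it rarely does.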

Nanopore sequencing is another route, which has much longer read lengths but is much more error prone (1% for NGS, typically up to 30% for portable Nanopore devices), which probably limits its utility for analysing bacteria samples in our use case. It is much more useful if you’re testing for particular types of RNA or DNA, rather than the wholesale profiling exercise we need. Hence for the time being, we’re reliant on trying to make an industrial scale, lab based batch process turn around data as fast as we are able – but having a network accessible data corpus and research findings feedback process in place if and when sampling technology gets to be low cost and distributed to the point of use.

The elephant in the room is in working out how to fund the build of the service, to map its likely cost profile as technology/process improvements feed through, and to know to what extent its diagnosis of health root causes will improve its commercial attractiveness as a paid service over time. That is what I’m trying to assess while on the bench between work contracts.

Other approaches

Nature has its way of providing short cuts. Dogs have been trained to be amazingly prescient at assessing whether someone has Parkinson’s just by smelling their skin. There are other techniques where a pocket sized spectrometer can assess the existence of 23 specific health disorders. There may well be other techniques that come to market that don’t require a thorough picture of a bacterial population profile to give medical professionals the identity of the root causes of someone’s ill health. That said, a thorough analysis may at least be of utility to the research community, even if we get to only eliminate ever rarer edge cases as we go.

Coming full circle

One thing that’s become eerily apparent to date is some of the common terminology between MicroBiome conditions and terms I once heard used in Chinese Herbal Medicine (my wife’s psoriasis was cured after seeing a practitioner in Newbury for several weeks nearly 20 years ago). The concept of “balance” and the existence of “heat” (betraying the inflammation as your bacterial population of different species ebbs and flows in reaction to different conditions). Then consumption or application of specific plant matter that puts the body’s bacterial population back to operating norms.

Lingzhi Mushroom

Wild mushroom “Lingzhi” in China: cultivated in the far east, found to reduce Obesity

We’ve started to discover that some of the plants and herbs used in Chinese Medicine do have symbiotic effects on your bacterial population for the conditions they are reckoned to help cure. With that, we are starting to see some statistically valid evidence that Chinese and Western medicine may well meet in the future, and be part of the same process in our future health management.

Until then, still work to do on the business plan.

Another lucid flurry of Apple thinking it through – unlike everyone else

Apple Watch Home Screen

This happens every time Apple announce a new product category. Audience reaction, and the press, rush off to praise or condemn the new product without standing back and joining the dots. The Kevin Lynch presentation at the Keynote also didn’t have a precursor of a short video on-ramp to help people understand the full impact of what they were being told. With that, the full impact is a little hidden. It’s a lot more than having Facebook, Twitter, Email and notifications on your wrist when you have your phone handset in your pocket.

There were a lot of folks focussing on its looks and comparisons to the likely future of the Swiss watch industry. For me, the most balanced summary of the luxury esthetics from someone who’s immersed in that industry can be found at: http://www.hodinkee.com/blog/hodinkee-apple-watch-review

Having re-watched the keynote, and seen all the lame Android Wear, Samsung, LG and Moto 360 comparisons, there are three examples that explode almost all of the “meh” reactions in my view. The story is hidden by what’s on that S1 circuit board inside the watch, and the limited number of admissions of what it can already do. Three scenarios:

1. Returning home at the end of a working day (a lot of people do this).

First thing I do after I come indoors is to place my mobile phone on top of the cookery books in our kitchen. Then for the next few hours I’m usually elsewhere in the house or in the garden. Talking around, that behaviour is typical. Not least as it happens in the office too, where if I’m in a meeting, I’d normally leave my handset on silent on my desk.

With every Android or Tizen Smart Watch I know, the watch loses the connection as soon as I go out of Bluetooth range – around 6-10 meters away from the handset. That smart watch is a timepiece from that point on.

Now, who forgot to notice that the Apple Watch has got b/g WiFi integrated on its S1 module? Or that it can not only tell me of an incoming call, but allow me to answer it, listen and talk – and indeed to hand control back to my phone handset when I return to its current proximity?

2. Sensors

There are a plethora of Low Energy Bluetooth sensors around – and being introduced with great regularity – for virtually every bodily function you can think of. Besides putting your own fitness tracking sensors on at home, there are probably many more that can be used in a hospital setting. With that, a person could be quite a walking network of sensors and wander to different wards or labs during their day, or indeed even be released to recuperate at home.

Apple already has some sensors (heart rate, and probably some more capabilities to be announced in time, using the infrared related ones on the skin side of the Apple watch), but can act as a hub to any collection of external bluetooth sensors at the same time. Or in smart pills you can swallow. Low Energy Bluetooth is already there on the Apple Watch. That, in combination with the processing power, storage and b/g WiFi makes the watch a complete devices hub, virtually out of the box.

If your iPhone is on the same WiFi, everything syncs up with the Health app there and the iCloud based database already – which you can (at your option) permit an external third party to have access to. Now, tell me about the equivalent on any other device or service you can think of.

3. Paying for things.

The iPhone 5S, 6 and 6 Plus all have integrated finger print scanners. Apple have put some functionality into iOS 8 where, if you’re within Bluetooth range (6-10 meters of your handset), you can authenticate (with your fingerprint) the fact your watch is already on your wrist. If the sensors on the back have any suspicion that the watch leaves your wrist, it immediately invalidates the authentication.

So, walk up to a contactless till, see the payment amount appear on the watch display, one press of the watch pays the bill. Done. Now try to do that with any other device you know.
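The flow described in the last two paragraphs amounts to a small state machine. Here is a sketch of the logic as I understand it from the keynote; the class and method names are mine, not Apple’s, and the real implementation is not public.

```python
class WatchAuth:
    """Toy model: the watch stays authenticated only while it remains on the wrist."""

    def __init__(self) -> None:
        self.on_wrist = False
        self.authenticated = False

    def put_on(self) -> None:
        self.on_wrist = True

    def phone_fingerprint_ok(self, phone_in_range: bool) -> None:
        # Authentication only sticks if the watch is being worn
        # and the handset is within Bluetooth range.
        if self.on_wrist and phone_in_range:
            self.authenticated = True

    def taken_off(self) -> None:
        # Any suspicion the watch has left the wrist invalidates the session.
        self.on_wrist = False
        self.authenticated = False

    def pay(self, amount_pence: int) -> bool:
        """One press at the till succeeds only for an authenticated watch."""
        return self.authenticated and amount_pence > 0
```

The point of the design is that the expensive step (the fingerprint) happens once when you put the watch on, and every payment afterwards is a single press.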

Developers, developers, developers.

There are probably a million other applications that developers will think of, once folks realise there is a full UNIX computer on that SoC (System on a Chip). With WiFi. With Bluetooth. With a Taptic feedback mechanism that feels like someone is tapping your wrist (not loudly vibrating across the table, or flashing LED lights at you). With a GPU driving a high quality, touch sensitive display. Able to not only act as a remote control for your iTunes music collection on another device, but to play it locally when untethered too (you can always add bluetooth earbuds to keep your listening private). I suspect some of the capabilities Apple have shown (like the ability to stream your heartbeat to another Apple Watch user) will evolve into potential remote health visit applications that can work Internet wide.

Meanwhile, the tech press and the discussion boards are full of people lamenting the fact that there is no GPS sensor in the watch itself (like every other Smart Watch I should add – GPS location sensing is something that eats battery power for breakfast; better to rely on what’s in the phone handset, or to wear a dedicated bluetooth GPS band on the other wrist if you really need it).

Don’t be distracted; with the electronics already in the device, the Apple Watch is truly only the beginning. We’re now waiting for the full details of the WatchKit APIs to unleash that ecosystem with full force.

Yo! Minimalist Notifications, API and the Internet of Things

Yo Logo

Thought it was a joke, but having 4 hours of code resulting in $1M of VC funding, at an estimated $10M company valuation, raised quite a few eyebrows. The Yo! project team have now released their API, and with it some possibilities – over and above the initial ability to just say “Yo!” to a friend. At the time he provided some of the funds, John Borthwick of Betaworks said that there is a future of delivering binary status updates, or even commands to objects to throw an on/off switch remotely (blog post here). The first green shoots are now appearing.

The main enhancement is the ability to carry a payload with the Yo!, such as a URL. Hence your Yo!, when received, can be used to invoke an application or web page with a bookmark already put in place. That facilitates a notification, which is effectively guaranteed to have arrived, to say “look at this”. Probably extensible to all sorts of other tasks.

The other big change is the provision of an API, which allows anyone to create a Yo! list of people to notify against a defined name. So, in theory, I could create a virtual user called “IANWARING-SIMPLICITY-SELLS”, and to publicise that to my blog audience. If any user wants to subscribe, they just send a “Yo!” to that user, and bingo, they are subscribed and it is listed (as another contact) on their phone handset. If I then release a new blog post, I can use a couple of lines of Javascript or PHP to send the notification to the whole subscriber base, carrying the URL of the new post; one key press to view. If anyone wants to unsubscribe, they just drop the username on their handset, and the subscriber list updates.
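To give a feel for how little code that broadcast needs, here is a sketch in Python rather than Javascript or PHP. The endpoint URL and form field names are placeholders I have assumed for illustration; check the Yo! API documentation for the real ones.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: an assumption for illustration, not the documented URL
YO_ALL_ENDPOINT = "https://api.example.invalid/yoall/"


def build_broadcast(api_token: str, link: str) -> Request:
    """Build a POST that would Yo! every subscriber, carrying a link as payload."""
    # Field names "api_token" and "link" are assumed for this sketch
    body = urlencode({"api_token": api_token, "link": link}).encode("ascii")
    return Request(YO_ALL_ENDPOINT, data=body, method="POST")


request = build_broadcast("MY-SECRET-TOKEN", "https://example.com/new-post")
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects
```

Hooking that into a blog’s publish step is all it would take to notify the whole subscriber base of a new post.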

Other applications described include:

  • Getting a Yo! when a FedEx package is on its way
  • Getting a Yo! when your favourite sports team scores – “Yo us at ASTONVILLA and we’ll Yo when we score a goal!”
  • Getting a Yo! when someone famous you follow tweets or posts to Instagram
  • Breaking News from a trusted source
  • Tell me when this product comes into stock at my local retailer
  • To see if there are rental bicycles available near to you (it can Yo! you back)
  • You receive a payment on PayPal
  • To be told when it starts raining in a specific town
  • Your stocks positions go up or down by a specific percentage
  • Tell me when my wife arrives safely at work, or our kids at their travel destination

but I guess there are other “Internet of Things” applications to switch on home lights, open garage doors, switch on (or turn off) the oven. Or to Yo! you if your front door has opened unexpectedly (carrying a link to the picture of who’s there?). Simple one click subscriptions. So, an extra way to operate Apple HomeKit (which today controls home appliance networks only through Siri voice control).

Early users are showing simple RESTful URLs and HTTP GET/POSTs to trigger events to the Yo! API. I’ve also seen someone say that it will work with CoAP (Constrained Application Protocol), a lightweight protocol stack suitable for use within simple electronic devices.
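On the receiving side, a subscription could be recorded by a tiny web handler that picks the sender’s name out of the incoming GET. This sketch assumes the callback arrives as a GET with a `username` query parameter; both the parameter name and the example URLs are assumptions.

```python
from urllib.parse import parse_qs, urlparse

subscribers: set[str] = set()


def handle_callback(url: str) -> bool:
    """Record the sender of an incoming Yo!; return True if they are a new subscriber."""
    params = parse_qs(urlparse(url).query)
    names = params.get("username", [])
    if not names:
        return False  # nothing to record without a sender name
    name = names[0].upper()
    is_new = name not in subscribers
    subscribers.add(name)
    return is_new
```

The same handler shape would do for switching a light or opening a garage door: the arrival of the request is the whole message.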

Hence, notifications that are implemented easily and over which you have total control. Something Apple appear to be anal about, particularly in a future world where you’ll be walking past low energy bluetooth beacons in retail settings every few yards. Your appetite to be handed notifications will degrade quickly with volumes if there are virtual attention beggars every few paces. Apple have been locking down access to their iBeacon licensees to limit the chance of this happening.

With the Yo! API, we have the first of many notification services (alongside Google Now, and Apple’s own notification services), and a simple one at that. One that can be mixed with IFTTT (if this, then that), a simple web based logic and task action system also backed by Betaworks. And which may well be accessible directly from embedded electronics around us.

The one remaining puzzle is how the authors will be able to monetise their work (their main asset is an idea of the type and frequency of notifications you welcome receiving, and that you seek). Still a bit short of Google’s core business (which historically was to monetise purchase intentions) at this stage in Yo!’s development. So, suggestions in the case of Yo! most welcome.

 

Nadella: Heard what he said, knew what he meant

Satya Nadella

That’s a variation of an old “Two Ronnies” song in the guise of “Jehosaphat & Jones” entitled “I heard what she said, but knew what she meant” (words or three minutes into this video). Having read Satya Nadella’s Open Letter to employees issued at the start of Microsoft’s new fiscal year, I did think it was long. However, the real delight was reading Jean-Louis Gassee – previously the CTO of Apple – not only pulling it apart, but then having a crack at showing how it should have been written:

Team,

This is the beginning of our new FY 2015 – and of a new era at Microsoft. I have good news and bad news. The bad news is the old Devices and Services mantra won’t work. For example: I’ve determined we’ll never make money in tablets or smartphones.

So, do we continue to pretend we’re “all in” or do we face reality and make the painful decision to pull out so we can use our resources – including our integrity – to fight winnable battles? With the support of the Microsoft Board, I’ve chosen the latter.

We’ll do our utmost to minimize the pain that will naturally arise from this change. Specifically, we’ll offer generous transitions arrangements in and out of the company to concerned Microsoftians and former Nokians.

The good news is we have immense resources to be a major player in the new world of Cloud services and Native Apps for mobile devices.

We let the first innings of that game go by, but the sting energizes us. An example of such commitment is the rapid spread of Office applications – and related Cloud services – on any and all mobile devices. All Microsoft Enterprise and Consumer products/services will follow, including Xbox properties.

I realize this will disrupt the status quo and apologize for the pain to come. We have a choice: change or be changed.

Stay tuned.

Satya.

Jean-Louis Gassee’s full take-home on the original is provided here. Satya Nadella should hire him.

The Moving Target that is Enterprise IT infrastructures

Docker Logo

There has been a flurry of recent Open Source Enterprise announcements, one relating to Docker – which allows Linux containers, holding all their needed components, to be built, distributed and then run atop Linux based servers. With this came the inference that Virtualisation was likely to get relegated to legacy application loads. Docker appears to have support right across the board – at least for Linux workloads – covering all the major public cloud vendors. I’m still unsure where that leaves the other niche that is Windows apps.

The next announcement was that of Apache Mesos, which is the software originally built by ex-Google Twitter engineers – largely to replicate the Google Borg software used to fire up multi-server workloads across Google’s internal infrastructure. This was used to good effect to manage Twitter’s internal infrastructure and to consign their “Fail Whale” to much rarer appearances. At the same time, Google open sourced a version of their software – I’ve not yet made out if it’s derived from the 10+ year old Borg or more recent Omega projects – to do likewise, albeit at smaller scale than Google achieve inhouse. The one thing that bugs me is that I can never remember its name (I’m off trying to find reference to it again – and now I return 15 minutes later!).

“Google announced Kubernetes, a lean yet powerful open-source container manager that deploys containers into a fleet of machines, provides health management and replication capabilities, and makes it easy for containers to connect to one another and the outside world. (For the curious, Kubernetes (koo-ber-nay’-tace) is Greek for “helmsman” of a ship)”.

That took some finding. Koo-ber-nay-tace. Not exactly memorable.

However, it looks like it’ll be a while before these packaging, deployment and associated management technologies get ingrained in Enterprise IT workloads. A lot of legacy systems out there are simply not architected to run on scale-out infrastructures yet, and it’s a source of wonder what the major Enterprise software vendors are running in their own labs. If indeed they have an appetite to disrupt themselves before others attempt to.

I still cringe with how one ERP system I used to use had the cost collection mechanisms running as a background batch process, and the margins of the running business went all over the place like a skidding car as orders were loaded. Particularly at end of quarter customer spend spikes, where the complexity of relational table joins had a replicated mirror copy of the transaction system consistently running 20-25 minutes behind the live system. I should probably cringe even more given there’s no obvious attempt by startups to fundamentally redesign an ERP system from the ground up using modern techniques. At least yet.

Startups appear to be much more heavily focussed on much lighter mobile based applications – of which there are a million different bets chasing VC money. Moving Enterprise IT workloads into much more cost effective (but loosely coupled) public cloud based infrastructure – and that take full advantage of its economics – is likely to take a little longer. I sometimes agonise over what change(s) would precipitate that transition – and whether that’s a monolith app, or a network of simple ones daisy chained together.

I think we need a 2014 networked version of Silicon Office or Hypercard to trigger some progress. Certainly their abject simplicity is no more, and we’re consigned to the lower level, piecemeal building bricks – like JavaScript – which is what life was like in assembler before high level languages liberated us. Some way to go.

Explaining Distributed Data Consistency to IT novices? Well, …

Greek Shepherd

it’s all greek to me. Bruce Stidston cited a post on Google+ where Yonatan Zunger, Chief Architect of Google+, tried to explain Data Consistency by way of Greeks enacting laws onto statute books on disparate islands. Very long post here. It highlights the challenges of maintaining data consistency when pieces of your data are distributed over many locations, and the logistics of trying to keep them all in sync – in a way that should be understandable to the lay – albeit patient – reader.

The treatise missed out the concept of two-phase commit, which is a way of doing handshakes between two identical copies of a database to ensure a transaction gets played successfully on both the master and the replica sited elsewhere on a network. So, if you get some sort of failure mid transaction, both sides get returned to a consistent state without anything going down the cracks. Important if that data is monetary balance transfers between bank accounts, for example.
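For the IT guys, the handshake can be sketched in a few lines of Python. This is a minimal, in-memory illustration only: the `Participant` class, its `fail_on_prepare` switch and the account names are all hypothetical, and a real database would add durable logs, timeouts and recovery on top of this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Participant:
    """One copy of the database (a hypothetical in-memory stand-in)."""
    name: str
    balances: dict = field(default_factory=dict)
    fail_on_prepare: bool = False      # simulates a failure mid transaction
    _staged: Optional[dict] = None     # changes held back until commit

    def prepare(self, changes: dict) -> bool:
        """Phase 1: stage the changes and vote, without applying anything."""
        if self.fail_on_prepare:
            return False
        staged = dict(self.balances)
        for account, delta in changes.items():
            staged[account] = staged.get(account, 0) + delta
            if staged[account] < 0:
                return False           # vote no: the account would go overdrawn
        self._staged = staged
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the staged changes live."""
        if self._staged is not None:
            self.balances = self._staged
            self._staged = None

    def rollback(self) -> None:
        """Phase 2 (failure): throw the staged changes away."""
        self._staged = None


def two_phase_commit(participants, changes) -> bool:
    """Commit everywhere only if every participant voted yes."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Moving 30 from one account to another succeeds on both copies or on neither; if the replica refuses during the first phase, the master discards its staged change and the balances stay as they were.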

The thing that impressed me most – and which I’d largely taken for granted – is how MongoDB (the most popular Open Source NoSQL Database in the world) can handle virtually all the use cases cited in the article out of the box, with no add-ons. You can specify “happy go lucky”, majority or all replicas consistent before confirming write completion. And if a definitive “Tyrant” fails, there’s an automatic vote among the surviving instances for which secondary copy becomes the new primary (and on rejoining, the changes are journaled back to consistency). And those instances can be distributed in different locations on the internet.
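To make those three write settings and the vote concrete, here is a toy simulation in Python. It is not MongoDB's API or its actual election protocol; the `ReplicaSet` class, the member names and the "longest log wins" rule are assumptions made purely for illustration.

```python
class ReplicaSet:
    """Toy replica set: one primary, the rest secondaries (illustrative only)."""

    def __init__(self, names):
        self.members = {name: {"up": True, "log": []} for name in names}
        self.primary = names[0]

    def _majority(self):
        return len(self.members) // 2 + 1

    def _required_acks(self, concern):
        if concern == "all":
            return len(self.members)
        if concern == "majority":
            return self._majority()
        return 1  # "happy go lucky": the primary's own acknowledgement is enough

    def write(self, doc, concern="majority"):
        """Apply doc on every reachable member; True if enough of them acknowledged."""
        if not self.members[self.primary]["up"]:
            raise RuntimeError("no primary available; call elect() first")
        acks = 0
        for member in self.members.values():
            if member["up"]:
                member["log"].append(doc)
                acks += 1
        # Simplification: a write that misses its target is reported, not undone.
        return acks >= self._required_acks(concern)

    def fail(self, name):
        self.members[name]["up"] = False

    def elect(self):
        """Survivors vote; a new primary needs a majority of the full set."""
        alive = [n for n, m in self.members.items() if m["up"]]
        if len(alive) < self._majority():
            raise RuntimeError("no majority, so no primary can be elected")
        # Assumed rule: the most up to date survivor wins the vote.
        self.primary = max(alive, key=lambda n: len(self.members[n]["log"]))
        return self.primary

    def rejoin(self, name):
        """A returning member catches up from the current primary's log."""
        self.members[name]["log"] = list(self.members[self.primary]["log"])
        self.members[name]["up"] = True
```

With three members, losing the "Tyrant" still leaves two voters, so a new primary is chosen and majority writes carry on, while writes demanding all replicas report failure until the lost member rejoins and catches up.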

Bruce contended that Google may not like its blocking mechanics (which will slow down access while data is written) to retain consistency on its own search database. However, I think Google will be very read heavy, and it won’t usually be a disaster if changes are journaled onto new Google search results to its readers. No money to go between the cracks in their case, any changes just appear the next time you enact the same search; one very big moving target.

Money going down the cracks is what Blockchains design out (a change is accepted on a majority vote, and later attempts to alter it are declined once that’s achieved). That’s why it can take up to 10 minutes for a Bitcoin transaction to get verified. I wrote introductory pieces about Bitcoin and potential Blockchain applications some time back if those are of interest.
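The part that makes later tampering detectable is the chain of hashes, and that piece is small enough to sketch in Python. This is a simplified illustration only: it leaves out the majority voting, proof of work and networking that a real blockchain needs, and the block layout and function names are my own assumptions rather than Bitcoin's format.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over the block contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a block that commits to everything that came before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash; any edit to an earlier block breaks the links."""
    prev_hash = GENESIS_PREV
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True
```

Change the data in any accepted block and `is_valid` fails from that point on, which is the same trick Git uses to notice that history has been rewritten.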

So, I’m sure there must be a more pithy summary someone could draw, but it would add blockchains to the discussion, and probably relate some of the artistry behind hashes and Git/Github to manage large, multiuser, multiple location code, data and writing projects. However, that’s for the IT guys. They should know this stuff, and know what to apply in any given business context.

Footnote: I’ve related MongoDB as that is the one NoSQL database I have accreditations in, having completed two excellent online courses with them (while I’m typically a senior manager, I like to dip into new technologies to understand their capabilities – and to act as a bullshit repellent!). Details of said courses here. The same functionality may well be available with other NoSQL databases.

Sometimes a picture is “How on earth did you do that”?

IBM3270ALLIN1

People often remember a startling or surprising first impression. Riverdance when they first appeared during the voting interval at Eurovision 1994. 16-year-old Everton substitute Wayne Rooney being put on the pitch against a season-long unbeaten Arsenal side, and scoring. A young David Beckham doing likewise against Wimbledon from the half way line. Or Doug Flutie, Quarterback for Boston College, throwing the winning touchdown against Miami from an incredible distance with no time left on the clock. There is even a road in Boston called “Flutie Pass” named in memory of that sensational hail mary throw.

There are always lots of pressures on IT Managers and their staff, with tightening budgets, constrained resources and a precious shortage of time. We used to have a task to try and minimise the friction these folks had in buying Enterprise IT products and services from us or our reseller channels. A salesperson or vendor was normally the last person they wanted to have a dependency on for basic, routine “stuff”, especially for items they should be able to work out for themselves. At least if given the right information in lucid form, concise and free of surprises – immediately available at their fingertips.

The picture was one of the ones we put in the DECdirect Software Catalogue. It shows an IBM 3278 terminal, hooked up to an IBM Mainframe, with Digital’s VAX based ALL-IN-1 Office Automation Suite running on it. At the time, this was a startling revelation; the usual method for joining an IBM system to a DEC one at the time was to make the DEC machine look like a remotely connected IBM 2780 card reader. The two double page spreads following that picture showed how to piece this, and other forms of connections to IBM mainframes, together.

The DECdirect Software catalogue had an aim of being able to spit out all the configuration rules, needed part numbers and matching purchase prices with a minimal, simple and concise read. Our target for our channel salesforce(s) was to enable them to extract a correct part number and price for any of our 550 products – across between 20 and 48 different pricing tiers each – within their normal attention span. Which we assumed was 30 seconds. Given appropriate focus, Predictability, Consistency and the removal of potential surprises can be designed in.

In the event, that business (in which I was the first employee, working alongside 8 shared telesellers and 2 tech support staff) went 0-$100m in 18 months, with over 90% of the order volume coming in directly from customers, correctly priced at source. That got me a 2-level promotion and saw me running the UK Software Products Business, 16 staff and the country software P&L as a result.

One of my colleagues in DEC Finland did a similar document for hardware options, entitled “Golden Eggs”. Everything in one place, with all the connections on the back of each system nicely documented, and any constraints right in front of you. A work of great beauty, and still maintained to this day for a wide range of other systems and options. The nearest I’ve seen more recently are sample architecture diagrams published by Amazon Web Services – though the basics for IT Managers seeing AWS (or other public cloud vendors’ offerings) for the first time are not yet apparent to me.

Things in the Enterprise IT world are still unnecessarily complicated, and the ability to stand in the end user’s shoes for a limited time bears real fruit. I’ve repeated that in several places before and since then with pretty spectacular results; it’s typically only a handful of things to do well in order to liberate end users, and to make resellers and other supply channels insanely productive. All focus is then directed on keeping customers happy and their objectives delivered on time and, more often than not, under budget.

One of my friends (who works at senior level in Central Government) lamented to me today that “The (traditional vendor) big players are all trying to convince the world of their cloudy goodness, unfortunately using their existing big contract corporate teams who could not sell life to a dying man”.

I’m sure some of the Public Cloud vendors would be more than capable of arming people like him appropriately. I’d love to help a market leading one do it.

Footnote: I did a previous post on what Vendors, Distributors and Resellers want here.

Officially Certified: AWS Business Professional

AWS Business Professional Certification

That’s added another badge, albeit the primary reason was to understand AWS’s products and services in order to suss how to build volumes via resellers for them – just in case I can get the opportunity to be asked how I’d do it. However, looking over the fence at some of the technical accreditation exams, I appear to know around half of the answers there already – but need to study for those properly and take notes before attempting them.

(One of my old party tricks used to be that I could make it past the entrance exam required for entry into technical streams at Linux related conferences – a rare thing for a senior manager running large Software Business Operations or Product Marketing teams. Being an ex programmer who occasionally fiddles under the bonnet on modern development tools is a useful thing – not least to feed an ability to be able to spot bullshit from quite a distance).

The only AWS module I had any difficulty with was the pricing. One of the things most managers value is simplicity and predictability, but a lot of the core services have pricing dependencies where you need to know data sizes, I/O rates or the way your demand goes through peaks and troughs in order to arrive at an approximate monthly price. While most of the case studies amply demonstrate that you do make significant savings compared to running workloads on your own in-house infrastructure, I guess typical values for common use cases may be useful. For example, if I’m running a SAP installation of specific data and access dimensions, what are the typical operational running costs – without needing to insert probes all over a running example to estimate it using the provided calculator?
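The shape of that arithmetic can be sketched in Python as below. Every rate in it is a made-up placeholder and not a real AWS price, and the cost model (a baseline number of instances, extra ones during peaks, storage, I/O requests and outbound transfer) is my own simplification of the dependencies described above.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 365 days * 24 hours / 12 months


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices for illustration; NOT real AWS prices."""
    instance_per_hour: float = 0.10
    storage_per_gb_month: float = 0.05
    per_million_io_requests: float = 0.05
    transfer_out_per_gb: float = 0.09


def monthly_estimate(rates, *, baseline_instances, peak_instances, peak_hours,
                     storage_gb, io_millions, transfer_out_gb):
    """Approximate monthly bill from usage dimensions, broken down by line item."""
    # Extra instances only run (and are only charged) during the peak hours.
    extra_instances = max(peak_instances - baseline_instances, 0)
    instance_hours = baseline_instances * HOURS_PER_MONTH + extra_instances * peak_hours
    costs = {
        "compute": instance_hours * rates.instance_per_hour,
        "storage": storage_gb * rates.storage_per_gb_month,
        "io": io_millions * rates.per_million_io_requests,
        "transfer": transfer_out_gb * rates.transfer_out_per_gb,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    estimate = monthly_estimate(
        Rates(),
        baseline_instances=2, peak_instances=6, peak_hours=40,
        storage_gb=500, io_millions=200, transfer_out_gb=300,
    )
    for item, cost in estimate.items():
        print(f"{item:>9}: {cost:8.2f}")
```

Even in this cut-down form you need six usage figures before a number comes out, which is exactly why a table of typical values for common workloads would help a manager more than a calculator.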

I’d come back from a 7am gym session fairly tired and made the mistake of stepping through the pricing slides without making copious notes. I duly did all that module again and did things properly the next time around – and passed it to complete my certification.

The lego bricks you snap together to design an application infrastructure are simple in principle, loosely connected and what Amazon have built is very impressive. The only thing not provided out of the box is the sort of simple developer bundle of an EC2 instance, some S3 and MySQL based EBS, plus some open source AMIs preconfigured to run WordPress, Joomla, Node.js, LAMP or similar – with a simple weekly automatic backup. That’s what Digital Ocean provide for a virtual machine instance, with specific storage and high Internet Transfer Out limits for a fixed price/month. In the case of the WordPress network on which my customers and this blog run, that’s a 2-CPU server instance, 40GB of disk space and 4TB/month data traffic for $20/month all in. That sort of simplicity is why many startup developers have done an exit stage left from Rackspace and their ilk, and moved to Digital Ocean in their thousands; it’s predictable and good enough as an experimental sandpit.

The ceiling at AWS is much higher when the application slips into production – which is probably reason enough to put the development work there in the first place.

I have deployed an Amazon Workspace to complete my 12 years of Nutrition Data Analytics work using the Windows-only Tableau Desktop Professional – in an environment where I have no Windows PCs available to me. Just used it on my MacBook Air and on my iPad Mini to good effect. That will cost me just north of £21 ($35) for the month.

I think there’s a lot that can be done to accelerate adoption rates of AWS services in Enterprise IT shops, both in terms of direct engagement and with channels to market properly engaged. My real challenge is getting air time with anyone to show them how – and in the interim, getting some examples ready in case I can make it in to do so.

That said, I recommend the AWS training to anyone. There is some training made available the other side of applying to be a member of the Amazon Partner Network, but there are equally some great technical courses that anyone can take online. See http://aws.amazon.com/training/ for further details.

Help available to keep malicious users away from your good work

Picture of a Stack of Tins of Spam Meat

One thing that still routinely shocks me is the sheer quantity of malicious activity that goes on behind the scenes of any web site I’ve put up. When we were building Internet Vulnerability Testing Services at BT, around 7 new exploits or attack vectors were emerging every 24 hours. Fortunately, for those of us who use Open Source software, the protections have usually been inherent in the good design of the code, and most (OpenSSL heartbleed excepted) have had no real impact with good planning. All starting with closing off ports, and restricting access to some key ones from only known fixed IP addresses (that’s the first thing I did when I first provisioned our servers in Digital Ocean Amsterdam – just surprised they don’t give a template for you to work from – fortunately I keep my own default rules to apply immediately).
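The logic of such default rules is simple enough to express in Python. This is a sketch of the decision only and not my actual firewall configuration: the port choices and the address (taken from the range reserved for documentation) are assumptions, and on a real server the same policy would live in the firewall itself.

```python
import ipaddress

# Hypothetical policy: port -> allowed source networks.
# None means open to everyone; any port not listed is closed.
RULES = {
    80: None,                                        # web traffic
    443: None,                                       # web traffic over TLS
    22: [ipaddress.ip_network("203.0.113.10/32")],   # SSH from one fixed address
}


def is_allowed(source_ip: str, port: int, rules=RULES) -> bool:
    """Default deny: a connection passes only if a rule explicitly permits it."""
    if port not in rules:
        return False
    allowed_networks = rules[port]
    if allowed_networks is None:
        return True
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in allowed_networks)
```

A visitor from anywhere can reach the web ports, only the known fixed address can reach SSH, and something like the MySQL port is not reachable from outside at all.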

With WordPress, it’s required an investment in a number of plugins to stem the tide. Basic ones like Comment Control, that can lock down pages, posts, images and attachments from having comments added to them (by default, a spammers’ paradise). Where you do allow comments, you install the WordPress provided Akismet, which at least classifies 99% of the SPAM attempts and sticks them in the spam folder straight away. For me, I choose to moderate any comment from someone I’ve not approved content from before, and am totally ruthless with any attempt at social engineering; the latter because if they post something successfully with approval a couple of times, their later comment spam with unwanted links gets onto the web site immediately until I later notice and take it down. I prefer to never let them get to that stage in the first place.

I’ve been setting up a web site in our network for my daughter in law to allow her to blog about Mental Health issues for Children, including ADHD, Aspergers and related afflictions. For that, I installed BuddyPress to give her user community a discussion forum, and went to bed knowing I hadn’t even put her domain name up – it was just another set of deep links into my WordPress network at the time.

By the morning, 4 user registrations, 3 of them with spoof addresses. Duly removed, and the ability to register usernames then turned off completely while I fix things. I’m going to install WP-FB-Connect to allow Facebook users to work on the site based on their Facebook login credentials, and to install WangGuard to stop the “Splogger” bots. That is free for us for the volume of usage we expect (and the commercial dimensions of the site – namely non-profit and charitable), and appears to do a great job sharing data on who and where these attempts come from. Just got to check that turning these on doesn’t throw up a request to login if users touch any of the other sites in the WordPress network we run on our servers, whose user communities don’t need to log on at any time, at all.

Unfortunately, progress was rather slowed down over the weekend by a reviewer from Kenya who published a list of best 10 add-ins to BuddyPress, #1 of which was a Social Network login product that could authenticate with Facebook or Twitter. Lots of “Great Article, thanks” replies. In reality, it didn’t work with BuddyPress at all! Duly posted back to warn others, if indeed he lets news of that failing through to his readers.

As it is, a lot of WordPress Plugins (there are circa 157 of them to do social site authentication alone) are of variable quality. I tend to judge them by the number of support requests received that have been resolved quickly in the previous few weeks – one nice feature of the plugin listings provided. I also have formal support contracts in place with Cyberchimps (for some of their themes) and with WPMU Dev (for some of their excellent Multisite add-ons).

That aside, we now have the network running with all the right tools and things seem to be working reliably. I’ve just added all the page hooks for Google Analytics and Bing Web Tools to feed from, and all is okay at this stage. The only thing I’d like to invest in is something to watch all the various log files on the server and to give me notifications if anything awry is happening (like MySQL claiming an inability to connect to the WordPress database, or Apache spawning multiple instances and running out of memory – something I had in the early days when the Google bot was touching specific web pages, since fixed).
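Pending a proper tool, a first pass at that log watching could look like the Python sketch below. The patterns are my guesses at the sort of messages involved and would need adjusting to what your own logs actually say; the notification step (email, SMS or whatever) is deliberately left out.

```python
import re

# Hypothetical patterns; tune these to the messages your own logs produce.
PATTERNS = {
    "database connection failure": re.compile(
        r"error establishing a database connection|can't connect to mysql", re.I
    ),
    "out of memory": re.compile(r"out of memory|cannot allocate memory", re.I),
    "apache worker limit": re.compile(r"MaxRequestWorkers|MaxClients", re.I),
}


def scan_lines(lines):
    """Return (line number, label, text) for every line matching a pattern."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                alerts.append((number, label, line.strip()))
    return alerts


def scan_file(path):
    """Scan one log file, tolerating any odd bytes in it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle)
```

Run from cron every few minutes against the Apache and MySQL logs, anything returned by `scan_file` is a candidate for a notification.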

Just a shame that there are still so many malicious link spammers out there; they waste 30 minutes of my day every day just clearing their useless gunk out. But thank god that Google are now penalising these very effectively; long may that continue, and hopefully the realisation of the error of their ways will lead them to become more useful members of the worldwide community going forward.