Why Networking is Critical to Serverless

As readers know, I have been thinking a lot about serverless lately (along with all other forms of technology deployment and management, since it is what I do professionally).

Recently, I came at it from another angle: network latency.

Two weeks ago, I presented at LinuxCon/ContainerCon Berlin on “Networking (Containers) in Ultra-Low-Latency Environments,” slides here.

I won’t go into the details – feel free to look at the slides and data, explore the code repo, reproduce the tests yourself, and contact me for help if you need to apply it to your circumstances – but I do want to highlight one of the most important takeaways.

For the majority of customers and the majority of network designs, the choice and its latency impact simply will not matter. Whether your container or VM talks to its neighbour in 25 μsec or 50 μsec is insufficient to have any impact on your application, unless you really are dealing in ultra-low-latency, like financial applications.

Towards the end, though, I pointed out a trend that could make the differences matter even for regular applications.

With monolithic applications, you have 1 app server talking to 1 database. For a moderately complex app, maybe it is 1 front-end app server with 5 different back-ends comprised of databases and other applications. The total number of communications is 5, so a 25 μsec difference adds up to 125 μsec, or 1/8 of a millisecond. It still doesn’t matter all that much for most.

Containers, however, enable and encourage us to break down those monolithic applications into separate services, or “microservices”. Where the boundaries of those services should be is a significant topic; I recommend reading Adrian Colyer‘s “Morning Paper” on it here.

As applications are decomposed, the previous single monolithic application with a single database, and thus one back-and-forth internal communication, now becomes 10 microservices, each with its own back-end. One communication just became ten, and our simple application’s 25 μsec difference just became 250 μsec, or 1/4 of a millisecond. It still doesn’t matter all that much, but it is moving towards mattering.

Similarly, our complex 6-part application became, say, 25 microservices and back-ends, leading to 625 μsec of additional delay, or 5/8 of a millisecond. Again, it doesn’t matter all that much, but it is getting ever closer.

However, with serverless, the unit of deployment no longer is a service, or even a microservice. Rather, it is a function. Even the simplest of applications have a lot of functions. Our simple application that went from 1 app and 1 database to 10 microservices actually has a not-unreasonable 250 functions in it; some of the open-source libraries I have written single-handedly have that many! If each of these is run independently in a FaaS/serverless environment, we now have 250 items communicating with others, a minimum of 250*25 μsec = 6,250 μsec or 6.25 milliseconds delay.

For our simple application, with “just” those 250 functions, the difference of a few tens of microseconds, determined by your inter-function (inter-container, under the covers) networking choice, makes a big difference.

For our complex application, with 6 parts, each of which may have at least those 250 functions, we now have 250*6*25 μsec = 37,500 μsec or 37.5 milliseconds of additional delay. That is real time.
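The arithmetic above can be reproduced with a minimal sketch. It assumes, as the text does, that every unit adds exactly one extra internal communication and that the communications happen one after another; the function name and the scenario labels are mine, for illustration only.

```python
# Added latency = number of internal communications x per-communication difference.
# Assumption (same as the text): communications are serial, one per unit.
PER_HOP_DELTA_USEC = 25  # the 25 μsec difference used throughout the post


def added_latency_usec(communications: int, per_hop_usec: float = PER_HOP_DELTA_USEC) -> float:
    """Return the total extra delay, in microseconds, for a chain of communications."""
    return communications * per_hop_usec


# Scenario labels are illustrative; the counts come from the post.
scenarios = {
    "monolith with 5 back-ends": 5,
    "simple app as 10 microservices": 10,
    "complex app as 25 microservices and back-ends": 25,
    "simple app as 250 functions": 250,
    "complex app as 6 x 250 functions": 6 * 250,
}

for name, hops in scenarios.items():
    usec = added_latency_usec(hops)
    print(f"{name}: {usec:,.0f} μsec = {usec / 1000:.3f} ms")
```

If calls fan out in parallel rather than in series, the real number will be lower; the sketch only shows how quickly the worst case grows with the number of units.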

Of course, a serverless provider, like AWS Lambda or Google Cloud Functions, is expected to invest the engineering effort to optimize the network so that the functions don’t simply run “anywhere” and connect “however”, creating unacceptable latency. To some degree, this is what we pay them for, and it is a barrier to entry for additional competitors. Packaging up a container image is easy; optimizing it to run with many others in a busy network on busy servers with minimal impact is hard.

As I have written often, PaaS and DevOps, and by extension serverless, will eliminate many system administration jobs, but they will create fewer but far more critical and valuable systems engineering jobs. The best sysadmins will go on to much more lucrative and, frankly, enjoyable work.

Many others will run serverless environments on their own, using OpenWhisk or other open-source products. Unlike Cloud Foundry or Deis, these will require serious design effort to ensure that applications do not end up as a painful mix: easy to manage, with each part performant on its own, yet impossibly slow in toto.

Hopefully, Amazon and Google, as well as those deploying on their own, are up to the task. I hope they, and you, are, but I always am happy to offer my services to assist.

 

Posted in business, cloud, containers, technology

Can rkt+kubernetes provide a real alternative to Docker?

Last week in LinuxCon/ContainerCon Berlin, I attended a presentation by Luca Bruno of CoreOS, where he described how kubernetes, the most popular container orchestration and scheduling service, and rkt integrate. As part of the presentation, Luca delved into the rkt architecture.

For those unaware – there are many, which is a major part of the problem – rkt (pronounced “rocket”) is CoreOS’s container management implementation. Nowadays, almost everyone who thinks containers, thinks “Docker”. Even Joyent’s Triton, while it uses SmartOS (a variant of Illumos, derived in turn from Solaris), has adopted Docker’s image format and API. You run containers on Triton by calling “docker run”, just pointing it at Triton URLs, rather than docker daemons.

I was impressed with how far CoreOS had come. I was convinced late last year that they had quietly abandoned the rkt effort in the face of the Docker steamroller. Clearly, they quietly plowed ahead, making significant advances.

As I was listening to Luca’s enjoyable presentation, the following thoughts came to mind:

  1. Docker Inc., in its search for revenue via customer capture, has expanded into the terrain of its ecosystem partners, with InfraKit (watch out Chef/Ansible/etc.) and Swarm (kubernetes). Those partners must view it as a threat. One person called it, “Docker’s IE moment,” referring to Microsoft’s attack on its software provider ecosystem when it integrated IE into Windows.
  2. Docker’s API, as good as it is, is very rarely used. With the exception of a developer running an instance manually, usually locally but sometimes on a cloud server, almost no one uses the docker remote API for real management. Almost all of the orchestration and scheduling systems use local agents: kubernetes runs kubelet on each node, Rancher’s Cattle runs rancher agent, etc.
  3. Docker is really easy to use locally, whether starting up a single container or using compose to build an app of multiple parts. Compose doesn’t work well for distributed production apps, but that is what kubernetes (and Swarm and Cattle) are there for.

As these went through my mind, I began to wonder if the backers of rkt+kubernetes intend to use rkt+kubernetes as a head-on alternative to Docker.

So… what would it take for rkt+kubernetes (or, as Luca called it, “rktnetes” pronounced “rocketnetes”) to present a viable alternative to Docker?

Ease Of Use

As described above, Docker is incredibly easy for developers to use on all three platforms – Linux, Mac and Windows – especially with the latest Docker for Mac/Windows releases. rkt, on the other hand, requires launching a VM using Vagrant, which means more work and installation, which slows the process down, which…. (you get the picture). For rkt to be a serious alternative, it must be as easy to use as Docker for developers.

Sure, in theory, it is possible to use Docker in development and rkt in production, but that is unlikely unless rkt provides some 10x advantage in production. Most companies prefer to keep things simple, and “one tool for running containers everywhere” is, well, simple. Even a company willing to make the change recognizes that the run-time parameters are different (even if rkt supports Docker image format) and the “works for me” problem can return, or at least be perceived to do so (which is as important as the reality).

Docker made headway because it won over developers, then operations and IT types.

At the same time, kubernetes is not as easy to use and has a significant learning curve. To some degree, its power and flexibility once in use make it harder to get to usage. That may (or may not) be fine for an orchestrated complex production deployment; it will not fly on a developer’s laptop or DigitalOcean droplet.

To their credit, the kubernetes team has released minikube, intended to make deployments easier. We will see how well it does. In the meantime, developers by the thousands learn how to do “docker run” and “docker-compose run” every day.

So:

  1. Starting and running containers must be made much easier.
  2. Starting and running container compositions must be made much easier.

Killer Capabilities

However, even if rkt+kubernetes manage to equal docker in ease-of-use and feature set, they still will just be playing catch-up, which is not a good game to play (ask AMD). In order to win over developers and systems engineers, rkt+kubernetes must be as easy to use as Docker and it must offer some unique capability that is not offered by Docker. Preferably, it cannot be offered by Docker without great difficulty due to some inherent market, corporate or technology architecture structure.

It needs to be something that is inherently doable, even natural, due to rkt’s design, yet difficult due to Docker’s design. The goal would be to make Docker play challenging catch-up.

What would such a feature or capability set be? I have some early ideas, but that is the job of CoreOS’s (or Google’s or CNCF’s) product managers to figure out. That is what they pay them for.

Why Do I Want It?

I love Docker and CoreOS. I use their products on a regular basis. They have made my life, and those of many clients and colleagues, immensely easier. I have met people at both companies and respect their professional skills.

Even more than any one product or company, however, I do love competition. It benefits customers primarily, but even the competitors themselves are driven to better services and products, and hence profits.

I want rkt+kubernetes (or some other combination) to provide a serious, viable alternative to Docker to benefit me, my clients (current and future), my colleagues, all technology firms and IT departments, and especially to benefit Docker, CoreOS and Kubernetes themselves.

 

Posted in business, cloud, containers, product, technology

DevOps in the 1990s

Last week, I had the pleasure of attending LinuxCon/ContainerCon Europe 2016 in Berlin. Besides the conference, I got to visit a fascinating historical capital. There is great irony, and victory, in seeing “Ben-Gurion-Strasse” (“Ben Gurion Street”), named after the founding Prime Minister of Israel, in the erstwhile capital of the Third Reich. And while I had many a hesitation about visiting, the amount of awareness, and the number of monuments and memorials to the activities of the regime in the 1930s and 1940s, was impressive.

Being a technologist with a love of operations and improving them, I had many conversations with very smart people about operations and especially DevOps.

While engaged in one of these conversations, I suddenly had a realization: we were doing DevOps in 1995! 

Morgan Stanley IT of the 1990s, for which I have many fond memories, and to which I am grateful (especially Rochelle who convinced me I would be insane to accept the other offers on the table… she was right), was a very cutting-edge, experimental place. We took risks, failed at some of them, and succeeded at others.

Our leaders rewarded risk-taking. Fail reasonably, get rewarded; succeed, get rewarded more. Do something really stupid, of course, let alone the same stupid thing twice, and you paid for it.

How did we do DevOps?

There are lots of definitions of DevOps. One of the premier practitioners, Jez Humble, is quoted as saying it is:

a cross-disciplinary community of practice dedicated to the study of building, evolving and operating rapidly-changing resilient systems at scale.

Nowadays, that often involves several key elements, including:

  • Developers taking ownership of the entire application lifecycle, right down to pushing the button to deploy to production;
  • Automation to enable developers to deploy to production safely;
  • Automated testing systems – a continuous integration pipeline – to guarantee full testing before something goes out to production.

The key element here, though, is the culture. It is one where everyone, not just the operations teams, is responsible for live systems. Developers, the people who actually build the software systems, are the ones to make them live, and they are the ones to be alerted, whether at 2pm or 2am, when something goes wrong.

While we did not have all of the elements, we had two very important ones.

  1. Every developer was responsible for his or her application from building it right through to pushing to production and making it live.
  2. Automation to enable that process.

Twenty years ago, we built a system that told developers, “you own it, you push it, you fix it.”

I wish I could tell you it was my brainchild, or that I was the chief architect. Unfortunately, I was neither. I just played my part. The real architect was Phil Moore.

Here was how it worked.

We had a single, global filesystem namespace. Anywhere you went in the world, if you were connected to the Morgan Stanley network, “/root/whatever” pointed to the exact same thing. And if you went to “/root/dist/something“, well, that something was a local replica, right here in your city (or even building) of a read-only application.

What was a developer’s workflow?

Developers built apps in “/root/dev/something“, where something was unique to their app. They built it in a version directory, e.g. “/root/dev/something/1.2.3/”.

When a developer was convinced it was ready, they’d run a command called “vms” (no relation to DEC VMS), that pushed that version 1.2.3 of something out… everywhere. Every single location in the world now had a local, replicated identical copy of something.

Here was the best part. When ready, the developer could run a different “vms” command to make version 1.2.3 live. From that moment (well, it took a few moments to propagate worldwide), anyone launching something would get that version. Not operations, not some release manager, but the developer who owned it.
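The two-step flow described above (publish a version everywhere, then separately make it live) can be sketched on a single machine. This is not how “vms” was implemented, which I have not shown here; the function names, the `LIVE` marker file and the directory layout are hypothetical, and the worldwide replication is left out entirely.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish(dev_root: Path, dist_root: Path, app: str, version: str) -> Path:
    """Copy a finished version from the dev tree into the distribution tree."""
    src = dev_root / app / version
    dst = dist_root / app / version
    if dst.exists():
        # Published versions are treated as immutable in this sketch.
        raise FileExistsError(f"{app} {version} is already published")
    shutil.copytree(src, dst)
    return dst


def make_live(dist_root: Path, app: str, version: str) -> None:
    """Point the app's LIVE marker (a hypothetical convention) at a published version."""
    if not (dist_root / app / version).is_dir():
        raise FileNotFoundError(f"{app} {version} has not been published")
    marker = dist_root / app / "LIVE"
    tmp = marker.with_name("LIVE.tmp")
    tmp.write_text(version)
    os.replace(tmp, marker)  # atomic rename: readers never see a half-written marker


def resolve_live(dist_root: Path, app: str) -> Path:
    """Return the directory that anyone launching the app would get right now."""
    version = (dist_root / app / "LIVE").read_text().strip()
    return dist_root / app / version


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dev, dist = Path(root, "dev"), Path(root, "dist")
        build = dev / "something" / "1.2.3"
        build.mkdir(parents=True)
        (build / "run.sh").write_text("echo hello\n")
        publish(dev, dist, "something", "1.2.3")    # step 1: push it out
        make_live(dist, "something", "1.2.3")       # step 2: make it live
        print(resolve_live(dist, "something").name)
```

Keeping "published" and "live" as separate steps is the point: a version can sit everywhere, ready to go, until its owner decides to flip the switch.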

Did it break stuff? All the time. I lost track of the number of things I broke myself (often saved by others). But we also delivered stuff at a pace unheard of at the time… and it showed.

We did DevOps at Morgan Stanley 15-20 years before it became popular. And I only realized that what we were doing was DevOps, way ahead of its time… last week!

 

Posted in business, cloud, technology

Why Aren’t Desktops Managed Like Containers?

Containers, the management and packaging technology for applications, are useful for many reasons:

  • Packaging is simpler and self-contained
  • Underlying operating system distribution becomes irrelevant
  • Performance, therefore density, and therefore cost, is much better when working without a hypervisor layer

To my mind, though, one of the most important elements in any technology is how it affects culture and incentives. For example, MVC development frameworks are helpful for many reasons, but the most important is that they encourage (and sometimes force) a cleaner way of thinking about and building software.

The particular benefit of container packaging, especially Docker images and ACI, is the encouragement to think cleanly about immutable applications and mutable data as distinct entities.

When I run an application as a container, I need to think about both of those elements:

  1. My application binary is identical to every other instance of that application running everywhere else in my company and, if a public image, the entire world.
  2. My data is kept outside the container and made available to/from the container.

Practically, this means that I can take any computer instance at all (that supports the container format), tell it to run my given images as containers with a given set of data, and I am precisely where I started before.

Many systems have arisen over the years to simplify this replicability, from cloud-config to chef/puppet to docker-compose/fig.

This near-instant replicability is the basis of the “cattle not pets” paradigm of systems. If I can reconstruct any system with a high degree of reliability and speed, I don’t have to worry about each system; they become disposable.

This raises the question: why are desktops still pets?

This past weekend I upgraded my (very old) MacBook Air from El Capitan (OS X 10.11) to Sierra (macOS 10.12).

Since it is pretty old, and has accumulated a lot of “cruft” over the years, I decided to do a clean install:

  1. Back up all of my data. Actually, back it up twice: once to an online backup service for critical data, and the entire system via Time Machine to a local drive.
  2. Wipe my drive and do a clean Sierra install.
  3. Restore my data from my backup drive.

Unfortunately, I hit a few snags on the way with a poorly-unmounted 1TB USB drive, which led to a very long period of macOS running fsck and me praying the drive was recoverable. (Why they don’t use ZFS to eliminate these problems is a topic for another day…)

This is the way we have been doing desktop migrations and/or reinstalls for decades; very little has changed. Sure, we now have Time Machine to ease backups, online backup services like BackBlaze/Carbonite/CrashPlan, and Migration Assistant to ease the restore. Even so, it shouldn’t be that hard.

Here is what desktop migrations should look like:

  • All applications are immutable images, just like container images.
  • All data is separate from the application and available locally and encrypted online.
  • The entire set is stored as a single configuration, like a compose file.

Restoring my entire system requires just one piece of information: the configuration file.

Upon start, I feed my configuration file to the operating system, which does the following:

  1. Install the standard images of the correct versions of my applications
  2. Replicate my data from the cloud
  3. Finish
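As a rough illustration of the idea above, here is a minimal sketch of such a configuration and the restore plan an operating system could derive from it. No operating system offers this today; every class name, field, application ID and URL below is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """An immutable application image, identified by ID and version."""
    app_id: str
    version: str


@dataclass(frozen=True)
class DataSet:
    """Mutable data kept apart from the application, replicated online."""
    name: str
    remote_url: str   # where the encrypted online copy lives (placeholder)
    local_path: str   # where it should appear on the restored machine


@dataclass(frozen=True)
class DesktopConfig:
    """The single piece of information needed to rebuild the desktop."""
    apps: tuple[App, ...]
    data: tuple[DataSet, ...]


def restore_plan(config: DesktopConfig) -> list[str]:
    """Return the ordered steps: install images, replicate data, finish."""
    steps = [f"install image {a.app_id}:{a.version}" for a in config.apps]
    steps += [f"replicate {d.remote_url} -> {d.local_path}" for d in config.data]
    steps.append("finish")
    return steps


if __name__ == "__main__":
    config = DesktopConfig(
        apps=(App("org.example.editor", "2.4.1"), App("org.example.mail", "7.0.0")),
        data=(DataSet("documents", "https://backup.example.com/documents", "~/Documents"),),
    )
    for step in restore_plan(config):
        print(step)
```

Note that the configuration holds only references (image IDs, versions, data locations), never the applications or the data themselves, which is what keeps it small enough to be the one thing you need to hold on to.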

To some degree, this is how iOS works with iCloud backup. Of course, iOS has much more rigid (and enforced) rules as to what an application can and cannot do. This allows it to constrain how apps are packaged; where data resides; and how many pieces of information are required to replicate them (exactly two: app ID+version; and data location).

However, achieving this on a desktop doesn’t require rigid constraints on a general purpose operating system. As long as three simple rules are met, applications will comply:

  1. Make it easier to follow the rules than not.
  2. Make all necessary services that an app could require available.
  3. Make it obvious to customers how easy it is when an app “behaves”.

It shouldn’t be too hard to get general purpose operating systems like Mac, Linux and Windows there as well. It is high time desktop operating systems caught up to the advancements elsewhere in computing.

The 1990s are calling; they want their operating system management back.

Posted in business, cloud, containers, product, technology

Is the Real Uber Threat to Hertz?

It has become commonplace to forecast that Uber, Lyft and other ridesharing services are a strategic threat to car manufacturers. After all, if “everyone” uses Uber, why would they bother owning cars?

The problem with that argument is that it assumes that “everyone” lives where Uber and Lyft are headquartered: in a dense urban area with very little parking, going to other places nearby where there is lots of traffic and very little parking. Not everyone lives in San Francisco, New York, Paris and the City of London.

For people who spend the overwhelming majority of their days in such locations, car ownership always was a challenge. In the few years that I was young and single or newly married and working in Manhattan, I didn’t own a car either. I took the subway daily. If I needed a car for a few days, I just rented one.

However, many people live lives that do not quite fit that mold. They live in suburban or rural locations with several miles between places they go every day. To a city dweller, 3 miles is a distance. A suburban dweller can drive 20 miles on a normal day; a rural resident will do double that.

For those people, car ownership never was an expensive pain that supplements public transit. It was a necessary mode of transport, something that makes life livable. These people are highly unlikely to give up car ownership.

If the few urban dwellers who have not yet given up their cars will give them up, and the much larger base outside of urban cores will not, to whom is Uber a serious threat?

First of all, Uber poses a major threat to the taxi monopolies, the medallion owners. These groups own the monopolized channels by which a potential taxi driver could earn some income. With Uber and Lyft, drivers can use regular cars they can afford to own, rather than specialized (and therefore expensive) ones, and do not have to pay exorbitant fees to license a medallion. This is a key reason why ridesharing often is half the price of a “normal” taxi, and why the medallion owners and the commissions and politicians in their pockets fight Uber and Lyft tooth and nail.

To whom else are they a threat?

Hertz. Avis. Enterprise. Car rental agencies.

The majority of the revenue for these companies comes from business and leisure travelers who arrive in SFO or EWR or ORD and rent a car. Public transport is great if you are becoming a temporary version of our urban dweller, staying in a hotel or Airbnb within a short distance of all of your meetings.

But if you are covering a broader area, you need to get a car. Everyone who visits Silicon Valley, Northern Virginia, Los Angeles, New Jersey or Westchester County rents a car to get around.

After all:

  1. Taxis are too expensive to cover those distances
  2. Taxis often are unavailable at the desired hours or locations
  3. You need to know the numbers for the local taxi service, and sometimes have cash to pay them.

Ridesharing eliminates all of these issues.

  1. At half the cost of a “normal” taxi, ridesharing can be very competitive with car rentals
  2. With a very large network, often composed of local residents, it is far easier to hail an Uber than a taxi
  3. With a single app connected to your card, it is as easy to get and pay for a taxi in Madison as Manhattan, Prague as Palo Alto

This past week, I visited Silicon Valley, and for the first time in decades did not rent a car. I simply used Uber to get around, and life was better. To boot, it was cheaper. It wasn’t significantly cheaper, but the same price for more convenience wins out every time.

We must be careful not to extrapolate from a single case to broad market impact, but the market is made up of millions of single cases. If we can survive a visit to Silicon Valley without going to Hertz, who else is doing the same thing in thousands of places around the world every day?

 

Posted in business, pricing, product, technology

Amazon Pricing Should Be Customer-Centric

Today, I had a very interesting discussion with Rich Miller, a consulting colleague who has been around the block more than a few times.

One of the interesting points he raised is that Amazon’s AWS pricing doesn’t quite work for enterprises.

Let’s explore how it is a problem and why it is so.

At first blush, Amazon’s pricing is intuitive: use an hour of an m4.xlarge, pay $0.239; use 2 hours, pay $0.478; use a whole month’s worth, pay $0.239*720 = $172.08.

Of course, if I know I am going to use that m4.xlarge (or a lot of them) for a whole year, I should be able to get a discount for committing. Indeed, Amazon offers that type of commitment pricing, with varying discounts that depend on if you commit for one year or three years.

This type of pricing seems to make sense. Amazon knows it will get paid to provide 8,760 hours of that m4.xlarge for the year. In exchange for that knowledge and its ability to plan, it can afford to give the committer (you) a decent discount.

What’s the problem with it?

If you are a small startup or business, not much. You figure out that you need 2x m4.large, 5x t2.small, several others, and just commit.

But enterprises don’t work that way. Enterprises are orders of magnitude larger than those small companies. They use hundreds, if not thousands, of each instance type.

To provide for these “elephant” use cases, Amazon has a sales team that is authorized to negotiate appropriately-priced deals, with much larger discounts off the listed prices, or “rack rates”.

However, the pricing remains built around the instance types you order.

The reason is that Amazon sets its prices to suit the vendor, rather than to fit the customer.

An enterprise is not a small business with an application that just happens to be 3 orders of magnitude larger. An enterprise is a diverse conglomeration of multiple divisions and their many applications, some of which are quite large, others very small, and everything in between.

An enterprise is not a large business; it is a dynamic ecosystem of businesses.

As a dynamic ecosystem, their needs change over time, sometimes from day-to-day, but certainly within the timeframes of AWS commitments. It is nearly impossible for an enterprise to know upfront that it will need 100,000 hours of m4.xlarge and 2,500,000 hours of t2.small.

What they do know is that they will spend roughly the equivalent of 100,000 * m4.xlarge + 2,500,000 * t2.small over the next year. However, from the enterprise’s perspective, as a customer, they want to buy those as units of committed general usage, not committed specific usage.

What would buyer pricing look like? Surprisingly, it is much simpler: For $100,000 in annual committed total spend, get a 10% discount; for $1,000,000, get a 25% discount; etc.
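A minimal sketch of that committed-spend model follows. The two thresholds and discounts are the illustrative ones from the paragraph above, not real AWS prices; the assumption that spend below the first threshold earns no discount, and all names in the code, are mine.

```python
# Tiers from the example above: (minimum annual committed spend in USD, discount).
# Listed highest first so the first match is the best tier the customer qualifies for.
TIERS = [
    (1_000_000, 0.25),
    (100_000, 0.10),
]


def discount_rate(committed_spend: float) -> float:
    """Return the discount for a total annual commitment, regardless of instance mix."""
    for threshold, rate in TIERS:
        if committed_spend >= threshold:
            return rate
    return 0.0  # assumption: no discount below the lowest tier


def net_cost(committed_spend: float) -> float:
    """Return what the customer actually pays after the tier discount."""
    return committed_spend * (1 - discount_rate(committed_spend))


if __name__ == "__main__":
    for spend in (50_000, 100_000, 400_000, 1_000_000):
        print(f"commit ${spend:,}: {discount_rate(spend):.0%} off, pay ${net_cost(spend):,.2f}")
```

The function takes a single number, the total commitment. Nothing in it asks which instance types the hours will be spent on, which is exactly the flexibility an enterprise is after.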

The actual thresholds and discount ratios need not be those listed above, but the principles hold. This has several distinct advantages:

  • Like all transparent pricing, it eliminates a lot of the effort, or friction, of signing up large customers
  • It makes it possible for Amazon to eliminate a lot of the sales labour effort
  • Most importantly, it makes the pricing model fit the customers

So why doesn’t Amazon do it? I suspect several reasons:

  1. It involves bearing the usage risk
  2. It requires a lot of effort to mitigate that risk
  3. It requires thinking as a customer, rather than as a vendor

Mindsets can be changed over time. Risk of instance supply-demand mismatch is an issue, but the reality is that customers are absorbing this risk every single day. It is costing them time and money to figure out how much of each to buy. Make it easy for them, and they will buy more.

Of course, every service for a customer – of which removal of risk and effort is one – is a business opportunity. Amazon can – and should – build it into the pricing. Enterprises happily will accept a slightly lower discount in exchange for the flexibility such a model provides.

Summary

Your pricing should match your customers’ needs, not your supply structure. If providing that kind of pricing model means you absorb risk and effort from your customers, it is a revenue opportunity; build it into your pricing.

Does your pricing reflect your customers’ needs? Ask us to help.

Posted in business, cloud, pricing, product, technology

Architect Your Product Before It Holds You Back

Architecture determines capabilities.

This is not new. Anyone who has planned and architected a new product, or has tried to retrofit capabilities for which a platform has not been architected, knows it first-hand.

Yet, time and again, I come across products that have not been planned, and therefore architected, around reasonably expected capabilities.

Sometimes I see these as a user.

Last week, a client wanted to give me access to their Dropbox Team account, so we could share information. They did the apparently smart thing: they sent me an offer to join their Dropbox Team.

That is where the problem began. If I join their team, my entire account is part of their team. Sure, I get all of their account permissions, and they pay if I use too much storage, but my entire account is open to them. I have some personal material there, but far more importantly, I have confidential client information there. Many clients have used Dropbox to share plans, material, even code with me.

This client should not have access to that data. It might even violate my confidentiality agreements to join their Dropbox Team.

Obviously, I am not the only person with this problem. What does Dropbox recommend?

Set up a personal and a business Dropbox account and link them.

Of course, you only can do that for two accounts. They are asking me to limit the number of clients with whom I can connect. Making it worse, each device on which I use Dropbox is connected to just one account.

The workflow I want is simple: let me remain me, and let a subset of me (a folder) become part of this client’s Team space. Dropbox does not and cannot provide this feature.

Essentially, Dropbox is designed and architected around the idea that a person’s account belongs to a team, or company, in its entirety. The idea that multiple people collaborate independently is foreign to how they first built Dropbox users, or at least teams. Put in other terms, Dropbox users and teams are built around 1970s or 1980s work environments.

  • You have a job; you work there, and everything you do there belongs to the company.
  • You have a home; you live there, and everything you do there belongs to you.

Based on the feedback and company responses in the forums, it seems clear that they started this way, and now are constrained from doing anything different without significant re-architecture.

Other times I see this issue from within companies with whom I have worked. It always is unfortunate to see CEOs frustrated that a relatively simple change is difficult to nearly impossible, requires multiple “hacks”, and challenges the stability and scalability of the product. Product says, “we need this feature/capability/flexibility to grow the business”; engineering says, “implementing this feature/capability/flexibility will risk the very service we need to run the current business, let alone grow it.”

Part of the root cause is the severe constraints inherent in the early days of a company. These manifest themselves in two ways:

  1. Plans: Some companies know that they need to plan for reasonably expected use cases, yet do not have the time to do so. As the saying goes, “Having scale (or usage, or diverse market, or ….) problems is a good thing to have. It means we are making it. Why waste energy beforehand?”
  2. Skills: Many companies, on the other hand, simply do not have the product management capabilities and architectural chops to think these issues through. They plow ahead, gain some success, and become frustrated when bolting on additional capabilities becomes expensive and even dangerous to product growth and stability. (When the founding team for a tech startup has no engineer, watch out.)

Even if you have the skills, and you think it through, it is a real challenge. How do you build for the future, without sacrificing too much precious early time, blood and treasure, to those potential future plans?

The answer is specific to each and every case. But a simple rule of thumb is to plan sufficient flexibility that you do not constrain yourself in key areas, while implementing only those parts that you need for the immediate future.

A simple example is complex access control. Your future targeted enterprise customers will insist on complex levels of role-based control and auditing; your early customers, the ones who get you enough traction to go after the whales, just want simple access.

  • If you build a simple binary yes/no for access or sharing, you may have to rip out the entire thing and replace it with something that can do enterprise-level controls and audits while replicating on top of it the exact same interface and methods you had before, all without causing disruption. Worse, other parts of the system will start to assume this design, making future changes a massive undertaking.
  • If you build a full-blown access-control system, with audits, management interfaces, APIs, you may never get to early stage customers, let alone enterprises.

I dealt with this precise issue with a company. My recommendation? Build in enough flexibility for the future, but only implement the parts you need now.

  1. Access:
    1. Build a role-based access control system now using off-the-shelf (or off-the-github) components.
    2. Implement precisely one role right now, “owner”. Provide no management interface, no cross-account controls, nothing.
    3. In the future, add new interfaces and new roles, but rip out nothing.
  2. Audits:
    1. Log every activity somewhere.
    2. Do absolutely nothing with these logs now. In the future, we can add review capabilities.

Unsurprisingly, it worked. It took, at most, a few weeks longer to get the first version, but we had bought ourselves a sufficiently flexible platform to do what we needed in the future.

Far more importantly, we had forced every other part of the system to be aware of access and auditing as they were built, making the future transition several orders of magnitude easier.
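To make that recommendation concrete, here is a minimal sketch in Python. It is not the actual system described above, which used off-the-shelf components; the `AccessControl` class, the `ROLE_PERMISSIONS` table and the in-memory storage are all hypothetical, chosen only to show the shape: a role lookup that today contains exactly one role, and an audit log that is written on every call but never read.

```python
import json
import time
from dataclasses import dataclass, field

# Role -> permitted actions. Only "owner" exists today; adding roles later
# means adding rows here, not rewriting the callers.
# (Hypothetical names and actions, for illustration only.)
ROLE_PERMISSIONS = {
    "owner": {"read", "write", "share", "delete"},
}


@dataclass
class AccessControl:
    # (user, resource) -> role; an in-memory stand-in for a real store
    grants: dict = field(default_factory=dict)
    # Append-only audit trail; nothing reads it yet
    audit_log: list = field(default_factory=list)

    def grant(self, user, resource, role="owner"):
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.grants[(user, resource)] = role
        self._audit(user, resource, "grant:" + role, True)

    def is_allowed(self, user, resource, action):
        role = self.grants.get((user, resource))
        allowed = role is not None and action in ROLE_PERMISSIONS[role]
        self._audit(user, resource, action, allowed)
        return allowed

    def _audit(self, user, resource, action, allowed):
        # Log every decision; review tooling can be added later
        self.audit_log.append(json.dumps({
            "ts": time.time(), "user": user, "resource": resource,
            "action": action, "allowed": allowed,
        }))


acl = AccessControl()
acl.grant("alice", "doc-1")
print(acl.is_allowed("alice", "doc-1", "write"))  # True
print(acl.is_allowed("bob", "doc-1", "read"))     # False
print(len(acl.audit_log))                         # 3
```

The point of the design is that every caller already goes through `is_allowed`, so adding an "editor" or "viewer" role later changes the table, not the callers.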

Summary

It is critically important to plan, design and architect your systems to be able to support reasonably expected future capabilities, while only implementing the minimum you need at each stage. Minimum Viable Product must be viable, not just minimum.

If you have reached the point where your early designs and their implementations struggle to deliver the capabilities you need now, find where you need to rearchitect, but tackle those areas you can with minimum disruption… and do it in a way that plans for the next stages as well.

Struggling to plan for that future? Challenged by your current platform’s ability to deliver your requirements? Ask us to help.

 

Posted in business, product, security, technology

Your Car Interior Should Be Like A Network

A lot of ink has been spilled (if that term still can be used in the digital age), on the coming driverless “revolution.”

Yet a much simpler “evolution” is long overdue for automotive technology: the inside.

Anyone who has replaced any component on a car – dashboard, door panel, side-view mirror, radio, engine part, or any component at all – is familiar with the swamp of wiring that snakes its way behind every panel on the car.

Every single component has what is known as its “harness”, automotive lingo for its wiring. The wiring, however, looks nothing like the simple cable that connects your home router to the cable modem or your laptop to its mouse.

The following picture is the “simple” harness that I once used to connect an after-market radio to a Mitsubishi:

 

Every part in the car has its own harness: the power window, the powered mirror, the trunk light, you name it.

Look under your dashboard, behind the steering wheel and above the driver’s pedals, and you will see a forest of wires, all tightly tied together and shoved into whatever nook and cranny can be found.

If that were the entire story, it would be bad enough. Unfortunately, it is just the beginning.

Each car has its own unique cabling system, its own “harness”. Even though there are (sort of) standards in the ISO connectors, these apply primarily to audio systems. In any case, they are adopted very rarely by automobile manufacturers.

Adding insult to injury, the manufacturers change the harnesses between model years for the same car, and even between models of car for the same year.

Finally, each harness has one cable per type of data or power. Don’t try to calculate the permutations of numbers of potential harnesses; it is a terrible waste of good math.

The really sad part is that these thousands of wires and dozens of harnesses carry just two things:

  1. Data
  2. Power

Sure, each component requires a little different data, and different levels of power, but at heart, these are just wires carrying data and power.

To understand how absurd the current automotive reality is, imagine translating it to the computer industry. We will enter a world where:

  1. Every component you connect – network, mouse, keyboard, monitor, scanner, DVD drive, hard drive – has its own connector with 10-15 different cables
  2. Each component also has its own, unique connector type
  3. Each computer manufacturer has its own connector: Lenovo uses one type, Apple another, Dell another, ASUS another.
  4. Each manufacturer uses different connectors for different components.
  5. Each manufacturer changes its connectors for that component every model year or two.

I highly doubt the computer business ever would have gotten very far!

Yet, this is precisely what occurs in the automotive components business.

In the technology industry, we have had two types of standardized cables that carry data and power for decades.

  • USB: That ubiquitous USB port on your laptop, now heading into USB-C, can carry both data and power in a single simple cable, with a simple, standard plug format. With each generation, the amount of power it can carry and the bandwidth of data have increased. The already-aging USB 2.0 standard, released as far back as 2000, can carry 480 Mbps. No data anywhere in a car, especially to peripherals like audio and windows, requires even a tiny fraction of that.
  • Ethernet: The Ethernet cable that links your modem to your router or your office desktop to the wall, known by its “Category” designation (you probably are using Cat-6), carries data at tremendous bandwidth and speeds, far in excess of anything your car components carry. It also has had the ability for years to carry power to end devices. Gigabit Ethernet, which at 1,000 Mbps is a little more than twice as fast as the aforementioned USB 2.0, was standardized by IEEE in 1998.

 

The obvious question, then, is: does it matter? Does anyone really care if the hidden cables are unwieldy, bulky, hard to figure out, expensive and hard to connect?

Definitely.

The current situation has terrible cost impacts. It increases all of the following:

  • Cost of each component;
  • Manufacturing cost of the car, due both to higher component costs and higher labour costs;
  • Amount of inventory write-down for the manufacturer and component supplier;
  • Amount of inventory write-down by spare parts suppliers;
  • Cost of maintaining the vehicle due to more time to do work (this hits you, car owner);
  • Cost of maintaining due to special skills to work with each vehicle type (you, again);
  • Cost of any changes or upgrades (and again, you).

Now imagine a different world.

  • A standard cable, similar to Ethernet or USB, but with the physical specifications to handle an automobile’s environment, connects everything.
  • A single bus (or two for redundancy) running from front of car to back.
  • A single cable from the bus to each door, with a hub to each component in that door.
  • A single cable from the bus to the trunk/hood.
  • A single cable from the bus to the stereo.
  • A single cable from the bus to the dashboard.

A power window, for example, should require a single cable that carries power and a coded signal to go up or down. An audio system should have just power and a few wires for serial data of any kind; instead, it has 10 or 15 cables!
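As a rough illustration of what “a coded signal” could look like, here is a minimal Python sketch of a four-byte frame sent over a shared bus. Everything in it is an assumption made for the example: the frame layout, the device ids and the command codes are hypothetical and do not come from any automotive standard.

```python
import struct

# Hypothetical frame layout: 1-byte device id, 1-byte command, 2-byte argument,
# all big-endian. Four bytes in total per message.
FRAME = struct.Struct(">BBH")

# Hypothetical device and command tables
DEVICES = {"driver_window": 0x10, "trunk_light": 0x20}
COMMANDS = {"up": 0x01, "down": 0x02, "on": 0x03, "off": 0x04}


def encode(device, command, arg=0):
    """Build the bytes that would travel over the single shared cable."""
    return FRAME.pack(DEVICES[device], COMMANDS[command], arg)


def decode(frame):
    """What the hub in the door would do: unpack and look up the names."""
    device_id, command_id, arg = FRAME.unpack(frame)
    device = next(name for name, i in DEVICES.items() if i == device_id)
    command = next(name for name, i in COMMANDS.items() if i == command_id)
    return device, command, arg


frame = encode("driver_window", "down", 50)  # the meaning of 50 is left to the device
print(frame.hex())    # 10020032
print(decode(frame))  # ('driver_window', 'down', 50)
```

One cable, one frame format, and the difference between a window and a trunk light becomes a lookup table rather than a new harness.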

The technology hardware industry has had standardized cables for decades (it is called a Universal Serial Bus, or USB, for a reason). It has standard connectors, standard pinout, standard sizing, and carries data and power far in excess of just about every automotive application outside of the brakes and engine.

While the big visionaries look to bring us cars that drive themselves – the name “automobile” means “self-moving” – there is much that can be done immediately to make the existing cars, and the future ones, better, faster and cheaper to build and maintain.

Posted in business, product, technology

The Problem with Serverless Is Packaging

Serverless. Framework-as-a-Service. Function-as-a-Service. Lambda. Compute Functions.

Whatever you call it, serverless is, to some degree, a natural evolution of application management.

  1. In the 90s, we had our own server rooms, managed our own servers and power and cooling and security, and deployed our software to them.
  2. In the 2000s, we used colocation providers like Equinix (many still do) to deploy our servers in our own cages or, at best, managed server providers like Rackspace.
  3. In the early 2010s, we started using infrastructure-as-a-service (IaaS) like Amazon EC2.

Over time, we have evolved to worrying less and less about the underlying infrastructure on which our software code runs, focusing more and more on the code itself. We have moved our focus further up the stack.

That was the very basis of Platform-as-a-Service (PaaS) providers, like Heroku (now part of Salesforce.com). Instead of running our code on a virtual server instance that we manage, we deploy the code unit, or “slug”, and they take care of that part as well.

However, even with a PaaS, we still have to think in server-like terms:

  1. We need to plan how many copies of our code, i.e. slugs, we need to run.
  2. We are billed by the number of instances running.

Serverless, typified by Amazon’s Lambda, attempts to change that calculus.

Once we get past worrying about servers entirely, we can focus on duplicate effort inside our applications. Rather than handling application setup, startup, connectivity, listening for requests and routing them to the correct handler function ourselves, why not have an underlying service perform all of that? All we need to do is:

  1. Create the handler functions
  2. Declare which input event triggers which handler function

Just about any server-side app – and most client-side apps – is written following this paradigm anyways. However, we do all of that work in whatever framework we have chosen: express, Rails, whatever. Serverless offers to handle all of that duplicate work as well.
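The two-step model can be sketched in a few lines of Python. This is not Lambda’s API or any provider’s; the event shape, the `ROUTES` table and the `dispatch` function are hypothetical stand-ins for what the platform would do on our behalf.

```python
# Step 1: handler functions, the only code the application author writes.
def list_users(event):
    return {"status": 200, "body": ["alice", "bob"]}


def create_user(event):
    return {"status": 201, "body": event["body"]["name"]}


# Step 2: declare which input event triggers which handler.
ROUTES = {
    ("GET", "/users"): list_users,
    ("POST", "/users"): create_user,
}


def dispatch(event):
    """Stand-in for the platform: route an incoming event to its handler."""
    handler = ROUTES.get((event["method"], event["path"]))
    if handler is None:
        return {"status": 404, "body": None}
    return handler(event)


print(dispatch({"method": "GET", "path": "/users"}))
print(dispatch({"method": "POST", "path": "/users", "body": {"name": "carol"}}))
```

In a framework, we write and run `dispatch` (and the server around it) ourselves; in serverless, only the handlers and the table are ours.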

The intended key benefits of serverless are threefold:

  1. Effort: Why waste time doing work that everyone else is doing anyways? Write your handlers, declare your routes, let it run.
  2. Financial: Why pay for unused server capacity? Get billed per second or even millisecond of code running.
  3. Cultural: Stop thinking about your application as a single unit. Instead, think of it as individual functions, each of which has a cost and a benefit.
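The financial argument comes down to simple arithmetic, sketched below in Python. All of the prices and volumes are placeholder numbers invented for the example, not any provider’s actual rates; substitute your own.

```python
def always_on_cost(instances, price_per_instance_hour, hours):
    """Server-style billing: pay for every hour an instance exists, busy or idle."""
    return instances * price_per_instance_hour * hours


def per_invocation_cost(invocations, avg_duration_ms, price_per_ms):
    """Serverless-style billing: pay only for the time the code actually runs."""
    return invocations * avg_duration_ms * price_per_ms


# Placeholder inputs, for illustration only
hours_in_month = 24 * 30
servers = always_on_cost(instances=2, price_per_instance_hour=0.05, hours=hours_in_month)
functions = per_invocation_cost(invocations=1_000_000, avg_duration_ms=100, price_per_ms=0.0000002)

print(f"always-on:      {servers:.2f} per month")
print(f"per-invocation: {functions:.2f} per month")
```

The second formula also supports the cultural point: because cost is computed per function, each function’s cost can be weighed against its benefit separately.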

However, there is a problem with serverless, and it is more fundamental than its name.

I believe that the key reason for the rapid and widespread adoption of Docker is that it solved major packaging headaches. Even the best packaging systems pre-Docker relied on the volatile and unpredictable state of the underlying host.

Docker abstracted all of that away, by putting required dependencies within the deployment artifact while simultaneously enabling the app to ignore (mostly) the state of the underlying host.

Serverless computing, including Lambda, makes packaging harder, not easier. Sure, you don’t need to worry too much about what is on the server. Conflicts are avoided (using containers under the covers), while dependencies are declared and guaranteed. In that respect, it is similar to container images.

But your application isn’t made up of one handler function in isolation. It is made up of the totality of all of the functions. In containers – “serverfull”? – I can package my entire application up together. This makes moving it, deploying it and testing it easy and predictable.

In serverless, each function is a standalone unit, and the wiring up of events, like incoming http requests, to handler functions is managed by an API or UI. Lambda makes it very easy to focus on the purpose, value and cost of each function. But Lambda makes it very difficult to reason about, deploy, test and manage the app in its entirety.

Serverless’s problem isn’t nomenclature; serverless’s problem is packaging.

From packaging flow all of the issues of deployment, management, testing, reasoning.
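To illustrate what treating the whole app as one package might mean, here is a hypothetical Python sketch. The `AppPackage` class is invented for this example and is not a feature of Lambda or any other service; it bundles the functions and their event wiring into one versioned object that can be checked and exercised as a whole.

```python
from dataclasses import dataclass


@dataclass
class AppPackage:
    """Hypothetical single artifact: all functions plus their event wiring."""
    name: str
    version: str
    functions: dict  # function name -> callable
    triggers: dict   # event name -> function name

    def validate(self):
        """Return a list of wiring problems; empty means the package is coherent."""
        problems = []
        for event, fn in self.triggers.items():
            if fn not in self.functions:
                problems.append(f"event {event!r} points at missing function {fn!r}")
        wired = set(self.triggers.values())
        for fn in self.functions:
            if fn not in wired:
                problems.append(f"function {fn!r} has no trigger")
        return problems

    def invoke(self, event, payload):
        """Run the handler wired to an event, e.g. in a local test."""
        return self.functions[self.triggers[event]](payload)


app = AppPackage(
    name="shop",
    version="1.0.0",
    functions={
        "get_cart": lambda payload: {"items": []},
        "checkout": lambda payload: {"paid": True},
    },
    triggers={"GET /cart": "get_cart", "POST /checkout": "checkout"},
)
print(app.validate())               # []
print(app.invoke("GET /cart", {}))  # {'items': []}
```

With the wiring inside the artifact rather than in a separate API or UI, questions such as “is every event handled?” become something you can test before deploying.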

Many companies are writing small and large applications on Lambda or Compute Functions or OpenWhisk, many successfully. I have worked with some, transitioned apps to Lambda, and love the benefits, the financial and cultural ones above all.

But for many others, until serverless packaging becomes as simple to manage, deploy and reason about as self-contained apps, whether repositories for a PaaS or Docker images, the costs will outweigh the benefits.

In that respect, I believe there is a space for a bold entrepreneur to “DigitalOcean” Lambda. DigitalOcean (DO) took on AWS by providing the same service but being incredibly simple to use. For large corporate entities, AWS remains the primary provider. But for companies looking for simple-to-use, simple-to-manage, great performance, DO is the superior IaaS offering.

If someone took the DO approach to IaaS and applied it to serverless – make it easy to use, easy to reason about, easy to manage – they could grab a significant chunk of the serverless market and likely drive it to the next level.

 

Posted in business, cloud, containers, pricing, product, technology

Pilots In Habitats: Basic Unit of Application Deployment

What is the basic unit of application deployment?

Two related trends have changed the answer to this question:

  • DevOps
  • Containers

For many years, the tasks between engineer and operator were cleanly, if painfully, split:

  1. Engineer builds and delivers a package of files to deploy and run
  2. Operator deploys and runs those files in a production operating environment

In the early years, the package of files consisted of a directory with a ream of paper and instructions. Over time that improved to zip files, then proper packaging and installation tools like rpm.

Most recently, with the simplicity Docker (and others such as CoreOS’ rkt) brought to container packaging, the preferred unit of deployment has become a container image.

The goal of each step in this evolution has been to simplify two parts:

  1. Deployment: how easy is it to perform the one-time per release process of deploying it?
  2. Management: how easy is it to perform the ongoing process of resolving issues?

Container images attempt to simplify the issues further by including all of the dependencies in a single runtime file. Whereas “file copy” included instructions such as, “copy these files to the following directories on the following operating systems with the following prerequisites”, and rpms attempted to automate some of that, container images include all of the server dependencies in the right locations; just run it.

However, as we come closer to resolving lower level dependencies via container images, we have become more acutely aware that applications are more than just the process running on one single host with lower-level dependencies. They also have parallel and upstream dependencies: other processes; databases; middleware services; etc.

People often wonder why there were no reported cases of cancer one hundred years ago. “It must be our lifestyle,” or “it is pollution and our environment.” But the answer is simple: a century ago, life expectancy in the United States was 47, while the median age at cancer diagnosis is 67. Quite simply, few people lived long enough to get cancer! Once life expectancy and health improved, other illnesses had their opportunity.

Similarly, the lower-level issues of per-instance app deployment were so thorny that higher-level cross-instance deployment coordination simply did not rise to the top of the stack (despite some attention). Now that we are solving those, higher-level issues are becoming a concern.

Perhaps, then, the correct question is: given clean packaging of an app instance, what is the proper unit of complete app deployment?

I have been dealing with this question, both in general and with some clients, while working on clean, complete and self-managed deployments. I have also been exploring the newer tools available to help, specifically the ContainerPilot work of Joyent and the Habitat work of Chef.

On a rather long flight last week, I listened to a podcast interview with Tim Gross of ContainerPilot, and Adam Jacob of Habitat.

In the interview, both Tim and Adam recognize similarities in each other’s issues with packaging, deployment and management, and similar solutions.

The primary argument for these solutions is, to my mind, one that requires some clarification.

The primary purpose of these tools is the one we described earlier: having solved the problem of reliable distribution, deployment and maintenance of one instance of an application, we now approach how to solve the distribution, deployment and maintenance of an entire application.

This is something that we could not do before, at least not simply.

Let’s take a simple application, a node app with a Web front-end on a static Web server and a MySQL database on the backend.

In the very old days (when, of course, neither node nor MySQL existed, but we will ignore that fact), the deployment would be as follows:

  1. Engineer packages up the static Web pages as a zip or tar file.
  2. Engineer packages up node application as zip or tar file.
  3. Engineer delivers two packages with instructions:
    1. Expand node app package on server A into the following directory with the following prerequisites
    2. Expand Web files on Web server B into the following directory with the following prerequisites
    3. Launch node app using the following command
    4. Serve up the Web files with the following configurations
    5. Configure the database to have the following
    6. Configure the node app to access the database at the following settings

Fortunately, we have come a long way since then, through many iterations. Much of the configuration, unpacking, deployment, prerequisites have been simplified dramatically. It now looks like this:

  1. Engineer packages up static Web files along with Web server in a container image
  2. Engineer packages up node application in a container image
  3. Engineer delivers two images with instructions
    1. Run node app with the following command line options and environment variables, including information to access the database
    2. Run Web server app with the following command line options and environment variables
    3. Configure the database to have the following

The number of steps is cut in half, and the complexity, and therefore opportunities for error, by much more.
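The “environment variables, including information to access the database” step in the container version can be sketched as follows. The example is in Python rather than node for brevity, and the variable names `DB_HOST`, `DB_PORT` and `DB_NAME` are hypothetical; the only fact assumed is that 3306 is MySQL’s default port.

```python
import os
from dataclasses import dataclass


@dataclass
class DbSettings:
    host: str
    port: int
    name: str


def load_db_settings(env=None):
    """Read database settings from the environment the container was started with."""
    env = os.environ if env is None else env
    missing = [key for key in ("DB_HOST", "DB_NAME") if key not in env]
    if missing:
        # Fail at startup rather than on the first query
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return DbSettings(
        host=env["DB_HOST"],
        port=int(env.get("DB_PORT", "3306")),  # MySQL's default port
        name=env["DB_NAME"],
    )


# Passing a dict instead of os.environ keeps the example self-contained
print(load_db_settings({"DB_HOST": "db.internal", "DB_NAME": "shop"}))
```

The image stays identical across environments; only the values handed to it at run time change, which is where much of the reduction in instructions comes from.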

Tim and Adam look at the above and say, “this isn’t enough!” The basic unit of deployment still shouldn’t be individual packages and instructions, however simplified. It should be a single deployable unit.

Entire applications should be single deployable units.

They are looking for a world that looks as follows:

  1. Engineer packages up everything – static Web files along with Web server, node application, even database, and every other reasonable upstream and downstream dependency – in a series of self-described images.
  2. Engineer delivers it with one instruction: run this one command.

(Actually, they go one step further and say, “I can run this one command myself, who needs separate operators…”)

Tim and Adam are arguing that the unit of deployment for an application is the entire application. It is not each container image, however much of an improvement that is.

In a recent application for a client, we did precisely that: the entire application, with all of its dependencies, was a single unit of deployment.

To do that, however, the individual units that compose the application – in our example, container images – must be able to know about each other and coordinate, without depending on external management.
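One way to picture units that “know about each other and coordinate” is the Python sketch below. It is an illustration only and does not reproduce how ContainerPilot or Habitat work: the `Registry` is a hypothetical in-memory stand-in for a shared service catalog, and the dependency of the Web server on the node app is assumed for the sake of the example.

```python
class Registry:
    """Hypothetical in-memory stand-in for a shared service catalog."""

    def __init__(self):
        self.services = {}

    def register(self, name, address):
        self.services[name] = address

    def lookup(self, name):
        return self.services.get(name)


class Unit:
    """One deployable piece that knows its own dependencies."""

    def __init__(self, name, address, needs=()):
        self.name = name
        self.address = address
        self.needs = tuple(needs)
        self.started = False
        self.config = {}

    def try_start(self, registry):
        """Start only once every dependency is discoverable, then announce ourselves."""
        if self.started:
            return True
        found = {dep: registry.lookup(dep) for dep in self.needs}
        if any(addr is None for addr in found.values()):
            return False  # a dependency is not up yet; retry later
        self.config = found  # self-configure from what was discovered
        registry.register(self.name, self.address)
        self.started = True
        return True


def run_app(units, max_rounds=10):
    """The 'one command': every unit retries until the whole app is up."""
    registry = Registry()
    order = []
    for _ in range(max_rounds):
        for unit in units:
            if not unit.started and unit.try_start(registry):
                order.append(unit.name)
        if all(unit.started for unit in units):
            return order
    raise RuntimeError("some units never found their dependencies")


# Units are listed in the "wrong" order on purpose; they sort themselves out.
units = [
    Unit("web", "10.0.0.3:80", needs=["node-app"]),
    Unit("node-app", "10.0.0.2:3000", needs=["mysql"]),
    Unit("mysql", "10.0.0.1:3306"),
]
print(run_app(units))  # ['mysql', 'node-app', 'web']
```

No operator decides the start order here; each unit carries enough knowledge to wait for, find and configure itself against the others.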

The more I think about it, the more I believe that they are correct. It simply was a matter of solving the lower-level packaging issues, raising the bar to the point that we can begin to ask, “is this the best atomic deployment unit?”

Of course, an entirely different question is: are container images the future of higher-level deployment units, or will serverless, a.k.a. FaaS or Framework-as-a-Service, dominate? That is a question for a different day.

Summary

Solving the challenges of deploying a single instance frees us up to attack the problems of deploying an entire application with all of its related parts. DevOps means no longer being dependent on some infrastructure run by some operator to run your app, but being able to self-service.

How good are your deployment methodologies? Do you still “throw it over the fence”, or can you manage your apps dynamically? Ask us to help.

Posted in business, cloud, containers, product, technology