For almost all of computing history, we have interacted with computing devices via keyboard for input and printer, then screen, for output. Computers are logical devices, and require clean, defined logical statements to interact. Thus, we use precise text.
Human interactions, on the other hand, are less precise but much richer. We interact via touch, sound and sight, the last covering both the precise written word and pictures. For most of human history, the overwhelming majority of people, upwards of 99%, were illiterate. Touch, sound and sight were the only methods of interaction. With the advancement of literacy, a prerequisite to the information revolution, the written word became a more common medium:
- We write books and newspapers and distribute them to thousands or even millions of people we have never met.
- We write letters and, later, email, to people we actually do know – and many we do not – and read them.
- We write social media updates and blog posts and read them by the millions.
The written word, whether handled via keyboard or pen and paper, is far more scalable than speech and more reliable over the long term; speech, on the other hand, is a more natural interaction for humans.
Thus, it is only natural that we seek ways to make computer interaction more like human interaction. The first public stab at this was Steve Jobs's famous 1984 Macintosh announcement. At 3:18 in the video, the Macintosh “speaks for itself.”
Apple took another run at it by integrating Siri, based on SRI research acquired by Apple, into the iPhone 4S in 2011; Google Now and Microsoft Cortana followed. The initial releases were unimpressive, at best fodder for comedy shows.
Nonetheless, these companies recognized that text-based interaction, except for data that must be text, is less natural.
Recently, Sirius, a completely open-source variant on the Intelligent Personal Assistant (IPA) based on research at the University of Michigan, has made waves. (It is unconnected to Apple’s Siri or Harry Potter’s Sirius Black.) The initial commit on GitHub is less than a year old, and more than 650 commits have followed since.
I read an interesting analysis via the inimitable Adrian Colyer in his Morning Paper. The “humanizing” components (speech recognition, speech generation, image recognition and so on) that make the interaction feel much more human require ~165x the computing resources of a typical Web search. This isn’t surprising: once all of the input is converted to text, the basic search still has to be performed. Similarly, once the results are found, they need to be converted back into a human form, whether speech or images.
Certain hardware strategies, like using special-purpose chips, help reduce that gap. Nevertheless, no search company, even Google, is going to spend 165x the capex it has already deployed just to support these interactions.
With Moore’s Law, those costs will go down over time. Eventually, a Star Trek-like interaction where we just ask the computer to do something may be as common and economical as typing the request is today.
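How long is "eventually"? A rough back-of-the-envelope sketch in Python gives a feel for it. It assumes that the cost of a unit of computing halves every two years, which is one common reading of Moore’s Law and an assumption made here purely for illustration, not a figure from the Sirius analysis. The function name and the default halving period are likewise hypothetical.

```python
import math


def years_to_close_gap(cost_ratio: float, halving_period_years: float = 2.0) -> float:
    """Estimate the years until a workload that costs `cost_ratio` times
    the baseline today costs what the baseline costs today.

    Assumes the cost per unit of compute halves every
    `halving_period_years` (an assumed rate, not a measured one).
    """
    if cost_ratio <= 0 or halving_period_years <= 0:
        raise ValueError("cost_ratio and halving_period_years must be positive")
    # Each halving removes a factor of 2, so a gap of N needs log2(N) halvings.
    halvings_needed = math.log2(cost_ratio)
    # A workload that is already cheaper than the baseline needs no time at all.
    return max(0.0, halvings_needed * halving_period_years)


if __name__ == "__main__":
    gap = 165  # the ~165x figure cited above
    print(f"{years_to_close_gap(gap):.1f} years")  # prints "14.7 years"
```

Under that assumption, a 165x gap takes a little over seven halvings, or roughly 15 years, to absorb. Special-purpose chips would shorten the wait, and a slowdown in Moore’s Law would lengthen it.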
Does this mean the end of the keyboard?
A friend of mine who works with advanced drones told me last week that “airplane pilots will be viewed as a 20th-century aberration.” Before the 20th century, there were no airplanes, and so no pilots to fly them; after the 20th century, automated takeoff, landing and flying systems may surpass the skill of a human pilot, rendering him or her not only unnecessary but unsafe compared to a true autopilot.
Will the keyboard be like airplane pilots, a 20th-century aberration?
I think not.
We will always use language as a means of communication, whether with other people or with computers. Humans translate the language into text memory, along with its intonation, syntax, emphasis and the body language that accompanies it. Computers will be able to do the same.
However, even after thousands of years of verbal communication, we still write letters and email when we could call or record an audio message. Text as an entered medium, not just a recorded medium, has value in both efficiency and precision: it is faster to read than to listen, and most people retain visual memories more easily than audible ones.
Writing structured articles, whether in newspapers (or whatever replaces them), books or blogs, as well as personal communications, will remain with us for the foreseeable future. Detailed writing, primarily for engineering work such as specifications or software and hardware design and implementation, will always be more precise with a keyboard.
A future based on IPAs is exciting and interesting; we may each have a personal “human” secretary without the cost of a real human one, when the cost factors are reduced sufficiently to make it accessible.