Web of Knowledge
A post about software development, history, surveillance, knowledge, and climate change.
Paper Tape, 1725
A textile worker from Lyon, France named Basile Bouchon invented a way to control a loom using perforated paper tape in 1725. Jacques de Vaucanson attempted to fully automate looms in 1745 using punched cards. By 1804, Joseph Marie Jacquard refined and perfected punched card-controlled looms that revolutionised weaving. A circa 1910 model looked as follows:
Eighty years after Jacquard’s loom, Herman Hollerith filed for a punched card processing patent, inspired by railroad tickets, to help with the upcoming US Census. He’d go on to invent the tabulating machine and do business under his name, The Hollerith Electric Tabulating System, later reincorporated as The Tabulating Machine Company. Charles Flint, founder of the Computing-Tabulating-Recording Company (CTR), bought Hollerith’s business in 1911.
In 1913, Thomas Watson was convicted of antitrust violations, with his own extortionate writings used against him. The following year, Watson approached Charles Flint for a job. Flint offered him a position at CTR and, 11 months later, Watson became its president.
In early 1924, Watson renamed CTR to International Business Machines, better known as IBM. Towards the end of the Roaring Twenties, IBM published an 80-column format for punched cards. Not long after, the company took a dark turn. During the 1930s, IBM facilitated genocide by selling and customising Hollerith tabulating machines for Nazi Germany to help the Reich track political opponents, Freemasons, Jewish people, homosexuals, and others. (The history of IBM’s involvement is more elaborate than what’s explained in this post.)
Prisoner types were encoded in punched cards: homosexual, 3; Jewish, 8; anti-social, 9; Gypsy, 12; and such. Death was digitised: natural causes, 3; execution, 4; suicide, 5; and so on. IBM engineers even created codes for Jewish people worked to death versus those killed by gas. Cards were printed, machines configured, staff trained, and the systems maintained on a bi-weekly basis at the concentration camps.
Punched card formats subsequently influenced the design of early electronic computer terminal hardware, including the VT52, VT100, IBM 2260 Model 3, and TRS-80 Model 4. These machines necessitated new standards (command sequences) to control how information is presented on a screen. Terminals offered 80-column wide screens for technical reasons and to ensure backwards-compatibility with standard punched card sizes.
(As an aside, the default width for nearly all major terminal emulator software applications across all major desktop operating systems remains 80 columns: a lineage that can be traced to perforated paper tape from 1725! Colourful shell scripts use ANSI escape sequences defined in ECMA-48 [ANSI X3.64, later ISO/IEC 6429] sometime between 1976 and 1978.)
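To make the escape sequences mentioned above concrete, here is a minimal Python sketch of how a script might colour its output. The helper names (sgr, colourise) are made up for illustration; the byte patterns themselves (ESC, followed by "[", semicolon-separated numeric parameters, and a final "m") follow the Select Graphic Rendition control function from ECMA-48.

```python
ESC = "\x1b"       # the escape character (decimal 27)
CSI = ESC + "["    # Control Sequence Introducer

BOLD = 1
RED_FOREGROUND = 31


def sgr(*codes: int) -> str:
    """Build a Select Graphic Rendition sequence, e.g. ESC [ 1 ; 31 m."""
    return CSI + ";".join(str(code) for code in codes) + "m"


RESET = sgr(0)     # parameter 0 restores the default rendition


def colourise(text: str, *codes: int) -> str:
    """Wrap text in the given SGR codes, then reset the terminal state."""
    return sgr(*codes) + text + RESET


# On a terminal that honours these sequences, this prints in bold red.
print(colourise("80 columns", BOLD, RED_FOREGROUND))
```

Whether the colours actually appear depends on the terminal emulator; a terminal that ignores the sequences may show the raw characters instead.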
Miniaturisation of computers, improved storage capacities, and advances in network technology made the World Wide Web possible. Consequently, countless software applications developed on and for old hardware became obsolete as web browsers went mainstream. Web applications proliferated, followed shortly thereafter by mobile applications. The era of people offering up their own private data and metadata for analysis and sale by corporations had begun.
Unbeknownst to the general public, governments had also started collecting data. In particular, they were slurping up metadata as fast as people could generate it. The parallel with IBM subsidiary Dehomag’s advertisement, which translates to “quickly oversee everything with Hollerith punch cards,” is chilling:
Metadata is information about information. A library’s card catalogue is an example of metadata: it contains details about the library’s books, which themselves are treasure troves of information. Metadata about a phone call can include: who called whom, what date and time the call was placed, each party’s location, and the call’s duration.
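As a rough illustration, the phone call metadata described above could be modelled as a small record. This is only a sketch: the class name, field names, phone numbers, and locations are all invented for the example. Note that nothing in the record captures what was said.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallMetadata:
    """Hypothetical record describing a call without its content."""
    caller: str              # who called
    callee: str              # whom they called
    started_at: datetime     # date and time the call was placed
    duration: timedelta      # how long the call lasted
    caller_location: str     # each party's location
    callee_location: str


# Fictional values, chosen only to show the shape of the data.
record = CallMetadata(
    caller="+1-555-0100",
    callee="+1-555-0199",
    started_at=datetime(2020, 1, 1, 1, 57),
    duration=timedelta(minutes=23),
    caller_location="San Francisco, CA",
    callee_location="unknown",
)

# Even without content, further facts can be derived from the metadata.
ended_at = record.started_at + record.duration
```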
Imagine the following scenario where what words were spoken by a fictional individual named Kelly remains strictly confidential, but the metadata is available to government agencies:
- 01:57 am, 23 minutes – Kelly calls a phone sex service
- 02:41 am, 02 minutes – Kelly’s GPS indicates Golden Gate Bridge
- 02:43 am, 39 minutes – Kelly calls a suicide prevention hotline
- 11:33 am, 68 minutes – Kelly’s GPS indicates a police station
- 02:27 pm, 29 minutes – Kelly calls an STI testing service
- 03:05 pm, 17 minutes – Kelly calls a doctor’s office
- 03:35 pm, 36 minutes – Kelly calls a health insurance company
- 05:41 pm, 10 minutes – Kelly calls a gender identity support line
One possible interpretation of the metadata suggests that Kelly:
- had sexual activity without using protection, perhaps forced;
- might have been exposed to HIV; and
- is homosexual or bisexual.
Had such metadata been available to the Third Reich, it is easy to imagine that Kelly would have been flagged by encoding 3 (homosexual) on a punched card, arrested, and his death recorded by encoding 4 (execution), having been found guilty by association. There are other explanations for Kelly’s actions that, knowing the conversations’ contents, would lead to completely different interpretations.
Nazi Germany no longer exists, though; modern societies have fair trials, impartial judges, unalienable rights, due process, and chartered freedoms. Yet today’s governments that purport to uphold such principles also, in practice, kill people based on metadata. In 2014, General Michael Hayden, former Director of the National Security Agency, admitted that people are killed based on metadata, not merely content.
So it goes.
Object References, 1965
In 1965, Tony Hoare invented a type system for a programming language named ALGOL W. He tried to make dereferencing objects absolutely safe in ALGOL W, but implemented a concept that undermined his goal: the null reference. For him, null was an easy idea to implement; however, it is considered a notoriously bad idea for many reasons, including added complexity and representing meaningless states.
Come 2009, Tony Hoare apologised for the mistake.
Almost a decade later, in 2018, a developer hard-coded "NULL" as a string constant in a public-facing government system. This constant was compared against a user’s surname, leading to an obscure bug for anyone whose last name is literally Null. A code review caught the problem. Nonetheless, although various team members were previously asked to avoid using null where possible, the advice went unheeded.
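The surname bug can be sketched in a few lines of Python. This is an illustration rather than the actual government code: the function names are invented, and the upper-casing step is an assumption about how the comparison might have matched the surname Null. The second function shows one way to keep "no value" separate from every real value.

```python
from typing import Optional

MISSING_SENTINEL = "NULL"  # a string constant standing in for "no value"


def is_missing_buggy(surname: str) -> bool:
    """Buggy: a real surname can collide with the sentinel string.

    Assumes surnames are normalised to upper case before comparison.
    """
    return surname.upper() == MISSING_SENTINEL


def is_missing(surname: Optional[str]) -> bool:
    """Safer: absence is a distinct value that no surname can equal."""
    return surname is None


print(is_missing_buggy("Null"))  # a person named Null looks like missing data
print(is_missing("Null"))        # the surname is treated as a real value
print(is_missing(None))          # only a genuinely absent surname is missing
```

The Optional annotation does not remove the missing state; it makes that state explicit in the function signature, so callers and type checkers can account for it.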
Alternatives to null include: default values, deferred object creation, optional references (such as Maybe in Haskell or Optional in Java), or development in languages that lack null. Relational databases, which are breeding grounds for null values, can signify missing data through the absence of row data. Consider the following entity-relationship diagram:
The account entity contains nullable columns. Avoiding null columns can result in a cleaner, more resilient, null-safe design. If a contact does not have an email address, then there’s no corresponding row in the contact_email entity for that particular contact row. By contrast, the account entity would most likely use null to represent missing email addresses.
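The row-absence approach can be tried out with Python’s built-in sqlite3 module. The sketch below reuses the contact and contact_email entity names from the text; the column names, sample rows, and the emails_for helper are assumptions made for illustration and may not match the original diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Every column is NOT NULL: a missing email is simply a missing row.
    CREATE TABLE contact_email (
        contact_id INTEGER NOT NULL REFERENCES contact(id),
        email      TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO contact (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO contact (id, name) VALUES (2, 'Grace')")
conn.execute(
    "INSERT INTO contact_email (contact_id, email) VALUES (1, 'ada@example.com')"
)


def emails_for(connection: sqlite3.Connection, contact_id: int) -> list:
    """Return a contact's email addresses; an empty list means none on file."""
    rows = connection.execute(
        "SELECT email FROM contact_email WHERE contact_id = ? ORDER BY email",
        (contact_id,),
    ).fetchall()
    return [email for (email,) in rows]


print(emails_for(conn, 1))  # Ada has one address
print(emails_for(conn, 2))  # Grace has none, so the result is an empty list
```

Callers receive a list that may be empty, so there is no null value to check for or forget to check for.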
Software is riddled with null-related errors, among many other kinds. Despite all the source code analysis tools, code reviews, automated software testing, educational institutions, development methodologies, fault-tolerant mitigation strategies, ongoing safety research, available knowledge, and formal specification languages, software continues to be plagued by devastating errors.
During the 1980s, a VT100-controlled radiation therapy machine named the Therac-25 massively overdosed six people, several of them fatally. After an extensive investigation by MIT professor Nancy Leveson, a number of lessons were put forward for safety-critical systems development, including:
- Overconfidence – Engineers tend to ignore software
- Reliability versus safety – False confidence grows with successes
- Defensive design – Software must have robust error handling
- Eliminate root causes – Patching symptoms does not increase safety
- Complacency – Prefer proactive development to reactive
- Bad risk assessments – Analyses make invalid independence claims
- Investigate – Apply analysis procedures when any accidents arise
- Ease versus safety – Ease of use may conflict with safety goals
- Oversight – Government-mandated software development guidelines
- Reuse – Extensively exercised software is not guaranteed to be safe
The remaining lesson was about inadequate software engineering practices. In particular, the investigation noted that basic software engineering principles were violated for the Therac-25, such as:
- Documentation – Write formal, up-front design specifications
- Quality assurance – Apply rigorous quality assurance practices
- Design – Avoid dangerous coding practices and keep designs simple
- Errors – Include error detection methods and software audit trails
- Testing – Subject software to extensive testing and formal analysis
- Regression – Apply regression testing for all software changes
- Interfaces – Carefully design input screens, messages, and manuals
In 2017, Leveson revisited those lessons and concluded that modern software systems still suffer from the same issues. In addition, she noted:
- Error prevention and detection must be included from the outset.
- Software designs are often unnecessarily complex.
- Software engineers and human factors engineers must communicate more.
- Blame still falls on operators rather than interface designs.
- Overconfidence in reusing software remains rampant.
Whatever the reasons (market pressures, rushing processes, inadequate certifications, fear of being fired, or poor project management), Leveson’s insights are being ignored. For example, after the first fatal Boeing 737 Max flight, why was the entire fleet not grounded indefinitely? Or not grounded after an Indonesian safety committee report uncovered multiple failures? Or not grounded when an off-duty pilot helped avert a crash? What analysis procedures failed to prevent the second fatal Boeing 737 Max flight?
So it goes.
World Wide Web, 1990
A series of woodblock scrolls from 868 CE were discovered in 1900. Called the Diamond Sūtra, the prints are the earliest known complete and dated text that used a printing device, predating Gutenberg’s printers by over 570 years. Curiously, the scrolls instructed that they were “for universal free distribution.” A part of the scroll looks as follows:
Over in England, during the 1300s, all educated people purportedly understood Latin: they had no use for vernacular translations of sacred texts. Although the Roman Catholic Church was more opposed to authoritative challenges than it was to high-quality translations, restricting translations under pain of heresy could be considered a way for the Church to maintain a privileged status and uphold an elevated societal position. Compounding the matter, elitist mediæval theologians and intellectuals dismissed the capacity of ordinary people to understand nuanced scripture.
In short, knowledge and power have been bedfellows for ages; those who wield power are often reluctant to yield it.
Fortunately, the stranglehold over knowledge was short-lived. Around 1439, Gutenberg introduced the movable-type printing press to Europe, which was followed by an era of mass communication. Revolutionary ideas crossed borders relatively unimpeded. A sharp increase in literacy rates proved the elitist intellectuals wrong and threatened the power of both political and religious authorities. An emerging middle class blossomed thanks to the proliferation of literature that ordinary people could read.
Dissemination of knowledge is crucial to society’s advancement, a fact recognised before 868 CE and restated over 1,100 years later in a vastly different medium.
When Sir Timothy Berners-Lee first proposed the World Wide Web in 1990, his motivation included:
… incompatibilities of the platforms and tools make it impossible to access existing information through a common interface, leading to waste of time, frustration and obsolete answers to simple data lookup.
The first web page ever written states that the World Wide Web is:
… an initiative aiming to give universal access to a large universe of documents.
Universal access to educational materials, scientific data, and recent discoveries is at the heart of the World Wide Web. Having such information freely available with ease is implied by the following sentence:
The texts are linked together in a way that one can go from one concept to another to find the information one wants.
Charging for information through paywalls restricts access to information. This widens financial, educational, and health gaps between the poor and the wealthy. Education is of pivotal importance.
Two factors influence family size and therefore global population: family planning and educating girls. As humanity’s population grows, its production of carbon dioxide and other greenhouse gases grows with it. Making educational material freely available to everyone, rich and poor alike, could curb population growth, avoiding an extra 51 gigatons of greenhouse gas emissions by 2050.
The World Wide Web is an incredibly powerful invention, enabling social progress on par with, or perhaps exceeding that of, the printing press. It is ironic that a technology originally conceived for universal access to knowledge is starting to deprive poor people of information, potentially to the detriment of everyone.
And so it goes.
About the Author
My career has spanned tele- and radio communications, enterprise-level e-commerce solutions, finance, transportation, modernization projects in both health and education, and much more.
I would be delighted to discuss opportunities to work with revolutionary companies combatting climate change.