One version of Moore’s Law states that “Processing power doubles about every 18 months relative to cost or size.” On the other hand, Wirth’s Law notes a less favorable trend: “Software is getting slower more rapidly than hardware is getting faster.” A tongue-in-cheek variant, often called Gates’ Law after Bill Gates, holds that “The speed of software halves every 18 months.”
The question becomes how to harness ongoing increases in processing power without succumbing to performance-killing bloat and long development cycles. There are multiple reasons for increased bloat and prolonged development:
- Poorly Defined and Poorly Bounded Initial Objectives. Without sharply-defined objectives, there is no clear path on which development can progress. Secondary or tertiary value objectives may be worked on before primary objectives have been met.
- Cut & Paste. As time goes by, programmers have more old code lying around that they can cut and paste into new applications and make work, even if some of it is redundant or superfluous to the task at hand.
- Reinventing Wheels. If development staff are not aware of the range of public source or commercial software available, they can spend valuable time and resources developing functionality for which there are already low-cost or no-cost alternatives.
- Incomplete Knowledge. Development models based on completely documenting all needs, features, and specifications before coding begins often fail because there are simply too many unknowns at the beginning. What data anomalies will be encountered? How will users actually interact with the system? Which planned functionality will go unused, and what needed functionality has yet to be identified? Early on, the only thing known for sure is that you don’t know everything; this is Donald Rumsfeld’s “known unknowns” concept.
Just identifying some of the causes leads to some obvious solutions:
- Isolate Critical Needs. Define the primary objectives and defer work on non-essentials until the primary mission is accomplished. Code that doesn’t contribute to meeting the primary mission should be cut out. The less code there is, the faster it loads and executes and the easier it is to debug and maintain.
- Quantify Performance Costs. Perform regression testing during development by reprocessing standard data sets or inputs so that the performance costs of “upgrades” or additional features can be measured and evaluated.
- Don’t Invent Everything. Research and use cost-effective public domain or commercial software where there would be no clear advantage in writing new code. For example, few development staffs should be writing software to manage the software they write when there are many existing software vault programs available.
- Iterative, Step-Wise Development. Much has been written about “agile” software development. The key is to accelerate getting to the point where a system is working at a basic level and then adjust it based on experience and live feedback. This results in more functional, less bloated code than trying to document all requirements and specifications up front.
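The “Quantify Performance Costs” suggestion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed harness: the function names (`time_run`, `performance_cost`, `within_budget`) and the 10% slowdown budget are hypothetical, and the idea is simply to reprocess the same standard data set with the old and new builds and compare the timings.

```python
import time
from typing import Callable, Iterable


def time_run(process: Callable[[list], object], dataset: Iterable, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs on the same data."""
    data = list(dataset)  # materialize once so every run sees identical input
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process(data)
        best = min(best, time.perf_counter() - start)
    return best


def performance_cost(baseline_s: float, candidate_s: float) -> float:
    """Fractional slowdown of the candidate relative to the baseline (0.25 = 25% slower)."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate_s - baseline_s) / baseline_s


def within_budget(baseline_s: float, candidate_s: float, max_slowdown: float = 0.10) -> bool:
    """True if the new feature costs no more than the agreed slowdown (assumed 10% here)."""
    return performance_cost(baseline_s, candidate_s) <= max_slowdown
```

In practice, `time_run` would be called once with the current release and once with the “upgraded” build on the same standard data set, and the result of `performance_cost` would be recorded alongside the feature so its cost can be weighed against its value.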
Here are two related suggestions on procuring or managing the development of new software or systems.
- Buy Results, Not Reasons. Contract for results, not hours of effort. In the legal market, this is called value-based billing. Organizations need results, not in-depth educations on all the reasons projects fail. Explicit service level agreements should specify the key variables of time until deployment, accuracy to be delivered upon deployment, volume to be handled, etc.
- Use Meaningfully-Scaled Proof-of-Concept (“POC”) Projects. Early attention should be paid to developing software that can process data at the scale anticipated. Software that is anticipated to process terabytes of data per year should be tested with more than a few gigabytes of representative data. Also, use actual real-world client data with all its imperfections and challenges, not pristine, never-to-be-seen-in-production test sets. Otherwise the unknown unknowns may remain unknown until after deployment.
Example from Mortgage Processing and Oil & Gas
Here’s how we adopt these principles when providing engineered solutions for automated file classification and document coding/attribute extraction. First, we maintain individual modules that each perform a specific function, and we do not deploy a module for a client that does not need its functionality. Using multi-threading with focused functionality, our technology can keep processor utilization on 64-core servers above 90% for literally days at a time. Low processor utilization rates can mean Wirth’s Law is beating Moore’s.
When working initially with clients, we provide POCs using about a terabyte (“TB”) of client data. Within a few days we visually group the files and apply classification labels to the largest groupings. We then work with a client’s subject matter expert to identify and extract data elements or attributes from documents in the largest clusters. While complete classification and attribution of all groupings may take a few months, we’re able to demonstrate the practicality and consistency of the classification with the TB-sized sample and provide actual output in under two weeks.
An alternative approach we sometimes use is to visually cluster about a TB of documents that have already been classified by our clients using their previously-established process. We then identify clusters where the same types of documents have received multiple different prior classifications. Identifying which of the original classifications was the most prevalent within any given cluster can help clients determine how to normalize their classifications.
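The normalization step described above can be illustrated with a short Python sketch. It assumes the clustering has already been done and that each document arrives as a hypothetical `(cluster_id, prior_label)` pair; the function name `conflicting_clusters` and the output fields are invented for illustration and do not describe the actual system.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def conflicting_clusters(docs: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Report clusters whose documents carry more than one prior classification.

    For each such cluster, return the most prevalent prior label, its share
    of the cluster, and the full label counts.
    """
    labels_by_cluster = defaultdict(Counter)
    for cluster_id, prior_label in docs:
        labels_by_cluster[cluster_id][prior_label] += 1

    report = {}
    for cluster_id, counts in labels_by_cluster.items():
        if len(counts) > 1:  # same kind of document, several prior labels
            label, n = counts.most_common(1)[0]
            report[cluster_id] = {
                "suggested_label": label,
                "share": n / sum(counts.values()),
                "labels": dict(counts),
            }
    return report


# Hypothetical sample: cluster "c1" was labeled two different ways by the prior process.
sample = [("c1", "deed"), ("c1", "deed"), ("c1", "title"), ("c2", "lease")]
report = conflicting_clusters(sample)
```

Here only cluster `c1` is reported, with “deed” as the most prevalent prior label. A subject matter expert would still decide whether the most prevalent label is the right one to normalize to.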
In both approaches, the client sees the result of end-to-end processing within just weeks and gains a realistic, practical understanding of how the technology meets their real-world needs. When we discuss guaranteed accuracy levels above 99%, the client knows specifically how the accuracy rate is to be calculated and confirmed.
Wirth’s Law: https://en.wikipedia.org/wiki/Wirth%27s_law
Moore’s Law: https://en.wikipedia.org/wiki/Moore%27s_law
Agile Development: https://en.wikipedia.org/wiki/Agile_software_development