[Clean Architecture by Robert C. Martin]
[author's blog]
These notes are all taken from the book. Many of them reflect the opinions of Mr. Martin.
Grady Booch:
Architecture represents the significant design decisions that shape a system, where significant is measured by cost of change.
Brian Foote and Joseph Yoder:
If you think good architecture is expensive, try bad architecture.
Ralph Johnson:
Architecture is the decisions that you wish you could get right early in a project, but that you are not necessarily more likely to get them right than any other.
Tom Gilb:
Architecture is a hypothesis that needs to be proven by implementation and measurement.
The only way to go fast, is to go well.
When you get software right, something magical happens: You don't need hordes of programmers to keep it working. You don't need massive requirements documents and huge issue tracking systems. You don't need global cube farms and 24/7 programming.
When software is done right, it requires a fraction of the human resources to create and maintain. Changes are simple and rapid. Defects are few and far between. Effort is minimized, and functionality and flexibility are maximized.
The goal of software architecture is to minimize the human resources required to build and maintain the required system.
Design (generally, low-level details) and Architecture (generally, high-level structure) are the same thing, at different ends of a continuum.
Low-level design decisions must support the high-level architecture decisions.
In either case, the measure of design quality is the measure of the effort required to meet the needs of the customer. If that effort is low, and stays low throughout the lifetime of the system, the design is good.
If programmer productivity drops with each software release, that indicates the design is very bad.
If architecture comes last [in considerations], then the system will become ever more costly to develop, and eventually change will become practically impossible for part or all of the system.
"We can clean it up later; we just have to get to market first."
If you don't have time for clean code now, you also won't have time for it later.
"Writing messy code will speed us up in the short term."
Making a mess is always slower than writing clean code, even in the short term.
(Specific example of using TDD or not to complete the same task.)
The only way to go fast, is to go well.
You've got a mess. Should you restart from scratch, or do gradual refactoring?
If you made a mess once, you'll do it again, because your habits haven't changed.
The implication is that working through a gradual refactoring will teach you better habits.
(From my own experience)
We lost knowledge of many edge cases when we restarted a code base from scratch - those edge cases had to be rediscovered as bugs, gradually.
We also grossly underestimated the time it would take to get a new, minimum product up and running.
Software provides two values to stakeholders:
1) Behavior - it does the task
2) Structure - it can be extended and modified
Software developers are responsible for both values, but frequently focus on just Behavior to the detriment of the system.
The "soft" in "software" means that software is intended to be an easily modified way to tell machines to complete tasks. Machines are "hardware" - difficult to change. "Software" greatly increases the usefulness of the machine, by making it easy to change the tasks it completes.
Therefore, the main goal of software is to be easy to modify.
When stakeholders change their mind about a feature, it should be simple and easy to make that change to the software. The difficulty of making a change should be proportional only to the scope of the change, and not to the shape of the change.
Is it more important for a software system to work, or to be easy to change? If you give me a program that works but is (practically) impossible to change, it won't work after the requirements change, and I won't be able to make it work. If you give me a program that doesn't work but is easy to change, then I can make it work. Therefore, it is more important that a program be easy to change, than that it works now.
Once a program becomes practically impossible to change (due to labor and time costs), it will be discarded.
Dwight D. Eisenhower:
I have two kinds of problems, the urgent and the important. The urgent are not important, and the important are never urgent.
1) Important + Urgent = Structure + Behavior
2) Important + Not urgent = Structure
3) Not important + Urgent = Behavior
4) Not important + Not urgent = ?
(3) is often placed at the highest priority by both managers and programmers.
Business managers are not equipped to evaluate the importance of software architecture. That is what software developers are hired to do.
Therefore it is the responsibility of the software development team to assert the importance of architecture over the urgency of features. Fulfilling this responsibility means wading into a struggle. The development team has to struggle for what they believe to be best for the company, and so does the management team, and the marketing team, and the sales team, and the operations team. [And this is correct - each team knows things the others do not, each team has valuable insights into the business that the others are not expected to have.]
Effective software development teams tackle that struggle head on. Remember, as a software developer, you are a stakeholder - unabashedly squabble with the other stakeholders as equals. That's part of your role and part of your duty. It's a big part of why you were hired.
A brief history of how limiting our options, as programmers, has helped us write better code. Each of these paradigms removes capabilities from the programmer.
Structured programming imposes discipline on direct transfer of control.
Structured programming allows modules to be recursively decomposed into provable units (functional decomposition).
Discovered by Edsger Wybe Dijkstra in 1968, who proved that the use of unrestrained jumps ("goto" statements) is harmful to program structure. Dijkstra applied the mathematical discipline of proofs to programming. He discovered that certain uses of "goto" statements made it impossible to decompose functions into smaller units - which prevented the divide-and-conquer method of proving the function "correct".
The good uses of "goto" followed the patterns of sequencing, selection, and iteration. So Dijkstra discovered that the constructs that made a function provable, were the same as the minimum set of constructs from which all programs can be built [based on the work of Bohm and Jacopini].
The "goto" statement was replaced with "if/then/else" and "do/while/until" constructs.
Structured programming supports testability by limiting programmers to creating provable functions.
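(A minimal Java sketch of my own, not from the book: any routine can be written using only sequence, selection, and iteration, which is what makes it decomposable into small, provable units.)

// Hypothetical example: find the largest even number in an array
// using only sequence, selection (if/else), and iteration (while).
public class Structured {
    public static int largestEven(int[] values) {
        int best = Integer.MIN_VALUE;                          // sequence
        int i = 0;
        while (i < values.length) {                            // iteration
            if (values[i] % 2 == 0 && values[i] > best) {      // selection
                best = values[i];
            }
            i++;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largestEven(new int[] {3, 8, 5, 12, 7})); // prints 12
    }
}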
Object-oriented programming imposes discipline on indirect transfer of control.
Discovered by Ole Johan Dahl and Kristen Nygaard in 1966, who moved the function call stack frame to heap memory and thereby invented objects. (The function becomes the class and its constructor, local variables become the properties, and nested functions become the methods.)
This paradigm removes function pointers from programmers.
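(My own Java sketch of that correspondence, not from the book: a "function call" whose frame lives on the heap is, in effect, an object.)

// Hypothetical sketch: the function becomes the class (and its constructor),
// its local variables become fields, and its nested functions become methods.
public class Counter {
    private int count;                  // was a local variable of the "function"

    public Counter(int start) {         // was the function call itself
        this.count = start;
    }

    public int increment() {            // was a nested function
        return ++count;
    }

    public static void main(String[] args) {
        Counter c = new Counter(10);       // the "call frame" now outlives the call
        System.out.println(c.increment()); // prints 11
    }
}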
Martin discusses "What is the defining feature of object-oriented programming?"
1) The combination of data and functions? No, because there is no intrinsic difference between Object.Function() and Function(Object).
2) A way to model the real world? No, too vague to mean anything.
3) Encapsulation, inheritance, and polymorphism?
3a) Encapsulation? No. C already had perfect encapsulation (declare functions and data structures in a header file, hide their definitions in the implementation file), which C++ and C# have strayed further from (due to technical compiler and language reasons). Many newer OO languages don't enforce strong encapsulation at all.
3b) Inheritance? No, C could do inheritance manually already. (See technical explanation.) OO languages do make it easier to do this, though.
3c) Polymorphism? Polymorphism is an application of pointers to functions (the function call goes through a lookup table to see which function will actually be run). This was being done manually back in the 1940s, but OO languages formalize and automate the tricky bits, making it much easier and safer for programmers (a hand-rolled version is sketched just below).
(His point seems to be that OO languages make several techniques easier and more disciplined than programming them all manually.)
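(My own Java analogy for 3c, not from the book: what an OO compiler automates with a vtable can be hand-rolled as a lookup table of function references, which is roughly what pre-OO programmers did with pointers to functions.)

import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of "manual polymorphism": the call is dispatched
// through an explicit lookup table instead of a compiler-managed vtable.
class ManualPolymorphism {
    static final Map<String, Consumer<String>> WRITE_TABLE = Map.of(
        "console", s -> System.out.println(s),
        "null",    s -> { /* discard */ }
    );

    public static void main(String[] args) {
        // Indirect transfer of control, done by hand and with no safety net.
        WRITE_TABLE.get("console").accept("hello");
    }
}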
Martin settles on language/compiler enforced polymorphism as the core of Object-Oriented languages.
Why is polymorphism so important? Because it enables plug-in architectures. Example: in UNIX, writing to STDOUT could go anywhere depending on what device is currently set as STDOUT. These devices must all implement a standard interface, so every program writing to STDOUT uses the same command, but what actually happens depends on the device implementation. A program calling "write" is not recompiled when you plug in a new STDOUT device.
Device independence: your program doesn't know what device it is writing out to, because all the devices implement the same interface. So your program can work with any device that implements that interface. Your program is more reusable than one that is tightly tied to a particular device.
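(A small Java sketch of device independence, with names of my own invention: the caller depends only on the interface, and the concrete device is plugged in from outside.)

// Hypothetical plug-in example: Report never names a concrete device,
// so new devices can be added without recompiling Report.
interface OutputDevice {
    void write(String text);
}

class ConsoleDevice implements OutputDevice {
    public void write(String text) { System.out.println(text); }
}

class NullDevice implements OutputDevice {
    public void write(String text) { /* discard */ }
}

class Report {
    static void print(OutputDevice out) {
        out.write("quarterly totals...");
    }

    public static void main(String[] args) {
        print(new ConsoleDevice());  // plug in any device that implements the interface
        print(new NullDevice());
    }
}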
Polymorphism enables Dependency Inversion. In that sense, object-oriented programming is the ability (through polymorphism) to gain absolute control over every source code dependency in the system.
Functional programming imposes discipline upon assignment.
Discovered by Alonzo Church, who invented lambda calculus in 1936 (although it was not adopted by programming languages for a long time). A foundational notion of lambda calculus is immutability (the values of symbols do not change) - meaning that strictly functional languages have no "assignment" statement and all variables are immutable.
This paradigm removes the "assignment" operator from programmers.
;; Clojure example: print the first 25 squares
(println (take 25 (map (fn [x] (* x x)) (range))))
This is important to architecture because all race conditions, deadlock conditions, and concurrent update problems are due to mutable variables. Multi-threading and the use of multiple processors become significantly easier without these problems.
Theoretically, immutability is easy to achieve given infinite storage space and infinite processor speed. Since resources are not infinite, you'll need to decide which parts of a system should be immutable and which should not (Segregation of Mutability).
Event Sourcing is an immutable way of storing data. Instead of updating records as they change, you store every edit or update as a new record. When you need to know the current state of the data, you run a function that looks at all the records and calculates the current state. A variation on this is to calculate and store the current state once a day. With Event Sourcing, no data is ever updated or deleted. Source-control systems work like this.
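(A toy Java sketch of Event Sourcing, my own and not from the book: only events are appended, and the current state is computed from them on demand.)

import java.util.ArrayList;
import java.util.List;

// Hypothetical event-sourced account: events are appended, never updated or deleted.
public class Account {
    // Each record is the signed amount of one deposit (+) or withdrawal (-).
    private final List<Integer> events = new ArrayList<>();

    public void deposit(int amount)  { events.add(amount); }
    public void withdraw(int amount) { events.add(-amount); }

    // The current state is a pure function of the event log.
    public int balance() {
        return events.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Account a = new Account();
        a.deposit(100);
        a.withdraw(30);
        System.out.println(a.balance()); // prints 70
    }
}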
We use polymorphism as the mechanism to cross architectural boundaries.
We use functional programming to impose discipline on the location of and access to data.
We use structured programming as the algorithmic foundation of our modules.
These align with the three big concerns of architecture:
separation of components
data management
function
The SOLID design principles for writing object-oriented code. How to arrange functions and data structures into classes, and how to interconnect those classes.
"Mid-level" principles means working higher than the "line of code" level and lower than the "architecture" level.
The goal of these principles is to produce software that is tolerant to change, easy to understand, and can be used in many different systems.
These principles were organized and presented by this author, so he should know what he's talking about here.
Each software module should have only one reason to change. (NOT every module should do just one thing. A FUNCTION should do just one thing, but this principle is working at a higher level.)
Software systems change to satisfy actors (groups of business stakeholders, users, etc.). So to achieve this, organize the software to imitate the social organization of the company. You could rephrase the principle as "each software module should be responsible to only one actor".
("Module" generally means "source file" here.)
For example, the code to calculate overtime should not be shared between the Accounting Department's modules and the Human Resources Department's modules. The calculation might be the same now, but is very likely to be different in the future. The CFO's decisions should not impact the COO's decisions.
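(A hypothetical Java sketch of that overtime example: each actor gets its own calculator, and the duplication is deliberate because the two rules are expected to diverge.)

// Hypothetical SRP sketch: each class answers to exactly one actor.
class AccountingPayCalculator {              // responsible to the CFO
    double overtimePay(double hours, double rate) {
        return Math.max(0, hours - 40) * rate * 1.5;
    }
}

class HrPayCalculator {                      // responsible to the COO
    double overtimePay(double hours, double rate) {
        // Identical today, but can change for HR's reasons tomorrow
        // without touching (or breaking) Accounting.
        return Math.max(0, hours - 40) * rate * 1.5;
    }
}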
Adhering to this principle will make dividing work among a team much easier, as each programmer is more likely to be editing different source files. You won't have many merge operations when checking in source code.
Two examples of organizing the source code: (solid arrows mean a source code dependency)
Software should be open to change/extension and closed to editing/modification.
For software to be easy to change, the system must be designed so you can change it by adding new code, rather than editing existing code.
If simple extensions to the requirements force massive changes to the software, then the architects of that software system have engaged in a spectacular failure.
Example: you have a system with a financial summary web page that displays the data in a scrollable format, with some color-coding. If a stakeholder asks for the same data in a printable format, paginated, in black and white, how much of your current system has to change? How much of this can be achieved by adding code, instead of editing code?
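(One possible shape of the answer, sketched in Java with class names of my own: the report generator depends only on a presenter interface, so the printable version is a new class rather than an edit to existing code.)

// Hypothetical OCP sketch: new behavior is added by adding code, not editing it.
interface FinancialReportPresenter {
    void present(double[] figures);
}

class ScrollableWebPresenter implements FinancialReportPresenter {
    public void present(double[] figures) { /* render a scrollable, color-coded page */ }
}

// Added later for the print requirement; nothing above had to change.
class PaginatedPrintPresenter implements FinancialReportPresenter {
    public void present(double[] figures) { /* render paginated, black-and-white pages */ }
}

class FinancialReportGenerator {
    private final FinancialReportPresenter presenter;

    FinancialReportGenerator(FinancialReportPresenter presenter) {
        this.presenter = presenter;
    }

    void run(double[] figures) {
        // Calculating the data stays separate from displaying it.
        presenter.present(figures);
    }
}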
Sample data flow:
Note that calculating the data is separated from displaying the data.
Sample detailed class diagram:
For software to be built of interchangeable parts, those parts must adhere to the contract that makes the parts interchangeable.
In other words, everywhere in the code that Class A is used, it should be possible to substitute in an instance of Class B without breaking anything.
For example, if you're wondering if Class B should inherit from Class A, and you know that there's a function in Class A that doesn't apply to Class B, then Class B should not inherit from Class A.
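(The classic square/rectangle illustration of this point, sketched in Java: Square cannot honor Rectangle's contract that width and height vary independently, so it should not inherit from Rectangle even though "a square is a rectangle" geometrically.)

// Hypothetical LSP-violation sketch.
class Rectangle {
    protected int width, height;
    void setWidth(int w)  { width = w; }
    void setHeight(int h) { height = h; }
    int area() { return width * height; }
}

class Square extends Rectangle {
    // Forced to keep the sides equal, which silently breaks callers of Rectangle.
    @Override void setWidth(int w)  { width = w; height = w; }
    @Override void setHeight(int h) { width = h; height = h; }
}

class Client {
    // Correct for any true Rectangle, but wrong when handed a Square.
    static void resize(Rectangle r) {
        r.setWidth(5);
        r.setHeight(4);
        assert r.area() == 20; // a Square yields 16 here
    }
}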
Don't depend on things you don't need.
For example, if there are 10 methods listed in an interface and half the users only need 6 of them, then that interface should be split into at least two interfaces. That way, those users only have to depend on the methods they are actually using.
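(A tiny Java sketch of that split, with hypothetical interface and method names: callers that only print never depend on scanning or faxing.)

// Hypothetical ISP sketch: instead of one wide interface...
interface Machine {
    void print();
    void scan();
    void fax();
}

// ...split it so each client depends only on the methods it actually uses.
interface Printer { void print(); }
interface Scanner { void scan(); }

class SimplePrinter implements Printer {
    public void print() { /* put ink on paper */ }
}

class PrintClient {
    // Depends only on Printer; changes to scanning or faxing never touch this class.
    private final Printer printer;
    PrintClient(Printer printer) { this.printer = printer; }
    void run() { printer.print(); }
}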
Code implementing high-level policies (such as Business Logic) should not depend on code implementing low-level policies (such as Data Access).
This is the major thesis (or strategy?) of this book.
Since object-oriented languages support polymorphism, any source code dependency can be inverted. This is, architecturally, profound. The architect has absolute control over the source code dependencies in a project.
Source code dependency: which source code files "import" or "using" which other source code files
Flow of control: which functions/modules call which other functions/modules
The "Main" function calls the high level functions, which call the low level functions.
In that arrangement, source code dependencies match the flow of control, because "Function A" must know about "Function B" in order to call it.
High Level A still calls Low Level C (dashed arrow shows flow of control).
But now High Level A has no source code dependency on Low Level C (solid arrow).
Instead, both High Level A and Low Level C have a dependency on the Interface that Low Level C inherits from.
If we package the source code like this, we can see the "Inversion" part of "Dependency Inversion". The source code dependency (solid arrow) is now pointing in the opposite direction as the flow of control (dashed arrow). The low level code now depends on the high level code.
Solid arrows = source code dependency
Dashed arrows = flow of control
The UI and the Database become plug-ins to the Business Rules. They can be compiled into separate components (such as C# assemblies) and deployed independently of each other.
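(A compact Java sketch of the inversion, with names of my own: the business rules own the interface and the database implements it, so the source code arrow points from the low-level detail toward the high-level policy while control still flows the other way at runtime.)

// High-level policy: defines and depends only on the interface.
interface OrderGateway {                      // lives with the business rules
    void save(String orderId);
}

class PlaceOrderUseCase {                     // high-level module
    private final OrderGateway gateway;
    PlaceOrderUseCase(OrderGateway gateway) { this.gateway = gateway; }

    void execute(String orderId) {
        // ...business rules...
        gateway.save(orderId);                // flow of control goes "down" at runtime
    }
}

// Low-level detail: its source code dependency points "up" at the interface.
class SqlOrderGateway implements OrderGateway {
    public void save(String orderId) { /* INSERT INTO orders ... */ }
}

class Main {
    public static void main(String[] args) {
        // Main wires up the plug-in; the use case never names the database.
        new PlaceOrderUseCase(new SqlOrderGateway()).execute("A-1001");
    }
}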
Independent Deployability: only the component that was edited needs to be re-compiled and re-deployed.
Independent Developability: if components can be deployed independently, then they can be developed independently by different teams.
[T-Rav Clean Architecture demo project]
[StoneAgeTechnologies Clean Architecture library]
One use-case end-to-end:
Mathematical theorems can be proven correct.
Scientific theories, by contrast, can be proven wrong but never proven correct. That is the fundamental difference: scientific theories are falsifiable.
Dijkstra:
Testing shows the presence, not the absence, of bugs.
A program, like a scientific theory, can be proven incorrect with a test, but can never be proven 100% correct. We show correctness by failing to prove incorrectness, despite our best efforts.