What Is SOA?

From Misc

SOA stands for Service-Oriented Architecture.

It is the reliance on web services to send and receive data. The services are loosely coupled, meaning any client on any platform can connect to any service as long as the essential contracts are met.

Clients can consume multiple services, and services can have multiple clients.

From Pluralsight

The decomposition of a system into autonomous (or nearly autonomous) units of responsibility, and the exposure of those units in a secure and controlled fashion.

Exposure of an API for your system.
The client has limited functionality access.
Clients access functionality by making service calls.

From Sam Newman

Service-oriented architecture is a design approach where multiple services collaborate to provide some end set of capabilities. A service here typically means a completely separate operating system process. Communication between these services occurs via calls across a network rather than method calls within a process boundary.

(Summarized) The SOA movement got co-opted by vendors for profit and never explained how to achieve its architectural goals. The microservice movement (an extension of SOA) fills in the details of how to design, implement, and maintain such a system.

SOA Vs OOP

Comparison from the client side:
- OOP is when you use a library. You have access to all the individual objects. You need a lot of knowledge about how to use the objects together, what order to run operations in, etc.
- SOA abstracts all those details away, only exposing high-level operations that always leave the application in a valid state (i.e. they are stateless because state does not have to be maintained between calls).

Comparison from the application side:
- OOP is concerned with the lower-level design decisions.
- SOA is concerned with the higher-level design decisions.
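The client-side contrast above can be sketched as follows. This is a hypothetical order example (the class and function names are illustrative, not from the notes): the library client must orchestrate several objects in the right order, while the service exposes one high-level, stateless operation that hides that orchestration.

```python
# OOP/library style: the client sees all the objects and must know how to
# wire them together and in what order to call them.
class Inventory:
    def __init__(self):
        self.stock = {"widget": 5}

    def reserve(self, item):
        self.stock[item] -= 1


class Shipper:
    def schedule(self, item):
        return f"shipping {item}"


def client_using_library(item):
    inventory = Inventory()      # the client must create and sequence these:
    inventory.reserve(item)      # reserve before shipping, or state is invalid
    return Shipper().schedule(item)


# SOA style: one high-level, stateless operation hides the orchestration and
# always leaves the application in a valid state.
def place_order_service(item):
    inventory = Inventory()
    inventory.reserve(item)
    return Shipper().schedule(item)
```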

Strengths

- Reusability: two applications can call on the same service on the backend
- Maintenance: a service can be altered or replaced without client applications being aware of it

Rough Definitions

System Vs Application

An application (in this distinction) is a single executable running on a single machine. It is not the responsibility of the application to solve connectivity problems, such as losing access to a storage location.

A system includes multiple executables on (possibly) multiple machines. The system is responsible for solving its own connectivity problems.

Ex: a web page is a system because it involves the server, the client's machine and browser, and probably at least one database.

RPC Vs Queue

RPC stands for Remote Procedure Call, such as a request/response interaction. RPC calls are synchronous.

Message queues are asynchronous.

Loosely Coupled Services

Services that communicate with the client or with each other using a set of industry-accepted standards and protocols.

Service

A service is a collection of operations (units of responsibility).
The service is the point of entry for the client.
Services are secure. (They handle any required internal security. They also inherently limit client access to their system.)
Service operations always leave the system in a consistent (valid) state.
Services handle faults/exceptions gracefully.
The client is not exposed to any details of how an operation is fulfilled.
The client is protected from code volatility (likelihood of change).
Service calls are (almost always) stateless.

A service is a logical representation of a repeatable business activity with a specific outcome. It is self-contained. It is a black box to programs that consume it.

A service can be composed of several other services.

Service Oriented Application

An application whose volatile areas are abstracted, or wrapped, in a simple service call that is exposed to the client.

Each service call is called an "operation".

Transaction

A set of actions that must all succeed together or all fail together.
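A minimal sketch of that all-or-nothing behavior (the helper names are illustrative): each action is paired with an undo, and if any action fails, the ones that already ran are rolled back in reverse order.

```python
# Run every action; on any failure, undo the completed ones in reverse.
def run_transaction(actions):
    completed = []
    try:
        for do, undo in actions:
            do()
            completed.append(undo)
        return True
    except Exception:
        for undo in reversed(completed):
            undo()
        return False


# Usage: a transfer whose second step fails, so the first step is rolled back.
accounts = {"a": 100, "b": 0}

def failing_deposit():
    raise RuntimeError("deposit failed")

ok = run_transaction([
    (lambda: accounts.__setitem__("a", accounts["a"] - 30),
     lambda: accounts.__setitem__("a", accounts["a"] + 30)),
    (failing_deposit, lambda: None),
])
# ok is False and accounts["a"] is back to 100

```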

Don Box's 4 Tenets Of SOA

Boundaries Are Explicit

Services interact by sending messages across boundaries. The boundaries are formal and explicit. No assumption is made about what is behind the boundary.

Ex: the client does not know what kind of database or file system the service is using.

Services Are Autonomous

A service does not care how a message was created, or what will happen in the client after the service performs its actions.

Therefore, version and deploy the service independently from the client.

Therefore, design service contracts assuming that once published, they cannot change.

Services Share Schema And Contract, Not Class

Only messages pass between services, not code. The messages are not arbitrary; they follow an agreed-upon schema.

Service Compatibility Is Based Upon Policy

A service must express what it does and how a client should communicate with it in a standard representation.

Principles Of SOA Design

Standardized Service Contract

Services adhere to an explicit service description.

Loose Coupling

Services minimize dependence on each other.

Service Abstraction

Services hide the logic they encapsulate.

Service Reusability

Divide business logic into several services with the intent of maximizing reuse.

Service Autonomy

Services have control over the logic they encapsulate.

Service Statelessness

Services should be stateless.

The client should not be required to know that operation A, B, and C must be called in a specific order, or that operation D must be followed by E.

Such specific ordering should be contained in a single operation.
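One way to picture the difference (a toy sketch, names made up for illustration): the stateful version forces the client to know the call order, while the stateless version takes everything it needs in one call.

```python
# Anti-pattern: the client must know to call begin() before add(), and
# add() before total() - the service holds state between calls.
class StatefulCart:
    def begin(self):
        self.items = []

    def add(self, price):
        self.items.append(price)  # fails if begin() wasn't called first

    def total(self):
        return sum(self.items)


# Stateless alternative: everything needed arrives in a single call, so
# there is no required ordering and no state held between calls.
def cart_total(prices):
    return sum(prices)
```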

Service Discoverability

Services can be discovered, usually through a service registry. I.e., a client can invoke a service regardless of its actual location in the network.

Service Composability

Services break big problems into little problems.

Service Interoperability

Services use standards that allow diverse clients to use the service.

Fallacies

[Advanced Distributed System Design (online course)]

The 8 Fallacies Of Distributed Computing

The network is reliable.
The network is secure.
The network is homogenous.
The topology won't change.
Latency isn't a problem.
Bandwidth isn't a problem.
Transport cost isn't a problem.
The administrator will know what to do.

(commonly attributed to Deutsch, 1994, with the eighth fallacy added by Gosling in 1997)

The 10 Fallacies Of Enterprise Computing

Everything from the 8 Fallacies of Distributed Computing, plus

The system is atomic and/or monolithic.
The system is finished.
Business logic can and should be centralized.

(from Neward)

(The name is a joke about counting from 0)

Unreliable Network

How do you handle HttpTimeoutException?
- is the process running correctly, but it's just taking a while?
- did the request fail?

You can log the error and continue.
You can forward the error to the user.
You can retry.
- but what if the request actually succeeded already?
- but what about transactional integrity?

You can use a reliable messaging service. (MSMQ, Azure Service Bus, WebSphere, etc)
- message queues are designed to handle retry/store and forward/transaction integrity behaviors
- what you lose is request/response - everything will be asynchronous

System design is very different when you cannot rely on request/response interactions.

Latency

Roundtrip time: time from invocation to response being received
- includes serialization, server processing, and deserialization

Latency time: time to cross the network in one direction
- more useful metric for distributed systems

Normalized latencies, based on 1 CPU cycle = 1 second
- Main memory access = 6 minutes
- Hard disk access = 2-6 days
- Internet from San Francisco to New York = 4 years
- Internet from San Francisco to England = 8 years
- Internet from San Francisco to Australia = 19 years
- TCP packet retransmit = 105-317 years
- OS virtualization system reboot = 423 years
- SCSI command timeout = 3 millennia
- Physical system reboot = 32 millennia

Distributed systems can have terrible latency issues.

If you use polymorphism and dependency injection, it may not be clear when you are even making a remote call.
If you use cloud services, it may not be clear how often a system will have to reboot under you.
If you use microservices, which make many remote calls, the system will be slower than a single in-process service.
If you use an ORM with lazy loading, it may not be clear how many database queries are being run.

In general, avoid using remote objects.
- ORM lazy loading is an example of remote objects
- tightly coupled request/response interactions are examples of remote objects
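The lazy-loading trap can be counted directly (a sketch with a fake query log standing in for a real database): fetching rows one at a time issues one round trip per row (the "N+1 queries" problem), while an explicit batch fetch issues a single round trip for all of them.

```python
queries = []  # stands in for the network round trips actually issued

def fetch_order(order_id):
    queries.append(f"SELECT * FROM orders WHERE id = {order_id}")
    return order_id

def fetch_orders_batch(order_ids):
    ids = ", ".join(str(i) for i in order_ids)
    queries.append(f"SELECT * FROM orders WHERE id IN ({ids})")
    return list(order_ids)


for order_id in range(100):     # lazy: one query per order = 100 round trips
    fetch_order(order_id)
lazy_round_trips = len(queries)

queries.clear()
fetch_orders_batch(range(100))  # batched: a single round trip
batched_round_trips = len(queries)
```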

Limited Bandwidth

Bandwidth capacities have grown slower than CPU speed or memory.

How much is a gigabit per second of bandwidth?
- that's about 128 megabytes per second
- TCP overhead will use up roughly half of it - down to about 60 megabytes
- take another half for the structure of whatever data format you use - down to about 30 megabytes

Putting more servers on the same network will not increase your bandwidth.

To avoid wasting bandwidth, you want to load as little data as possible.
But that conflicts with latency concerns, which recommend you load as much data at once as possible.

The best you can do is load exactly the data you need at one time.
You may need to decompose your domain model into multiple smaller, more specific models, each one designed to handle a subset of scenarios.

You can move time-critical data to its own network.
- either truly a separate network
- or subdivide your bandwidth and allocate a portion of it to each service
- this is a good reason to divide services into smaller services - based on their bandwidth priority

Insecure Network

Nothing is completely secure. Even computers that are disconnected from the network are at risk every time you insert a disk or thumb drive. Every person with access to the system is a security risk. Faster computers mean it's easier to crack encryption.

Changes To Network Topology

With cloud computing, the network topology can change on the fly.

With callback contracts (and similar), the server remembers the address of the client so it can push updates to it. What if the client isn't there anymore? HTTP timeouts default to 30 seconds, so it'll take that long to realize the client isn't available. This could even be exploited as a denial-of-service attack.

Can the system be designed so that it preserves its performance when the network topology changes?

So
Don't hardcode ip addresses, domain names, etc.
Consider multicasting (talking to a range of addresses). This can be insecure.
Consider discovery mechanisms for self-configuring systems. It works well until it doesn't work.

Performance test early, starting at 30% of the way through a project. Test with various servers going down or rebooting.

Fallible Administrators

Even if an admin actually knows the whole system top to bottom, you can't rely on them working for you forever. How long will it take their replacement to get up to speed? To become an expert?

Configuration is magic if you don't understand it, and don't expect its effects.
Some configuration may be in a text file. Some may be in the database. Some may be in environment variables.

So
Use a configuration management system. See the ITIL standard.
Keep up-to-date documentation.
Save stable deployment points.
Backup the system.

IT will often get push back from the business about when they can/cannot deploy updates, because the business knows that errors are likely to be introduced. This results in large deployments (more likely to contain errors) being deployed less frequently.
It is better to move towards continuous deployment. Deploy many small updates so often that deployment is not worrying. Errors are easier to track down because the amount of changes that could have caused them is smaller.

If you can make sure the new code is backward compatible, then you can have little or no downtime while you upgrade, because you can upgrade one server at a time. When you test, test multiple versions of the system running in parallel.

Queueing can help with downtime, because while one part of the system is down, another part can still be adding messages to the queue. Or one part can be consuming from the queue while the producer is down.
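A toy sketch of that decoupling (using an in-memory deque in place of a real message broker): the queue buffers messages while the consumer is down, so the producer keeps working, and the consumer drains the backlog in order when it comes back.

```python
from collections import deque

queue = deque()  # stands in for a durable message queue

def produce(message):
    queue.append(message)       # succeeds even while the consumer is offline

def consume_all():
    handled = []
    while queue:
        handled.append(queue.popleft())
    return handled


# The consumer is down while three messages arrive...
for message in ("a", "b", "c"):
    produce(message)

# ...then it comes back up and processes the backlog in order.
backlog = consume_all()
```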

Cost Of Transport

Enterprise Service Bus

ESB stands for Enterprise Service Bus.

A service bus implements a communication system between mutually interacting software components.

Instead of N services each communicating with (N-1) services, all services communicate with the service bus only, resulting in only N lines of communication. The service bus passes messages through to the target service.
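The routing idea can be sketched in a few lines (a toy in-process bus; the service names are illustrative): each service registers one handler with the bus and addresses messages by service name, so N services need N connections to the bus instead of N*(N-1) point-to-point links.

```python
class ServiceBus:
    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        # each service makes exactly one connection: its registration
        self._handlers[name] = handler

    def send(self, target, message):
        # the bus routes the message to the target service's handler
        return self._handlers[target](message)


bus = ServiceBus()
bus.register("billing", lambda msg: f"billing handled {msg}")
bus.register("shipping", lambda msg: f"shipping handled {msg}")

result = bus.send("billing", "invoice-42")
```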