First up some disclaimers. This is a contrived example for illustrative purposes only. It is not meant to be a shining example of fantastic design, but to show some of the considerations that go into producing good code. I also provide some approximate definitions (approxinitions?) of some concepts. These may not be very accurate, but are designed to give the reader an idea of a concept without getting drowned in the details. Lastly, try not to get bogged down in the details (How does that work? How does it connect to a database? What’s a database? What language is that? Why am I reading this strange person’s blog?). Instead, concentrate on the concepts being communicated. You don’t need the details to get the gist. Right? Ok, let’s go.
A problem and first pass at a solution
We are writing an application that needs to send emails to a group of people. To do this we will write a class. Think of a class as a group of related behaviours (functions) and data (fields, properties). Let’s see some code:
class EmailSender {
    function SendEmails() {
        listOfPeople = database.LoadPeople();
        foreach person in listOfPeople {
            emailService.SendEmail(welcomeEmail, person);
        }
    }
}
Even if this looks a bit daunting, you can probably still follow through what we are doing. Here we are creating a new class called EmailSender that contains a SendEmails() function. We can call a function through a class. This is usually expressed in a className.functionName() format, so in this case we could run our function by writing EmailSender.SendEmails().
Within our function we load a list of people from the database, and then for every person in that list we send them an email by calling the emailService.SendEmail(welcomeEmail, person) function. The values in the parentheses are information we are passing to the function. In this case we are telling the function which type of email to send (a welcomeEmail, whatever that is) and which person to send it to. In this example we are assuming that the emailService and database classes are already written somewhere for us to use.
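If you would like something you can actually run, here is a minimal Python sketch of the same class. The FakeDatabase and FakeEmailService classes, the example addresses and the WELCOME_EMAIL text are all assumptions made up for illustration; they stand in for the database and emailService that the pseudocode assumes already exist.

```python
# Hypothetical in-memory stand-ins so the sketch runs without a real
# database or mail server. None of these names come from a real library.

WELCOME_EMAIL = "Welcome aboard!"


class FakeDatabase:
    def load_people(self):
        # A real implementation would query a database here.
        return ["ann@example.com", "bob@example.com"]


class FakeEmailService:
    def __init__(self):
        self.sent = []  # remembers (email, person) pairs instead of sending

    def send_email(self, email, person):
        self.sent.append((email, person))


class EmailSender:
    def __init__(self, database, email_service):
        self.database = database
        self.email_service = email_service

    def send_emails(self):
        # Load everyone, then send each person the welcome email.
        list_of_people = self.database.load_people()
        for person in list_of_people:
            self.email_service.send_email(WELCOME_EMAIL, person)


email_service = FakeEmailService()
EmailSender(FakeDatabase(), email_service).send_emails()
print(email_service.sent)
```

One difference from the pseudocode: in Python we first create an EmailSender object and then call send_emails() on it, rather than calling the function directly through the class name.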
Adding more features
We now have a function that will send welcome emails to all the people in some database. Hooray! Now we realise that our application does not just need to send a generic email to everyone, but we also need to send emails to everyone that is about to go on holidays. Let’s add a new function to do this:
class EmailSender {
    function SendEmails() {
        listOfPeople = database.LoadPeople();
        foreach person in listOfPeople {
            emailService.SendEmail(welcomeEmail, person);
        }
    }
    function SendEmailsToHolidayers() {
        listOfPeople = database.LoadPeopleGoingOnHolidays();
        foreach person in listOfPeople {
            emailService.SendEmail(haveANiceTripEmail, person);
        }
    }
}
This new function is very similar to the first. It loads a different list of people (people going on holidays), and then sends each person a haveANiceTripEmail instead of a welcome email. Finally, let’s send some nasty emails to people who have not paid their bills to our company.
class EmailSender {
    function SendEmails() {
        listOfPeople = database.LoadPeople();
        foreach person in listOfPeople {
            emailService.SendEmail(welcomeEmail, person);
        }
    }
    function SendEmailsToHolidayers() {
        listOfPeople = database.LoadPeopleGoingOnHolidays();
        foreach person in listOfPeople {
            emailService.SendEmail(haveANiceTripEmail, person);
        }
    }
    function SendEmailsToLatePayers() {
        listOfPeople = database.LoadPeopleWithOutstandingDebts();
        foreach person in listOfPeople {
            emailService.SendEmail(weAreSendingSomeGoonsEmail, person);
        }
    }
}
And we are finished. Simple, right?
Problems with our class
Problems? Our class is perfect! We just wrote it, it’s beautiful! The last code sample may work, but it has a lot of duplication. All three functions load a list of people from the database, then send an email to each person. This duplication is a warning signal to the programmer that something is wrong with the code.
We wrote more functional code than was required, repeating the same logic in several places. Functional code is the lines that do stuff, rather than the semantics of defining classes and functions. The code is also hard to maintain. For example, say we want to record in the database that an email is being sent before we send it. We then need to change the following lines in 3 places:
foreach person in listOfPeople {
    database.RecordEmailIsBeingSent(weAreSendingSomeGoonsEmail, person);
    emailService.SendEmail(weAreSendingSomeGoonsEmail, person);
}
It is easy to miss one of these lines once our class grows to have more functions, which encourages bugs. It also means if there is one bug in a repeated part of the code it will be faithfully reproduced in the duplicated parts of the code. This kind of duplication can also make it difficult to understand the intent of the code, as the important logic is obscured by other bits of functionality.
A good indicator of good code is a lack of duplication. This makes for smaller amounts of functional code (which means less work as we are solving each problem only once, and fewer places for bugs to hide), and code that is easier to change and maintain (faster and cheaper to add features and support).
So let’s take a giant leap and see if we can turn this code into good code. This process is known as refactoring (changing the design without modifying the behaviour). To non-technical people refactoring can seem a waste of time. Why spend time changing something behind-the-scenes when the overall output remains the same? If it ain’t broke, why fix it? To technical people, refactoring is essential and will reap great rewards the second someone finds a bug or wants to change the application slightly. In reality the code is broken, unless part of the initial requirements was for our application to cost a huge amount to support and be nearly impossible to change.
Refactoring to good code
To eliminate this duplication in our functions, let’s think about exactly what each function is trying to achieve.
- Load a list of people from a database. The list will be different for different email types, but the end result will still be a list of people.
- We need to send an email to every person in the list.
- The email content will be different for each email type.
class EmailSender {
    function LoadRelevantPeople();
    function EmailContent();
    function SendEmails() {
        listOfPeople = LoadRelevantPeople();
        foreach person in listOfPeople {
            emailService.SendEmail(EmailContent(), person);
        }
    }
}
How does that look? We don’t have any duplication now, but we have lost our multiple email types. Notice how clearly the SendEmails() function now expresses our intention: we load a list of relevant people, then send an email to each one. We also have some blank functions, LoadRelevantPeople() and EmailContent(). These functions represent the parts of our original code that differed between each function. We will take advantage of some programming magic called inheritance to inject the relevant functionality.
Inheritance is a relation between a parent class and a child class. The child class automatically gets the functions and properties of the parent, but can also add new functions, and override the behaviour of the parent’s functions. We will use this technique to implement our different types of emails.
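To make that approxinition a little more concrete, here is a tiny Python sketch of a parent and a child class. The Parent and Child names and their functions are invented purely for illustration.

```python
class Parent:
    def greet(self):
        return "hello from the parent"

    def describe(self):
        return "I am the parent"


class Child(Parent):
    # Overrides the behaviour of the parent's describe() function.
    def describe(self):
        return "I am the child"

    # Adds a new function that the parent does not have.
    def wave(self):
        return "the child waves"


child = Child()
inherited = child.greet()      # not written in Child, inherited from Parent
overridden = child.describe()  # Child's own version is used
added = child.wave()           # only exists on Child
```

The child never mentions greet(), yet it can still be called on a Child object. That is the "automatically gets the functions of the parent" part.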
class EmailSender {
    function LoadRelevantPeople();
    function EmailContent();
    function SendEmails() {
        listOfPeople = LoadRelevantPeople();
        foreach person in listOfPeople {
            emailService.SendEmail(EmailContent(), person);
        }
    }
}
class WelcomeEmailSender inheritsFrom EmailSender {
    function LoadRelevantPeople() {
        return database.LoadPeople();
    }
    function EmailContent() { return welcomeEmail; }
}
class HolidayEmailSender inheritsFrom EmailSender {
    function LoadRelevantPeople() {
        return database.LoadPeopleGoingOnHolidays();
    }
    function EmailContent() { return haveANiceTripEmail; }
}
class LatePayersEmailSender inheritsFrom EmailSender {
    function LoadRelevantPeople() {
        return database.LoadPeopleWithOutstandingDebts();
    }
    function EmailContent() { return weAreSendingSomeGoonsEmail; }
}
We can now send emails like this: HolidayEmailSender.SendEmails(). This will use the SendEmails() function from our parent EmailSender class, which will fill-in-the-blanks with HolidayEmailSender’s implementation of LoadRelevantPeople() and EmailContent().
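The refactored classes above could be sketched in runnable Python roughly as follows. This is one possible rendering, not the only one: the PEOPLE dictionary and FakeEmailService are hypothetical stand-ins for the database and emailService, and the emails are represented as plain strings. Python's standard abc module is used to mark the blank functions.

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for the database: a few hard-coded lists of people.
PEOPLE = {
    "everyone": ["ann", "bob", "cat"],
    "holidays": ["bob"],
    "debts": ["cat"],
}


class FakeEmailService:
    """Hypothetical email service that records emails instead of sending them."""

    def __init__(self):
        self.sent = []

    def send_email(self, email, person):
        self.sent.append((email, person))


class EmailSender(ABC):
    def __init__(self, email_service):
        self.email_service = email_service

    @abstractmethod
    def load_relevant_people(self):
        """Blank to be filled in by each child class."""

    @abstractmethod
    def email_content(self):
        """Blank to be filled in by each child class."""

    def send_emails(self):
        # The shared logic lives here, exactly once.
        for person in self.load_relevant_people():
            self.email_service.send_email(self.email_content(), person)


class WelcomeEmailSender(EmailSender):
    def load_relevant_people(self):
        return PEOPLE["everyone"]

    def email_content(self):
        return "welcomeEmail"


class HolidayEmailSender(EmailSender):
    def load_relevant_people(self):
        return PEOPLE["holidays"]

    def email_content(self):
        return "haveANiceTripEmail"


class LatePayersEmailSender(EmailSender):
    def load_relevant_people(self):
        return PEOPLE["debts"]

    def email_content(self):
        return "weAreSendingSomeGoonsEmail"


service = FakeEmailService()
for sender_class in (WelcomeEmailSender, HolidayEmailSender, LatePayersEmailSender):
    sender_class(service).send_emails()
```

In this sketch, trying to create the parent directly with EmailSender(service) raises a TypeError, because its blanks have not been filled in. Only the child classes can be used to send emails.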
What do we gain by doing this?
- We have separated the logic of sending an email to a group of people from the particulars of getting that list of people and email content. This makes it easier to understand each discrete block of logic.
- We can now add new email types without modifying the original EmailSender class. This helps us to avoid introducing bugs into that class. It also means we do not need to understand all the details of the EmailSender class, so our application is easier to modify.
- We can change the EmailSender class to record all the emails being sent without affecting the child classes. This example was used in the previous section and required 3 modifications. Now we only require one.
- We can separately test each unit of logic. This approach is more amenable to automated unit testing techniques.
- We have eliminated duplication.
What do we lose?
- In this case, but not always, brevity. By removing duplication we have actually increased the number of lines we wrote from 20 to 28. We are using a new class for each function rather than the 3 line functions we were using initially. This is more to do with this example than the refactoring process in general.
- Directness. Before we could look at a function and see every step. Now we have introduced a layer of indirection or abstraction to the process. It is now easier to understand each discrete block of logic, but takes more brain power to understand the entire behaviour.
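Two of the gains listed above, changing the parent in one place and testing each unit separately, can be sketched together in Python. This is only an illustration under assumptions: RecordingFakes is a hypothetical test double that plays the role of both the database and the email service, and the list of people is hard-coded.

```python
class RecordingFakes:
    """Hypothetical test double: logs calls instead of touching real systems."""

    def __init__(self):
        self.log = []

    def record_email_is_being_sent(self, email, person):
        self.log.append(("record", email, person))

    def send_email(self, email, person):
        self.log.append(("send", email, person))


class EmailSender:
    def __init__(self, database, email_service):
        self.database = database
        self.email_service = email_service

    def load_relevant_people(self):
        raise NotImplementedError  # blank, filled in by child classes

    def email_content(self):
        raise NotImplementedError  # blank, filled in by child classes

    def send_emails(self):
        for person in self.load_relevant_people():
            # The one new line. Every child class picks it up automatically.
            self.database.record_email_is_being_sent(self.email_content(), person)
            self.email_service.send_email(self.email_content(), person)


class HolidayEmailSender(EmailSender):
    def load_relevant_people(self):
        return ["bob"]  # hard-coded here; a real child would ask the database

    def email_content(self):
        return "haveANiceTripEmail"


# A tiny automated check: the record must be written before the email is sent.
fakes = RecordingFakes()
HolidayEmailSender(database=fakes, email_service=fakes).send_emails()
assert fakes.log == [
    ("record", "haveANiceTripEmail", "bob"),
    ("send", "haveANiceTripEmail", "bob"),
]
```

The child class did not change at all to gain the recording behaviour, and the check at the bottom runs without a real database or mail server.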
Even so, the extra lines are not really an issue in this case. The extra lines are not functional code, they are simple class definitions and so forth, rather than code that actually does stuff. Normally these non-functional lines are written for programmers by their code editing programs. They are also the kind of things that are automatically checked by the code compiler, so unlike functional lines of code they will generally not be harbouring bugs.
So the balance lies in the gains versus introducing indirection. Bear in mind that this is a contrived example, and that most software is significantly more complicated. As things get more complicated, abstraction and indirection become compulsory. The human brain can only handle so much complexity, so the natural response is to divide and conquer these problems by using abstractions. Programmers are used to these abstractions as they work with them every day, which further reduces the problems caused by the indirection.
A key point is that changing software can be very costly. Anything we can do to mitigate that cost will pay huge dividends over the life of the software. Programmers can do this by removing duplication and creating high quality code.
To me the gains seem to significantly outweigh the shortcomings of our changes. On balance, I would always choose our revised example.
Parting thoughts…
The structure and design of the code itself can have a big impact on the cost of supporting, maintaining and extending software. Refactoring is sometimes dismissed by non-geeks as an optional or low-value activity. In reality it is essential. Skipping it incurs a technical debt, which you will need to pay for next time the application needs to be modified.
The technique we used is an example of the Template Method design pattern. This technique relies on a main function that contains the logic of what we are trying to achieve (the template), and uses child classes to fill in the blanks on variable behaviour. There are many other ways of removing duplication from code, each with varying applicability to individual situations, and each offering specific strengths and weaknesses.
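As one illustration of those other ways (my choice of example, not a recommendation), the same duplication could be removed without inheritance by handing the varying parts to a single function as parameters. In this Python sketch the fake_send_email function, the sent list and the hard-coded people are all hypothetical.

```python
def send_emails(load_people, email_content, send_email):
    """Shared logic: load the relevant people, then email each one."""
    for person in load_people():
        send_email(email_content, person)


# Hypothetical stand-in for the email service: remembers what was "sent".
sent = []


def fake_send_email(email, person):
    sent.append((email, person))


# The varying parts are passed in as arguments instead of child classes.
send_emails(lambda: ["bob"], "haveANiceTripEmail", fake_send_email)
send_emails(lambda: ["cat"], "weAreSendingSomeGoonsEmail", fake_send_email)
```

This version is shorter, but each caller now has to know which loader goes with which email. The Template Method version keeps that pairing together in one named class.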
There is no way to mandate or standardise on a particular technique for all situations. It would be disastrous to declare that every time a programmer writes an email feature into software they must use a Template Method. Each case needs to be evaluated on its merits. Writing good code depends on the skills and experience of the programmer. Good code cannot be rubber-stamped and produced en masse, whether by standardisation, graphical tools, pre-built platforms, advanced IDEs, or any method other than applying good professional practices to every individual case.
Coding is sometimes considered a low-value part of software development (read: monkey work), but good programmers can contribute a huge amount to a project by making effective decisions at this level.
I hope this has helped to show some of the features of good code, why it is worth the effort, and also highlighted some of the issues programmers face when trying to write good code.