Remove The Smell From Your Build Scripts
Paul Duvall, CTO, Stelligent Incorporated
10 Oct 2006
- How much time do you spend maintaining project build scripts? Probably much more than you'd expect or would like to admit. It doesn't have to be such a painful experience. Development automation expert Paul Duvall uses this installment of Automation for the people to demonstrate how to improve a number of common build practices that prevent teams from creating consistent, repeatable, and maintainable builds.
I dislike the term "smell" when it comes to describing something like code. It feels strange to speak anthropomorphically about bits and bytes. It's not that the word "smell" doesn't accurately reflect a symptom that indicates that code may be wrong, it just sounds funny to me. Yet, I am choosing to perpetuate my vexation to describe software builds because, frankly, many build scripts I've seen over the years stink.
Often, even great programmers have difficulty constructing a build script; it's as if they recently learned how to write procedural code -- writing large monolithic build files, copying-and-pasting scripted code, hard coding attributes, and so on. I've always wondered why that is (maybe because build scripts don't get compiled into something a customer will eventually use?). Yet, we all know that build scripts are central to creating the code the customer eventually uses and if those scripts are a big ball of mud, creating that software efficiently becomes challenging.
Thankfully, you can easily employ a number of practices on a build (whether it be Ant, Maven, or even a custom one) that will go a long way toward keeping your builds consistent, repeatable, and maintainable. One effective way to learn how to create better build scripts is to see what not to do, understand why that is the case, and then see the correct way to do something. And in this article, I take that approach. I detail the following nine most common build smells you should avoid, why you should avoid them, and then how to fix them:
- IDE-only builds
- Copy-and-paste scripting
- Long targets
- Large build files
- Failing to clean up
- Hard-coded values
- Builds that succeed when tests fail
- Magic machines
- A lack of style
Although this is not meant to be a comprehensive list, it does represent some of the more common smells I've encountered over the years in build scripts I've read and written. Also, some tools, such as Maven, which are designed to handle much of the plumbing associated with builds, can help alleviate a portion of these smells, but many of these issues can occur no matter which tool you use.
Avoid the aroma of IDE-only builds
An IDE-only build is a build that can be executed only through a developer's IDE. Unfortunately, this seems to be one of the more common build smells. The problem with an IDE-only build is that it can perpetuate the "works on my machine" problem, where software works in one developer's environment but not in anyone else's. What's more, because IDE-only builds usually require someone to click through the IDE, they are extremely challenging to integrate into a Continuous Integration environment; in fact, they are often impossible to run without human intervention.
Let me be clear: It's fine to use an IDE to execute a build, but your IDE shouldn't be the only thing capable of building software. In particular, a fully scripted build enables teams to use multiple IDEs because the dependencies will be from the IDE to the build and not the other way around, as shown in Figure 1:
IDE-only builds prohibit automation, and the only way to fix this stench is to create a scriptable build. There is enough documentation and a plethora of books out there to guide you on your way (see Resources), and projects like Maven make it extremely easy to define a build from scratch too. Either way, pick a build platform and make your project scriptable as soon as possible.
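As a starting point, a scriptable build can be very small. The following is a minimal sketch of an Ant build file, not a recommended project layout: the project name, the directory properties, and the `compile` target name are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical minimal build.xml: the project name, directories,
     and target name are assumptions, not part of any real project. -->
<project name="brewery" default="compile" basedir=".">

  <!-- Locations are defined once so no task hard-codes a path -->
  <property name="src.dir" value="src"/>
  <property name="classes.dir" value="build/classes"/>

  <target name="compile" description="Compile production sources">
    <mkdir dir="${classes.dir}"/>
    <javac srcdir="${src.dir}" destdir="${classes.dir}" debug="true"/>
  </target>

</project>
```

With a file like this in place, running `ant compile` from a command line produces the same result whether it is triggered by a developer, by an IDE that delegates to Ant, or by a Continuous Integration server.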
Copy-and-paste is like cheap perfume
Duplicate code is a common problem on software projects. In fact, even many popular open source projects have duplication percentages in the 20-30 percent range. And just as code duplication can make a software program more difficult to maintain, so too does duplicate code in build scripts. For instance, imagine you need to reference specific files through Ant's fileset type, as shown in Listing 1:
Listing 1. Copy-and-paste Ant script
<fileset dir="./brewery/src" >
  <include name="**/*.java"/>
  <exclude name="**/*.groovy"/>
</fileset>
If you need to refer to this set of files elsewhere, say for compilation, inspection, or generating documentation, you may end up repeating the same fileset in multiple places. If you later need to change that fileset (say, to change which files are excluded), you have to make the same change in every copy. Clearly, this isn't a maintainable solution; however, fixing this smell is simple.
Ant's patternset type, shown in Listing 2, allows me to give the set of include and exclude patterns a logical name and refer to it by that name. Now when I need to add or remove files, I have to do it only once.
Listing 2. Reusing a patternset by reference
<patternset id="sources.pattern">
  <include name="**/*.java"/>
  <exclude name="**/*.groovy"/>
</patternset>
...
<fileset dir="./brewery/src">
  <patternset refid="sources.pattern"/>
</fileset>
This fix will look familiar to anyone versed in object-oriented programming: Rather than defining the same logic over and over again in various classes, an established practice is to place that logic into a method, which can be called in various places. This method then becomes a single point of maintenance, limiting cascading defects and fostering reuse.
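To make the single point of maintenance concrete, here is a small sketch of two different tasks drawing on the same named patternset. It assumes the `sources.pattern` definition from Listing 2; the `package-sources` target, the `dist.dir` property, and the archive name are hypothetical.

```xml
<!-- Assumes the sources.pattern patternset from Listing 2 is defined.
     The target name, dist.dir, and the zip file name are hypothetical. -->
<target name="package-sources">
  <mkdir dir="${dist.dir}"/>

  <!-- First use: copy the selected sources -->
  <copy todir="${dist.dir}/src">
    <fileset dir="./brewery/src">
      <patternset refid="sources.pattern"/>
    </fileset>
  </copy>

  <!-- Second use: archive the same selection -->
  <zip destfile="${dist.dir}/brewery-src.zip">
    <fileset dir="./brewery/src">
      <patternset refid="sources.pattern"/>
    </fileset>
  </zip>
</target>
```

Changing an include or exclude rule in `sources.pattern` now affects both the copy and the archive without touching either task.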
Don't savor long targets
In his book, Refactoring, Martin Fowler describes the issue with the Long Method code smell quite nicely as "the longer a procedure is, the more difficult it is to understand." Long methods, in essence, also end up having too much responsibility. When it comes to builds, the Long Target build smell presents a script that is more difficult to understand and maintain. Listing 3 shows a relatively long target:
Listing 3. Long target
<target name="run-tests">
  <mkdir dir="${classes.dir}"/>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${src.dir}" />
    <classpath refid="project.class.path"/>
  </javac>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${test.unit.dir}"/>
    <classpath refid="test.class.path"/>
  </javac>
  <mkdir dir="${logs.junit.dir}" />
  <junit fork="yes" haltonfailure="true" dir="${basedir}" printsummary="yes">
    <classpath refid="test.class.path" />
    <classpath refid="project.class.path"/>
    <formatter type="plain" usefile="true" />
    <formatter type="xml" usefile="true" />
    <batchtest fork="yes" todir="${logs.junit.dir}">
      <fileset dir="${test.unit.dir}">
        <patternset refid="test.sources.pattern"/>
      </fileset>
    </batchtest>
  </junit>
  <mkdir dir="${reports.junit.dir}" />
  <junitreport todir="${reports.junit.dir}">
    <fileset dir="${logs.junit.dir}">
      <include name="TEST-*.xml" />
      <include name="TEST-*.txt" />
    </fileset>
    <report format="frames" todir="${reports.junit.dir}" />
  </junitreport>
</target>
This long target (believe me, I've seen much longer ones) is performing four distinct processes: compiling source, compiling tests, running JUnit tests, and creating a JUnitReport. That's a lot of responsibility for one target, and it puts a lot of XML in one place. This target can be broken into four distinct, logical targets, as demonstrated in Listing 4:
Listing 4. Extract targets
<target name="compile-src">
  <mkdir dir="${classes.dir}"/>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${src.dir}" />
    <classpath refid="project.class.path"/>
  </javac>
</target>

<target name="compile-tests">
  <mkdir dir="${classes.dir}"/>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${test.unit.dir}"/>
    <classpath refid="test.class.path"/>
  </javac>
</target>

<target name="run-tests" depends="compile-src,compile-tests">
  <mkdir dir="${logs.junit.dir}" />
  <junit fork="yes" haltonfailure="true" dir="${basedir}" printsummary="yes">
    <classpath refid="test.class.path" />
    <classpath refid="project.class.path"/>
    <formatter type="plain" usefile="true" />
    <formatter type="xml" usefile="true" />
    <batchtest fork="yes" todir="${logs.junit.dir}">
      <fileset dir="${test.unit.dir}">
        <patternset refid="test.sources.pattern"/>
      </fileset>
    </batchtest>
  </junit>
</target>

<target name="run-test-report" depends="compile-src,compile-tests,run-tests">
  <mkdir dir="${reports.junit.dir}" />
  <junitreport todir="${reports.junit.dir}">
    <fileset dir="${logs.junit.dir}">
      <include name="TEST-*.xml" />
      <include name="TEST-*.txt" />
    </fileset>
    <report format="frames" todir="${reports.junit.dir}" />
  </junitreport>
</target>
As you can see, because each target has one responsibility, the code in Listing 4 is much easier to follow. By isolating targets based on purpose, you can reduce the complexity and, furthermore, provide the capability to use the targets in different contexts, enabling reuse if necessary.
Large build files also have a strong scent
Fowler also identifies the Large Class as a code smell. Build scripts have a similar smell: large build files, which are amazingly difficult to read. It's hard to know which target is doing what and what each target's dependencies are. This, again, creates a maintenance issue; what's more, enormous build files usually contain quite a lot of cut-and-paste scripting.
To reduce the size of build files, you can seek portions of the script that are logically related and extract those aspects into smaller build files that are executed by the main build file (for example, in Ant you can call other build files using the ant task).
Typically, I like to break up build scripts by core function and ensure they can be executed as stand-alone scripts (think build componentization). For example, I like to define four types of developer tests in my Ant builds: unit, component, system, and functional. Furthermore, I also like to run four types of automated inspectors: coding standard, dependency analysis, code coverage, and code complexity. Instead of placing the execution of these tests and inspectors in one monolithic build script (along with compilation, database integration, and deployment), I extract the test and inspector execution targets into two separate build files as demonstrated in Figure 2:
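In Ant terms, the split described above might look like the following sketch of a main build file that delegates to two smaller ones through the ant task. The file names `build-tests.xml` and `build-inspections.xml`, and every target name here, are hypothetical; only the `antfile` and `target` attributes of the ant task are standard.

```xml
<!-- Main build.xml. The delegated file names and all target names
     are hypothetical and chosen only for illustration. -->
<project name="brewery" default="all" basedir=".">

  <target name="run-tests"
          description="Delegate developer tests to their own build file">
    <ant antfile="build-tests.xml" target="all-tests"/>
  </target>

  <target name="run-inspections"
          description="Delegate automated inspectors to their own build file">
    <ant antfile="build-inspections.xml" target="all-inspections"/>
  </target>

  <target name="all" depends="run-tests,run-inspections"/>

</project>
```

By default the ant task passes the calling project's properties to the called build file. For the smaller files to remain executable on their own, each would also need to define (or load) the properties it relies on.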
Not cleaning up
Builds that rely on hidden assumptions about the state of the environment are a disaster waiting to happen. For instance, if your build doesn't remove generated binaries that contain stale data, an error could arise from a file left over from a previous build. Or, perhaps even worse, a build may be "successful" only because files from a previous build were still present.
Fortunately, the solution is straightforward: You can eliminate these assumptions by removing all generated directories and files from any previous builds. This simple action ensures that your build's success or failure status is accurate. Listing 5 demonstrates an example of cleaning a build environment using the delete Ant task to remove any files or directories used in previous builds:
Listing 5. Cleaning up before yourself
<target name="clean">
  <delete dir="${logs.dir}" quiet="true" failonerror="false"/>
  <delete dir="${build.dir}" quiet="true" failonerror="false"/>
  <delete dir="${reports.dir}" quiet="true" failonerror="false"/>
  <delete file="cobertura.ser" quiet="true" failonerror="false"/>
</target>
Stray files from older builds have been known to cause many an unnecessary headache. Do yourself a favor and always remove any artifact your build creates before running a build.
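One way to make cleaning automatic rather than something to remember is to put the clean target first in the dependency list of the target people actually run. This is only a sketch: it assumes the `clean` target from Listing 5 and the targets from Listing 4, and the `all` target name is hypothetical.

```xml
<!-- Assumes clean (Listing 5) and the targets from Listing 4 exist.
     The "all" target name is hypothetical. Ant resolves the depends
     list from left to right, so clean runs before anything is built. -->
<target name="all"
        depends="clean,compile-src,compile-tests,run-tests"
        description="Full build that always starts from a clean slate"/>
```

The trade-off is build time: a full clean forces everything to be regenerated, so some teams may prefer to keep a faster incremental target for local use and reserve the clean build for integration.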