Remove The Smell From Your Build Scripts

Paul Duvall, CTO, Stelligent Incorporated

10 Oct 2006

How much time do you spend maintaining project build scripts? Probably much more than you'd expect or would like to admit. It doesn't have to be such a painful experience. Development automation expert Paul Duvall uses this installment of Automation for the people to demonstrate how to improve a number of common build practices that prevent teams from creating consistent, repeatable, and maintainable builds.


I dislike the term "smell" when it comes to describing something like code. It feels strange to speak anthropomorphically about bits and bytes. It's not that the word "smell" fails to reflect a symptom that indicates that code may be wrong; it just sounds funny to me. Yet, I am choosing to perpetuate my vexation to describe software builds because, frankly, many build scripts I've seen over the years stink.

Often, even great programmers have difficulty constructing a build script; it's as if they recently learned how to write procedural code -- writing large monolithic build files, copying-and-pasting scripted code, hard coding attributes, and so on. I've always wondered why that is (maybe because build scripts don't get compiled into something a customer will eventually use?). Yet, we all know that build scripts are central to creating the code the customer eventually uses and if those scripts are a big ball of mud, creating that software efficiently becomes challenging.

Thankfully, you can easily employ a number of practices on a build (whether it be Ant, Maven, or even a custom one) that will go a long way toward keeping your builds consistent, repeatable, and maintainable. One effective way to learn how to create better build scripts is to see what not to do, understand why that is the case, and then see the correct way to do something. And in this article, I take that approach. I detail the following nine most common build smells you should avoid, why you should avoid them, and then how to fix them:

  • IDE-only builds
  • Copy-and-paste scripting
  • Long targets
  • Large build files
  • Failing to clean up
  • Hard-coded values
  • Builds that succeed when tests fail
  • Magic machines
  • A lack of style

Although this is not meant to be a comprehensive list, it does represent some of the more common smells I've encountered over the years in build scripts I've read and written. Also, some tools, such as Maven, which are designed to handle much of the plumbing associated with builds, can help alleviate a portion of these smells, but many of these issues can occur no matter which tool you use.

Avoid the aroma of IDE-only builds

Figure 1. IDE and build dependencies

An IDE-only build is a build that can be executed only through a developer's IDE and, unfortunately, this seems to be one of the more common build smells. The problem with an IDE-only build is that it can perpetuate the "works on my machine" problem, where software works in a developer's environment but not in anyone else's. What's more, because IDE-only builds are not very automatable, they are extremely challenging to integrate into a Continuous Integration environment; in fact, IDE-only builds are often impossible to automate without human intervention.

Let me be clear: It's fine to use an IDE to execute a build, but your IDE shouldn't be the only thing capable of building software. In particular, a fully scripted build enables teams to use multiple IDEs because the dependencies will be from the IDE to the build and not the other way around, as shown in Figure 1:

IDE-only builds prohibit automation, and the only way to fix this stench is to create a scriptable build. There is enough documentation and a plethora of books out there to guide you on your way (see Resources), and projects like Maven make it extremely easy to define a build from scratch too. Either way, pick a build platform and make your project scriptable as soon as possible.
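
As a starting point, a scripted build can be as small as the sketch below. It is a minimal Ant build file, not a complete one; the project name brewery is borrowed from the listings later in this article, and the directory layout (src and build/classes) is an assumption made purely for illustration.

```xml
<?xml version="1.0"?>
<!-- Minimal build.xml sketch. The project name and directory layout are
     assumptions for illustration, not taken from a real project. -->
<project name="brewery" default="compile" basedir=".">

  <!-- Define locations once so every target shares them -->
  <property name="src.dir" location="src"/>
  <property name="classes.dir" location="build/classes"/>

  <!-- Compile all Java sources into the classes directory -->
  <target name="compile">
    <mkdir dir="${classes.dir}"/>
    <javac srcdir="${src.dir}" destdir="${classes.dir}" debug="true"/>
  </target>

</project>
```

Running ant (or ant compile) from the project directory executes this build from the command line, and an IDE with Ant support can invoke the same file, so the IDE depends on the build rather than the other way around.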

Copy-and-paste is like cheap perfume

Duplicate code is a common problem on software projects. In fact, even many popular open source projects have duplication percentages in the 20-30 percent range. And just as code duplication can make a software program more difficult to maintain, so too does duplicate code in build scripts. For instance, imagine you need to reference specific files through Ant's fileset type, as shown in Listing 1:

Listing 1. Copy-and-paste Ant script

<fileset dir="./brewery/src" >
  <include name="**/*.java"/>
  <exclude name="**/*.groovy"/>
</fileset>

If you need to refer to this set of files elsewhere, say for compilation, inspection, or generating documentation, you may end up repeating the same fileset in multiple places. If, at a later point, you need to change that fileset (say, to change which files are excluded), you have to make the same change in every copy. Clearly, this isn't a maintainable solution; however, fixing this smell is simple.

Ant's patternset type, shown in Listing 2, allows me to reference a logical name that represents the files I need. Now when I need to add files to (or remove files from) the fileset, I have to do it only once.

Listing 2. Using a patternset

<patternset id="sources.pattern">
  <include name="**/*.java"/>
  <exclude name="**/*.groovy"/>
</patternset>
...
<fileset dir="./brewery/src">
  <patternset refid="sources.pattern"/>
</fileset>

This fix will look familiar to anyone versed in object-oriented programming: Rather than defining the same logic over and over again in various classes, an established practice is to place that logic into a method, which can be called in various places. This method then becomes a single point of maintenance, limiting cascading defects and fostering reuse.
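
The same single-point-of-maintenance idea applies to whole filesets, not just patterns. The sketch below defines a fileset once with an id and reuses it through refid. The two targets (copy-sources and zip-sources), the dist.dir property, and the archive name are hypothetical and exist only to show the reuse.

```xml
<project name="brewery" default="zip-sources" basedir=".">

  <!-- Hypothetical output directory for this example -->
  <property name="dist.dir" location="dist"/>

  <!-- Define the set of source files exactly once -->
  <fileset id="sources.fileset" dir="./brewery/src">
    <include name="**/*.java"/>
    <exclude name="**/*.groovy"/>
  </fileset>

  <!-- Both targets reuse the definition above through refid -->
  <target name="copy-sources">
    <mkdir dir="${dist.dir}/src"/>
    <copy todir="${dist.dir}/src">
      <fileset refid="sources.fileset"/>
    </copy>
  </target>

  <target name="zip-sources">
    <mkdir dir="${dist.dir}"/>
    <zip destfile="${dist.dir}/brewery-src.zip">
      <fileset refid="sources.fileset"/>
    </zip>
  </target>

</project>
```

A change to the include or exclude rules now happens in one place, and every target that references sources.fileset picks it up.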

Don't savor long targets

In his book, Refactoring, Martin Fowler describes the issue with the Long Method code smell quite nicely as "the longer a procedure is, the more difficult it is to understand." Long methods, in essence, also end up having too much responsibility. When it comes to builds, the Long Target build smell presents a script that is more difficult to understand and maintain. Listing 3 shows a relatively long target:

Listing 3. Long target

<target name="run-tests">
  <mkdir dir="${classes.dir}"/>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${src.dir}" />
    <classpath refid="project.class.path"/>
  </javac>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${test.unit.dir}"/>
    <classpath refid="test.class.path"/>
  </javac>
  <mkdir dir="${logs.junit.dir}" />
  <junit fork="yes" haltonfailure="true" dir="${basedir}" printsummary="yes">
    <classpath refid="test.class.path" />
    <classpath refid="project.class.path"/>
    <formatter type="plain" usefile="true" />
    <formatter type="xml" usefile="true" />
    <batchtest fork="yes" todir="${logs.junit.dir}">
      <fileset dir="${test.unit.dir}">
        <patternset refid="test.sources.pattern"/>
      </fileset>
    </batchtest>
  </junit>    
  <mkdir dir="${reports.junit.dir}" />
  <junitreport todir="${reports.junit.dir}">
    <fileset dir="${logs.junit.dir}">
      <include name="TEST-*.xml" />
    </fileset>
    <report format="frames" todir="${reports.junit.dir}" />
  </junitreport>
</target>

This long target (believe me, I've seen much longer ones) performs four distinct jobs: compiling source, compiling tests, running JUnit tests, and creating a JUnitReport. That's a lot of responsibility for one target, and packing all that XML into one place makes it harder to read. The target can be broken into four distinct, logical targets, as demonstrated in Listing 4:

Listing 4. Extract targets

<target name="compile-src">
  <mkdir dir="${classes.dir}"/>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${src.dir}" />
    <classpath refid="project.class.path"/>
  </javac>
</target>

<target name="compile-tests">
  <mkdir dir="${classes.dir}"/>
  <javac destdir="${classes.dir}" debug="true">
    <src path="${test.unit.dir}"/>
    <classpath refid="test.class.path"/>
  </javac>
</target>

<target name="run-tests" depends="compile-src,compile-tests">
  <mkdir dir="${logs.junit.dir}" />
  <junit fork="yes" haltonfailure="true" dir="${basedir}" printsummary="yes">
    <classpath refid="test.class.path" />
    <classpath refid="project.class.path"/>
    <formatter type="plain" usefile="true" />
    <formatter type="xml" usefile="true" />
    <batchtest fork="yes" todir="${logs.junit.dir}">
      <fileset dir="${test.unit.dir}">
        <patternset refid="test.sources.pattern"/>
      </fileset>
    </batchtest>
  </junit>    
</target>

<target name="run-test-report" depends="compile-src,compile-tests,run-tests">
  <mkdir dir="${reports.junit.dir}" />
  <junitreport todir="${reports.junit.dir}">
    <fileset dir="${logs.junit.dir}">
      <include name="TEST-*.xml" />
    </fileset>
    <report format="frames" todir="${reports.junit.dir}" />
  </junitreport>
</target>

As you can see, because each target has one responsibility, the code in Listing 4 is much easier to follow. By isolating targets based on purpose, you can reduce the complexity and, furthermore, provide the capability to use the targets in different contexts, enabling reuse if necessary.
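
Because Ant resolves dependencies transitively and runs each target at most once per invocation, the depends chains can also be kept short. The following is one possible variation on the wiring in Listing 4, with the task bodies replaced by echo placeholders for brevity; it is a sketch of the dependency structure only, not a replacement for the listing.

```xml
<project name="depends-sketch" default="run-test-report" basedir=".">

  <!-- Each target names only its direct prerequisite -->
  <target name="compile-src">
    <echo message="compile production sources here"/>
  </target>

  <target name="compile-tests" depends="compile-src">
    <echo message="compile test sources here"/>
  </target>

  <target name="run-tests" depends="compile-tests">
    <echo message="run JUnit here"/>
  </target>

  <!-- Ant runs compile-src, compile-tests, and run-tests first, in that order -->
  <target name="run-test-report" depends="run-tests">
    <echo message="create the JUnit report here"/>
  </target>

</project>
```

Each target can still be invoked on its own (for example, ant compile-tests), and Ant pulls in only the prerequisites that target needs.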

Large build files also have a strong scent

Figure 2. Extract build files

Fowler also identifies the Large Class as a code smell. The analogous build smell is the large build file, which is amazingly difficult to read. It's hard to know which target is doing what and what each target's dependencies are. This, again, creates a maintenance issue; what's more, enormous build files usually contain quite a lot of cut-and-paste.

To reduce the size of build files, you can seek portions of the script that are logically related and extract those aspects into smaller build files that are executed by the main build file (for example, in Ant you can call other build files using the ant task).
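
A minimal sketch of that delegation follows. The file names build-test.xml and build-inspect.xml, and the target names they are assumed to contain, are hypothetical; only the ant task and its antfile, target, and inheritAll attributes come from Ant itself.

```xml
<project name="brewery" default="all" basedir=".">

  <!-- Delegate testing to a separate, stand-alone build file (hypothetical name) -->
  <target name="test">
    <ant antfile="build-test.xml" target="run-tests" inheritAll="false"/>
  </target>

  <!-- Delegate inspections the same way (hypothetical name) -->
  <target name="inspect">
    <ant antfile="build-inspect.xml" target="run-inspections" inheritAll="false"/>
  </target>

  <!-- The main build file only coordinates the smaller ones -->
  <target name="all" depends="test,inspect"/>

</project>
```

Setting inheritAll to false keeps the main script's properties from being passed down implicitly, which nudges each extracted file toward defining what it needs and remaining runnable on its own.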

Typically, I like to break up build scripts by core function and ensure they can be executed as stand-alone scripts (think build componentization). For example, I like to define four types of developer tests in my Ant builds: unit, component, system, and functional. Furthermore, I also like to run four types of automated inspectors: coding standard, dependency analysis, code coverage, and code complexity. Instead of placing the execution of these tests and inspectors in one monolithic build script (along with compilation, database integration, and deployment), I extract the test and inspector execution targets into two separate build files as demonstrated in Figure 2:

Not cleaning up

Builds that rely on hidden assumptions about the state of the machine are a disaster waiting to happen. For instance, if your build assumes its output directories are empty but never removes the binaries generated by a previous run, a stale leftover file could cause an error. Or, perhaps even worse, a build may be "successful" only because files from a previous build were still present.

Fortunately, the solution is straightforward: You can easily eliminate assumptions by removing all generated directories and files from any previous builds. This simple action reduces assumptions and ensures that your build's success or failure status is accurate. Listing 5 demonstrates an example of cleaning a build environment using the delete Ant task to remove any files or directories used in previous builds:

Listing 5. Cleaning up before yourself

<target name="clean">
  <delete dir="${logs.dir}" quiet="true" failonerror="false"/>    
  <delete dir="${build.dir}" quiet="true" failonerror="false"/>    
  <delete dir="${reports.dir}" quiet="true" failonerror="false"/>    
  <delete file="cobertura.ser" quiet="true" failonerror="false"/>     
</target> 

Stray files from older builds have been known to cause many an unnecessary headache. Do yourself a favor and always remove any artifact your build creates before running a build.
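
One way to make the cleanup hard to forget is to put it at the front of the dependency list of the target that developers and the build server actually run. The sketch below assumes a hypothetical top-level target named all and a placeholder compile target; only the ordering is the point.

```xml
<project name="clean-sketch" default="all" basedir=".">

  <property name="build.dir" location="build"/>

  <!-- Remove everything a previous build may have left behind -->
  <target name="clean">
    <delete dir="${build.dir}" quiet="true" failonerror="false"/>
  </target>

  <!-- Placeholder for the real compilation target -->
  <target name="compile">
    <mkdir dir="${build.dir}"/>
    <echo message="compile into ${build.dir} here"/>
  </target>

  <!-- Listing clean first means every full build starts from scratch -->
  <target name="all" depends="clean,compile"/>

</project>
```

Ant processes the names in depends from left to right, so clean runs before compile whenever all is invoked.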

The stench of hard-codedness

Just as copy-and-paste programming prohibits reuse, so too do hard-coded values. When build scripts contain hard-coded values and one of them needs to change, you need to modify that value in more than one location. Or worse, you could miss one and have subtle errors associated with mismatching values. Moreover, if you follow my advice and choose to use multiple build scripts, hard-coded values can become the ultimate challenge in build maintenance. Trust me on that one!

For example, in Listing 6, the run-simian target has a number of hard-coded paths and values, most notably the _reports directory:

Listing 6. Hard-coded values

<target name="run-simian">
  <taskdef resource="simiantask.properties" 
    classpath="simian.classpath" classpathref="simian.classpath" />
  <delete dir="./_reports" quiet="true" />
  <mkdir dir="./_reports" />
  <simian threshold="2" language="java" 
    ignoreCurlyBraces="true" ignoreIdentifierCase="true" ignoreStrings="true" 
    ignoreStringCase="true" ignoreNumbers="true"  ignoreCharacters="true">
    <fileset dir="${src.dir}"/>
    <formatter type="xml" toFile="./_reports/simian-log.xml" />
  </simian>
  <xslt taskname="simian"
    in="./_reports/simian-log.xml" 
    out="./_reports/Simian-Report.html" 
    style="./_config/simian.xsl" />
</target>

Hard-coding the _reports directory makes things difficult should I decide to push my Simian reports to another directory; furthermore, if other tools use this directory elsewhere in the script, someone could easily mistype the directory name, causing reports to show up in different directories. It is much easier and more maintainable to define a property value that points to this directory. Then throughout the script, I can reference the property, which means changes can be localized to one spot, the property definition. Listing 7 shows a refactored run-simian target:

Listing 7. Using properties

<target name="run-simian">
  <taskdef resource="simiantask.properties" 
    classpath="simian.classpath" classpathref="simian.classpath" />
  <delete dir="${reports.simian.dir}" quiet="true" />
  <mkdir dir="${reports.simian.dir}" />
  <simian threshold="${simian.threshold}" language="${language.type}" 
    ignoreCurlyBraces="true" ignoreIdentifierCase="true" ignoreStrings="true" 
    ignoreStringCase="true" ignoreNumbers="true"  ignoreCharacters="true">
    <fileset dir="${src.dir}"/>
    <formatter type="xml" toFile="${reports.simian.dir}/${simian.log.file}" />
  </simian>
  <xslt taskname="simian"
    in="${reports.simian.dir}/${simian.log.file}" 
    out="${reports.simian.dir}/${simian.report.file}" 
    style="${config.dir}/${simian.xsl.file}" />
</target>
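
Listing 7 only works if the properties it references are defined somewhere. A sketch of those definitions follows. The values mirror the hard-coded ones from Listing 6; the reports.dir parent property, the project name, and the show target are assumptions added only so the fragment can be run on its own.

```xml
<project name="simian-properties-sketch" default="show" basedir=".">

  <!-- Directories: location resolves the path against the project basedir -->
  <property name="reports.dir" location="_reports"/>
  <property name="reports.simian.dir" location="${reports.dir}"/>
  <property name="config.dir" location="_config"/>

  <!-- Plain values taken from the hard-coded attributes in Listing 6 -->
  <property name="simian.threshold" value="2"/>
  <property name="language.type" value="java"/>
  <property name="simian.log.file" value="simian-log.xml"/>
  <property name="simian.report.file" value="Simian-Report.html"/>
  <property name="simian.xsl.file" value="simian.xsl"/>

  <!-- Hypothetical target that prints one of the composed paths -->
  <target name="show">
    <echo message="Simian report: ${reports.simian.dir}/${simian.report.file}"/>
  </target>

</project>
```

Because an Ant property keeps the first value it is given, any of these can be overridden for a single run from the command line, for example with -Dreports.simian.dir=some/other/dir, without editing the script.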

Hard-coded values don't facilitate flexibility; they inhibit it. Just as you should avoid hard-coding database connection strings in your source code, you should also avoid hard-coding things like paths in build scripts.

Build succeeds when tests reek (or fail)

A build is much more than just source code compilation; it may also include the execution of automated developer tests. If you want to keep your software functioning, don't let even one failed test creep into a build. After all, what's the point of having tests if they can't be trusted?

Listing 8 is an example of this build smell. Notice the haltonfailure attribute of the junit Ant task is set to false (its default value). This means the build will not fail even if any JUnit tests fail.

Listing 8. Smell: Build succeeds although the tests fail

<junit fork="yes" haltonfailure="false" dir="${basedir}" printsummary="yes">
  <classpath refid="test.class.path" />
  <classpath refid="project.class.path"/>
  <formatter type="plain" usefile="true" />
  <formatter type="xml" usefile="true" />
  <batchtest fork="yes" todir="${logs.junit.dir}">
    <fileset dir="${test.unit.dir}">
      <patternset refid="test.sources.pattern"/>
    </fileset>
  </batchtest>
</junit>

There are a couple of approaches to preventing this build smell. The first is simply to set the haltonfailure attribute to true. This stops the build and marks it as failed as soon as a test fails.

The only thing I don't like about this solution is that I like to see what percentage of my tests have failed so that I can see patterns in the failure. Therefore, the second approach is to set a property if any of the tests fail. Then, I configure Ant to fail the build after it has executed all of the tests. Either approach will work. Listing 9 demonstrates the second approach using the tests.failed property:

Listing 9. Tests fail the build

<junit dir="${basedir}" haltonfailure="false" printsummary="yes" 
  errorProperty="tests.failed" failureproperty="tests.failed">
  <classpath>
    <pathelement location="${classes.dir}" />
  </classpath>
  <batchtest fork="yes" todir="${logs.junit.dir}" unless="testcase">
    <fileset dir="${src.dir}">
      <include name="**/*Test*.java" />
    </fileset>
  </batchtest>
  <formatter type="plain" usefile="true" />
  <formatter type="xml" usefile="true" />
</junit>
<fail if="tests.failed" message="Test(s) failed." />

Builds that pass, even though tests fail, provide a false sense of security. If tests fail, fail the build: better to deal with a problem early than late one night when you'd rather be sleeping.
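
A side benefit of the second approach is that the report can still be generated before the build is failed. The sketch below shows one possible arrangement of that ordering. It reuses the property, path, and target names from Listings 4 and 9, which must be defined elsewhere in the build file; the target name test-and-report is hypothetical.

```xml
<target name="test-and-report" depends="compile-src,compile-tests">
  <mkdir dir="${logs.junit.dir}"/>
  <mkdir dir="${reports.junit.dir}"/>

  <!-- Run every test; record failures or errors instead of halting -->
  <junit fork="yes" haltonfailure="false" printsummary="yes"
    errorproperty="tests.failed" failureproperty="tests.failed">
    <classpath refid="test.class.path"/>
    <classpath refid="project.class.path"/>
    <formatter type="xml" usefile="true"/>
    <batchtest todir="${logs.junit.dir}">
      <fileset dir="${test.unit.dir}">
        <patternset refid="test.sources.pattern"/>
      </fileset>
    </batchtest>
  </junit>

  <!-- Build the HTML report from whatever XML results exist -->
  <junitreport todir="${reports.junit.dir}">
    <fileset dir="${logs.junit.dir}">
      <include name="TEST-*.xml"/>
    </fileset>
    <report format="frames" todir="${reports.junit.dir}"/>
  </junitreport>

  <!-- Only now fail the build if any test failed or errored -->
  <fail if="tests.failed" message="Test(s) failed."/>
</target>
```

With this ordering, a red build still leaves behind a complete report showing which tests failed, which is exactly the information the second approach is meant to preserve.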

Magic machine smells

Figure 3. Magic machine

Of all the smells covered in this article, this one is probably the most fetid, for magic machines are those one-of-a-kind magical pieces of hardware that happen to be the only machines capable of building a company's software application. This scenario isn't as far-fetched as it may seem. I've run across these wizardly beasts a number of times in my career. These machines turn demonic, though, when dependencies are lost or when the inevitable bit rot strikes.

It's easy to see how a normal machine in a company's infrastructure can turn enchanted: over time, developers inadvertently added hard dependencies into the machine's script, made references to fully qualified directory paths, or even installed tools that only exist on a select machine, which slowly prevented the build from being able to run on any other machine. See Figure 3 for an example:

Hard-coded references to a machine, paths that include specific drives (like C:), and specific machine tools are all red flags that will quickly hex a machine. Any time you see a reference to the C: drive or a call to a specific tool (like grep), change the script immediately. If you catch yourself saying "but the C:\Program Files\ directory is on every machine" or some variation of this statement, think again.
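
As a sketch of what the replacement can look like, the build file below derives locations from the project directory and from the environment instead of naming a drive. The property names, the lib directory, and the TOOLS_HOME environment variable are hypothetical; basedir, the environment form of the property task, available, and fail are standard Ant.

```xml
<project name="portable-paths-sketch" default="check-environment" basedir=".">

  <!-- Expose environment variables as env.* properties -->
  <property environment="env"/>

  <!-- Resolve paths relative to the project, never to a drive letter -->
  <property name="lib.dir" location="${basedir}/lib"/>

  <!-- Hypothetical: an external tool location supplied by each machine -->
  <property name="tools.home" value="${env.TOOLS_HOME}"/>

  <target name="check-environment">
    <!-- Set tools.present only if the directory really exists -->
    <available file="${tools.home}" type="dir" property="tools.present"/>
    <fail unless="tools.present"
      message="Set TOOLS_HOME to the tools directory before building."/>
    <echo message="Using libraries from ${lib.dir}"/>
  </target>

</project>
```

A build written this way states its requirements explicitly and fails with a readable message on a machine that doesn't meet them, instead of silently depending on one particular box.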

Bad style stinks

As with programming style in mainstream languages, there are analogous considerations when managing build scripts. When considering programming style for build scripts, you need to account for the following:

  • property names
  • target names
  • directory names
  • environment variable names
  • indentation
  • line length

Personally, I prefer to leverage the rules of others as much as possible when dealing with stylistic conventions. Fortunately, a group of individuals has created just such a reference, The Elements of Ant Style (see Resources). In it, the authors describe rules covering target names (lowercase, with hyphens separating words), line length, and indentation. Whichever resource you choose, consistently applying stylistic rules will help in the long-term maintenance of build files.
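
As a small illustration of consistent naming, the sketch below follows the lowercase-with-hyphens rule for target names mentioned above. The dotted, lowercase property names simply mirror the convention used in this article's own listings; the specific names are hypothetical.

```xml
<project name="style-sketch" default="run-unit-tests" basedir=".">

  <!-- Properties: lowercase words separated by dots -->
  <property name="reports.dir" location="reports"/>
  <property name="reports.junit.dir" location="${reports.dir}/junit"/>

  <!-- Targets: lowercase words separated by hyphens -->
  <target name="compile-tests">
    <echo message="compile tests here"/>
  </target>

  <target name="run-unit-tests" depends="compile-tests">
    <echo message="write results to ${reports.junit.dir}"/>
  </target>

</project>
```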

Builds never smelled so nice

I can put up with the smell of cheap perfume; however, if there's one thing I can't stand, anthropomorphically speaking, it's the odor of unmaintainable build scripts. Just as smelly code will surely cost you valuable time down the road, so too will poorly designed builds. If the waft of inconsistent, unrepeatable, and unmaintainable builds is in the air, take the time now to refactor these vital assets. Your development environment will smell like roses.

Resources

Get products and technologies
  • Apache Ant: The mother of Java build platforms.
  • Maven: A powerful build platform built using lessons learned from Ant.

About the author

Paul Duvall is the CTO of Stelligent Incorporated, which helps companies address software quality with effective developer testing strategies and Continuous Integration techniques that enable teams to monitor and improve code quality early and often. He is a contributing author to the UML™ 2 Toolkit and is currently co-authoring Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley).