Remove The Smell From Your Build Scripts




Paul Duvall, CTO, Stelligent Incorporated

10 Oct 2006

How much time do you spend maintaining project build scripts? Probably much more than you'd expect or would like to admit. It doesn't have to be such a painful experience. Development automation expert Paul Duvall uses this installment of Automation for the people to demonstrate how to improve a number of common build practices that prevent teams from creating consistent, repeatable, and maintainable builds.

I dislike the term "smell" when it comes to describing something like code. It feels strange to speak anthropomorphically about bits and bytes. It's not that the word "smell" fails to describe a symptom indicating that code may be wrong; it just sounds funny to me. Yet I am choosing to perpetuate my vexation by using it to describe software builds because, frankly, many build scripts I've seen over the years stink.

Often, even great programmers have difficulty constructing a build script; it's as if they recently learned how to write procedural code -- writing large monolithic build files, copying and pasting scripted code, hard-coding attributes, and so on. I've always wondered why that is (maybe because build scripts don't get compiled into something a customer will eventually use?). Yet we all know that build scripts are central to creating the code the customer eventually uses, and if those scripts are a big ball of mud, creating that software efficiently becomes challenging.

Thankfully, you can employ a number of practices on a build (whether it uses Ant, Maven, or even a custom tool) that go a long way toward keeping your builds consistent, repeatable, and maintainable. One effective way to learn how to create better build scripts is to see what not to do, understand why, and then see the correct way to do it. That is the approach I take in this article. I detail nine common build smells you should avoid, explain why you should avoid them, and show how to fix them:

IDE-only builds
Copy-and-paste scripting
Long targets
Large build files
Failing to clean up
Hard-coded values
Builds that succeed when tests fail
Magic machines
A lack of style

Although this is not meant to be a comprehensive list, it does represent some of the more common smells I've encountered over the years in build scripts I've read and written. Also, some tools, such as Maven, which are designed to handle much of the plumbing associated with builds, can help alleviate a portion of these smells, but many of these issues can occur no matter which tool you use.

Avoid the aroma of IDE-only builds

An IDE-only build is a build that can be executed only through a developer's IDE and, unfortunately, this seems to be one of the more common build smells. The problem with an IDE-only build is that it can perpetuate the "works on my machine" problem, where software works in one developer's environment but not in anyone else's. What's more, because IDE-only builds are not very automatable, they are extremely challenging to integrate into a Continuous Integration environment; in fact, IDE-only builds are often impossible to automate without human intervention.

Let me be clear: It's fine to use an IDE to execute a build, but your IDE shouldn't be the only thing capable of building software. In particular, a fully scripted build enables teams to use multiple IDEs because the dependencies will be from the IDE to the build and not the other way around, as shown in Figure 1:

Figure 1. IDE and build dependencies

IDE-only builds prohibit automation, and the only way to fix this stench is to create a scriptable build. There is enough documentation and a plethora of books out there to guide you on your way (see Resources), and projects like Maven make it extremely easy to define a build from scratch too. Either way, pick a build platform and make your project scriptable as soon as possible.
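
As a starting point, a scripted build does not need to be large. The following is a minimal sketch of an Ant build file, not a recommended layout: the project name, the directory names, and the target names are all assumptions chosen for illustration, and the includeantruntime attribute assumes Ant 1.8 or later.

```xml
<?xml version="1.0"?>
<!-- Hypothetical project name and directory layout; adjust to your project. -->
<project name="brewery" default="compile" basedir=".">
  <property name="src.dir" value="src"/>
  <property name="build.dir" value="build"/>
  <property name="classes.dir" value="${build.dir}/classes"/>

  <!-- Remove everything the build generates. -->
  <target name="clean">
    <delete dir="${build.dir}"/>
  </target>

  <!-- Compile the production sources without any help from an IDE. -->
  <target name="compile">
    <mkdir dir="${classes.dir}"/>
    <javac srcdir="${src.dir}" destdir="${classes.dir}" includeantruntime="false"/>
  </target>
</project>
```

With a file like this saved as build.xml, running "ant clean compile" from the command line builds the project the same way for every developer and for a Continuous Integration server, and an IDE can simply delegate to the same targets.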

Copy-and-paste is like cheap perfume

Duplicate code is a common problem on software projects. In fact, even many popular open source projects have duplication percentages in the 20-30 percent range. And just as code duplication can make a software program more difficult to maintain, so too can duplicate code in build scripts. For instance, imagine you need to reference specific files through Ant's fileset type, as shown in Listing 1:

Listing 1. Copy-and-paste Ant script

<fileset dir="./brewery/src" >
  <include name="**/*.java"/>
  <exclude name="**/*.groovy"/>
</fileset>

If you need to refer to this set of files elsewhere, say for compilation, inspection, or generating documentation, you may end up using the same fileset in multiple places. If you later need to change that fileset (say, to exclude another file type), you have to make the change in every one of those places. Clearly, this isn't a maintainable solution; however, fixing this smell is simple.

Ant's patternset type, shown in Listing 2, allows me to reference a logical name that represents the files I need. Now when I need to add files to (or remove files from) the fileset, I have to do it only once.

Listing 2. Reusing a pattern with Ant's patternset

<patternset id="sources.pattern">
  <include name="**/*.java"/>
  <exclude name="**/*.groovy"/>
</patternset>
...
<fileset dir="./brewery/src">
  <patternset refid="sources.pattern"/>
</fileset>

This fix will look familiar to anyone versed in object-oriented programming: Rather than defining the same logic over and over again in various classes, an established practice is to place that logic into a method, which can be called in various places. This method then becomes a single point of maintenance, limiting cascading defects and fostering reuse.
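
To illustrate the single point of maintenance, here is a sketch in which two targets consume the same files. It goes one step beyond Listing 2 by giving the fileset itself an id, which works only under the assumption that every task reads from the same directory. The target names, the fileset id, and the output paths are hypothetical.

```xml
<project name="brewery" default="archive-sources" basedir=".">
  <!-- The patternset from Listing 2. -->
  <patternset id="sources.pattern">
    <include name="**/*.java"/>
    <exclude name="**/*.groovy"/>
  </patternset>

  <!-- Assumption: all tasks read from one directory, so the whole fileset can carry an id. -->
  <fileset id="sources.fileset" dir="./brewery/src">
    <patternset refid="sources.pattern"/>
  </fileset>

  <!-- Hypothetical target: archive the selected sources. -->
  <target name="archive-sources">
    <mkdir dir="build"/>
    <zip destfile="build/brewery-src.zip">
      <fileset refid="sources.fileset"/>
    </zip>
  </target>

  <!-- Hypothetical target: copy the same files elsewhere. -->
  <target name="copy-sources">
    <copy todir="build/src-copy">
      <fileset refid="sources.fileset"/>
    </copy>
  </target>
</project>
```

Changing an include or exclude in sources.pattern now affects both targets at once. If different tasks need different base directories, referencing only the patternset, as in Listing 2, is the more flexible choice.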

Don't savor long targets

In his book Refactoring, Martin Fowler sums up the problem with the Long Method code smell quite nicely: "the longer a procedure is, the more difficult it is to understand." Long methods, in essence, also end up having too much responsibility. When it comes to builds, the Long Target build smell produces a script that is more difficult to understand and maintain. Listing 3 shows a relatively long target:

Listing 3. Long target

  <target name="run-tests">
    <mkdir dir="${classes.dir}"/>
    <javac destdir="${classes.dir}" debug="true">
      <src path="${src.dir}" />
      <classpath refid="project.class.path"/>
    </javac>
    <javac destdir="${classes.dir}" debug="true">
      <src path="${test.unit.dir}"/>
      <classpath refid="test.class.path"/>
    </javac>
    <mkdir dir="${logs.junit.dir}" />
    <junit fork="yes" haltonfailure="true" dir="${basedir}" printsummary="yes">
      <classpath refid="test.class.path" />
      <classpath refid="project.class.path"/>
      <formatter type="plain" usefile="true" />
      <formatter type="xml" usefile="true" />
      <batchtest fork="yes" todir="${logs.junit.dir}">
        <fileset dir="${test.unit.dir}">
          <patternset refid="test.sources.pattern"/>
        </fileset>
      </batchtest>
    </junit>
    <mkdir dir="${reports.junit.dir}" />
    <junitreport todir="${reports.junit.dir}">
      <fileset dir="${logs.junit.dir}">
        <!-- junitreport merges the XML results only; the plain-text logs are not valid input -->
        <include name="TEST-*.xml" />
      </fileset>
      <report format="frames" todir="${reports.junit.dir}" />
    </junitreport>
  </target>

This long target (believe me, I've seen much longer ones) performs four distinct processes: compiling source, compiling tests, running JUnit tests, and creating a JUnitReport. That's a lot of responsibility, and having all that XML in one place adds to the complexity. This target can be broken into four distinct, logical targets, as demonstrated in Listing 4:

Listing 4. Extract targets

  <target name="compile-src">
    <mkdir dir="${classes.dir}"/>
    <javac destdir="${classes.dir}" debug="true">
      <src path="${src.dir}" />
      <classpath refid="project.class.path"/>
    </javac>
  </target>
 
  <target name="compile-tests">
    <mkdir dir="${classes.dir}"/>
    <javac destdir="${classes.dir}" debug="true">
      <src path="${test.unit.dir}"/>
      <classpath refid="test.class.path"/>
    </javac>
  </target>

  <target name="run-tests" depends="compile-src,compile-tests">
    <mkdir dir="${logs.junit.dir}" />
    <junit fork="yes" haltonfailure="true" dir="${basedir}" printsummary="yes">
      <classpath refid="test.class.path" />
      <classpath refid="project.class.path"/>
      <formatter type="plain" usefile="true" />
      <formatter type="xml" usefile="true" />
      <batchtest fork="yes" todir="${logs.junit.dir}">
        <fileset dir="${test.unit.dir}">
          <patternset refid="test.sources.pattern"/>
        </fileset>
      </batchtest>
    </junit>
  </target>

  <!-- run-tests already depends on compile-src and compile-tests -->
  <target name="run-test-report" depends="run-tests">
    <mkdir dir="${reports.junit.dir}" />
    <junitreport todir="${reports.junit.dir}">
      <fileset dir="${logs.junit.dir}">
        <!-- junitreport merges the XML results only; the plain-text logs are not valid input -->
        <include name="TEST-*.xml" />
      </fileset>
      <report format="frames" todir="${reports.junit.dir}" />
    </junitreport>
  </target>

As you can see, because each target has one responsibility, the code in Listing 4 is much easier to follow. Isolating targets by purpose reduces complexity and also lets you use the targets in different contexts, which enables reuse.
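
As a sketch of that reuse, the targets from Listing 4 can be combined into new entry points without repeating any of their XML. The two target names below are hypothetical and assume the targets from Listing 4 are defined in the same build file.

```xml
<!-- Hypothetical convenience target: compile everything without running the tests. -->
<target name="compile-all" depends="compile-src,compile-tests"/>

<!-- Hypothetical full check: run-test-report pulls in compilation and
     test execution through its own dependency chain. -->
<target name="verify" depends="run-test-report"/>
```

Running "ant compile-all" gives a quick compile check, while "ant verify" runs the complete chain. Neither would be possible with the single long target in Listing 3 without copying part of its body.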
