Go Testing By Example (GopherCon Australia 2023)
By Russ Cox
Summary
## Key takeaways

- **Programming vs. Software Engineering Testing**: Programming tests aim to find bugs by breaking code, while software engineering tests ensure code remains functional over time with multiple contributors, requiring automated, continuous execution. [00:09], [02:06]
- **Coverage is a Tool, Not a Goal**: Test coverage highlights untested code but doesn't replace the critical thinking needed to identify subtle bugs or difficult input cases. [06:05], [07:27]
- **Exhaustive Testing with Reference Implementations**: For complex functions, exhaustively test all small inputs by comparing against a simpler, separate reference implementation to catch subtle errors. [09:21], [10:16]
- **Test Cases in Files for Simplicity**: Defining test cases in separate files, using formats like txtar for multi-file scenarios, simplifies test creation and maintenance, especially for complex setups. [16:09], [21:41]
- **Test Failures Must Be Readable**: Make test failures easy to understand by providing clear input/output details and allowing multiple cases to run with t.Error, facilitating quicker debugging. [19:46]
- **Scripts as Powerful Test Cases**: Script-based tests, like those used for the Go command, offer a concise and understandable way to define complex testing scenarios, making them easy to add and maintain. [29:02], [30:14]
Topics Covered
- Programming vs. Software Engineering: Why Tests Matter
- Make it easy to add new test cases
- If you didn't add a test, you didn't fix the bug
- Custom Parsers and Printers Simplify Complex Tests
- The Unsung Hero: How Tests Make Bad Code Work
Full Transcript
Hi everyone. I gave this talk at GopherCon Australia
a few weeks ago, but there were some A/V challenges
that hurt the recording, so I re-recorded this version
at home. Enjoy!
This talk is about writing good tests, but first, let's think about why we write tests at all.
Why do programmers write tests? Books about programming say that
tests are about finding bugs in your program.
For example, this book says that “testing is a determined,
systematic attempt to break a program that you think is working.”
This is true. This is why programmers should write tests. But for most of us here today,
it's not why *we* write tests, because we are
not mere programmers. We are software engineers. Let me tell you what I mean.
I like to say that software engineering is what
happens to programming when you add time and other programmers.
Programming means getting a program working.
You have a problem to solve, you write some code,
you run it, you test it, you debug it, you get your answer, you’re done.
That's already pretty difficult, and testing is an important part of that process.
But software engineering means doing all that in programs you keep working
on for a long time and with other people, and that changes the nature of testing.
Let's start by looking at tests for a binary search function.
This function Find takes a sorted slice, a target value, and a comparison function,
and it uses binary search to find and return two things: first, the index where the target
would appear if it were present, and second, a boolean saying whether the target is present.
Most binary searches are buggy, and this one is no exception. Let's test it.
Here is a nice interactive tester for a binary search function.
You type two numbers n and t. It makes a slice of n elements containing increasing multiples of 10,
and then it does a search for t in the slice and prints the results. And you repeat.
This may seem trivial, but how many of you have ever tested production
code just by running it and poking at it for a while? All of us have done that.
When you're programming, an interactive tester like this can be very useful for finding bugs,
although so far the code appears to be working.
But this tester is only good for programming.
If you're doing software engineering, meaning you are keeping the program running over a
long time and working with other people, this kind of tester is not too useful. You
need something that can be run by everyone else every day, while they are working on the code,
and that can be run automatically by a computer for every commit.
The problem is that testing your program by hand only makes sure it works *today*.
Automated, continuous testing makes sure that it keeps working tomorrow and into the future,
even if other people who don't know the code as well start working on it. And to be clear, that
person who doesn't know the code very well might be *you* six months or even six weeks from now.
Here is a software engineer's test. You can run it without any knowledge
of what the code should do. Any colleague or any computer can run
this using 'go test' and immediately understand whether the test passes.
You've seen tests like this already I'm sure.
The software engineering ideal is to have tests that catch every possible
mistake that might be made later. If your tests meet that ideal,
then you should be comfortable shipping your code to production automatically,
any time your tests are all passing, what people call continuous deployment.
If you don't do this already, if the idea of it makes you nervous,
it is worth asking yourself why. Either your tests are good enough or they're not.
If they are good enough, then why not do it? And if they're not good enough,
listen to those doubts and figure out what they are telling you about which tests are missing.
A few years back I was working on the server for the new Go web site go.dev.
We deployed the site manually back then, and at least once a week I would make a
change that worked fine on my machine but then wouldn't serve any pages at
all when I deployed it to production. This was annoying and embarrassing.
The solution was better tests and automated continuous deployment. Now every time a commit
lands in the repo, we use a Cloud Build program to run local tests, push the code
out to a fresh server, run a few more tests that only run in production, and then if all is well,
redirect traffic to the new server. This made things better in two different ways. First,
I stopped causing embarrassing web site outages. Second, everyone stopped having
to think about deploying the web site at all. If they want to make a change,
such as fixing a typo or adding a new blog post, they just mail the change,
it gets reviewed, tested, and submitted, and then the automatic process does the rest.
To be confident that your program won't break when other people edit it,
and to be confident that your program can be pushed to production any time the tests
are passing, you need a very good set of tests.
But what makes tests good? In general,
what makes test code good is the same thing that makes non-test code good: hard work, attention,
and time. I don't have any silver bullets or hard rules for writing good test code,
any more than I do for writing good non-test code. However, I do have a collection of
tips based on what has worked well for us on Go, and I'll share twenty tips in this talk.
Tip #1. Make it easy to add new test cases. This is the most important tip,
because if it's not easy to add new test cases, you won't.
Go already helps with this. This is the most trivial test of the function Foo.
We designed Go tests specifically to be very easy to write. There's
no bookkeeping or ceremony that gets in your way.
At the level of package testing, this is pretty good,
but in a specific package, you can do even better.
I'm sure you know about table driven tests. We encourage table-driven tests
because they make it very easy to add new test cases. Here is the one we saw before:
Let's suppose we have only this one test case and we think of a
new case to test. We don't have to write any new code at all,
just one new line of data. If the goal is “make it easy to add new tests”,
then for simple functions like this, adding a line to a table is about as good as it gets.
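As a concrete sketch, this is roughly what that table-driven shape looks like. The table rows and the Find signature here are assumptions for illustration, and the linear-search body is a stand-in for the real binary search under test:

```go
package main

import "testing"

// Find is a stand-in for the binary search under test; a linear
// search keeps this sketch short and obviously correct.
func Find(x []int, target int, cmp func(int, int) int) (int, bool) {
	for i, v := range x {
		if c := cmp(v, target); c >= 0 {
			return i, c == 0
		}
	}
	return len(x), false
}

// The table: each case is one line of data.
var findTests = []struct {
	x      []int
	target int
	i      int
	found  bool
}{
	{nil, 10, 0, false},
	{[]int{10, 20, 30}, 20, 1, true},
	{[]int{10, 20, 30}, 25, 2, false}, // a new case is one new line
}

// The test logic: a single loop over the table.
func TestFind(t *testing.T) {
	for _, tt := range findTests {
		i, found := Find(tt.x, tt.target, func(a, b int) int { return a - b })
		if i != tt.i || found != tt.found {
			t.Errorf("Find(%v, %d) = %d, %v, want %d, %v",
				tt.x, tt.target, i, found, tt.i, tt.found)
		}
	}
}
```

Adding a case never touches the loop, only the table.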
It does raise one question though: what cases should we add? That takes us to the next tip.
Tip #2. Use test coverage to find untested code. After all,
tests can't catch bugs in code they don't run.
Go ships with built-in support for test coverage.
And here's what it looks like.
You run go test -coverprofile to write a coverage profile,
and then you run go tool cover to view it in your browser.
In the display, we can see that our test case is not very good:
the actual binary search code is red, meaning entirely untested.
The next step is to look at the untested code and
think about what kinds of tests would cause those lines to run.
On closer examination, we only tested with an empty slice,
so let's add a case with a slice that isn't empty.
Now we can run the coverage again. This time I'm going to read the coverage profile with
a little command-line program I wrote called "uncover". Uncover
prints the lines of code not covered by a test. It doesn't give you the big
picture that the web view does, but it lets you stay in a shell window.
Uncover shows us that there's just one line left that isn't executed by
the tests. It's the line that moves into the second half of the slice,
which makes sense since our target is the very first element.
Let's add one more test, searching for that last element. When we run the test,
it passes, and we have 100% coverage. Great. Are we done?
No, and that takes us to the next tip.
Tip #3. Coverage is no substitute for thought.
Coverage is very useful for pointing out parts of your code that you might have forgotten about,
but mechanical tools are no substitute for actually thinking about what the difficult
inputs are, and what's subtle in your code and how it might break.
Code with 100% testing coverage can still have bugs, and this code does.
This tip also applies to fuzzing, which is coverage driven.
And fuzzing is just trying to explore more and more paths through your
code to increase the coverage. Fuzzing is really helpful too,
but fuzzing is also no substitute for thought.
So what's missing here?
One thing to notice is that the only test case that doesn't find the target has an
empty input slice. We should check not finding the target in a slice with values.
And specifically we should check what happens when the target is less than all the values,
greater than all the values, and in the middle of the values. So let's add three more test cases.
Notice how easy it is for us to add a new test case. If you think of a case that your
code might not handle correctly, it needs to be as easy as possible to add that case,
or you will be tempted not to bother. If it's too difficult, you won't.
You can also see how we're starting to enumerate all the important ways that
this function can go wrong. The tests constrain all future development to
keep the binary search working at least this well.
When we run these tests, they fail. The returned index i is correct,
but the boolean indicating whether the target was found is wrong. So let's look at that.
Reading the code, the boolean expression at the return is wrong. It was only checking that the
index is in range. It also needs to check that the value at that index equals the target.
So we can make that change, highlighted here, and now the tests pass.
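A sketch of what the corrected function might look like, assuming a slices.BinarySearchFunc-style signature; the talk's exact code may differ:

```go
package main

// Find returns the index where target would appear in the sorted
// slice x and whether it is present, assuming a signature in the
// style of slices.BinarySearchFunc. The search is the standard
// lower-bound loop; the fix is in the returned boolean.
func Find[E any](x []E, target E, cmp func(E, E) int) (int, bool) {
	i, j := 0, len(x)
	for i < j {
		h := (i + j) / 2 // can overflow for enormous slices; see Tip #6
		if cmp(x[h], target) < 0 {
			i = h + 1
		} else {
			j = h
		}
	}
	// The fix: check not just that i is in range, but that x[i]
	// actually equals the target.
	return i, i < len(x) && cmp(x[i], target) == 0
}
```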
Now we feel pretty good about the test:
coverage is happy, and we've thought hard about it. What more can we do?
Tip #4. Write exhaustive tests.
If you can test every possible input to a function, you should.
Now that may be unrealistic,
but you can usually test all inputs up to a certain size under certain constraints.
Here is an exhaustive test for binary search.
We start by making a slice of 10 elements, specifically the odd numbers 1 3 5 up to 19.
Then we consider all possible length prefixes of that slice.
And for each prefix we consider all possible targets from 0, which is smaller than all
the values in the slice, up to twice the length, which is larger than all the values in the slice.
This will exhaustively test every possible search
path through every possible size slice up to our length 10 limit.
But now how do we know what the answer is? We could do some math
based on the specifics of our test cases, but there is a better, more general way.
That way is to write a reference implementation that is different
from the real one. Ideally the reference implementation should be obviously correct,
but it can just be any approach that differs from the real one. Usually that will be a simpler,
slower approach, since if it were simpler and faster you'd use it as the real implementation.
In this case, the reference implementation is called slowFind. The test checks that slowFind
and Find agree on the answer. Since the inputs are small, slowFind can be a simple linear search.
This pattern of generating all possible inputs up to some size and comparing the
results against a simple reference implementation is very powerful.
One important thing it does is cover all the basic corner cases, like 0-element slices,
1-element slices, slices that have odd length, even length, power of two length, and so on.
The vast majority of bugs in most programs can be reproduced by small inputs,
so testing all the small inputs is very effective.
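Here is a runnable sketch of that exhaustive test; slowFind is a simple linear search, and the bodies here are reconstructions rather than the talk's exact code:

```go
package main

// Find is a reconstruction of the binary search under test,
// with the boolean fix applied.
func Find(x []int, target int, cmp func(int, int) int) (int, bool) {
	i, j := 0, len(x)
	for i < j {
		h := (i + j) / 2
		if cmp(x[h], target) < 0 {
			i = h + 1
		} else {
			j = h
		}
	}
	return i, i < len(x) && cmp(x[i], target) == 0
}

// slowFind is the reference implementation: an obviously correct
// linear search, fine for small inputs.
func slowFind(x []int, target int, cmp func(int, int) int) (int, bool) {
	for i, v := range x {
		if c := cmp(v, target); c >= 0 {
			return i, c == 0
		}
	}
	return len(x), false
}

// exhaustiveOK compares Find against slowFind for every prefix of
// the odd numbers 1 3 5 ... 19 and every target from 0 (below all
// values) to 20 (above all values).
func exhaustiveOK() bool {
	var slice []int
	for v := 1; v <= 19; v += 2 {
		slice = append(slice, v)
	}
	cmp := func(a, b int) int { return a - b }
	for n := 0; n <= len(slice); n++ {
		x := slice[:n]
		for target := 0; target <= 2*len(slice); target++ {
			i1, f1 := Find(x, target, cmp)
			i2, f2 := slowFind(x, target, cmp)
			if i1 != i2 || f1 != f2 {
				return false
			}
		}
	}
	return true
}
```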
It turns out that this exhaustive test passes. Our thinking was pretty good.
Now, if the exhaustive test fails, that means Find and slowFind disagree,
so at least one is buggy, but we don't know which one. It helps to add a direct
test of slowFind, and that's easy since we already have a table of test data.
This is another benefit of table-driven tests:
the tables can be used to test more than one implementation.
Tip #5. Separate test cases from test logic.
In a table-driven test, the cases are in the table, and the loop that processes them is the
test logic. As we just saw, separating them lets you use the same test cases in multiple contexts.
So now are we done with binary search?
It turns out no, there is still a bug left, which leads us to:
Tip #6. Look for special cases.
Even if we've done an exhaustive test of all the small cases, there may still be bugs lurking.
Now here is the code again. There's one bug left. You can pause the video and look
at it for a while. Does anyone see it?
It's OK if you don't see it. It's a very special case, and it took people decades to notice.
Knuth told us that although binary search was published in 1946 the first correct
binary search wasn't published until 1964. But this bug wasn't discovered until 2006.
The bug is that if the number of elements in the slice is very close to the maximum value for an
int, then i+j overflows, so (i+j)/2 is the wrong calculation for the middle of the slice.
This bug was discovered back in 2006 in C programs using 64-bit memories and 32-bit ints,
indexing arrays with more than a billion entries. This particular combination is
basically never going to happen in Go, because we require 64-bit memories to use 64-bit ints,
exactly to avoid this kind of problem. But since we know about the bug, and you never
know how you or someone else will adapt the code in the future, it is worth avoiding it.
There are two standard fixes to keep the math from
overflowing. The slightly faster one is to do an unsigned divide.
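As a sketch, the two standard overflow-safe midpoint calculations look like this:

```go
package main

// mid uses the unsigned divide: even if i+j wraps around as a signed
// int, the same bits interpreted as unsigned are the true sum, so the
// shift still yields the correct midpoint.
func mid(i, j int) int {
	return int(uint(i+j) >> 1)
}

// midAlt is the other standard fix: never form i+j at all.
func midAlt(i, j int) int {
	return i + (j-i)/2
}
```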
Suppose we fix that. Now are we done? No. Because we haven't written a test.
Tip #7. If you didn't add a test, you didn't fix the bug.
This is true in two different ways.
The first way is the programming way. If you didn't test it,
the bug might not even be fixed. This may sound silly, but how often has this
happened to you? Someone tells you about a bug. You know immediately what the fix
is. You make the change and tell them it's fixed. And they come right back and say nope,
it's still broken. Writing a test saves you that embarrassment. You can say, well,
I'm sorry I didn't fix your bug, but I did fix a bug, and I'll take a look at this one again.
The second way this is true is the software engineering way,
the "time and other programmers" way. Bugs are not random. In any given program,
certain mistakes are far more likely than others. So if you made the mistake once, you or someone
else will probably make it again in the future. Without a test to stop them, the bug comes back.
Now this specific test is hard to write, because the input has to be very large, but this tip is
true even when the test is hard to write. In fact, it's usually more true in that case.
To test this case, one possibility would be to write a test that only
runs on 32-bit systems and that does a binary search over two gigabytes of
uint8s. But that's a lot of memory and we don't have many 32-bit systems anymore.
There's a more clever answer in this case, as there often is for testing hard-to-find bugs.
We can make a slice of empty structs, which takes up no memory no matter how long it is.
This test calls Find on a slice of MaxInt empty structs, looking for an empty struct
as a target, but then it passes in a comparison function that always returns -1,
claiming that the slice element is less than the target. This will make the binary search
investigate larger and larger indexes into the slice, which is how we can reach the overflow.
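A sketch of that test, with Find specialized to empty structs for brevity; the real test's code may differ:

```go
package main

import "math"

// Find is the binary search with the overflow-safe midpoint applied.
func Find(x []struct{}, target struct{}, cmp func(struct{}, struct{}) int) (int, bool) {
	i, j := 0, len(x)
	for i < j {
		h := int(uint(i+j) >> 1) // overflow-safe midpoint
		if cmp(x[h], target) < 0 {
			i = h + 1
		} else {
			j = h
		}
	}
	return i, i < len(x) && cmp(x[i], target) == 0
}

// overflowOK runs the regression test: a slice of math.MaxInt empty
// structs occupies no memory, and a cmp that always reports "less"
// drives the search to the largest indexes, where i+j used to overflow.
func overflowOK() bool {
	x := make([]struct{}, math.MaxInt) // zero bytes of memory
	i, found := Find(x, struct{}{}, func(_, _ struct{}) int { return -1 })
	return i == math.MaxInt && !found
}
```

With the old (i+j)/2 midpoint, this search would compute a negative index and panic; with the fix it walks off the end in about 63 steps and returns not-found.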
If we revert our fix and run this test, sure enough, the test fails.
And with our fix, the test passes. Now the bug is fixed.
Tip #8. Not everything fits in a table. This special case didn't, and that's okay.
But lots of things do fit in a table.
Here's one of my favorite test tables. This is from the fmt.Printf tests. Each
row is a printf format, a value, and the expected string. The real table
is far too large to fit on a slide but here are a few lines from it.
If you read through the table you start to see what are clearly bug fixes.
Remember Tip #7. If you didn't add a test, you didn't fix the bug.
The table made each of these tests trivial to add,
and adding them makes sure these bugs never come back.
Tables are one way to separate test cases from test logic and
make it easy to add new test cases, but sometimes you have so many tests
that it makes sense to avoid even the overhead of writing Go syntax.
For example here is a test file from package strconv for testing
conversion between strings and floating point numbers.
You might think that it's too much work to write a parser for this input, but once you know how, it's
not much work, and being able to define testing mini-languages turns out to be incredibly useful.
So I'm going to walk quickly through the parser to show there's not much to it.
We read the file.
Then we split it into lines.
For each line, we calculate the line number for error messages. Slice element 0 is line 1.
We cut off any comments on the end of the line.
And if the line is blank, we skip it.
This is pretty standard boilerplate so far. Now the good part. We split the
line into fields, and we pull out the four fields.
Then we do the conversion in float32 or float64 math according to the type field. myatof64 is
basically strconv.ParseFloat except it handles a decimal p format that lets
us write the test cases the way they were written in the paper I copied them from.
Finally, if the result is not what we want, we print the error.
This is a lot like a table-driven test.
We just parse the file instead of ranging over a table.
It doesn't fit on one slide, but it does fit on one screen when you're developing.
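The skeleton of such a parser might look like this; the four-field layout mirrors the strconv file, but the details here are illustrative rather than the real test code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTestLines sketches the line-based test-file parser described
// above: split into lines, track line numbers for error messages,
// cut comments, skip blanks, and split each line into fields.
func parseTestLines(data string) ([][]string, error) {
	var cases [][]string
	for i, line := range strings.Split(data, "\n") {
		lineno := i + 1 // slice element 0 is line 1
		// Cut off any comment at the end of the line.
		if j := strings.Index(line, "#"); j >= 0 {
			line = line[:j]
		}
		f := strings.Fields(line)
		if len(f) == 0 {
			continue // skip blank lines
		}
		if len(f) != 4 {
			return nil, fmt.Errorf("line %d: expected 4 fields, found %d", lineno, len(f))
		}
		cases = append(cases, f)
	}
	return cases, nil
}
```

Each case then drives the conversion-and-check step, just as the loop body of a table-driven test would.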
Tip #9. Test cases can be in testdata files.
They don't have to be in your source code.
As another example, the Go regular expression package includes some
testdata files copied from the AT&T POSIX regular expression library.
I won't go into the details here, but I am grateful that they chose to use a
file-driven test for that library, because it meant I could reuse the testdata files for Go.
It's another ad-hoc format, but it's easily parsed and easily edited.
Tip #10. Compare against other implementations.
Comparing against the AT&T regexp's test cases helped make sure that Go's package
handled various corner cases exactly the same way. We also compare Go's package against the
C++ RE2 library. To avoid needing to compile the C++ code, we run it in a mode that logs
all its test cases to a file, and then we check in that file in Go as testdata.
Another way to store test cases in files is to have pairs of files,
one for input and one for output. To implement go test -json,
there is a program called test2json that reads test output and converts it to JSON output.
The test data is pairs of files: test output, and JSON output.
Here's the shortest file.
The test output is at the top, and that is the input to test2json,
and that should produce the JSON output at the bottom.
Here's the implementation, to show the idioms for reading test data from files.
We start by using filepath.Glob to find all
the testdata. If that fails or doesn't find any, we complain.
Otherwise, we loop over all the files. For each one, we create a subtest name by taking
the base file name, without the testdata/ directory name, and without the file suffix.
Then we Run a subtest with that name.
If your test cases are complex enough to have one per file,
it almost always makes sense to make each its own subtest.
That way when one is failing you can run just that specific file with go test -run.
For the actual test case, we just have to read the file,
run the converter,
and check whether the results match.
For the check, I started out using bytes.Equal,
but over time it became worthwhile to write a custom diffJSON that parses
the two JSON results and prints a nice explanation of what's actually different.
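A sketch of that file-driven loop, with a trivial stand-in for the converter; the file layout (pairs of .test and .json files) follows the description above:

```go
package main

import (
	"bytes"
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// subtestName strips the directory and suffix from a testdata path,
// giving the name used for the subtest.
func subtestName(path string) string {
	return strings.TrimSuffix(filepath.Base(path), filepath.Ext(path))
}

// convert is a stand-in for running test2json on the input.
func convert(data []byte) []byte { return data }

func TestFiles(t *testing.T) {
	files, err := filepath.Glob("testdata/*.test")
	if err != nil || len(files) == 0 {
		t.Fatalf("no testdata: %v", err)
	}
	for _, file := range files {
		// One subtest per file, so a single failing case can be
		// rerun with go test -run.
		t.Run(subtestName(file), func(t *testing.T) {
			data, err := os.ReadFile(file)
			if err != nil {
				t.Fatal(err)
			}
			got := convert(data)
			want, err := os.ReadFile(strings.TrimSuffix(file, ".test") + ".json")
			if err != nil {
				t.Fatal(err)
			}
			if !bytes.Equal(got, want) {
				t.Errorf("%s: incorrect conversion", file)
			}
		})
	}
}
```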
Tip #11. Make test failures readable.
Rewinding a bit, we've already seen this with binary search.
I think we all agree that the pink box is not a good failure. But there
are two details in the yellow box that make these failures especially good.
First, we check both return values in a single if statement,
and then we print the full input and output in a concise single line.
Second, we don't stop at the first failure. We call
t.Error instead of t.Fatal to let more cases run.
Combined, these two choices let us see the full
details of each failure and look for patterns across multiple failures.
Moving back to test2json, here is how its test fails. It calculates
which events are different and marks them clearly.
One important point is that you don't have to write this kind of
sophisticated code when you first write the test.
bytes.Equal was fine to get going and focus on the code.
But as the failures become more subtle and you notice
yourself spending too much time just reading the failure output,
that's a good signal to spend some time on making them more readable.
Also, these kinds of tests can be a bit annoying to update
if the exact output changes and you need to correct all of the test data files.
Tip #12. If the answers can change, write code to update them.
The usual way to do this is to add a -update flag to the test.
Here's the updating code for test2json. The test defines a new flag -update. When the flag is true,
the test writes the computed answer to the answer file instead of calling diffJSON.
Now, when we make an intentional change to the JSON format,
"go test -update" corrects all the answers. You can also use version control tools like
"git diff" to review the changes and back them out if they don't look right.
Staying on the topic of test files, sometimes it's annoying to have a test
case split across multiple files. If I was writing this test today, I wouldn't do that.
Tip #13. Use txtar for multi-file test cases.
Txtar is a new archive format we designed a few
years ago specifically to solve the multi-file test case problem.
The Go parser is in golang.org/x/tools/txtar,
and I've also found parsers written in Ruby, Rust, and Swift.
Txtar's design had three goals.
First, be trivial enough to create, edit, and read by hand.
Second, be able to store trees of text files, because we needed that for the go command.
And third, diff nicely in git history and code reviews.
Non-goals included being a completely general archive format,
storing binary data, storing file modes,
storing special files like symbolic links, and so on.
These are non-goals because archive file formats tend toward
becoming arbitrarily complex, and complexity directly contradicts the first goal.
These goals and non-goals led to a very simple format. Here is an example:
The txtar file starts with a comment, in this case "Here are some greetings." And
then in general there are zero or more files, each introduced by a line of the form dash dash space
file name space dash dash. This archive has two one-line files, hello and g'day.
That's it, that's the entire format. There is no escaping, no quoting,
no support for binary data, no symlinks, no possible syntax errors, and no complications.
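To underline how little there is to the format, here is a from-scratch parser sketch. The official parser is golang.org/x/tools/txtar; this toy version skips its edge cases:

```go
package main

import "strings"

// File is one named file inside a txtar archive.
type File struct {
	Name string
	Data string
}

// parseTxtar splits a txtar archive into its leading comment and its
// files. A file starts at a marker line "-- name --".
func parseTxtar(data string) (comment string, files []File) {
	cur := -1 // index into files; -1 means still in the comment
	for _, line := range strings.SplitAfter(data, "\n") {
		trimmed := strings.TrimSuffix(line, "\n")
		if len(trimmed) >= len("-- a --") &&
			strings.HasPrefix(trimmed, "-- ") && strings.HasSuffix(trimmed, " --") {
			name := strings.TrimSpace(trimmed[3 : len(trimmed)-3])
			files = append(files, File{Name: name})
			cur = len(files) - 1
			continue
		}
		if cur == -1 {
			comment += line
		} else {
			files[cur].Data += line
		}
	}
	return comment, files
}
```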
Here is a real use in testdata for a package that computes diffs:
In this case, the comment is useful for people, to record what's being tested,
and then in this test each case is two files followed by their diff.
*Using* txtar files is almost as trivial as writing them.
Here is the test for the diff package we were looking at.
This is the usual file-based loop but we call txtar.ParseFile on the file.
Then we insist that the archive contains three files, the third being named diff.
Then we diff the two input files and check that the result matches the expected diff.
And that’s the whole test.
You may have noticed the file data is passed to this function "clean" before being used. Clean
lets us add some diff-specific extensions for this test without complicating the txtar format itself.
The first extension handles lines ending in spaces, which do happen in diffs.
Lots of editors want to remove those trailing spaces,
so the test allows placing a $ at the end of a txtar data line
to mark the ending, and clean removes that $.
In this example, the marked lines need to end in a single space.
Also, txtar insists that every line in a file ends in a newline character,
but we want to test diff's behavior on files that don't end in a newline.
So the test allows a literal caret capital D (^D) at the end.
Clean removes both the caret-D and the newline that follows it.
In this case the 'new' file ends up without a final newline,
which the diff correctly reports.
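A sketch of such a clean function, assuming the two extensions just described; for simplicity it rewrites the markers wherever they end a line:

```go
package main

import "strings"

// clean applies the diff-test extensions to txtar file data:
// a trailing $ marks a line that really ends in spaces, and a
// trailing ^D deletes itself and the following newline, producing
// a file with no final newline.
func clean(data string) string {
	// Remove a $ at the end of a line, keeping the spaces before it.
	data = strings.ReplaceAll(data, "$\n", "\n")
	// Remove a ^D and the newline that follows it.
	data = strings.ReplaceAll(data, "^D\n", "")
	return data
}
```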
So even though txtar is incredibly simple, you can layer your own format
adjustments on top easily. Of course, it is important to document these
so that the people who work on the test next understand them.
Tip #14. Annotate existing formats to create testing mini-languages.
Annotating an existing format, like adding the $ and caret-D to txtar,
is a powerful tool.
Here's another example of annotating an existing format. This is a test for the Go type checker.
This is a plain Go input file,
but the expected type errors have been added in slash-star-ERROR comments. We
use slash-star comments so we can place them exactly where the error should be reported.
The test runs the type checker and checks that it produces the
expected messages at the expected locations and does not produce any unexpected messages.
Here's another example from the type checker. In this test, we've added an assert annotation on
top of the usual Go syntax. This lets us write tests of constant arithmetic, like this one.
The type checker is already computing the boolean value of each of those constant expressions,
so checking the assert is really just checking that the constant has evaluated to true.
Here's another example of an annotated format. Ivy
is an interactive calculator. You type programs, usually simple expressions,
and it prints back the answers. The test cases are files that look like this:
The unindented lines are Ivy input,
and the indented lines are annotations of what output to expect Ivy to print at that point.
It doesn’t get much easier to write a new test case than this.
These annotated formats are extending existing parsers
and printers. Sometimes it helps to write your own parsers and printers from scratch.
After all, most tests involve creating or inspecting data,
and those tests are always far nicer
when you can work with the data in a convenient form.
Tip #15. Write parsers and printers to simplify tests.
These parsers and printers don't have to be for standalone scripts in testdata files.
It's also possible to use them in regular Go code.
Here is a fragment of a test for the code that runs deps.dev.
This test sets up some database table rows.
It calls a function that uses the database and is being tested.
And then it checks that the database contains the expected results.
The Insert and Want calls are using a mini-language
for database contents written specifically for these tests.
The parser is as easy as it looks: it splits the input into lines
and then splits each line into fields. The first line gives the column names. That's it.
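A sketch of that parser; the column and row handling really is all there is to it (the Insert and Want API itself is specific to deps.dev):

```go
package main

import "strings"

// parseTable sketches the database mini-language described above:
// split into lines, split each line into fields, and treat the
// first non-blank line as the column names.
func parseTable(s string) (cols []string, rows [][]string) {
	for _, line := range strings.Split(s, "\n") {
		f := strings.Fields(line)
		if len(f) == 0 {
			continue // skip blank lines
		}
		if cols == nil {
			cols = f // first non-blank line gives the column names
			continue
		}
		rows = append(rows, f)
	}
	return cols, rows
}
```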
The exact spacing in these strings doesn't matter,
but of course it looks nice if they're all aligned.
So to support this test, the deps.dev team
also has a code formatter written just for these tests.
It uses the Go standard library to parse the test source files.
Then it walks over the Go syntax tree to look for calls to Insert or Want.
It extracts the string arguments and parses them into tables.
Then it reprints the tables back to strings, reinserts the strings back into the syntax tree,
and reprints the syntax tree back to Go source code.
This is just an extended version of gofmt,
using the same packages that gofmt uses. I won't show it to you, but it's not much code.
The parser and printer took some time to write. But now every time someone writes a test,
the test is that much easier to write. And every time a test fails or needs updating,
it's that much easier to debug. If you're doing software engineering,
the benefits scale with the number of programmers and the lifetime of the project.
For deps.dev, already the time spent on this parser and printer
has been saved many times over.
Perhaps even more importantly, because tests are easier to write,
you are likely to write more tests, which results in higher-quality code.
Tip #16. Code quality is limited by test quality.
If you can't write high-quality tests,
you won't write enough tests, and you won't end up with high-quality code.
Now I want to show you some of the highest quality tests I've ever worked on,
which are the tests for the go command.
These bring together many of the ideas we've seen so far.
Here is a simple but real go command test.
This is a txtar input, with a single file hello.go.
The archive comment is a script written in a simple line-at-a-time command language.
In the script, "env" sets an environment variable to turn off Go modules.
A hash sign introduces a comment.
And "go" runs the go command, which should in turn run helo world.
That program should print hello world to standard error.
The "stderr" command checks that the standard error printed by the
previous command matches a regular expression.
So this test runs "go run hello.go" and checks that it printed hello world to standard error.
Here's another real test.
Notice at the bottom that a.go is an invalid program
since it is importing an empty string.
The bang at the start of the first line is a NOT operator.
NOT go list a.go means go list a.go should fail.
The next line, NOT stdout dot, means that there should be no matches on
standard output for the regular expression dot, meaning no text at all should be printed.
Next, standard error should have an invalid import path message.
And finally there should NOT be a panic.
Tip #17. Scripts make good tests.
These scripts make it incredibly easy to add a new test case.
Here is our smallest test: two lines. I added this one recently
after I broke the error message printed for an unknown command.
In all, we have over 700 of these script tests, ranging from two lines to over 500 lines.
These test scripts replaced a more traditional test scaffold with methods. This slide shows
one of the real tests it replaced, behind the script translation. The details don't
matter except to notice that the script is much easier to write and understand.
Tip #18. Try rsc.io/script for your own script-based test cases.
It has been about five years since we created the go script tests, and we're very happy with
the specific script engine. Bryan Mills took the time to give it a very nice API,
and earlier in November I posted it for import at rsc.io/script. Now I said "Try" because it's
a bit new, and ironically it does not have enough tests itself, since the importable
package is only a few weeks old, but you still might find it useful. We might put it somewhere
more official when we have more experience with it. If you do try it, let me know how it goes.
The motivation for extracting the script engine was to reuse it
in a different part of the go command tests.
This script is preparing a Git repository
containing a module that we want to import during a regular go command script test.
You can see it sets some environment variables,
runs an actual git init, sets the time,
runs more git commands to add a hello world file to the repo,
and then checks that we got exactly the repo we wanted.
Once again, the tests did not start out this way, which leads me to the next tip.
Tip #19. Improve your tests over time.
Originally, we didn't have those repo scripts.
We created small test repos by hand and posted them on GitHub, Bitbucket,
and other hosting servers depending on which version control system we needed.
That worked okay but it meant that if any of these servers went down, the tests failed.
Eventually we took the time to build our own
cloud server that could serve repos for every version control system.
We still created the repos by hand, but now we zipped them up and copied them to our own server.
That was better, since now there was only one server that could take down our tests,
but sometimes there were networking problems too. It was also a problem that the test repos
themselves were not version controlled and they were not near the tests that used them.
The script-based version builds and serves these repos entirely locally as part of the
test. And the repo descriptions are now easy to find, change, and review.
This is a lot of infrastructure, but it's testing a lot of code too.
If you only have 10 lines of code, you should *not* have thousands of lines
of test framework. But if you have a hundred thousand lines of code,
which is about what the go command is, then a few thousand lines to make tests
better, or even ten thousand lines, is almost certainly a good investment.
Tip #20. Aim for continuous deployment.
There may be policy reasons that you can't actually deploy your code
on every commit that passes all the tests, but aim for it anyway.
As I mentioned at the start of the talk, any doubts you have about continuous
deployment are helpful little voices telling you what needs better testing.
And the key to better testing is, of course, making it easy to add new tests.
Even if you never actually enable continuous deployment, aiming for it can help keep you
honest, improve the quality of your tests, and improve the quality of your code.
I mentioned earlier that the Go web site uses continuous deployment.
On each commit, we run tests to decide whether the latest version of the code
can be deployed and have traffic routed to it.
At this point you won't be surprised that we wrote
a testing script language for these tests. Here is what they look like.
Each test begins with an HTTP request. Here we GET the main go.dev page.
Then there are assertions about the response.
Each assertion is of the form "field, operator, value".
Here the field is the body, the operator is contains,
and the value is literal text that the body must contain.
This test is checking that the page is rendering,
so it checks for basic text as well as a subheading.
To make it easier to write the tests, there's no quoting at all:
the value is just the rest of the line after the operator.
Here's another test case.
For historical reasons, /about needs to redirect to pkg.go.dev.
Here's another. Nothing special here, just checking that the case studies page renders,
because it is synthesized from many other files.
Another field the test can check is the HTTP response code,
and here's a bug fix. We were accidentally serving these files from the Go repo root as
if they were Go web site pages. We want 404s for these instead.
Another field you can test is header foo, for some foo.
In this case, the header Content-Type
needs to be set correctly for the main blog page and its JSON feed.
Here's another example. This one uses the regular
expression matching operator tilde and the \s+ syntax to make sure that
the page has the right text no matter how many spaces are between the words.
That got a little bit old, so we added a new field named trimbody that is the body with
all runs of spaces replaced by a single space. This example also shows that the
value can be supplied as multiple indented lines, to make multiline matches easier.
We also have some tests that can't be run locally
but are still worth running in production before we migrate live traffic to the server.
And here are two of these.
These depend on network access to the production playground backends.
These cases are the same except for the URLs. And this is not a terribly readable test,
since these are our only POST tests. If we added more of these,
I would probably take the time to make them a bit nicer,
in the spirit of improve your tests over time.
But for now they're fine, and they serve an important purpose.
Finally, as usual, it's easy to add bug fixes. In issue 51989,
certain talks were not rendering at all.
So this test checks that the page *does* render and contains a distinctive piece of text.
Issue 51989 is never going to happen again, at least not on the live web site. There will
be other bugs for sure, but that one is gone for good, and that's progress.
That's all the examples I have time to show you, but one final thought.
I'm sure you've had the experience of chasing down a bug and ending
up in an important piece of code that is wrong.
But somehow it's wrong in a way that doesn't matter most of the time,
or wrong in a way that's cancelled out by some other wrong piece of code.
And you've probably thought to yourself “How did this code ever work?”
If you wrote the code, you might have thought you got lucky. And if someone else wrote the code,
you might have thought poorly of them and then also thought that they got lucky.
But most of the time the answer isn't luck.
The answer to "How did this code ever work?" is almost always: because it had a test.
Sure the code is wrong, but the test checked that it was correct enough for the rest
of the system to work, and that's what mattered.
Maybe the person who wrote the code was in fact a bad programmer,
but they were a good software engineer, because they wrote a test,
and that's why the overall system containing that code works.
What I hope you take away from this talk is not the specific details of any given test,
although I do hope you will keep an eye out for good uses for
small parsers and printers. Anyone can learn how to write those,
and using them effectively can be a software engineering superpower.
Ultimately, these were good tests for these packages.
Good tests for your packages may well look different.
And that’s fine.
But make it easy to add new test cases, and make sure that you have good,
clear, high-quality tests. Remember that code quality is
limited by test quality, so invest in improving
your tests gradually over time. The longer you work on a project,
the better your tests should become. And aim for continuous deployment,
at least as a thought experiment to understand what's not tested well enough.
Overall, put as much thought and care and effort into writing good test code as you do into
writing good non-test code. It’s absolutely worth it.