
Go Testing By Example (GopherCon Australia 2023)

By Russ Cox

Summary

Key takeaways

  • Programming vs. Software Engineering Testing: Programming tests aim to find bugs by breaking code, while software engineering tests ensure code remains functional over time with multiple contributors, requiring automated, continuous execution. [00:09], [02:06]
  • Coverage Is a Tool, Not a Goal: Test coverage highlights untested code but doesn't replace the critical thinking needed to identify subtle bugs or difficult input cases. [06:05], [07:27]
  • Exhaustive Testing with Reference Implementations: For complex functions, exhaustively test all small inputs by comparing against a simpler, separate reference implementation to catch subtle errors. [09:21], [10:16]
  • Test Cases in Files for Simplicity: Defining test cases in separate files, using formats like txtar for multi-file scenarios, simplifies test creation and maintenance, especially for complex setups. [16:09], [21:41]
  • Test Failures Must Be Readable: Make test failures easy to understand by providing clear input/output details and allowing multiple cases to run with t.Error, facilitating quicker debugging. [19:46]
  • Scripts as Powerful Test Cases: Script-based tests, like those used for the Go command, offer a concise and understandable way to define complex testing scenarios, making them easy to add and maintain. [29:02], [30:14]

Topics Covered

  • Programming vs. Software Engineering: Why Tests Matter
  • Make it easy to add new test cases
  • If you didn't add a test, you didn't fix the bug
  • Custom Parsers and Printers Simplify Complex Tests
  • The Unsung Hero: How Tests Make Bad Code Work

Full Transcript

Hi everyone. I gave this talk at GopherCon Australia 

a few weeks ago, but there  were some A/V challenges 

that hurt the recording, so I re-recorded this version  

at home. Enjoy!

This talk is about writing good tests, but first,  let's think about why we write tests at all.

Why do programmers write tests?  Books about programming say that  

tests are about finding bugs in your program.

For example, this book says  that “testing is a determined,  

systematic attempt to break a  program that you think is working.”

This is true. This is why programmers should  write tests. But for most of us here today,  

it's not why *we* write tests, because we are  

not mere programmers. We are software  engineers. Let me tell you what I mean.

I like to say that software engineering is what  

happens to programming when you  add time and other programmers.

Programming means getting a program working.

You have a problem to solve, you write some code,  

you run it, you test it, you debug  it, you get your answer, you’re done.

That's already pretty difficult, and testing  is an important part of that process.

But software engineering means doing  all that in programs you keep working  

on for a long time and with other people,  and that changes the nature of testing.

Let's start by looking at tests  for a binary search function.

This function Find takes a sorted slice,  a target value, and a comparison function,  

and it uses binary search to find and return  two things: first, the index where the target  

would appear if it were present, and second, a  boolean saying whether the target is present.
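The signature described here matches the shape of Go 1.21's slices.BinarySearchFunc. For reference, a minimal sketch of such a function might look like the following; this is a hypothetical illustration, not the talk's code, and the buggy details the talk explores are discussed later:

```go
package main

import "fmt"

// find is a sketch of the Find function described above: it returns
// the index where target would appear in the sorted slice, and whether
// target is present. cmp returns a negative, zero, or positive value
// for a < b, a == b, a > b.
func find(s []int, target int, cmp func(a, b int) int) (int, bool) {
	i, j := 0, len(s)
	for i < j {
		h := (i + j) / 2 // midpoint; see the overflow discussion later in the talk
		if cmp(s[h], target) < 0 {
			i = h + 1
		} else {
			j = h
		}
	}
	return i, i < len(s) && cmp(s[i], target) == 0
}

func main() {
	s := []int{10, 20, 30, 40} // increasing multiples of 10, as in the tester
	i, ok := find(s, 30, func(a, b int) int { return a - b })
	fmt.Println(i, ok) // → 2 true: where 30 would go, and whether it is present
}
```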

Most binary searches are buggy, and  this one is no exception. Let's test it.

Here is a nice interactive tester  for a binary search function.

You type two numbers n and t. It makes a slice of  n elements containing increasing multiples of 10,  

and then it does a search for t in the slice  and prints the results. And you repeat.

This may seem trivial, but how many  of you have ever tested production  

code just by running it and poking at it  for a while? All of us have done that.

When you're programming, an interactive tester  like this can be very useful for finding bugs,  

although so far the code appears to be working.

But this tester is only good for programming.

If you're doing software engineering, meaning  you are keeping the program running over a  

long time and working with other people,  this kind of tester is not too useful. You  

need something that can be run by everyone else  every day, while they are working on the code,  

and that can be run automatically  by a computer for every commit.

The problem is that testing your program  by hand only makes sure it works *today*.

Automated, continuous testing makes sure that  it keeps working tomorrow and into the future,  

even if other people who don't know the code as  well start working on it. And to be clear, that  

person who doesn't know the code very well might  be *you* six months or even six weeks from now.

Here is a software engineer's test.  You can run it without any knowledge  

of what the code should do. Any  colleague or any computer can run  

this using 'go test' and immediately  understand whether the test passes.

You've seen tests like this already I'm sure.

The software engineering ideal is to  have tests that catch every possible  

mistake that might be made later.  If your tests meet that ideal,  

then you should be comfortable shipping  your code to production automatically,  

any time your tests are all passing,  what people call continuous deployment.

If you don't do this already, if  the idea of it makes you nervous,  

it is worth asking yourself why. Either  your tests are good enough or they're not.  

If they are good enough, then why not  do it? And if they're not good enough,  

listen to those doubts and figure out what they  are telling you about which tests are missing.

A few years back I was working on the  server for the new Go web site go.dev.  

We deployed the site manually back then,  and at least once a week I would make a  

change that worked fine on my machine  but then wouldn't serve any pages at  

all when I deployed it to production.  This was annoying and embarrassing.

The solution was better tests and automated  continuous deployment. Now every time a commit  

lands in the repo, we use a Cloud Build  program to run local tests, push the code  

out to a fresh server, run a few more tests that  only run in production, and then if all is well,  

redirect traffic to the new server. This made  things better in two different ways. First,  

I stopped causing embarrassing web site outages. Second, everyone stopped having 

to think about deploying the web site  at all. If they want to make a change,  

such as fixing a typo or adding a new  blog post, they just mail the change,  

it gets reviewed, tested, and submitted, and  then the automatic process does the rest.

To be confident that your program  won't break when other people edit it,  

and to be confident that your program can  be pushed to production any time the tests  

are passing, you need a very good set of tests.

But what makes tests good? In general,  

what makes test code good is the same thing that  makes non-test code good: hard work, attention,  

and time. I don't have any silver bullets  or hard rules for writing good test code,  

any more than I do for writing good non-test  code. However, I do have a collection of  

tips based on what has worked well for us on  Go, and I'll share twenty tips in this talk.

Tip #1. Make it easy to add new test  cases. This is the most important tip,  

because if it's not easy to  add new test cases, you won't.

Go already helps with this. This is the  most trivial test of the function Foo.  

We designed Go tests specifically  to be very easy to write. There's  

no bookkeeping or ceremony that gets in your way.

At the level of package  testing, this is pretty good,  

but in a specific package, you can do even better.

I'm sure you know about table driven  tests. We encourage table-driven tests  

because they make it very easy to add new  test cases. Here is the one we saw before:

Let's suppose we have only this  one test case and we think of a  

new case to test. We don't have  to write any new code at all,  

just one new line of data. If the goal  is “make it easy to add new tests”,  

then for simple functions like this, adding a  line to a table is about as good as it gets.
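For reference, a hypothetical table-driven test for such a function might look like the sketch below. The table contents are invented, and in a real test this would be a _test.go file reporting with t.Errorf rather than a plain program:

```go
package main

import "fmt"

// find is a binary search sketch over ints (stand-in for the talk's Find).
func find(s []int, target int) (int, bool) {
	i, j := 0, len(s)
	for i < j {
		h := (i + j) / 2
		if s[h] < target {
			i = h + 1
		} else {
			j = h
		}
	}
	return i, i < len(s) && s[i] == target
}

// findTests is the table. Adding a case is one new line of data, no new code.
var findTests = []struct {
	slice  []int
	target int
	wantI  int
	wantOK bool
}{
	{nil, 10, 0, false},
	{[]int{10, 20, 30}, 10, 0, true},
	{[]int{10, 20, 30}, 25, 2, false}, // new case: just one line
}

func main() {
	for _, tt := range findTests {
		i, ok := find(tt.slice, tt.target)
		if i != tt.wantI || ok != tt.wantOK {
			fmt.Printf("find(%v, %d) = %d, %v, want %d, %v\n",
				tt.slice, tt.target, i, ok, tt.wantI, tt.wantOK)
		}
	}
	fmt.Println("all cases checked")
}
```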

It does raise one question though: what cases  should we add? That takes us to the next tip.

Tip #2. Use test coverage to  find untested code. After all,  

tests can't catch bugs in code they don't run.

Go ships with built-in support for test coverage.

And here's what it looks like.

You run go test -coverprofile  to write a coverage profile,  

and then you run go tool cover  to view it in your browser.

In the display, we can see that  our test case is not very good:  

the actual binary search code is  red, meaning entirely untested.

The next step is to look at the untested code and  

think about what kinds of tests  would cause those lines to run.

On closer examination, we only  tested with an empty slice,  

so let's add a case with a slice that isn't empty.

Now we can run the coverage again. This time  I'm going to read the coverage profile with  

a little command-line program I  wrote called "uncover". Uncover  

prints the lines of code not covered  by a test. It doesn't give you the big  

picture that the web view does, but  it lets you stay in a shell window.

Uncover shows us that there's just  one line left that isn't executed by  

the tests. It's the line that moves  into the second half of the slice,  

which makes sense since our  target is the very first element.

Let's add one more test, searching for  that last element. When we run the test,  

it passes, and we have 100%  coverage. Great. Are we done?

No, and that takes us to the next tip.

Tip #3. Coverage is no substitute for thought.

Coverage is very useful for pointing out parts  of your code that you might have forgotten about,  

but mechanical tools are no substitute for  actually thinking about what the difficult  

inputs are, and what's subtle in  your code and how it might break.

Code with 100% testing coverage can  still have bugs, and this code does.

This tip also applies to fuzzing,  which is coverage driven. 

And fuzzing is just trying to explore  more and more paths through your  

code to increase the coverage. Fuzzing is really helpful too,  

but fuzzing is also no substitute for thought.

So what's missing here?

One thing to notice is that the only test  case that doesn't find the target has an  

empty input slice. We should check not  finding the target in a slice with values.  

And specifically we should check what happens  when the target is less than all the values,  

greater than all the values, and in the middle of  the values. So let's add three more test cases.

Notice how easy it is for us to add a new  test case. If you think of a case that your  

code might not handle correctly, it needs  to be as easy as possible to add that case,  

or you will be tempted not to bother.  If it's too difficult, you won't.

You can also see how we're starting to  enumerate all the important ways that  

this function can go wrong. The tests  constrain all future development to  

keep the binary search working at least this well.

When we run these tests, they fail.  The returned index i is correct,  

but the boolean indicating whether the target  was found is wrong. So let's look at that.

Reading the code, the boolean expression at the  return is wrong. It was only checking that the  

index is in range. It also needs to check that the value at that index equals the target.

So we can make that change, highlighted  here, and now the tests pass.

Now we feel pretty good about the test:  

coverage is happy, and we've thought  hard about it. What more can we do?

Tip #4. Write exhaustive tests.

If you can test every possible  input to a function, you should.

Now that may be unrealistic,  

but you can usually test all inputs up to  a certain size under certain constraints.

Here is an exhaustive test for binary search.

We start by making a slice of 10 elements,  specifically the odd numbers 1 3 5 up to 19.

Then we consider all possible  length prefixes of that slice.

And for each prefix we consider all possible  targets from 0, which is smaller than all  

the values in the slice, up to twice the length,  which is larger than all the values in the slice.

This will exhaustively test every possible search  

path through every possible size  slice up to our length 10 limit.

But now how do we know what the  answer is? We could do some math  

based on the specifics of our test cases,  but there is a better, more general way.

That way is to write a reference  implementation that is different  

from the real one. Ideally the reference  implementation should be obviously correct,  

but it can just be any different approach from the real one. Usually that will be a simpler, 

slower approach, since if it was simpler and  faster you'd use it as the real implementation.

In this case, the reference implementation is  called slowFind. The test checks that slowFind  

and Find agree on the answer. Since the inputs  are small, slowFind can be a simple linear search.
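The pattern can be sketched as below, assuming a Find over int slices; slowFind here is a plain linear scan, which is obviously correct for sorted input:

```go
package main

import "fmt"

// find is the implementation under test (a standard binary search sketch).
func find(s []int, target int) (int, bool) {
	i, j := 0, len(s)
	for i < j {
		h := (i + j) / 2
		if s[h] < target {
			i = h + 1
		} else {
			j = h
		}
	}
	return i, i < len(s) && s[i] == target
}

// slowFind is the reference implementation: an obviously correct
// linear scan. Simpler and slower, which is fine for small inputs.
func slowFind(s []int, target int) (int, bool) {
	for i, v := range s {
		if v >= target {
			return i, v == target
		}
	}
	return len(s), false
}

func main() {
	s := []int{1, 3, 5, 7, 9, 11, 13, 15, 17, 19} // odd numbers 1..19
	mismatches := 0
	for n := 0; n <= len(s); n++ { // every prefix length
		for target := 0; target <= 2*n; target++ { // below, between, above the values
			i1, ok1 := find(s[:n], target)
			i2, ok2 := slowFind(s[:n], target)
			if i1 != i2 || ok1 != ok2 {
				mismatches++
				fmt.Printf("find(%v, %d) = %d, %v; slowFind = %d, %v\n",
					s[:n], target, i1, ok1, i2, ok2)
			}
		}
	}
	fmt.Println("mismatches:", mismatches) // → mismatches: 0
}
```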

This pattern of generating all possible  inputs up to some size and comparing the  

results against a simple reference  implementation is very powerful.  

One important thing it does is cover all the basic corner cases, like 0-element slices, 

1-element slices, slices that have odd length,  even length, power of two length, and so on.

The vast majority of bugs in most programs  can be reproduced by small inputs,  

so testing all the small inputs is very effective.

It turns out that this exhaustive test  passes. Our thinking was pretty good.

Now, if the exhaustive test fails,  that means Find and slowFind disagree,  

so at least one is buggy, but we don't know which one. It helps to add a direct 

test of slowFind, and that's easy since  we already have a table of test data.

This is another benefit of table-driven tests:  

the tables can be used to test  more than one implementation.

Tip #5. Separate test cases from test logic.

In a table-driven test, the cases are in the  table, and the loop that processes them is the  

test logic. As we just saw, separating them lets  you use the same test cases in multiple contexts.

So now are we done with binary search?

It turns out no, there is still  a bug left, which leads us to:

Tip #6. Look for special cases.

Even if we've done an exhaustive test of all the  small cases, there may still be bugs lurking.

Now here is the code again. There's one bug left. You can pause the video and look  

at it for a while. Does anyone see it?

It's OK if you don't see it. It's a very special  case, and it took people decades to notice.

Knuth told us that although binary search  was published in 1946 the first correct  

binary search wasn't published until 1964.  But this bug wasn't discovered until 2006.

The bug is that if the number of elements in the  slice is very close to the maximum value for an  

int, then i+j overflows, so (i+j)/2 is the wrong calculation for the middle of the slice.

This bug was discovered back in 2006 in C  programs using 64-bit memories and 32-bit ints,  

indexing arrays with more than a billion  entries. This particular combination is  

basically never going to happen in Go, because  we require 64-bit memories to use 64-bit ints,  

exactly to avoid this kind of problem. But  since we know about the bug, and you never  

know how you or someone else will adapt the  code in the future, it is worth avoiding it.

There are two standard fixes to keep the math from  

overflowing. The slightly faster  one is to do an unsigned divide.
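On a 64-bit machine you can watch the naive midpoint go wrong and see that both standard fixes agree. The values below simulate index state near the end of a search over a near-MaxInt-length slice; this is an illustration, not the talk's code:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Simulated i and j late in a binary search over a huge slice
	// (assumes the usual 64-bit int).
	i, j := math.MaxInt/2+1, math.MaxInt

	bad := (i + j) / 2          // i+j wraps around, so this midpoint goes negative
	good := int(uint(i+j) >> 1) // unsigned divide: the sum still fits in a uint
	also := i + (j-i)/2         // the other standard fix: offset from i

	fmt.Println(bad < 0)      // → true: the buggy midpoint is negative
	fmt.Println(good == also) // → true: both fixes agree on the real midpoint
}
```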

Suppose we fix that. Now are we done?  No. Because we haven't written a test.

Tip #7. If you didn't add a test, you didn't fix the bug.

This is true in two different ways.

The first way is the programming  way. If you didn't test it,  

the bug might not even be fixed. This  may sound silly, but how often has this  

happened to you? Someone tells you about  a bug. You know immediately what the fix  

is. You make the change and tell them it's fixed. And they come back right away and say nope, 

it's still broken. Writing a test saves  you that embarrassment. You can say, well,  

I'm sorry I didn't fix your bug, but I did fix  a bug, and I'll take a look at this one again.

The second way this is true is  the software engineering way,  

the "time and other programmers" way.  Bugs are not random. In any given program,  

certain mistakes are far more likely than others.  So if you made the mistake once, you or someone  

else will probably make it again in the future. Without a test to stop them, the bug comes back.

Now this specific test is hard to write, because  the input has to be very large, but this tip is  

true even when the test is hard to write. In  fact, it's usually more true in that case.

To test this case, one possibility  would be to write a test that only  

runs on 32-bit systems and that does  a binary search over two gigabytes of  

uint8s. But that's a lot of memory and we  don't have many 32-bit systems anymore.

There's a more clever answer in this case, as  there often is for testing hard-to-find bugs.  

We can make a slice of empty structs, which  takes up no memory no matter how long it is.

This test calls Find on a slice of MaxInt  empty structs, looking for an empty struct  

as a target, but then it passes in a comparison function that always returns -1, 

claiming that the slice element is less than  the target. This will make the binary search  

investigate larger and larger indexes into the  slice, which is how we can reach the overflow.
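The trick can be sketched as follows. This is a hypothetical reconstruction, shown with the overflow-safe midpoint already in place; with the buggy (i+j)/2 midpoint, the search would compute a negative index and panic:

```go
package main

import (
	"fmt"
	"math"
)

// find is a generic binary search sketch using the overflow-safe midpoint.
func find[T any](s []T, target T, cmp func(a, b T) int) (int, bool) {
	i, j := 0, len(s)
	for i < j {
		h := int(uint(i+j) >> 1) // the fix under test; (i+j)/2 would overflow below
		if cmp(s[h], target) < 0 {
			i = h + 1
		} else {
			j = h
		}
	}
	return i, i < len(s) && cmp(s[i], target) == 0
}

func main() {
	// A slice of math.MaxInt empty structs takes up no memory.
	s := make([]struct{}, math.MaxInt)

	// The comparison function always returns -1, claiming the element is
	// less than the target, driving the search to larger and larger indexes.
	i, ok := find(s, struct{}{}, func(a, b struct{}) int { return -1 })
	fmt.Println(i == math.MaxInt, ok) // → true false: reached the end, not found
}
```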

If we revert our fix and run this  test, sure enough, the test fails.

And with our fix, the test  passes. Now the bug is fixed.

Tip #8. Not everything fits in a table.  This special case didn't, and that's okay.

But lots of things do fit in a table.

Here's one of my favorite test tables.  This is from the fmt.Printf tests. Each  

row is a printf format, a value, and  the expected string. The real table  

is far too large to fit on a slide  but here are a few lines from it.

If you read through the table you start  to see what are clearly bug fixes.

Remember Tip #7. If you didn't add  a test, you didn't fix the bug.

The table made each of these tests trivial to add,  

and adding them makes sure  these bugs never come back.

Tables are one way to separate  test cases from test logic and  

make it easy to add new test cases,  but sometimes you have so many tests  

that it makes sense to avoid even  the overhead of writing Go syntax.

For example here is a test file  from package strconv for testing  

conversion between strings  and floating point numbers.

You might think that it's too much work to write a  parser for this input, but once you know how, it's  

not much work, and being able to define testing  mini-languages turns out to be incredibly useful.

So I'm going to walk quickly through the  parser to show there's not much to it.

We read the file.

Then we split it into lines.

For each line, we calculate the line number  for error messages. Slice element 0 is line 1.

We cut off any comments on the end of the line.

And if the line is blank, we skip it.

This is pretty standard boilerplate so  far. Now the good part. We split the  

line into fields, and we pull out the four fields.

Then we do the conversion in float32 or float64  math according to the type field. myatof64 is  

basically strconv.ParseFloat except it handles a decimal p format that lets 

us write the test cases the way they were  written in the paper I copied them from.

Finally, if the result is not what we want, we print the error.

This is a lot like a table-driven test. 

We just parse the file instead  of ranging over a table.

It doesn't fit on one slide, but it does  fit on one screen when you're developing.

Tip #9. Test cases can be in testdata files.

They don't have to be in your source code.

As another example, the Go regular  expression package includes some  

testdata files copied from the AT&T  POSIX regular expression library.

I won't go into the details here, but  I am grateful that they chose to use a  

file-driven test for that library, because it  meant I could reuse the testdata files for Go.

It's another ad-hoc format, but it's  easily parsed and easily edited.

Tip #10. Compare against other implementations.

Comparing against the AT&T regexp's test  cases helped make sure that Go's package  

handled various corner cases exactly the same  way. We also compare Go's package against the  

C++ RE2 library. To avoid needing to compile  the C++ code, we run it in a mode that logs  

all its test cases to a file, and then  we check in that file in Go as testdata.

Another way to store test cases in  files is to have pairs of files,  

one for input and one for output.  To implement go test -json,  

there is a program called test2json that reads  test output and converts it to JSON output.

The test data is pairs of files:  test output, and JSON output.

Here's the shortest file.

This test output is at the top, and that's the input to test2json, 

and that should produce the  JSON output at the bottom.

Here's the implementation, to show the  idioms for reading test data from files.

We start by using filepath.Glob to find all  

the testdata. If that fails or  doesn't find any, we complain.

Otherwise, we loop over all the files. For  each one, we create a subtest name by taking  

the base file name, without the testdata/  directory name, and without the file suffix.

Then we Run a subtest with that name.

If your test cases are complex  enough to have one per file,  

it almost always makes sense  to make each its own subtest.

That way when one is failing you can run  just that specific file with go test -run.

For the actual test case, we  just have to read the file,

run the converter,

and check whether the results match.

For the check, I started out using bytes.Equal,  

but over time it became worthwhile to  write a custom diffJSON that parses  

the two JSON results and prints a nice  explanation of what's actually different.

Tip #11. Make test failures readable.

Rewinding a bit, we've already  seen this with binary search.

I think we all agree that the pink  box is not a good failure. But there  

are two details in the yellow box that  make these failures especially good.

First, we check both return  values in a single if statement,  

and then we print the full input  and output in a concise single line.

Second, we don't stop at  the first failure. We call  

t.Error instead of t.Fatal to let more cases run.

Combined, these two choices let us see the full  

details of each failure and look for  patterns across multiple failures.

Moving back to test2json, here is  how its test fails. It calculates  

which events are different and marks them clearly.

One important point is that you  don't have to write this kind of  

sophisticated code when you first write the test.

bytes.Equal was fine to get  going and focus on the code.

But as the failures become  more subtle and you notice  

yourself spending too much time  just reading the failure output, 

that's a good signal to spend some  time on making them more readable.

Also, these kinds of tests can  be a bit annoying to update 

if the exact output changes and you need  to correct all of the test data files.

Tip #12. If the answers can  change, write code to update them.

The usual way to do this is to  add a -update flag to the test.

Here's the updating code for test2json. The test  defines a new flag -update. When the flag is true,  

the test writes the computed answer to the  answer file instead of calling diffJSON.

Now, when we make an intentional  change to the JSON format,  

"go test -update" corrects all the answers.  You can also use version control tools like  

"git diff" to review the changes and  back them out if they don't look right.

Staying on the topic of test files,  sometimes it's annoying to have a test  

case split across multiple files. If I was  writing this test today, I wouldn't do that.

Tip #13. Use txtar for multi-file test cases.

Txtar is a new archive format we designed a few  

years ago specifically to solve  the multi-file test case problem.

The Go parser is in golang.org/x/tools/txtar,  

and I've also found parsers  written in Ruby, Rust, and Swift.

Txtar's design had three goals.

First, be trivial enough to  create, edit, and read by hand. 

Second, be able to store trees of text files,  because we needed that for the go command. 

And third, diff nicely in  git history and code reviews.

Non-goals included being a  completely general archive format, 

storing binary data, storing file modes, 

storing special files like  symbolic links, and so on.

These are non-goals because archive file formats tend toward  

becoming arbitrarily complex, and complexity directly contradicts the first goal.

These goals and non-goals led to a  very simple format. Here is an example:

The txtar file starts with a comment, in  this case "Here are some greetings." And  

then in general there are zero or more files, each  introduced by a line of the form dash dash space  

file name space dash dash. This archive  has two one-line files, hello and g'day.

That's it, that's the entire format.  There is no escaping, no quoting,  

no support for binary data, no symlinks, no  possible syntax errors, and no complications.

Here is a real use in testdata for a package that computes diffs:

In this case, the comment is useful for  people, to record what's being tested,  

and then in this test each case is  two files followed by their diff.

*Using* txtar files is almost  as trivial as writing them. 

Here is the test for the diff  package we were looking at.

This is the usual file-based loop but  we call txtar.ParseFile on the file.  

Then we insist that the archive contains  three files, the third being named diff.

Then we diff the two input files and check  that the result matches the expected diff.

And that’s the whole test.

You may have noticed the file data is passed to  this function "clean" before being used. Clean  

lets us add some diff-specific extensions for this  test without complicating the txtar format itself.

The first extension handles lines ending  in spaces, which do happen in diffs.

Lots of editors want to  remove those trailing spaces, 

so the test allows placing a $  at the end of a txtar data line 

to mark the ending, and clean removes that $.

In this example, the marked lines  need to end in a single space.

Also, txtar insists that every line  in a file ends in a newline character, 

but we want to test diff's behavior  on files that don't end in a newline. 

So the test allows a literal caret capital-D (^D) at the end. 

Clean removes both the caret-D and the newline that follows it.

In this case the 'new' file  ends up without a final newline, 

which the diff correctly reports.

So even though txtar is incredibly simple, you can layer your own format  

adjustments on top easily. Of course, it is important to document these 

so that the people who work on  the test next understand them.

Tip #14. Annotate existing formats  to create testing mini-languages.

Annotating an existing format, like adding the $ and caret-D to txtar, 

is a powerful tool.

Here's another example of annotating an existing  format. This is a test for the Go type checker.

This is a plain Go input file,  

but the expected type errors have been  added in slash-star-ERROR comments. We  

use slash-star comments so we can place them  exactly where the error should be reported.

The test runs the type checker  and checks that it produces the  

expected messages at the expected locations and does not produce any unexpected messages.
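The annotation side of this mechanism can be sketched with the standard go/parser package. collectErrors below is a hypothetical helper that only extracts the expected messages; the real test also runs the type checker and matches each message against the diagnostic position:

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"regexp"
)

var errorRE = regexp.MustCompile(`ERROR "([^"]*)"`)

// collectErrors parses Go source and returns the expected error
// messages declared in /* ERROR "msg" */ comments.
func collectErrors(src string) []string {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, parser.ParseComments)
	if err != nil {
		panic(err)
	}
	var msgs []string
	for _, group := range f.Comments {
		for _, c := range group.List {
			if m := errorRE.FindStringSubmatch(c.Text); m != nil {
				msgs = append(msgs, m[1])
			}
		}
	}
	return msgs
}

func main() {
	src := `package p
var x int = "hello" /* ERROR "cannot use" */
`
	fmt.Println(collectErrors(src)) // → [cannot use]
}
```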

Here's another example from the type checker. In this test, we've added an assert annotation on  

top of the usual Go syntax. This lets us write  tests of constant arithmetic, like this one.

The type checker is already computing the boolean  value of each of those constant expressions, 

so checking the assert is really just checking  that the constant has evaluated to true.

Here's another example of an annotated format. Ivy  

is an interactive calculator. You type  programs, usually simple expressions,  

and it prints back the answers. The test  cases are files that look like this:

The unindented lines are Ivy input,  

and the indented lines are annotations of what  output to expect Ivy to print at that point.

It doesn’t get much easier to  write a new test case than this.
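Parsing that format takes only a few lines. splitIvyCase below is a hypothetical sketch that separates unindented input lines from indented expected-output lines; the exact details of Ivy's test files are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// splitIvyCase separates a test case in the annotated format above:
// unindented lines are calculator input, indented lines are the
// expected output at that point.
func splitIvyCase(s string) (input, want []string) {
	for _, line := range strings.Split(strings.TrimRight(s, "\n"), "\n") {
		if strings.HasPrefix(line, "\t") || strings.HasPrefix(line, " ") {
			want = append(want, strings.TrimSpace(line))
		} else {
			input = append(input, line)
		}
	}
	return input, want
}

func main() {
	in, want := splitIvyCase("2+2\n\t4\n1/3\n\t0.333333\n")
	fmt.Println(in, want) // → [2+2 1/3] [4 0.333333]
}
```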

These annotated formats are  extending existing parsers  

and printers. Sometimes it helps to write  your own parsers and printers from scratch.

After all, most tests involve  creating or inspecting data, 

and those tests are always far nicer 

when you can work with the  data in a convenient form.

Tip #15. Write parsers and  printers to simplify tests.

These parsers and printers don't have to be for standalone scripts in testdata files. 

It's also possible to use them in regular Go code.

Here is a fragment of a test  for the code that runs deps.dev.

This test sets up some database table rows. 

It calls a function that uses  the database and is being tested. 

And then it checks that the database  contains the expected results.

The Insert and Want calls  are using a mini-language 

for database contents written  specifically for these tests. 

The parser is as easy as it looks: it splits the input into lines  

and then splits each line into fields. The first line gives the column names. That's it.

The exact spacing in these strings doesn't matter, 

but of course it looks nice  if they're all aligned.
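The parser really is as small as described: split into lines, split each line into fields, take the first line as column names. The sketch below is hypothetical (the column names and values are invented, and the real deps.dev code is not public); strings.Fields makes the extra alignment spacing irrelevant:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTable implements the mini-language described above. The first
// line names the columns; each later line is one row, returned as a
// map from column name to field value.
func parseTable(s string) []map[string]string {
	lines := strings.Split(strings.TrimSpace(s), "\n")
	cols := strings.Fields(lines[0])
	var rows []map[string]string
	for _, line := range lines[1:] {
		row := map[string]string{}
		for i, f := range strings.Fields(line) {
			row[cols[i]] = f
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	rows := parseTable(`
		Name     Version  Deps
		golang   1.21     0
		txtar    0.1      2
	`)
	fmt.Println(len(rows), rows[0]["Name"], rows[1]["Deps"]) // → 2 golang 2
}
```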

So to support this test, the deps.dev team 

also has a code formatter  written just for these tests. 

It uses the Go standard library  to parse the test source files. 

Then it walks over the Go syntax tree  to look for calls to Insert or Want. 

It extracts the string arguments  and parses them into tables.

Then it reprints the tables back to strings, reinserts the strings back into the syntax tree, 

and reprints the syntax  tree back to Go source code.

This is just an extended version of gofmt, 

using the same packages that gofmt uses. I won't show it to you, but it's not much code.

The parser and printer took some time to write. But now every time someone writes a test,  

the test is that much easier to write. And every time a test fails or needs updating, 

it's that much easier to debug. If you're doing software engineering, 

the benefits scale with the number of  programmers and the lifetime of the project. 

For deps.dev, already the time  spent on this parser and printer 

has been saved many times over.

Perhaps even more importantly,  because tests are easier to write, 

you are likely to write more tests,  which results in higher-quality code.

Tip #16. Code quality is limited by test quality.

If you can't write high-quality tests,  

you won't write enough tests, and you  won't end up with high-quality code.

Now I want to show you some of the  highest quality tests I've ever worked on, 

which are the tests for the go command. 

These bring together many of  the ideas we've seen so far.

Here is a simple but real go command test.

This is a txtar input, with  a single file hello.go. 

The archive comment is a script written in a simple line-at-a-time command language.

In the script, "env" sets an environment  variable to turn off Go modules.

A hash sign introduces a comment.

And "go" runs the go command, which should in turn run hello.go.

That program should print  hello world to standard error.

The "stderr" command checks that  the standard error printed by the  

previous command matches a regular expression.

So this test runs "go run hello.go" and checks  that it printed hello world to standard error.
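
Putting the pieces just described together, the script might look roughly like this. This is a reconstruction from the description above, not a verbatim copy of the test in the Go repository, so details may differ:

```
env GO111MODULE=off

# hello world
go run hello.go
stderr 'hello world'

-- hello.go --
package main

func main() {
	println("hello world")
}
```

Note that `println` (the built-in, not `fmt.Println`) writes to standard error, which is why the script checks stderr rather than stdout.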

Here's another real test.

Notice at the bottom that  a.go is an invalid program 

since it is importing an empty string.

The bang at the start of the  first line is a NOT operator.

NOT go list a.go means go list a.go should fail.

The next line, NOT stdout dot, means that there should be no matches on  

standard output for the regular expression dot, meaning no text at all should be printed.

Next, standard error should have  an invalid import path message.

And finally there should NOT be a panic.
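
Assembled from the description above, this test might read something like the following sketch (again a reconstruction, not the exact file from the Go repository):

```
! go list a.go
! stdout .
stderr 'invalid import path'
! stderr panic

-- a.go --
package a

import ""
```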

Tip #17. Scripts make good tests.

These scripts make it incredibly  easy to add a new test case.

Here is our smallest test: two  lines. I added this one recently  

after I broke the error message  printed for an unknown command.
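
A two-line test of that shape might look like this. The command name and the exact error text here are illustrative, not the real ones:

```
! go vork
stderr 'unknown command'
```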

In all, we have over 700 of these script tests,  ranging from two lines to over 500 lines.

These test scripts replaced a more traditional test scaffold with helper methods. This slide shows 

one of the real tests they replaced, from before the script translation. The details don't 

matter except to notice that the script is much easier to write and understand.

Tip #18. Try rsc.io/script for  your own script-based test cases.

It has been about five years since we created  the go script tests, and we're very happy with  

this particular script engine. Bryan Mills took the time to give it a very nice API,  

and earlier in November I posted it for import  at rsc.io/script. Now I said "Try" because it's  

a bit new, and ironically it does not have  enough tests itself, since the importable  

package is only a few weeks old, but you still  might find it useful. We might put it somewhere  

more official when we have more experience with  it. If you do try it, let me know how it goes.

The motivation for extracting  the script engine was to reuse it 

in a different part of the go command tests.

This script is preparing a Git repository 

containing a module that we want to import during a regular go command script test.

You can see it sets some environment variables, 

runs an actual git init, sets the time, 

runs more git commands to add  a hello world file to the repo, 

and then checks that we got exactly the repo we wanted.
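
An illustrative sketch of such a repo-building script, reconstructed from the description above (the environment variable names, the `at` command for setting the time, and the final check are all assumptions about the real format):

```
env GIT_AUTHOR_NAME='Go Gopher'
env GIT_AUTHOR_EMAIL='gopher@golang.org'

at 2018-04-17T15:43:22-04:00

git init
git add hello.go
git commit -m 'hello world'

git log --oneline
stdout 'hello world'

-- hello.go --
package hello
```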

Once again, the tests did not start out this way, which leads me to the next tip.

Tip #19. Improve your tests over time.

Originally, we didn't have those repo scripts. 

We created small test repos by hand  and posted them on GitHub, Bitbucket,  

and other hosting servers depending on  which version control system we needed.

That worked okay but it meant that if any of  these servers went down, the tests failed.

Eventually we took the time to build our own  

cloud server that could serve repos  for every version control system.

At that point we still created the repos by hand, zipped them up, and copied them to the server.

That was better, since now there was only  one server that could take down our tests,  

but sometimes there were networking problems too. It was also a problem that the test repos  

themselves were not version controlled and  they were not near the tests that used them.

The script-based version builds and serves  these repos entirely locally as part of the  

test. And the repo descriptions are  now easy to find, change, and review.

This is a lot of infrastructure,  but it's testing a lot of code too. 

If you only have 10 lines of code, you  should *not* have thousands of lines  

of test framework. But if you have  a hundred thousand lines of code,  

which is about what the go command is,  then a few thousand lines to make tests  

better, or even ten thousand lines,  is almost certainly a good investment.

Tip #20. Aim for continuous deployment.

There may be policy reasons that  you can't actually deploy your code  

on every commit that passes all  the tests, but aim for it anyway.

As I mentioned at the start of the talk,  any doubts you have about continuous  

deployment are helpful little voices  telling you what needs better testing.  

And the key to better testing is of  course make it easy to add new tests.

Even if you never actually enable continuous  deployment, aiming for it can help keep you  

honest, improve the quality of your tests,  and improve the quality of your code.

I mentioned earlier that the Go web  site uses continuous deployment. 

On each commit, we run tests to decide  whether the latest version of the code  

can be deployed and have traffic routed to it.

At this point you won't be surprised that we wrote  

a testing script language for these  tests. Here is what they look like.

Each test begins with an HTTP request.  Here we GET the main go.dev page.

Then there are assertions about the response. 

Each assertion is of the form  "field, operator, value". 

Here the field is the body,  the operator is contains, 

and the value is literal text  that the body must contain.

This test is checking that the page is rendering, 

so it checks for basic text  as well as a subheading.
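
Based on the description, such a test case might look roughly like this. The URL, the checked text, and the exact spelling of the operator are illustrative rather than copied from the real test file:

```
GET https://go.dev/
body contains Build simple, secure, scalable systems
body contains Why Go?
```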

To make it easier to write the  tests, there's no quoting at all: 

the value is just the rest of  the line after the operator.

Here's another test case.

For historical reasons, /about  needs to redirect to pkg.go.dev.

Here's another. Nothing special here, just  checking that the case studies page renders,  

because it is synthesized from many other files.

Another field the test can  check is the HTTP response code,  

and here's a bug fix. We were accidentally  serving these files from the Go repo root as  

if they were Go web site pages.  We want 404s for these instead.

Another field you can test  is header foo, for some foo.

In this case, the header Content-Type 

needs to be set correctly for the  main blog page and its JSON feed.

Here's another example. This one uses the regular  

expression matching operator tilde  and the \s+ syntax to make sure that  

the page has the right text no matter  how many spaces are between the words.

That got a little bit old, so we added a new  field named trimbody that is the body with  

all runs of spaces replaced by a single  space. This example also shows that the  

value can be supplied as multiple indented  lines, to make multiline matches easier.

We also have some tests that can't be run locally 

but are still worth running in production before we migrate live traffic to the server.

And here are two of these. 

These depend on network access to  the production playground backends.

These cases are the same except for the URLs. And they're not terribly readable, 

since these are our only POST tests. If we added more of these, 

I would probably take the  time to make them a bit nicer, 

in the spirit of improve your tests over time. 

But for now they're fine, and  they serve an important purpose.

Finally, as usual, it's easy to  add bug fixes. In issue 51989,  

certain talks were not rendering at all.

So this test checks that the page *does* render and contains a distinctive piece of text.

Issue 51989 is never going to happen again,  at least not on the live web site. There will  

be other bugs for sure, but that one  is gone for good, and that's progress.

That's all the examples I have time  to show you, but one final thought.

I'm sure you've had the experience  of chasing down a bug and ending  

up in an important piece of code that is wrong.

But somehow it's wrong in a way that  doesn't matter most of the time,  

or wrong in a way that's cancelled  out by some other wrong piece of code.

And you've probably thought to  yourself “How did this code ever work?”

If you wrote the code, you might have thought you  got lucky. And if someone else wrote the code,  

you might have thought poorly of them and  then also thought that they got lucky.

But most of the time the answer isn't luck.

The answer to "How did this code ever work?" is almost always: because it had a test.

Sure the code is wrong, but the test checked that it was correct enough for the rest  

of the system to work, and that's what mattered.

Maybe the person who wrote the  code was in fact a bad programmer, 

but they were a good software engineer, because they wrote a test, 

and that's why the overall system  containing that code works.

What I hope you take away from this talk is not the specific details of any given test, 

although I do hope you will keep  an eye out for good uses for 

small parsers and printers. Anyone can learn how to write those, 

and using them effectively can be a software engineering superpower.

Ultimately, these were good  tests for these packages. 

Good tests for your packages  may well look different. 

And that’s fine.

But make it easy to add new test cases, and make sure that you have good,  

clear, high-quality tests. Remember that code quality is  

limited by test quality, so invest in improving  

your tests gradually over time. The longer you work on a project,  

the better your tests should become. And aim for continuous deployment, 

at least as a thought experiment to  understand what's not tested well enough.

Overall, put as much thought and care and effort into writing good test code as you do into  

writing good non-test code. It’s absolutely worth it.
