Posts Non-deterministic test failures when testing React
Post
Cancel

Non-deterministic test failures when testing React

Do you feel a pang of joy and relief when your tests fail? If you think that an odd question, consider how you feel (as a developer, that is) when you have no idea why your tests fail… and you spend days pulling your hair out trying to fix it. Oh, and for comedy’s sake, a colleague comes along and fixes it after one glance.

I digress; the focus here is React. I develop a React application which uses Redux Toolkit and Material-UI as a part of my job, and that application is, as you’d guess, throwing some strange errors when testing.

Observation

  • Run a test which uses Redux state, and watch it fail by not finding text on the page.
  • Run it again without code changes and watch it pass.

Clearly, something is afoot here.

This Post Will Cover

  • Analysis of the problem.
  • Attempts at fixing the problem.
  • The (hopefully) identified cause.
  • Some (possible) rants along the way.

Fix Attempts

An excerpt of the code is below:

1
2
renderRoute(`/services/${mockService[0].id}`);
await screen.findByRole('cell', {name: mockSubServices[0].description});

This expects that the UI will render a service with its sub-services at the given URL. This makes it more of an integration test than a unit test, but not fully so. Run this test once and you could watch it fail; hit return and watch it pass. I suspect the non-deterministic behaviour shown here is due to loading times more than anything. To test this theory, I ran it five times to prove the non-deterministic behaviour, then made it wait:

1
2
3
renderRoute(`/services/${mockService[0].id}`);
await new Promise(resolve => setTimeout(resolve, 1000));
await screen.findByRole('cell', {name: mockSubServices[0].description});

Surely enough, there were no test failures after an excess of 10 runs. Also interesting is that tests with the former code that intermittently fails takes a little over 13 seconds, while the “sleep” code takes 5-6 seconds even though it intentionally waits for an absolute amount of time independently of the app’s lifecycle.

My theory here is that the React Testing Library polls for the requested element, then waits, and rinse/repeat; meanwhile, when the sleep command is given before querying the DOM, the tests find the element first time around, which saves time by spending time.

This suggests that renderRoute isn’t properly waiting for the rendering to be done before it continues running.

Possible Solutions

  1. Use a small timeout that just happens to work across all tests. This is a cowboy-esque approach that (for reasons hopefully obvious) isn’t the best.
  2. Use waitFor to verify that the API in question was called, which is close enough in the lifecycle to be confident of the data being displayed correctly.

One interesting thing about msw in this blog post by Kent C. Dodds goes into detail about using msw in place of fetch; while that’s not the main issue here, it details asserting an API being called when using fetch:

1
2
3
userEvent.click(screen.getByRole('button', {name: /confirm/i});

expect(client).toHaveBeenCalledWith('checkout', {data: shoppingCart});

However, it makes no such assertion for the msw test:

1
2
3
userEvent.click(screen.getByRole('button', {name: /confirm/i});

expect(await screen.findByText(/success/i)).toBeInTheDocument();

This has me confused, since it will fundamentally be testing two disparate concepts, even though they live somewhere in the same lifecycle. I digress, though, since:

  1. Kent C. Dodds has a plethora of good material, so this is not a criticism.
  2. We will be waiting for the API to be called, and not so much checking that it was.

Attempting a waitFor Approach

The msw docs themselves (specifically Request Assertions) details how to:

Check that a specific request handler was called.

Perfect… aside from the fact that there’s no code, and they then mention:

Adding such assertions, however, is implementation details testing and is highly discouraged.

Hmm. Perhaps it’s time to pay attention to the failure again and see if there’s another way.

Back to the Drawing Board

Removing the timeout code from the test, I’ll run it again enough times to observe the failure:

1
Unable to find role="cell"

I see two possible routes aside from an arbitrary sleep:

  1. Increase the timeout of the findByRole call.
  2. See if renderRoute can be called with await.

I much prefer option 2, but it depends on whether render (from React Testing Library) can be called with await, since that’s all renderRoute calls under the hood.

Unfortunately, that still fails sporadically and takes longer than the other method previously discussed. There’s an interesting GitHub issue which discusses waitFor and getByRole taking a long time to run. Though this is a different function to findByRole, I imagine that there may be similar logic under the hood which is causing issues around independent lifecycles. This is quite recent, too - the issue has comments from 2021.

The most recent of these comments goes on to say:

[…] switching from findByRole with name: to findByText saved me over a second per test.

Replacing findByRole with findByText seems to give consistently pass results so far, with a very consistent time of ~5 seconds per run. Now to find all other tests and change their find strategies. By being less specific, one does lose granularity of asserting claims about the DOM, but it then becomes the responsibility of the developer writing tests to ensure that they won’t pick up on another element in the page; this could mean using findByRole in select areas where it becomes problematic to be too generic.

Unfortunately, not all tests with such sporadic errors are fixed by this supposed catch-all. Tests which simply don’t render in time for such elements to be found are fixed by switching from findByRole to findByText. However, tests that depend on different state from the default, and use dispatch functions to change this, operate on a lifecycle independent of the store and resulting state. This means the test will be making assertions on a stale copy of the DOM which operates on historical state values.

Trading a Fix for a Bug

The above issue was resolved by passing a custom Redux state into the renderRoute function instead of generating a store with less-than-ideal state and then issuing dispatch calls to retrospectively change that. However, finding elements in the DOM for this test fails unless you provide a very small timeout, and I’m still not sure why:

1
2
3
renderRoute(`/services/${mockService[0].id}`);
await new Promise(resolve => setTimeout(resolve, 10));
await screen.getByRole('button', {name: /Add sub-service/i});

Here’s something even more interesting: if the test instead uses findByRole or findByText, the test passes without the timeout… so the conclusion here is that the test relies on the query being slow for the DOM to render properly, and not by a lot, either. Reducing the timeout to 1 ms once again causes failures.

Remediation

The ideal here would be to do one of the following:

  1. Ensure that the test is (possibly slow and) deterministic, such that the testing code only resumes when the button appears in the DOM.
  2. Reduce the amount of time for the DOM to render by eliminating the slow-loading offenders in the app.

Option 2 sounds lovely, but for now it’s better to get the tests working, even if they take longer to run. Since option 2 would increase the scope considerably, I’ll go with option 1 for now.

This involves removing the dispatch(setTempReadWriteEnabled(isReadWriteEnabled)) calls from tests, and instead calling the following when rendering:

1
renderRoute(`/services/${mockService[0].id}`, getTestStore(false));

The false argument specifies that read-write should be false, which replaces the dispatch call.

Lessons Learned

It’s perhaps not obvious from the fact that you can’t observe the whole repository of code, but these tests operate at too high a level. Rather than explicitly render a component and make assertions on that component alone, it renders the entire DOM tree. This has benefits including:

  • Being able to test navigation with one-to-one simulation conditions, since the app’s routing logic is the implementation under test (IUT).
  • Ensuring that all environmental dependencies (e.g. SnackbarProvider from notistack) exist without additional work required.
  • Greater assurance that the app works as a whole (bringing it closer to integration tests).

However, having a half-baked suite of integration tests brings about its own set of challenges, where one will inevitably be taking multiple trips to Stack Overflow to fix problems that end up being quite specific to their application. Uncle Bob might class this as a code smell.

Until next time, all the best!

This post is licensed under CC BY-SA 4.0 by the author.