We Love Testing!
We take testing very seriously at Jet in general and on team Nova (Jet’s matching engine/taxonomy management platform) in particular. While bugs are unavoidable, proper testing can catch most of them and minimize errors in our production code. And minimizing errors is what keeps our ecommerce site humming along. There are many different kinds of testing and error detection done on the Nova codebase, and I’m going to give you a rundown of what these are and then focus on our newest achievement: an automated integration testing framework.
The Mélange of Testing
On the Nova team we use a variety of methods to ensure that our production code is of the highest quality:
- IDE Error Indication– Visual Studio IntelliSense (https://msdn.microsoft.com/en-us/library/hcw1s69b.aspx) indicates errors in our code as we write, with the (in)famous red squiggly line, and recommends solutions.
- Linting– FSharpLint (https://github.com/fsprojects/FSharpLint) is integrated with FSharp Power Tools (http://fsprojects.github.io/VisualFSharpPowerTools). This visually indicates when a set of configurable rules on styling code has been broken in our codebase.
- Compilation– The benefit of a strongly and statically typed language like F# is that the compiler will catch many errors in the code.
- Unit Tests– The quickest way to test code. We use NUnit (http://www.nunit.org) to quickly write and run unit tests locally in Visual Studio. Functional languages like F# lend themselves very well to unit testing when functional composition, DRY coding, and proper dependency injection via partial application and higher-order functions are used. We also run our unit tests as part of our Jenkins build pipeline, which provides some regression testing: if the unit tests don’t pass, the build fails.
- Service Tests– Similar to unit tests: we also use the NUnit framework and run the tests in Visual Studio, but for service tests we inject some dependencies in order to test larger units of code in conjunction with databases, caches, etc.
- Manual Regression Testing– Testing done in our QA environment by the developers and business team to ensure that new code doesn’t break existing functionality. This often involves injecting messages into message buses or commands into queues that are then picked up by services, and then verifying the output written by the Nova system to the database, message bus, or event storage to confirm expected behavior.
- Manual Feature Testing– The same as above except that tests involve new functionality.
- Smoke Tests– Tests that are done once code is pushed to production to confirm that the release went as planned.
- Automated Integration Tests– The focus of this post…
Not Your Typical System
At many e-commerce sites, the catalog is a carefully curated source of truth, manually created and maintained, relatively static, and somewhat trivial to test. However, at Jet there is no static catalog; it is constantly being created and refined by the matching engine and taxonomy management platform. And these systems, by necessity, are complex. Inputs to the system are received from merchants, third parties, internal sources, manual intervention, and other places. Once these inputs come into the system, they must be identified, matched, normalized, grouped, categorized, prioritized, aggregated, and standardized in order to provide the highest quality data to enable user search, product display, and ordering. These operations require over 150 scripts and microservices, most using multiple threads concurrently and many running in multiple instances. These services communicate asynchronously with each other, users, and downstream and upstream systems via queues and message buses. We additionally maintain a web API for user interaction and a web-based UI for interacting with this API. Data is stored as immutable events (EventStore, Azure Table Storage), blobs (Azure Blob Storage), and database tables (Azure SQL). We also cache data in Redis (http://redis.io) when necessary.
Automated Integration Tests: The Elves to Our Shoemaker
The description of our platform in the previous section is given to show that creating automated integration tests in such an environment is not a trivial matter. The existing data in the system can affect the results of end-to-end tests, as can other inputs into the system. Upstream and downstream systems can also impact or be impacted by the results of such tests. So if our goal is to create robust, reproducible tests that we can rely on and run in an automated fashion, we need to create and recreate an isolated, pristine environment to run them in. And what could be better than to have all this happen overnight while we’re happily sleeping in our beds?
The Pristine Environment
Given the complexity of the Jet matching engine and our taxonomy management platform, creating a pristine environment is no trivial matter. Here are the steps we need to take in order to accomplish this:
Clean The Databases
We first have to delete everything in our database (data, schema, etc.) and recreate the structure of the database to match the latest version of the schema used in our candidate production release. We do this using Visual Studio Database projects for each of our production databases (https://msdn.microsoft.com/en-us/library/hh272677(v=vs.103).aspx). Any new changes to the schema are added to these projects and versioned in TFS. Then, to restore the integration testing environment to its initial state each night, we run a stored procedure on each database to delete everything (except the stored procedure itself, of course), check out our database projects from TFS, build them, and use SqlPackage (https://msdn.microsoft.com/en-us/hh550080(v=vs.103).aspx) to publish the schema to our newly cleaned database. This is completely reproducible over and over again without user intervention and guarantees that all new changes to the database structure, once checked into our database projects, will be part of our tests.
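The publish step can be sketched as a small F# driver that shells out to SqlPackage. This is a minimal illustration, not our actual script, and the file, server, and database names are hypothetical placeholders:

```fsharp
open System.Diagnostics

// Build the SqlPackage argument string for publishing a dacpac to a target database.
// The paths and names passed in are placeholders, not our real ones.
let sqlPackageArgs (dacpac: string) (server: string) (database: string) =
    sprintf "/Action:Publish /SourceFile:\"%s\" /TargetServerName:\"%s\" /TargetDatabaseName:\"%s\""
        dacpac server database

// Shell out to SqlPackage.exe to publish the freshly built schema
// to the just-cleaned integration database; returns the exit code.
let publishSchema dacpac server database =
    let startInfo =
        ProcessStartInfo("SqlPackage.exe", sqlPackageArgs dacpac server database,
                         UseShellExecute = false)
    use proc = Process.Start startInfo
    proc.WaitForExit()
    proc.ExitCode
```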
Update Environmental Properties
We use a PowerShell script versioned in our repo to write all environmentally specific property settings to a Consul instance running in the integration environment. This is done via the Consul Key/Value web api (https://www.consul.io/docs/agent/http/kv.html).
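In F#, a single write to that API amounts to an HTTP PUT of the value against the key's path. A minimal sketch (the host name and key layout here are illustrative assumptions):

```fsharp
open System.Net.Http
open System.Text

// Build the Consul KV endpoint for a key (host and key layout are illustrative).
let kvUri (consulHost: string) (key: string) =
    sprintf "http://%s:8500/v1/kv/%s" consulHost key

// PUT one environment-specific property into Consul's KV store via the HTTP API.
let putProperty (client: HttpClient) (consulHost: string) (key: string) (value: string) = async {
    use content = new StringContent(value, Encoding.UTF8)
    let! response = client.PutAsync(kvUri consulHost key, content) |> Async.AwaitTask
    return response.IsSuccessStatusCode }
```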
Clear Out the Redis Caches
We use Redis as a cache where necessary, and these caches must be emptied. We use F# code called from PowerShell and the StackExchange.Redis library (https://github.com/StackExchange/StackExchange.Redis) to clear our caches.
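The flush itself is only a few lines with StackExchange.Redis. One detail worth noting: FLUSH is an admin command, so the connection has to opt in with allowAdmin=true. A sketch (the host list is a placeholder for our actual cache endpoints):

```fsharp
open StackExchange.Redis

// FLUSH commands are admin-only, so the connection string must opt in.
let adminConnectionString (host: string) = sprintf "%s,allowAdmin=true" host

// Connect to each cache node and flush every database on it.
let flushCaches (hosts: string list) =
    for host in hosts do
        use muxer = ConnectionMultiplexer.Connect(adminConnectionString host)
        for endpoint in muxer.GetEndPoints() do
            muxer.GetServer(endpoint).FlushAllDatabases()
```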
Delete All Streams in EventStore and Recreate Taxonomy Data
We use EventStore (https://geteventstore.com) as immutable storage for all events in our system. Before we can run our integration tests, we need to remove all streams of events and then get the latest streams of taxonomy events (which we require in order to be able to source metadata) from production and put them into the environment. We do this by running a PowerShell script on our EventStore VMs that stops the EventStores, deletes all existing data, copies over the most recent production backup, and then restarts the EventStores. This is all carefully controlled so that the data is properly replicated, errors are recovered from (using a previous backup), and the EventStores are correctly restarted.
Reset Projection of Events from EventStores to Message Buses
We maintain data for all entities in our system as streams of immutable events in our EventStores. We then project an aggregation of all streams of data of each type (product, category, offer source, etc) into message buses so that services can subscribe and react to these projections. This is done by subscribing to the $ALL streams on our EventStores via services we call Replicators and requires careful configuration in order to maintain ordering and ensure that everything stays in sync. During our reset of the integration environment we use a PowerShell script to properly stop and restart these Replicator services at the appropriate times in order to ensure that they properly project events in our empty integration environment.
Delete All Blobs, Queues, and Tables in Azure
We use PowerShell to delete all blobs, queues, and tables hosted on Azure (https://azure.microsoft.com/en-us/documentation/articles/storage-powershell-guide-full).
Jenkins to Schedule Everything
All of this environmental cleanup is organized via a scheduled Jenkins (https://jenkins.io) pipeline job that runs each night. The scheduling is done via a Cron Expression set as the Schedule of a “Build Periodically” Build Trigger.
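For reference, the “Build Periodically” schedule is just a cron expression in the job configuration. Something like the following would fire the cleanup nightly (the exact hour shown is illustrative, not our actual schedule):

```
# Jenkins job → Build Triggers → Build periodically → Schedule
# H spreads the start minute within the hour; this runs nightly between 1:00 and 1:59 AM
H 1 * * *
```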
Redeploying the Latest Microservice and Web API Code
Once we’ve cleaned the environment, we redeploy the latest code for our microservices and web API. This is done via Jenkins jobs that are triggered upon completion of the initial scheduled cleanup job. The latest code is checked out of TFS, built, and uploaded to the servers; then custom deployment scripts start processes for each microservice and Azure restarts the web API.
SO NOW WE CAN TEST!
Sorry to yell, but it’s taken us quite a lot of work to get to this point. We have all of the data cleaned out and all of the latest code, taxonomy data, and SQL schema deployed. Everything is ready to go! We can now start testing… but how? Given the way our system works, we can inject messages into the upstream message buses and test that we get the expected messages in the downstream message buses. We do this by deploying a testing microservice in the integration environment that is scheduled to wake up after the environment is reset each night. We then use NUnit’s SimpleTestRunner to run our tests:

```fsharp
CoreExtensions.Host.InitializeService()
use runner = new NUnit.Core.SimpleTestRunner()
Setup.configureTestRunner "bin/release/integrationtests.dll" runner
let logger = printfn "%s"
let listener = Setup.createListener logger
let result = runner.Run(listener, TestFilter.Empty, true, LoggingThreshold.All)
```
This will run all tests that we’ve written and built into the IntegrationTests project. The configureTestRunner method loads the dll into our SimpleTestRunner and the createListener method creates a NUnit.Core.EventListener.
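Neither helper's body is shown here, but as a rough sketch, createListener could be an object expression implementing NUnit 2's NUnit.Core.EventListener interface. The code below is our guess at the shape of such a helper, not the actual implementation, with member signatures as they appear in NUnit 2.x Core:

```fsharp
open System
open NUnit.Core

// A minimal EventListener that forwards test lifecycle events to a logging function.
// Hypothetical sketch of a createListener helper; the real one isn't shown in the post.
let createListener (log: string -> unit) =
    { new EventListener with
        member __.RunStarted(name: string, testCount: int) =
            log (sprintf "Run started: %s (%d tests)" name testCount)
        member __.RunFinished(result: TestResult) =
            log (sprintf "Run finished, success = %b" result.IsSuccess)
        member __.RunFinished(ex: Exception) =
            log (sprintf "Run failed: %s" ex.Message)
        member __.TestStarted(testName: TestName) =
            log (sprintf "Test started: %O" testName)
        member __.TestFinished(result: TestResult) =
            log (sprintf "Test finished: %s" result.Name)
        member __.SuiteStarted(_: TestName) = ()
        member __.SuiteFinished(_: TestResult) = ()
        member __.UnhandledException(ex: Exception) =
            log (sprintf "Unhandled exception: %s" ex.Message)
        member __.TestOutput(_: TestOutput) = () }
```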
And here’s what a sample test would look like:
```fsharp
[<TestFixture>]
type IntegrationTests() =

    [<Test>]
    member x.``Insert message and expect message``() =
        //put a message on the upstream message bus
        sendMessage "message with id = 9b8fb684b15e4f7087e28125cb8a40a5…"
        //get messages from the given downstream message bus; this results in an
        //AsyncSeq: https://github.com/fsprojects/FSharp.Control.AsyncSeq
        getMessages "topic"
        //decode messages from the message bus; returns an Option
        |> AsyncSeq.map decode
        //filter out other messages to make sure we get the one we want
        |> AsyncSeq.filter (function
            | Some message when message.id = "9b8fb684b15e4f7087e28125cb8a40a5" -> true
            | _ -> false)
        //wait for exactly 1
        |> AsyncSeq.take 1
        //timeout after a minute
        |> Async.timeoutAfter (TimeSpan.FromSeconds 60.00)
        |> Async.RunSynchronously
        |> ignore
```
So the above code will insert a message into the upstream bus and wait for the corresponding message to show up in the downstream bus. If the message doesn’t show up within 1 minute, an exception is thrown and the test fails. If the message does show up then the system worked as expected and the test passes! Our test runner generates a report we can look at in the morning.
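Async.timeoutAfter isn't part of FSharp.Core, so it's presumably a small in-house helper. A minimal sketch can be built on Async.StartChild, which raises a TimeoutException when the child computation doesn't finish in time:

```fsharp
open System

module Async =
    // Fail the workflow with a TimeoutException if it doesn't complete within the
    // given span. A sketch of a helper like the Async.timeoutAfter used above.
    let timeoutAfter (timeout: TimeSpan) (computation: Async<'a>) = async {
        let! child = Async.StartChild(computation, int timeout.TotalMilliseconds)
        return! child }
```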
Place in the Release Cycle
So now we can base a daily deployment schedule off of these integration tests. We code all day, and when we feel confident about our code based on the other test methods listed above, we merge it into our TFS repo’s trunk. Then each night the Jenkins integration pipeline kicks off, refreshes the integration environment, and deploys the latest code from trunk. Finally, the testing microservice wakes up, runs the tests, and generates a report. When we get into the office in the morning we check the test report, and if everything looks good we release the code. Otherwise we fix what’s broken, run the tests again, and then deploy.
So that’s the basics of what we’re doing. The next step will be for us to create a Domain Specific Language that will allow our business analysts to write tests themselves. They should be able to write something like:
Insert message “x” on topic “y” and expect id “z” to show up in topic “a”.
Executing the above will generate a test written in F# and save it in our IntegrationTests project that will be run every night. How much fun will that be!?
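As a first step toward that DSL, the sentence could be parsed into a test specification record and then fed to a code generator. Here's a sketch; the record and function names are hypothetical, and it assumes straight quotes around the arguments:

```fsharp
open System.Text.RegularExpressions

// A structured description of one integration test, parsed from a DSL sentence.
type TestSpec =
    { Message: string
      SourceTopic: string
      ExpectedId: string
      SinkTopic: string }

// Parse: Insert message "x" on topic "y" and expect id "z" to show up in topic "a".
let parseSpec (sentence: string) =
    let m =
        Regex.Match(sentence,
            "Insert message \"(.+)\" on topic \"(.+)\" and expect id \"(.+)\" to show up in topic \"(.+)\"")
    if m.Success then
        Some { Message = m.Groups.[1].Value
               SourceTopic = m.Groups.[2].Value
               ExpectedId = m.Groups.[3].Value
               SinkTopic = m.Groups.[4].Value }
    else None
```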
A Note On Integration Tests In General
Integration tests like the ones we’re using are wonderful when used properly, that is, to test behavior that otherwise couldn’t be captured by simpler tests lower down the testing pyramid (http://martinfowler.com/bliki/TestPyramid.html). But it’s important to remember that these are costly tests: any issue found with them takes about 30 minutes to retest. So anything that can be tested by unit or service tests should be tested by unit or service tests (and preferably by unit tests!). Those tests give a much shorter turnaround between running the tests and fixing any issues they find. That is all. Enjoy your integration tests.