Here at Jet, one of our best practices is to use parameterized queries when accessing Azure SQL. The benefit is that the query plan for a parameterized query can be cached, so subsequent executions of the same query perform better than ad hoc dynamic queries. (Disclaimer: this isn’t always true, particularly for more complex queries. Always analyze your execution plan.)
When parameterizing queries, we define variables for the fields we’re passing in and then provide values for those variables with each execution. Here’s a simple code example:
```fsharp
// (inside an async { ... } workflow — use! requires one; cs is our connection string)
let query = """
    SELECT [item_a], [item_b]
    FROM [dbo].[table_a]
    WHERE [item_c] = @item_c
    """
use conn = new System.Data.SqlClient.SqlConnection(cs)
conn.Open()
use cmd = new System.Data.SqlClient.SqlCommand(query, conn)
cmd.Parameters.AddWithValue("@item_c", itemC) |> ignore
cmd.CommandType <- System.Data.CommandType.Text
use! reader = cmd.ExecuteReaderAsync() |> Async.AwaitTask
```
As you can see, we’re selecting two columns, item_a and item_b from table_a, and we’re passing in a value to compare with the column item_c. The parameter we’ve defined is @item_c and we use the method SqlParameterCollection.AddWithValue to set the value for this column.
Here’s the table we are querying against:
```sql
CREATE TABLE [dbo].[table_a] (
    [item_a] VARCHAR (64) NOT NULL,
    [item_b] VARCHAR (64) NOT NULL,
    [item_c] VARCHAR (64) NOT NULL,
    CONSTRAINT [PK_table_a_imsc] PRIMARY KEY CLUSTERED ([item_c])
)
```
Our table’s primary key is the column we’re querying on, and the performance should be awesome, right?
However, when we start using this query (well, a similar one at least) in our code, the performance is terrible. With a few hundred thousand rows and ten thousand queries per minute we’re seeing 4-second execution times across many Azure SQL P6 shards, which is just awful. This actually pegs all of the shards at 100% resource utilization… epic fail.
What’s going on? Why did this fail so badly?
First, we enable Query Store on the database and start collecting data. We see that our parameterized query is generating the following SQL code:
```sql
(@item_c nvarchar(64))SELECT [item_a], [item_b] FROM [dbo].[table_a] WHERE [item_c] = @item_c
```
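For reference, the generated text above comes out of the Query Store catalog views. A sketch of the kind of query we run to surface the slowest captured statements (the `sys.query_store_*` views and columns are SQL Server's documented catalog; your filtering and ordering will vary):

```sql
-- Sketch: top query texts by average duration captured in Query Store.
SELECT TOP (10)
    qt.query_sql_text,
    rs.avg_duration              -- microseconds
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;
```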
Hmmmmm. The query looks reasonable, so now we go into SQL Server Management Studio, click the ‘Include Actual Execution Plan’ button, and run the query. Now we get this:
Ugh, so the problem is clear. We’re doing a full table scan. But why???
If we look a bit further down in the execution plan, we see the following horrible words: implicit conversion. If we look back at our generated query again, we see that while the column we’re querying in our table is defined as varchar, our parameterized query has passed in a parameter of type nvarchar, requiring this implicit conversion between varchar and nvarchar and, consequently, a full table scan. Nooooooo!
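To see why this forces a scan: when two sides of a comparison have different types, SQL Server converts the side with lower data type precedence, and varchar has lower precedence than nvarchar, so the conversion lands on the *column*, not the parameter. A hedged illustration of the predicate the optimizer effectively has to evaluate (this is conceptual, not the literal plan text):

```sql
-- The varchar column is converted to nvarchar (the higher-precedence type),
-- so the clustered index on [item_c] can no longer be used for a seek.
SELECT [item_a], [item_b]
FROM [dbo].[table_a]
WHERE CONVERT(nvarchar(64), [item_c]) = @item_c
```

Because every row's `[item_c]` must be converted before the comparison, the index on the original varchar values is useless and the engine scans the whole table.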
The issue here is suddenly apparent: SqlParameterCollection.AddWithValue has to infer a SQL data type for our F# string, and because .NET strings are Unicode, it chooses nvarchar. This method gives us no way to specify the type ourselves, but we do have another option: SqlParameterCollection.Add. Now our code looks like this:
```fsharp
// (inside an async { ... } workflow, as before)
let query = """
    SELECT [item_a], [item_b]
    FROM [dbo].[table_a]
    WHERE [item_c] = @item_c
    """
use conn = new System.Data.SqlClient.SqlConnection(cs)
conn.Open()
use cmd = new System.Data.SqlClient.SqlCommand(query, conn)
cmd.Parameters.Add("@item_c", System.Data.SqlDbType.VarChar).Value <- itemC
cmd.CommandType <- System.Data.CommandType.Text
use! reader = cmd.ExecuteReaderAsync() |> Async.AwaitTask
```
This is less neat and tidy but allows us to specify the exact SQL type we want to use. Looking at our generated query we now see:
```sql
(@item_c varchar(64))SELECT [item_a], [item_b] FROM [dbo].[table_a] WHERE [item_c] = @item_c
```
When we rerun our execution plan we bask in this glory:
Wow. Now we’re in business. We see an index seek and 25,000 times better performance. We scale our Azure SQL instances down to P2 and see less than 5% resource utilization. The good guys win. Game over.
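One closing refinement worth considering (a suggestion, not something our fix above required): SqlParameterCollection.Add also has an overload that takes an explicit size, which keeps the parameter's declared length pinned rather than inferred per call:

```fsharp
// Hedged refinement: declare the length explicitly so the parameter is
// always varchar(64), matching the column definition exactly.
cmd.Parameters.Add("@item_c", System.Data.SqlDbType.VarChar, 64).Value <- itemC
```

Declaring both the type and the size up front means the generated query text is identical on every call, which is exactly what makes plan caching work in our favor.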