From the very beginning, a core feature of .NET was the database access framework known as ADO.NET or System.Data. The purpose of this framework is to ensure all databases can be accessed in a consistent manner. However, there are the occasional gaps that need to be addressed.
One such gap is the ability to send multiple SQL statements in a single batch. While some databases such as SQL Server allow the statements to be separated by a semi-colon, this is not available for all databases. Shay Rojansky writes,
The problem with this approach, is that most databases require separate protocol messages for each statement (e.g. PostgreSQL, MySQL), forcing the database's ADO.NET provider to parse the SQL client-side and to split on semicolons. This is both unreliable (parsing SQL is difficult) and bad for performance - ideally an ADO.NET provider should simply forward user-provided SQL to the database, without parsing or rewriting it.
And even SQL Server has limitations, such as not supporting multiple stored procedure calls in a single batch.
The .NET 6 solution to this is the new DbBatch class. The goals in the Batching API proposal are;
- Provide a structured way to execute multiple SQL statements in a single roundtrip, without any need for client-side parsing of SQL.
- Keep the API consistent with other ADO.NET APIs, and specifically close to
DbCommand
(both are "executable"), to reduce the conceptual complexity for adoption. - Allow mixing different types of statements in the same batch (insert, update, select). The current concatenation approach supports this, and so does our reader API (multiple resultsets).
- Provide non-aggregated access to the number of rows affected, for each individual command in the batch.
Using a DbBatch Directly
Under this model, you start by creating a database specific of the DbBatch class. For example, if targeting PostgreSQL then you’d create an instance of NpgsqlBatch
. For MySQL, the class is called MySqlBatch
. (For generic programming, you can also use a DbProviderFactory to create the correct DbBatch subclass for you.)
Onto the DbBatch
class you add your connection and optional transaction objects. Then you add your list of DbBatchCommand
objects.
A DbBatchCommand
is a slimmed-down version of the DbCommand
. Basically, it is little more than the SQL string and parameter collection, with everything else handled at the DbBatch
level.
At this point you can call an execute method on the DbBatch
. The three basic options are:
ExecuteNonQuery
: Returns the total affected rows.ExecuteScalar
: Returns the first column of the first row of the first result set.ExecuteReader
: Returns all of the result sets in order.
When using ExecuteReader
, only the first result set will be visible until you call DbDataReader.NextResult
. At that point the first result set is lost and the next one becomes visible. Repeat until all result sets are consumed and NextResult
returns false.
After executing a DbBatch
, each DbBatchCommand
will have its RecordsAffected
property set. When exactly this gets populated is implementation specific and may be delayed until all result sets are consumed.
DbBatch and ORMs
Most ORMs need to batch operations for the sake of efficiency. So theoretically they could see a performance boost by switching to the DbBatch API. This would be done internally with no visible change to the ORM. Essentially it would be a free performance boost for application developers.
PostgreSQL Implementation
The official PostgreSQL drivers for .NET have been updated to support the batching API. Along with this change came the raw SQL mode option.
As mentioned above, the PostgreSQL driver needs to scan the SQL string for semi-colons so it can convert the batch into single statements. This parsing can be quite costly, as it has to correctly handle things like semi-colons embedded in string literals.
The parsing also deals with the named parameter issue. PostgreSQL does not natively support named parameters, so the client-side parser also needs to convert the named parameters into positional parameters.
By switching to raw SQL mode, the parsing is disabled. This offers some potential performance improvements, but you lose the ability to use batching via NpgsqlCommand, which is an even bigger performance concern.
Once explicit batching became possible via NpgsqlBatch, raw SQL mode also became a viable option. To use this mode, you must not assign names to any parameters.
If you do not have parameters, then you must use either of these commands,
AppContext.SetSwitch("Npgsql.EnableSqlRewriting", false);
cmd.Parameters.Add(new() { Value = 0 });
The AppContext version changes the setting globally.
MySQL Implementation
The MySQL implementation is being offered as part of MySqlConnector. This library is generally considered to be better than the official Oracle version and supports MariaDB as well.
The MySqlBatch functionality was actually completed back in 2019, but didn’t get much attention because the matching framework API was cut from .NET Core 3. When it was finally released with .NET 6, the only significant change was marking it as inheriting from the matching base classes.
SQL Server Implementation
The SQL Server implementation of this feature is lagging behind the others. Though planning started in 2018, key decisions have not yet been made. The last update in December indicated it was a managerial issue,
The api is in net 6 so what we need is a net6 build target and a policy decision from the MS side on the api will be made available in other builds where the base classes aren't available.
As far as the implementation is concerned, there is already a lot of the base functionality in place. A similar internal class known as SqlCommandSet has been in use for a very long time, but is only accessible indirectly via SqlDataAdapter, which few developers even know exists.
You can track the progress of this in the ticket named SqlClient: Implement optimized version of the new ADO.NET batching API.