UPDATE: the source code for DbContextScope is now available on GitHub: DbContextScope on GitHub.
This isn't the first post that has been written about managing the DbContext lifetime in Entity Framework-based applications. In fact, there is no shortage of articles discussing this topic.
For many applications, the solutions presented in those articles (which generally revolve around using a DI container to inject DbContext instances with a PerWebRequest lifetime) will work just fine. They also have the merit of being very simple - at least at first sight.
For certain types of applications however, the inherent limitations of these approaches pose problems, to the point that certain features become impossible to implement or require resorting to increasingly complex structures or increasingly ugly hacks to work around the way the DbContext instances are created and managed.
Here is for example an overview of the real-world application that prompted me to re-think the way we managed our DbContext instances:
[...] DbContext type. Any approach assuming a single DbContext type won't work here.
[...] DbContext instance). This clearly doesn't apply here. Just because one remote API call failed doesn't mean that we can auto-magically "rollback" the results of any remote API call we may have made prior to the failed one (e.g. when you've used the Facebook API to post a status update on Facebook, you can't roll it back even if that operation was part of a wider user action that eventually failed as a whole). So in this application, a user action will often require us to execute multiple business transactions, which must be independently persisted. (You may argue that there might be ways to redesign the whole system to avoid finding ourselves in this sort of situation. And maybe there are. But that's how the application was originally designed, it works very well and that's what we have to work with.)
[...] Task.Run() or Parallel.Invoke() methods. So the way we manage our DbContext instances must play well with multi-threading and parallel programming in general. Most of the common approaches suggested to manage DbContext instances don't work at all in this scenario.
In this post, I'll go in depth into the various moving parts that are involved in DbContext lifetime management. We'll look at the pros and cons of several strategies commonly used to solve this problem. Finally, we'll look in detail at one strategy (among others) to manage the DbContext lifetime that addresses all the challenges presented above and that should work for most applications regardless of their complexity.
There is of course no such thing as one-size-fits-all. But by the end of this post, you should have all the tools and knowledge you need to make an informed decision for your specific application.
Like most posts on this blog, this post is on the long and detailed side. It might take a while to read and digest. For an Entity Framework-based application, the strategy you choose to use to manage the lifetime of the DbContext will be one of the most important decisions you make. It will have a major impact on the correctness, maintainability and scalability of your application. So it's well worth taking some time to choose your strategy carefully and not rush into it.
In this post, I'll often refer to the term "services". What I mean by that is not remote services (REST or otherwise). Instead, what I'm referring to is what is often called Service Objects. That is: the place where your business logic is implemented - the objects responsible for executing your business rules and defining your business transaction boundaries.
Of course, depending on the design patterns that were used to create the architecture of your application (and depending on the imagination of whoever designed it - software developers are an imaginative bunch), your code base might be using different names for this. So what I call a "service" might very well be called a "workflow", an "orchestrator", an "executor", an "interactor", a "command", a "handler" or a variety of other names in your application.
Not to mention that many applications don't have a well-defined place where business logic is implemented and rely instead on implementing (and often duplicating) business logic on an ad-hoc basis where and when needed, e.g. in controllers in an MVC application.
But none of this matters for this discussion. Whenever I say "service", read: "the place that implements the business logic", be it a random controller method or a well-defined service class in a separate service layer.
When coming up with or evaluating a DbContext lifetime management strategy, it's important to keep in mind the key scenarios and functionalities that it must support.
Here are a few points that I would consider to be essential for most applications.
[...] DbContext instance lifetime)
Perhaps the main source of confusion when it comes to managing DbContext instances is understanding the difference between the lifetime of a DbContext instance and the lifetime of a business transaction and how they relate.
DbContext implements the Unit of Work pattern:
Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.
In practice, as you use a DbContext instance to load, update, add and delete persistent entities, the instance keeps track of those changes in memory. It doesn't however persist those changes to the underlying database until you call its SaveChanges() method.
A service method, as defined above, is responsible for defining the boundary of a business transaction.
The practical consequence of this is that:
[...] DbContext instance throughout the duration of a business transaction. This is so that all the changes made to your persistent model are tracked and either committed to the underlying data store or rolled back in an atomic manner.
[...] DbContext.SaveChanges() method at the end of a business transaction. Should other parts of the application call the SaveChanges() method (e.g. repository methods), you will end up with partially committed changes, leaving your data in an inconsistent state.
[...] SaveChanges() method must be called exactly once at the end of each business transaction. Inadvertently calling this method in the middle of a business transaction may leave the system with inconsistent, partially committed changes.
A DbContext instance can however span across multiple (sequential) business transactions. Once a business transaction has completed and has called the DbContext.SaveChanges() method to persist all the changes it made, it's entirely possible to just re-use the same DbContext instance for the next business transaction.
I.e. the lifetime of a DbContext instance is not necessarily bound to the lifetime of a single business transaction.
[...] DbContext instance lifetime independently of the business transaction lifetime.
A very common scenario where the lifetime of the DbContext instance can be maintained independently from the lifetime of business transactions is in the case of web applications. It's quite common to use a configuration where a DbContext instance is created at the beginning of each web request, used by all the services invoked during the execution of the web request and eventually disposed of at the end of the request.
There are two main reasons why you would want to decouple the lifetime of the DbContext instance from the business transaction lifetime.
[...] DbContext instance maintains a first-level cache of all the entities it loads from the database. Whenever you query an entity by its primary key, the DbContext will first attempt to retrieve it from its first-level cache before defaulting to querying it from the database. Depending on your data query pattern, re-using the same DbContext across multiple sequential business transactions may result in fewer database queries being made thanks to the DbContext first-level cache.
[...] DbContext instance from which those entities were retrieved must extend beyond the scope of the business transaction. If the service method disposed the DbContext instance it used before returning, any attempt to lazy-load properties on the returned entities would fail (whether or not using lazy-loading is a good idea is a different debate altogether which we won't get into here). In our web application example, lazy-loading would typically be used in controller action methods on entities returned by a separate service layer. In that case, the DbContext instance that was used by the service method to load these entities would need to remain alive for the duration of the web request (or at the very least until the action method has completed).
[...] DbContext alive beyond the scope of a business transaction
While it can be fine to re-use a DbContext across multiple business transactions, its lifetime should still be kept short. Its first-level cache will eventually become stale, which will lead to concurrency issues. If your application uses optimistic concurrency this will result in business transactions failing with a DbUpdateConcurrencyException. Using an instance-per-web-request lifetime for your DbContext in web apps will usually be fine as a web request is short-lived by nature. But using an instance-per-form lifetime in a desktop application, which you'll often find suggested, is a lot more questionable and requires careful thought before being adopted.
Note that you can't re-use the same DbContext instance across multiple business transactions if you rely on pessimistic concurrency. Correctly implementing pessimistic concurrency involves keeping a database transaction with the correct isolation level open for the whole lifetime of a DbContext instance, which would prevent committing or rolling back individual business transactions independently.
Re-using the same DbContext instance for more than one business transaction can also lead to disastrous bugs where a service method accidentally commits the changes from a previously failed business transaction.
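To make that last failure mode concrete, here is a minimal sketch. It doesn't use Entity Framework at all: FakeUnitOfWork is a hypothetical stand-in that only mimics the "track changes in memory, persist on SaveChanges()" behaviour described above, and the change descriptions are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DbContext: changes are only tracked in memory
// until SaveChanges() is called. This is NOT Entity Framework code.
public sealed class FakeUnitOfWork
{
    private readonly List<string> _pending = new List<string>();

    // Everything that has been "persisted" so far.
    public List<string> Committed { get; } = new List<string>();

    public void Track(string change)
    {
        _pending.Add(change);
    }

    public void SaveChanges()
    {
        Committed.AddRange(_pending);
        _pending.Clear();
    }
}

public static class SharedContextBug
{
    public static List<string> Run()
    {
        // One instance shared by two sequential business transactions.
        var shared = new FakeUnitOfWork();

        try
        {
            // Business transaction 1: tracks a change, then fails
            // before it ever reaches SaveChanges().
            shared.Track("debit account A");
            throw new InvalidOperationException("validation failed");
        }
        catch (InvalidOperationException)
        {
            // The caller handles the failure, but the pending change
            // is still sitting in the shared instance.
        }

        // Business transaction 2: unrelated work on the same instance.
        shared.Track("rename user B");
        shared.SaveChanges();

        // Both changes were committed, including the one that belonged
        // to the failed business transaction.
        return shared.Committed;
    }
}
```

With a fresh instance per business transaction, the pending change from the failed transaction would simply have been discarded along with the instance that tracked it.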
Finally, managing your DbContext instance lifetime outside of your services tends to tie your application to a specific infrastructure, making it a lot less flexible and much more difficult to evolve and maintain in the long run.
For example, for an application that starts off as a simple web application and relies on an instance-per-web-request strategy to manage the lifetime of its DbContext instances, it's easy to fall into the trap of relying on lazy-loading in controllers or views or on passing persistent entities across service methods on the assumption that they will all use the same DbContext instance behind the scenes. When the need to introduce multi-threading or move operations to background Windows Services inevitably arises, this carefully constructed sand castle often collapses as there are no more web requests to bind DbContext instances to.
As a result, it's advisable to avoid managing the lifetime of DbContext instances separately from business transactions. Instead, each service method (i.e. each business transaction) should create its own DbContext instance and dispose it at the end of the business transaction (i.e. before returning).
This precludes using lazy-loading outside of services (which can be addressed by modeling your domain using DDD or by getting services to return DTOs instead of persistent entities) and poses a few other constraints (e.g. you shouldn't pass persistent entities into a service method as they won't be attached to the DbContext instance that the service will use). But it brings a lot of long-term benefits for the flexibility and maintenance of the application.
If your application works against an RDBMS that provides ACID properties for its transactions (and if you're using Entity Framework, you almost certainly are), it's essential for your services to be in control of the database transaction scope and isolation level. You can't write correct code otherwise.
As we'll see later, Entity Framework wraps all write operations within an explicit database transaction by default. Coupled with a READ COMMITTED isolation level - the default on SQL Server - this suits the needs of most business transactions. This is especially the case if you rely on optimistic concurrency to detect and avoid conflicting updates.
Most applications however will still occasionally need to use other isolation levels for specific operations.
It's very common for example to execute reporting queries where you have determined that dirty reads aren't an issue under a READ UNCOMMITTED isolation level in order to eliminate lock contention with other queries (although if your environment allows it, you'll probably want to use READ COMMITTED SNAPSHOT instead).
And some business rules might require the use of the REPEATABLE READ or even SERIALIZABLE isolation levels (especially if your application uses pessimistic concurrency control). In that case, the service will need to have explicit control over the transaction scope.
[...] DbContext is managed should be independent of the architecture of the application
The architecture of a software system and the design patterns it relies on always evolve over time to adapt to new constraints, business requirements and increasing load.
You don't want the strategy you choose to manage the lifetime of your DbContext to tie you to a specific architecture and prevent you from being able to evolve it as and when needed.
[...] DbContext is managed should be independent of the application type
While most applications today start off as web applications, the strategy you choose to manage the lifetime of your DbContext shouldn't assume that your service method will be called from within the context of a web request. More generally, your service layer (if you have one) should be independent of the type of application it's used from.
It won't be long until you need to create command-line utilities for your support team to execute ad-hoc maintenance tasks or Windows Services to handle scheduled tasks and long-running background operations. When this happens, you want to be able to reference the assembly that contains your services and just use any service you need from your console or Windows Service application. You most definitely don't want to have to completely re-engineer the way your DbContext instances are managed just to be able to use your services from a different type of application.
[...] DbContext management strategy should support multiple DbContext-derived types
If your application needs to connect to multiple databases (for example if it uses separate reporting, logging and / or auditing databases) or if you have split your domain model into multiple aggregate groups, you will have to manage multiple DbContext-derived types.
For those coming from an NHibernate background, this is the equivalent of having to manage multiple SessionFactory instances.
Whatever strategy you choose should be able to let services use the appropriate DbContext for their needs.
[...] DbContext management strategy should work with EF6's async workflow
In .NET 4.5, ADO.NET introduced (at very long last) support for async database queries. Async support was then included in Entity Framework 6, allowing you to use a fully async workflow for all read and write queries made through EF.
Needless to say that whatever system you use to manage your DbContext instance must play well with Entity Framework's async features.
DbContext's default behaviour
In general, DbContext's default behaviour can be described as: "does the right thing by default".
There are several key behaviours of Entity Framework you should always keep in mind however. This list documents EF's behaviour when working against SQL Server. There might be differences when using other data stores.
DbContext is not thread-safe
You must never access your DbContext-derived instance from multiple threads simultaneously. This might result in multiple queries being sent concurrently over the same database connection. It will also corrupt the first-level cache that DbContext maintains to offer its Identity Map, change tracking and Unit of Work functionalities.
In a multi-threaded application, you must create and use a separate instance of your DbContext-derived class in each thread.
So if DbContext isn't thread-safe, how can it support the async query features introduced with EF6? Simply by preventing more than one async operation being executed at any given time (as documented in the Entity Framework specifications for its async pattern support). If you attempt to execute multiple actions on the same DbContext instance in parallel, for example by kicking off multiple SELECT queries in parallel via the DbSet<T>.ToListAsync() method, you will get a NotSupportedException with the following message:
A second operation started on this context before a previous asynchronous operation completed. Use 'await' to ensure that any asynchronous operations have completed before calling another method on this context. Any instance members are not guaranteed to be thread safe.
Entity Framework's async features are there to support an asynchronous programming model, not to enable parallelism.
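The difference between "asynchronous" and "parallel" is easier to see in code. The sketch below deliberately avoids any EF dependency: SingleOperationContext is a hypothetical stand-in that only mimics the "one operation at a time" guard described above, and QueryAsync is an invented method, not an EF API. What matters is the shape of the two calling patterns.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in that rejects a second operation while one is
// still in flight. It mimics the guard described above; it is NOT EF code.
public sealed class SingleOperationContext
{
    private int _busy;

    public async Task<int> QueryAsync(int value)
    {
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
        {
            throw new NotSupportedException(
                "A second operation started before the previous one completed.");
        }

        try
        {
            await Task.Delay(50); // simulated database round trip
            return value;
        }
        finally
        {
            Interlocked.Exchange(ref _busy, 0);
        }
    }
}

public static class AsyncUsage
{
    // Fine: each operation is awaited before the next one starts.
    public static async Task<int> SequentialAsync(SingleOperationContext context)
    {
        var a = await context.QueryAsync(1);
        var b = await context.QueryAsync(2);
        return a + b;
    }

    // Fails: the second operation starts while the first is still pending,
    // so awaiting the combined task surfaces a NotSupportedException.
    public static async Task<int> OverlappingAsync(SingleOperationContext context)
    {
        var first = context.QueryAsync(1);
        var second = context.QueryAsync(2);
        var results = await Task.WhenAll(first, second);
        return results[0] + results[1];
    }
}
```

If two queries really need to run in parallel, each of them needs its own context instance, exactly as in the multi-threaded case.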
Any changes made to your entities, be it updates, inserts or deletes, are only persisted to the database when the DbContext.SaveChanges() method is called. If a DbContext instance is disposed before its SaveChanges() method was called, none of the inserts, updates or deletes done through this DbContext will be persisted to the underlying data store.
The canonical manner to implement a business transaction with Entity Framework is therefore:
    using (var context = new MyDbContext(ConnectionString))
    {
        /*
         * Business logic here. Add, update, delete data
         * through the 'context'.
         *
         * Throw in case of any error to roll back all
         * changes.
         *
         * Do not call SaveChanges() until the business
         * transaction is complete - i.e. no partial or
         * intermediate saves. SaveChanges() must be
         * called exactly once per business transaction.
         *
         * If you find yourself needing to call SaveChanges()
         * multiple times within a business transaction, it means
         * that you are in fact implementing multiple business
         * transactions within a single service method.
         * This is the perfect recipe for disaster. Clients of
         * your service class will naturally assume that your
         * service method will either commit or roll-back all
         * changes in an atomic manner when it might in fact
         * end up doing a partial roll-back, leaving the system
         * in an inconsistent state.
         *
         * In this case, refactor your service method into
         * multiple service methods that each implement one
         * and exactly one business transaction.
         */

        [...]

        // Complete the business transaction
        // and persist all changes.
        context.SaveChanges();

        // Changes cannot be rolled back after this point.
        // context.SaveChanges() should be the last statement
        // of any business transaction.
    }
If you're coming from an NHibernate background, the way Entity Framework persists changes to the database is one of the major differences between EF and NHibernate.
In NHibernate, the Session operates by default in AutoFlush mode. In this mode, the Session will automatically persist all changes made to entities to the database before executing any 'select' query, ensuring consistency between the persisted entities and their in-memory state within the context of a Session. Entity Framework's default behaviour is the equivalent of setting Session.FlushMode to Never in NHibernate.
This EF behaviour can result in subtle bugs as it is possible to be in a situation where queries may unexpectedly return stale or incorrect data. This wouldn't be possible with NHibernate's default behaviour. On the other hand, it dramatically simplifies the issue of database transaction lifetime management.
One of the trickiest issues in NHibernate is to correctly manage the database transaction lifetime. Since NHibernate's Session can persist outstanding changes to the database automatically at any time throughout its lifetime and may do so multiple times within a single business transaction, there is no single, well-defined point or method where to start the database transaction to ensure that all changes are either committed or rolled-back in an atomic manner.
The only reliable method to correctly manage the database transaction lifetime with NHibernate is to wrap all your service methods in an explicit database transaction. This is what you'll see done in pretty much every NHibernate-based application.
A side-effect of this approach is that it requires keeping a database connection and transaction open for often longer than strictly necessary. It therefore increases database lock contention and the probability of database deadlocks occurring. It's also very easy for a developer to inadvertently execute a long-running computation or a remote service call without realizing or even knowing that they're within the context of an open database transaction.
With the EF approach, only the SaveChanges() method must be wrapped in an explicit database transaction (unless you need a REPEATABLE READ or SERIALIZABLE isolation level of course), ensuring that the database connection and transaction are kept as short-lived as possible.
DbContext doesn't start explicit database transactions for read queries. It instead relies on SQL Server's Autocommit Transactions (or Implicit Transactions if you've enabled them but that would be a relatively unusual setup). Autocommit (or Implicit) transactions will use whatever default transaction isolation level the database engine has been configured to use (READ COMMITTED by default for SQL Server).
If you've been around the block for a while, and particularly if you've used NHibernate before, you may have heard that Autocommit (or Implicit) transactions are bad. And indeed, relying on Autocommit transactions for writes can have a disastrous impact on performance.
The story is very different for reads however. As you can see for yourself by running the SQL script below, neither Autocommit nor Implicit transactions have any significant performance impact for SELECT statements.
    /*
     * Execute 100,000 SELECT queries under autocommit,
     * implicit and explicit database transactions.
     *
     * These scripts assume that the database they are
     * running against contains a Users table with an 'Id'
     * column of data type INT.
     *
     * If running from SQL Server Management Studio,
     * right-click in the query window, go to
     * Query Options -> Results and tick "Discard results
     * after execution". Otherwise, what you'll be measuring
     * will be the Result Grid redrawing performance and not
     * the query execution time.
     *
     * Each script is terminated by GO so that it runs as
     * its own batch and can re-declare @i.
     */

    ---------------------------------------------------
    -- Autocommit transaction
    -- 6 seconds

    DECLARE @i INT
    SET @i = 0

    WHILE @i < 100000
    BEGIN
        SELECT Id FROM dbo.Users WHERE Id = @i
        SET @i = @i + 1
    END
    GO

    ---------------------------------------------------
    -- Implicit transaction
    -- 6 seconds

    SET IMPLICIT_TRANSACTIONS ON

    DECLARE @i INT
    SET @i = 0

    WHILE @i < 100000
    BEGIN
        SELECT Id FROM dbo.Users WHERE Id = @i
        SET @i = @i + 1
    END

    COMMIT;
    SET IMPLICIT_TRANSACTIONS OFF
    GO

    ----------------------------------------------------
    -- Explicit transaction
    -- 6 seconds

    DECLARE @i INT
    SET @i = 0

    BEGIN TRAN

    WHILE @i < 100000
    BEGIN
        SELECT Id FROM dbo.Users WHERE Id = @i
        SET @i = @i + 1
    END

    COMMIT TRAN
    GO
Obviously, if you need to use an isolation level higher than the default READ COMMITTED, all reads will need to be part of an explicit database transaction. In that case, you will have to start the transaction yourself - EF will not do this for you. But this would typically only be done on an ad-hoc basis for specific business transactions. Entity Framework's default behaviour should suit the vast majority of business transactions.
Entity Framework automatically wraps all the queries made by the DbContext.SaveChanges() method in a single explicit database transaction, therefore ensuring that all the changes applied to the context are either committed or rolled-back in full.
It will use whatever default transaction isolation level the database engine has been configured to use (READ COMMITTED by default for SQL Server).
This is another major difference between EF and NHibernate. With NHibernate, database transactions are entirely in the hands of developers. NHibernate's Session will never start an explicit database transaction automatically.
With Entity Framework 6, taking explicit control of the database transaction scope and isolation level is as simple as it should be:
    using (var context = new MyDbContext(ConnectionString))
    {
        // In EF6, BeginTransaction() lives on context.Database.
        using (var transaction = context.Database.BeginTransaction(IsolationLevel.RepeatableRead))
        {
            [...]

            context.SaveChanges();
            transaction.Commit();
        }
    }
An obvious side-effect of manually controlling the database transaction scope is that you are now forcing the database connection and transaction to remain open for the duration of the transaction scope.
You should be careful to keep this scope as short-lived as possible. Keeping a database transaction running for too long can have a significant impact on your application's performance and scalability. In particular, it's generally a good idea to refrain from calling other service methods within an explicit transaction scope - they might be executing long-running operations unaware that they have been invoked within an open database transaction scope.
As mentioned earlier, the Autocommit transactions EF relies on for read queries and the explicit transaction it automatically starts when SaveChanges() is called use whatever default isolation level the database engine has been configured with.
There's unfortunately no built-in way to override this isolation level. If you'd like to use another isolation level, you must start and manage the database transaction yourself.
[...] TransactionScope
Alternatively, you can also use the TransactionScope class to control the transaction scope and isolation level. The database connection that Entity Framework opens will enlist in the ambient TransactionScope.
Prior to EF6, using TransactionScope was the only practical way to control the database transaction scope and isolation level.
In practice, and unless you actually need a distributed transaction, you should avoid using TransactionScope. TransactionScope, and distributed transactions in general, are not necessary for most applications and tend to introduce more problems than they solve. EF's documentation has more details on working with TransactionScope with Entity Framework if you really need distributed transactions.
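For completeness, here is roughly what controlling the isolation level through TransactionScope looks like. This is only a sketch: it uses nothing but System.Transactions, the TransactionScopeSketch class and its RunInRepeatableRead method are hypothetical names, and the EF calls that would normally sit inside the scope are represented by the 'work' delegate.

```csharp
using System;
using System.Transactions;

public static class TransactionScopeSketch
{
    // Runs 'work' inside an ambient transaction and returns the isolation
    // level that was in effect while it ran.
    public static IsolationLevel RunInRepeatableRead(Action work)
    {
        var options = new TransactionOptions
        {
            // A TransactionScope created without options defaults to
            // Serializable, so the level is set explicitly here.
            IsolationLevel = IsolationLevel.RepeatableRead,
            Timeout = TransactionManager.DefaultTimeout
        };

        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        {
            var level = Transaction.Current.IsolationLevel;

            // Database work (e.g. creating a context, querying and
            // calling SaveChanges()) would go here.
            work();

            // Without this call, disposing the scope rolls back.
            scope.Complete();
            return level;
        }
    }
}
```

Note that a scope created this way does not flow across 'await'. From .NET 4.5.1 onwards, TransactionScope has constructor overloads that take TransactionScopeAsyncFlowOption.Enabled for that purpose.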
DbContext implements IDisposable. Its instances should therefore be disposed of as soon as they're not needed anymore.
In practice however, and unless you choose to explicitly manage the database connection or transaction that the DbContext uses, not calling DbContext.Dispose() won't cause any issues as Diego Vega, an EF team member, explains.
This is good news as a lot of the code you'll find in the wild fails to dispose of DbContext instances properly. This is particularly the case for code that attempts to manage DbContext instance lifetimes via a DI container, which can be a lot trickier than it sounds.
A DI container like StructureMap for example doesn't support decommissioning the components it created. As a result, if you rely on StructureMap to create your DbContext instances, they will never be disposed of, regardless of what lifecycle you choose for them. The only correct way to manage disposable components with a DI container like this is to significantly complicate your DI configuration and use nested dependency injection containers as Jeremy Miller demonstrates.
A key decision you'll have to make at the start of any Entity Framework-based project is how your code will handle passing the DbContext instances down to the method / layer that will make the actual database queries.
As we've seen above, the responsibility of creating and disposing the DbContext lies with the top-level service methods. The data access code, i.e. the code that actually uses the DbContext instance, will however often live in a separate part of the code - be it in a private method deep down the service implementation, in a query object or in a separate repository layer.
The DbContext instance that the top-level service method creates must therefore somehow find its way down to these methods.
There are 3 schools of thought when it comes to making the DbContext instance available to the data access code: ambient, explicit or injected. Each approach has its pros and cons, which we'll examine now.
With the explicit DbContext approach, the top-level service method creates a DbContext instance and simply passes it down the stack as a method parameter until it finally reaches the method that implements the data access part. In a traditional 3-tier architecture with both a service and a repository layer, this would look like this:
    public class UserService : IUserService
    {
        private readonly IUserRepository _userRepository;

        public UserService(IUserRepository userRepository)
        {
            if (userRepository == null) throw new ArgumentNullException("userRepository");
            _userRepository = userRepository;
        }

        public void MarkUserAsPremium(Guid userId)
        {
            using (var context = new MyDbContext())
            {
                var user = _userRepository.Get(context, userId);
                user.IsPremiumUser = true;
                context.SaveChanges();
            }
        }
    }

    public class UserRepository : IUserRepository
    {
        public User Get(MyDbContext context, Guid userId)
        {
            return context.Set<User>().Find(userId);
        }
    }
(In this intentionally contrived example, the repository layer is of course completely pointless. In a real-world application, you would expect the repository layer to be a lot richer. In addition, you could of course abstract your DbContext behind an "IDbContext" of sorts and create it via an abstract factory if you really didn't want to have a direct dependency on Entity Framework in your services. The principle would remain the same.)
This approach is by far and away the simplest approach. It results in code that's very easy to understand and maintain, even by developers new to the code base.
There's no magic anywhere. The DbContext instance doesn't materialize out of thin air. There's a clear and obvious place where the context is created. And it's really easy to climb up the stack and find it if you're wondering where a particular DbContext instance is coming from.
The main drawback of this approach is that it requires you to pollute all your repository methods (if you have a repository layer) as well as most of your service methods with a mandatory DbContext parameter (or some sort of IDbContext abstraction if you don't want to be tied to a concrete implementation - but the point still stands). You could see this as being a sort of Method Injection pattern.
That your repository methods must be provided with an explicit DbContext parameter isn't too much of an issue. In fact, it can even be seen as a good thing as it removes any potential ambiguity as to which context they'll run their queries against.
Things are quite different in your service layer however. Chances are that most of your service methods won't use the DbContext at all, particularly if you've isolated your data access code away in query objects or in a repository layer. As a result, these methods will only need a DbContext parameter so that they can pass it down the line until it eventually reaches whatever method actually uses it.
It can get quite ugly, particularly if your application uses multiple DbContext-derived types, resulting in service methods potentially requiring two or more mandatory DbContext parameters. It also muddies your method contracts as your service methods are now forced to ask for a parameter that they neither need nor use but require purely to satisfy the dependency of a downstream method.
Jon Skeet wrote an interesting article on the topic of explicitness vs ambient but couldn't come up with a good solution either.
Nevertheless, the simplicity and foolproofness of this approach is hard to beat.
NHibernate users will be very familiar with this approach as the ambient context pattern is the predominant approach used in the NHibernate world to manage NH's Session
(NHibernate's equivalent to EF's DbContext
). NHibernate even comes with built-in support for this pattern, which it calls contextual sessions.
In .NET itself, this pattern is used quite extensively. You've probably already used HttpContext.Current
or the TransactionScope
class, both of which rely on the ambient context pattern.
With this approach, the top-level service method not only creates the DbContext
to use for the current business transaction but it also registers it as the ambient DbContext
. The data access code can then just retrieve the ambient DbContext
whenever it needs it. No need to pass the DbContext
instance around anymore.
Anders Abel has written a simple implementation of an ambient DbContext that relies on a ThreadStatic
variable to store the ambient DbContext
. Have a look - there's less to it than it sounds.
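For illustration, here is a minimal sketch of what a ThreadStatic-based ambient context can look like. It is not Anders Abel's code: `FakeDbContext` and `AmbientContext` are hypothetical names, and `FakeDbContext` is only a stand-in so that the sample compiles without Entity Framework.

```csharp
using System;

// Hypothetical stand-in for an EF DbContext, so the sketch has no EF dependency.
public sealed class FakeDbContext : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

// Registers a context as "ambient" for the current thread for as long as
// this AmbientContext instance has not been disposed.
public sealed class AmbientContext : IDisposable
{
    // One slot per thread: other threads never see this value.
    [ThreadStatic] private static AmbientContext _current;

    private readonly AmbientContext _previous;
    public FakeDbContext Context { get; private set; }

    public AmbientContext()
    {
        Context = new FakeDbContext();
        _previous = _current;   // remember any outer registration
        _current = this;
    }

    public static bool HasCurrent { get { return _current != null; } }

    // What data access code reads instead of taking a context parameter.
    public static FakeDbContext Current
    {
        get
        {
            if (_current == null)
                throw new InvalidOperationException("No ambient context registered on this thread.");
            return _current.Context;
        }
    }

    public void Dispose()
    {
        Context.Dispose();
        _current = _previous;   // restore the outer registration, if any
    }
}
```

A top-level service method would wrap its work in `using (new AmbientContext()) { ... }`, and a repository called inside that block would read `AmbientContext.Current`.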
The advantages of this approach are obvious. Your service and repository methods are now free of DbContext
parameters, making your interfaces cleaner and your method contracts clearer as they can now only request the parameters that they actually need to do their job. No need to pass DbContext
instances all over the place anymore.
As with the explicit approach, the creation and disposal of the DbContext
instance is in a clear, well-defined and logical place.
This approach does however introduce a certain amount of magic which can certainly make the code more difficult to understand and maintain. When looking at the data access code, it's not necessarily easy to figure out where the ambient DbContext
is coming from. You just have to hope that someone somehow registered it before calling the data access code.
If your application uses multiple DbContext
classes, e.g. if it connects to multiple databases or if you have split your domain model into separate model groups, it can be difficult for the top-level service method to know which DbContext
object(s) it must create and register. With the explicit approach, the data access methods must be given whatever DbContext
object they need as a method parameter. There is therefore no ambiguity possible. But with an ambient context approach, the top-level service method must somehow know what DbContext
type the downstream data access code will require. There are ways to solve this issue in a fairly clean manner however as we'll see later.
Finally, the ambient DbContext
example I linked to above works fine in a single-threaded model. But if you intend to use Entity Framework's async query feature, this won't fly. After an async operation, you will most likely find yourself in another thread than the one where the DbContext
was created. In many cases (although not in all cases - this is where async gets tricky), it means that your ambient DbContext
will be gone. This is fixable as well but it will require some advanced understanding of how multi-threading, the TPL and async work behind the scenes in .NET. We'll have a look at this later in this post.
Last but not least, the injected DbContext
approach is the most often mentioned strategy in articles and blog posts addressing the issue of managing the DbContext
lifetime.
With this approach, you let your DI container manage the lifetime of your DbContext
and inject it into whatever component needs it (your repository objects for example).
This is what it looks like:
public class UserService : IUserService
{
    private readonly IUserRepository _userRepository;

    public UserService(IUserRepository userRepository)
    {
        if (userRepository == null) throw new ArgumentNullException("userRepository");
        _userRepository = userRepository;
    }

    public void MarkUserAsPremium(Guid userId)
    {
        // No DbContext in sight here: it was injected into the repository.
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
    }
}

public class UserRepository : IUserRepository
{
    private readonly MyDbContext _context;

    public UserRepository(MyDbContext context)
    {
        if (context == null) throw new ArgumentNullException("context");
        _context = context;
    }

    public User Get(Guid userId)
    {
        return _context.Set<User>().Find(userId);
    }
}
You then need to configure your DI container to create an instance of the DbContext
with an appropriate lifetime on object graph creation. A common piece of advice you'll find is to use a PerWebRequest lifetime for web apps and a PerForm lifetime for desktop apps.
The advantage here is similar to that of the ambient approach: the code isn't littered with DbContext
instances being passed all over the place. This approach goes one step further still: there is no DbContext
to be seen anywhere in the service code. The service is completely oblivious of Entity Framework. Which might sound good at first sight but quickly leads to a lot of problems.
Despite its popularity, this approach has significant drawbacks and limitations. It's important to understand them before adopting this approach.
The first issue is that this approach relies very heavily on magic. And when it comes to managing the correctness and consistency of your data - your most precious asset - magic isn't a word you want to hear too often.
Where do these DbContext
instances come from? How and where is the business transaction boundary defined? If a service depends on two different repositories, will they both have access to the same DbContext
instance or will they each have their own instance?
If you're a back-end developer working on an EF-based project, you must know the answers to these questions if you want to be able to write correct code.
The answers here aren't obvious and will require you to pore over your DI container configuration code to find out. And as we've seen earlier, getting this configuration right isn't as trivial as it may seem at first sight and may end up being fairly complex and / or subtle.
Perhaps the most glaring issue in the code sample above is: who is responsible for committing changes to the data store? I.e. who is calling the DbContext.SaveChanges()
method? It's unclear.
You could inject the DbContext
into your service for the sole purpose of calling its SaveChanges()
method. That would be rather baffling and very error-prone code. Why would the service method call SaveChanges()
on a context object that it neither created nor used? What changes would be saved?
Alternatively, you could define a SaveChanges()
method on all your repositories, which would just delegate to the underlying DbContext
. The service method would then just call SaveChanges()
on the repository itself. This would be very misleading code, as it would imply that each repository implements its own unit-of-work and can persist its changes independently of the other repositories. Which would of course be incorrect as they would in fact all use the same DbContext
instance behind the scenes.
Another approach sometimes seen in the wild is to let the DI container call SaveChanges()
before decommissioning the DbContext
instance. A disastrous approach that would merit a blog post of its own to examine.
In short: the DI container is an infrastructure-level component - it has no knowledge of the business logic the components it manages implement. The DbContext.SaveChanges()
method on the other hand defines a business transaction boundary - i.e. it's a business logic concern (and a critical one at that). Mixing those two unrelated concerns together will quickly cause a lot of pain.
All that being said, if you subscribe to the Repository is Dead movement, the issue of defining who is calling DbContext.SaveChanges()
shouldn't arise as your services will use the DbContext
instance directly. They will therefore be the natural place for SaveChanges()
to be called.
There are however a number of other issues you will run into with an injected DbContext
regardless of the architectural style of your application.
A notable one is that DbContext
isn't a service. It's a resource. And a disposable one to boot. By injecting it into whatever layer implements your data access, you're making that layer, and by extension all the layers above it (which would be pretty much the entire application), stateful.
It's not the end of the world but it certainly complicates DI container configuration. Having stateless services provides tremendous flexibility and makes the configuration of their lifetime a non-issue (any lifetime would do and singleton is often your best bet). As soon as you introduce stateful services, careful consideration has to be given to your service lifetimes.
It often starts off easy (PerWebRequest or Transient lifetime for everything which suits a simple web app well) and then descends into more complexity as console apps, Windows Services and others inevitably make their appearance.
Another issue (related to the previous one) that will inevitably bite you quite hard is that an injected DbContext
prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.
Remember that DbContext
(just like Session
in NHibernate) isn't thread-safe. If you need to execute multiple tasks in parallel in a service, you must make sure that each task works against its own DbContext
instance or the whole thing will blow up at runtime. This is impossible to do with the injected DbContext approach since the service isn't in control of the DbContext
instance creation and doesn't have any way to create new ones.
How can you fix this? Not easily.
Your first instinct is probably to change your services to depend on a DbContext factory instead of depending directly on a DbContext. That would allow them to create their own DbContext
instances when needed. But that would effectively defeat the whole point of the injected DbContext
approach. If services create their own DbContext instances via a factory, these instances can't be injected anymore. Which means that services will have to explicitly pass those DbContext
instances down the layers to whatever components need them (e.g. the repositories). So you're effectively back to the explicit DbContext approach discussed earlier. I can think of a few ways in which this could be solved but all of them feel more like hacks than clean and elegant solutions.
Another way to approach the issue would be to add a few more layers of complexity, introduce a queuing middleware like RabbitMQ and let it distribute the workload for you. Which may or may not work depending on why you need to introduce parallelism. But in any case, you may neither need nor want the additional overhead and complexity.
With an injected DbContext
, you're simply better off limiting yourself to single-threaded code or at least to a single logical flow of execution. Which is perfectly fine for many applications but it will become a major limitation in certain cases.
Time to look at a better way to manage those DbContext
instances.
The approach presented below relies on DbContextScope
, a custom component that implements the ambient DbContext approach presented earlier. The full source code for DbContextScope
and the classes it depends on is on GitHub.
If you're familiar with the TransactionScope
class, then you already know how to use a DbContextScope
. They're very similar in essence - the only difference is that DbContextScope
creates and manages DbContext
instances instead of database transactions. But just like TransactionScope
, DbContextScope
is ambient, can be nested, can have its nesting behaviour disabled and works fine with async execution flows.
This is the DbContextScope
interface:
public interface IDbContextScope : IDisposable
{
    void SaveChanges();
    Task SaveChangesAsync();

    void RefreshEntitiesInParentScope(IEnumerable entities);
    Task RefreshEntitiesInParentScopeAsync(IEnumerable entities);

    IDbContextCollection DbContexts { get; }
}
The purpose of a DbContextScope
is to create and manage the DbContext
instances used within a code block. A DbContextScope
therefore effectively defines the boundary of a business transaction. I'll explain later why I didn't name it "UnitOfWork" or "UnitOfWorkScope", which would have been a more commonly used terminology for this.
You can instantiate a DbContextScope
directly. Or you can take a dependency on IDbContextScopeFactory
, which provides convenience methods to create a DbContextScope
with the most common configurations:
public interface IDbContextScopeFactory
{
    IDbContextScope Create(DbContextScopeOption joiningOption = DbContextScopeOption.JoinExisting);
    IDbContextReadOnlyScope CreateReadOnly(DbContextScopeOption joiningOption = DbContextScopeOption.JoinExisting);

    IDbContextScope CreateWithTransaction(IsolationLevel isolationLevel);
    IDbContextReadOnlyScope CreateReadOnlyWithTransaction(IsolationLevel isolationLevel);

    IDisposable SuppressAmbientContext();
}
With DbContextScope
, your typical service method would look like this:
public void MarkUserAsPremium(Guid userId)
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
        dbContextScope.SaveChanges();
    }
}
Within a DbContextScope
, you can access the DbContext
instances that the scope manages in two ways. You can get them via the DbContextScope.DbContexts
property like this:
public void SomeServiceMethod(Guid userId)
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = dbContextScope.DbContexts.Get<MyDbContext>().Set<User>().Find(userId);
        // [...]
        dbContextScope.SaveChanges();
    }
}
But that's of course only available in the method that created the DbContextScope
. If you need to access the ambient DbContext
instances anywhere else (e.g. in a repository class), you can just take a dependency on IAmbientDbContextLocator
, which you would use like this:
public class UserRepository : IUserRepository
{
    private readonly IAmbientDbContextLocator _contextLocator;

    public UserRepository(IAmbientDbContextLocator contextLocator)
    {
        if (contextLocator == null) throw new ArgumentNullException("contextLocator");
        _contextLocator = contextLocator;
    }

    public User Get(Guid userId)
    {
        return _contextLocator.Get<MyDbContext>().Set<User>().Find(userId);
    }
}
Those DbContext
instances are created lazily and the DbContextScope
keeps track of them to ensure that only one instance of any given DbContext type is ever created within its scope.
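The "created lazily, one instance per type" behaviour can be pictured with a small dictionary keyed by type. This is only a sketch of the idea, not the actual collection class from the DbContextScope source: `LazyContextCollection`, `UsersContext` and `OrdersContext` are hypothetical names, and the two context classes are empty stand-ins so that the sample has no Entity Framework dependency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for two EF DbContext types.
public class UsersContext : IDisposable { public void Dispose() { } }
public class OrdersContext : IDisposable { public void Dispose() { } }

// Creates each requested context type on first use and hands back the
// same instance on every later request.
public sealed class LazyContextCollection : IDisposable
{
    private readonly Dictionary<Type, IDisposable> _contexts = new Dictionary<Type, IDisposable>();

    // Number of context instances created so far.
    public int CreatedCount { get { return _contexts.Count; } }

    public TContext Get<TContext>() where TContext : class, IDisposable, new()
    {
        IDisposable existing;
        if (!_contexts.TryGetValue(typeof(TContext), out existing))
        {
            existing = new TContext();   // created lazily, on first request only
            _contexts.Add(typeof(TContext), existing);
        }
        return (TContext)existing;
    }

    public void Dispose()
    {
        foreach (var context in _contexts.Values) context.Dispose();
        _contexts.Clear();
    }
}
```

Nothing is instantiated until some component actually asks for a given context type, which is why the code that opens the scope doesn't need to know which types will be used.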
You'll note that the service method doesn't need to know which type of DbContext
will be required during the course of the business transaction. It only needs to create a DbContextScope
and any component that needs to access the database within that scope will request the type of DbContext
they need.
A DbContextScope
can of course be nested. Let's say that you already have a service method that can mark a user as a premium user like this:
public void MarkUserAsPremium(Guid userId)
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
        dbContextScope.SaveChanges();
    }
}
You're implementing a new feature that requires being able to mark a group of users as premium within a single business transaction. You can easily do it like this:
public void MarkGroupOfUsersAsPremium(IEnumerable<Guid> userIds)
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        foreach (var userId in userIds)
        {
            // The child scope created by MarkUserAsPremium() will
            // join our scope. So it will re-use our DbContext instance(s)
            // and the call to SaveChanges() made in the child scope will
            // have no effect.
            MarkUserAsPremium(userId);
        }

        // Changes will only be saved here, in the top-level scope,
        // ensuring that all the changes are either committed or
        // rolled-back atomically.
        dbContextScope.SaveChanges();
    }
}
(this would of course be a very inefficient way to implement this particular feature but it demonstrates the point)
This makes creating a service method that combines the logic of multiple other service methods trivial.
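To make the joining behaviour concrete, here is a deliberately tiny model of it. This is an assumption-laden sketch, not the DbContextScope implementation: `MiniScope` and `PremiumService` are hypothetical, a counter stands in for the real `DbContext.SaveChanges()` call, and `AsyncLocal<T>` (.NET Framework 4.6 and later) is used here to hold the ambient scope.

```csharp
using System;
using System.Threading;

// Hypothetical miniature of the joining behaviour; not the DbContextScope source.
public sealed class MiniScope : IDisposable
{
    private static readonly AsyncLocal<MiniScope> Ambient = new AsyncLocal<MiniScope>();

    // Counts the "real" commits, for demonstration purposes only.
    public static int CommitCount;

    private readonly MiniScope _parent;   // null for a top-level scope

    public MiniScope()
    {
        _parent = Ambient.Value;          // join the ambient scope if there is one
        Ambient.Value = this;
    }

    public bool IsNested { get { return _parent != null; } }

    public void SaveChanges()
    {
        // A joined (child) scope leaves persistence to the top-level scope.
        if (IsNested) return;
        CommitCount++;                    // stand-in for DbContext.SaveChanges()
    }

    public void Dispose()
    {
        Ambient.Value = _parent;          // make the parent ambient again
    }
}

public static class PremiumService
{
    public static void MarkUserAsPremium(Guid userId)
    {
        using (var scope = new MiniScope())
        {
            // ... load and modify the user here ...
            scope.SaveChanges();
        }
    }

    public static void MarkGroupOfUsersAsPremium(Guid[] userIds)
    {
        using (var scope = new MiniScope())
        {
            foreach (var id in userIds) MarkUserAsPremium(id);   // child scopes: no commit
            scope.SaveChanges();                                  // the single commit
        }
    }
}
```

Called on its own, `MarkUserAsPremium()` commits once. Called from `MarkGroupOfUsersAsPremium()`, its `SaveChanges()` becomes a no-op and only the outer scope commits.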
If a service method is read-only, having to call SaveChanges()
on its DbContextScope
before returning can be a pain. But not calling it isn't an option either as:
It makes the intent ambiguous (did you intend not to call SaveChanges() or did you forget to call it?)
Not calling SaveChanges() will result in the transaction being rolled back. Database monitoring systems will usually interpret transaction rollbacks as an indication of an application error. Having spurious rollbacks is not a good idea.
The DbContextReadOnlyScope
class addresses this issue. This is its interface:
public interface IDbContextReadOnlyScope : IDisposable
{
    IDbContextCollection DbContexts { get; }
}
And this is how you use it:
public int NumberPremiumUsers()
{
    using (_dbContextScopeFactory.CreateReadOnly())
    {
        return _userRepository.GetNumberOfPremiumUsers();
    }
}
DbContextScope
works with async execution flows as you would expect:
public async Task RandomServiceMethodAsync(Guid userId)
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = await _userRepository.GetAsync(userId);
        var orders = await _orderRepository.GetOrdersForUserAsync(userId);
        // [...]
        await dbContextScope.SaveChangesAsync();
    }
}
In the example above, the OrderRepository.GetOrdersForUserAsync()
method will be able to see and access the ambient DbContext instance despite the fact that it may be running on a different thread from the one where the DbContextScope
was initially created.
This is made possible by the fact that DbContextScope
stores itself in the CallContext. The CallContext automatically flows through async points. If you're curious about how it all works behind the scenes, Stephen Toub has written an excellent blog post about it. But if all you want to do is use DbContextScope
, you just have to know that: it just works.
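The difference between thread-bound and flowing storage is easy to observe. The sketch below is an illustration only. It uses `AsyncLocal<T>` (available from .NET Framework 4.6) rather than the CallContext mentioned above, because it is compact and, like the CallContext, travels with the ExecutionContext. `FlowDemo` and its members are hypothetical names.

```csharp
using System;
using System.Threading;

public static class FlowDemo
{
    // Per-thread slot: a value set on one thread is invisible to the others.
    [ThreadStatic] private static string _threadBound;

    // Travels with the ExecutionContext, which is captured when a thread
    // or task is started.
    private static readonly AsyncLocal<string> Flowing = new AsyncLocal<string>();

    // Sets both values on the calling thread, then reads them on another thread.
    public static void ReadOnOtherThread(out string threadBound, out string flowing)
    {
        _threadBound = "scope";
        Flowing.Value = "scope";

        string seenThreadBound = "unset";
        string seenFlowing = "unset";
        var worker = new Thread(() =>
        {
            seenThreadBound = _threadBound;   // null: ThreadStatic data stays behind
            seenFlowing = Flowing.Value;      // "scope": flowed to the new thread
        });
        worker.Start();
        worker.Join();

        // Clean up the calling thread's values.
        _threadBound = null;
        Flowing.Value = null;

        threadBound = seenThreadBound;
        flowing = seenFlowing;
    }
}
```

The same flow happens across an `await` that resumes on another thread, which is why an ambient scope stored this way is still visible to the code that runs after the await.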
WARNING: There is one thing that you must always keep in mind when using any async flow with DbContextScope
. Just like TransactionScope
, DbContextScope
only supports being used within a single logical flow of execution.
I.e. if you attempt to start multiple parallel tasks within the context of a DbContextScope
(e.g. by creating multiple threads or multiple TPL Task instances), you will get into big trouble. This is because the ambient DbContextScope
will flow through all the threads your parallel tasks are using. If code in these threads needs to use the database, it will all use the same ambient DbContext
instance, resulting in the same DbContext
instance being used from multiple threads simultaneously.
In general, parallelizing database access within a single business transaction has little to no benefit and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.
However, if you really need to start a parallel task within a DbContextScope
(e.g. to perform some out-of-band background processing independently from the outcome of the business transaction), then you must suppress the ambient context before starting the parallel task. Which you can easily do like this:
public void RandomServiceMethod()
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        // Do some work that uses the ambient context
        // [...]

        using (_dbContextScopeFactory.SuppressAmbientContext())
        {
            // Kick off parallel tasks that shouldn't be using the
            // ambient context here. E.g. create new threads,
            // enqueue work items on the ThreadPool or create
            // TPL Tasks.
            // [...]
        }

        // The ambient context is available again here.
        // Can keep doing more work as usual.
        // [...]

        dbContextScope.SaveChanges();
    }
}
This is an advanced feature that I would expect most applications to never need. Tread carefully when using this as it can create tricky issues and quickly lead to a maintenance nightmare.
Sometimes, a service method may need to persist its changes to the underlying database regardless of the outcome of the overall business transaction it may be part of. This would be the case if:
In that case, you can pass a value of DbContextScopeOption.ForceCreateNew
as the joiningOption
parameter when creating a new DbContextScope
. This will create a DbContextScope
that will not join the ambient scope even if one exists:
public void RandomServiceMethod()
{
    using (var dbContextScope = _dbContextScopeFactory.Create(DbContextScopeOption.ForceCreateNew))
    {
        // We've created a new scope. Even if that service method
        // was called by another service method that has created its
        // own DbContextScope, we won't be joining it.
        // Our scope will create new DbContext instances and won't
        // re-use the DbContext instances that the parent scope uses.
        // [...]

        // Since we've forced the creation of a new scope,
        // this call to SaveChanges() will persist
        // our changes regardless of whether or not the
        // parent scope (if any) saves its changes or rolls back.
        dbContextScope.SaveChanges();
    }
}
The major issue with doing this is that this service method will use different DbContext
instances than the ones used in the rest of that business transaction. Here are a few basic rules to always follow in that case in order to avoid weird bugs and maintenance nightmares:
If you force the creation of a new DbContextScope
(and therefore of new DbContext
instances) instead of joining the ambient one, your service method must never return persistent entities that were created / retrieved within that new scope. This would be completely unexpected and will lead to humongous complexity.
The client code calling your service method may be a service method itself that created its own DbContextScope
and therefore expects all service methods it calls to use that same ambient scope (this is the whole point of using an ambient context). It will therefore expect any persistent entity returned by your service method to be attached to the ambient DbContext
.
Instead, either:
DbContext
instance if it needs the actual object.
If your service method forces the creation of a new DbContextScope
and then modifies persistent entities in that new scope, it must make sure that the parent ambient scope (if any) can "see" those modifications when it returns.
I.e. if the DbContext
instances in the parent scope had already loaded the entities you modified in their first-level cache (ObjectStateManager), your service method must force a refresh of these entities to ensure that the parent scope doesn't end up working with stale versions of these objects.
The DbContextScope
class has a handy helper method that makes this fairly painless:
public void RandomServiceMethod(Guid accountId)
{
    // Forcing the creation of a new scope (i.e. we'll be using our
    // own DbContext instances)
    using (var dbContextScope = _dbContextScopeFactory.Create(DbContextScopeOption.ForceCreateNew))
    {
        var account = _accountRepository.Get(accountId);
        account.Disabled = true;

        // Since we forced the creation of a new scope,
        // this will persist our changes to the database
        // regardless of what the parent scope does.
        dbContextScope.SaveChanges();

        // If the caller of this method had already
        // loaded that account object into their own
        // DbContext instance, their version
        // has now become stale. They won't see that
        // this account has been disabled and might
        // therefore execute incorrect logic.
        // So make sure that the version our caller
        // has is up-to-date.
        dbContextScope.RefreshEntitiesInParentScope(new[] { account });
    }
}
The first version of the DbContextScope
class I wrote was actually called UnitOfWork
. This is arguably the most commonly used name for this type of component.
But as I tried to use that UnitOfWork
component in a real-world application, I kept getting really confused as to how I was supposed to use it and what it really did. This is despite the fact that I was the one who researched, designed and implemented it and despite the fact that I knew what it did and how it worked inside-out. Yet, I kept getting myself confused and had to often take a step back and think hard about how this "unit of work" related to the actual problem I was trying to solve: managing my DbContext instances.
If even I, who had spent a significant amount of time researching, designing and implementing this component, kept getting confused when trying to use it, there clearly wasn't a hope that anyone else would find it easy to use it.
So I renamed it DbContextScope
and suddenly everything became clearer.
The main issue I had with the UnitOfWork
name, I believe, is that at the application level, it often doesn't make a lot of sense. At the lower levels, for example at the database level, a "unit of work" is a very clear and concrete concept. This is Martin Fowler's definition of a unit of work:
Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.
There is no ambiguity as to what a unit of work means at the database level.
At the application level however, a "unit of work" is a very vague concept that could mean everything and nothing. And it's certainly not clear how this "unit of work" relates to Entity Framework, to the issue of managing DbContext
instances and to the problem of ensuring that the persistent entities we're manipulating are attached to the right DbContext instance.
As a result, any developer trying to use a "UnitOfWork" would have to pore through its source code to find out what it really does. The definition of the unit of work pattern is simply too vague to be useful at the application level.
In fact, for many applications, an application-level "unit of work" doesn't even make any sense. Many applications will have to use several non-transactional services during the course of a business transaction, such as remote APIs or non-transactional legacy components. The changes made there cannot be rolled back. Pretending otherwise is counter-productive, confusing and makes it even harder to write correct code.
A DbContextScope
on the other hand does what it says on the tin. Nothing more, nothing less. It doesn't pretend to be what it's not. And I've found that this simple name change significantly reduced the cognitive load required to use that component and to verify that it was being used correctly.
Of course, naming this component DbContextScope
means that you can't hide the fact that you're using Entity Framework from your services anymore. UnitOfWork
is a conveniently vague term that allows you to abstract away the persistence mechanism used in the lower layers. Whether or not abstracting EF away from your service layer is a good thing is another debate that we won't get into here.
The source code on GitHub includes a demo application that demonstrates the most common use-cases.
DbContextScope
works
The source code is well commented and I would encourage you to read through it. In addition, this excellent blog post by Stephen Toub on the ExecutionContext is a mandatory read if you'd like to fully understand how the ambient context pattern was implemented in DbContextScope
.
The personal blog of Rowan Miller, the program manager for the Entity Framework team, is a must-read for any developer working on an Entity Framework-based application.
An Entity Framework anti-pattern commonly seen in the wild is to implement the creation and disposal of DbContext
in data access methods (e.g. in repository methods in a traditional 3-tier application). It usually looks like this:
public class UserService : IUserService
{
    private readonly IUserRepository _userRepository;

    public UserService(IUserRepository userRepository)
    {
        if (userRepository == null) throw new ArgumentNullException("userRepository");
        _userRepository = userRepository;
    }

    public void MarkUserAsPremium(Guid userId)
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
        _userRepository.Save(user);
    }
}

public class UserRepository : IUserRepository
{
    public User Get(Guid userId)
    {
        using (var context = new MyDbContext())
        {
            return context.Set<User>().Find(userId);
        }
    }

    public void Save(User user)
    {
        using (var context = new MyDbContext())
        {
            // [...]
            // (either attach the provided entity to the context
            // or load it from the context and update its properties
            // from the provided entity)
            context.SaveChanges();
        }
    }
}
By doing this, you're losing pretty much every feature that Entity Framework provides via the DbContext
, including its 1st-level cache, its identity map, its unit-of-work, and its change tracking and lazy-loading abilities. That's because in the scenario above, a new DbContext
instance is created for every database query and disposed immediately afterwards, hence preventing the DbContext
instance from being able to track the state of your data objects across the entire business transaction.
You're effectively reducing Entity Framework to a basic ORM in the literal sense of the term: a mapper from your objects to their relational representation in the database.
There are some applications where this type of architecture does make sense. If you're working on such an application, you should however ask yourself why you're using Entity Framework in the first place. If you're going to use it as a basic ORM and won't use any of the features that it provides on top of its ORM capabilities, you might be better off using a lightweight ORM library such as Dapper. Chances are it would simplify your code and offer better performance by not having the additional overhead that EF introduces to support its additional functionalities.