Should I directly connect to a custom database from Sitecore?
If you’re in a situation where you need a custom database for whatever reason, your first reaction might be “Hey, let’s just chuck in a connection string, whack in Entity Framework or ADO.NET, easy!”
STOP! It’s not safe!
What do I mean by safe here? Safety in development refers to both the short-term and long-term effects of a change. Short-term considerations could include whether it is likely to need manual intervention to get the change deployed, or is it totally automated. Long-term considerations could include maintainability — is it going to come back to bite you in the ass and cause a major headache?
Introducing a new database into a system should be considered a major architectural change and should not be undertaken lightly. And when it comes to Sitecore (XM / XP), even more so.
Let me run you through a real-world example of where this has happened.
Case Study
The year is 2014 and the effects of the GFC are finally wearing off, sparking new projects to kick off. A large enterprise has budget for a full rewrite of their site and has chosen Sitecore XP. Included in the scope is an online portal where users can create accounts and log in, and update various details.
Enter the requirement for a custom database for all these details. The developers add Entity Framework as a dependency to the Sitecore solution and create the database. They use EF code-first migrations to manage the schema, committing the migration to the Sitecore solution repo.
Developer overhead for Sitecore front-end developers immediately increases:
- Devs need a basic understanding of a new complex dependency (EF) — training/upskilling
- Additional environment setup time
- Make sure the migrations have been run locally and deal with conflicts if multiple people are changing the schema
- Managing the data if it gets out of date on the local machine
- EF.NET has a notoriously slow startup time which affects dev productivity
The developers had the best of intentions, implementing the Repository pattern for all data access. We could find a CustomerRepository, OrderRepository, EnquiriesRepository. As the solution grew, the different types of data needed grew, we started seeing different repositories injected into each other via Dependency Injection in an attempt to hold up DRY (don’t repeat yourself). That soon turned into DI hell and so wrapper repositories were added. The DI framework used at the time (Autofac) was stretched to the limit to try and handle all of this, and yet still edge cases came up and we’d find developers creating new DbContexts directly in Sitecore components rather than deal with this mess of Repositories.
Then a new challenge started coming in: a new dev team has been hired to build back-office dashboards and they need to read and update the database, and even modify the schema. Suddenly we’re seeing commits in the Sitecore codebase from a team that has no idea about Sitecore, and the team’s branching strategy and release cadence.
Then comes time to upgrade to Sitecore 9, which brought in even more challenges:
- New environments getting spun up still need to deal with this overhead
- Package conflicts between dependent versions meant that it’s time to upgrade Entity Framework whether you like it or not
- Sitecore dropped support for Autofac, suddenly all those clever DI container registrations were in vain and had to be rewritten for the Microsoft DI API.
What a headache!
And yes, I can go on with the ramifications of this: some years later, parts of the application needed to be abstracted out of Sitecore to an API layer so that other channels could use it, suddenly we were in the scenario where multiple applications are accessing the same database!
So, what should we do instead?
Build a Middleware API.
Back in those days — being that the team are .NET developers — the best choice would have been to create a new ASP.NET WebAPI2 project, add Entity Framework to that, and design and create API endpoints to provide the data that Sitecore requires.
This is even easier these days — .NET 6 provides such an easy way to scaffold new API projects that are automatically self-documenting via the OpenAPI spec and the Swagger UI, and have made it super easy to add authentication if it’s required.
Let’s look at the problems above and how this solves them:
- Front-end developers don’t need to worry about data access anymore — they can just work with the API
- Environment setup time is streamlined — you could have a shared API server as default and the developer only needs to set it up if they need to make changes
- This abstraction makes it possible to share a database across the team too if that is beneficial
- Sitecore XM/XP is heavy-weight enough without adding large dependencies like Entity Framework — let’s keep it out the codebase and make upgrades easier and the dev experience better
- Developers have to stop and think twice when building data models, and how they are exposing those via an API. This promotes taking the time to create good abstractions instead of quickly creating a DbContext just to pull one bit of data they need on their component, which could become a major performance concern.
- Non-Sitecore developers could potentially make changes without disrupting the front-end.
- Bug fixing logic issues that exist on the middleware layer becomes easier and a much faster turnaround time to deploy these changes.
- In this world of Composable DXP, the API could even be made publicly accessible (with the proper security in front of it of course) and let Headless front-end solutions interact with it. Another good option here is to simply proxy requests through Sitecore to the API, and control access within Sitecore if that framework is already in place.
What about YAGNI?
Over-architecting a solution is definitely a concern, but when it comes to mixing custom databases and Sitecore, I’m afraid that I can’t think of any reason to not implement a middleware API. My reasoning is that: if the customer is “enterprise” enough to afford Sitecore licensing fees, then they are most certainly enterprise enough to invest in good software architecture.
Another big reason, is that once a database is embedded into Sitecore, it is very expensive to rip it out and abstract it properly, and therefore it becomes a legitimately very hard sell to management to get budget to change it. The high cost to remove it becomes insurmountable, and the business could most likely just accept all the overheads outlined above.
Any exceptions?
Ok, fine, I do have an exception: where it is an optional dependency. For example the Sidekick Audit Log which allows you to log to a SQL database.
Conclusion
If you’ve read this far you’re probably one of two people:
- Someone who has gone through similar pain and is nodding along to what I’m saying
- Someone who needs to use a custom database and is thinking about their solution architecture.
If you’re in the latter group, congrats! Just by doing some reading here and hopefully taking these points into account, you could be saving the business $000’s in the future (and you totally deserve a bonus).
Drop me a comment if you have any more thoughts on the subject!