Doubling Data for Performance Testing

Or: The Most Impressive T-SQL Script I’ve Ever Written

I was recently working on a new application. After three months in the field, users were starting to complain about performance issues. We had done some limited performance tuning for the first release, and more as part of the second release, but new issues were popping up as more data got entered into the system. We could have continued fixing issues as they came up, one release at a time, but we wanted to get ahead of the problem, and the client wanted to know that the system would remain usable without developer intervention for a few years at least.

The nature of the business was such that the rate of data entry and the number of concurrent users shouldn’t change much over time. This meant that increasing the amount of data would be a good approximation of how the system would look in the future. But how do you increase the amount of data in a normalized relational database? It would be difficult to enter realistic test data manually, and unrealistic test data can cause unrealistic test results. Bad results mean we waste time fixing issues that would never appear, and never get a chance to detect the issues that will.

I proposed creating a T-SQL script that would double all the existing data. Running it once would simulate six months of usage; running it again, a year; twice more and we’d be simulating four years of usage. Once it was approved, it took me a couple of days to finish, with about half of that going to testing and debugging. The result is the most impressive T-SQL script I’ve ever written.

The script I wrote didn’t do the actual duplication; instead, it combined hand-entered metadata with data from sys.tables and sys.columns to generate a much bigger script that did. Not only was it faster to write this way, but it made issues easier to fix, and it allowed the script to be reused after columns were added to the database.

I had one table variable with a list of tables to copy, and another defining the relationships between foreign key columns and their source tables. Most foreign keys were named the same in every table they appeared in, so a single mapping was often enough for all of them. Most of the mappings could be determined from the list of tables itself, so only a few extra relationships and special column rules had to be entered manually. The sketch below shows the idea.
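A minimal sketch of that shape, with illustrative names of my own rather than anything from the original script:

    DECLARE @TablesToCopy TABLE
    (
        TableName sysname PRIMARY KEY,
        CopyOrder int NOT NULL           -- parents before children where FK constraints exist
    );

    DECLARE @KeyMappings TABLE
    (
        ColumnName  sysname PRIMARY KEY, -- e.g. 'CustomerId', wherever it appears
        SourceTable sysname NOT NULL     -- the table whose primary key it references
    );

    INSERT INTO @TablesToCopy VALUES ('Customer', 1), ('CustomerOrder', 2);

    -- Most mappings follow the naming convention, so they can be derived
    -- from the table list; only the exceptions are entered by hand.
    INSERT INTO @KeyMappings
    SELECT t.TableName + 'Id', t.TableName FROM @TablesToCopy t;

    -- The generation step reads each column list from the system catalog,
    -- which is what keeps the script reusable as columns are added. The
    -- real generator would also swap the columns named in @KeyMappings for
    -- new key values; that part is sketched further down.
    SELECT 'INSERT INTO ' + t.TableName + ' (' + col.List + ') ' +
           'SELECT ' + col.List + ' FROM ' + t.TableName + ';'
    FROM @TablesToCopy t
    CROSS APPLY
    (
        SELECT STUFF((SELECT ', ' + c.name
                      FROM sys.columns c
                      WHERE c.object_id = OBJECT_ID(t.TableName)
                      FOR XML PATH('')), 1, 2, '') AS List
    ) col
    ORDER BY t.CopyOrder;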

Another factor that helped was our use of GUIDs for all primary keys. Because they can be determined before inserting a row, it was possible to generate the mapping from old to new keys at the start of the script. I could also use a single insert statement for each table, and the order of execution only mattered where foreign key constraints existed.
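The generated statements boiled down to something like this; again a sketch, assuming a hypothetical Customer table:

    DECLARE @KeyMap TABLE
    (
        OldId uniqueidentifier PRIMARY KEY,
        NewId uniqueidentifier NOT NULL
    );

    -- New keys are generated up front, before any rows are inserted.
    INSERT INTO @KeyMap (OldId, NewId)
    SELECT CustomerId, NEWID() FROM Customer;

    -- One INSERT per table: primary and foreign key columns are swapped
    -- for their new values by joining to the mapping; everything else is
    -- copied as-is, doubling the row count.
    INSERT INTO Customer (CustomerId, Name)
    SELECT m.NewId, c.Name
    FROM Customer c
    JOIN @KeyMap m ON m.OldId = c.CustomerId;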

The results were tremendous. We found a bunch of issues we wouldn’t have found otherwise, and had a fairly solid indication of how the application would behave years in the future.

When to Add an ORM Tool

I’m working on the code that parses VCalendar data so that it can be processed. I’m copying the data I care about into a simple data structure that can represent a calendar request in any format. Any logic that interacts with calendar requests will use this internal structure. I want it to be simple, containing only the things I need, but I don’t want to completely reinvent the wheel either, so I’ll use the VCalendar format as a guideline.

I’ve used this pattern before, processing an established or complex data format into a simple proprietary one, with good results. It isolates the complexities of a data format in the code that processes it, keeping the rest of your code clean. Like any pattern, there are cases where I wouldn’t use it, but for this kind of scenario it should work well.

Now that I’m building this neutral representation, my next thought is how it will be saved to the database. I don’t need a database at this stage of the project, so part of me wants to ignore the decision until later, but the whole point of the system is to persist and interact with these types, so it’s important that I get it right. If I make simple POCO classes now and start writing a bunch of code that uses them, I might have to change a lot of code later if I want to switch to types generated by Entity Framework. I could write my own custom code to read and write my own types from the database; then I could use any type I want without restrictions, but it would be a waste to write that plumbing code myself when an ORM can do it faster and better.

Creating a table design now wouldn’t be a simple matter either. I don’t know enough about the needs of the scheduling service to know which parts of the VCalendar format to bring over. If I try to guess now I know I’ll bring over a bunch of stuff that I don’t need, but starting with a simple table and adding fields every time I need one is no fun either. Adding a database to a system is like attaching a ball and chain, and I want to wait until I can be sure I have my model correct before I do it.

I need to make a decision, so here it is: I’m going to keep building my own types, and try to hang on to them. When it comes time to hook up a database, I’ll play around with the new POCO support in Entity Framework 4, and if that fails I’ll try another ORM tool. I may need to change my model a bit to suit the ORM, but I’m hoping that it won’t need to change much.

SQLite vs. SQL CE 3.5 with Entity Framework 4

My wife and I have been taking a Japanese class. I’m enjoying it quite a bit, even though learning languages has never been easy for me. I need to work hard to memorize all the new words, and that means lots and lots of practice.

My wife bought a pack of index cards. This is what a normal person would do. To me, it seems like defeat. Why use paper when there’s a computer in the room? While she’s at the store, I grab a whiteboard and sketch out a data model.

I used WPF for the UI, two simple view models, and Entity Framework 4 to store the lists of characters, words, and phrases. I started out with SQL Compact Edition 3.5 as the database because it was the one that was there. That didn’t work out so well, so on the advice of a colleague I switched the app to SQLite.

This is what I found:

SQL CE 3.5 pros:
  • Can be created easily from the VS designer
  • Can view and modify data inside Visual Studio

SQL CE 3.5 cons:
  • Doesn’t support Identity Columns in EF
  • Requires extra installation on client computers
  • Generating tables from a model is painful
  • Many model changes require regenerating the database

SQLite pros:
  • Supports Identity Columns
  • Easy to deploy
  • Lots of good tools available for managing databases
  • Can be used from non-.Net and even non-Windows applications

SQLite cons:
  • Doesn’t map the Integer type correctly
  • Doesn’t support foreign key constraints
  • Generating tables from a model is somewhat painful

SQL Compact Edition 3.5

Before I start dumping on SQL CE, I should say that I was trying the 3.5 version, not the fancy new 4.0 version. Maybe they’ve fixed all these issues.

Creating the model was easy, but as I went through a few iterations of the data model I quickly became frustrated with the inability to change the database to match the new model. I’m sure it’s possible, but you have to write the scripts yourself and hope that you’ve named everything exactly right. Entity Framework created a handy database-generation script, but it won’t run directly against a SQL CE database, because SQL CE doesn’t support the GO statement. Copying and pasting each section of the script is annoying, but still better than typing all the names manually.
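To illustrate the problem, the generated script has roughly this shape (hypothetical tables, not the actual EF output); every GO ends a batch, and the SQL CE tooling will only take one batch at a time:

    CREATE TABLE [Characters] (
        [Id] int IDENTITY(1,1) NOT NULL,
        [Symbol] nvarchar(10) NOT NULL,
        CONSTRAINT [PK_Characters] PRIMARY KEY ([Id])
    );
    GO

    CREATE TABLE [Words] (
        [Id] int IDENTITY(1,1) NOT NULL,
        [Text] nvarchar(100) NOT NULL,
        CONSTRAINT [PK_Words] PRIMARY KEY ([Id])
    );
    GO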

The real shock was adding data from inside my app. Entity Framework can’t generate data in a SQL CE database if the primary key is an identity column. The best workaround I could find was to use a regular old integer for the primary key and search the model for the highest key value every time an entity is created. Not fun at all.

SQLite

It took a bit of fooling around to switch the database to SQLite. Most of the creation script ran against the new database, though the types had to be changed a bit to match it. All integers in SQLite are actually Int64 in .Net, but Entity Framework maps them to Int32 in the model. Modifying the mapping files manually is annoying, but not hard to figure out.

Once the tables were set up and working, it was easy to copy the data over. The database tool I’m using for SQLite can import CSV files, and even the multi-byte Japanese characters came across correctly.

The only thing left is to study.

You can get a .Net library for SQLite here: https://sqlite.phxsoftware.com/

The library can integrate with Visual Studio, even the Express version, but that part didn’t work well for me. With a good database management tool you don’t need it anyway. I’ve been using SQLite Administrator, which you can get here: https://sqliteadmin.orbmu2k.de/