Choosing Priorities

During the summer I started a difficult but important journey to reorganize my life. The first step was organizing my daily tasks. I was successful, but becoming productive again has created new issues.

When you change from trying to do everything to doing what’s most important, you need to decide what important means. Figuring this out for myself has proven difficult. I still don’t have all the answers, but I am constantly making progress.

Deciding what isn’t important has been easier. I am abandoning Themis, my attempt at an open source project. I no longer have a need for it, and it will take a lot more effort to get it finished than I originally expected. I would be delighted if someone wanted to take it on, but there isn’t enough there for it to be likely.

Writing in this blog has also been pushed down the priority list. I intend to keep going, but instead of forcing out a steady pace of content, I’ll wait until I have ideas that I strongly want to share.

When to Add an ORM Tool

I’m working on the code that parses VCalendar data so that it can be processed. I’m copying the data I care about into a simple data structure that can represent a calendar request in any format. Any logic that interacts with calendar requests would use this internal structure. I want it to be simple, with only the fields that I need, but I don’t want to completely reinvent the wheel either, so I will use the VCalendar format as a guideline.

I’ve used this pattern of processing an established or complex data format into a simple proprietary one before with good results. It allows you to isolate the complexities of a data format in the code that processes it, keeping the rest of your code clean. Like everything, there are cases where I wouldn’t use it, but for this kind of scenario it should work well.
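Here is a minimal sketch of that pattern, written in Python for brevity rather than in the project's own language. The CalendarRequest type, its fields, and the from_vcalendar_event function are hypothetical names chosen for illustration, and the input is assumed to be a dictionary of already-parsed property names and values.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class CalendarRequest:
    """Hypothetical internal type: only the fields the rest of the code needs."""
    uid: str
    summary: str
    start: datetime
    end: datetime
    organizer: Optional[str] = None


def _parse_utc(stamp: str) -> datetime:
    # Assumption: only the UTC basic form (e.g. 20100601T120000Z) is handled.
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")


def from_vcalendar_event(props: Dict[str, str]) -> CalendarRequest:
    """Copy the properties we care about; everything else is ignored."""
    return CalendarRequest(
        uid=props["UID"],
        summary=props.get("SUMMARY", ""),
        start=_parse_utc(props["DTSTART"]),
        end=_parse_utc(props["DTEND"]),
        organizer=props.get("ORGANIZER"),
    )
```

Only the conversion function knows the property names of the source format. Everything downstream works with CalendarRequest and never sees the original data.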

Now that I’m building this neutral representation, my next thought is how it will be saved to the database. I don’t need a database at this stage of the project, so part of me wants to ignore the decision until later, but the whole point of the system is to persist and interact with these types, so it’s important that I get it right. If I make simple POCO classes now and start writing a bunch of code that uses them, I might have to change a lot of code later if I want to switch to types generated by Entity Framework. I could write my own custom code to read and write my types from the database, which would let me use any type I want without restrictions, but it would be a waste to write this plumbing code myself when an ORM can do it faster and better.

Creating a table design now wouldn’t be a simple matter either. I don’t know enough about the needs of the scheduling service to know which parts of the VCalendar format to bring over. If I try to guess now I know I’ll bring over a bunch of stuff that I don’t need, but starting with a simple table and adding fields every time I need one is no fun either. Adding a database to a system is like attaching a ball and chain, and I want to wait until I can be sure I have my model correct before I do it.

I need to make a decision, so here it is: I’m going to keep building my own types, and try to keep them simple. When it comes time to hook up a database, I’ll play around with the new POCO support in Entity Framework 4, and if that fails I’ll try another ORM tool. I may need to change my model a bit to suit the ORM, but I’m hoping that it won’t need to change much.
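One way to keep that option open is to hide storage behind a small interface, so that only one class changes when a database arrives. The sketch below is in Python rather than the project's own language. Invitation, InvitationStore, and InMemoryInvitationStore are hypothetical names, and putting an interface in front of storage is my assumption here, not something the plan above requires.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Invitation:
    """Hypothetical plain type with no knowledge of how it is stored."""
    uid: str
    summary: str


class InvitationStore(Protocol):
    """The only persistence surface the rest of the code would see."""

    def save(self, invitation: Invitation) -> None: ...

    def find(self, uid: str) -> Optional[Invitation]: ...


class InMemoryInvitationStore:
    """Stand-in until a database is attached; an ORM-backed class could replace it."""

    def __init__(self) -> None:
        self._items: Dict[str, Invitation] = {}

    def save(self, invitation: Invitation) -> None:
        self._items[invitation.uid] = invitation

    def find(self, uid: str) -> Optional[Invitation]:
        return self._items.get(uid)
```

If the ORM later demands changes to the model, code that only talks to the store should be affected far less than code that reads and writes the database directly.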

Invitations and the VCard Format

My next goal for the Themis project is to parse an invitation from an email. I am starting with invitations generated by MS Outlook because that’s my target audience, but a peek inside a Google Calendar invitation gives me hope that I’ll be able to support multiple calendars without much trouble.

Outlook invitations are sent in the VCalendar format, content type “text/calendar”. The standard was published as RFC 2445 in 1998. It describes a standard layout for calendar data in the VCard format, which is described in RFC 2425.

VCard is a simple text-based format with a nested structure that’s similar to XML. It uses colons and semicolons instead of angle brackets, supports attributes, and has standard representations for several common data types. Here is an example of the same data represented in both formats (content from RFC 2425):

An example of VCard data beside an XML representation of the same data.
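To make the layout concrete, here is a rough sketch of how a single line could be split into its name, parameters, and value. It is in Python for brevity, split_content_line is a hypothetical helper, and it deliberately ignores quoted parameter values that contain a colon or semicolon.

```python
from typing import Dict, Tuple


def split_content_line(line: str) -> Tuple[str, Dict[str, str], str]:
    """Split 'NAME;PARAM=val:value' into (name, parameters, raw value).

    Simplification: quoted parameter values that contain ':' or ';'
    are not handled here.
    """
    # Everything before the first colon is the name and its parameters.
    head, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing value delimiter ':' in line: " + line)
    name, *raw_params = head.split(";")
    params: Dict[str, str] = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        params[key.upper()] = val
    return name.upper(), params, value
```

For an illustrative line such as ATTENDEE;ROLE=CHAIR:mailto:someone@example.com, this returns the name ATTENDEE, the parameters {"ROLE": "CHAIR"}, and the value mailto:someone@example.com. Splitting at the first colon keeps the colon inside the value intact.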

I looked around for a library to do the job, but decided to build my own. Writing this kind of code isn’t a good way to deliver business value, but it sure is fun. A bit of time with a whiteboard and a couple of evenings coding, and I have a mostly functional VCard parser ready to go.

When I am setting out to build something like a parser, I like to consider the designs of other established libraries that perform a similar function. As I’ve already mentioned, VCard is a lot like XML in structure, so XML parsers are a great place to look for inspiration. I built a structure that loosely mimics the XmlDocument model, an object tree that holds the entire document in memory. It won’t perform as fast as using something like an XmlReader, but it makes it easier to handle variations in document format with polymorphism. Since the VCard documents I’m processing should never be too big, it won’t be much of a performance burden anyway.
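The in-memory tree idea can be sketched in a few lines. This is Python rather than the project's own language, Entry and Group are hypothetical stand-ins for the real classes, and the input is assumed to be already unfolded (no continuation lines) with groups delimited by BEGIN and END lines.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Entry:
    name: str
    raw_value: str


@dataclass
class Group:
    name: str
    children: List[Union["Group", Entry]] = field(default_factory=list)


def parse_document(text: str) -> Group:
    """Build an in-memory tree from BEGIN/END delimited content lines."""
    root = Group("ROOT")
    stack = [root]  # the innermost open group is always stack[-1]
    for line in text.splitlines():
        if not line.strip():
            continue
        head, sep, value = line.partition(":")
        if not sep:
            raise ValueError("line has no value delimiter: " + line)
        name = head.split(";")[0].upper()  # parameters are dropped in this sketch
        if name == "BEGIN":
            group = Group(value.strip().upper())
            stack[-1].children.append(group)
            stack.append(group)
        elif name == "END":
            if len(stack) == 1 or stack[-1].name != value.strip().upper():
                raise ValueError("unexpected END:" + value)
            stack.pop()
        else:
            stack[-1].children.append(Entry(name, value))
    if len(stack) > 1:
        raise ValueError("group without an ending: " + stack[-1].name)
    return root
```

Because the whole document sits in the tree, later code can walk it and pick a handler per node type, which is the flexibility traded for the speed of a streaming reader.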

The object structure that holds VCard data looks like this:

A UML diagram showing the class structure for holding VCard data.

At first I wanted to have a property like Value, but there is no reliable way to know the type of a VCard value without knowing the structure of the document, and at this level I don’t want to be coupled to the structure of the document. Instead, I will be adding a series of methods to get and set the EscapedValue as a specific type.

The VCardSimpleValue class was a late addition to the model. I needed a way to hold parameter values (the equivalent of attributes in XML), but since they can’t have parameters of their own I didn’t want to pick up the Parameters collection. I also considered making a type separate from VCardValue for the parameters, but both these classes will need the ability to read and write the escaped value, and I don’t want to write that code twice.
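A rough sketch of how the two classes could share the escaped-value code, in Python rather than the project's own language. It assumes VCardValue extends VCardSimpleValue, which may not match the diagram exactly. The accessor names (get_text, set_text, get_integer, set_integer) and the name attribute are hypothetical, and the escaping follows the backslash rules for text values in RFC 2445.

```python
from typing import Dict


class VCardSimpleValue:
    """Holds one escaped value and the typed accessors shared by both classes."""

    def __init__(self, escaped_value: str = "") -> None:
        self.escaped_value = escaped_value

    def get_text(self) -> str:
        # Undo backslash escapes: \n or \N is a newline, \x is x otherwise.
        out, chars = [], iter(self.escaped_value)
        for ch in chars:
            if ch == "\\":
                nxt = next(chars, "")
                out.append("\n" if nxt in ("n", "N") else nxt)
            else:
                out.append(ch)
        return "".join(out)

    def set_text(self, text: str) -> None:
        # Escape the backslash first so later escapes are not doubled.
        escaped = text.replace("\\", "\\\\").replace("\n", "\\n")
        self.escaped_value = escaped.replace(";", "\\;").replace(",", "\\,")

    def get_integer(self) -> int:
        return int(self.escaped_value)

    def set_integer(self, number: int) -> None:
        self.escaped_value = str(number)


class VCardValue(VCardSimpleValue):
    """A full value: the same accessors plus a Parameters collection."""

    def __init__(self, name: str, escaped_value: str = "") -> None:
        super().__init__(escaped_value)
        self.name = name
        self.parameters: Dict[str, VCardSimpleValue] = {}
```

The caller chooses the accessor based on what it knows about the document, so the value classes never have to guess a type. Parameter values are plain VCardSimpleValue instances, which means they get the escaping logic without a parameters collection of their own.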

I’ve added the first unit tests to the solution, checking the parser against a number of examples in the specifications and real snippets extracted from emails sent by Outlook. I’ve also added tests for a number of failure cases in the parser, such as groups without an ending or value lines without a value delimiter.

My next addition will be parsers for the value types and a stub in the test harness that replies to emails with some info about the original request.