One of the features provided with the Windows Azure cloud computing platform is the Windows Azure caching service. In this post we will cover the basics and show you how to set up a Windows Azure cache cluster.
1. Using memory caching to improve performance
By caching information in memory and retrieving it from memory on subsequent requests, we improve performance greatly, because we do not have to retrieve the information from disk every time, which is a lot slower than retrieving it from memory. Data that is requested often and does not change frequently is ideal to cache in memory. On the first request the information is retrieved from disk or database, after which we store it in the memory cache. On the next request, we check whether the data is available in the cache, and if it is, we serve it directly from memory, which is a lot faster.
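The pattern described above (often called cache-aside) can be sketched with a plain in-memory dictionary. This is a simplified illustration; the class, method and key names are made up:

```csharp
using System.Collections.Generic;

public class ProductRepository
{
    // A simple in-memory cache: key-value pairs kept in process memory.
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();

    public string GetDescription(string productId)
    {
        string description;
        if (cache.TryGetValue(productId, out description))
            return description; // cache hit: served directly from memory

        // Cache miss: take the slow path (disk or database), then cache the result.
        description = LoadFromDatabase(productId);
        cache[productId] = description;
        return description;
    }

    private string LoadFromDatabase(string productId)
    {
        // Placeholder for the expensive disk or database retrieval.
        return "description for " + productId;
    }
}
```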
You can get an idea of how big the difference between memory retrieval and disk retrieval can be in some cases.
The higher in the pyramid, the faster the response will be. As you can see, registers and the Level 1 and 2 caches are insanely fast. We are talking about nanoseconds here … 1 nanosecond is 1 billionth of a second, which is obviously lightning fast. The Level 1 cache (8 KB to 64 KB) is located on the processor itself, and the Level 2 cache (64 KB to 2 MB) sits on or very close to the processor as well. The Level 1 and 2 caches are used by the processor to be able to operate fast enough.
If you go down the pyramid, you will notice that the speed difference between main memory and disk is quite large. The difference between nanoseconds and milliseconds does not mean much to us, but if you know that 1 millisecond equals 1,000,000 nanoseconds … it's pretty obvious the difference can be large, even if a time span like that is meaningless to a human being.
Admittedly there is a difference between sequential access and random access, but the point stands: reading from main memory can be a lot faster than reading from disk, and that's what it's about.
If you are into the topics of memory management, parallelism and processor internals, and you actually want to know in depth why:
2. Why use the Windows Azure Caching Service
The following Windows Azure Caching Service information is quoted directly from MSDN:
Windows Azure Caching enables you to easily provision a cache in the cloud, to be used from any applications or services that could benefit from caching. This includes a very common scenario of session state and output caching in ASP.NET. Caching increases performance by temporarily storing information from other backend sources. High performance is achieved by maintaining this cache in-memory across multiple cache servers. For a Windows Azure solution, caching can reduce the costs associated with database transactions in the cloud.
Windows Azure Caching is designed to be used with Windows Azure applications hosted in the cloud. This achieves the best throughput at the lowest latency. It is possible to test on-premises code that accesses a Windows Azure cache, but this design is not supported for production or valid for stress testing. On-premises applications can instead rely on an on-premises cache cluster that uses Windows Server AppFabric.
It's quite important to notice that Windows Azure Caching is designed to be used with Windows Azure applications hosted in the cloud. Using the Windows Azure caching service as a caching mechanism for on-premises solutions is pointless. If you are looking for a caching service for your on-premises solutions, look at the Windows Server AppFabric Caching Service. When building Windows Azure cloud solutions, however, we also need the ability to cache information to improve the performance of our cloud solutions, and that is what the Windows Azure Caching Service is for. The Windows Azure caching service is based on Windows Server AppFabric Caching. There are some differences between the two though, and you can find them here: http://msdn.microsoft.com/en-us/library/windowsazure/gg185678.aspx
Some of the features of the Windows Azure caching service:
- Usable from any .NET application hosted in Windows Azure.
- Provides automatic management of the caching infrastructure in the cloud.
- Provides ASP.NET session state and output caching providers.
- Configurable through application configuration files or programmatically.
- Provides an API that uses the familiar caching concept of key-value pairs.
- Usable with optimistic or pessimistic concurrency.
- Supports the local cache feature for additional performance and scalability benefits.
- Supports compression of cached objects.
Since Windows Azure provides a scalable and elastic platform for our services, the Windows Azure caching service can come in handy, for example for storing ASP.NET session state. Your service instances reside behind the non-sticky Windows Azure load balancer:
That means that if you have more than one instance running for your hosted service (which you should, for fail-over), session state cannot be saved on the machine itself, because the next request might end up at another instance. The Windows Azure load balancer is stateless and routes the requests round-robin to the available instances of your hosted service. So you need to store the session state in a location that is accessible to all your instances. The Windows Azure caching service can be a solution for storing your session state, just as a SQL Azure database would be.
3. Getting started with the Windows Azure Caching Service
To be able to work with the Windows Azure caching service, you will need to create a namespace with caching enabled in the Windows Azure portal. Go to the Service Bus, Access Control & Caching menu:
Select the Caching service under the available services:
Hit the New Namespace button to create a new namespace which we will be using only for caching purposes:
The default cache size is 128 MB. It's important not to pick a larger cache size than what you need, because cache size quotas are pretty expensive. You can find more information about the cache pricing here: https://www.windowsazure.com/en-us/pricing/calculator/advanced/
At the time of this writing, the prices are as follows:
- 128 MB cache: $45 / mo
- 256 MB cache: $55 / mo
- 512 MB cache: $75 / mo
- 1024 MB cache: $110 / mo
- 2048 MB cache: $180 / mo
- 4096 MB cache: $325 / mo
As you can see, just as with the other Windows Azure services, the smallest provision is the most expensive one per MB; the more you use, the cheaper it gets. We will take the 128 MB cache. After the namespace is created, you'll be able to select it, and the cache operations for the caching service will become available:
If you click the Change Cache Size in the top menu for the namespace caching service, you can change the cache size:
If you click the View Client Configuration you will get a screen that provides you with the configuration that you will need at your application configuration that uses the Windows Azure caching service:
If you click on the namespace, at the right side, you will be able to see the namespace properties for the caching service:
4. Integrating the Windows Azure caching service in our application
As explained in chapter 2, the Windows Azure Caching Service is designed to be used with hosted services in the cloud. So we will create an MVC web application that we will host in Windows Azure.
- Create a new project and select a Windows Azure Project under the Cloud project templates
- Add an ASP.NET MVC 4 Web Role to the Windows Azure solution
- Select Internet Application as the project template for our ASP.NET MVC 4 web application
Instead of copying the default configuration into our web application so it can use the Windows Azure caching service, we can also add it by using the NuGet package manager:
Install the Windows Azure Caching package:
After you have installed the package, the web.config of our web application will be updated:
It also adds a piece of configuration to save the output cache to the distributed caching service:
It also adds a piece of configuration to save the session state to the distributed caching service:
If you do not want to store the output cache or the session state in the Windows Azure caching service, simply comment out these two configuration sections. We will come back to these in chapter 8.
Let's update the DataCacheClient section in our configuration so that it contains the endpoint and the token:
You can find the caching service endpoint in the properties of the namespace for the caching service:
The ACS encrypted token can be found in the properties as well:
The ACS encrypted token is needed to be able to access the Windows Azure caching service. If you do not provide this token, access will be denied, which obviously prevents unauthorized users from connecting to your cache cluster. You can change this authentication token through the Access Control Service for the caching service, but that is out of the scope of this post.
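For reference, a DataCacheClient section with the endpoint and token filled in typically looks like the snippet below. The namespace name and token are placeholders; copy the real values from the namespace properties in the portal:

```xml
<configSections>
  <section name="dataCacheClients"
           type="Microsoft.ApplicationServer.Caching.DataCacheClientsSection, Microsoft.ApplicationServer.Caching.Core"
           allowLocation="true" allowDefinition="Everywhere" />
</configSections>
<dataCacheClients>
  <dataCacheClient name="default">
    <!-- The service endpoint of your caching namespace -->
    <hosts>
      <host name="yournamespace.cache.windows.net" cachePort="22233" />
    </hosts>
    <securityProperties mode="Message">
      <!-- The ACS encrypted token from the namespace properties -->
      <messageSecurity authorizationInfo="[ACS encrypted token]" />
    </securityProperties>
  </dataCacheClient>
</dataCacheClients>
```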
One thing you might need to pay attention to is that you have these two assemblies referenced in your web application:
When I first deployed my Azure solution and tried to use the Windows Azure caching service, I kept getting errors that the type initializer failed. I solved the issue by adding a reference to both these assemblies and setting Copy Local to true. These are the required assemblies:
If you are using caching in a web application and you need to store session state and output cache to the Windows Azure cache cluster:
5. Using the CacheDataFactory and reading & writing from and to the cache
I added 2 custom classes in my MVC web application:
The Caching class looks like this:
You will need to import the Microsoft.ApplicationServer.Caching namespace. We have a static singleton field holding the DataCacheFactory. The DataCacheFactory is expensive to create, so it's best to only create a single instance of it. We have one public operation that returns the DataCache, which you get by using the DataCacheFactory.GetDefaultCache method. Windows Server AppFabric caching allows you to have multiple named caches; the Windows Azure caching service only allows a single cache, which is the one returned by GetDefaultCache.
To create the DataCacheFactory I pass along a DataCacheFactoryConfiguration with the name of the configuration I want to load. You can also use the parameterless constructor of the DataCacheFactory, in which case it attempts to load the DataCacheClient configuration with the name "default". If you want to load a DataCacheClient configuration with another name, you will need to use the DataCacheFactoryConfiguration and pass the name of the configuration you want to load. A few screenshots back you can see our DataCacheClient configurations and their names in our configuration file.
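To make the description above concrete, a minimal version of such a Caching class could look like this (a sketch; the class name and the configuration name "default" follow the examples in this post):

```csharp
using Microsoft.ApplicationServer.Caching;

public static class Caching
{
    // The DataCacheFactory is expensive to create, so we keep
    // a single static instance for the lifetime of the application.
    private static readonly DataCacheFactory factory =
        new DataCacheFactory(new DataCacheFactoryConfiguration("default"));

    public static DataCache GetCache()
    {
        // The Windows Azure caching service exposes a single cache,
        // which is the one returned by GetDefaultCache.
        return factory.GetDefaultCache();
    }
}
```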
Our Storage class looks like this:
It has a static field which holds the CloudStorageAccount defined with my credentials, and one public method to retrieve a CloudBlob with a certain name from a certain container. Pretty straightforward code. In my Cloud Storage Explorer I uploaded a single image called azure.jpg, which is 3 MB in size:
In my HomeController I added some basic logic to retrieve the image from blob storage and to retrieve it from the distributed cache:
We have a LoadImage method which takes a boolean. If the boolean is false, we load the image as a byte array from blob storage; if it is true, we load the image from the distributed cache. If you work with a caching system, you will almost always find the same code pattern:
- We try to get an item with a specified key from the cache
- We check if the item retrieved from the cache exists.
- If the data does not exist, we get it from its original location, like disk or database. After we have retrieved it, we store the data in the cache and return it.
- If the data does exist, we simply return the result directly.
Basically that’s what you see in the GetCachedImage method. The Logic.Caching.GetCache operation returns the DataCache, on which we can execute certain operations:
- Get: Allows you to get an item from the cache with a specified key
- Add: Allows you to add an item to the cache if it does not exist yet. If an item with the same key already exists, an error will be thrown
- Put: Allows you to add an item to the cache. If the item already exists, it gets overwritten with the current data you are writing to the cache
- Remove: Allows you to remove an item with a specified key from the cache
- GetCacheItem: Allows you to retrieve an item from the cache together with its metadata and version. The item is returned as a DataCacheItem
There are a bunch of asynchronous and overloaded variants of these operations.
Notice that you Put an item in the cache by providing a key for the item and the data you want to store. One of the overloads also lets you specify a TimeSpan for how long the item may stay in the cache. In my case I specified 30 seconds, which means the item will expire 30 seconds after I added it to the cache. On the next retrieval there will be an extra cost again, since the item will have to be retrieved from blob storage and put into the cache again. If you do not specify a TimeSpan for expiration, the default value is 48 hours. Be consistent and specify an expiration timespan when adding data to your Windows Azure cache cluster.
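Putting the pattern and the expiration together, the cache-aside logic for the image could be sketched like this (the Caching helper and the LoadFromBlobStorage method are assumptions based on the classes described above):

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

public byte[] GetCachedImage(string key)
{
    DataCache cache = Caching.GetCache();

    // Try to get the item from the cache first.
    byte[] image = (byte[])cache.Get(key);
    if (image == null)
    {
        // Cache miss: retrieve the image from blob storage ...
        image = LoadFromBlobStorage(key);

        // ... and Put it in the cache with a 30 second expiration.
        cache.Put(key, image, TimeSpan.FromSeconds(30));
    }
    return image;
}
```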
Then we have some simple code in our Index controller to get the image both from blob storage and from the caching service, and to time how long each retrieval takes. The retrieval time and the byte array size are returned to the view, which displays the values for us:
As a side note, I store a byte array in the cache cluster, but you can store any kind of data in the cache, as long as it is serializable.
We deploy our cloud project to staging environment:
And when we visit the public endpoint of our website, we can test how long it takes to retrieve the same image from blob storage and from the caching service. One randomly picked result:
However there might be quite some variation in the time to retrieve the information:
Surprisingly, the difference might not be that big sometimes, although the cached data is retrieved faster than from blob storage. Every time we retrieve an item from the cache at the moment, we retrieve it from the central distributed cache cluster, which is managed by the Windows Azure platform. This means that every request to the distributed cache cluster is an out-of-machine request, which adds latency. Even though the item is served from memory by the distributed cache, the difference with blob storage is not that big, because reading data from blob storage is apparently also quite fast.
However, there is also something called the local cache available on the Windows Azure platform. You can enable the local cache for your DataCacheClient like this:
By default, the local cache is not enabled; you need to set isEnabled to true to be able to use it. Some of the other properties you can set:
- objectCount: Allows you to specify the maximum number of locally cached items. The default value is 10,000
- ttlValue: Allows you to specify how many seconds an item can remain in the local cache. The default value is 300
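The localCache element with these properties could look like this (the attribute values shown are the defaults mentioned above; the rest of the dataCacheClient section is omitted):

```xml
<dataCacheClient name="default">
  <!-- Timeout-based invalidation: items live at most ttlValue seconds locally -->
  <localCache isEnabled="true" sync="TimeoutBased" objectCount="10000" ttlValue="300" />
  <!-- hosts and securityProperties as before -->
</dataCacheClient>
```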
However, there is a reason the local cache is disabled by default. When a client enables the local cache, an in-memory cache is created in the same memory space the client is running in. In other words, the local cache will be allocated in the memory of your Windows Azure instance. This is a great thing, since it improves performance dramatically: we do not have to request the cached data from the central caching cluster every time, so we avoid the latency penalty paid for connecting to the out-of-machine cache cluster. The local cache lives in the same memory as our hosted service, so this is a huge performance gain. Let's see some numbers:
And another result:
Now I guess it's pretty obvious how much of a difference enabling the local cache can make. However, we do not know how many milliseconds loading the item from the local cache takes, since it is simply written out as 0. Let's get real numbers by using a more precise measurement: we will use the Ticks on the Stopwatch instead of the Milliseconds, which are no longer precise enough to measure this.
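Measuring with ticks instead of milliseconds could be done like this (a sketch; the cache key is the one used in the earlier examples):

```csharp
using System.Diagnostics;

Stopwatch watch = Stopwatch.StartNew();
byte[] data = (byte[])Logic.Caching.GetCache().Get("azure.jpg");
watch.Stop();

// Stopwatch.Frequency is the number of ticks per second, so we can
// convert the elapsed ticks into fractions of a millisecond.
double milliseconds = watch.ElapsedTicks * 1000.0 / Stopwatch.Frequency;
```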
One of the average results:
If you calculate this out, loading the item from the local cache is about 3020 times faster than loading it from blob storage.
However, there are also drawbacks to using the local cache which you should be aware of. Assume the following scenario, where we have 2 Windows Azure instances running which have the local cache enabled:
- Client 1 stores an object with a specified key in the cache. The object is stored in the local cache of client 1 as well as in the distributed cache.
- Client 2 retrieves the cached object with the specified key from the Windows Azure cache. Since the local cache of client 2 does not hold the object, it is retrieved from the distributed cache. After retrieval, the item is also stored in the local cache of client 2
- Client 1 retrieves the cached object with the specified key. Because it is in the local cache of client 1, it is returned from there
And now things are starting to get messy when the following happens:
- Client 1 updates the retrieved object with the specified key and stores it in the cache again. The item is updated in the local cache of client 1 and in the distributed cache cluster. However, client 2 still has the object with the old values in its local cache. When client 2 retrieves the object with the specified key, it will find the object in its local cache and return it from there. Client 2 gets an object returned from its local cache that is no longer up to date with the distributed cache
- Client 1 removes the retrieved object with the specified key from the cache. It is removed from the local cache of client 1 and from the distributed cache cluster. However, client 2 still has the object with the old values in its local cache. When client 2 retrieves the object with the specified key, it will find the object in its local cache and return it from there. That basically means client 2 retrieved an object from its local cache that has actually been deleted from the distributed cache! If client 2 then decides to update the local object and store it in the cache again, it will be stored in the distributed cache cluster again, even though client 1 just removed it.
The local cache is amazing for improving the performance of your application, but you need to be aware of the inconsistencies that can exist, and you need to be cautious with the ttlValue you set for your local cache. The Windows Azure local cache only supports a timeout-based invalidation policy. Windows Server AppFabric caching also supports notification-based synchronization, which means notifications can be used to invalidate local cache items. I do not know whether they will add support for this to the Windows Azure caching service in the future.
One thing you might want to keep in mind is that an item you store in the cache can be at most 8 MB in size. The cache is not the ideal place to store large files anyway, because you would run out of memory quite easily, and memory is expensive.
6. Using optimistic concurrency when saving data to the Azure caching service
As with Entity Framework, Windows Azure storage and so forth, you can use optimistic concurrency as a mechanism against concurrency conflicts. Optimistic concurrency avoids the following issue:
- You load an item from the cache
- You change some values on the item
- Another user on another location updates the item with some new values
- You save the changed item
When you save your updated item, the changes that were made by the other user are overwritten with your local values and are lost. To prevent this from happening, you can use optimistic concurrency checks through the DataCacheItemVersion:
There isn't much difference compared to working with the DataCache without concurrency checking. The only difference is that, to use optimistic concurrency, you need to use the DataCacheItemVersion. Instead of retrieving the data directly from the DataCache, you retrieve a DataCacheItem through the GetCacheItem(key) operation. The DataCacheItem is basically a wrapper around the data you store, and you are able to retrieve the data inside the wrapper this way: to retrieve the actual value you use DataCacheItem.Value, and to retrieve the current DataCacheItemVersion you use DataCacheItem.Version. The DataCacheItemVersion is the reason we get our data as DataCacheItems, because when we want to update an item, we need to pass along the DataCacheItemVersion.
Suppose we run the same scenario as before:
- You load an item from the cache
- You change some values on the item
- Another user on another location updates the item with some new values
- You save the changed item
Which will now result in:
- You load an item from the cache as a DataCacheItem and get the version and value of the DataCacheItem (the version is 4)
- You change some value on the item
- Another user on another location updates the item with some new values (The version gets updated to 5 in the cache)
- You save the changed item and pass along the DataCacheItemVersion, which is still 4.
When you try to save the item in the cache, the cache will notice that the version of the item in the cache is no longer the same as the version you provided, which means someone else already updated the item since you retrieved it. An exception will be thrown which you will need to handle. To handle the exceptions thrown on concurrency conflicts, you will need to catch the DataCacheException, which has a SubStatus of type DataCacheErrorSubStatus, an enumeration of the possible errors accompanying the DataCacheException.
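A sketch of the optimistic concurrency flow, including handling the version conflict, could look like this (the key and stored data are made up; checking the error code against DataCacheErrorCode.CacheItemVersionMismatch is my assumption of the relevant conflict code):

```csharp
using Microsoft.ApplicationServer.Caching;

DataCache cache = Caching.GetCache();

// Retrieve the item as a DataCacheItem so we get its version as well.
DataCacheItem item = cache.GetCacheItem("mykey");
string data = (string)item.Value;

data = "updated value";
try
{
    // Pass the version we read; the Put fails if someone
    // updated the item in the meantime.
    cache.Put("mykey", data, item.Version);
}
catch (DataCacheException ex)
{
    if (ex.ErrorCode == DataCacheErrorCode.CacheItemVersionMismatch)
    {
        // Someone else updated the item since we retrieved it:
        // reload the item and retry, or surface the conflict to the user.
    }
}
```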
Trying to update an item that has already been updated since you retrieved it will generate an error like this:
One important note: you need to mark your classes as serializable to be able to serialize them and put them in your cache cluster; if they are not marked, they will not be serialized. You can use the Serializable attribute or work with the DataContract and DataMember attributes. For the latter you will need a reference to System.Runtime.Serialization:
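Both flavors of marking a class for serialization could look like this (the class names are made up):

```csharp
using System;
using System.Runtime.Serialization;

// Option 1: the classic Serializable attribute.
[Serializable]
public class Person
{
    public string Name { get; set; }
}

// Option 2: DataContract/DataMember attributes
// (requires a reference to System.Runtime.Serialization).
[DataContract]
public class Product
{
    [DataMember]
    public string Title { get; set; }
}
```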
7. Configuring connection pooling for the DataCacheFactory
One of the options you can set on the DataCacheClient is maxConnectionsToServer, which allows you to define how many connections a DataCacheFactory instance can open to the cache cluster. Configuring the connection pooling through the configuration file:
By default, maxConnectionsToServer is set to 1, which means that a DataCacheFactory object uses one connection to work with the DataCache. If you set the connection count to 2, each DataCacheFactory object will be able to use 2 connections. That means that if you have created 2 DataCacheFactory instances, a total of 4 connections will be used. That's why it's advised to use the DataCacheFactory as a singleton, so only one DataCacheFactory instance can be created.
However, when you have a hosted service running over 4 instances in Windows Azure, that would mean you have 4 DataCacheFactory instances up, each of which will be using 2 connections, resulting in a total of 8 connections being used.
You can disable connection pooling, if you wish to do so, through the useConnectionPool setting:
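Both settings are attributes on the dataCacheClient element; a configuration with two connections and pooling disabled could look like this (the rest of the section is omitted):

```xml
<dataCacheClient name="default" maxConnectionsToServer="2" useConnectionPool="false">
  <!-- hosts and securityProperties as before -->
</dataCacheClient>
```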
Why is this important? Because the maximum number of connections you can open to your Windows Azure caching cluster is limited, as shown in the following table:
If you use your Windows Azure caching cluster in multiple applications, you'll have to watch out not to reach this number of concurrent connections. If you use the same caching service namespace for 4 hosted web applications, each of which has 2 running instances, all using the default maxConnectionsToServer of 1 … you already end up at 8 connections. And that assumes you made sure your DataCacheFactory is a singleton, so that only one can be created per application. It's common to have multiple applications working on the same distributed cache, since they share the same information and can use the same cache for performance. With the Windows Azure caching service, however, you need to keep an eye on the number of concurrent connections so you do not surpass the limit.
8. Windows Azure session state and output cache provider for ASP.NET
In chapter 2, I talked about having multiple Windows Azure instances for a hosted service, sitting behind the non-sticky load balancer which routes the traffic round-robin. This means you will have an issue with session state: your session state will be created at instance X, while the subsequent request of the user might be routed to instance Y, where the state is not present … and your user might end up having an odd experience, or your service might end up crashing.
The great thing is that the session state provider for Windows Azure caching is added by default when you add the Windows Azure Caching package through NuGet:
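The session state provider configuration that the package adds typically looks like this (the cacheName and dataCacheClientName values must match your DataCacheClient configuration):

```xml
<sessionState mode="Custom" customProvider="AppFabricCacheSessionStoreProvider">
  <providers>
    <add name="AppFabricCacheSessionStoreProvider"
         type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache"
         cacheName="default"
         useBlobMode="true"
         dataCacheClientName="default" />
  </providers>
</sessionState>
```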
You can get this configuration as well from the Windows Azure portal by going to the namespace for the Windows Azure Caching Service and use the View Client Configuration.
The Windows Azure session state provider has the following improvements over past ASP.NET session state providers:
- It internally uses the NetDataContractSerializer class for session state serialization.
- It can share session state among different ASP.NET applications.
- It supports concurrent access to the same set of session state for multiple readers and a single writer.
- It can use compression.
There is also a Windows Azure output cache provider for when you want to use output caching. The default configuration is also added by the NuGet package, or you can copy it from the Windows Azure portal through View Client Configuration:
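The output cache provider configuration typically looks like this (again, cacheName and dataCacheClientName must match your DataCacheClient configuration):

```xml
<caching>
  <outputCache defaultProvider="DistributedCache">
    <providers>
      <add name="DistributedCache"
           type="Microsoft.Web.DistributedCache.DistributedCacheOutputCacheProvider, Microsoft.Web.DistributedCache"
           cacheName="default"
           dataCacheClientName="default" />
    </providers>
  </outputCache>
</caching>
```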
The Windows Azure output cache provider has the following benefits from storing output cache out-of-process:
- Developers can store larger amounts of output cache data because the only effective upper boundary on the quantity of data that can be cached is the cumulative amount of memory available to the Windows Azure cache cluster.
- Output cache data is not lost when a web application is recycled. Since the output cache data is stored externally outside of the IIS worker process, output cache data survives ASP.NET application restarts.
- It can use compression.
Other than that there is not much to discuss. If you have a web application and you are using session state or output cache, you use these preconfigured bits that come with the configuration; if you do not need them, just comment them out.
9. Handling large cache items and request timeouts
One common thing you might run into is operation timeouts, especially if you start working with larger data cache items, for example:
The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown. —> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:00:40.
If the request operations cannot complete within the default timeout, you'll start receiving these errors. This can easily be solved like this:
Some properties which you can configure on the DataCacheFactoryConfiguration:
- ChannelOpenTimeout: Gets or sets the length of time that the cache client waits to establish a network connection with the server.
- MaxConnectionsToServer: Specifies the maximum number of channels to open to the cache cluster.
- RequestTimeout: Gets or sets the length of time that the cache client waits for a response from the server for each request.
- TransportProperties: Allows you to configure transport-level settings, like the maximum buffer size, the connection buffer size and the ReceiveTimeout
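Configuring these programmatically on the DataCacheFactoryConfiguration could look like this (the timeout and buffer values are examples, not recommendations):

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

var configuration = new DataCacheFactoryConfiguration("default");

// Give slow requests more room than the default timeout.
configuration.RequestTimeout = TimeSpan.FromSeconds(120);
configuration.ChannelOpenTimeout = TimeSpan.FromSeconds(120);

// Transport-level settings, such as the maximum buffer size
// for large cache items.
configuration.TransportProperties = new DataCacheTransportProperties
{
    MaxBufferSize = 10 * 1024 * 1024,
    ReceiveTimeout = TimeSpan.FromSeconds(120)
};

var factory = new DataCacheFactory(configuration);
```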
The Windows Azure caching service is pretty straightforward to use. It contains preconfigured providers for ASP.NET applications and provides you with the default configuration you need to implement it in your cloud service.
There's one thing you might want to be aware of, and that is the Quota Hours. The Windows Azure caching service uses resources like memory, processor and network bandwidth, and the use of these resources is metered as Quota Hours:
Some final remarks:
- Objects are serialized before they are put into the cache. That's why you need the serialization attributes.
- Serialized objects are always larger than the actual raw data, because a wrapper and some metadata are added to the raw data, like the version, size, key and timeout, all of which add to the size of the object put in the cache
- The maximum size of a serialized object, including wrapper and metadata, is 8 MB.
- When you reach the maximum size of the cache, some cached items will be evicted without any exception being thrown.
Some of the features I missed:
- There is no operation to clear the entire Windows Azure cache at once
- Notification-based synchronization for the local cache is a big missing feature for avoiding inconsistencies in the local cache. You can use the local cache for read-only data, but with data that can change you need to be careful!
Any suggestions, remarks or improvements are always welcome.
If you found this information useful, make sure to support me by leaving a comment.
Cheers and have fun,