Channel: Mart's Sitecore Art

Sitecore SQL Session State Provider: What you need to know


Background 

While working with Sitecore Support to troubleshoot a SQL session issue on a high-traffic, scaled Sitecore environment running Sitecore 8.1 Update 2, we discovered that the root cause was a connection-leaking bug in the SessionStateStoreProvider that put unnecessary load on the SQL server, making it unresponsive.

The purpose of this post is to arm you with the information that you need to implement a stable SQL Session State in your Sitecore deployment.

Symptoms 

When the issue / outage occurred, the exceptions in the Sitecore logs were the following:

 Exception: System.Data.SqlClient.SqlException  
Message: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - No such host is known.)
Source: .Net SqlClient Data Provider
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.Open()
at System.Data.SqlClient.SqlConnection.Open()
at Sitecore.SessionProvider.Sql.SqlSessionStateStore.UpdateItemExpiration(Guid application, String id)
at Sitecore.SessionProvider.Sql.SqlSessionStateProvider.ResetItemTimeout(HttpContext context, String id)
at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Nested Exception
Exception: System.ComponentModel.Win32Exception
Message: No such host is known
Our New Relic application monitoring reported that the GetExpiredItemExclusive SQL stored procedure running against the session database was the most time-consuming query and was responsible for the highest throughput between our Sitecore and SQL servers.

We discovered that once the stored procedure executions climbed above roughly 10,000 calls per minute, the application would start having trouble and would eventually stop responding.

Root Cause

It was determined that .NET was creating a large number of session state provider objects, and during high-traffic periods the count grew so large that it overloaded the SQL server and eventually caused the application to stop responding.

Stabilizing Session State

The Patch 

Sitecore Support issued us a patch and noted that the issue was fixed in Sitecore 8.2 Update 2 and later.

From a high level, the change involved Sitecore using their own factory and creating session state objects manually. 

The issue was registered as bug #98800. It's important to note that all prior versions will require a ticket to request the patch. 

We implemented the patch by following these steps:
1) Placed the 'Sitecore.Support.98800.dll' assembly into the /bin folder of the website.
2) Changed the session state provider type from
type="Sitecore.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.SessionProvider.Sql"
to
type="Sitecore.Support.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.Support.98800"

The change was made in both Web.Config and Sitecore.Analytics.Tracking.config.

Not Stable Yet

About 3 days later, our site was brought to its knees again. New Relic showed that the GetExpiredItemExclusive SQL stored procedure calls were well above the 10,000 calls per minute range.

Configuration Update

Working with Sitecore Support again, we increased the Session Provider polling interval from the default 2 seconds to 60 seconds and also increased the SQL connection timeout to 300 seconds.

The polling interval is the number of seconds between checks of the session database for expired sessions. Under the covers, each check executes the GetExpiredItemExclusive SQL stored procedure.

The final configurations looked like these:

Sitecore.Analytics.Tracking.config

<add  
name="mssql"
type="Sitecore.Support.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.Support.98800"
connectionStringName="session"
pollingInterval="60"
compression="true"
sessionType="shared"/>
Web.Config

<add   
name="mssql"
type="Sitecore.Support.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.Support.98800"
connectionStringName="session"
pollingInterval="60"
compression="true"
sessionType="private"/>
ConnectionStrings.config

<add name="session"
connectionString="user id=xxx;password=xxx;Data Source=xxx,1433;Database=Sessions;MultiSubnetFailover=TRUE; Connection Timeout=300" />

Status

With the patch in place, and the final configuration updates, the application has been stable and has survived extremely high traffic days. 

An example of one of these days: 40,000 requests per minute, 7500 simultaneous users and 142,000 page views per hour.

Takeaways

If you intend to use SQL Session State for your Sitecore implementation, and are running a version of Sitecore prior to 8.2 Update 2, you need to create a ticket with Sitecore support to request the patch.

After this, it's critical that you increase your polling interval configuration from the default 2 seconds to something higher, like we did. 60 seconds worked well for us.

If you have any questions, feel free to submit a comment and I will help you out to the best of my knowledge about this issue.

Sitecore xDB - Adding Custom Data to Outcomes and Using it for Personalization

In a recent Digital Strategist MVP Webinar hosted by Chris Nash, Chris made a quick mention of a custom values property on outcomes that was available to developers to store custom data associated with the outcome.

Having not known about this Easter egg prior, I looked into it straight away.

My research revealed that there wasn't any documentation or an example on the web, so I thought that I would take the opportunity to demonstrate the usage in a real-world implementation.

Use Case

One of the objectives in our Xccelerate roadmap was the ability to personalize based on a contact's previous purchase. For example: "If a visitor has purchased a product in the last 10 days, let's show them a CTA for a related product." The obvious objective was to drive the contact directly into the purchase funnel, increasing the possibility of another conversion.

In our configuration, we had already created a Purchase outcome item and were capturing monetary value, so it was just a matter of attaching the purchased line items to the outcome and building a new condition.

It is important to note that the "Monetary Value Applicable" checkbox must be checked on the Outcome item in Sitecore so that the values will show up in the various reporting dashboards.



Adding Custom Data to the Outcome

The code to achieve this is pretty straightforward. In our case, I added the logic to a location where I had a list of line items available that a visitor had just purchased.

The method accepts a monetary value and a list of key value pairs containing data that I would attach to my registered outcome. In my usage, I stored all the product SKUs, along with the quantity purchased for each line.

1: public void RecordPurchaseOutcome(decimal monetaryValue, List<KeyValuePair<string, string>> customValues = null)
2: {
3:     var id = ID.NewID;
4:     var interactionId = ID.Parse(Tracker.Current.Interaction.InteractionId);
5:     var contactId = ID.Parse(Tracker.Current.Contact.ContactId);
6:     var definitionId = new ID(ItemConstants.ProductPurchaseOutcomeItem);
7:
8:     var outcome = new ContactOutcome(id, definitionId, contactId)
9:     {
10:         DateTime = DateTime.UtcNow.Date,
11:         MonetaryValue = monetaryValue,
12:         InteractionId = interactionId,
13:     };
14:
15:     if (customValues != null && customValues.Any())
16:     {
17:         foreach (var customValue in customValues)
18:         {
19:             outcome.CustomValues[customValue.Key.ToUpperInvariant()] = customValue.Value;
20:         }
21:     }
22:
23:     Tracker.Current.RegisterContactOutcome(outcome);
24: }

The ID of the outcome item on line 6, ItemConstants.ProductPurchaseOutcomeItem, is unique to my implementation.

After a contact's session ended, the custom data was stored in the MongoDB Outcome collection as follows:
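Roughly, the stored document looked like the sketch below. This is a hand-written illustration, not an exact dump: the IDs are placeholders and the field names simply mirror the outcome model used in the code above, with the custom value keys uppercased as in the RecordPurchaseOutcome method.

```json
{
  "_id": "a0b1c2d3-0000-0000-0000-000000000001",
  "ContactId": "a0b1c2d3-0000-0000-0000-000000000002",
  "InteractionId": "a0b1c2d3-0000-0000-0000-000000000003",
  "DefinitionId": "a0b1c2d3-0000-0000-0000-000000000004",
  "DateTime": "2017-01-01T00:00:00Z",
  "MonetaryValue": "129.99",
  "CustomValues": {
    "SKU-12345": "2",
    "SKU-67890": "1"
  }
}
```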


Personalization using the Outcome's Custom Data

The final piece of the puzzle was to build the personalization condition that could pull out the custom data from the contact's recorded outcome, based on a time frame.

I created the condition item in the Outcomes element folder at this location: /sitecore/system/Settings/Rules/Definitions/Elements/Outcomes.

The rule text was set to the following:

 where the current contact has registered the [OutcomeId,Tree,root=/sitecore/system/Marketing Control Panel/Outcomes,specific] outcome with a custom data key that is case-insensitively equal to [CustomData,,,value] within the last [days,Integer,,number] day(s)  



This is the code that powered the condition, accessing the recorded custom data in the contact's outcome, and determined if it fell within a day range:

 public class CustomDataOutcomeRegisteredWithinLastDaysCondition<T> : WhenCondition<T> where T : RuleContext
 {
     private OutcomeManager _outcomeManager;

     public string OutcomeId { get; set; }
     public string Days { get; set; }
     public string CustomData { get; set; }

     protected override bool Execute(T ruleContext)
     {
         Assert.ArgumentNotNull(ruleContext, "ruleContext");
         Assert.IsNotNull(Tracker.Current, "Tracker.Current is not initialized");
         Assert.IsNotNull(Tracker.Current.Session, "Tracker.Current.Session is not initialized");

         Guid result;
         if (!Guid.TryParse(OutcomeId, out result))
         {
             Log.Debug(string.Format("Specified outcome [{0}] was not a valid Guid", OutcomeId));
             return false;
         }

         _outcomeManager = Factory.CreateObject("outcome/outcomeManager", true) as OutcomeManager;

         if (_outcomeManager == null)
         {
             return false;
         }

         int pastDays;
         if (!int.TryParse(Days, out pastDays))
         {
             return false;
         }

         var targetDate = DateTime.Today.AddDays(-pastDays);
         var pastOutcomes = _outcomeManager.GetForEntity<IOutcome>(Tracker.Current.Contact.ContactId.ToID(), result.ToID());

         // True if any outcome within the time frame carries the custom data key
         foreach (var outcome in pastOutcomes)
         {
             if (outcome.DateTime >= targetDate && outcome.CustomValues.ContainsKey(CustomData.ToUpperInvariant()))
             {
                 return true;
             }
         }

         return false;
     }
 }

After my rule and code were added to Sitecore, I was able to apply the freshly minted condition to my component:



The end result was the ability to personalize based on a previous purchase that a contact had made within the last x number of days.

In my example, I personalized the content of my component if the contact had purchased a Large, Hot Penne Pasta and Meatballs Tray within the last 10 days.

Sitecore xDB: Goal Conversion Sweet Spot Demo


Background

This question seems to keep coming up during client engagements:

Where and how can we get the goals that our visitors have converted during their interaction with our website, so that we can send this information to our external (CRM) system?

I decided to put together a small, working example to help answer this question.

Purpose

The purpose of this code is to demonstrate how to obtain goals that were triggered during a visitor's interaction, after their session has ended.

So, why is this useful?

This is a useful sweet spot, as this data can be sent to an external system where it can help marketers by informing them of what customers and leads are doing on their website.

Full source code is available from my GitHub repository: https://github.com/martinrayenglish/GoalConversions.Demo





Sitecore xDB: A Mechanic's Guide to Personalization Testing Troubleshooting


Background

On my last few projects, I have experienced first-hand how marketers have leveraged the power of Content Testing in the Experience Platform to gain some fantastic insights, allowing them to optimize content and deliver an improved contextual customer experience.

Most of the tests I have experienced have been personalization-based, and I have helped various teams troubleshoot some glitches that have popped up along the way. I guess you can call me the "content testing mechanic".




In this post, I will share some insight from my experiences to help other developers facing similar issues get them resolved quickly.

Nuts and Bolts

The main entry point into content testing for users is workflow, and so the assumption is that you have some type of workflow in place to successfully launch tests.

For more information on this, please review Sitecore's documentation on Adding content testing to a workflow  as well as Jonathan Robbins' Sitecore 8 Content Testing post.

With a workflow in place, the following things happen under the covers when you launch a new test:

  • A new test item is created that contains all the information about the test. This can be found at this location: /sitecore/system/Marketing Control Panel/Test Lab.
  • The Final Renderings XML field will be updated with specific testing attributes that contain values with test reference information*.
  • The item that you are testing and the new test item will be published to the web database (based on workflow action).

* To view this, you will need to enable raw values and standard fields in the “View” section of the “View” tab

An example of the Final Renderings XML looks something like the below:
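A trimmed, hypothetical fragment of the raw value is shown below (all GUIDs and the placeholder name are illustrative, not taken from a real item):

```xml
<r xmlns:p="p" xmlns:s="s" p:p="1">
  <d id="{FE5D7FDF-89C0-4D99-9AA3-B5FBD009C9F3}">
    <r uid="{B343725A-3A93-446E-A9C8-3A2CBD3DB489}"
       s:id="{885B8314-7D8C-4CBB-8000-01421EA8F406}"
       s:ph="main"
       p:t="{1DFA64E4-0000-0000-0000-000000000000}" />
  </d>
</r>
```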


Looking at this XML, you will see that it contains a p:t attribute. This denotes a personalization test reference. More on this further down.

Springs that Pop Out

If, after starting a personalization test, personalization on the component(s) you are testing stops working and you don't see any data appearing in your Test Result dialogue, the most common error in your Sitecore logs will be the following "Evaluation of condition failed. Rule item ID: Unknown, condition item ID" exception:

 ERROR Evaluation of condition failed. Rule item ID: Unknown, condition item ID: {4888ABBB-F17D-4485-B14B-842413F88732}  
Exception: System.NullReferenceException
Message: Object reference not set to an instance of an object.
Source: Sitecore.ContentTesting
at Sitecore.ContentTesting.Pipelines.RenderingRuleEvaluated.TestingRule.Process(RenderingRuleEvaluatedArgs args)
at (Object , Object[] )
at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
at Sitecore.Rules.RuleList`1.Run(T ruleContext, Boolean stopOnFirstMatching, Int32& executedRulesCount)


At first glance, you might think that this error simply means that the condition item with ID {4888ABBB-F17D-4485-B14B-842413F88732} is not published to the web database.

 Unfortunately, this isn't the case.

Under the Hood

After working through Sitecore's testing code, I found that the exception occurs while Sitecore evaluates one of the conditions associated with the personalization rule.

The relevant code is in Sitecore.ContentTesting.Pipelines.RenderingRuleEvaluated.TestingRule and Sitecore.Rules.Evaluate.


The invocation that fails and causes the NullReferenceException is rule.Condition.Evaluate(ruleContext), where:

rule - the Rule object instantiated from the rule XML definition (from the presentation details)
Condition - the root condition definition from the rule XML definition
ruleContext - the object containing additional data for the rule evaluation, such as:
  • Item reference: this should be the page definition item.
  • Test reference: this should be the test associated with the rendering.
  • The current MVC rendering object reference

My analysis determined that the most common cause of the error is old tests that are still part of the content item's configuration but were not stopped correctly, are inactive, or have been removed.

Fixing the Issue

The fix is to remove the bad/old test references from the Final Renderings XML field of the item in question.

My process to do this is the following:

  • Determine what item is throwing the testing exception.
  • Enable raw values and standard fields in the “View” section of the “View” tab.
  • Copy the Final Renderings XML value of the item and format it so that it is easy to read. This site does a nice job: https://www.freeformatter.com/xml-formatter.html
  • Paste the XML into Visual Studio or another editor.
  • Locate the attributes in the XML that have s:pt and remove them.
  • Copy and paste the updated XML back into the item's Final Renderings field.
  • Save and publish.
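As a hypothetical illustration of the removal step (placeholder GUIDs), a rendering element like this:

```xml
<r uid="{B343725A-3A93-446E-A9C8-3A2CBD3DB489}" s:id="{885B8314-7D8C-4CBB-8000-01421EA8F406}" s:ph="main" s:pt="{1DFA64E4-0000-0000-0000-000000000000}" />
```

would become:

```xml
<r uid="{B343725A-3A93-446E-A9C8-3A2CBD3DB489}" s:id="{885B8314-7D8C-4CBB-8000-01421EA8F406}" s:ph="main" />
```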

After this, the errors will stop appearing in your logs. You will however need to launch your test again.

Final Gotcha

The exact same "Evaluation of condition failed. Rule item ID: Unknown, condition item ID" exception mentioned above could also occur if content testing has been disabled.

In XP 8.1 and later, it is disabled when the ContentTesting.AutomaticContentTesting.Enabled setting is set to false in the App_Config\Include\ContentTesting\Sitecore.ContentTesting.config file.

This is a bit obscure, as one would expect some messaging in the logs indicating that this setting has been disabled.

Testing Diagnostics Page for your Toolbox

I have seen some cases where the ribbon of the Optimization tab shows a different number of active tests than the active test list. An example of this is shown below:



One of the first things that you can try is to rebuild your sitecore_testing_index. If this doesn't help, you can use the diagnostics page below to help troubleshoot the issue.

The page output will look similar to this:

~~~~~~~~~~~~~~

Active filtered tests:
Test search result object checked for Item: sitecore://{703EED9B-C574-4310-AC47-EBCCB651F67E}?lang=en&ver=2, Test Item: sitecore://master/{57807f6f-a836-4132-8b2b-48124b0c4031}?lang=en&ver=1, Is Running: True, Is Cancelled: False

Test search result object checked for Item: sitecore://{0CA61CF3-D35A-4FE7-8CD6-90CF8F61179A}?lang=en&ver=9, Test Item: sitecore://master/{cb771932-d06d-4f42-9392-483fab3cbc1c}?lang=en&ver=1, Is Running: True, Is Cancelled: False
Configuration is null

Test search result object checked for Item: sitecore://{8C65AEB8-90A2-4348-BCDB-D8AB6CBA5974}?lang=en&ver=1, Test Item: sitecore://master/{0b298164-fcc5-4803-acda-5a321e8c2797}?lang=en&ver=1, Is Running: True, Is Cancelled: False

1 {57807F6F-A836-4132-8B2B-48124B0C4031}
2 {0B298164-FCC5-4803-ACDA-5A321E8C2797}
Active tests:
1 sitecore://master/{57807f6f-a836-4132-8b2b-48124b0c4031}?lang=en&ver=1
2 sitecore://master/{cb771932-d06d-4f42-9392-483fab3cbc1c}?lang=en&ver=1
3 sitecore://master/{0b298164-fcc5-4803-acda-5a321e8c2797}?lang=en&ver=1

~~~~~~~~~~~~~~~

The diagnostic output example above shows us that the issue lies with the test item with ID {cb771932-d06d-4f42-9392-483fab3cbc1c}.

To fix the issue, you will need to locate the test item with that ID and either set its "Is Running" field value to "No" or simply delete that item.

Make sure that these updates get published to your web database.


Exploring Sitecore xConnect: Working with Contacts and the xConnect Client API


Background

As I started exploring xConnect in XP 9, one of the questions I asked myself was how the change in the xDB contact and interaction model architecture would affect my existing Sitecore xDB implementations if we decided to upgrade.

With this in mind, the focus of this post is on the changes to contact identification and updating contacts within Sitecore context, and what you need to know when you start working with the xConnect Client API in and outside of Sitecore context.

The xConnect documentation site was my initial point of reference, along with some guidance from Jason Wilkenson's series of posts.

Working with Contacts

Identifying Contacts 

In XP 7.5 - 8.x, each xDB contact could be identified using a single, unique value. 

Your code looked like this:

 Tracker.Current.Session.Identify("menglish");  
Tracker.Current.Contact.Identifiers.IdentificationLevel = ContactIdentificationLevel.Known;

This changed in XP 9, as you now need to specify the source along with a unique value when identifying the contact.

 Tracker.Current.Session.IdentifyAs("corporateweb", "menglish");  
//corporateweb is the source and menglish is the identifier.

In XP 9, each contact can have multiple identifiers and sources within the new model. The magic lies in the ability to identify and merge contacts from all different sources together into a single contact.

Omnichannel contact identification and merging is a powerful thing!



Updating Contacts

In XP 7.5 - 8.x, updating contact information in xDB was achieved by using the Tracker Contact (Tracker.Current.Contact), and calling GetFacet using the Interface of the type and passing in the name of the Facet. 

Your code looked like this:

 var existingContact = Tracker.Current.Contact;  

var personalFacet = existingContact.GetFacet<IContactPersonalInfo>("Personal");

personalFacet.FirstName = "Martin";
personalFacet.Surname = "English";

This code will still run in XP 9, but it will no longer save the information to xDB. The update will only persist within session.

In XP 9, you need to use the xConnect Client API in order to update contact information.

 using (Sitecore.XConnect.Client.XConnectClient client = Sitecore.XConnect.Client.Configuration.SitecoreXConnectClientConfiguration.GetClient())
 {
     try
     {
         var webContactIdentifier = Tracker.Current.Contact.Identifiers.FirstOrDefault(t => t.Source == "corporateweb")?.Identifier;
         var existingContact = client.Get<Sitecore.XConnect.Contact>(new IdentifiedContactReference("corporateweb", webContactIdentifier), new Sitecore.XConnect.ContactExpandOptions(PersonalInformation.DefaultFacetKey));

         if (existingContact != null)
         {
             var personalFacet = existingContact.GetFacet<PersonalInformation>() ?? new PersonalInformation();

             personalFacet.FirstName = "Martin";
             personalFacet.LastName = "English";

             client.SetFacet(existingContact, PersonalInformation.DefaultFacetKey, personalFacet);

             client.Submit();
         }
     }
     catch (XdbExecutionException ex)
     {
         // Oops, something went wrong
     }
 }

Some important things that you need to be aware of:

  • The xConnect Contact and Tracker Contact models are different. 
  • The legacy facet classes are available in XP 9, so your existing code won't break. 
  • When you update your Tracker facets in session, the update will persist throughout the session, but won't save to xDB.
  • On session end, the Tracker contact data is run through a series of conversion pipelines where it ends up in xConnect. 
  • When the web contact returns to the site, the Tracker contact is hydrated through another set of conversion pipelines using the contact's data stored in xConnect.
  • If you have been updating xDB contact data using the Tracker Contact and calling GetFacet, you will need to update your code to use the xConnect Client API in order to update contact information. 

Working with the xConnect Client API

xConnect Client API within Sitecore Context

If you are working within a Sitecore Context, using the xConnect client is really straightforward. You don't have to worry about endpoints or certificates, as all that is abstracted. An example of this is shown above.

xConnect Client API outside Sitecore Context

Unsecured Client Connection

The example code that Jason has on GitHub requires an untrusted client connection in order to work.

Example:

 private static XConnectClient GetClient()
 {
     var config = new XConnectClientConfiguration(new XdbRuntimeModel(CollectionModel.Model), new Uri("https://sc90.xconnect"), new Uri("https://sc90.xconnect"));

     try
     {
         config.Initialize();
     }
     catch (XdbModelConflictException ex)
     {
         Console.WriteLine(ex.Message);
         throw;
     }

     return new XConnectClient(config);
 }

In order to run this, you need to disable these two xml files: sc.XConnect.Security.EnforceSSLWithCertificateValidation.xml and sc.XConnect.Security.EnforceSSL.xml located at:  [location of your xConnect instance]\App_data\config\sitecore\CoreServices

If you don't, you will receive the following Sitecore.XConnect.XdbCollectionUnavailableException: "The HTTP response was not successful: Unauthorized".

Making this type of adjustment is fine if you are writing some POC code, but it is obviously not recommended as you start writing code for your customers.

Secured Client Connection

In order to establish a trusted client connection, you need to add the security certificate info to the request.

The most important thing you will need is the xConnect client certificate thumbprint, found in the validateCertificateThumbprint setting in your xConnect AppSettings.config, located at [location of your xConnect instance]\App_Config\, or in the ConnectionStrings.config of your Sitecore instance; the FindValue part of each xConnect connection string contains this value.

For example:

<add name="xconnect.collection.certificate" connectionString="StoreName=My;StoreLocation=LocalMachine;FindType=FindByThumbprint;FindValue=ADC6D07F383B2E116CC7510F4681EA34EE822F22" />  


To see this value within the certificate itself, you can navigate to it within your Personal Certificates store, shown below:

Using this thumbprint, we can make a secure xConnect client connection using the following code sample:

 var certThumbprint = "adc6d07f383b2e116cc7510f4681ea34ee822f22";
 var xConnectUrl = "https://sc90.xconnect";

 var options = CertificateWebRequestHandlerModifierOptions.Parse($"StoreName=My;StoreLocation=LocalMachine;FindType=FindByThumbprint;FindValue={certThumbprint}");
 var certificateModifier = new CertificateWebRequestHandlerModifier(options);

 var clientModifiers = new List<IHttpClientModifier>();
 var timeoutClientModifier = new TimeoutHttpClientModifier(new TimeSpan(0, 0, 20));

 clientModifiers.Add(timeoutClientModifier);

 var collectionClient = new CollectionWebApiClient(new Uri($"{xConnectUrl}/odata"), clientModifiers, new[] { certificateModifier });
 var searchClient = new SearchWebApiClient(new Uri($"{xConnectUrl}/odata"), clientModifiers, new[] { certificateModifier });
 var configurationClient = new ConfigurationWebApiClient(new Uri($"{xConnectUrl}/configuration"), clientModifiers, new[] { certificateModifier });
 var config = new XConnectClientConfiguration(new XdbRuntimeModel(CollectionModel.Model), collectionClient, searchClient, configurationClient);

 try
 {
     config.Initialize();
 }
 catch (Exception e)
 {
     Console.WriteLine(e);
     throw;
 }

 return new XConnectClient(config);

Wrap Up

I hope that this post has helped you understand some of the contact changes that xConnect presents us with, and also provides enough crumbs to get you started using the xConnect Client API.

Happy Exploring!


My Upgrade Experience from Sitecore 8.1 Update-3 to 8.2 Update-6

I was assigned the task of upgrading an existing client's large, multisite 8.1 rev. 160519 (8.1 Update-3) instance to 8.2 rev. 171121 (8.2 Update-6).  This particular client wasn't ready to go all the way to version 9.0, but will do so in the near future.

It took me longer than anticipated to get things up and running, simply because I needed to perform some updates to their custom solution.


Side Notes

The Sitecore solution that I was upgrading was using Castle Windsor as the Inversion of Control container along with Glass Mapper.

Getting Ready

To get started, I navigated over to the dev.sitecore.net site to arm myself with the files needed to perform the upgrade. The files that I downloaded from the site included:

  • Upgrade guide: Sitecore-8.2-Update-6-Upgrade-Guide.pdf
  • Sitecore update package: Sitecore 8.2 rev. 171121 (update package)
  • Configuration files for upgrade: Sitecore 8.2 rev. 171121 (config files)
  • Sitecore Update Installation Wizard: Sitecore Update Installation Wizard 2.0.2 rev. 170703
  • ZIP archive of the Sitecore site root folder: Sitecore 8.2 rev. 171121.zip

The software tools I use when performing upgrades are:

The road to 8.2 Update-6

These are the steps necessary to perform the upgrade:

Disabled xDB by editing \Website\App_Config\Include\Sitecore.Xdb.config in the Sitecore instance:
  • <setting name="Xdb.Enabled" value="false" />
  • <setting name="Xdb.Tracking.Enabled" value="false" />

The instance didn't have the Email Experience and WFFM modules, so I didn't need to disable their respective config files.

Ran the SQL database script called "CMS82_BeforeInstall.sql" located in \Sitecore 8.2 rev. 171121 (config files)\Database Upgrade Script on all Sitecore databases:
  • Core
  • Master
  • Web
  • Reporting (Analytics)

Installed the Sitecore Update Installation Wizard 2.0.2 rev. 170703.zip regular Sitecore package.

After it completed, I proceeded to install the "Sitecore 8.2 rev. 171121.update" update package using the installation wizard:  /sitecore/admin/UpdateInstallationWizard.aspx.

You will need to unzip the Sitecore 8.2 rev. 171121 (update package).zip in order to obtain the "update" file that Sitecore requires.


After clicking the "Analyze the package" button,  I opted NOT to install files as I preferred to start with a clean copy of the web root of Sitecore 8.2 Update-6.

I feel like this is a cleaner approach as it helps avoid having legacy cruft make its way into the new instance.


The package installation completed without any issues.

Instance Preparation and Comparing Files

I proceeded to stand up a clean version of Sitecore 8.2 Update-6 alongside my legacy 8.1 Update-3 instance and used the Beyond Compare app to compare the files. 

Apart from the Web.Config, the Sitecore.config was the next file where I saw the most differences.

Pro Tip: It is best practice to move any differences that you find in vanilla config files into separate patch files. That way, life will be much easier for future upgrades.
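For example, an override like the one below (the setting shown is just an illustration) can live in its own file under App_Config\Include rather than being edited directly in Sitecore.config:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Illustrative override: keeps customizations out of the vanilla config -->
      <setting name="Media.MaxSizeInDatabase">
        <patch:attribute name="value">40MB</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```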

Updating your Custom Solution

As some of the config files you compared may also exist in your custom solution, it is best to update the files in your solution as soon as you have completed your comparisons and merges on your Sitecore instance.

I worked in a new branch in source control, so that I could gradually update the files, and commit them as I made progress.

Sitecore 8.2 moved to .NET Framework version 4.5.2 from 4.5 in 8.1. So the target framework in each of the solution's projects needed to be updated:
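In each .csproj, this corresponds to the TargetFrameworkVersion property, which can also be changed by hand:

```xml
<PropertyGroup>
  <TargetFrameworkVersion>v4.5.2</TargetFrameworkVersion>
</PropertyGroup>
```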


NuGet Fun

The custom solution I was working with had all the Sitecore referenced assemblies in a single NuGet package, consumed via a custom feed. I opted to switch to the Sitecore public NuGet feed: https://doc.sitecore.net/sitecore_experience_platform/82/developing/developing_with_sitecore/sitecore_public_nuget_packages_faq

A lot of time was spent making sure the correct NuGet packages were loaded so that references were correct. As I was working with 8.2 rev. 171121, I matched my NuGet packages to that version by using the 8.2.171121 "NoReferences" packages.

As Jeremy Davis pointed out: "...the .NoReferences packages include the binary files but don’t have dependencies on other packages. So if you want to pick out individual DLLs without having to import all of the other assemblies that file depends on, choose these packages. It’s a bit more effort to manually add each of these that you need – but it means your project only refers to the specific files you want."

Note: When updating packages like WebGrease for example, it is important to match the assembly version in the Sitecore bin folder to the NuGet package versions.

Working with Solr

As I was using Solr as my search provider, I used Patrick's PowerShell script to set my config files to use Solr.

The Sitecore instance was using the Single Instance Solr Configuration - Patch #391039, that I discussed in this post: http://sitecoreart.martinrayenglish.com/2016/09/bulletproofing-your-sitecore-solr-and.html

The functionality of this patch was included out of the box from Sitecore 8.2 Update-1 onward, so I didn't have to carry over any configuration or files referencing it. Most of my work involved changing my Solr index configurations:

From:
Sitecore.Support.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.Support.391039

To:
Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider

Example

From:
 <index id="my_custom_master_index" type="Sitecore.Support.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.Support.391039">

To:
 <index id="my_custom_master_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">

Bye Bye IsPageEditorEditing, Hello IsExperienceEditorEditing

As Darren mentioned in his post, Sitecore deprecated the IsPageEditor and IsPageEditorEditing properties in Sitecore 8.0 Update-6, but kept them in all versions of 8.1.

It would have been nice if the Obsolete attribute had been used, so that the removal in 8.2 wouldn't come as a surprise that breaks every usage in your solution.

The fix was simple enough though. I performed a "find and replace":

From:
Sitecore.Context.PageMode.IsPageEditorEditing

To:
Sitecore.Context.PageMode.IsExperienceEditorEditing

Problems with Castle Windsor

The solution I was working in was using Castle Windsor 3.3.0.51 and Castle Core 3.3.3.58. I opted to update Castle Windsor to version 4.1.0.0 and Castle Core 4.2.1.0 because I wanted the bug fixes and enhancements of the newer releases.

After deploying the updated assemblies to my upgraded Sitecore instance, I ran into the following error:

Could not load file or assembly 'Castle.Windsor, Version=3.3.0.0, Culture=neutral, PublicKeyToken=407dd0808d44fbdc' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference.

Castle Core changed the AssemblyVersion attribute to only include major version numbers so that they could avoid assembly binding errors with future releases/upgrades of Castle Core: https://github.com/castleproject/Core/issues/307

In my case, the error was happening because I had assemblies that were still compiled against the old 3.x versions, while the deployed Castle assemblies now followed the new AssemblyVersion strategy.

Applying the following assembly binding redirects in my Web.config fixed the issue.

<dependentAssembly>
        <assemblyIdentity name="Castle.Core" publicKeyToken="407dd0808d44fbdc" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-999.999.999.999" newVersion="4.0.0.0" />
</dependentAssembly>

<dependentAssembly>
        <assemblyIdentity name="Castle.Windsor" publicKeyToken="407dd0808d44fbdc" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-999.999.999.999" newVersion="4.0.0.0" />
</dependentAssembly>

Minor Problem with Glass Mapper

Like Castle, I also opted to update Glass Mapper to a higher version. By doing so,  I ran into a small issue, similar to what is described here: https://github.com/mikeedwards83/Glass.Mapper/issues/244

In my case, I discovered that I was simply missing the Glass.Mapper.Sc.Mvc reference to the new assembly in the NuGet package's MVC 52 folder, as well as the updated assembly in my Sitecore bin folder.

Minor Problem with WebGrease

After making my way through the above-mentioned problems, I ran into a WebGrease version issue. 

Inner Exception: Could not load file or assembly 'WebGrease, Version=1.5.2.14234, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)

The fix for this was to simply update my assembly binding redirect in the Web.config.

From:
<dependentAssembly>
        <assemblyIdentity name="WebGrease" publicKeyToken="31bf3856ad364e35" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-1.5.2.14234" newVersion="1.5.2.14234" />
      </dependentAssembly>

To:
<dependentAssembly>
        <assemblyIdentity name="WebGrease" publicKeyToken="31bf3856ad364e35" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-1.6.5135.21930" newVersion="1.6.5135.21930" />
      </dependentAssembly>

Let Me Hear You Say "Hallelujah"!

After I completed all these updates and fixes, I was presented with a beautiful new instance of Sitecore 8.2 Update-6, where my site loaded without issue and my logs were clean.

Per Sitecore's upgrade guide, I completed the following final steps:
  • Cleared the browser cache. 
  • Published the site. 
  • Rebuilt the search indexes and the link database. 
  • Redeployed marketing definitions. 

I made sure to review my Sitecore logs after performing all of these tasks, and was happy to see that they stayed error-free.

SolrProvider SolrSearchFieldConfiguration Error After Upgrade from Sitecore 8.1 Update-3 to 8.2 Update-6


Background

In a previous post, I wrote about upgrading an existing client's large, multisite 8.1 rev. 160519 (8.1 Update-3) instance to 8.2 rev. 171121 (8.2 Update-6).

After multiple rounds of testing, I deployed the upgraded instance to our client's Staging server (Windows Server 2008 R2). When navigating to the Content Editor, I ran into the following Solr related error:

Unable to cast object of type 'System.Collections.Concurrent.ConcurrentBag`1[Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration]' to type 'System.Collections.Generic.IReadOnlyCollection`1[Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration]'.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.InvalidCastException: Unable to cast object of type 'System.Collections.Concurrent.ConcurrentBag`1[Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration]' to type 'System.Collections.Generic.IReadOnlyCollection`1[Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration]'.



Digging In

After reviewing the stack trace and decompiling Sitecore.ContentSearch.SolrProvider.dll, I focused on the StripKnownExtensions method of the SolrFieldNameTranslator class:

1:  public string StripKnownExtensions(string fieldName)
2:  {
3:      fieldName = this.StripKnownCultures(fieldName);
4:      foreach (SolrSearchFieldConfiguration availableType in (IEnumerable<SolrSearchFieldConfiguration>) this.fieldMap.GetAvailableTypes())
5:      {
6:          if (fieldName.StartsWith("_", StringComparison.Ordinal))
7:          {
8:              if (!fieldName.StartsWith("__", StringComparison.Ordinal))
9:                  break;
10:         }
11:         string str = availableType.FieldNameFormat.Replace("{0}", string.Empty);
12:         if (fieldName.EndsWith(str, StringComparison.Ordinal))
13:             fieldName = fieldName.Substring(0, fieldName.Length - str.Length);
14:         if (fieldName.StartsWith(str, StringComparison.Ordinal))
15:             fieldName = fieldName.Substring(str.Length, fieldName.Length);
16:     }
17:     return fieldName;
18: }
Line 4 above calls the GetAvailableTypes method, which is shown below:

private readonly ConcurrentBag<SolrSearchFieldConfiguration> availableTypes = new ConcurrentBag<SolrSearchFieldConfiguration>();

public IReadOnlyCollection<SolrSearchFieldConfiguration> GetAvailableTypes()
{
    return (IReadOnlyCollection<SolrSearchFieldConfiguration>) this.availableTypes;
}


As you can see, GetAvailableTypes casts the ConcurrentBag<SolrSearchFieldConfiguration> to IReadOnlyCollection<SolrSearchFieldConfiguration>, and this cast is where the "Unable to cast object" error message came from:

Unable to cast object of type 'System.Collections.Concurrent.ConcurrentBag`1[Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration]' to type 'System.Collections.Generic.IReadOnlyCollection`1[Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration]'.


Troubleshooting

According to the Microsoft documentation, the ConcurrentBag<T> class implements the IReadOnlyCollection<T> interface in .NET Framework 4.5 and above, but not in .NET Framework 4.0; the IReadOnlyCollection<T> interface was only introduced in .NET Framework 4.5.

I went ahead and verified that .NET Framework 4.5.2 was installed on the server. Next, I checked my Web.config and confirmed that both the compilation targetFramework and the httpRuntime targetFramework were set to "4.5.2":

<configuration>
  <system.web>
    <compilation targetFramework="4.5.2"/>
    <httpRuntime targetFramework="4.5.2"/>
  </system.web>
</configuration>

I then tried repairing the .NET Framework, and even rebooted the server after the repair, but it didn't solve the problem.

Microsoft Security Update Woes

It was obvious that both of my initial assumptions (that .NET Framework 4.5.2 wasn't installed, or that there was a configuration issue) were wrong.

As it turned out, .NET 4.5.2 did in fact ship a System.dll in which the System.Collections.Concurrent.ConcurrentBag<T> class implemented the IReadOnlyCollection<T> interface. However, Microsoft rolled out a series of security updates in which ConcurrentBag<T> did not implement IReadOnlyCollection<T>, because those updates were built against .NET Framework 4.0, which did not have the IReadOnlyCollection<T> interface at all.

In order to check whether one of the aforementioned security updates is installed, review the version of C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.dll.

For the regular .NET Framework 4.5.2, this version looks like 4.5.2..., but if a security update was installed, it will look like 4.0.30319.36388.

The Final Solution - So Simple It Hurt

Upgrading the .NET Framework to 4.6.1 solved the issue, because it restored a System.dll in which the System.Collections.Concurrent.ConcurrentBag<T> class implements the IReadOnlyCollection<T> interface.




Exploring Sitecore Managed Cloud Part 1: Tiers, Sizing, Provisioning and Upgrades


Background

I have been working within a client's Sitecore Managed Cloud environment for the last several months, and wanted to share some insights gained from my experience in a series of blog posts, this being the first.



Tier Configuration

Sitecore Managed Cloud Hosting offers a variety of tiers based on traffic volume.

Each tier has a recommended hardware configuration, as shown here: https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/sitecore_on_azure/deploying/sitecore_configurations_and_topology_for_azure

When you commit to a tier, Sitecore's Managed Cloud Team will provision your Azure infrastructure to support that tier. You will have the ability to increase to the next tier when traffic increases are expected.

If traffic exceeds the threshold of the currently subscribed tier, an overage charge will be applied. From what I understand, the overage cost is about 25% greater than the additional cost of jumping to the next tier.

Tier Sizing and Overage

Sitecore Managed Cloud has the concept of an “included Azure spend” linked to each tier.

If you need to scale up or out, and the cost associated with your scaling goes above your “included Azure spend”,  it is up to you to pay an overage fee. This fee is based on an “overage multiplier” that Sitecore provides you with.

Here are some overage examples:

  • Additional Traffic beyond tier
  • Additional Web Apps Used (Add 1 or more Content Delivery Web App)
  • Exceed Storage Limits (loading large amounts of videos or pdfs)
  • Exceeding Storage Limits for xDB

                          TIER 1      TIER 2      TIER 3      TIER 4      TIER 5
                          XP–XSmall   XP–Small    XP–Medium   XP–Large    XP–XLarge
Traffic (in visits/mo.)   0–100k      100k–200k   200k–1MM    1MM–5MM     5MM–10MM
Content Management        1 (B2)      1 (B2)      1 (B2)      1 (B2)      1 (B2)
Content Delivery          1 (B2)      2 (B2)      3 (B2)      4 (S2)      8 (S3)
Bandwidth                 20 GB       40 GB       40 GB       60 GB       100 GB

The Azure App Service Plan pricing page provides details regarding what B2s and S3s are: https://azure.microsoft.com/en-us/pricing/details/app-service/

Sitecore recommends proactive increase of a given topology’s tiers when traffic increases are expected. However, the alternative overage charge poses only a moderate increase in cost to maintain site performance during unexpected, temporary spikes in volume. It also can serve as an indicator that advancement to a larger tier should be considered.

Infrastructure Provisioning

Before provisioning the new infrastructure, the Sitecore Managed Cloud Hosting Team will request that you provide them with the following  information:

  1. Sitecore Version (Eg: Sitecore XP 9.0 Update-1)
  2. Logical Name (Eg: MySuperSolution)
  3. Location of Deployment (Eg: East US)
  4. Location of your Microsoft Application Insights Resources (Eg: East US)
  5. Microsoft Live IDs who can access your Managed Cloud set

From my experience, provisioning will take about a day.

NOTE: As mentioned here, Sitecore provisions you with a temporary license file that is valid for one month. When the temporary license expires, Sitecore stops working, so it is important that you upload a valid permanent license.xml file as soon as possible.

Upgrades

Unfortunately, you as a partner or customer will be responsible for upgrades after the Sitecore Managed Cloud Hosting Team has provisioned your infrastructure.

Sitecore initially provisioned our infrastructure using Sitecore 9 Initial Release. After a couple of weeks, Sitecore 9 Update-1 was released, and we opened a ticket to request that they provision all environments on the newer version.

The only reason they did this for us was that we hadn't deployed anything to the new environments yet, so they could simply delete the existing environments and provision new ones using the updated Azure Web Packages.

So the point is - they will NOT upgrade your environment after you have deployed your custom solution into it.



Sitecore Azure Search: Sharing a Search Service in a Development Environment


Background

When developing on the Sitecore Platform with Azure Search, the cloud-based service behaves differently from a local search setup, so it is best to develop against an actual search service to ensure that you uncover any unexpected behavior before pushing to a higher environment that uses Azure Search.

I put together a quick patch file that updates the Sitecore index names, allowing your development team to share an Azure Search service.

Simply update the "martin_dev_" prefix with your own and drop the file in the "Include/zzz" folder.
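As a rough sketch of what such a patch boils down to (assuming your version's Azure Search index definitions expose the physical index name via the standard <param desc="name"> element; exact element paths may differ between Sitecore versions):

```xml
<!-- Sketch only: prefix each index's physical name per developer -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_web_index">
            <param desc="name">martin_dev_sitecore_web_index</param>
          </index>
          <!-- ...repeat for each remaining index, leaving the two
               sitecore_marketingdefinitions_* indexes untouched -->
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```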


Note: There is a platform dependency on the "sitecore_marketingdefinitions_master" and "sitecore_marketingdefinitions_web" naming so those 2 indexes are excluded from the patch.

In a development environment, we can live with sharing these indexes.

Sitecore xDB: Performance Tuning your MongoDB Driver Configuration Settings


The Goal

Working with my team on a high-traffic client's Sitecore Commerce site, we were tasked with improving MongoDB connection management on the Content Delivery servers to help alleviate pressure on the servers and MongoDB, particularly during busy times of the day, and at times when traffic surges occur due to marketing campaigns or other real-world events.

The Key Settings

We confirmed that Sitecore ships with the default MongoDB driver settings, which are set in the driver code itself. You can view the defaults on GitHub:
https://github.com/mongodb/mongo-csharp-driver/blob/v2.0.x/src/MongoDB.Driver/MongoDefaults.cs

Working with mLab Support, we determined that our focus would be on the following:

Min Pool Size

We decided to increase the Min Pool Size from the default 0 to 20. The mLab team approved this suggestion on the basis that we observed the Content Delivery server's connection pools maxing out due to the amount of operations that were happening during the Sitecore startup process.

Max Pool Size

We increased our Max Pool Size from the default of 100 to 150 in order to better accommodate surges in connection demand. The purpose of this update was to lessen the chance of running out of connections altogether.

Connection Idle Time

We increased the Connection Idle Time from the default of 10 minutes to 25 minutes to reduce the need to create new connections during normal and high-traffic surges.

Connection Life Time

We dropped the default setting of 30 minutes down to 0 (no lifetime). This change was made because the default lifetime could also have been contributing to the connection churning we observed.

Per this thread, a MongoDB engineer (driver author) suggested that this setting is likely not needed:
https://stackoverflow.com/questions/32816076/what-is-the-purpose-of-the-maxconnectionlifetime-setting

The How

As Kam explains in his post, Sitecore exposes an empty updateMongoDriverSettings pipeline that you can hook into to modify configurations that are not available in the connection string.

I created a processor to add to this pipeline that alters the MongoClientSettings:

Finally, I added the following patch to add the processor to the pipeline, allowing us to pass the updated MongoDB driver settings to the custom processor:
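The processor and patch were embedded as code snippets in the original post. As a sketch of the patch (the MyProject type name and the child element names are hypothetical; the values mirror the settings discussed above):

```xml
<!-- Sketch: hook a custom processor into the empty updateMongoDriverSettings pipeline -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <updateMongoDriverSettings>
        <processor type="MyProject.Analytics.UpdateMongoDriverSettingsProcessor, MyProject">
          <minConnectionPoolSize>20</minConnectionPoolSize>
          <maxConnectionPoolSize>150</maxConnectionPoolSize>
          <maxConnectionIdleTimeMinutes>25</maxConnectionIdleTimeMinutes>
          <maxConnectionLifeTimeMinutes>0</maxConnectionLifeTimeMinutes>
        </processor>
      </updateMongoDriverSettings>
    </pipelines>
  </sitecore>
</configuration>
```

The processor itself would read these child element values and apply them to the MongoClientSettings exposed by the pipeline arguments.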

Final Note

A special thanks to Dan Read (Arke), Alex Mayle (Sogeti) and the mLab Support Team for their contributions.

Sitecore xDB - GeoIP and Contention Dynamics in MongoDB


Background

In my previous post, I discussed how our team has been diligently working to alleviate pressure on our servers and MongoDB for a high-traffic client's Sitecore Commerce site.

We use mLab to host our Experience Database, and while monitoring the telemetry of the cluster, we noticed a series of contention indicators related to the increased number of queries and connections during high-traffic surges during the day.

In our scenario, our client's site has a lunchtime traffic surge between 11am and 3pm every day.

Contention Dynamics

Overall, our MongoDB was not being over-taxed in terms of overall capacity, as we were not using up all the RAM and CPU, but the telemetry charts did show what looked like pretty clear contention.

We noticed a certain pattern and volume in the site's traffic that led to contention dynamics on our MongoDB nodes. The contention would eventually start to affect the Sitecore Content Delivery servers, which were also dealing with that day's peak lunchtime load.

We were seeing a surge in connections with data reads, as reflected in MongoDB metrics such as Queries (Operations Per Second) and Returned documents (Docs Affected Per Second). This was leading to a high degree of contention, as reflected in various other MongoDB metrics (CPU time, queues, disk I/O, page faults).


Our initial theory was that the root cause of this contention in MongoDB was the high volume of lunchtime traffic in Sitecore, but in an indirect way.

GeoIP and MongoDB

Having troubleshooted Sitecore's GeoIP service before, I had a pretty good understanding of the flow.

If you need some insight, I suggest reading Grant Killian's post: https://grantkillian.wordpress.com/2015/03/09/geoip-resolution-for-sitecore-explained

In summary, the flow looks like this:
  • Visitor visits Sitecore website
  • Sitecore performs a GeoIP information lookup from the memory cache using the visitor's IP address
  • If the GeoIP information IS in memory cache then it uses it in the visitor's interaction
  • If the GeoIP information IS NOT in memory cache, it performs a GeoIP lookup in the MongoDB Analytics database's GeoIps collection
  • If the GeoIP information IS in the MongoDB Analytics database's GeoIps collection, it uses it in the visitor's interaction and stores the result in memory cache
  • If the GeoIP information IS NOT in the GeoIps collection, it performs a lookup using the Sitecore Geolocation service and stores the result in memory and uses it in the visitor's interaction

Our high-traffic site makes heavy use of GeoIP, as the Home Page is personalized based on the visitor's location and local time. 

There had to be a correlation between the high traffic, GeoIP lookups, and the activity we were seeing on our MongoDB cluster.

The item that stood out to me was the highlighted step above: the GeoIP lookup against the MongoDB Analytics GeoIps collection.



Running a record count query against the GeoIps collection, we discovered that it contained 7.4 million records! This confirmed our theory that the MongoDB GeoIp collection was heavily populated and being used for the lookups to hydrate the visitor's interaction and memory cache.

As a side note, if you crack open the interaction collection, you can see how Sitecore ties the GeoIP data from the lookup to the visitor's interaction (this is old news):


GeoIP Cache Settings

After digging into the code, we discovered that Sitecore's GeoIP service uses a cache called LegacyLocationList to store the GeoIP lookup data after it has been returned from either MongoDB or the GeoLocation service.

The naming of the cache is what caught us by surprise. One would think that a "legacy" cache would no longer be used.

If you crack open Sitecore.CES.GeoIp.LegacyLocation.dll with your favorite .NET decompiler, you will see the following:


We started monitoring this legacy location cache closely, and discovered that it was in fact hitting capacity and clearing frequently during our lunchtime traffic surge. This had a direct relationship with the contention we were seeing on our MongoDB nodes during that period of time.

It was obvious to us at this point, that the 12MB default size of this cache was not enough to handle all that GeoIP lookup data!



GeoIP Cache Size Updates and Results

Our team decided to increase the LegacyLocationList cache size to 20MB via a simple patch update:

<setting name="CES.GeoIp.LegacyLocation.Caching.LegacyLocationListCacheSize">
  <patch:attribute name="value">20MB</patch:attribute>
</setting>

After our deployment, we monitored the cluster's telemetry closely. It was apparent by looking at the connection count that there was an instant improvement resulting from the increased cache size.

Before the deployment of the cache setting change (LegacyLocationList cache default set to 12MB), we were averaging around 400 connections during the traffic surge.


After the deployment (LegacyLocationList cache size increased to 20MB), our connection count averaged only around 200!


Over the course of several weeks, our team was happy to report that during our lunchtime traffic surges, there was a dramatic reduction in connections with Data Reads, Operations Per Second, Docs Affected Per Second, CPU time, queues, disk I/O, page faults on our MongoDB cluster.

Another positive step towards our overall goal of improving MongoDB connection management on our Content Delivery servers.

Final Note

Another special thanks to Dan Read (Arke), Alex Mayle (Sogeti) for their contributions.

Sitecore GeoIP - What Is Happening Under The Hood In 8.x?


Background

Most posts explain how Sitecore's GeoIP service works from a high-level point of view.

In this post, I intend to take the explanation a few steps deeper, so that developers can understand all the pieces that make this process work. The goal is to arm developers with the necessary details to successfully troubleshoot a problem if one arises.


Visitor Interaction - Start of visitor's session

  • Visitor visits Sitecore website, and this is regarded as a new interaction. Sitecore's definition of an interaction is ".. any point at which a contact interfaces with a brand, either online or offline". In our case, this is a new visitor session on the website.

  • Sitecore runs the CreateVisits pipeline. Within this pipeline, there is a processor called UpdateGeoIpData that fires a method called GeoIpManager.GetGeoIpData within Sitecore.Analytics.Tracking.CurrentVisitContext that initiates the GeoIP lookup for the visitor's interaction.

  • Within the GeoIP lookup logic, Sitecore generates a unique identifier (GUID) from the visitor's IP address, e.g. 192.168.1.100 => fd747022-dd48-b1ca-1312-eb4ba55030b2. 

NOTE: Sitecore performs all GeoIP lookups using this unique identifier. You can see this id by looking inside your MongoDB GeoIps collection. The field is named _id, which is the naming convention MongoDB uses for document identifiers across all of its collections. See my previous post for a snapshot.

  • Sitecore performs a GeoIP data lookup in memory cache.

  • If the GeoIP data IS in memory cache, then it will attach it to the visitor's interaction.

  • If the GeoIP data IS NOT in memory cache, it performs a GeoIP lookup in the MongoDB Analytics database's GeoIps collection.

  • If the GeoIP data IS in the MongoDB Analytics database's GeoIps collection, it attaches it to the visitor's interaction and stores the result in memory cache.

  • If the GeoIP data IS NOT in the GeoIps collection, it performs a lookup using the Sitecore Geolocation service and stores the result in memory cache and attaches it to the visitor's interaction.

NOTE: After a successful lookup, the GeoIP data is stored in the Tracker.Current.Interaction.GeoData (ContactLocation class)

GeoIP Data Cache

  • When the GeoIP data is obtained, it is added to a dictionary object that is part of the Sitecore Tracker so that it can be referenced via the Tracker.Current.Interaction.GeoData (shown above).

  • The odd thing that I noticed was that the cache expiration was set to 10 seconds (by default)
          Code reference:
          Sitecore.Analytics.Data.Dictionaries.TrackingDictionary
          private readonly TimeSpan defaultCacheExpirationTimeout = TimeSpan.FromSeconds(10.0);

GeoIP Data - End of visitor's session

  • At the end of the visitor's interaction / session, Sitecore will run the CommitSession pipeline.

  • Like the CreateVisits pipeline, there is a processor called UpdateGeoIpData that fires a method called GeoIpManager.GetGeoIpData (with the exact same code as in the CreateVisits pipeline). This initiates the GeoIP lookup flow once again (Cache / MongoDB / GeoIP Service).

  • It seems the intention here is to confirm the visitor's GeoData before storing the data in MongoDB, which will ultimately make its way to the reporting database.

More To Come

Next, I intend to dig into Sitecore's GeoIP code for the 9.x series, and talk about the differences identified in that implementation.

Sitecore GeoIP - A Developer's Guide To What Has Changed In Sitecore 9

In my previous post, I took a dive into the 8.x version of the Sitecore GeoIP service from a developer's point of view. Sitecore 9 introduced great improvements to xDB, GeoIP being one of them.

In this post, I intend to help developers understand what has changed in Sitecore 9 GeoIP.  Like my previous post, the purpose is to arm developers with the necessary details to understand what is happening under the hood, so that they can successfully troubleshoot a problem if one arises.


Reference Data

One of the first things that I discovered when diving into version 9 is the use of a series of "ReferenceDataClientDictionaries" that are exposed to us as "KnownDataDictionaries".

As inferred by the name, these are known collections used to store common data, one being IP geolocation data. The data is ultimately stored in a SQL database so that it can be referenced throughout the Experience Platform.

There is a new pipeline in Sitecore 9 that initializes these dictionaries, as shown here:

Config: 

<initializeKnownDataDictionaries patch:source="Sitecore.Analytics.Tracking.config">
<processor type="Sitecore.Analytics.DataAccess.Pipelines.InitializeKnownDataDictionaries.InitializeKnownDataDictionariesProcessor, Sitecore.Analytics.DataAccess"/>
<processor type="Sitecore.Analytics.XConnect.DataAccess.Pipelines.InitializeKnownDataDictionaries.InitializeDeviceDataDictionaryProcessor, Sitecore.Analytics.XConnect" patch:source="Sitecore.Analytics.Tracking.Database.config"/>
</initializeKnownDataDictionaries>

Processor Code:

1:  namespace Sitecore.Analytics.DataAccess.Pipelines.InitializeKnownDataDictionaries
2:  {
3:      public class InitializeKnownDataDictionariesProcessor : InitializeKnownDataDictionariesProcessorBase
4:      {
5:          public override void Process(InitializeKnownDataDictionariesArgs args)
6:          {
7:              Condition.Requires<InitializeKnownDataDictionariesArgs>(args, nameof (args)).IsNotNull<InitializeKnownDataDictionariesArgs>();
8:              GetDictionaryDataPipelineArgs args1 = new GetDictionaryDataPipelineArgs();
9:              GetDictionaryDataPipeline.Run(args1);
10:             Condition.Ensures<DictionaryBase>(args1.Result).IsNotNull<DictionaryBase>("Check configuration, 'getDictionaryDataStorage' pipeline must set args.Result property with instance of DictionaryBase type.");
11:             args.LocationsDictionary = new LocationsDictionary(args1.Result);
12:             args.ReferringSitesDictionary = new ReferringSitesDictionary(args1.Result);
13:             args.GeoIpDataDictionary = new GeoIpDataDictionary(args1.Result);
14:             args.UserAgentsDictionary = new UserAgentsDictionary(args1.Result);
15:             args.DeviceDictionary = new DeviceDictionary(args1.Result);
16:         }
17:     }
18: }

If you look at line 13 above, the GeoIpDataDictionary object being created inherits from Sitecore's new ReferenceDataDictionary.

This is the glue between GeoIP and the new Reference Data "shared storage" mechanism.

Here is what the code looks like:

1:  namespace Sitecore.Analytics.DataAccess.Dictionaries
2:  {
3:      public class GeoIpDataDictionary : ReferenceDataDictionary<Guid, GeoIpData>
4:      {
5:          public GeoIpDataDictionary(DictionaryBase dictionary, int cacheSize)
6:              : base(dictionary, "GeoIpDataDictionaryCache", XdbSettings.GeoIps.CacheSize * cacheSize)
7:          {
8:              this.ReadCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsReads;
9:              this.WriteCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsWrites;
10:             this.CacheHitCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsCacheHits;
11:             this.DataStoreReadCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsDataStoreReads;
12:             this.DataStoreReadTimeCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsDataStoreReadTime;
13:             this.DataStoreWriteTimeCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsDataStoreWriteTime;
14:         }
15:
16:         public GeoIpDataDictionary(DictionaryBase dictionary)
17:             : this(dictionary, XdbSettings.GeoIps.CacheSize)
18:         {
19:         }
20:
21:         public override TimeSpan CacheExpirationTimeout
22:         {
23:             get
24:             {
25:                 return TimeSpan.FromSeconds(600.0);
26:             }
27:         }
28:
29:         public override Guid GetKey(GeoIpData value)
30:         {
31:             return value.Id;
32:         }
33:
34:         public string GetKey(Guid id)
35:         {
36:             return id.ToString();
37:         }
38:     }
39: }


Notice in the CacheExpirationTimeout property that this object is cached for 10 minutes (600 seconds). More on this below.

Reference Data Storage and the GeoIP Lookup Flow

You may be wondering how this Reference Data feature changes what you know about the GeoIP flow from previous versions of the platform.

Let's review the steps:

  • Sitecore runs the CreateVisits pipeline. Within this pipeline, there is a processor called UpdateGeoIpData that fires a method called GeoIpManager.GetGeoIpData within Sitecore.Analytics.Tracking.CurrentVisitContext that initiates the GeoIP lookup for the visitor's interaction.

  • Sitecore performs a GeoIP data lookup in the GeoIP memory cache.
    • NOTE: Cache expiration is set to 10 seconds => TimeSpan.FromSeconds(10.0)

Sitecore.Analytics.Lookups.GeoIpCache:

public void Add(GeoIpHandle handle)
{
  Assert.ArgumentNotNull((object) handle, nameof (handle));
  if (this.cache.Count >= this.maxCount)
    this.Scavenge();
  this.cache.Add(handle.Id, (object) handle, TimeSpan.FromSeconds(10.0));
  AnalyticsTrackingCount.GeoIPCacheSize.Value = (long) this.cache.Count;
}

  • If the GeoIP data IS in the GeoIP memory cache, then it will attach it to the visitor's interaction.

  • If the GeoIP data IS NOT in the GeoIP memory cache, it performs a lookup in the Reference Data's GeoIpDataDictionary (KnownDictionaries) memory cache.
    • NOTE: Cache expiration is set to 10 minutes => TimeSpan.FromSeconds(600.0). See above for the 10 minute CacheExpirationTimeout property on the Sitecore.Analytics.DataAccess.Dictionaries.GeoIpDataDictionary class.

  • If the GeoIP data IS in the Reference Data's GeoIpDataDictionary memory cache, it attaches it to the visitor's interaction and adds it to the GeoIP memory cache.

  • If the GeoIP data IS NOT in the Reference Data's GeoIpDataDictionary memory cache, it performs a lookup in the SQL ReferenceData database and if found, stores the result in the Reference Data's GeoIpDataDictionary cache and GeoIP memory cache, and then attaches it to the visitor's interaction.

  • If the GeoIP data IS NOT in the SQL ReferenceData database, it performs a lookup using the Sitecore Geolocation service and stores the result in the SQL ReferenceData database, the Reference Data's GeoIpDataDictionary cache and GeoIP memory cache, and then attaches it to the visitor's interaction.
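The lookup chain above can be sketched as a small cache hierarchy. This is an illustrative model only, not Sitecore's actual API: names like geoip_cache, refdata_dictionary, refdata_db, and geolocation_service_lookup are hypothetical stand-ins for the components described in the bullets.

```python
import time

class TtlCache:
    """A minimal TTL cache standing in for Sitecore's in-memory caches."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if time.monotonic() > expires_at:
            del self.entries[key]
            return None
        return value

    def add(self, key, value):
        self.entries[key] = (value, time.monotonic() + self.ttl)

geoip_cache = TtlCache(ttl_seconds=10)          # GeoIpCache: 10 second TTL
refdata_dictionary = TtlCache(ttl_seconds=600)  # GeoIpDataDictionary: 10 minute TTL
refdata_db = {}  # stands in for the SQL ReferenceData database

def geolocation_service_lookup(ip):
    # stands in for the call out to the Sitecore Geolocation service
    return {"ip": ip, "country": "US"}

def get_geoip_data(ip):
    # 1. GeoIP memory cache (10 s TTL)
    data = geoip_cache.get(ip)
    if data is not None:
        return data
    # 2. Reference Data's GeoIpDataDictionary cache (10 min TTL)
    data = refdata_dictionary.get(ip)
    if data is None:
        # 3. SQL ReferenceData database
        data = refdata_db.get(ip)
        if data is None:
            # 4. Sitecore Geolocation service; persist the result to SQL
            data = geolocation_service_lookup(ip)
            refdata_db[ip] = data
        refdata_dictionary.add(ip, data)
    # backfill the fastest cache on every miss path
    geoip_cache.add(ip, data)
    return data
```

The key design point is the backfill: each miss populates every faster tier on the way back out, so a repeat visitor within 10 seconds never leaves the GeoIP memory cache.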

Reference Data Storage in SQL

By using SQL Server Management Studio and opening the ReferenceData database's DefinitionTypes table, you can see the different types of reference data that are being stored. The GeoIp data type, as you can see below, is named "Tracking Dictionary - GeoIpData".


By looking at the Definitions table, you can see that the data is stored as a Binary data type:


The following SQL Query will return the top 100 GeoIP reference data results:

SELECT TOP 100 [xdb_refdata].[DefinitionTypes].Name, [xdb_refdata].[Definitions].Data, [xdb_refdata].[Definitions].IsActive, [xdb_refdata].[Definitions].LastModified, [xdb_refdata].[Definitions].Version
FROM [xdb_refdata].[Definitions]
INNER JOIN [xdb_refdata].[DefinitionTypes] ON [xdb_refdata].[DefinitionTypes].ID = [xdb_refdata].[Definitions].TypeID
WHERE [xdb_refdata].[DefinitionTypes].Name = 'Tracking Dictionary - GeoIpData'



Changes to the GeoIpManager class

Finally, I wanted to provide a glimpse of the changes in the GeoIpManager class that I referenced in my previous post.

By comparing the 8.x version of the GeoIpManager code to 9, you can see the usage of the KnownDataDictionaries.GeoIPs dictionary instead of the Tracker.Dictionaries.GeoIpData (ContactLocation class) from 8.x:



Final Words

I hope that this information helps developers understand more about Reference Data and the updated GeoIP Lookup Flow in Sitecore 9.

As always, feel free to comment or reach me on Slack or Twitter if you have any questions.


Sitecore Azure PaaS: Updating Your License File For All Deployed App Service Instances


Background

If you are provisioning a new set of Sitecore environments on your own, or if the Sitecore Managed Cloud Hosting Team provisions your environments for you, you will most likely be using a temporary license file that is valid for 1 month while you wait for your permanent license file.

When the temporary license expires, your Sitecore instances will stop working. Therefore, it is important that you upload a valid permanent license.xml file as soon as it is available.

File Locations

In an XP Scaled environment, there are many different App Services and locations where the license.xml file will need to be updated.

I created a list of the App Service roles and the license file locations for your reference:

App Service RoleLicense File Location
xc-search \App_data & \App_data\jobs\continuous\IndexWorker\App_data
ma-ops \App_data & \App_data\jobs\continuous\AutomationEngine\App_Data
cd \App_data
cm \App_data
ma-rep \App_data
prc \App_data
rep \App_data
xc-collect \App_data
xc-refdata \App_data

Updating the License File

The easiest way to update the file is to use the Debug console in the Kudu "Advanced Tools" in your App Service Instance, or an FTPS client to connect directly to the App's filesystem.
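As a sketch, the upload can also be scripted against the Kudu VFS REST API. The placeholders below (app name, credentials, target path) are illustrative and must be replaced with your own values; the If-Match header is what allows overwriting an existing file.

```shell
# Illustrative only: push a license.xml to one App Service via Kudu's VFS API.
# Repeat per App Service role and per location listed in the table above.
curl -X PUT \
  -u '<deployment-user>:<password>' \
  -H "If-Match: *" \
  --data-binary @license.xml \
  "https://<app-name>.scm.azurewebsites.net/api/vfs/site/wwwroot/App_Data/license.xml"
```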


Fix Email Campaign Pausing: Sitecore Email Experience Manager 3.x Retry Data Provider


Background

My company uses Email Experience Manager (EXM) to send several million emails a day, and we have been facing issues where our large campaigns would pause mid-send.

We have a scaled EXM environment with 2 dedicated dispatch servers and a separate SQL Server, all with appropriate resources, so hardware was not the issue. We also ensured that the databases were kept in tip-top condition (proper maintenance plans with stats being updated), and that configurations were optimal for our environment.


The cause of the pausing

After digging in, I discovered that the pausing was caused by SQL deadlocks due to the massive amount of records and CRUD activity on the EXM SQL databases.

Sample Exception:

 ERROR Transaction (Process ID 116) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.  
Exception: System.Data.SqlClient.SqlException
Message: Transaction (Process ID 116) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Source: .Net SqlClient Data Provider
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at System.Data.SqlClient.SqlCommand.CompleteExecuteScalar(SqlDataReader ds, Boolean returnSqlValue)
at System.Data.SqlClient.SqlCommand.ExecuteScalar()
at Sitecore.Modules.EmailCampaign.Core.Data.SqlDbEcmDataProvider.CountRecipientsInDispatchQueue(Guid messageId, RecipientQueue[] queueStates)
at Sitecore.Modules.EmailCampaign.Core.Gateways.DefaultEcmDataGateway.CountRecipientsInDispatchQueue(Guid messageId, RecipientQueue[] queueStates)
at Sitecore.Modules.EmailCampaign.Core.Analytics.MessageStatistics.get_Unprocessed()
at Sitecore.Modules.EmailCampaign.Core.Analytics.MessageStatistics.get_Processed()
at Sitecore.Modules.EmailCampaign.Core.MessageStateInfo.InitializeSendingState()
at Sitecore.Modules.EmailCampaign.Core.MessageStateInfo.InitializeMessageStateInfo()
at Sitecore.Modules.EmailCampaign.Factory.GetMessageStateInfo(String messageItemId, String contextLanguage)
at Sitecore.EmailCampaign.Server.Services.MessageInfoService.Get(String messageId, String contextLanguage)
at Sitecore.EmailCampaign.Server.Controllers.MessageInfo.MessageInfoController.MessageInfo(MessageInfoContext data)


How does this new data provider fix the problem?

The new data provider introduces efficient SQL deadlock handling. When a deadlock is detected, it will wait 5 seconds and then retry the transaction. The code will try to execute a deadlocked transaction 3 times.
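The retry behavior can be sketched as a simple wrapper. This is an illustrative model of the approach, not the provider's actual code: DeadlockError and execute_with_retry are hypothetical names standing in for SQL deadlock detection and the provider's retry loop.

```python
import time

class DeadlockError(Exception):
    """Stands in for a SqlException reporting a deadlock victim (error 1205)."""

def execute_with_retry(operation, retry_count=3, delay_seconds=5, sleep=time.sleep):
    # Try the transaction up to retry_count times; on a deadlock, wait
    # delay_seconds and retry. The final failure is re-raised to the caller.
    for attempt in range(1, retry_count + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == retry_count:
                raise
            sleep(delay_seconds)
```

With the defaults, a transaction that is repeatedly chosen as the deadlock victim is attempted 3 times, roughly 5 seconds apart, before the error surfaces.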

Configuration

Defaults are set to wait 5 seconds for the retry, and the max retry attempts is 3. The DelaySeconds and RetryCount settings can be modified to suit your needs.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">  
<sitecore>
<ecmDataProvider defaultProvider="sqlretry">
<providers>
<clear/>
<add name="sqlretry" type="Sitecore.EmailCampaign.RetryDataProvider.RetrySqlDbEcmDataProvider, Sitecore.EmailCampaign.RetryDataProvider" connectionStringName="exm.master">
<Logger type="Sitecore.ExM.Framework.Diagnostics.Logger, Sitecore.ExM.Framework" factoryMethod="get_Instance"/>
<DelaySeconds>5</DelaySeconds>
<RetryCount>3</RetryCount>

</add>
<add name="sqlbase" type="Sitecore.Modules.EmailCampaign.Core.Data.SqlDbEcmDataProvider, Sitecore.EmailCampaign" connectionStringName="exm.master">
<Logger type="Sitecore.ExM.Framework.Diagnostics.Logger, Sitecore.ExM.Framework" factoryMethod="get_Instance"/>
</add>
</providers>
</ecmDataProvider>
</sitecore>
</configuration>

Source Code and Documentation

Full source code, documentation and package download is available from my GitHub repository:

https://github.com/martinrayenglish/Sitecore-EXM-3.x-Retry-Data-Provider


Improving the Sitecore Broken Links Removal Tool


Background

While working through an upgrade to Sitecore 9.1, I ran into a broken links issue that couldn't be resolved using Sitecore's standard Broken Links Removal tool.

While searching the internet, I was able to determine that I wasn't the only one who had faced these types of issues.

In this post, I intend to walk you through the link problems that I ran into, and why I decided to create an updated Broken Links Removal tool to overcome the issues that the standard links removal tool wasn't able to resolve.

NOTE: The issues that I present in this post are not specific to version 9.1.  They exist in Sitecore versions going back to 8.x.


Exceptions after Upgrade Package Installation

After installing the 9.1 upgrade package and completing the post installation steps of rebuilding the links database and publishing, I discovered that lots of my site's pages started throwing the following exceptions:



The model item passed into the dictionary is of type 'Castle.Proxies.IGlassBaseProxy', but this dictionary requires a model item of type 'My Custom Model'.

During the solution upgrade, I had upgraded Glass Mapper to version 5, so I thought that the issue could be related to this. After digging in, I noticed that my items / pages that were throwing exceptions had broken links. I determined this by turning on Broken Links using the Sitecore Gutter in the Content Editor.


Next, I attempted to run Broken Links Removal tool located at http://{your-sitecore-url}/sitecore/admin/RemoveBrokenLinks.aspx.

After it had run for several minutes, it threw the following exception:

ERROR Error looking up template field. Field id: {00000000-0000-0000-0000-000000000000}. Template id: {128ADD89-E6BC-4C54-82B4-A0915A56B0BD}
Exception: System.ArgumentException
Message: Null ids are not allowed.
Parameter name: fieldID
Source: Sitecore.Kernel
   at Sitecore.Diagnostics.Assert.ArgumentNotNullOrEmpty(ID argument, String argumentName)
   at Sitecore.Data.Templates.Template.DoGetField(ID fieldID, String fieldName, Stack`1 stack)
   at Sitecore.Data.Templates.Template.GetField(ID fieldID)

Digging In

I needed to understand why this exception was being thrown, and started down the path of decompiling Sitecore's assemblies. My starting point for reviewing the code was Sitecore.sitecore.admin.RemoveBrokenLinks.cs, which is the code-behind for the Broken Links Removal page.

I took all the code and pasted it into my own ASPX page so that I could throw in a break point and debug what was going on. After a lot of trial and error and a ton of logging, I discovered that the code throwing the error existed in the FixBrokenLinksInDatabase method, on line 11 shown below:

If the Source Field ID / "itemLink.SourceFieldID" on line 11 is null (this is the field where it has determined that there is a broken link), the exception noted above will be thrown.

The Cause of the Null Source Field

During my investigation, I determined that the cause of this field being null was due to the item being created from a branch template that no longer existed.

To put this another way, the target item represented as the sourceItem in the code above (line 8) had a reference to a branch template that no longer existed, and the lookup for the item was returning a null source field.

Through my code logging and Content Editor validation, I found that we had a massive amount of broken links caused by a developer deleting several EXM branch templates:



Searches on Stack Exchange and the Sitecore Community uncovered some decent information regarding this type of issue, and how to solve it manually by running a SQL query:

https://community.sitecore.net/developers/f/8/t/1784

https://sitecore.stackexchange.com/questions/88/how-do-i-fix-a-broken-created-from-reference-when-the-branch-no-longer-exists/89

Now, to fix this problem automatically using the tool, I just needed to add a null check in the code, and also create a way to clean up the references to the invalid branch templates.

Improved Broken Links Tool

The outcome of my work was an improved Broken Links Removal tool that I call the "Broken Links Eraser".

The tool does everything that the Sitecore Broken Links Removal tool does, with the following improvements:

  • Detects and removes item references to branch templates that no longer exist.
  • Removes all invalid item field references to other items (inspects all fields that contain an id).
  • Allows you to target broken links using a target path, so you don't have to run through every item in the target database. This is useful when working with large sets of content.
  • Has detailed logging while it is running and feedback after it has completed. 

The tool is built as a standalone ASPX page, so you can simply drop the file in your {webroot}/sitecore/admin folder to use it. No need to deploy assemblies and recycle app pools etc.


All updates were made using Sitecore's SqlDataApi, so the code is consistent with Sitecore's standards. The code is available on GitHub for you to download and modify as needed:



Final Thoughts

I hope that you find this tool useful in solving your broken link issues. Please feel free to add comments or contact me with any questions on either Sitecore Slack or Twitter.


Going to Production with Sitecore 9.1 on Azure PaaS: Critical Patches Required For Stability

After spending several months upgrading our custom solution to Sitecore 9.1, and launching on Azure PaaS, I have learned a lot about what it takes to eventually see the sunshine between those stormy clouds.

This is the first of a series of posts intended to help you and your team make the transition as smooth as possible.



Critical Patches

There are several patches and things that you will need to deploy that are imperative to your success on Azure PaaS.


High CPU - Excessive Thread Consumption

Sitecore's traditional server roles (Content Management, Content Delivery, etc.) operate in a synchronous context, while xConnect operations are asynchronous. Therefore, communication between your traditional Sitecore servers and xConnect is performed in a synchronous-to-asynchronous context.

This sync to async operation requires double the number of threads on the sync side in order to do the job.  This could result in there not being enough threads available to unblock the main thread.

Sitecore handled this excessive threading problem in their application code by building a custom thread scheduler. What this does is take advantage of a blocked thread to execute the operation, thus reducing the need for the additional thread, and making this synchronous to asynchronous context more efficient.

Great stuff right? Well, the problem that everyone will be faced with is that if you are not using an exact version of the System.Net.Http library, this thread scheduler simply doesn't work!

New versions of System.Net.Http don't respect the custom thread schedulers that Sitecore has built.

With the configurations that are shipped with Sitecore 9.x, the application uses the Global Assembly Cache to reference System.Net.Http, and 9 times out of 10, it will be a newer version of this library.

Without this thread scheduler working, you will end up with high CPU due to thread blocking, and your application will start failing to respond to incoming http requests.

In my case, I saw blocking appear in session end pipelines, and also in some calls on my Content Management server when working with EXM and contacts.

More detail about this issue, and the fix, is described in this article: https://kb.sitecore.net/articles/327701

When you read the article, you would think that it doesn't apply to you because it is referring to .NET 4.7.2, and if you are working with Sitecore 9.x, the application ships using 4.7.1.

The truth is that it does! You need to perform the following actions in order to fix the threading problem:

1. Apply the binding redirect to your web.config to force Sitecore to use System.Net.Http version 4.2.0.0 mentioned in the article:
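The redirect in question takes the standard assemblyBinding form shown below. Verify the version and public key token against the KB article before using this; it is reproduced here from memory of the standard System.Net.Http redirect, not copied from the article.

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```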


2. Deploy the System.Net.Http version 4.2.0.0 to the bin folder on all your traditional Sitecore instances.

NOTE: Make sure you remove any duplicate System.Net.Http binding redirect entries in your web.config, and that you only have the one described above.

Reference Data

First Issue

The first patch you need adds the ability to configure cache sizes and expiration times for the UserAgentDictionaryCache, ReferringSitesDictionary, and GeoIpDataDictionary, as well as the size of the ReferenceDataClientDictionary cache. Without this patch, you will see high DTU (up to 100%) on your Reference Data database, as there is a bug that allows the cache size to grow enormously, which leads to performance issues and shutdowns.

In order to fix the issue, you need to review the following KB article: https://kb.sitecore.net/articles/067230

In our 9.1 instance, I used the 9.0.1.2 version of the patch.

Second Issue

The first patch alone is not enough to fix your Reference Data woes. There is another set of Stored Procedure performance issues that surface when querying the Reference Data database.

You will need to download and execute the following SQL scripts in order to fix this issue:

Redis Session Provider

First Issue

If you are on Azure PaaS, you will most definitely be using Redis as your Out of Proc Session State Provider.

Patch 210408 is critical for the stability of session state in your environment https://kb.sitecore.net/articles/464570

This patch limits the number of worker threads per CPU core and also reserves threads so they can handle session-end requests/threads with as little delay as possible. Reading between the lines, this patch simply handles the Redis timeout issue more gracefully.

Without this, you will see session end events using all the threads, leaving no room to handle incoming http requests. After hanging for some time, requests eventually end up with a 502 error due to a timeout.

After applying the patch, the timeout settings referenced in this KB article will need to be updated in both your web.config and Sitecore.Analytics.Tracking.config. You also want to update your pollingInterval to 60 seconds to reduce the stress on your Redis instance.

Note: Depending on how much traffic your site takes on, you may need to adjust the patch settings in order to free up more threads.

So for example, you can take the original settings and add a multiplication factor of 3 or 4. As I mentioned before, this will be up to you to determine, based on your observed load.

Example with multiplication factor of 3:


For my shared session tracker update, I created a patch file like the following:


Second Issue

Gabe Streza has a great post regarding the symptoms experienced when Redis instances powering your session state are under load: https://www.sitecoregabe.com/2019/02/redis-dead-redemption-redis-cache.html

It's important to read through his post, and also Sitecore's KB article: https://kb.sitecore.net/articles/858026

What both are basically saying is that you will need to create a new Redis instance in Azure, so that you can split your private sessions and shared sessions. So, to be clear, you will have one Redis Instance to handle private sessions and another to handle shared sessions.

I decided to keep my existing Redis instance to handle shared sessions, and used the new Redis instance to handle private sessions.

Similar to Gabe's steps, I created a new redis.sessions.private entry in the ConnectionString.config.

I then updated my Session State provider in my web.config to the following:
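The original screenshot is not reproduced here, but the provider entry takes roughly the shape below. The type name matches Sitecore's shipped Redis session provider; the attribute values are placeholders, so set the connection string name, polling interval, and application name per the KB article's guidance.

```xml
<sessionState mode="Custom" customProvider="redis" timeout="20">
  <providers>
    <add name="redis"
         type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
         connectionString="redis.sessions.private"
         pollingInterval="60"
         applicationName="private" />
  </providers>
</sessionState>
```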

Final Thoughts 

These fixes have made a night and day difference on the stability of our high traffic 9.1 sites on Azure PaaS.

Feel free to reach out to me on Twitter or Sitecore Slack if you have any questions.

Demystifying Pools and Threads to Optimize and Troubleshoot Your Sitecore Application


Background

Whether you are a .NET application developer working on Sitecore or not, understanding how the Microsoft .NET Common Language Runtime (CLR) Thread Pool works will help you determine how to configure your application for optimal performance, and help you troubleshoot issues that may present themselves in high-traffic production environments.

This topic has been of great interest to me, and understanding it has helped me troubleshoot and solve many difficult problems within the Sitecore realm.

I am hoping that this post gives fellow Sitecore developers who may not be familiar with the inner workings of the .NET CLR and Thread Pool a starting point for understanding where potential threading issues may occur if the application they support shows symptoms similar to those I intend to discuss.



Thread Pool and Threads 

To put it simply, a thread pool is a group of warmed up threads that are ready to be assigned work to process. 

The CLR Thread Pool contains 2 types of threads that have different roles.

1) Worker Threads 

Worker threads are threads that process HTTP requests that come into your web server - basically they handle and process your application's logic. 

2) Input/Output (I/O) Completion Port or IOCP Threads 

These threads handle communication from your application's code to a network type resource, like a database or web service.

There is really no technical difference between worker threads and IOCP threads. The CLR Thread Pool keeps separate pools of each simply to avoid a situation where high demand on worker threads exhausts all the threads available to dispatch native I/O callbacks, potentially leading to a deadlock. However, this can still occur under certain circumstances.

Out of the Box / Default Thread Pool Thread Counts 

Minimums 

By default, the number of Worker and IOCP threads that your Thread Pool will have ready for work is determined by the number of processors your server has.

Min Formula: Processor Count =  Thread Pool Worker Threads = Thread Pool IOCP Threads

Example: If you have a server with 8 CPUs, you will start with only 8 worker and 8 IOCP threads.

Maximums 

By default, the maximum number of Worker and IOCP threads is 20 per processor.

Max Formula: Processor Count * 20 =  Max Thread Pool Worker Threads = Max Thread Pool IOCP Threads

Example: If you have a server with 8 CPUs, the default max worker and IOCP threads will be 20 x 8 = 160.
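Expressed as code, the two default formulas look like this (the helper names are illustrative, not actual ThreadPool APIs):

```python
def default_min_threads(cpu_count):
    # Default minimum: one worker thread and one IOCP thread per processor.
    return cpu_count

def default_max_threads(cpu_count):
    # Default maximum: 20 worker threads and 20 IOCP threads per processor.
    return cpu_count * 20

# The 8-CPU example from the text:
assert default_min_threads(8) == 8
assert default_max_threads(8) == 160
```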

Safety Switch 

The Thread Pool WILL NOT inject new threads when CPU usage is above 80%. This is a safety mechanism to prevent overloading the CPU.

The Thread Pool In Action

As requests come into your web server, the Thread Pool will inject new worker or I/O completion threads when all the other threads are busy until it reaches the "Minimum" number for each type of thread.

After this "Minimum" has been reached, the Thread Pool will throttle the rate at which it injects new threads, adding or removing only 1 thread per 500ms (2 threads per second), or reusing a thread as soon as it completes its work and becomes free, whichever comes first.

Through its "hill climbing" algorithm, it is self-tuning and will stop adding threads, and remove them, if they are not actually helping improve throughput. Thread injection will continue while there is still work to be done, until the "Maximum" for each thread type has been reached.

As the number of requests is reduced, the threads in the Thread Pool start timing out waiting for new work (if an existing thread stays idle for 15 seconds), and will eventually retire themselves until the pool shrinks back to the minimum.

"Bursty" Web Traffic, Thread Starvation and 503 Service Unavailable

Let's say you have your Sitecore site running on an untuned, single Content Delivery server that has 8 processors with the default Thread Pool thread settings. For the sake of the simple example, let's assume we have an under-powered web service (perhaps used for looking up customer information from a backend CRM system) that under heavy load takes 5 seconds to provide a response to a request. Our developers have not implemented asynchronous programming in this example, and use the HttpWebRequest class.

We start out with 8 warmed up and ready worker and IOCP threads in our Thread Pool.

Now, let's say we have a burst of 100 visitors accessing different pages (pages that consume the web service) on our site at the same time. The Thread Pool will quickly assign the 8 threads to handle the first 8 requests, which will be busy for the next 5 seconds, while the other 92 sit in a queue. As you can see, it will take many 500ms intervals to catch up with the workload. IIS will wait some time for threads to become free so that the queued requests can be processed. If any thread becomes free in the waiting time, it will be used to process a request. Otherwise IIS will return a 503 Service Unavailable error. Both the slow web service and the untuned Thread Pool will result in some unhappy visitors seeing the 503 error message.

Looking at this a bit closer, a call to a web service uses one worker thread to execute the code that sends the request and one IOCP thread to receive the callback from the web service. In our case, the Thread Pool is completely saturated with work, and so the callback can never get executed because the items that were queued in the thread pool were blocked.

This problem is called Thread Pool Starvation - we have a "hungry" queue waiting to be served threads from the pool to perform some work, but none are available.

This example is a good reason for using asynchronous programming. With async programming, threads aren’t blocked while requests are being handled, so the threads would be freed up almost immediately.
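The burst scenario above can be modeled with a rough simulation. This is illustrative only: real thread injection is adaptive rather than a fixed one-per-500ms rule, but the model shows why a larger warm minimum drains the burst so much faster.

```python
def time_to_drain(requests, min_threads, service_seconds=5.0, tick=0.5):
    # Rough model of the burst above: each request holds a thread for
    # service_seconds; once all threads are busy, the pool injects roughly
    # one new thread per 500ms tick while work remains queued.
    threads = min_threads
    queued = requests
    in_flight = []  # completion times of requests currently on a thread
    now = 0.0
    while queued or in_flight:
        in_flight = [t for t in in_flight if t > now]  # retire finished work
        while queued and len(in_flight) < threads:
            in_flight.append(now + service_seconds)
            queued -= 1
        if queued:
            threads += 1  # thread injection: +1 per tick under backlog
        now += tick
    return now

# 100 simultaneous requests, 5 seconds each:
print(time_to_drain(100, min_threads=8))    # untuned pool starting at 8
print(time_to_drain(100, min_threads=100))  # pre-warmed minimum of 100
```

Raising the minimum so the whole burst is dispatched immediately brings total drain time down to little more than one service interval, which is exactly the effect the minWorkerThreads tuning discussed below aims for.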

Optimizing Thread Settings 

The ability to tune / manage thread settings has been available in the .NET framework for ages - since v1.1 actually.

Arguably, the most important settings are minWorkerThreads and minIOThreads, where you can specify the minimum number of threads that are available to your application's Thread Pool out of the gate (overriding the default formulas based on processor count described above).

Threads that are controlled by these settings can be created at a much faster rate (because they are spawned from the Thread Pool), than worker threads that are created from the CLR's default "thread-tuning" capabilities - 1 thread per 500ms / 2 threads per second when all available threads in the pool are busy.

These and other important thread settings can be set in either your server's machine configuration file (in the \WINDOWS\Microsoft.Net\Framework\vXXXX\CONFIG directory) or with the Thread Pool API.
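As a sketch, the machine.config route looks like the fragment below. The numbers are placeholders, not recommendations; note that under processModel these values are applied per CPU, and autoConfig must be disabled for explicit values to take effect.

```xml
<system.web>
  <processModel autoConfig="false"
                minWorkerThreads="50"
                minIoThreads="50"
                maxWorkerThreads="100"
                maxIoThreads="100" />
</system.web>
```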

Beware: Out-of-Process Session State and Redis Client  

Out-of-Process Session State

If you are using Out-of-Process Session State in your Sitecore environment, you need to tune your Thread Pool!

Each of your Sitecore Content Delivery instances are individually configured to query expired sessions from your session store. This mechanism will add a ton of additional request overhead to your CD instances, and if your Thread Pools aren't tuned to handle this, you will find yourself in a Thread Starvation situation.

For more background on how and why this happens, please check out Ivan Sharamok's great post: http://blog.sharamok.com/2018-04-07/prepare-cd-for-experience-data-collection

Redis Client

If you are running your Sitecore environments on Microsoft Azure, you will be using Redis for session management. Sitecore makes use of the StackExchange.Redis client within the platform. Even though the client is built for high performance, it gets finicky if your Thread Pool threads are all busy, the "minimum" has been reached, and thread injection slows down. You will start seeing Redis service request timeouts.

It is important for you to go through a Thread Pool tuning exercise to ensure that you don't run into Thread Starvation issues.

The nice thing is that the client prints Thread Pool statistics to your logs with details about worker and IOCP threads, to help you with your tuning exercise.
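A timeout message from the client looks roughly like the line below (the exact fields vary by client version; this shape is described in the Microsoft FAQ linked underneath). The key diagnostic is comparing Busy against Min for each pool: when Busy exceeds Min, new threads are only being injected at the throttled rate.

```
Timeout performing GET MySessionKey, inst: 2, queue: 30,
IOCP: (Busy=6,Free=994,Min=8,Max=1000),
WORKER: (Busy=170,Free=19830,Min=8,Max=20000)
```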

For more details, follow this Microsoft Redis FAQ link: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-faq#important-details-about-threadpool-growth

Self-adjusting Thread Settings 

Lucky for us on Sitecore 9 and above, there is a pipeline processor that allows the application to adjust thread limits dynamically based on real-time thread availability (using the Thread Pool API).

By default, every 500 milliseconds, the processor will keep adding 50 to the minWorkerThreads setting via the Thread Pool API until it determines that the minimum number of threads is adequate based on available threads.

In my next post, I intend to explore this processor in detail and provide information on its self-tuning abilities.

Sitecore xDB - Optimizing Your xDB Index Rebuild For Speed

Having performed Sitecore xDB index rebuilds many times with large data sets, I wanted to share some key tips to ensure a successful and speedy rebuild.

My experience has been on Azure PaaS, with both Azure Search and SolrCloud, but these techniques can be applied to on-premise and IaaS as well.

Disable Your Active Index Worker

The first thing you want to do is disable the indexing of live data to reduce the load on your shard databases. This can be achieved by going to your active xConnect Search app service, navigating to Settings and WebJobs, and then clicking the stop button.

Please note that it is not recommended to disable your indexer for more than 5 days to avoid synchronization issues between your shard databases and the xDB index.

It is possible to increase this retention period, but before thinking about this, please review this article for more information regarding the change tracking mechanism:  https://doc.sitecore.com/developers/91/sitecore-experience-platform/en/change-tracking-retention.html

Change The Log Level

Before starting a large rebuild job, it's important to enable the proper logging in case you need to investigate any issues that may arise. By default, the log level is set to Warning. Change it to Information.

Navigate to the App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\CoreServices\sc.Serilog.xml file, and change the MinimumLevel to Information: 
   <MinimumLevel>
      <Default>Information</Default>
    </MinimumLevel>

Optimize Your Indexer Batch Size

Don't be overeager with your indexer's batch size setting. This setting determines how many contacts or interactions are loaded per parallel stream during an index rebuild. This setting is found in the following location:
App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\sc.Xdb.Collection.IndexerSettings.xml file:
<BatchSize>1000</BatchSize>

I have had success reducing this size to 500, as it helps execution go faster and prevents you from hitting timeouts during your rebuild when your shard databases are under heavy load.

Optimize Your Databases

Optimize, optimize, optimize your shard databases!!!

If you are not already using the AzureSQLMaintenance Stored Procedure on your Sitecore databases, do it today!

This is critically important for not only your shard databases but also your other Sitecore databases like Core, Master and Web.  Marketing Automation and Reference databases also get hammered pretty hard, so make sure that this gets applied and run regularly on these.

Note that this maintenance is 100% necessary. As Grant Killian says: "The expectation that Azure SQL is a 'fully managed solution' is somewhat misleading, as rebuilding query stats and defragmentation are the user’s responsibility."

Sitecore recommends an approach like this: https://techcommunity.microsoft.com/t5/Azure-Database-Support-Blog/Automating-Azure-SQL-DB-index-and-statistics-maintenance-using/ba-p/368974

Schedule an Azure Automation “Runbook” to attend to this after hours for all Sitecore databases.
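As a sketch, assuming the AzureSQLMaintenance stored procedure from the linked article is installed under its default name and signature, the runbook step boils down to executing the following against each database:

```sql
-- 'smart' mode only touches indexes and statistics that actually need
-- attention; @LogToTable = 1 records each operation for later review.
-- (Parameter names per the linked Microsoft script; verify against the
-- version you install.)
EXEC dbo.AzureSQLMaintenance @operation = 'all', @mode = 'smart', @LogToTable = 1;
```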

Run the Rebuild

After all these things have been completed, run the xDB rebuild as described in Sitecore's documentation.  In Kudu, go to site\wwwroot\App_data\jobs\continuous\IndexWorker and execute this command:

.\XConnectSearchIndexer.exe -rr

After this, you will see the magic start: the document count in your inactive / rebuild index is first reset to 0, and then the counts gradually increase.

Unfortunately, there is no progress indicator to show how far along you are. There is, however, a query that you can run against your Azure Search or Solr indexes to check the rebuild status.

Azure Search Query
$filter=id eq 'xdb-rebuild-status'&$select=id,rebuildstate

Solr Query
id:"xdb-rebuild-status"

Returned Index Rebuild Status:
Default = 0
RebuildRequested = 1
Starting = 2
RebuildingExistingData = 3
RebuildingIncomingChanges = 4
Finishing = 5
Finished = 6
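For convenience, the status values above can be wrapped in a small lookup helper. This is a hypothetical Python sketch of my own; the state names simply mirror the table above:

```python
# Map of xdb-rebuild-status values to their meanings.
REBUILD_STATES = {
    0: "Default",
    1: "RebuildRequested",
    2: "Starting",
    3: "RebuildingExistingData",
    4: "RebuildingIncomingChanges",
    5: "Finishing",
    6: "Finished",
}

def describe_rebuild_state(code: int) -> str:
    """Translate a raw rebuildstate value into a readable label."""
    return REBUILD_STATES.get(code, f"Unknown ({code})")
```

You can feed this the `rebuildstate` value returned by either the Azure Search or Solr query to make polling output readable.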

Post Rebuild Tasks

After your rebuild has completed (i.e., you see a status of 6 in your query), you can go ahead and start up the IndexWorker WebJob on your xConnect Search app service.

At this point, you have successfully completed the rebuild, and new data will start flowing into your xDB index.

Sitecore xDB - Troubleshooting xDB Index Rebuilds on Azure

In my previous post, I shared some important tips to help ensure that if you are faced with an xDB index rebuild, you can get it done successfully and as quickly as possible.

I covered a lot of ground in that post, but now I want to go over the common reasons why things can go wrong, and highlight the most critical items that impact rebuild speed and stability.


Causes of Need To Rebuild xDB Index

Your xDB relies on your shard databases' SQL Server change tracking feature to ensure that the index stays in sync. The Retention Period setting determines how long changes are stored in SQL; as mentioned in Sitecore's docs, it is set to 5 days for each collection shard. 

So, why would 5-day old data not be indexed in time?
  • The Search Indexer is shut down for too long
  • Live indexing is stuck for too long
  • Live indexing falls too far behind

Causes of Indexing Being Stuck or Falling Behind, and Rebuild Failures

High Resource Utilization: Collection Shards 
99% of the time, this is due to high resource utilization on your shard databases. Basically, if you see your shard databases running above 80% DTU utilization, you will run into this problem.

High Resource Utilization: Azure Search or Solr
If you have a lot of data, you need to scale your Azure Search Service or Solr instance.  Sharding is the answer, and I will touch on this further down.

What to check?

If you are on Azure, make sure your xConnect Search Indexer WebJob is running.
Most importantly, check your xConnect Search Indexer logs for SQL timeouts. 

On Azure, the WebJob logs are found in this location: D:\local\Temp\jobs\continuous\IndexWorker\{randomjobname}\App_data\Logs

Key Ingredients For Rebuild Indexing Speed and Stability

SQL Collection Shards

Database Health 

Maintaining the database indexes and statistics is critically important. As I mentioned in my previous post:  "Optimize, optimize, optimize your shard databases!!!" 

If you are preparing for a rebuild, make sure that you run the AzureSQLMaintenance Stored Procedure on all of your shard databases.

Database Size

The amount of data and the number of collection shards is directly related to resource utilization and rebuild speed and stability. 

Unfortunately, there is no supported way to "reshard" your databases after the fact. We are hoping this will be a feature that is added to a future Sitecore release.

xDB Search Index

Similar to the collection shards, the amount of data and the number of shards is directly related to resource utilization on both Azure Search and Solr. 

Specifically on Solr, you will see high JVM heap utilization.

If your rebuilds are slowing down or failing, or even if search performance on your xDB index is deteriorating, it's most likely due to the amount of data in your index, the number of shards, and how they are distributed amongst the nodes you have set up.  

Search index sharding strategies can be pretty complex, and I might touch on these in a later post.

Reduce Your Indexer Batch Size

Another item that I mentioned in my previous post. If you drop this down from 1000 to 500 and you are still having trouble, reduce it even further. 

I have dropped the batch size to 250 on large databases to reduce the chance of timeouts (default is 30 seconds) when the indexer is reading contacts and interactions from the collection shards.

