What is Data Skew in Salesforce?

Overview

Data skew in Salesforce refers to situations where a large number of records are related to a single record (or a small set of records) in a way that causes performance, locking, or sharing problems. It is an important operational concept for architects, admins, and developers because skew can create hotspots that degrade system behavior during record updates, bulk operations, or sharing recalculations.

Types of Data Skew

There are three common types of data skew in Salesforce:

1) Ownership Skew

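Occurs when a single user or queue owns a very large number of records of one object. Salesforce guidance treats roughly 10,000 records per owner as the point where this becomes a concern, and the problem grows with volume. Changing that owner's role or group membership forces sharing to be recalculated for every record they own, and bulk updates or ownership changes on those records can run into record-lock contention.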

2) Lookup Skew

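Happens when a very large number of child records reference the same parent record through a lookup field (master-detail relationships behave similarly). Inserting or updating a child generally takes a lock on the referenced parent record as well, so concurrent updates to children of the same parent compete for that one lock.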

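3) Account Data Skew (Sharing-Related Skew)

Salesforce's documentation names account data skew as the third type: a single account with a very large number of child records such as contacts, opportunities, or cases (more than about 10,000). Updating a child can lock the parent account, and because access to the children is tied to the account, changing the account's owner forces sharing to be re-evaluated for every child. A related problem arises when many users or records fall into the same role or sharing grouping: changes to role or group membership, or to sharing rules, then trigger large-scale sharing recalculations.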

Symptoms of Data Skew

Common signs you have data skew include:

– Slow bulk data operations (Insert/Update/Delete) or frequent timeouts.

– Lock contention, typically surfacing as UNABLE_TO_LOCK_ROW (“unable to obtain exclusive access to this record”) errors during mass updates or integrations.

– Long-running sharing recalculations or sudden spikes in processing time when a parent or owner changes.

Why Data Skew Causes Problems

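Salesforce locks a record while it is being updated, and in many cases it also locks related records, such as the parent that a child's lookup points to, until the transaction finishes. When many records share the same owner or parent, concurrent operations queue up behind the same locks and create a hotspot; a transaction that waits too long for a lock (about 10 seconds) fails with a lock error. Sharing calculations and trigger logic lengthen each transaction, so locks are held longer and the contention gets worse during bulk operations.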
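
To make the hotspot idea concrete, here is a minimal sketch in Python that models a bulk load rather than calling any Salesforce API. It assumes child records are split into batches that run in parallel and reports which parents are touched by more than one batch. The function name `contended_parents` and the sample IDs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def contended_parents(batches):
    """Return parents touched by more than one batch, mapped to batch indexes.

    Each batch is a list of (child_id, parent_id) tuples. A parent that shows
    up in several batches is a potential lock hotspot if those batches run
    at the same time.
    """
    seen = defaultdict(set)
    for index, batch in enumerate(batches):
        for _child_id, parent_id in batch:
            seen[parent_id].add(index)
    return {parent: sorted(idx) for parent, idx in seen.items() if len(idx) > 1}


# Hypothetical data: five children of one skewed parent "P1", one child of "P2".
children = [("c1", "P1"), ("c2", "P1"), ("c3", "P1"),
            ("c4", "P1"), ("c5", "P1"), ("c6", "P2")]

# Deal the records round-robin into three batches, as an unordered load might.
parallel_batches = [children[i::3] for i in range(3)]

hotspots = contended_parents(parallel_batches)
print(hotspots)
```

In this toy data set every batch contains a child of "P1", so all three batches would compete for the same parent lock, while "P2" is touched by only one batch and causes no contention.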

How to Detect Data Skew

Start by querying counts grouped by the suspect field. For example, to find ownership skew for Accounts:

SELECT OwnerId, COUNT(Id) FROM Account GROUP BY OwnerId ORDER BY COUNT(Id) DESC LIMIT 10

For lookup skew, group child records by the lookup field:

SELECT Parent__c, COUNT(Id) FROM Child__c GROUP BY Parent__c ORDER BY COUNT(Id) DESC LIMIT 10
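
When the object is too large to aggregate comfortably in a single query, the same count can be run over an export instead. The following is a rough sketch, assuming the records have already been exported (for example to CSV) and loaded as a list of dictionaries. The helper name `find_skewed`, the sample IDs, and the default threshold of 10,000 are assumptions for illustration, not values mandated by the platform.

```python
from collections import Counter


def find_skewed(rows, field, threshold=10_000):
    """Return (value, count) pairs whose count exceeds the threshold.

    Rows with an empty value for the field are ignored. Results are ordered
    from the most concentrated value downwards.
    """
    counts = Counter(row[field] for row in rows if row.get(field))
    return [(value, n) for value, n in counts.most_common() if n > threshold]


# Hypothetical export: owner "005A" holds five records, "005B" holds two.
rows = ([{"Id": f"a{i}", "OwnerId": "005A"} for i in range(5)]
        + [{"Id": f"b{i}", "OwnerId": "005B"} for i in range(2)]
        + [{"Id": "c0", "OwnerId": None}])

# A tiny threshold is used here only so the sample data triggers a result.
skewed = find_skewed(rows, "OwnerId", threshold=3)
print(skewed)
```

Passing `"Parent__c"` as the field gives the lookup-skew version of the same check.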

Mitigation and Best Practices

– Distribute ownership: Avoid assigning huge record volumes to a single user. Use queues, multiple service users, or automated ownership assignment to spread load.

– Reconsider data model: If a parent has millions of children, consider archiving old data, splitting into multiple parent records, or using external objects / Big Objects for historical data.

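– Use asynchronous processing: Use Batch Apex, Queueable Apex, or the Bulk API to perform large updates in smaller chunks. Chunking alone does not prevent contention if chunks that touch the same parent or owner run in parallel, so order records by parent before loading, or run Bulk API jobs in serial mode when parallel batches hit lock errors.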

– Minimize triggers & synchronous sharing calculations: Keep trigger logic efficient and avoid unnecessary DML that causes cascading locking. Where possible, defer heavy operations to async processes.

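– Monitor and index: Owner and relationship fields are indexed by default, so the grouping queries above need no extra indexes, but an index does not relieve skew itself. Rerun those counts on a schedule and monitor org usage to catch growing skew before it causes errors.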
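
One way to apply the chunking advice above is to sort records by their parent before splitting them into batches, so that children of the same parent travel together instead of being scattered across parallel batches. This is a minimal sketch in Python operating on already-extracted records. The function name `batches_by_parent`, the `Parent__c` field, and the batch size are illustrative assumptions.

```python
def batches_by_parent(records, parent_field, batch_size):
    """Sort records by parent, then split them into fixed-size batches.

    Assumes every record has a non-empty value for parent_field. Sorting keeps
    children of the same parent adjacent, which reduces the number of batches
    that touch any given parent.
    """
    ordered = sorted(records, key=lambda r: r[parent_field])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Hypothetical child records whose parents alternate in the source order.
records = [
    {"Id": "c1", "Parent__c": "P2"},
    {"Id": "c2", "Parent__c": "P1"},
    {"Id": "c3", "Parent__c": "P2"},
    {"Id": "c4", "Parent__c": "P1"},
]

batches = batches_by_parent(records, "Parent__c", batch_size=2)
parents_per_batch = [{r["Parent__c"] for r in batch} for batch in batches]
print(parents_per_batch)
```

A parent with more children than the batch size will still span consecutive batches, so those batches would need to run one after another rather than in parallel.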

When to Involve Salesforce Support

If you consistently see locking errors or performance degradation tied to skew and your mitigation attempts aren’t enough, open a case with Salesforce Support. They can analyze internal locking patterns, recommend org-specific fixes, or help with very large data scenarios.

Summary

Data skew is an operational anti-pattern where excessive concentration of relationships (ownership, lookup, or sharing) causes locking, sharing, and performance issues. Detect skew by grouping and counting records, and mitigate it by redistributing ownership, redesigning the data model, using async processing, and minimizing synchronous sharing or trigger-heavy operations.
