What is Data Skew in Salesforce?

Understanding Data Skew in Salesforce

Data skew in Salesforce refers to an imbalance in the distribution of data that causes performance degradation, unexpected record locking, or sharing calculation overhead. When too many records are related to a single user or parent record (or share the same value on a key field), common platform operations such as updates, sharing recalculations, or batch processing can slow down or fail.

Common types of data skew

There are three common forms of data skew you should watch for:

1. Ownership skew

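Occurs when a single user owns a very large number of records of one object (for example, one user owning 100k+ Account or Case records; Salesforce guidance commonly flags more than 10,000 records per owner). Changing that user's role, or moving the user within the role hierarchy, forces sharing recalculation across all of those records, and updates to the records themselves (owner-field changes, rollups) contend for the same locks.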

2. Lookup (parent) skew

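Happens when many child records reference the same parent record (e.g., tens or hundreds of thousands of Contact records pointing to one Account). Because Salesforce locks the referenced parent whenever a child with that lookup is inserted or updated, concurrent operations on those children compete for a single lock, which leads to record-locking errors and slow queries.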

3. Sharing or ownership-group skew (role/group skew)

Occurs when sharing rules or sharing computations affect a disproportionately large number of records (for example, a public group or role associated with a massive record set), so recalculating sharing after a rule or membership change is expensive and slow.

Why data skew matters

Consequences include:

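  • Record locking and update failures (for example, UNABLE_TO_LOCK_ROW errors), plus governor-limit exceptions (System.LimitException) in long-running transactions
  • Slow DML operations and long transaction times
  • Non-selective queries and poor SOQL performance when filters target a heavily skewed value
  • Expensive sharing recalculations and Apex CPU-time or timeout errors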

How to identify data skew

Use these approaches:

  • Run reports or SOQL to count records per owner or per parent. Example SOQL to find the top owners of Case records:
    SELECT OwnerId, COUNT(Id) c FROM Case GROUP BY OwnerId ORDER BY COUNT(Id) DESC LIMIT 50
  • Use admin tools such as Salesforce reports, Workbench, or the Developer Console to analyze record distribution
  • Monitor Apex exceptions and locking errors in debug logs or the Apex Jobs queue
  • Watch sharing recalculation jobs and the time they take
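
To repeat these checks from code, the counting SOQL can be wrapped in Apex. This is a minimal sketch, not a Salesforce-provided utility: the class `SkewMonitor` is hypothetical, and the 10,000-record cutoff follows commonly cited Salesforce guidance for ownership and lookup skew, so adjust it for your org. Aggregate queries count every aggregated row toward the 50,000-row query limit in Apex, so on objects larger than that the same SOQL may be better run from Workbench.

```apex
// Hypothetical helper: lists owners and parent records above a skew threshold.
// "without sharing" so the counts include records the running user cannot see.
public without sharing class SkewMonitor {

    // Owners (users or queues) holding more than 10,000 Case records: ownership skew.
    public static Map<Id, Integer> skewedCaseOwners() {
        Map<Id, Integer> counts = new Map<Id, Integer>();
        for (AggregateResult ar : [
            SELECT OwnerId, COUNT(Id) recordCount
            FROM Case
            GROUP BY OwnerId
            HAVING COUNT(Id) > 10000
        ]) {
            counts.put((Id) ar.get('OwnerId'), (Integer) ar.get('recordCount'));
        }
        return counts;
    }

    // Accounts with more than 10,000 Contacts: lookup (parent) skew.
    // Private Contacts (no AccountId) are excluded from the grouping.
    public static Map<Id, Integer> skewedContactParents() {
        Map<Id, Integer> counts = new Map<Id, Integer>();
        for (AggregateResult ar : [
            SELECT AccountId, COUNT(Id) recordCount
            FROM Contact
            WHERE AccountId != null
            GROUP BY AccountId
            HAVING COUNT(Id) > 10000
        ]) {
            counts.put((Id) ar.get('AccountId'), (Integer) ar.get('recordCount'));
        }
        return counts;
    }
}
```

For example, running `System.debug(SkewMonitor.skewedCaseOwners());` in Anonymous Apex prints each skewed owner Id with its record count.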

Practical ways to prevent and mitigate data skew

1. Distribute ownership

– Avoid assigning a huge volume of records to a single user. Use queues, multiple owners, or programmatic ownership rotation.
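
Programmatic ownership rotation can be as simple as a round-robin assignment in a before-insert trigger. A sketch, assuming a hypothetical `OwnerRotation` class and a pool of active user Ids that you maintain yourself (for example, in custom metadata); it only sets OwnerId in memory and performs no DML of its own.

```apex
// Hypothetical helper: spreads ownership of new records across several users
// instead of assigning them all to one owner.
public class OwnerRotation {

    // Sets OwnerId in round-robin order: record 0 -> owner 0, record 1 -> owner 1, ...
    // wrapping back to the first owner when the pool is exhausted.
    public static void assignRoundRobin(List<SObject> records, List<Id> ownerIds) {
        if (ownerIds == null || ownerIds.isEmpty()) {
            throw new IllegalArgumentException('At least one owner Id is required');
        }
        for (Integer i = 0; i < records.size(); i++) {
            records[i].put('OwnerId', ownerIds[Math.mod(i, ownerIds.size())]);
        }
    }
}
```

In a before-insert trigger, `OwnerRotation.assignRoundRobin(Trigger.new, ownerPool);` would spread each incoming batch across the pool. Because the index restarts at zero in every transaction, small inserts always favor the first owners in the list; persisting a counter between transactions would even that out.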

2. Reparent or shard large parent records

– Split child records across multiple parent records instead of pointing many children to a single parent. For example, create multiple Account records for large customers and distribute Contacts among them.
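
Distributing Contacts across several Accounts for one customer can be sketched as a capacity-based fill. This assumes a hypothetical `ContactBucketer` helper, a list of Account Ids you have already created for that customer, and Accounts that start with no Contacts; the cap is a value you choose.

```apex
// Hypothetical helper: assigns Contacts to a pool of Accounts that represent one
// large customer, filling each Account up to a cap before moving to the next.
// Assumes the Accounts in the pool currently have no Contacts.
public class ContactBucketer {

    public static void assignParents(List<Contact> contacts, List<Id> accountIds, Integer maxPerAccount) {
        if (maxPerAccount == null || maxPerAccount <= 0) {
            throw new IllegalArgumentException('maxPerAccount must be positive');
        }
        if (contacts.size() > accountIds.size() * maxPerAccount) {
            throw new IllegalArgumentException('Not enough Accounts for ' + contacts.size() + ' Contacts');
        }
        for (Integer i = 0; i < contacts.size(); i++) {
            // Integer division: positions 0..cap-1 go to the first Account,
            // cap..2*cap-1 to the second, and so on.
            contacts[i].AccountId = accountIds[i / maxPerAccount];
        }
    }
}
```

Filling each Account to the cap keeps the number of parents in use small; a round-robin assignment would instead keep every parent at roughly the same size.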

3. Batch updates and asynchronous processing

– Perform large updates with Batch Apex using a small scope size (the default is 200; smaller scopes reduce lock contention further), with Queueable Apex, or with Platform Events, so that each transaction locks fewer records.
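
With Batch Apex, the chunk size is the scope argument of Database.executeBatch. For Queueable Apex, a similar effect can be sketched by updating one slice per job and chaining the remainder into a new job, so each transaction holds locks on fewer rows. The class name `ChunkedUpdateJob` and the chunk size are assumptions for illustration.

```apex
// Hypothetical Queueable: updates one chunk of records per job and chains the rest,
// so no single transaction locks the whole set.
public class ChunkedUpdateJob implements Queueable {
    private final List<SObject> pending;
    private final Integer chunkSize;

    public ChunkedUpdateJob(List<SObject> records, Integer chunkSize) {
        this.pending = records;
        this.chunkSize = chunkSize;
    }

    public void execute(QueueableContext context) {
        List<SObject> chunk = new List<SObject>();
        List<SObject> rest = new List<SObject>();
        for (Integer i = 0; i < pending.size(); i++) {
            if (i < chunkSize) {
                chunk.add(pending[i]);
            } else {
                rest.add(pending[i]);
            }
        }
        // allOrNone = false so one locked row does not roll back the whole chunk
        Database.update(chunk, false);

        // Chain the remainder as a new job (a separate transaction).
        // Chaining from an Apex test raises an error, so it is skipped there.
        if (!rest.isEmpty() && !Test.isRunningTest()) {
            System.enqueueJob(new ChunkedUpdateJob(rest, chunkSize));
        }
    }
}
```

A call such as `System.enqueueJob(new ChunkedUpdateJob(records, 100));` starts the chain. The whole record list is serialized into each job, so for hundreds of thousands of rows Batch Apex is usually the better fit.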

4. Use non-owner fields for heavy processing

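– Where possible, avoid DML that updates OwnerId, because every owner change triggers sharing recalculation for the affected records. If you must, do it in carefully controlled batches and consider using Database.update(records, false) so that records that fail, for example with UNABLE_TO_LOCK_ROW, do not roll back the rest of the batch; check the returned SaveResult list to find and retry the failures.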
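
When Database.update(records, false) is used, the failures still need handling. The sketch below assumes a hypothetical `LockAwareUpdate` helper: it relies on the returned SaveResult list being in the same order as the input list, and it returns only the records that failed with UNABLE_TO_LOCK_ROW, leaving the retry policy (for example, a later batch or Queueable job) to the caller. Other errors are not collected for retry.

```apex
// Hypothetical helper: partial-success update that collects lock failures for retry.
public class LockAwareUpdate {

    public static List<SObject> updateAndCollectLockFailures(List<SObject> records) {
        List<SObject> lockFailures = new List<SObject>();
        // allOrNone = false: successful rows are saved even if others fail
        List<Database.SaveResult> results = Database.update(records, false);
        for (Integer i = 0; i < results.size(); i++) {
            if (results[i].isSuccess()) {
                continue;
            }
            // getId() is empty for failed rows, so match by position instead
            for (Database.Error err : results[i].getErrors()) {
                if (err.getStatusCode() == StatusCode.UNABLE_TO_LOCK_ROW) {
                    lockFailures.add(records[i]);
                    break;
                }
            }
        }
        return lockFailures;
    }
}
```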

5. Reduce sharing recalculations

– Minimize complex sharing rules and avoid frequent changes to ownership or to fields used by sharing rules. For large data changes, schedule sharing recalculations during off-peak windows.

6. Index and design selective queries

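– Ensure SOQL queries remain selective by filtering on indexed fields: standard indexed fields, External ID or unique fields, or custom indexes, which Salesforce Support can create on request for fields used in filters. Avoid queries that force full table scans on huge datasets.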
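
As an illustration of keeping a query selective when its main filter targets a skewed parent, the sketch below pairs the AccountId filter with a CreatedDate range, since CreatedDate is one of the fields Salesforce indexes by default. The class `CaseQueries`, the date window, and the 2,000-row LIMIT are assumptions; the Query Plan tool in the Developer Console can confirm whether a given filter is actually selective in your org.

```apex
// Hypothetical example: recent Case records for one (possibly skewed) Account.
public with sharing class CaseQueries {

    // A filter on a skewed AccountId alone may exceed the optimizer's selectivity
    // threshold, so its index may not be used. The CreatedDate range (indexed by
    // default) gives the optimizer a narrower, selective option.
    public static List<Case> recentCasesForAccount(Id accountId, Integer lastNDays) {
        Datetime since = Datetime.now().addDays(-lastNDays);
        return [
            SELECT Id, Subject, CreatedDate
            FROM Case
            WHERE AccountId = :accountId
            AND CreatedDate >= :since
            ORDER BY CreatedDate DESC
            LIMIT 2000
        ];
    }
}
```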

7. Consider architecture changes

– Use external objects, custom lookup strategies, or middleware to handle extremely large datasets. In some cases, denormalization or summary objects (for example, summary tables or reporting snapshots) can take load off transactional objects.

Sample Apex pattern for safer reparenting (bulk, small batches)

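public class ReparentBatch implements Database.Batchable<sObject> {
    private final Id oldParentId;
    private final Id newParentId;
    public ReparentBatch(Id oldParentId, Id newParentId) {
        this.oldParentId = oldParentId;
        this.newParentId = newParentId;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator([SELECT Id, AccountId FROM Contact WHERE AccountId = :oldParentId]);
    }

    public void execute(Database.BatchableContext bc, List<Contact> scope) {
        for (Contact c : scope) {
            c.AccountId = newParentId; // use logic to distribute among multiple parents
        }
        // allOrNone = false: rows that fail (e.g. UNABLE_TO_LOCK_ROW) are reported, the rest still save
        Database.update(scope, false);
    }

    public void finish(Database.BatchableContext bc) {}
}

// The scope size (here 200) sets how many Contacts each execute() call updates:
// Database.executeBatch(new ReparentBatch(oldAccountId, newAccountId), 200);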

Key best practices (summary)

  • Proactively monitor owner and parent distributions
  • Avoid extreme ownership or parent concentration
  • Process large changes asynchronously and in small batches
  • Keep sharing model as simple as possible for high-volume objects
  • Test large-scale operations in a sandbox that mirrors production volumes

Understanding and preventing data skew is critical for maintaining performance and stability in high-volume Salesforce orgs. By monitoring distributions, using proper batching, and designing an ownership/parenting strategy that avoids hot spots, you can reduce locking and sharing issues and keep your org responsive.