What is “Data Skew” in Salesforce?

Understanding Data Skew in Salesforce

Data skew in Salesforce refers to an uneven distribution of records that causes performance problems, lengthy sharing recalculations, or record-locking errors. It typically occurs when a large number of records reference the same parent, owner, or lookup record, creating a hotspot that the platform must repeatedly evaluate or lock.

Why this matters

Data skew can lead to failed deployments, slow queries, governor-limit errors, and frequent UNABLE_TO_LOCK_ROW errors during bulk operations. Understanding and preventing data skew is crucial for scalable Salesforce implementations.

Common types of Data Skew

1. Ownership Skew (Record Ownership Skew)
When a single user (or queue) owns a very large number of records of one object, ownership changes and sharing updates involving that owner can trigger lengthy sharing recalculations and lock contention.

2. Lookup/Parent-Child Skew
When many child records point to the same parent record (for example, many Contact records referencing the same Account), inserting or updating those children requires a lock on the shared parent, so mass child operations, or operations on the parent itself, can become a bottleneck.

3. Campaign Member Skew
A specific case of lookup skew in which a single campaign has a very large number of campaign members, slowing campaign-related operations.

Symptoms and errors

– Repeated UNABLE_TO_LOCK_ROW errors during bulk updates or imports.
– Long-running sharing recalculations after ownership or sharing-rule changes.
– Slow SOQL queries or timeouts when querying by the skewed key.
– Failed batch jobs or asynchronous operations that touch the hotspot records.
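When a load fails with lock errors, the failed rows usually point at the hotspot. As a minimal sketch, assuming you have the failed rows as dictionaries (for example, read from your load tool's error file) containing the suspected key column and an error-message column, you can count lock failures per owner or parent ID. The column name `ERROR` and the helper `lock_hotspots` are assumptions for illustration, not a Salesforce API; the sample rows are made up.

```python
from collections import Counter

LOCK_ERROR = "UNABLE_TO_LOCK_ROW"


def lock_hotspots(failed_rows, key_field, error_field="ERROR"):
    """Count UNABLE_TO_LOCK_ROW failures per value of key_field, most frequent first.

    failed_rows: iterable of dicts, one per failed record.
    key_field:   the owner or lookup field suspected of skew, e.g. "AccountId".
    error_field: column holding the error message (assumed name; adjust to your tool).
    """
    counts = Counter(
        row[key_field]
        for row in failed_rows
        # Only lock failures matter here; other errors (validation, missing fields) are ignored.
        if LOCK_ERROR in row.get(error_field, "")
    )
    # most_common() sorts by count, descending; the top keys are the likely hotspots.
    return counts.most_common()


# Made-up sample rows for illustration.
sample = [
    {"AccountId": "001A", "ERROR": "UNABLE_TO_LOCK_ROW:unable to obtain exclusive access to this record"},
    {"AccountId": "001A", "ERROR": "UNABLE_TO_LOCK_ROW:unable to obtain exclusive access to this record"},
    {"AccountId": "001B", "ERROR": "REQUIRED_FIELD_MISSING:Required fields are missing"},
    {"AccountId": "001C", "ERROR": "UNABLE_TO_LOCK_ROW:unable to obtain exclusive access to this record"},
]
print(lock_hotspots(sample, "AccountId"))  # [('001A', 2), ('001C', 1)]
```

A key that dominates the lock failures is a good candidate to check with the detection queries below.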

Quick detection queries

Use SOQL to find possible ownership skew (adjust threshold as needed):

SELECT OwnerId, COUNT(Id) cnt FROM Account GROUP BY OwnerId HAVING COUNT(Id) > 1000

For parent-child (lookup) skew, query child counts per parent:

SELECT AccountId, COUNT(Id) cnt FROM Contact GROUP BY AccountId HAVING COUNT(Id) > 5000
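If you have exported the records (for example with Data Loader) and want to run the same check outside the org, the GROUP BY ... HAVING logic of both queries above reduces to counting values per key. A minimal sketch, assuming each exported record is a dict keyed by API field name; `skewed_keys` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def skewed_keys(records, field, threshold):
    """Offline version of
        SELECT <field>, COUNT(Id) cnt FROM <object> GROUP BY <field> HAVING COUNT(Id) > <threshold>

    records: iterable of dicts keyed by API field name (e.g. exported Contact rows).
    Returns {key: count} for every key with strictly more than `threshold` records.
    Unlike SOQL, rows with an empty key are skipped instead of forming a null group.
    """
    counts = Counter(r[field] for r in records if r.get(field))
    return {key: n for key, n in counts.items() if n > threshold}


# Made-up sample: 6 contacts under one account, 1 under another, 1 with no account.
contacts = [{"Id": f"003{i}", "AccountId": "001A"} for i in range(6)]
contacts.append({"Id": "0036", "AccountId": "001B"})
contacts.append({"Id": "0037", "AccountId": None})
print(skewed_keys(contacts, "AccountId", 5))  # {'001A': 6}
```

The ownership check is the same call on exported Account rows, e.g. `skewed_keys(accounts, "OwnerId", 1000)`.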

Example scenario (locking issue)

If parallel batches set OwnerId on many records to the same user, each batch may need locks on the same owner-related rows, so the batches contend with one another and some fail with UNABLE_TO_LOCK_ROW errors:

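// Assumes `contacts` is a List<Contact> queried earlier; '005xxxxxxxxxxxx' is a placeholder user ID
for (Contact c : contacts) {
    c.OwnerId = '005xxxxxxxxxxxx'; // same owner for many records
}
update contacts; // may cause locking conflicts when run in concurrent batches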
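To see why parallel batches collide in this scenario, it helps to check which owner (or parent) IDs appear in more than one batch. The sketch below assumes records are split into consecutive, fixed-size batches in file order and that those batches run concurrently; any key it returns is touched by several concurrent batches and is therefore a candidate for lock contention. `keys_spanning_batches` is a hypothetical helper for illustration.

```python
from collections import defaultdict


def keys_spanning_batches(records, field, batch_size):
    """Return {key: [batch indexes]} for values of `field` found in more than one batch.

    Assumes records are cut into consecutive batches of `batch_size` in list order.
    If those batches run in parallel, each returned key is touched by several
    concurrent batches and is a candidate for lock contention.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches_by_key = defaultdict(set)
    for position, record in enumerate(records):
        batches_by_key[record[field]].add(position // batch_size)
    return {key: sorted(b) for key, b in batches_by_key.items() if len(b) > 1}


# The scenario above: every record gets the same (placeholder) owner.
same_owner = [{"Id": f"003{i}", "OwnerId": "005X"} for i in range(5)]
print(keys_spanning_batches(same_owner, "OwnerId", 2))  # {'005X': [0, 1, 2]}
```

Because every record in the scenario carries the same OwnerId, that ID lands in every batch; an empty result would mean no owner or parent ID is shared between batches.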

Mitigation strategies

– Distribute record ownership: avoid assigning extremely large record counts to a single user or queue; split ownership across multiple users (or multiple queues).
– Use asynchronous, serial processing: process records in sequential batches (for example, with the Bulk API's serial concurrency mode) instead of parallel ones, so batches do not compete for the same locks.
– Reduce sharing recalculation: simplify sharing rules, prefer criteria-based sharing where possible, and limit triggers that cause ownership changes.
– Use skinny tables or indexed fields for heavy read operations where applicable (skinny tables are enabled by Salesforce Customer Support).
– De-normalize or archive data: move historical or inactive child records to an external store or an archive org.
– Avoid mass ownership changes during business hours; schedule them during off-peak windows.
– Where appropriate, use automated ownership rotation or create multiple parent records to distribute child counts.
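Ownership rotation and multiple parent records both come down to spreading one field's values across a pool of IDs. A minimal sketch of round-robin assignment applied to records before they are loaded; the pool of user or parent IDs is an assumption you would supply, and `assign_round_robin` is a hypothetical helper.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(records, field, pool):
    """Return copies of `records` with `field` set by cycling through `pool`.

    pool: IDs to spread records across (e.g. user IDs for ownership, or
    parent record IDs for lookups). The input records are not modified.
    """
    if not pool:
        raise ValueError("pool must contain at least one ID")
    # zip stops when records run out, so the infinite cycle() is safe.
    return [{**record, field: value} for record, value in zip(records, cycle(pool))]


# Hypothetical owner IDs; in practice these come from your own user list.
leads = [{"Id": f"00Q{i}"} for i in range(5)]
balanced = assign_round_robin(leads, "OwnerId", ["005A", "005B"])
print(Counter(r["OwnerId"] for r in balanced))  # Counter({'005A': 3, '005B': 2})
```

Round-robin keeps per-ID counts within one record of each other. Whether records may actually be reassigned this way is a business decision, so treat this as a data-preparation step rather than a policy.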

Best practices

– Monitor counts regularly using scheduled reports or the SOQL detection queries above.
– Establish thresholds and alerts (e.g., if any OwnerId has > X records).
– Design the data model to avoid very high fan-in to a single parent or owner.
– Test bulk operations in a full sandbox to observe locking and sharing behavior before production runs.
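For alerts, one option is to re-run the detection queries on a schedule and alert only on keys that newly crossed the threshold, so the same warning is not repeated every run. A minimal sketch, assuming each run's results are kept as a `{key: count}` dictionary; how snapshots are stored and the `new_breaches` helper are assumptions, and the sample counts are made up.

```python
def new_breaches(previous, current, threshold):
    """Return (key, count) pairs over `threshold` in `current` that were not over it in `previous`.

    previous, current: {key: count} snapshots from two runs of the detection queries.
    Sorted by count, largest first, so the worst new hotspot is reported first.
    """
    breached = [
        (key, count)
        for key, count in current.items()
        if count > threshold and previous.get(key, 0) <= threshold
    ]
    return sorted(breached, key=lambda pair: pair[1], reverse=True)


# Made-up snapshots: 005A was already over the limit, 005B and 005C are new.
yesterday = {"005A": 1200, "005B": 900}
today = {"005A": 1300, "005B": 1100, "005C": 1500}
print(new_breaches(yesterday, today, 1000))  # [('005C', 1500), ('005B', 1100)]
```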

Conclusion

Data skew is a common cause of performance and locking issues in Salesforce. Proactively identifying hotspots and applying the mitigations above will keep your org scalable and reduce unexpected errors during bulk processing.