Vinay Vernekar · 5 min read

Beating the 6MB Apex Heap Limit: Native Document Generation and E-Signatures

The synchronous Apex heap limit (6MB) and asynchronous limit (12MB) are significant bottlenecks when attempting high-volume, server-side document generation that includes binary assets like images. Conventional wisdom suggests that loading base64-encoded images or decompressing large DOCX templates immediately consumes the heap, necessitating external middleware or paid AppExchange solutions.

This article details an open-source, 100% native Salesforce architecture that successfully generates complex documents, incorporating numerous high-resolution images and iterating over hundreds of child records, all while adhering strictly to Apex governor limits.

The Heap Consumption Problem in Native DOCX Generation

The primary heap consumption vectors when using standard Apex document generation techniques are:

  1. Template Decompression: DOCX files are ZIP archives. Decompressing and manipulating the internal document.xml consumes several megabytes for even moderately sized templates.
  2. Image Loading: Holding an image in memory, even a moderately sized one, often as a base64 string, carries a significant heap cost (e.g., a 1MB image costs ~1.3MB of heap once encoded).
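
The ~1.3MB figure follows from base64 arithmetic: every 3 raw bytes become 4 encoded characters. A minimal sketch of that estimate is below. The class and method names are illustrative, and actual Apex heap accounting may differ from the raw character count.

```apex
public class Base64HeapEstimate {
    // Length of the base64 text for a payload of rawBytes bytes:
    // each group of up to 3 bytes is encoded as 4 characters (padding included).
    public static Integer encodedLength(Integer rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }
}
```

For a 1MB image, Base64HeapEstimate.encodedLength(1024 * 1024) returns 1398104 characters, roughly 1.33 times the raw size.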

Eliminating ZIP manipulation at runtime and ensuring images never enter the Apex heap removes both constraints.

Architectural Strategy for Heap Reduction

The successful implementation relies on two critical shifts in the processing pipeline: pre-processing templates and leveraging native URL resolution for asset embedding.

1. Pre-Decomposed Template Storage

Instead of performing ZIP decompression and manipulation synchronously during document generation, we shift this heavy lifting to an administrative setup phase.

  • Template Ingestion: When an admin saves a DOCX template version, the system immediately extracts all constituent parts (e.g., document.xml, relationship files, headers, footers).
  • ContentVersion Storage: Each extracted XML component and embedded image asset is stored as a separate ContentVersion record.
  • Generation Time: At runtime, Apex loads only the raw, string-based XML components. Merging data (string substitution, conditional logic execution) is performed entirely on these strings, which cuts the heap used for template processing by an estimated 75% for this step alone.
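
The generation-time step might look roughly like the sketch below. It assumes each XML part is stored as its own ContentVersion and that plain merge tokens take the form {Key}. The class name, method names, and token syntax are all hypothetical and not taken from the project.

```apex
public with sharing class TemplatePartMerger {
    // Loads one pre-extracted XML part as a String. Only this text enters the
    // heap; the original DOCX archive is never opened at generation time.
    public static String loadPart(Id partVersionId) {
        ContentVersion part = [
            SELECT VersionData
            FROM ContentVersion
            WHERE Id = :partVersionId
            LIMIT 1
        ];
        return part.VersionData.toString();
    }

    // Replaces each {Key} token with its XML-escaped value.
    // The {Key} token syntax is an assumption made for this sketch.
    public static String mergeFields(String xml, Map<String, String> values) {
        String result = xml;
        for (String key : values.keySet()) {
            String value = values.get(key);
            result = result.replace('{' + key + '}', value == null ? '' : value.escapeXml());
        }
        return result;
    }
}
```

Escaping values before substitution keeps characters such as & or < in record data from producing malformed XML.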

2. Zero-Heap Image Embedding via Blob.toPdf()

The key innovation for images revolves around deferring the asset loading until the final PDF rendering step, which is managed internally by Salesforce via the Blob.toPdf() method (requires the Spring '26 Release Update).

When an image merge tag is encountered in the template (e.g., {%Description:600x400}), instead of loading the ContentVersion blob:

  1. Only the ContentVersionId and file extension are queried, so the heap impact is minimal.
  2. A relative URL is constructed for embedding into the target HTML structure:
    /sfc/servlet.shepherd/version/download/{contentVersionId}
    
  3. This relative URL is placed within an <img> tag in the document's generated HTML.
  4. When Blob.toPdf() executes, the native PDF engine resolves this internal Salesforce URL on the server side and embeds the image data directly into the final PDF stream, never allocating heap memory for the image blob in Apex.
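
The four steps above can be sketched as follows. This is a simplified illustration, not the project's code: the class and method names are hypothetical, and reading 600x400 as width by height is an assumption. How the tag's field maps to a ContentVersionId is left to the caller.

```apex
public with sharing class ImageTagRenderer {
    // Matches tags shaped like {%Description:600x400}.
    private static final Pattern IMAGE_TAG =
        Pattern.compile('\\{%(\\w+):(\\d+)x(\\d+)\\}');

    // Returns [fieldName, width, height], or null if the text is not an image tag.
    public static List<String> parseTag(String tag) {
        Matcher m = IMAGE_TAG.matcher(tag);
        if (!m.matches()) {
            return null;
        }
        return new List<String>{ m.group(1), m.group(2), m.group(3) };
    }

    // Builds an <img> element that points at the file by URL only;
    // the image bytes are never loaded into Apex.
    public static String toImgTag(Id contentVersionId, Integer width, Integer height) {
        return '<img src="/sfc/servlet.shepherd/version/download/' + contentVersionId
            + '" width="' + width + '" height="' + height + '"/>';
    }

    // Hands the finished HTML to the platform PDF renderer.
    public static Blob render(String html) {
        return Blob.toPdf(html);
    }
}
```

The Apex side only ever holds a short URL string per image, which is why the number and size of images has little effect on heap use.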

Dual-URL Strategy for Guest User Previews

A challenge arises when generating public previews for e-signature flows, as guest users lack session context to resolve relative URLs.

  • PDF Generation (Server-Side): Uses the relative /sfc/servlet.shepherd/version/download/{cvId} URL.
  • Browser Preview (Client-Side): Before dispatching the signature request, a public, unauthenticated URL is generated using ContentDistribution:
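    ContentDistribution cd = new ContentDistribution();
    cd.Name = 'Signature preview image'; // Name is required; this value is illustrative
    cd.ContentVersionId = cvId;
    // ... set appropriate sharing/expiration settings
    insert cd;
    // The URL fields are generated on insert, so re-query before reading them
    cd = [SELECT ContentDownloadUrl FROM ContentDistribution WHERE Id = :cd.Id];
    // Use cd.ContentDownloadUrl for the public preview HTML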
    
    The guest user's browser resolves these absolute public URLs directly, viewing the fully rendered document without needing Salesforce authentication.
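
One way to produce the browser preview is to rewrite the HTML built for the PDF, swapping each relative URL for its public counterpart. The article does not say the preview is derived this way, so treat the sketch below as an assumption. The class and method names are hypothetical.

```apex
public class PreviewHtmlBuilder {
    private static final String RELATIVE_PREFIX =
        '/sfc/servlet.shepherd/version/download/';

    // Swaps each relative download URL for its public ContentDistribution URL.
    // Keys are ContentVersion Ids exactly as they appear in the PDF HTML;
    // values are assumed to be non-null public URLs.
    public static String toPreviewHtml(String pdfHtml, Map<String, String> publicUrlByVersionId) {
        String html = pdfHtml;
        for (String versionId : publicUrlByVersionId.keySet()) {
            html = html.replace(RELATIVE_PREFIX + versionId, publicUrlByVersionId.get(versionId));
        }
        return html;
    }
}
```

Keeping a single HTML source and rewriting only the URLs means the PDF and the preview cannot drift apart in layout.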

3. Template-Based E-Signature Stamping

The older method involved generating a DOCX, sending it, then re-processing the DOCX to stamp signatures, leading to multiple costly decompression/recompression cycles.

The optimized flow removes the DOCX entirely from the signing path:

  1. The system prepares the pre-decomposed XML parts.
  2. Signatures are applied by inserting raw DrawingML XML fragments directly into the template XML string using simple string replacement (String.replace()).
  3. A single Queueable job loads the signature-stamped XML and generates the final PDF via Blob.toPdf() (which handles the URL-resolved images), completing the entire process in one asynchronous, low-heap transaction.
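
A rough sketch of that single job is shown below. The class name, the placeholder token, and the choice to save the result as a file on a parent record are assumptions made for illustration. The DrawingML fragment is passed in rather than shown, since its exact markup is not given here.

```apex
public class SignedPdfJob implements Queueable {
    private final String markup;
    private final String signaturePlaceholder;
    private final String signatureFragment;
    private final Id parentRecordId;

    public SignedPdfJob(String markup, String signaturePlaceholder,
                        String signatureFragment, Id parentRecordId) {
        this.markup = markup;
        this.signaturePlaceholder = signaturePlaceholder;
        this.signatureFragment = signatureFragment;
        this.parentRecordId = parentRecordId;
    }

    // Pure string replacement: no ZIP archive is opened or rewritten.
    public static String stamp(String markup, String placeholder, String fragment) {
        return markup.replace(placeholder, fragment);
    }

    public void execute(QueueableContext context) {
        String stamped = stamp(markup, signaturePlaceholder, signatureFragment);
        Blob pdf = Blob.toPdf(stamped);
        // Assumption: the signed PDF is saved as a file on the parent record.
        insert new ContentVersion(
            Title = 'Signed document',
            PathOnClient = 'signed-document.pdf',
            VersionData = pdf,
            FirstPublishLocationId = parentRecordId
        );
    }
}
```

The job would be started with System.enqueueJob(new SignedPdfJob(...)), which also gives it the larger asynchronous heap allowance.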

4. Addressing Content Sharing Model Limitations

Guest users and the Automated Process User cannot query ContentVersion records due to the Salesforce content sharing model, even with WITH SYSTEM_MODE.

The workaround is upfront computation:

  • Cache the required public ContentDistribution URLs in a field on the primary request record (e.g., a JSON blob).
  • The signing Visualforce page queries the request record, retrieving the pre-computed JSON map, thus bypassing direct Content object queries during the guest user flow.
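
The caching step can be sketched with the standard JSON class. The helper below is hypothetical. It assumes the map is keyed by ContentVersionId and stored in a long text field on the request record, for example a custom field such as Public_Image_Urls__c (an invented name).

```apex
public class GuestUrlCache {
    // Serializes ContentVersionId -> public URL pairs for storage in a text field.
    public static String toJson(Map<String, String> publicUrlByVersionId) {
        return JSON.serialize(publicUrlByVersionId);
    }

    // Restores the map inside the guest-facing controller.
    // No ContentVersion or ContentDistribution query is needed here.
    public static Map<String, String> fromJson(String cachedJson) {
        if (String.isBlank(cachedJson)) {
            return new Map<String, String>();
        }
        return (Map<String, String>) JSON.deserialize(cachedJson, Map<String, String>.class);
    }
}
```

The internal user who creates the signature request writes the JSON once; the guest-facing page only reads it back from the request record.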

Stress Test Results

Testing involved a template generating an Account header, 500 Contact rows (20 containing unique 1.3MB images), Opportunity data, and two signature placeholders.

  • Total Image Data: ~27MB.
  • Output: Fully signed PDF with all images and audit trail.
  • Result: Successful generation entirely within synchronous governor limits due to heap avoidance strategies.

The Stack Summary

  • Apex: Core merging logic, XML manipulation, signature stamping, and ZIP writing utilities (for template setup).
  • Blob.toPdf(): Essential for server-side PDF rendering using referenced assets.
  • LWC: Management interface for template configuration and initiation.
  • Visualforce: Utilized for the public, unauthenticated signing endpoint.
  • Platform Events/Queueable: Orchestration layer for asynchronous, high-volume generation tasks.
  • Pure JS: Used only for client-side handling of initial DOCX structure assembly during template creation.

Key Takeaways

  • Decouple Template Parsing: Never decompress and manipulate ZIPs at runtime; pre-decompose DOCX structures into their constituent XML parts stored as ContentVersion records.
  • Leverage Blob.toPdf() for Images: Embed images via relative /sfc/servlet.shepherd/version/download/{cvId} URLs in the HTML payload, ensuring the PDF rendering engine handles the blob loading off the Apex heap.
  • String Operations over Binary Operations: Apply merges and signature stamping as pure XML string manipulations (e.g., DrawingML insertion) to avoid multiple resource-intensive ZIP cycles.
  • Pre-calculate Guest Access Assets: Proactively generate and cache public ContentDistribution URLs to satisfy content sharing restrictions during unauthenticated signature flows.

Vinay Vernekar

Salesforce Developer & Founder

Vinay is a seasoned Salesforce developer with over a decade of experience building enterprise solutions on the Salesforce platform. He founded SFDCDevelopers.com to share practical tutorials, best practices, and career guidance with the global Salesforce community.
