Handling Large Salesforce File Downloads in Integrations
When developing Salesforce integrations, especially for offline-first mobile applications, downloading large files presents a significant challenge. Standard "one big GET" requests are prone to failure on unstable networks, leading to lost progress and user frustration. This article outlines strategies for implementing robust, chunked, and resumable download mechanisms for Salesforce Files.
Salesforce Files, managed via ContentDocument and ContentVersion objects, can store files up to 2 GB. This capability necessitates a download strategy that accounts for unreliable network conditions common on mobile devices, such as hotel Wi-Fi or roaming cellular data.
Two Primary Download Methods
Salesforce offers two main approaches for retrieving file content:
- Connect Files API: This REST API provides access to Files through endpoints like /connect/files/.... It typically involves fetching file metadata first, then making a separate call to retrieve the binary content. While suitable for backend integrations on stable networks, its default streaming pattern makes implementing chunking and resumability for mobile apps complex.
- Shepherd Servlet Download Endpoint: The /sfc/servlet.shepherd/version/download/{ContentVersionId} or /sfc/servlet.shepherd/document/download/{ContentDocumentId} endpoints offer a more mobile-friendly approach. Crucially, these endpoints support the HTTP Range header, which is essential for building chunked downloaders.
Alternatively, the ContentVersion/{Id}/VersionData blob-retrieve URL can also be used as the byte endpoint, requiring similar client-side logic for chunking and resumption.
Implementing a Chunked Download Strategy
The core of a resilient download mechanism lies in breaking large files into smaller, manageable chunks and implementing a resumable process.
Step 1: Prepare File Context
Before initiating a download, gather essential metadata into a compact file context object. This object should include:
- contentDocumentId and contentVersionId: Identifiers for the file.
- totalSize: The total size of the file in bytes.
- checksum: An MD5 or similar hash for final file validation.
- fileName / extension: For local file naming.
- remoteUrl: The URL to download from (e.g., Shepherd Servlet or VersionData endpoint).
- downloadingURL: The local temporary path for the downloaded file.
- startByte: The byte offset to resume from (0 for a new download).
- chunkSize: The size of each chunk (e.g., 2–10 MB).
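One way to hold this context is a small record that is persisted as JSON, so it survives app restarts. The sketch below is illustrative only: the snake_case field names mirror the list above, while the `status` field, the default chunk size, and the `save_context` / `load_context` helpers are assumptions made for this example, not part of any Salesforce SDK.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class FileDownloadContext:
    content_document_id: str
    content_version_id: str
    total_size: int                    # file size in bytes
    checksum: str                      # expected MD5 hex digest
    file_name: str
    remote_url: str                    # Shepherd Servlet or VersionData URL
    downloading_path: str              # local temporary file ("downloadingURL")
    start_byte: int = 0                # 0 for a new download
    chunk_size: int = 4 * 1024 * 1024  # assumed default, within the 2-10 MB range
    status: str = "new"                # hypothetical status field


def save_context(ctx, path):
    """Write the context via a temp file so a crash cannot leave half a record."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(ctx), handle)
    os.replace(tmp_path, path)  # atomic rename over the previous record


def load_context(path):
    """Read a previously saved context back into a FileDownloadContext."""
    with open(path, encoding="utf-8") as handle:
        return FileDownloadContext(**json.load(handle))
```

Writing to a temporary file and renaming it keeps the stored offset trustworthy even if the app is killed in the middle of a save.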
Step 2: Download in Chunks with HTTP Range
To download in chunks, utilize the HTTP Range header. For each chunk, send a GET request to the remoteUrl with an Authorization header and a Range header specifying the byte range (e.g., bytes=offset-end).
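Both ends of an HTTP byte range are inclusive, and a server that honors the request answers with status 206 and a Content-Range header of the form `bytes first-last/total`. The following sketch shows the arithmetic on both sides; `range_header` and `parse_content_range` are hypothetical helper names chosen for this example.

```python
import re

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def range_header(offset, chunk_size, total_size):
    """Return the Range header value for the chunk that starts at offset."""
    if not 0 <= offset < total_size:
        raise ValueError("offset must lie inside the file")
    end = min(offset + chunk_size - 1, total_size - 1)  # inclusive last byte
    return f"bytes={offset}-{end}"


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total or None)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

Comparing the parsed `first` value with the requested offset is a cheap way to confirm that the server returned the chunk that was asked for before writing it to disk.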
offset = context.startByte   // 0 for new download, >0 if resuming
chunkSize = context.chunkSize
totalSize = context.totalSize

// Open for read/write without truncating. Avoid append mode:
// appended writes ignore the seek position on many platforms.
open file at context.downloadingURL for update (create if missing)
truncate(file, offset)       // drop bytes past the saved offset
seek(file, offset)

while offset < totalSize:
    end = min(offset + chunkSize - 1, totalSize - 1)

    response = HTTP GET context.remoteUrl with headers:
        - Authorization: Bearer <token>
        - Range: "bytes={offset}-{end}"

    if response is not successful (timeout, 5xx, 429, etc.)
       or response.status is not 206 Partial Content:
        // Network problem or range not honored: keep the partial file
        // and remember how far we got
        context.startByte = offset
        save context (status = "paused")
        close file
        return "resumable error"

    data = response.body
    write data to file
    offset += data.length
    context.startByte = offset

    // Persist progress so we can resume from here next time
    save context (status = "inProgress")
    report progress = offset / totalSize

close file
After each successful chunk, persist the startByte to allow resumption. The checksum calculation is deferred until the entire file is downloaded.
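A minimal Python sketch of this loop follows. To stay self-contained it takes the HTTP call as a parameter: `fetch_range(first, last)` is a hypothetical callable that sends the Range request and returns `(status_code, body_bytes)`, and `save_progress(offset)` stands in for persisting `startByte`. Neither name comes from a Salesforce library.

```python
import os


class ResumableError(Exception):
    """A chunk failed, but the partial file on disk is worth keeping."""


def download_chunks(path, total_size, start_byte, chunk_size,
                    fetch_range, save_progress):
    """Download bytes [start_byte, total_size) into path; return the final offset."""
    exists = os.path.exists(path)
    if start_byte > 0 and (not exists or os.path.getsize(path) < start_byte):
        raise ValueError("partial file is shorter than the saved offset")
    offset = start_byte
    # "r+b" keeps existing bytes and honors seek(); append mode would not.
    with open(path, "r+b" if exists else "w+b") as handle:
        handle.truncate(offset)  # drop bytes written past the saved offset
        handle.seek(offset)
        while offset < total_size:
            last = min(offset + chunk_size - 1, total_size - 1)
            try:
                status, body = fetch_range(offset, last)
            except OSError as exc:  # e.g. timeouts, connection resets
                raise ResumableError(str(exc)) from exc
            if status == 200:
                # The server ignored Range and sent the whole file.
                raise RuntimeError("server did not honor the Range header")
            if status != 206 or len(body) != last - offset + 1:
                raise ResumableError(f"bad response at offset {offset}")
            handle.write(body)
            handle.flush()
            offset += len(body)
            save_progress(offset)  # persist startByte after every chunk
    return offset
```

Because progress is saved only after a chunk has been written and flushed, the stored offset never points past data that actually reached the file.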
Step 3: Resume After Network Failures
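Assume network failures will occur. Instead of discarding partial downloads, treat most failures as recoverable. Persist the startByte and mark the context as paused or failed_resumable. When resuming, load the FileDownloadContext, and if startByte > 0 and the partial file exists, open it for writing without truncating it and seek to startByte to continue the download. Avoid a pure append mode here: on many platforms appended writes always land at the end of the file regardless of the seek position, which corrupts the file if it holds bytes beyond startByte.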
This approach ensures that progress is not lost due to network interruptions or application backgrounding.
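Before resuming, it helps to reconcile the saved offset with what is actually on disk, since the two can disagree after a crash. The sketch below assumes a hypothetical helper named `resume_offset`; it trusts the smaller of the two values and trims the file to match.

```python
import os


def resume_offset(path, saved_start_byte, total_size):
    """Return the offset from which it is safe to continue downloading."""
    if not os.path.exists(path):
        return 0  # nothing on disk: start a fresh download
    on_disk = os.path.getsize(path)
    # Bytes past the saved offset were never confirmed, and a saved offset
    # past the end of the file points at data that is missing.
    offset = max(0, min(saved_start_byte, on_disk, total_size))
    with open(path, "r+b") as handle:
        handle.truncate(offset)
    return offset
```

For example, with a saved offset of 100 and a 150-byte partial file, the file is trimmed to 100 bytes and the download continues from byte 100.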
Step 4: Validate the File With MD5
Once the download loop completes (startByte equals totalSize), validate the integrity of the downloaded file using its MD5 checksum.
if context.startByte != context.totalSize:
    return "download not complete"

open file at context.downloadingURL for read
init MD5 calculator

while there is data to read:
    chunk = readNextBlock(file)
    if chunk is empty:
        break
    update MD5 with chunk

computed = finalize MD5

if computed == context.checksum:
    mark context as "completed"
    return success(context.downloadingURL)
else:
    delete file at context.downloadingURL
    delete context entry
    return "checksum mismatch"
If the checksums match, mark the file as completed and move it to its final location. If they do not match, delete the corrupted file and its context to trigger a fresh download on the next attempt.
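In Python, the validation step could look like the sketch below. It uses the standard `hashlib` module and reads the file in blocks so that a 2 GB download is never loaded into memory at once. The helper names `file_md5` and `finalize_download`, and the 1 MB block size, are assumptions for this example.

```python
import hashlib
import os


def file_md5(path, block_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, one block at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def finalize_download(temp_path, final_path, expected_md5):
    """Move the file into place if its MD5 matches; otherwise delete it."""
    if file_md5(temp_path) == expected_md5.strip().lower():
        os.replace(temp_path, final_path)
        return True
    os.remove(temp_path)  # corrupted: force a fresh download next time
    return False
```

The caller is still responsible for marking the context as completed on success, or deleting the context entry when `finalize_download` returns `False`.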
Connect Files vs. Servlet: Optimal Use Cases
- Connect Files API: Best suited for backend services and integrations on stable networks where complex resume behavior is not a primary concern. It offers a higher-level REST interface.
- Servlet (or VersionData blob) with HTTP Range: Ideal for offline-first mobile applications where robust resuming capabilities and reliable performance on unreliable networks are critical. This approach requires managing metadata like size and checksum at the client level.
In many comprehensive solutions, a combination of both might be employed, leveraging the Connect Files API for simpler server-side operations and the Shepherd Servlet or VersionData endpoint with a custom FilesDownloader for mobile scenarios.
Key Takeaways
- Large file downloads on unreliable networks require chunking and resumability.
- The Shepherd Servlet or ContentVersion.VersionData endpoints, supporting HTTP Range, are ideal for building mobile downloaders.
- Maintain a FileDownloadContext to track progress (startByte) and essential file metadata.
- Treat network failures as recoverable, persisting progress to allow seamless resumption.
- Validate downloaded files using checksums (e.g., MD5) after completion to ensure data integrity.