SAP SUCCESSFACTORS CLOUD ARCHIVE

    SAP SuccessFactors Cloud Archive — Parquet, Tiered, Queryable

    The SAP SuccessFactors cloud archive stores your HR history as Parquet on your own object storage (S3 / Azure Blob / GCS / OCI), partitioned by legal employer and fiscal year. It is queryable via Athena / Synapse / BigQuery / Snowflake External Tables, carries hash-signed manifests, and tiers storage hot/warm/cold. It replaces the SF subscription as the long-term home for historical HR data.

    50–300 GB
    Typical 20k-employee archive
    $thousands/yr
    Total cloud storage cost
    Open Parquet
    20-year format longevity
    Hash-signed
    Audit-grade provenance

    What a SAP SuccessFactors cloud archive actually is

    Not a database backup. Not a flat-file dump. A queryable, governed, hash-signed Parquet archive in your own cloud account that replaces the SF tenant as the long-term home for historical HR data.

    SuccessFactors holds a decade of effective-dated worker history, performance forms, recruiting records, learning history and Foundation Object metadata behind a multi-module SaaS subscription that costs $30–60 per employee per month (PEPM) combined for full HXM scope. For active HR ops, that subscription is the point. For data that is no longer being transacted against (ex-employees beyond active retention, closed job requisitions, expired learning certifications, pre-migration performance forms), paying full subscription rates to keep the bytes accessible is increasingly hard to defend.

    The Syntra SAP SuccessFactors cloud archive is the answer: a full extract via OData v2/v4 and the Compound Employee API into Parquet on the customer's cloud object storage, partitioned by legal employer and effective fiscal year, with hash-signed manifests at file and batch level for forensic-grade audit. Tiered storage rotates partitions hot/warm/cold automatically: the most recent 12 months on hot storage for sub-second queries, 1–7 years on warm storage for SOX and works-council audits, and 7+ years on cold storage for pension and statutory long-tail.
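
    The partitioning scheme described above can be pictured as a Hive-style key layout. The following is a minimal sketch, not the Syntra implementation: the key names `legal_employer` and `fiscal_year`, the entity-first prefix order, and the convention of labelling a fiscal year by the calendar year in which it ends are all assumptions made for illustration.

```python
from datetime import date


def fiscal_year(effective: date, fy_start_month: int = 1) -> int:
    """Fiscal year containing `effective`, labelled by the calendar year it ends in.

    Assumption: a fiscal year that starts in October 2019 is called FY2020.
    """
    if fy_start_month == 1:
        return effective.year  # fiscal year equals calendar year
    return effective.year + 1 if effective.month >= fy_start_month else effective.year


def partition_prefix(entity: str, legal_employer: str, effective: date,
                     fy_start_month: int = 1) -> str:
    """Hypothetical Hive-style object-storage prefix for one entity partition."""
    fy = fiscal_year(effective, fy_start_month)
    return f"{entity}/legal_employer={legal_employer}/fiscal_year={fy}/"


print(partition_prefix("EmpJob", "DE01", date(2019, 3, 14)))
```

    With key=value path segments like these, most SQL engines can skip whole prefixes when a query filters on legal employer or fiscal year.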

    Queries hit the Parquet directly through Athena, Synapse Serverless, BigQuery External Tables, Snowflake External Tables or Oracle ADW external tables. No re-load. No shadow warehouse. No vendor lock-in — Parquet is an open format that any SQL engine will read for the next 20+ years. The SF subscription ends. The data, and the answers, stay live.
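
    To make the external-table idea concrete, here is a small sketch that renders an Athena-style `CREATE EXTERNAL TABLE` statement over partitioned Parquet. The table name, column list and bucket name are hypothetical, and the partition columns assume the legal-employer and fiscal-year layout described above; other engines use their own DDL dialects.

```python
def athena_external_table_ddl(table: str, columns: dict, location: str) -> str:
    """Render a CREATE EXTERNAL TABLE statement for partitioned Parquet.

    `columns` maps column name to SQL type. The partition columns are fixed to
    the (assumed) legal_employer / fiscal_year layout.
    """
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "PARTITIONED BY (legal_employer string, fiscal_year int)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns and bucket, for illustration only.
ddl = athena_external_table_ddl(
    "emp_job",
    {"user_id": "string", "start_date": "date", "job_code": "string"},
    "s3://example-sf-archive/EmpJob/",
)
print(ddl)
```

    After the table is created, the engine still has to learn which partitions exist (in Athena, for example, via `MSCK REPAIR TABLE` or partition projection) before queries return rows.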

    What the cloud archive holds

    1
    Worker effective-dated history
    PerPerson, PerEmployment, EmpJob, EmpCompensation with every per-change version row preserved — typically 5–8M rows for a 20k-employee 10-year history.
    2
    Talent records
    FormHeader, FormReview, goal libraries, comp plans, succession pools — full audit context including approvals, comments and amendments.
    3
    Recruiting & learning history
    JobReq, Application, candidate profiles, offer letters, learning completion records, certifications, curricula — full pre-migration footprint.
    4
    Foundation Objects & MDF
    FOLocation, FOCompany, FODepartment, FOPayGrade, FOCostCenter snapshots with effective dates; full MDF custom object data with original schema.

    What makes the SAP SuccessFactors cloud archive different from raw exports

    OData JSON dumps to a file share are not an archive — they're write-only history. These six capabilities are what turn extracts into a useful archive.

    📊

    Parquet columnar format

    Columnar compression and predicate pushdown let queries scan billions of rows in seconds for a few cents. JSON/CSV exports are write-only and require re-load to query.

    📁

    Partitioned by LE + fiscal year

    Queries for one legal employer in one fiscal year scan only the relevant partition files — minimal data scanned, minimal cost, sub-second response.

    🔏

    Hash-signed manifests

    Every file is SHA-256 hashed, and every batch manifest is signed and records the extraction timestamp, OAuth token, RBP context and source row count. Forensic-grade provenance for audit.

    🌡️

    Tiered storage rotation

    Hot (last 12 months, ~$0.023/GB/mo) / Warm (1–7 years, ~$0.0125/GB/mo) / Cold (7+, ~$0.004/GB/mo) — automatic lifecycle policy, total cost in low thousands per year.

    🔍

    External-table query model

    Registered as external tables in Athena/Synapse/BigQuery/Snowflake/ADW. Native SQL access without re-load. Queries port across engines with only dialect-level changes if you swap clouds.

    🔐

    RBP-equivalent access control

    SF RBP captured at extraction, enforced at query layer via cloud-native IAM + column masking + row-level security. Same access posture, no SF subscription.
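
    The RBP-equivalent enforcement in the last card combines two mechanisms: row-level scope and column masking. The sketch below illustrates that combination in plain Python. It is only a model of the idea; in a real deployment the engine's own policies (Lake Formation, Snowflake masking and row-access policies, and so on) do this work. The `ArchiveRole` class, its fields and the sample role are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveRole:
    """Hypothetical archive-side role derived from an SF permission role."""
    name: str
    legal_employers: frozenset               # row-level scope: which employers are visible
    masked_columns: frozenset = frozenset()  # column-level masking: which fields are hidden


def apply_policy(rows: list, role: ArchiveRole) -> list:
    """Return only the rows in the role's scope, with masked columns redacted."""
    visible = []
    for row in rows:
        if row["legal_employer"] not in role.legal_employers:
            continue  # row-level security: out-of-scope rows are dropped
        visible.append({col: ("***" if col in role.masked_columns else value)
                        for col, value in row.items()})
    return visible


rows = [
    {"user_id": "u1", "legal_employer": "DE01", "salary": 50000},
    {"user_id": "u2", "legal_employer": "US01", "salary": 70000},
]
works_council_de = ArchiveRole("works_council_de", frozenset({"DE01"}), frozenset({"salary"}))
print(apply_policy(rows, works_council_de))
```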

    Standing up a SAP SuccessFactors cloud archive: the deployment

    A six-step deployment that ends with the SF tenant retired, the data queryable, and the cost down by orders of magnitude.

    1

    Cloud target selection — Days 1–3

    Customer picks cloud (AWS / Azure / GCP / OCI) and region (matched to SF data center, EU residency requirements, existing data-lake footprint). IAM, encryption keys (KMS / Key Vault / Cloud KMS / OCI Vault), object-storage buckets provisioned.

    2

    Full SF extraction — Days 3–10

    Syntra ETL pulls full SuccessFactors footprint via OData v2/v4 and Compound Employee API. Parquet output partitioned by legal employer and effective fiscal year, hash-signed at file and manifest level. 50–300 GB typical for 20k-employee 10-year history.

    3

    External-table registration — Days 10–12

    Athena/Synapse Serverless/BigQuery External Tables/Snowflake External Tables/Oracle ADW external tables registered against Parquet. Logical entity views (vWorker, vAssignment, vSalary, vForm) created for analyst-friendly access.

    4

    RBP & access-control mapping — Days 12–15

    SF RBP roles inventoried and converted to cloud-native IAM + column masking + row-level security policies. Consumer-portal access (ex-employee, HR audit, works council, GDPR DSAR) scoped against the model.

    5

    Tiered storage lifecycle — Days 15–16

    S3/Azure/GCS/OCI lifecycle policies configured: hot for first 12 months, warm 1–7 years, cold 7+. Cost target validated against retention horizon.

    6

    Parallel-run & cutover — Weeks 4–8

    Both SF tenant and archive live in parallel for 2–4 weeks. Sample requests routed to archive. HR ops sign-off. SF tenant terminated; archive becomes sole long-term home for SF historical data.
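
    For step 5 on AWS, the hot/warm/cold schedule could be expressed as an S3 lifecycle configuration. The sketch below only builds the configuration dictionary in the shape S3 expects; the rule ID and the choice of `STANDARD_IA` and `GLACIER` as the warm and cold classes are assumptions. Note that S3 counts transition days from each object's creation time, not from the effective date of the HR data inside it, so a one-off extract of ten years of history would need a different tiering mechanism for its older partitions.

```python
def lifecycle_configuration(prefix: str = "", warm_after_days: int = 365,
                            cold_after_days: int = 7 * 365) -> dict:
    """Build an S3 lifecycle configuration for hot -> warm -> cold tiering.

    The rule ID and the storage classes are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                "ID": "sf-archive-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_configuration(prefix="EmpJob/")
print(config["Rules"][0]["Transitions"])
```

    A dictionary of this shape is what an S3 client would receive as the lifecycle configuration for the archive bucket; Azure, GCS and OCI have equivalent but differently structured policies.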

    The query engines that read the SuccessFactors cloud archive

    Open Parquet format means any SQL engine reads it. Pick your cloud, pick your engine, pick your tool — no lock-in.

    ☁️

    AWS Athena

    Serverless SQL against Parquet on S3, $5 per TB scanned, integrates with Lake Formation for column-level security and row-level filters. Most cost-effective for ad-hoc queries.

    🔷

    Azure Synapse Serverless

    Serverless SQL against Parquet on Azure Blob / ADLS Gen2, T-SQL syntax, integrates with Purview for governance. Native fit for Microsoft-centric estates.

    🌈

    BigQuery External Tables

    External tables over Parquet on GCS, standard SQL, integrates with IAM and column-level security. Fast for analytic workloads with BI Engine.

    ❄️

    Snowflake External Tables

    External tables over Parquet on any cloud, Snowpark integration, native masking policies and row-access policies for RBP-equivalent enforcement.

    🔶

    Oracle ADW External Tables

    External tables over Parquet on OCI Object Storage, native fit for Fusion HCM Analytics co-existence, low-cost serverless query against the archive.

    📊

    Power BI / Tableau / Looker

    All connect to the above engines as standard data sources. HR analytics dashboards on archive data without re-engineering.

    Frequently asked questions

    What is a SAP SuccessFactors cloud archive?

    A SAP SuccessFactors cloud archive is a Parquet-on-object-storage archive of your SuccessFactors HXM data (PerPerson, PerEmployment, EmpJob, EmpCompensation, FormHeader, JobReq, learning history, MDF custom objects, Foundation Objects). It is held in your own cloud account (AWS S3 / Azure Blob / GCS / OCI Object Storage) with tiered storage (hot/warm/cold), is queryable directly via Athena / Synapse Serverless / BigQuery External Tables / Snowflake External Tables, and carries hash-signed manifests for audit. It removes the cost and operational burden of keeping the SuccessFactors tenant subscription live solely to hold historical HR data, while still serving the downstream needs: ex-employee lookups, works-council audits, GDPR DSARs, SOX HR-control evidence, pension calculations.

    Why move SuccessFactors data to a cloud archive instead of keeping the tenant live?

    Cost, control and longevity. Cost: at $30–60 PEPM combined, a 20,000-employee SF tenant subscription across EC + Performance + Comp + Recruiting + Learning typically runs $7–14M/year; keeping it live just for historical access spends that budget on data that nobody is actively transacting against. A SAP SuccessFactors cloud archive holding the same data on tiered object storage costs single-digit thousands of dollars per year for the same 20k-employee footprint. Control: the archive sits in your own cloud account, in your chosen region, with your IAM and encryption, not in SAP's multi-tenant control plane. Longevity: the archive is in open Parquet format that any SQL engine can read for the next 20+ years, independent of SAP's roadmap, SF's twice-yearly release cycle and any future SAP HXM rebranding.
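
    The subscription figure in this answer is simple arithmetic: employees times the per-employee-per-month rate times twelve. A short sketch, using only the numbers quoted above (the function name is illustrative):

```python
def annual_subscription_cost(employees: int, pepm_usd: float) -> float:
    """Annual subscription cost: employees x per-employee-per-month rate x 12 months."""
    return employees * pepm_usd * 12


low = annual_subscription_cost(20_000, 30)
high = annual_subscription_cost(20_000, 60)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

    For 20,000 employees this gives $7.2M to $14.4M per year, which is the $7–14M range quoted in the answer.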

    How is a SuccessFactors cloud archive different from raw OData exports to flat files?

    Three things. (1) Format and queryability: Parquet with columnar compression and predicate pushdown lets SQL engines scan billions of rows in seconds for a few cents; flat-file JSON or CSV exports are essentially write-only and must be re-loaded somewhere queryable before anyone can use them. (2) Schema governance: the archive carries the SF logical entity model (Worker, Assignment, Salary, Form, JobReq) with type-stable columns that are evolved as SF entities change across the twice-yearly release cycle; raw OData exports embed every entity-version change in the file structure, so a 2018 export and a 2026 export have incompatible shapes. (3) Provenance and audit: the archive is hash-signed at file and manifest level, with extraction timestamp, OAuth token, RBP context and source-row count recorded; flat-file exports have none of that and will not stand up to a forensic-grade audit.
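
    The provenance point can be illustrated with a small sketch of file hashing plus a signed manifest. This is an assumption-laden illustration, not the product's actual scheme: HMAC-SHA256 with a shared key stands in for whatever signing method is really used, the manifest field names are hypothetical, and the OAuth detail is left out because how it is recorded is not specified here.

```python
import hashlib
import hmac
import json


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of a file's bytes, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict, extracted_at: str, rbp_context: str,
                   source_row_count: int) -> dict:
    """Batch manifest: per-file hashes plus extraction context (field names assumed)."""
    return {
        "extracted_at": extracted_at,
        "rbp_context": rbp_context,
        "source_row_count": source_row_count,
        "files": {name: sha256_hex(body) for name, body in sorted(files.items())},
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest (HMAC-SHA256 as a stand-in)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """True only if the manifest is byte-for-byte what was signed."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


key = b"demo-key-not-for-production"
manifest = build_manifest({"part-0000.parquet": b"example bytes"},
                          "2024-01-31T00:00:00Z", "hr_admin", 1250)
signature = sign_manifest(manifest, key)
print(verify_manifest(manifest, signature, key))
```

    Changing any file hash, row count or timestamp after signing makes verification fail, which is the property an auditor relies on.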

    What's the typical storage profile of a SAP SuccessFactors cloud archive?

    For a 20,000-employee tenant with 10 years of effective-dated history, the archive typically sits in the 50–300 GB range across all entities — modest by cloud-storage standards. PerPerson + PerEmployment + EmpJob + EmpCompensation effective-dated history is the bulk (often 5–8M version rows), followed by FormHeader (every performance form ever issued), then JobReq + Application (recruiting history), then learning history (completion records). MDF custom objects vary wildly — some tenants have very rich MDFs that dominate, others have almost none. Tiered storage strategy: hot (S3 Standard / Hot Blob / GCS Standard) for the last 12 months at ~$0.023/GB/month, warm (S3 Standard-IA / Cool Blob / GCS Nearline) for 1–7 years at ~$0.0125/GB/month, cold (S3 Glacier / Archive Blob / GCS Archive) for 7+ years at ~$0.004/GB/month. Total annual storage cost for 20k-employee 10-year archive: single-digit-thousands of dollars.
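
    The per-GB rates quoted in this answer can be turned into a quick storage estimate. The sketch below uses those rates as given; the tier boundaries follow the hot/warm/cold description, and the example split of gigabytes across tiers is purely an assumption. It models the raw storage line item only, so it will come out well below any all-in figure that also covers query compute, request charges, key management and egress.

```python
# Approximate list rates quoted in the text, in USD per GB per month.
RATE_USD_PER_GB_MONTH = {"hot": 0.023, "warm": 0.0125, "cold": 0.004}


def tier_for_age(age_years: float) -> str:
    """Tier for a partition of the given age: <1 year hot, 1-7 years warm, 7+ cold."""
    if age_years < 1:
        return "hot"
    if age_years < 7:
        return "warm"
    return "cold"


def annual_storage_cost(gb_by_tier: dict) -> float:
    """Raw storage cost per year for a {tier: gigabytes} breakdown."""
    monthly = sum(RATE_USD_PER_GB_MONTH[tier] * gb for tier, gb in gb_by_tier.items())
    return monthly * 12


# Hypothetical split of a 300 GB archive across tiers.
example = {"hot": 30, "warm": 180, "cold": 90}
print(f"${annual_storage_cost(example):,.2f} per year (storage only)")
```

    Even the worst case at these rates, 300 GB held entirely on the hot tier, works out to roughly $83 per year of raw storage.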

    How does the cloud archive get refreshed from SuccessFactors?

    Two patterns. Pattern one (post-migration archive) — single full extract at SF tenant decommissioning time, then the archive is read-only forever. Pattern two (ongoing co-existence archive) — initial full extract, then daily or near-real-time incremental loads via Syntra ETL's watermark-based OData modified-since extractors while the SF tenant remains live for some active use. Both patterns produce the same hash-signed Parquet output. The choice depends on whether the SF tenant is being fully retired (pattern one) or kept live for a long-tail of active functionality with the archive providing analytics and long-term storage (pattern two). Most SF cloud archive deployments end up on pattern one within 12–24 months of full Fusion HCM cutover.
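
    Pattern two depends on a watermark: each run picks up only records modified since the last successful run, then advances the watermark. Here is a minimal in-memory sketch of that logic. It makes no OData calls, and the `last_modified` field name is an assumption standing in for whatever modified-timestamp the extractor actually reads.

```python
from datetime import datetime, timezone


def incremental_batch(records: list, watermark: datetime):
    """Return records changed after `watermark`, plus the advanced watermark.

    If nothing changed, the watermark is returned unchanged so the next run
    starts from the same point.
    """
    changed = [r for r in records if r["last_modified"] > watermark]
    new_watermark = max((r["last_modified"] for r in changed), default=watermark)
    return changed, new_watermark


records = [
    {"user_id": "u1", "last_modified": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"user_id": "u2", "last_modified": datetime(2024, 2, 5, tzinfo=timezone.utc)},
    {"user_id": "u3", "last_modified": datetime(2024, 2, 20, tzinfo=timezone.utc)},
]
watermark = datetime(2024, 2, 1, tzinfo=timezone.utc)
changed, watermark = incremental_batch(records, watermark)
print([r["user_id"] for r in changed], watermark.isoformat())
```

    In practice the new watermark should be persisted only after the batch has been written and its manifest signed, so a failed run is retried from the old position.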

    Can we query the SuccessFactors cloud archive directly without re-loading anywhere?

    Yes. That is the central design principle. The archive is in open Parquet format, partitioned by legal employer and effective fiscal year, and registered as external tables in your query engine of choice: Athena (AWS), Synapse Serverless (Azure), BigQuery External Tables (GCP), Snowflake External Tables (any cloud), or Oracle ADW Object Storage external tables (OCI). Queries hit the Parquet directly with predicate pushdown and column projection. A typical "show me a worker's full job history as of 14 March 2019" query scans a few MB within the relevant legal-employer and fiscal-year partitions and returns in under a second for a few cents of compute at most. No re-loading to a warehouse, no separate ETL pipeline, no shadow copies of the archive.
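
    The "as of" query in this answer rests on effective dating: each version row has a validity window, and the lookup picks the row whose window contains the requested date. A minimal sketch of that selection in plain Python, assuming hypothetical `start_date` and `end_date` columns and an open-ended row marked with the date 9999-12-31:

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still current" version rows


def job_as_of(rows: list, user_id: str, as_of: date):
    """Return the version row valid for `user_id` on `as_of`, or None."""
    candidates = [r for r in rows
                  if r["user_id"] == user_id and r["start_date"] <= as_of <= r["end_date"]]
    # If windows overlap in the source data, prefer the most recently started row.
    return max(candidates, key=lambda r: r["start_date"], default=None)


history = [
    {"user_id": "u1", "start_date": date(2017, 5, 1), "end_date": date(2018, 12, 31), "job_code": "ANALYST"},
    {"user_id": "u1", "start_date": date(2019, 1, 1), "end_date": date(2020, 6, 30), "job_code": "SENIOR_ANALYST"},
    {"user_id": "u1", "start_date": date(2020, 7, 1), "end_date": OPEN_END, "job_code": "MANAGER"},
]
print(job_as_of(history, "u1", date(2019, 3, 14))["job_code"])
```

    In SQL the same selection is a `WHERE start_date <= :as_of AND end_date >= :as_of` predicate, which the engine can push down into the Parquet scan.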

    How does the cloud archive handle SuccessFactors RBP and access control?

    RBP (Role-Based Permissions, SF's model of permission roles and permission groups) is captured at extraction time and converted to a logical access-control model attached to the archive. When archive queries are issued through the consumer portals (ex-employee self-service, HR audit, works council, GDPR DSAR), the same RBP-equivalent filtering is enforced at the query layer using cloud-native IAM, column-level masking (Snowflake masking policies, BigQuery column-level security, AWS Lake Formation for Athena), and row-level security. The result: the same access-control posture you had in the live SF tenant, enforced against the archive, without keeping the SF tenant subscription active. Every query is logged for the GDPR Article 30 record of processing activities (RoPA) and the SOX audit trail.

    Is a SAP SuccessFactors cloud archive compliant with EU GDPR and German Betriebsverfassungsgesetz?

    Yes, with the right deployment posture. GDPR compliance requires: data minimization (the archive holds only data with a documented retention basis, and ex-employee records past retention are purged on schedule); right of access (a DSAR responder UI indexed by national identifier returns every record in minutes); right to erasure (a forget-me workflow removes subject records from Parquet using copy-on-write delta partitions while preserving the audit trail); and a processing record (every access logged for the Article 30 RoPA). Compliance with the German Betriebsverfassungsgesetz (Works Constitution Act) adds: a works-council representative access portal, statutory headcount filings, gender-pay-gap historical analysis, and 10+ year retention for some records. The archive is typically deployed in EU-region object storage (S3 eu-central-1, Azure Germany West Central, GCS europe-west3, OCI Frankfurt) so data never leaves the EU.
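
    The right-to-erasure step can be sketched as a copy-on-write rewrite: the partition is rewritten without the subject's rows, and an audit entry records that an erasure happened without retaining the identity in clear text. This is an illustrative model over in-memory rows, not the actual workflow; the `user_id` column, the audit field names and the use of a SHA-256 hash of the identifier as the audit reference are all assumptions.

```python
import hashlib
from datetime import datetime, timezone


def erase_subject(rows: list, subject_id: str, erased_at: datetime):
    """Copy-on-write erasure: return a new row list without the subject, plus an audit entry.

    The input list is never mutated; the caller would write `kept` as a new
    partition file and retire the old one.
    """
    kept = [r for r in rows if r["user_id"] != subject_id]
    audit = {
        # Assumption: a hash of the identifier serves as the audit reference.
        "subject_hash": hashlib.sha256(subject_id.encode("utf-8")).hexdigest(),
        "rows_removed": len(rows) - len(kept),
        "erased_at": erased_at.isoformat(),
    }
    return kept, audit


partition = [
    {"user_id": "u1", "job_code": "ANALYST"},
    {"user_id": "u2", "job_code": "MANAGER"},
    {"user_id": "u1", "job_code": "SENIOR_ANALYST"},
]
kept, audit = erase_subject(partition, "u1", datetime(2024, 3, 1, tzinfo=timezone.utc))
print(len(kept), audit["rows_removed"], audit["erased_at"])
```

    Whether a hashed identifier is acceptable in the audit trail is a legal question for the data protection officer; the sketch only shows the mechanics.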

    Plan your SAP SuccessFactors cloud archive deployment

    Book a 30-minute discovery call. We'll size your archive based on your SF entity profile, recommend cloud target and region for residency requirements, and walk through the cost model versus keeping the live SuccessFactors subscription.