Scope & Threat Modeling Approach
This threat model analyzes the AltScore platform using the STRIDE framework, a structured method for enumerating threats by category. Each identified threat is assessed for likelihood and impact and paired with mitigations. The goal is a living, actionable document, not a compliance artifact.
Threat modeling principle: A threat model is only useful if it is honest about what is scary. This document does not soften threats or over-claim mitigations. Where a genuine residual risk remains that we have not fully mitigated, it is declared so that it can be monitored and prioritized.
STRIDE Category Key
S = Spoofing, T = Tampering, R = Repudiation, I = Information Disclosure, D = Denial of Service, E = Elevation of Privilege.
System Boundary & Trust Zones
| Zone | Components | Trust Level | Attack Surface |
|---|---|---|---|
| External (Untrusted) | Internet, lender systems, distributor on-premise networks, WhatsApp | Zero trust — all traffic verified | API gateway, ERP connector ingestion endpoint, WhatsApp consent webhook |
| DMZ | API gateway, WAF, load balancer, CDN, consent webhook receiver | Low trust — input validation enforced | All internet-facing endpoints; highest exposure |
| Application Zone | Score API service, consent service, feature pipeline coordinator, dashboard backend | Medium trust — authenticated service-to-service calls | Internal API calls; service compromise would need to pivot here |
| Data Zone | Bronze data lake, Silver feature store, Gold score store, identity resolution store, consent DB | High trust — access requires elevated privileges + audit | Most valuable target — direct access most impactful |
| ML Zone | Training cluster, model registry, MLflow, Isolation Forest service | High trust — isolated from application zone | Model poisoning, artifact tampering |
Assets at Risk
Before enumerating threats, we identify what an attacker would want. This drives prioritization — threats targeting Crown Jewel assets receive the most mitigation investment.
| Asset | Classification | Value to Attacker | Business Impact if Compromised | Crown Jewel |
|---|---|---|---|---|
| Retailer PII (GST, phone, address) | Class 1 — Highly Sensitive | Identity theft; regulatory extortion; sold on dark web | DPDP breach; regulatory fine up to ₹250Cr; retailer trust destruction; product shutdown risk | ✦ Yes |
| AltScore model weights + feature logic | Trade Secret | Competitive reverse engineering; gaming the model systematically; disclosure to competitor | Score gaming at scale destroys model validity; competitive moat lost; lender trust collapse | ✦ Yes |
| AltScore values + PD estimates per retailer | Class 2 — Sensitive Business | Credit market manipulation; blackmail of retailers; competitive intelligence for lenders | DPDP breach; lender trust breach; retailer harm (if scores disclosed improperly) | ✦ Yes |
| Distributor ERP transaction data | Class 2 — Sensitive Business | Commercial intelligence on distributor's retailer network; competitor analysis; extortion | Distributor trust destruction; DPA breach; product shutdown; legal liability | ✦ Yes |
| Consent records | Class 4 — Audit | Proof destruction; regulatory cover-up; enable unauthorized processing claims | Regulatory exposure; inability to prove compliance; DPDP enforcement | Partial |
| NBFC lender portfolio data | Class 2 — Sensitive Business | Financial intelligence; portfolio strategy leakage; cross-lender exposure | Lender trust breach; commercial damage; potential NBFC regulatory consequences | Partial |
| Score API availability | Operational | Ransom; competitive disruption | Lender unable to underwrite; SLA breach; revenue loss; lender churn | No — recoverable |
ERP Connector Threats
The ERP Connector SDK is deployed inside distributor environments — a context outside AltScore's direct control. This creates a unique threat surface: the connector could be compromised by a threat actor with access to the distributor's systems, or the connector itself could be weaponized against AltScore's ingestion infrastructure.
TM-ERP-01: Spoofed ERP connector
An attacker creates a fake ERP connector that generates synthetic or manipulated transaction data. If the data is accepted, fabricated retailer payment histories flow into the Bronze layer and corrupt AltScore credit profiles. At scale, this could inflate scores for retail borrowers controlled by the attacker, enabling fraudulent loan approvals.
Mitigations:
- mTLS client certificates required on all connector connections; certificates are distributor-specific and issued by AltScore CA
- Payload HMAC-SHA256 signature verified at ingestion endpoint; unsigned payloads rejected
- Isolation Forest anomaly detection flags statistically improbable synthetic payment patterns
- Per-distributor data hashing and chain-of-custody: each batch references prior batch hash
- Manual audit of any distributor with sudden dramatic score distribution shift across their retailer network
TM-ERP-02: Data tampering in transit
A man-in-the-middle attacker intercepts the connector's outbound data upload and modifies invoice amounts, payment dates, or return rates before they reach AltScore's ingestion endpoint. The modified data flows into the scoring pipeline and generates incorrect credit profiles.
Mitigations:
- TLS 1.3 with certificate pinning prevents MITM interception
- HMAC signature covers entire payload; any byte modification invalidates signature
- Static IP allowlisting reduces MITM opportunity surface
- Ingestion endpoint validates signature before writing any data to Bronze
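The signature check described in these mitigations can be sketched as follows. This is a minimal illustration, not AltScore's implementation: the function names, the per-distributor shared secret, and the hex encoding of the signature are assumptions. The source specifies only that HMAC-SHA256 covers the entire payload and that verification happens before any write to Bronze.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Connector side: hex HMAC-SHA256 over the entire payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_payload(secret: bytes, payload: bytes, signature: Optional[str]) -> bool:
    """Ingestion side: call this before anything is written to Bronze."""
    if not signature:
        return False  # unsigned payloads are rejected
    expected = sign_payload(secret, payload)
    # compare_digest runs in constant time, so it does not leak
    # how many leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Because the MAC is computed over the raw bytes, changing a single invoice amount in transit produces a different digest and the batch is rejected.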
TM-ERP-03: Distributor deliberate data manipulation
A distributor employee (or the distributor itself) deliberately modifies ERP records before the connector sync, altering a retailer's payment history either to inflate that retailer's score (in exchange for a bribe from the retailer) or to damage a competitor retailer's creditworthiness. Unlike a technical exploit, this threat originates from a trusted data source.
Mitigations:
- Multi-distributor cross-validation: if a retailer has data from two or more distributors, their records must agree on payment behavior
- Isolation Forest detects anomalous patterns in payment recording that differ from the distributor's broader network behavior
- Statistical consistency checks: a distributor where 80% of retailers suddenly improve payment behavior triggers a review
- Contractual deterrent: data partnership agreement includes a fraud and manipulation clause with financial penalties and termination rights
- ERP audit log comparison: the connector reads directly from the ERP, and the ERP's own audit log should reflect any changes; if connector data does not match the ERP audit trail, an alert is raised
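The statistical consistency check above (a review when 80% of a distributor's retailers suddenly improve) could look roughly like this. The definition of "improve" used here, a drop of at least `min_drop` days in average lateness between two periods, is an assumption made for illustration; the source states only the 80% trigger.

```python
def fraction_improved(prev_days_late, curr_days_late, min_drop=5.0):
    """Share of retailers whose average days-late fell by at least min_drop.

    Both arguments map retailer_id -> average days late for one period.
    Only retailers present in both periods are compared.
    """
    common = prev_days_late.keys() & curr_days_late.keys()
    if not common:
        return 0.0
    improved = sum(
        1 for r in common if prev_days_late[r] - curr_days_late[r] >= min_drop
    )
    return improved / len(common)


def needs_review(prev_days_late, curr_days_late, threshold=0.80):
    """True when the improved share reaches the review threshold."""
    return fraction_improved(prev_days_late, curr_days_late) >= threshold
```

A distributor-wide shift of this kind is more suspicious than an individual retailer improving, which is why the check aggregates across the whole network.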
TM-ERP-04: ERP connector SDK supply chain attack
An attacker compromises the AltScore SDK distribution pipeline and ships a malicious SDK version to distributors. The malicious SDK extracts broader data from the ERP than the declared scope (e.g., bank account numbers, personal guarantor data), exfiltrates it, or plants a backdoor that allows the attacker to inject data into future sync batches.
Mitigations:
- SDK binary is code-signed; distributors must verify signature before installation
- SDK updates delivered only via signed, integrity-verified package
- SDK distribution pipeline is separate from the main application deployment pipeline (blast radius containment)
- SDK core components are open source, with pinned, audited dependency versions
- Outbound data from SDK is schema-validated against published extraction spec before transmission; fields outside spec are stripped (defense in depth even if SDK is compromised)
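The field-stripping step in the last mitigation can be illustrated with a short sketch. The field names in `ALLOWED_FIELDS` are hypothetical stand-ins for the published extraction spec, which the source does not list.

```python
# Hypothetical extraction spec; the real field list is not given in the source.
ALLOWED_FIELDS = frozenset(
    {"invoice_id", "retailer_gst", "invoice_amount", "payment_date", "return_rate"}
)


def strip_to_spec(record, allowed=ALLOWED_FIELDS):
    """Split a record into (fields kept, names of fields dropped).

    Anything not named in the spec is removed before transmission; the
    dropped names are returned so the caller can log or alert on them.
    """
    kept = {k: v for k, v in record.items() if k in allowed}
    dropped = sorted(k for k in record if k not in allowed)
    return kept, dropped
```

Returning the dropped field names, rather than silently discarding them, gives a signal that an SDK build is reading more than its declared scope.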
Ingestion & Data Lake Threats
TM-DL-01: Cross-distributor data lake access
A compromised service account or misconfigured IAM policy allows a data pipeline job for Distributor A to read Bronze-layer data belonging to Distributor B. This is the highest-impact information disclosure scenario: a distributor's competitor could access its full retailer transaction history.
Mitigations:
- Per-distributor S3 prefix with separate CMK — key boundary is a hard technical barrier
- IAM policies scoped to specific prefixes; no wildcard S3 ARNs in any policy
- Spark jobs run with distributor-scoped IAM roles; cross-prefix reads are denied at the IAM level
- Quarterly IAM policy review; alerts on any cross-prefix access attempt
- Data lake access logging enabled; all GetObject calls logged with caller identity
TM-DL-02: Bronze layer tampering or deletion
An attacker with write access to the Bronze layer modifies or deletes raw ERP records after ingestion. This breaks the audit lineage, making it impossible to trace a score back to its source data. The attack could be used to cover up previous data manipulation or to impede a regulatory investigation.
Mitigations:
- S3 Object Lock (WORM mode) on Bronze layer — objects cannot be modified or deleted for the retention period
- Batch hash chain: each batch records the hash of the preceding batch, which transitively covers all prior batches; any modification breaks the chain and is detectable
- Separate backup to a second region with independent access credentials
- Data ops service accounts have PutObject only; no DeleteObject or overwrite capability
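A hash chain of the kind listed in these mitigations can be sketched in a few lines. The genesis value, the use of SHA-256, and the function names are assumptions for illustration; the source says only that each batch references prior batch hashes.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the first batch


def chain_hash(prev_hash, batch_bytes):
    """Hash of this batch bound to the hash of the batch before it."""
    return hashlib.sha256(prev_hash.encode("ascii") + batch_bytes).hexdigest()


def build_chain(batches):
    """Return the list of chained hashes recorded at ingestion time."""
    hashes, prev = [], GENESIS
    for batch in batches:
        prev = chain_hash(prev, batch)
        hashes.append(prev)
    return hashes


def first_broken_link(batches, recorded_hashes):
    """Index of the first batch whose recomputed hash differs, else None."""
    prev = GENESIS
    for i, (batch, recorded) in enumerate(zip(batches, recorded_hashes)):
        if chain_hash(prev, batch) != recorded:
            return i
        prev = recorded
    return None
```

Since every hash depends on its predecessor, an attacker who alters one stored batch would also have to rewrite every later recorded hash to avoid detection.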
TM-DL-03: PII re-identification from pseudonymized features
An attacker with access to the Silver (pseudonymized) feature store attempts to re-identify a retailer by combining feature values (e.g., a unique combination of GMV range, district, and SKU category) with external knowledge about specific retailers in a geography. If successful, the PII protections in the Silver layer are undermined.
Mitigations:
- k-anonymity analysis run at feature computation time: any feature combination shared by fewer than k=5 retailers is generalized (binned or suppressed)
- Geographic features binned at district level, not shop-location level
- Silver layer access restricted to ML Engineers + automated pipelines; no analyst direct access
- Identity resolution store (UUID → GST/phone mapping) physically separated; ML pipeline has no access
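The k-anonymity rule above can be illustrated with a small sketch that finds quasi-identifier combinations shared by fewer than k retailers and suppresses them. The column names and the `"*"` placeholder are assumptions; a production version would more likely generalize values into wider bins than suppress them outright.

```python
from collections import Counter


def rare_combinations(rows, quasi_identifiers, k=5):
    """Quasi-identifier tuples that occur in fewer than k rows."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return {combo for combo, n in counts.items() if n < k}


def suppress_rare(rows, quasi_identifiers, k=5, placeholder="*"):
    """Return copies of rows with rare quasi-identifier values masked."""
    rare = rare_combinations(rows, quasi_identifiers, k)
    out = []
    for row in rows:
        combo = tuple(row[q] for q in quasi_identifiers)
        if combo in rare:
            # Copy the row so the caller's data is not mutated.
            row = {**row, **{q: placeholder for q in quasi_identifiers}}
        out.append(row)
    return out
```

Note that this simplification does not re-check the suppressed group itself; if fewer than k rows end up masked, they would need further handling.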
ML Model & Adversarial Threats
The AltScore ML system introduces threat categories that traditional software threat models do not cover — model poisoning, score gaming, model inversion, and algorithmic bias amplification.
TM-ML-01: Training data poisoning
An attacker who has compromised a distributor's ERP injects systematically manipulated training data into the NBFC partner repayment dataset, for example by marking bad borrowers as repaid or reporting good borrowers as defaulted. Over multiple training cycles, the model learns incorrect associations and begins systematically mis-scoring retailers in the attacker's favor. This is a slow, high-impact attack that may not be detected until default rates spike significantly above predictions.
Mitigations:
- NBFC partner repayment data ingested via a separate, dedicated secure channel — not the same pipeline as ERP data
- Cryptographic hash of every training dataset snapshot; hashes stored immutably in WORM storage
- Champion/challenger model evaluation: new model version must outperform champion on a holdout — manipulated training data would create a weaker model, not a stronger one (in most cases)
- Continuous monitoring of predicted PD vs. actual default rate per distributor cohort — systematic manipulation by one distributor creates a detectable deviation in that cohort
- Annual adversarial data injection test (red team) to validate detection capability
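The predicted-versus-actual monitoring in these mitigations amounts to a per-cohort calibration check. The sketch below assumes a flat list of `(cohort, predicted_pd, defaulted)` tuples and a 5-percentage-point tolerance; both the input shape and the tolerance are illustrative, not values from the source.

```python
def calibration_gaps(loans, tolerance=0.05):
    """Flag cohorts where observed default rate strays from mean predicted PD.

    loans: iterable of (cohort, predicted_pd, defaulted) tuples.
    Returns {cohort: observed_rate - mean_predicted_pd} for flagged cohorts.
    """
    totals = {}
    for cohort, predicted_pd, defaulted in loans:
        n, pd_sum, defaults = totals.get(cohort, (0, 0.0, 0))
        totals[cohort] = (n + 1, pd_sum + predicted_pd, defaults + int(defaulted))

    flagged = {}
    for cohort, (n, pd_sum, defaults) in totals.items():
        gap = defaults / n - pd_sum / n
        if abs(gap) > tolerance:
            flagged[cohort] = gap
    return flagged
```

A positive gap means the cohort defaults more often than the model predicted, which is the pattern expected if one distributor's data had been used to inflate scores.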
TM-ML-02: Model inversion
An attacker with API access systematically queries the Score API with modified inputs for the same retailer (or similar synthetic retailers), studying how the score changes in response to feature variations. Over many queries, the attacker reconstructs the model's decision boundary, effectively reverse-engineering the scoring logic either to game it or to sell the knowledge to competitors.
Mitigations:
- API does not accept arbitrary feature inputs — only retailer_id; features are computed internally and not exposed
- Rate limiting: max 3 score queries per retailer per 24h per lender — limits systematic probing
- Reason codes are categorical labels, not numeric feature values; labels reveal direction, not magnitude
- Model score output includes controlled noise for borderline cases (score ±5 for P(Default) within 1% of band boundary)
- Anomaly detection on lender query patterns — systematic probing of many retailers with minor variations triggers review
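The per-retailer rate limit above (3 queries per retailer per 24 hours per lender) can be sketched as a sliding-window limiter. This in-memory version is an illustration only; the class name is hypothetical, and a multi-instance deployment would need a shared store instead of a local dictionary.

```python
import time
from collections import defaultdict, deque


class ScoreQueryLimiter:
    """Sliding-window limit keyed on (lender_id, retailer_id)."""

    def __init__(self, max_queries=3, window_seconds=24 * 3600):
        self.max_queries = max_queries
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of allowed queries

    def allow(self, lender_id, retailer_id, now=None):
        """Record and permit the query, or return False if over the limit."""
        now = time.time() if now is None else now
        hits = self._hits[(lender_id, retailer_id)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_queries:
            return False
        hits.append(now)
        return True
```

Keying on the lender and retailer pair means one lender's probing of a retailer does not consume another lender's allowance for the same retailer.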
TM-ML-03: Score gaming
A retailer who learns (through their distributor or a leaked reason code pattern) which behaviors improve their score deliberately inflates orders before a scoring event, makes unusually punctual payments for a short period, or avoids returns, all without a genuine underlying business improvement. If successful, they receive a higher credit limit than their actual risk profile warrants, leading to over-lending and eventual default.
Mitigations:
- Isolation Forest detects order velocity spikes >3σ from 12-month baseline
- 60-day look-back window — manipulation must be sustained for 2 months, which requires actual behavior change
- Long-term payment history (3+ years) dominates the payment signal; 2-month improvement minimally affects band
- Return rate counter-signal: inflated orders are often returned; high returns penalize even if orders are high
- Reason code opacity: retailers see categories ("consistent_order_frequency"), not the specific numerical thresholds
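The 3σ order-velocity rule in the first mitigation can be illustrated with a plain z-score test. This is a deliberate simplification: the source attributes the detection to an Isolation Forest, whereas the sketch below only shows the threshold arithmetic against a 12-month baseline. The function name and the handling of a zero-variance baseline are assumptions.

```python
from statistics import mean, stdev


def is_velocity_spike(baseline_monthly_orders, current_orders, sigmas=3.0):
    """True if current_orders exceeds the baseline mean by more than `sigmas` SDs."""
    if len(baseline_monthly_orders) < 2:
        raise ValueError("need at least two baseline months")
    mu = mean(baseline_monthly_orders)
    sd = stdev(baseline_monthly_orders)  # sample standard deviation
    if sd == 0:
        # Perfectly flat baseline: treat any increase as a spike.
        return current_orders > mu
    return (current_orders - mu) / sd > sigmas
```

For a baseline with mean 100 and standard deviation of about 3.2, a month of 108 orders stays under the threshold while a month of 140 orders is flagged.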
TM-ML-04: Model artifact tampering
An attacker with access to the model registry modifies the serialized XGBoost model artifact or the feature scaler objects, introducing a backdoor that causes specific retailer profiles (identifiable by the attacker) to receive artificially high scores. Because the model binary is serialized, the change may not be detectable through output monitoring until default rates spike.
Mitigations:
- SHA-256 hash of all model artifacts logged at training time; hash verified at every model load
- A hash mismatch fails the model load and immediately alerts ML Ops and Security
- Model registry access restricted to ML Ops role; no developer direct write access
- Promotion of any model version requires: hash verification + performance validation on holdout + human sign-off from two ML team members
- Artifact store on S3 WORM — model artifacts are immutable after training
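The verify-on-load step can be sketched as below. The exception name and function names are hypothetical, and the alerting to ML Ops and Security is left to the caller; the sketch covers only the hash comparison that gates the load.

```python
import hashlib


class ArtifactHashMismatch(RuntimeError):
    """Raised when an artifact's digest differs from the one logged at training."""


def sha256_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return path if the digest matches; otherwise refuse the load."""
    actual = sha256_file(path)
    if actual != expected_sha256:
        raise ArtifactHashMismatch(
            f"{path}: expected {expected_sha256}, got {actual}"
        )
    return path
```

The check should run before the artifact is deserialized, since loading an untrusted serialized model can itself execute attacker-controlled code.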
TM-ML-05: Bias amplification
The model, through correlations in training data, systematically under-scores retailers in a specific geography, industry, or demographic group. The cause is not malicious tampering but spurious correlations in historical data (e.g., a region where the NBFC partner historically denied credit, creating a self-fulfilling training signal). This is not a traditional security threat, but the harm to affected retailers is real and the regulatory exposure under the DPDP Act is significant.
Mitigations:
- Quarterly fairness audits: score distribution compared across state, district tier, trade category, and gender proxy cohorts
- Disparate impact testing — score gap >15% for matched behavioral cohorts triggers model review
- Geography excluded as a direct feature; only behavioral signals permitted
- Training data de-biasing: NBFC partner data reweighted to correct for historical denial bias if detected
- Model freeze protocol: if bias detected, model frozen and retraining initiated before next production scoring cycle
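The 15% disparate impact trigger can be made concrete with a short sketch. How the gap is measured is not defined in the source; the version below assumes it is the relative difference between the highest and lowest cohort mean scores, and that scores are positive.

```python
from statistics import mean


def score_gap(scores_by_cohort):
    """Relative gap between the highest and lowest cohort mean score."""
    means = [mean(scores) for scores in scores_by_cohort.values()]
    highest, lowest = max(means), min(means)
    return (highest - lowest) / highest


def needs_model_review(scores_by_cohort, max_gap=0.15):
    """True when matched cohorts differ by more than the allowed gap."""
    return score_gap(scores_by_cohort) > max_gap
```

The comparison is only meaningful when the cohorts have already been matched on behavior, as the mitigation specifies; otherwise a real difference in payment behavior would be misread as bias.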
Score API & Lender Interface Threats
TM-API-01: Cross-lender data leakage via API (IDOR)
A vulnerability in the API authorization layer (e.g., an IDOR, or Insecure Direct Object Reference) allows a lender to query the score of a retailer that belongs to a different lender's portfolio, one where the retailer has not consented to that lender. At scale, a single lender could enumerate the entire retail credit portfolio of a competitor NBFC.
Mitigations:
- Row-level security (RLS) in PostgreSQL Score Store — every query automatically scoped to requesting lender_id; bypass requires DB-level privilege not available to application
- Consent scope check: API verifies retailer has consented to requesting lender before DB query is issued
- No sequential retailer IDs; UUIDs used throughout — enumeration attacks yield no useful identifiers
- Penetration testing includes IDOR testing on every release; dedicated OWASP API Top 10 test suite
- Anomaly detection: a lender querying many retailers with whom it has no prior relationship triggers an alert
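The ordering of the consent check before the data lookup can be shown with a minimal sketch. The in-memory `consents` set and `scores` dictionary are stand-ins for the consent service and the Score Store, and the function and exception names are hypothetical.

```python
class AccessDenied(Exception):
    """Raised when the lender has no consent on record for the retailer."""


def get_score(lender_id, retailer_id, consents, scores):
    """Return the retailer's score only if it has consented to this lender.

    consents: set of (retailer_id, lender_id) pairs.
    scores: mapping of retailer_id -> score.
    """
    # Consent is checked first, so a non-consented request never reads score data.
    if (retailer_id, lender_id) not in consents:
        raise AccessDenied("no consent on record")
    return scores[retailer_id]
```

Raising the same error whether the retailer is unknown or merely non-consented avoids telling a probing lender which retailer identifiers exist.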
TM-API-02: DDoS against Score API
A volumetric DDoS attack against the Score API makes it unavailable to lenders during a peak underwriting period (e.g., before a major festival credit surge). Lenders cannot underwrite retailers, loan disbursements are delayed, and AltScore breaches its SLAs with NBFC partners. The attack could be financially motivated (extortion) or competitive sabotage.
Mitigations:
- Cloud-native DDoS protection (AWS Shield Advanced / Azure DDoS Standard) upstream of application
- WAF rate limiting at the edge, ahead of application-layer rate limiting
- API responses cached in Redis (TTL: 12 hours per retailer) — cached scores served even if scoring pipeline is under load
- Multi-region API deployment for failover; automated DNS failover on health check failure
- Lender SLA includes 99.5% uptime; incident comms playbook for DDoS events pre-drafted
TM-API-03: Compromised lender API key
A lender's API key is leaked (e.g., committed to a public GitHub repository, phished from an NBFC developer, or stolen in a lender-side breach). An attacker uses the stolen key to query scores within the lender's authorized retailer scope, accessing sensitive credit intelligence they are not entitled to receive.
Mitigations:
- Compromised key revocation: lender can revoke via API or support request; revocation effective within 60 seconds
- Anomaly detection on API key usage: queries from new IP geographies, unusual hours, or sudden volume spikes trigger alert to both AltScore security and NBFC CISO
- API keys are short-lived access tokens (1h TTL) with rotating refresh tokens — a leaked key expires quickly
- Lenders advised to use IP allowlisting for their API keys; AltScore enforces this for enterprise NBFC integrations
TM-API-04: Score repudiation by lender
After a loan defaults, an NBFC lender claims that the score it received was different from what AltScore served, or that AltScore provided a score based on stale data without adequate disclosure, in order to shift liability for the bad loan to AltScore. Without an immutable record of the exact score served, AltScore has no defense.
Mitigations:
- Every score API response is logged immutably with all of its fields: lender_id, retailer_id (UUID), score, PD, risk_band, model_version, data_freshness_days, generated_at, request_ip
- Score record stored in Score Store with the same values; lender API response reconstructable from stored record
- data_freshness_days prominently included in every response; lender acceptance of stale data is a documented API interaction
- Lender API agreement includes: "AltScore's audit log constitutes the authoritative record of scores served"
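A record of a served score might be modeled as below. The field names follow the log fields listed above (with `pd` standing for PD); the frozen dataclass, the canonical JSON encoding, and the SHA-256 digest are assumptions added to show how a stored record could be compared byte-for-byte against a disputed response.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScoreServedRecord:
    lender_id: str
    retailer_id: str
    score: int
    pd: float
    risk_band: str
    model_version: str
    data_freshness_days: int
    generated_at: str
    request_ip: str

    def canonical_json(self):
        """Deterministic encoding: sorted keys, no extra whitespace."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def digest(self):
        """SHA-256 over the canonical encoding, usable as a tamper check."""
        return hashlib.sha256(self.canonical_json().encode("utf-8")).hexdigest()
```

In a dispute, recomputing the digest of the stored record and comparing it with the digest logged at serving time shows whether the record has changed since the response was sent.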
Insider Threats
Insider threats are among the most difficult to detect and mitigate. AltScore's design assumes that any employee could be a threat actor — through malice, coercion, or negligence — and implements controls accordingly.
| Threat ID | Insider Threat Scenario | Category | Severity | Key Mitigations |
|---|---|---|---|---|
| TM-IN-01 | Data Exfiltration: Employee sells retailer PII. Employee with data access exports retailer GST and phone numbers and sells them to a competitor or phishing operation | I (Disclosure) | CRITICAL | No direct PII access for most roles; DLP alerts on bulk data export; four-eyes approval for admin access; all data access logged and reviewed; background checks at hiring; anomalous query volume alerts |
| TM-IN-02 | Score Manipulation: Employee adjusts score for a retailer. Data engineer or ML engineer directly modifies a score in the Score Store DB (e.g., bribed by a lender or retailer) | T (Tampering) | CRITICAL | Score Store only writeable by automated pipeline service account; no human write access to production Score Store; manual overrides only via lender dashboard with mandatory audit trail; all DB changes logged and reviewed weekly |
| TM-IN-03 | Model Backdoor: ML engineer embeds model manipulation. Rogue ML engineer embeds a trigger in the model that causes high scores for specific retailer identity patterns controlled by an external party | T (Tampering) | HIGH | Mandatory peer code review for all model training code changes; two-person sign-off for model promotion; model artifact hash verification; champion/challenger evaluation catches performance deviations introduced by backdoors |
| TM-IN-04 | Credential Sharing: Employee shares admin access. Employee shares their admin credentials with an unauthorized party (contractor, former colleague, or external attacker who has socially engineered them) | S (Spoofing) | HIGH | Hardware security key (FIDO2) for admin access, making credentials phishing-resistant and non-shareable; session anomaly detection (geographic, device, time-of-day); immediate credential revocation on off-boarding |
| TM-IN-05 | Negligent Data Handling: Employee exposes data via unsecured channel. Employee exports a report containing retailer data and emails it unencrypted, posts it in a public Slack channel, or leaves a laptop with unencrypted data unattended | I (Disclosure) | MEDIUM | DLP policy on email and Slack (auto-detect PII patterns, block unencrypted export); endpoint encryption (FileVault / BitLocker) mandatory; security awareness training quarterly; no PII in analytics reports (aggregated only) |
Supply Chain Threats
AltScore's technology stack depends on open-source libraries, cloud provider services, and third-party APIs. Each dependency is a potential attack vector if compromised upstream.
| Component | Threat | Severity | Mitigations |
|---|---|---|---|
| Python ML libraries (XGBoost, SHAP, scikit-learn) | Malicious package version published to PyPI mimicking a dependency; dependency confusion attack; typosquatting | HIGH | Pinned dependency versions in requirements.txt; private PyPI mirror for production dependencies; automated dependency vulnerability scanning (Dependabot / Snyk) in CI/CD; no internet access from ML training cluster (packages from internal mirror only) |
| Cloud Provider (AWS/Azure) | Cloud provider infrastructure breach; availability event; regional outage; insider at cloud provider accessing customer data | MEDIUM | Customer-managed encryption keys (CMK) — cloud provider cannot decrypt data without AltScore's key; multi-region deployment for availability; cloud provider SLA and DPA reviewed; no unencrypted PII in any cloud service |
| WhatsApp Business API (Meta) | Meta-side breach exposes consent message content; API outage disrupts consent flow; Meta changes API terms restricting use case | MEDIUM | AltScore stores only consent_decision + timestamp (not message content); SMS fallback consent channel for WhatsApp outages; alternative consent channel (web portal) available; Meta API terms reviewed annually; consent channel is contractually abstracted — can swap provider |
| Tally / SAP B1 ERP software | Vulnerability in ERP software exploited to provide false data to the connector; ERP vendor breach exposes distributor data | MEDIUM | Connector reads ERP via read-only DB connection; ERP vulnerabilities don't directly affect AltScore systems; distributor is responsible for ERP security; AltScore connector does not expose new attack surface to the ERP |
| npm / node dependencies (dashboard frontend) | Malicious npm package in dashboard build; XSS via compromised CDN-loaded script | MEDIUM | Subresource Integrity (SRI) hashes on all CDN-loaded scripts; npm audit in CI/CD; no CDN scripts loaded by default without SRI; Content Security Policy (CSP) headers strict on dashboard |
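For the Subresource Integrity mitigation in the last row, the integrity value placed on a script tag is the base64-encoded digest of the file prefixed with the algorithm name. The sketch below computes it; the function name is hypothetical, and SHA-384 is chosen as the default because it is a commonly used SRI algorithm, not because the source specifies it.

```python
import base64
import hashlib


def sri_hash(content, algorithm="sha384"):
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The browser recomputes this value for the fetched script and refuses to execute it on a mismatch, so a script modified on the CDN does not run on the dashboard.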
Risk Heat Map Summary
The table consolidates the identified threats, ordered by inherent risk, and shows the residual risk that remains after mitigations are applied. This summary drives prioritization of security investment and quarterly review focus.
| Threat ID | Threat Summary | STRIDE | Inherent Risk | Residual Risk | Review Cadence |
|---|---|---|---|---|---|
| TM-ERP-01 | Spoofed ERP connector — malicious data injection | S | CRITICAL | MEDIUM | Quarterly |
| TM-ML-01 | Training data poisoning — systematic score manipulation | T | CRITICAL | MEDIUM | Quarterly |
| TM-ML-04 | Model artifact tampering — silent score manipulation | T | CRITICAL | LOW | Monthly |
| TM-DL-01 | Cross-distributor data lake access — PII leakage | I | CRITICAL | LOW | Quarterly |
| TM-API-01 | Cross-lender data leakage via API (IDOR) | I | CRITICAL | LOW | Every release |
| TM-IN-01 | Employee exfiltrates retailer PII | I | CRITICAL | MEDIUM | Continuous monitoring |
| TM-IN-02 | Employee directly manipulates score store | T | CRITICAL | LOW | Monthly audit |
| TM-ERP-03 | Distributor deliberate data manipulation | T | HIGH | MEDIUM | Quarterly |
| TM-ERP-04 | ERP connector SDK supply chain attack | E | HIGH | MEDIUM | Per SDK release |
| TM-ML-03 | Score gaming — retailer behavioral manipulation | T | HIGH | LOW | Quarterly |
| TM-ML-05 | Bias amplification — systematic cohort under-scoring | I | HIGH | LOW | Quarterly fairness audit |
| TM-API-02 | DDoS against Score API | D | HIGH | LOW | Continuous monitoring |
| TM-API-03 | Compromised lender API key | S | HIGH | LOW | Continuous monitoring |
| TM-DL-03 | PII re-identification from pseudonymized features | I | MEDIUM | LOW | Annually |
| TM-ML-02 | Model inversion — reconstructing scoring logic | I | MEDIUM | LOW | Quarterly |
Residual Risks & Accepted Risks
Not all risks can be fully mitigated. The following residual risks are explicitly acknowledged by the Security team and CISO — accepted because the cost of full mitigation exceeds the risk, or because the mitigations available are structurally limited.
Residual risk management principle: An accepted risk is not an ignored risk. Every accepted risk has a named owner, a monitoring mechanism, and a re-evaluation date. If the threat landscape changes (a new attack vector is discovered, a mitigation fails, or the residual impact estimate turns out to be wrong), the risk is escalated back to active mitigation.
| Risk ID | Residual Risk Description | Why Accepted | Monitoring | Owner | Review Date |
|---|---|---|---|---|---|
| RR-01 | Distributor data manipulation (TM-ERP-03): a determined, sophisticated distributor could manipulate ERP data in ways that pass all current anomaly detection | Full prevention requires physical access to the distributor's ERP, which is structurally impossible. Current mitigations make manipulation detectable at scale but not for a single carefully crafted event. | Multi-distributor cross-validation; PSI (Population Stability Index) monitoring; quarterly manual audit of outlier distributor cohorts | Head of Risk & Data Quality | Q3 2026 |
| RR-02 | Training data poisoning (TM-ML-01): slow, coordinated poisoning over multiple training cycles that stays below anomaly detection thresholds is not fully preventable | Perfect training data integrity requires verified repayment outcomes, which depend on NBFC partner systems outside AltScore's control. The mitigations (batch hashing, champion/challenger) reduce but do not eliminate the risk. | Monthly PD calibration check (predicted vs. actual per distributor cohort); annual adversarial data injection test | Head of ML + CISO | Q3 2026 |
| RR-03 | Social engineering of employees — a sophisticated attacker could social-engineer a key employee into taking an action that bypasses technical controls (e.g., approving a fraudulent four-eyes access request) | Social engineering of motivated humans cannot be fully mitigated through technical controls alone. Awareness training reduces but does not eliminate the risk. | Quarterly security awareness training; phishing simulation exercises; anomalous access pattern alerts | CISO | Annual |
| RR-04 | Zero-day vulnerability in cloud infrastructure — an unpatched vulnerability in AWS/Azure infrastructure or managed services could expose data before a patch is available | Zero-days in cloud providers cannot be pre-mitigated. CMK encryption provides defense in depth (cloud provider cannot decrypt data), but a zero-day in the IAM or key management layer could potentially bypass this. | AWS/Azure security bulletins monitored daily; incident response plan covers cloud provider zero-day scenario; multi-cloud architecture considered for Phase 3 | Head of Engineering + CISO | Annual |
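The PSI monitoring listed for RR-01 is not defined in the source. Assuming it refers to the Population Stability Index commonly used in credit scoring, the calculation over binned score counts can be sketched as follows. The `eps` floor for empty bins is an implementation assumption.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e), with a and e as bin shares."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor each share at eps so an empty bin does not produce log(0).
        e_share = max(e / expected_total, eps)
        a_share = max(a / actual_total, eps)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi
```

Identical distributions give a PSI of zero, and larger values indicate that a distributor cohort's score distribution has drifted from its reference period, which is the cue for the manual audit named in the table.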