AltScore — Security Architecture

01 —

Security Design Principles

AltScore processes sensitive commercial transaction data belonging to distributors and retailers. A breach or misuse does not just harm a company — it erodes the trust of the very small businesses the platform exists to serve. Security is therefore a first-class product constraint, not an afterthought.

The security model is built around one foundational assumption: every component can be compromised. Defense is layered so that the compromise of any single layer does not result in a material breach. No single key, credential, or service is a single point of failure.

Principle 01

Least Privilege

Every service, user, and API consumer receives the minimum access required for its function. No shared admin credentials. Role boundaries are enforced at the infrastructure level, not only in application code.

Principle 02

Defense in Depth

Security controls applied at every layer — network, application, data, and identity. Compromise of one layer does not cascade. Layers include: network perimeter, WAF, application auth, field-level encryption, audit logging.

Principle 03

Data Minimization

Only the data necessary to compute credit features is ingested. PII is pseudonymized before entering the feature pipeline. Raw ERP data never enters the ML scoring layer.

Principle 04

Immutability of Raw Data

Bronze-layer raw data is append-only and immutable. Audit lineage requires the ability to reconstruct any score from source. No destructive writes permitted in the raw storage tier.

Principle 05

Explicit Consent as a Security Control

Consent records are cryptographically signed and stored immutably. Processing any retailer's data without a valid consent record is a system-level block, not a policy aspiration.

Principle 06

Assume Breach

Detection and containment capabilities are designed for the scenario where an attacker has already gained some foothold. Segment everything. Alert on lateral movement. Mean Time to Detect (MTTD) is a primary operational metric.

02 —

Authentication & Authorization

Identity Tiers

Actor	Auth Mechanism	MFA Required	Session / Token TTL	Access Scope
NBFC Lender (API)	Signed JWT bearer token (RS256) + client secret	N/A (machine-to-machine)	Access token: 1 hour; refresh: 24 hours	Score API read; own portfolio data only
NBFC Credit Officer (Dashboard)	SSO via SAML 2.0 / OIDC (lender's IdP)	Yes — enforced by lender IdP	Session: 8 hours; idle timeout: 30 min	Own lender portfolio; no cross-lender data
Distributor (Portal)	Email OTP + phone OTP (two-factor)	Yes — OTP is MFA	Session: 4 hours; idle timeout: 20 min	Own retailer network data only
AltScore Internal (Admin)	Hardware security key (FIDO2/WebAuthn) + SSO	Yes — phishing-resistant hardware key mandatory	Session: 4 hours; re-auth for sensitive actions	Role-gated (see RBAC below)
ERP Connector (Service Account)	Mutual TLS (mTLS) client certificate + API key	N/A (service account)	Certificate rotation: 90 days; key rotation: 30 days	Data ingestion only; write to Bronze lake; no read access to other distributors' data
ML Pipeline (Service Account)	IAM role-based (AWS/Azure managed identity)	N/A (platform-managed)	Short-lived credentials: 1 hour TTL	Read Silver; write Gold; no raw Bronze PII access

RBAC Matrix — Internal Roles

Internal Role	Raw ERP Data	Feature Layer	Score Store	Model Artifacts	Consent Records	Audit Logs
Data Engineer	Read (assigned distributor only)	Read/Write	No	No	No	Read
ML Engineer	No	Read	Read	Read/Write	No	Read
Credit Risk Analyst	No	Read (aggregated)	Read	Read	No	Read
Legal / Compliance	No	No	No	No	Read	Read
Security / Audit	No	No	No	No	Read	Read/Write
Engineering Ops	No	No	No	No	No	Read
Super Admin	Restricted + audit	Restricted + audit	Restricted + audit	Restricted + audit	Restricted + audit	Read/Write

Super Admin access to raw data always requires a four-eyes approval (a second admin must approve the access request) and generates an immutable audit entry. There is no unilateral admin access to raw retailer or distributor data.
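
The four-eyes rule can be sketched as a check that refuses any raw-data access request not approved by a second, distinct admin. This is a minimal illustration under stated assumptions, not the platform's implementation: `AccessRequest`, `approve`, and `grant_access` are hypothetical names, and the in-memory list stands in for an immutable audit sink. The audit entry fields follow the "Admin data access" event in the audit table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AccessRequest:
    """Hypothetical Super Admin request for raw data access."""
    requester_id: str
    resource: str
    justification_ticket: str
    approver_id: Optional[str] = None


def approve(request: AccessRequest, approver_id: str) -> None:
    # Four-eyes: the approver must be a different admin than the requester.
    if approver_id == request.requester_id:
        raise PermissionError("requester cannot approve their own request")
    request.approver_id = approver_id


def grant_access(request: AccessRequest, audit_log: list) -> dict:
    # No unilateral access: without a recorded approver, nothing is granted.
    if request.approver_id is None:
        raise PermissionError("second-admin approval is required")
    entry = {
        "admin_user_id": request.requester_id,
        "resource_accessed": request.resource,
        "justification_ticket": request.justification_ticket,
        "approver_id": request.approver_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)  # stand-in for an append-only audit store
    return entry
```

In a real deployment the approval step would itself be authenticated and the audit write would have to succeed before access is released.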

API Authorization Constraints

Lender API Scoping

Each lender API key is bound to a specific lender_id at issuance
Score queries return only retailers that lender has been granted access to via consent
Cross-lender data leakage is prevented at the DB query layer (row-level security), not just application logic
API keys are non-transferable and cannot be delegated

Token Validation Rules

All JWTs validated for: signature (RS256), expiry (exp), issuer (iss), audience (aud)
Revocation list checked on every request — no grace period after revocation
Token claims are immutable — no server-side claim augmentation after issuance
Refresh tokens are one-time use; rotating refresh token scheme
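
The claim checks in the token validation rules could look roughly like the sketch below. It assumes the RS256 signature has already been verified by a JWT library upstream and only covers `exp`, `iss`, `aud`, and revocation. Keying the revocation list by the `jti` claim is an assumption, and `validate_claims` and `TokenRejected` are hypothetical names.

```python
import time
from typing import Collection, Optional


class TokenRejected(Exception):
    """Raised when a token fails any claim check."""


def validate_claims(
    claims: dict,
    *,
    expected_iss: str,
    expected_aud: str,
    revoked_jtis: Collection[str],
    now: Optional[float] = None,
) -> dict:
    """Check exp, iss, aud and revocation on a signature-verified payload."""
    now = time.time() if now is None else now

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise TokenRejected("missing or expired exp")

    if claims.get("iss") != expected_iss:
        raise TokenRejected("unexpected issuer")

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_aud not in audiences:
        raise TokenRejected("unexpected audience")

    # Assumption: revocation is tracked per token identifier (jti).
    jti = claims.get("jti")
    if jti is None or jti in revoked_jtis:
        raise TokenRejected("missing jti or token revoked")

    return claims
```

Because revocation is checked on every call, a revoked token is rejected on its next use with no grace period.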

03 —

Encryption & Key Management

Encryption Standards by Layer

In Transit

TLS 1.3 minimum for all external API calls, ERP connector traffic, and dashboard access. TLS 1.2 permitted only for legacy ERP integrations with documented exception and 6-month migration plan. Certificate pinning on the ERP Connector SDK. HSTS enforced on all web properties with 1-year max-age.

At Rest — Bronze

AES-256-GCM server-side encryption for all Bronze-layer objects in S3/ADLS. Customer-managed encryption keys (CMK) stored in AWS KMS / Azure Key Vault. Separate CMK per distributor — compromise of one distributor's key does not expose others. Key rotation: annually or on suspicion of compromise.

At Rest — Silver/Gold

AES-256 at the storage layer. Feature tables pseudonymized — retailer_id replaced with an opaque UUID mapped in a separate, restricted identity resolution store. Score Store encrypted column-by-column for fields: retailer_id, lender_id, score, PD estimate. Backups encrypted with separate backup CMK.

Field-Level PII

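Application-level envelope encryption for high-sensitivity fields: retailer phone number, GST number, shop address. Each field is encrypted with a data encryption key (DEK), which is itself wrapped by a key encryption key (KEK) held in KMS; the KEK is stored separately from the data and the wrapped DEK. Fields are encrypted before being written to any store, so neither database admins nor the cloud provider can read them in plaintext without the application key.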

Consent Records

Cryptographic signature (ECDSA P-256) on every consent grant and revocation. Consent records stored immutably — no UPDATE or DELETE permitted; all changes are new signed records. Signature verification is part of the consent check on every data processing event.

Key Management Requirements

Key Type	Storage	Rotation Period	Access	Backup
API Signing Key (RS256 private)	HSM-backed KMS	6 months	Auth service only; no human access	Geo-replicated KMS
Distributor CMK (data lake)	AWS KMS / Azure Key Vault	Annual	Automated only; no console access	Cross-region replication
ERP Connector Client Cert	Per-connector secure keystore	90 days	Connector service only	Revocable via CRL
Field-Encryption DEK	Envelope-encrypted with KEK in KMS	Monthly re-wrapping	Application service account only	KEK backup in secondary region
Consent Signing Key (ECDSA)	HSM	Annual	Consent service only	Cold backup — not network-accessible
Dashboard TLS Certificate	Certificate Manager (auto-renew)	90 days (auto)	Load balancer / CDN	Auto-renewed before expiry

04 —

Network Security

Network Segmentation

The AltScore infrastructure is segmented into four isolated network zones. No direct connectivity between zones — all cross-zone traffic routes through an explicit gateway or service mesh with mutual authentication.

Zone	Contents	Internet-Accessible
DMZ	API Gateway, WAF, Load Balancer, CDN	Yes (controlled)
Application	Score API, Consent Service, Dashboard Backend	No
Data	Data Lake, Feature Store, Score DB, Redis Cache	No
ML Compute	Training cluster, MLflow, Model registry	No — egress via proxy only

Perimeter Controls

WAF Rules

OWASP Top 10 rule set enforced at WAF
Rate limiting: 100 req/min per lender API key; 1000 req/min per IP at WAF
Geo-blocking configurable at WAF — default: India only for production
SQL injection, XSS, and path traversal patterns blocked at ingress
Bot detection: minimum CAPTCHA score threshold for dashboard login

DDoS Protection

Cloud-native DDoS shield (AWS Shield Advanced / Azure DDoS Standard)
Volumetric attack mitigation at network layer — upstream from application
API-layer throttling independent of network-layer protection
Automatic traffic scrubbing for sustained attack patterns

ERP Connector Network Requirements

The ERP Connector SDK operates inside distributor on-premises or cloud environments. Network security must be enforced for both the outbound connection from the distributor and the inbound data ingestion endpoint at AltScore.

Requirement	Specification	Rationale
Outbound-only connection	Connector initiates HTTPS; AltScore never initiates inbound connection to distributor network	No VPN or firewall hole required in distributor environment
IP allowlisting	AltScore publishes a static IP range for ingestion endpoints; distributors can allowlist	Reduces phishing/man-in-the-middle risk
mTLS on ingestion endpoint	Client certificate required; self-signed certs not accepted	Prevents unauthenticated data injection
Payload size limit	Max 50MB per batch; connector enforces chunking above this threshold	Prevents resource exhaustion on ingestion endpoint
Retry with backoff	Exponential backoff; max 5 retries; failure logged and alerted	Prevents thundering-herd on transient network failures
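
The retry requirement can be illustrated with a small sketch: one initial attempt, then at most five retries with exponentially growing delays. The base delay, the cap, and the use of random jitter are assumptions added here (jitter is a common way to avoid synchronized retries); `send_with_retry` and `backoff_delays` are hypothetical names, not SDK functions.

```python
import random
import time
from typing import Callable, List, Optional

MAX_RETRIES = 5  # from the requirement above


def backoff_delays(
    base_seconds: float = 1.0,
    cap_seconds: float = 60.0,
    retries: int = MAX_RETRIES,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one jittered delay per retry; the ceiling doubles each time."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))  # random delay up to the ceiling
    return delays


def send_with_retry(
    send: Callable[[bytes], object],
    batch: bytes,
    sleep: Callable[[float], None] = time.sleep,
    on_failure: Optional[Callable[[bytes, Exception], None]] = None,
):
    """One initial attempt plus up to MAX_RETRIES retries on ConnectionError."""
    delays = backoff_delays()
    for attempt in range(MAX_RETRIES + 1):
        try:
            return send(batch)
        except ConnectionError as exc:
            if attempt == MAX_RETRIES:
                if on_failure is not None:
                    on_failure(batch, exc)  # hook for "failure logged and alerted"
                raise
            sleep(delays[attempt])
```

Passing `sleep` and `on_failure` as parameters keeps the sketch testable; a connector would wire them to its scheduler and alerting channel.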

05 —

Data Access Controls

AltScore holds sensitive commercial data from multiple distributors and retailers. Strict logical isolation between data owners is a non-negotiable requirement — not just a policy, but an engineering constraint enforced at the database level.

Data Isolation Architecture

Distributor Isolation

Each distributor's raw data partitioned in a separate S3 prefix / ADLS container with its own CMK
Service accounts for each distributor's connector have write access only to their own partition
Bronze-layer queries require explicit distributor_id scope — no wildcard reads
Cross-distributor data joins only permitted in the pseudonymized Silver layer, after identity resolution

Lender Isolation

Row-level security (RLS) enforced in Score Store PostgreSQL — every query is automatically scoped to the requesting lender_id
Lender A cannot enumerate, infer, or access retailer scores belonging to Lender B
API responses never include competitor lender identifiers or exposure amounts
Lender dashboard data is server-side rendered with lender scope — no client-side filtering

Retailer PII Pseudonymization

GST number and phone pseudonymized to UUID before entering Silver/Gold layers
Identity resolution store (UUID ↔ GST/phone) is physically separate, access-restricted
ML pipeline never accesses the identity resolution store — operates entirely on UUIDs
Score API re-maps UUID → retailer identity only at response time, via a separate lookup service
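
A minimal sketch of the pseudonymization step, assuming an in-memory dictionary in place of the physically separate identity resolution store. `IdentityResolutionStore`, `to_silver`, and the record field names are hypothetical. The UUID is random rather than derived from the GST number, so it cannot be reversed without access to the store.

```python
import uuid


class IdentityResolutionStore:
    """In-memory stand-in for the separate, access-restricted mapping store."""

    def __init__(self) -> None:
        self._by_identity = {}
        self._by_uuid = {}

    def pseudonymize(self, gst_number: str) -> str:
        key = gst_number.strip().upper()
        if key not in self._by_identity:
            token = str(uuid.uuid4())  # random, not derivable from the GST number
            self._by_identity[key] = token
            self._by_uuid[token] = key
        return self._by_identity[key]

    def resolve(self, token: str) -> str:
        # Only the response-time lookup service should be able to call this.
        return self._by_uuid[token]


def to_silver(record: dict, store: IdentityResolutionStore) -> dict:
    """Drop direct identifiers and attach the opaque UUID instead."""
    silver = {k: v for k, v in record.items() if k not in ("gst_number", "phone")}
    silver["retailer_uuid"] = store.pseudonymize(record["gst_number"])
    return silver
```

Downstream feature and scoring jobs would receive only the output of `to_silver` and never a handle to the store.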

No Cross-Contamination Rules

Feature computation jobs run in isolated compute environments per distributor batch
Shared infrastructure (Spark, Airflow) uses resource tagging to prevent job-level data mixing
Score Store views enforce RLS; direct table access requires separate privileged role with audit
Redis cache keys namespaced by lender_id; no shared cache keys across lenders

Sensitive Data Handling Rules

Data Type	Classification	Masking in Logs	Masking in Dashboard UI	Permitted Export
Retailer GST Number	Highly Sensitive	Last 4 chars only	Partial (XXXX...R1ZX)	No — API response only
Retailer Phone	Highly Sensitive	Fully masked	Last 4 digits only	No
Invoice Values	Sensitive	Aggregated only	Range bands, not exact	Aggregated reports only
AltScore Value	Sensitive	Logged (score only)	Shown in full to authorized lender	Lender's own portfolio — yes
Probability of Default	Sensitive	Logged (value only)	Shown to authorized lender	Lender's own portfolio — yes
Reason Codes	Internal	Logged	Shown to lender + retailer (translated)	Yes — per consent scope
Model Weights/Features	Trade Secret	Never logged	Never shown	No
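
The log and dashboard masking rules for GST numbers and phone numbers could be expressed as small helper functions. This is a sketch: the function names are hypothetical, the mask character is an arbitrary choice, and the sample values in the usage note are made-up illustrations.

```python
def mask_gst_for_log(gst_number: str) -> str:
    """Logs keep only the last 4 characters of a GST number."""
    if len(gst_number) <= 4:
        return "*" * len(gst_number)  # too short to reveal anything safely
    return "*" * (len(gst_number) - 4) + gst_number[-4:]


def mask_phone_for_log(phone: str) -> str:
    """Logs never contain any part of a phone number."""
    return "**********"  # fixed width, so the original length is not revealed


def mask_phone_for_ui(phone: str) -> str:
    """The dashboard shows only the last 4 digits of a phone number."""
    digits = [c for c in phone if c.isdigit()]
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])
```

For example, `mask_gst_for_log("27AAPFU0939F1ZV")` keeps only `F1ZV` visible, and `mask_phone_for_ui("+91 98765 43210")` keeps only `3210`.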

06 —

ERP Connector Security

The ERP Connector SDK is deployed inside distributor environments — a context AltScore does not control. The connector is the highest-risk ingestion surface and must be designed to be secure even when the host environment is compromised.

Connector Design Constraints

Read-Only Access to ERP

SDK requests only SELECT permissions to relevant ERP tables — never INSERT, UPDATE, DELETE
ERP database credentials stored in the SDK's encrypted local keystore, not in plaintext config files
Schema is documented and reviewed; any deviation from expected schema triggers a quarantine and alert
SQL queries are parameterized — no dynamic query construction from ERP data fields

Tamper-Resistance

Signed Payload Chain of Custody

Every data batch signed by connector before transmission (HMAC-SHA256 using per-connector key)
Signature verified at ingestion endpoint before data is written to Bronze layer
Unsigned or invalid-signature batches are rejected and the distributor is notified
Connector binary is code-signed; SDK updates delivered via signed package only
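
Batch signing and verification with HMAC-SHA256 can be sketched with Python's standard `hmac` and `hashlib` modules. The canonical JSON serialization and the function names are assumptions; the source specifies only the algorithm and the per-connector key.

```python
import hashlib
import hmac
import json


def canonical_bytes(batch: dict) -> bytes:
    """Serialize deterministically so both sides hash identical bytes."""
    return json.dumps(batch, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_batch(batch: dict, connector_key: bytes) -> str:
    """Connector side: HMAC-SHA256 over the canonical batch bytes."""
    return hmac.new(connector_key, canonical_bytes(batch), hashlib.sha256).hexdigest()


def verify_batch(batch: dict, signature: str, connector_key: bytes) -> bool:
    """Ingestion side: recompute and compare in constant time."""
    expected = sign_batch(batch, connector_key)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

An ingestion endpoint would call `verify_batch` before any write to the Bronze layer and reject the batch when it returns `False`.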

Data Extraction Scope Limits

ERP Table / Module	Fields Extracted	Fields Excluded (Always)
Sales Ledger	Invoice date, invoice amount, retailer ID, payment date, payment amount	Bank account numbers, internal distributor margin, credit notes to third parties
Item / SKU Master	SKU category code, order quantity, unit price band	Distributor cost price, supplier contracts, internal SKU profitability
Retailer Master	Retailer GST, shop name, locality, district, active status	Personal data beyond business identity, owner's personal income/assets
Payment Terms	Credit days agreed, overdue flag, payment method type (UPI/cash/cheque)	Cheque numbers, bank account details, personal guarantor data

The data extraction scope is contractually fixed in the distributor data partnership agreement. Any modification to the schema extraction list requires a new agreement version and explicit distributor consent. The SDK enforces scope technically — schema drift triggers a halt-and-alert, not a silent expansion of data collection.
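
One way to enforce the extraction scope technically is an explicit field allowlist per table that halts on any deviation. The sketch below covers two of the four tables; the snake_case table and column names are assumed stand-ins for the fields listed above, and `SchemaDriftError` and `enforce_scope` are hypothetical names.

```python
from typing import Dict, Set

# Assumption: snake_case names standing in for the fields in the scope table.
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "sales_ledger": {
        "invoice_date", "invoice_amount", "retailer_id",
        "payment_date", "payment_amount",
    },
    "payment_terms": {"credit_days", "overdue_flag", "payment_method"},
}


class SchemaDriftError(Exception):
    """Raised to halt extraction when the observed schema deviates."""


def enforce_scope(table: str, row: dict) -> dict:
    allowed = ALLOWED_FIELDS.get(table)
    if allowed is None:
        raise SchemaDriftError(f"table not in extraction scope: {table}")
    unexpected = set(row) - allowed
    missing = allowed - set(row)
    if unexpected or missing:
        # Halt and alert instead of silently widening or narrowing the scope.
        raise SchemaDriftError(
            f"{table}: unexpected={sorted(unexpected)} missing={sorted(missing)}"
        )
    return dict(row)
```

Raising on unexpected columns, rather than dropping them quietly, matches the halt-and-alert behavior described above.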

07 —

ML Pipeline Security

The ML scoring pipeline introduces attack surfaces that are distinct from traditional software — adversarial inputs, model poisoning, training data leakage, and score inversion attacks. Each is addressed as a security constraint, not just a model quality concern.

Threat: Training Data Poisoning

Data Provenance Enforcement

Every training data batch carries a cryptographic hash. The training pipeline recomputes and validates each hash before use. Any batch whose computed hash does not match the recorded value is rejected and an alert is raised. NBFC partner data is ingested via a separate authenticated channel and stored in an isolated training store.
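
The provenance check can be sketched as recomputing each batch's digest and comparing it with the recorded value. SHA-256 and the manifest structure (batch ID mapped to hex digest) are assumptions, since the source does not name the hash function used for training batches; `validate_batches` is a hypothetical name.

```python
import hashlib
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_batches(
    batches: Dict[str, bytes],
    manifest: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split batch IDs into (accepted, rejected) by comparing digests."""
    accepted, rejected = [], []
    for batch_id, payload in batches.items():
        recorded = manifest.get(batch_id)
        # A batch with no recorded hash is treated the same as a mismatch.
        if recorded is not None and sha256_hex(payload) == recorded.lower():
            accepted.append(batch_id)
        else:
            rejected.append(batch_id)
    return accepted, rejected
```

A training job would proceed only with the accepted IDs and raise an alert for every rejected one.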

Threat: Model Inversion / Leakage

Output Minimization

API response exposes score, risk band, credit limit, and reason codes — not raw feature values. Reason codes are categorical (e.g., "payment_delay_p90_gt_15days") — not the exact numeric value that would allow reconstructing the feature. Model weights are never exposed via any interface.

Threat: Adversarial Score Manipulation

Input Validation & Anomaly Detection

Score requests validated for: valid retailer_id format, authorized lender scope, and request rate per retailer (max 3 score queries per retailer per 24h per lender). Isolation Forest anomaly signal is computed before scoring — flagged retailers receive a "review" disposition, not a raw score.
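
The per-retailer query limit (at most 3 score queries per retailer per 24 hours per lender) can be sketched as a sliding-window counter. `ScoreQueryQuota` is a hypothetical name, and the in-memory dictionary is a stand-in for whatever shared store a real deployment would use.

```python
from collections import defaultdict, deque


class ScoreQueryQuota:
    """Sliding-window limit on score queries per (lender, retailer) pair."""

    def __init__(self, max_queries: int = 3, window_seconds: float = 24 * 3600) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, lender_id: str, retailer_id: str, now: float) -> bool:
        hits = self._hits[(lender_id, retailer_id)]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_queries:
            return False
        hits.append(now)
        return True
```

Keying the counter by the lender and retailer pair keeps one lender's queries from consuming another lender's quota for the same retailer.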

Model Artifact Security

Artifact	Storage	Access Control	Integrity Check
Trained model binary (XGBoost)	MLflow artifact store (encrypted S3)	ML Ops role only; no developer direct access	SHA-256 hash logged at training; verified at load time
SHAP explainer object	Same as model binary, versioned together	Same as model binary	Hash validated before serving
Feature mean/std scalers	Versioned alongside model in MLflow	Same as model binary	Hash validated; scaler version must match model version
Training dataset snapshot	Encrypted S3, isolated training partition	ML Engineer + Security Audit only	Immutable — S3 Object Lock (WORM)
Model performance logs	CloudWatch / Azure Monitor	Read: ML Ops, Security; Write: pipeline only	Append-only log group; no log deletion permitted

08 —

Audit Logging & Monitoring

Every access to sensitive data, every score issued, and every consent event must be traceable to an identity and a timestamp. Audit logs are the forensic backbone of the platform and must be tamper-resistant.

Mandatory Audit Events

Event	Fields Logged	Retention	Alert Trigger
Score API call	lender_id, retailer_id (UUID), timestamp, score_version, response_band, latency_ms, request_ip	7 years	Rate > 100 req/min per lender
Consent grant	retailer_id, distributor_id, timestamp, consent_scope, signature, channel (WhatsApp/web)	7 years	None (normal event)
Consent revocation	retailer_id, revocation_timestamp, requested_by, reason_code	7 years	Alert to data ops team within 1 hour
ERP data batch ingested	distributor_id, batch_id, record_count, data_hash, ingestion_timestamp, schema_version	7 years	Schema mismatch, hash failure
Admin data access	admin_user_id, resource_accessed, justification_ticket, approver_id, timestamp	7 years	All admin accesses alert to CISO
Failed authentication	user/lender_id, IP, timestamp, failure_reason	2 years	5 failures in 10 min → account lock + alert
Model version promotion	model_version, promoted_by, champion_challenger_metrics, timestamp	7 years	All promotions alert to Risk team
Grievance filed	retailer_id, lender_id, score_queried, grievance_type, timestamp, resolution_status	7 years	SLA breach (7 days) → escalation alert

Log Integrity & SIEM

Log Tamper-Resistance

Audit logs written to append-only log groups (CloudWatch Logs with no-delete policy / Azure Immutable Blob Storage)
Log events hashed and chained — each log record includes hash of previous record (blockchain-style integrity)
Log export to cold storage (S3 Glacier / Azure Archive) after 90 days; deletion requires dual-admin approval
Log integrity verified weekly via automated hash chain check
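
The hash chain and its periodic verification can be sketched as follows. Each record stores the hash of the previous record, so editing or removing any earlier record breaks every later link. SHA-256, the all-zero genesis value, and the record layout are assumptions; `append_record` and `verify_chain` are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first record's predecessor


def _digest(body: dict) -> str:
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {"event": event, "prev_hash": prev_hash}
    record = dict(body, hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev_hash = GENESIS_HASH
    for record in chain:
        body = {"event": record["event"], "prev_hash": record["prev_hash"]}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

A scheduled job could run `verify_chain` over the exported log and raise an alert when it returns `False`.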

SIEM & Alerting

All audit logs fed to SIEM (e.g., Splunk / AWS Security Hub / Microsoft Sentinel)
Correlation rules: unusual access hours, geographic anomalies, sudden score volume spikes
P1 alerts: paged to on-call security engineer within 5 minutes
Weekly automated threat report generated from SIEM for CISO review

09 —

Regulatory Compliance Requirements

Standard / Regulation	Applicable Requirement	Security Control	Status
DPDP Act 2023	Consent, data minimization, purpose limitation, right to erasure, breach notification (72 hours)	Consent service with cryptographic records; pseudonymization pipeline; deletion workflow; breach runbook	Design complete — implementation in progress
RBI FLDG Guidelines	Explainability of credit decisions; audit trail for automated decisions	SHAP reason codes on every score; immutable audit log of all score decisions	Implemented in model design
PCI-DSS	No storage, processing, or transmission of payment card data	ERP extraction scope explicitly excludes card data; enforced at SDK level	By design — no card data in scope
SOC 2 Type II	Security, Availability, Confidentiality trust service criteria	12-month audit program; continuous monitoring; vendor assessment program	Target: Month 12 — audit readiness program begins Month 1
ISO 27001	Information security management system (ISMS)	ISMS policy suite; risk register; vendor management; business continuity plan	Target: Month 18 — gap assessment Month 3
RBI Account Aggregator Framework	Consent-based financial data sharing; AA licensing	AA integration for supplementary bureau data in v2; legal review of AA licensing requirements	v2 roadmap — legal review Month 4

Penetration Testing Schedule

Test Type	Scope	Frequency	Performed By
External Network Pen Test	All internet-facing endpoints (API, dashboard, connector ingestion)	Annual + before each major release	Accredited third-party firm
Web Application Pen Test	Dashboard, consent portal, API endpoints	Annual + after major feature changes	Accredited third-party firm
ERP Connector Security Review	SDK binary, client certificate handling, data extraction scope	Before each SDK major version release	Internal security + third-party code review
ML Adversarial Testing	Score API input fuzzing; model inversion attempts; gaming simulation	Quarterly	Internal ML security team
Red Team Exercise	Full scope, including insider threat simulation and social engineering	Annual	External red team firm

10 —

Incident Response Plan

A documented, practiced incident response plan is a SOC 2 requirement and a DPDP Act obligation. The plan is reviewed and rehearsed quarterly via tabletop exercises.

Severity Classification

Severity	Definition	Response Time	Notification
P1 — Critical	Confirmed data breach; unauthorized access to PII or score data; ransomware; full platform outage	15 min to acknowledge; 1 hour to contain	CISO immediately; legal within 1 hour; regulators within 72 hours (DPDP); affected data owners notified
P2 — High	Suspected breach (under investigation); auth system down; model serving down; consent system down	30 min to acknowledge; 4 hours to resolve or escalate	CISO + Engineering Lead; affected lenders if score API down >30 min
P3 — Medium	Anomalous access pattern detected (no confirmed breach); single component degraded; failed pen test finding	2 hours to acknowledge; 24 hours to remediate	Security team; Engineering lead
P4 — Low	Policy violation; non-critical vulnerability; audit finding	Next business day	Security team internal

Breach Response Workflow

Contain (0–1 hour)

Isolate affected systems — network segment block at firewall
Revoke all active API tokens and sessions immediately
Preserve forensic state — snapshot affected instances before remediation
Activate war room — CISO, Engineering Lead, Legal on bridge
Disable data ingestion and score serving if breach scope unclear

Notify (1–72 hours)

Legal counsel determines regulatory notification scope
DPDP Act: notify Data Protection Board within 72 hours of confirmed breach
Notify affected distributors and retailers per breach notification template (pre-approved by legal)
Notify affected NBFC lenders if their data is in scope
Publish incident status page update every 2 hours during active incident

Recover (72 hours – 7 days)

Deploy patched or clean-state infrastructure from IaC (no manual config)
Re-issue all affected credentials — API keys, certificates, session tokens
Verify data integrity via hash chain audit before resuming operations
Staged traffic restoration with enhanced monitoring

Learn (7–30 days)

Post-incident review — root cause, detection gap, response time analysis
Action items with owners and deadlines tracked in security risk register
Update threat model and relevant security controls
Board-level incident report if P1; share summary with NBFC partners