AltScore is a data pipeline and ML scoring platform. Its functional architecture separates concerns cleanly across ingestion, transformation, inference, and delivery — enabling each layer to evolve independently while preserving strong data lineage and consent enforcement at every boundary.
FA-ALTSCORE-001 · Architecture Charter
The system is structured around a medallion data lake (Bronze → Silver → Gold) as the single source of truth, with event-driven side-effects for notifications and audit. All cross-service communication uses well-defined contracts; no service reaches directly into another service's data store.
No data moves between layers or services without a consent check. The Consent Service is the authority; it is never bypassed.
PII (GST, phone) is replaced with stable UUIDs at the Bronze→Silver boundary. Downstream services never see raw identifiers.
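The pseudonymization step can be sketched as a keyed, deterministic mapping from a raw identifier to a UUID. This is a minimal illustration, not the pipeline's actual method: the HMAC-SHA256 construction, the `PSEUDONYM_KEY` secret, and the field names `gst`, `phone`, `retailer_uuid`, and `phone_uuid` are all assumptions made for the example.

```python
import hashlib
import hmac
import uuid

# Hypothetical secret; a real deployment would load this from a secrets manager.
PSEUDONYM_KEY = b"example-key-not-for-production"


def pseudonymize(identifier: str, kind: str, key: bytes = PSEUDONYM_KEY) -> uuid.UUID:
    """Map a raw identifier (e.g. a GST number or phone) to a stable UUID."""
    normalized = identifier.strip().upper()
    # Prefix with the identifier kind so a GST and a phone never share a UUID.
    message = f"{kind}:{normalized}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # The first 16 bytes of the keyed digest become the UUID value.
    return uuid.UUID(bytes=digest[:16])


def to_silver(record: dict) -> dict:
    """Return a copy of a Bronze record with raw PII replaced by UUID strings."""
    silver = {k: v for k, v in record.items() if k not in ("gst", "phone")}
    silver["retailer_uuid"] = str(pseudonymize(record["gst"], "gst"))
    silver["phone_uuid"] = str(pseudonymize(record["phone"], "phone"))
    return silver
```

Because the mapping is deterministic for a given key, the same retailer resolves to the same UUID on every nightly run, which is what "stable" requires. The key makes it harder to rebuild the mapping by hashing every possible phone number.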
Every score request, consent event, and data access is written to an append-only audit log. Nothing is deleted or updated in the audit store.
On any service failure, the system returns a safe response (block, not allow). No score is served if consent cannot be verified.
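The fail-safe rule amounts to treating anything other than a positive confirmation as a block. A minimal sketch, assuming a hypothetical `check_consent` callable that stands in for the real Consent Service client:

```python
from typing import Callable


def is_request_allowed(
    check_consent: Callable[[str, str], bool],
    retailer_id: str,
    lender_id: str,
) -> bool:
    """Return True only when consent is positively confirmed."""
    try:
        # Only an explicit True allows the request; None or other values block.
        return check_consent(retailer_id, lender_id) is True
    except Exception:
        # Timeouts, connection errors, and malformed responses all block.
        return False
```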
All API-tier services are stateless. State lives in PostgreSQL (transactional), Redis (cache), or the data lake (analytical). Services scale horizontally.
All inter-service interfaces are defined by OpenAPI schemas or Avro schemas (Kafka topics). No untyped JSON payloads cross service boundaries.
AltScore decomposes into eight bounded domains, each owned by a specific team. Domains communicate via well-defined APIs or Kafka topics — never by shared database tables.
| Domain | Responsibility | Owning Team | Primary Storage | Key Consumers |
|---|---|---|---|---|
| ERP Ingestion | Pull structured data from Tally Prime / SAP B1 / CSV via connector SDK. Validate, quarantine, write to Bronze layer. | Data Engineering | S3/ADLS Bronze bucket | Feature Pipeline |
| Consent | Manage retailer consent lifecycle (grant, revoke, scope). Single source of truth for all consent decisions. | Product Engineering | PostgreSQL (consent_records) | Score API, Feature Pipeline, Notifications |
| Feature Pipeline | Bronze→Silver pseudonymization, Silver feature engineering, Gold score features. Managed by Airflow + Spark + dbt. | Data Engineering / ML | S3/ADLS Silver + Gold | Scoring Engine |
| Scoring Engine | Load features, run XGBoost inference, compute SHAP codes, apply guardrails, write score to Gold store. | ML Engineering | PostgreSQL (scores), Redis (cache) | Score API |
| Score API | External-facing REST API for NBFC lenders. Auth, consent check, rate limiting, score retrieval, reason code delivery. | API Engineering | Redis (cache), PostgreSQL (read) | NBFC Partners |
| Audit & Compliance | Append-only event logging for all score requests, consent changes, data access, admin actions. 7-year retention. | Security / Platform | S3 WORM + PostgreSQL (audit_log) | Legal, DPO, SIEM |
| Notifications | WhatsApp consent flows, score alerts to retailers, grievance acknowledgement. Stateless send-and-forget. | Product Engineering | None (stateless) + Kafka (outbox) | Retailers, Distributors |
| Admin & Operations | Internal dashboard for distributor onboarding, model promotion, grievance management, compliance reports. | Ops / Engineering | PostgreSQL (admin_db) | Internal Staff |
The Score API Service is the only external-facing service that NBFC lenders call. It validates the JWT, checks consent scope, enforces rate limits, retrieves the score from cache or triggers synchronous scoring, then returns the AltScore response.
The Consent Service is the single source of truth for all retailer consent decisions; no other service stores consent state. It signs every consent record with ECDSA P-256 and persists it immutably. It exposes a synchronous consent-check endpoint consumed by the Score API and the Scoring Engine.
The ERP Ingestion Service receives structured ERP data from the connector SDK deployed at distributor sites. It validates the schema, checks the payload HMAC, runs completeness scoring, quarantines malformed records, and writes clean records to the Bronze layer.
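The validate-and-quarantine step can be sketched as below. The required field names, the definition of completeness as the fraction of required fields present, and the `min_completeness` threshold are assumptions for illustration; the source does not define how completeness is scored.

```python
# Hypothetical required fields; the real ERP schema is defined elsewhere.
REQUIRED_FIELDS = ("invoice_id", "retailer_gst", "amount", "invoice_date")


def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, ""))
    return present / len(REQUIRED_FIELDS)


def split_batch(records: list, min_completeness: float = 1.0):
    """Split a batch into (clean, quarantined) without failing the whole batch."""
    clean, quarantined = [], []
    for record in records:
        # Non-dict payloads are malformed and go straight to quarantine.
        if not isinstance(record, dict) or completeness(record) < min_completeness:
            quarantined.append(record)
        else:
            clean.append(record)
    return clean, quarantined
```

Processing continues past bad records, so one malformed invoice does not block the rest of a distributor's batch.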
The Feature Pipeline is orchestrated by Airflow. It runs the Bronze → Silver (pseudonymization + normalization) and Silver → Gold (feature engineering) transformations on a nightly schedule or when triggered by ingestion events.
The Scoring Engine is triggered by pipeline.gold.ready events. It loads retailer features from the Gold layer, routes to the correct model (primary XGBoost or Croston's), applies the Isolation Forest anomaly check, computes SHAP reason codes, runs guardrails GR-1 through GR-6, and writes the final score to the Gold store.
The Notification Service consumes Kafka events and sends WhatsApp messages to retailers for consent flows, score alerts, and grievance responses. It uses the outbox pattern to guarantee at-least-once delivery and is otherwise stateless: no notifications are stored long-term.
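The outbox pattern mentioned above can be sketched with a small durable queue. In this illustration an in-process SQLite table stands in for the outbox store, and `send` is a hypothetical callable wrapping the WhatsApp client; neither is taken from the source.

```python
import json
import sqlite3


def init_outbox(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def enqueue(conn: sqlite3.Connection, message: dict) -> None:
    """Persist the message before any delivery attempt is made."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(message),))


def drain(conn: sqlite3.Connection, send) -> int:
    """Try to deliver unsent messages in order; return how many were delivered."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, payload in rows:
        try:
            send(json.loads(payload))
        except Exception:
            break  # leave this row and later rows for the next drain
        # Mark as sent only after a successful send. A crash between send and
        # this update causes a resend, which is the at-least-once guarantee.
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

Because a message can be delivered twice, the receiving side needs to tolerate duplicates.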
The Audit Service receives audit events from all other services via Kafka. It writes append-only records to PostgreSQL audit_log, replicates them to S3 WORM for 7-year retention, and exposes a read-only query API for legal/compliance access.
The Admin Service is an internal-only service for distributor onboarding, NBFC partner management, the model promotion workflow, grievance case management, and compliance reporting. It requires a FIDO2 hardware key + SSO for all access.
Distributor's Tally Prime or SAP B1 connector SDK calls ERP Ingestion Service (POST /v1/ingest/batch) over mTLS. Payload is HMAC-signed with distributor's API key. Service validates schema, runs completeness scoring, deduplicates on invoice_id, writes clean records to Bronze S3 prefix.
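Two of the checks in this step, payload signature verification and deduplication on invoice_id, might look like the sketch below. The source says only that the payload is HMAC-signed; the choice of SHA-256 and a hex-encoded signature are assumptions, as are the function names.

```python
import hashlib
import hmac


def verify_payload(body: bytes, signature_hex: str, api_key: bytes) -> bool:
    """Check an HMAC signature over the raw request body (SHA-256 assumed)."""
    expected = hmac.new(api_key, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def dedupe_by_invoice(records: list) -> list:
    """Keep the first record seen for each invoice_id, preserving order."""
    seen = set()
    unique = []
    for record in records:
        invoice_id = record["invoice_id"]
        if invoice_id in seen:
            continue
        seen.add(invoice_id)
        unique.append(record)
    return unique
```

The signature is computed over the exact bytes received, before any JSON parsing, so re-serialization differences cannot invalidate a legitimate payload.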
ERP Ingestion Service emits ingestion.batch.complete to Kafka topic with distributor_id, batch_id, record_count, completeness_score. Airflow DAG subscribes via sensor.
Airflow triggers Bronze→Silver DAG: pseudonymizes GST/phone to UUIDs, normalizes schema across ERP types, applies outlier caps. Then triggers Silver→Gold DAG: computes all 40+ features via dbt + Spark. Lineage recorded in dbt docs.
Before scoring a retailer, Scoring Engine calls Consent Service to verify active consent. If no consent or revoked: score is not computed; score.blocked event emitted; retailer flagged for consent outreach.
Scoring Engine loads features from Gold layer, routes to XGBoost or cold-start fallback, runs Isolation Forest anomaly check (GR-2), computes SHAP reason codes (GR-3), applies affordability ceiling (GR-4), validates data quality score (GR-5). Score written to PostgreSQL + Redis cache.
score.computed event emitted to Kafka. Score API Service cache is invalidated and populated with fresh score (TTL 4 hours). NBFC lender's next API call returns fresh result within p95 < 2s.
The NBFC lender calls POST /v1/score with a JWT RS256 bearer token and retailer_id. The API Gateway validates the JWT signature and enforces the rate limit (100 req/min), then forwards the request to the Score API Service.
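The 100 req/min limit could be enforced in several ways; the source does not say which algorithm the gateway uses. The sketch below assumes a simple fixed-window counter keyed by a hypothetical `client_id`, with an injectable clock so it can be tested.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int = 100, window_seconds: int = 60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # (client_id, window_index) -> request count. Old windows are never
        # evicted in this sketch; a real limiter would expire them.
        self._counts = defaultdict(int)

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        key = (client_id, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

A fixed window permits short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.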
Score API Service calls Consent Service: does the requesting lender appear in the retailer's active consent scope? If not → 403 Forbidden. Consent check result cached per (retailer_id, lender_id) for 60 seconds.
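The 60-second caller-side consent cache can be sketched as a small TTL map keyed by (retailer_id, lender_id). The class name, the `fetch` callable standing in for the Consent Service client, and the `invalidate` hook are hypothetical.

```python
import time


class ConsentCache:
    """Cache consent decisions per (retailer_id, lender_id) for a short TTL."""

    def __init__(self, fetch, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (retailer_id, lender_id) -> (allowed, stored_at)

    def is_allowed(self, retailer_id: str, lender_id: str) -> bool:
        key = (retailer_id, lender_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]
        # Errors from fetch propagate and are never cached, so a failed
        # check cannot be mistaken for a stored decision.
        allowed = bool(self._fetch(retailer_id, lender_id))
        self._entries[key] = (allowed, now)
        return allowed

    def invalidate(self, retailer_id: str) -> None:
        """Drop all cached decisions for a retailer, e.g. on consent.revoked."""
        for key in [k for k in self._entries if k[0] == retailer_id]:
            del self._entries[key]
```

The TTL bounds how long a revoked consent can still be honoured from cache. Invalidating on the consent.revoked event narrows that window further.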
The Score API Service checks the Redis cache for retailer_id. On a cache hit it returns immediately (p50 <100ms). On a cache miss it queries the PostgreSQL score store (p95 <500ms). If the score is older than 24h, it is returned with a data_freshness_days warning. If no score exists, the service returns 404.
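The lookup order can be sketched as a single function. Plain dictionaries stand in for Redis and PostgreSQL here, and the response field names other than data_freshness_days (including the `score_not_found` error code) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def lookup_score(retailer_id: str, cache: dict, store: dict, now: datetime):
    """Return (http_status, body) following cache -> store -> 404."""
    cached = cache.get(retailer_id)
    if cached is not None:
        return 200, dict(cached)  # cache hit: return immediately

    row = store.get(retailer_id)
    if row is None:
        return 404, {"error": "score_not_found"}

    body = {"retailer_id": retailer_id, "score": row["score"]}
    age = now - row["computed_at"]
    if age > timedelta(hours=24):
        # Stale scores are still served, but flagged with their age in days.
        body["data_freshness_days"] = age.days
    return 200, body
```

The sketch applies the staleness check only on the store path, on the assumption that cached entries expire well within 24 hours.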
Every Score API call — hit, miss, or blocked — emits score_requested audit event to Kafka → Audit Service (async, non-blocking). Audit write failure does not block API response but triggers alerting.
Distributor sends consent invite from dashboard. Admin Service calls Notification Service: send WhatsApp invite to retailer's phone with distributor name, purpose, and rights summary.
Retailer responds "HAAN" (Yes) to WhatsApp message. WhatsApp webhook delivers message to Webhook Handler. Handler parses retailer_id, validates HMAC, calls Consent Service POST /v1/consent/grant.
Consent Service creates ECDSA P-256 signed consent record with retailer_id (UUID), purpose, lender_scope[], expiry, timestamp. Emits consent.granted Kafka event. Notification Service sends confirmation WhatsApp to retailer.
| From | To | Protocol | Contract | Auth | Notes |
|---|---|---|---|---|---|
| NBFC Lender | Score API Service | HTTPS REST | OpenAPI 3.1 (API-ALTSCORE-001) | JWT RS256 | Rate-limited; audit on every call |
| ERP Connector SDK | ERP Ingestion Service | HTTPS REST | OpenAPI 3.1 (ERP schema) | mTLS + API key | HMAC payload signing required |
| WhatsApp Webhook | Webhook Handler | HTTPS POST | Meta Webhook format | HMAC-SHA256 | Idempotent on message_id |
| Score API Service | Consent Service | Internal HTTPS | OpenAPI (internal) | mTLS service identity | 60s consent cache on caller side |
| Scoring Engine | Consent Service | Internal HTTPS | OpenAPI (internal) | mTLS service identity | Blocks scoring if no consent |
| ERP Ingestion | Feature Pipeline | Kafka | Avro: ingestion.batch.complete | SASL/SCRAM | Airflow sensor consumes |
| Feature Pipeline | Scoring Engine | Kafka | Avro: pipeline.gold.ready | SASL/SCRAM | Triggers batch scoring run |
| All services | Audit Service | Kafka | Avro: audit_event schema | SASL/SCRAM | Fire-and-forget; DLQ on failure |
| Consent Service | Notification Service | Kafka | Avro: consent.granted / revoked | SASL/SCRAM | Triggers WhatsApp confirmation |
| Scoring Engine | Notification Service | Kafka | Avro: score.computed | SASL/SCRAM | Optional retailer score-ready alert |
| Admin Service | All Services | Internal HTTPS | OpenAPI (internal admin) | FIDO2 + SSO token | All admin actions audit-logged |
All asynchronous communication uses Kafka with Avro-schema topics. Every event schema is versioned in the schema registry. Consumers must handle at-least-once delivery via idempotent processing.
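Idempotent processing under at-least-once delivery usually means remembering which events have already been handled. A minimal sketch, assuming each event carries a unique `event_id` field (a hypothetical name) and using an in-memory set where a real consumer would need a durable store:

```python
class IdempotentConsumer:
    """Wrap a handler so redelivered events are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._processed = set()  # in-memory only; not durable across restarts

    def consume(self, event: dict) -> bool:
        """Return True if the event was handled, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._processed:
            return False
        self._handler(event)
        # Record the id only after the handler succeeds, so a failed
        # attempt is retried when the broker redelivers the event.
        self._processed.add(event_id)
        return True
```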
| Event | Topic | Producer | Consumers | Retention |
|---|---|---|---|---|
| ingestion.batch.complete | altscore.ingestion | ERP Ingestion Service | Feature Pipeline (Airflow sensor) | 7 days |
| pipeline.gold.ready | altscore.pipeline | Feature Pipeline | Scoring Engine | 7 days |
| score.computed | altscore.scores | Scoring Engine | Notification Service, Score API (cache invalidation) | 30 days |
| score.blocked | altscore.scores | Scoring Engine / Guardrails | Audit Service, Notification Service | 30 days |
| consent.granted | altscore.consent | Consent Service | Notification Service, Audit Service, Feature Pipeline | Indefinite |
| consent.revoked | altscore.consent | Consent Service | Scoring Engine (halt), Score API (invalidate), Audit Service | Indefinite |
| audit_event | altscore.audit | All services | Audit Service | 7 years (S3 WORM) |
| grievance.filed | altscore.grievances | Score API Service | Audit Service, Admin Service (SLA timer), Notification Service | 7 years |
| model.promoted | altscore.ml | Admin Service | Scoring Engine (reload), Audit Service | Indefinite |
| erp.anomaly.detected | altscore.anomaly | Scoring Engine (Isolation Forest) | Audit Service, Admin Service | 90 days |
| Failure Scenario | System Behaviour | Recovery Path |
|---|---|---|
| Consent Service unreachable | Score API returns 503; score blocked (fail-safe). Redis consent cache (60s) absorbs brief outages. | Auto-retry 3× with backoff; circuit breaker opens at 50% failure rate |
| ERP Ingestion schema mismatch | Record quarantined to S3 quarantine prefix; batch continues for valid records; alert sent to Ops | Ops reviews quarantine, distributor SDK updated, records reprocessed |
| Airflow pipeline DAG failure | Feature Pipeline retries 3×; Gold layer not updated; Scoring Engine uses existing cached score with staleness flag | On-call engineer notified; manual DAG re-trigger via Admin Service |
| Scoring Engine model load failure | Score computation blocked; score.blocked event emitted; previous cached score served with warning | MLflow registry check; model artifact re-downloaded; SHA-256 re-verified |
| Redis cache failure | Score API falls back to PostgreSQL read replica; latency degrades but service remains available | Redis cluster auto-failover (Sentinel); cache warming on recovery |
| Kafka broker unavailable | Services use local transactional outbox; events queued locally and replayed on reconnect | Kafka auto-recovery; outbox drain; dead-letter queue for failed events |
| WhatsApp API downtime | Notification Service queues messages in outbox; retries with exponential backoff up to 24h | SMS fallback triggered after 30-min WhatsApp outage |
| SHAP computation failure | Guardrail GR-3: score blocked entirely. No score without reason codes (RBI FLDG). score.blocked emitted. | ML engineer investigation; model re-run triggered once resolved |
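
The recovery path for an unreachable Consent Service (retry with backoff, circuit breaker at 50% failure rate) can be sketched as follows. Several details are assumptions: "3×" is read as three attempts in total, the failure rate is measured over a sliding window of recent calls, and the half-open recovery state of a full circuit breaker is omitted.

```python
import time
from collections import deque


class CircuitOpenError(RuntimeError):
    """Raised when calls are refused because the breaker is open."""


class CircuitBreaker:
    def __init__(self, window: int = 20, failure_threshold: float = 0.5, min_calls: int = 10):
        self._results = deque(maxlen=window)  # True = success, False = failure
        self.failure_threshold = failure_threshold
        self.min_calls = min_calls

    def record(self, success: bool) -> None:
        self._results.append(success)

    @property
    def is_open(self) -> bool:
        if len(self._results) < self.min_calls:
            return False  # not enough data to judge the failure rate
        failures = self._results.count(False)
        return failures / len(self._results) >= self.failure_threshold


def call_with_retry(fn, breaker: CircuitBreaker, attempts: int = 3,
                    base_delay: float = 0.1, sleep=time.sleep):
    """Call fn up to `attempts` times with exponential backoff between tries."""
    for attempt in range(attempts):
        if breaker.is_open:
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            breaker.record(False)
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
        else:
            breaker.record(True)
            return result
```

When the breaker is open or retries are exhausted, the caller would translate the exception into the fail-safe response (503, score blocked) described in the table.
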
| Operation | p50 | p95 | p99 |
|---|---|---|---|
| Score API (cache hit) | <100ms | <300ms | <500ms |
| Score API (cache miss) | <500ms | <2s | <3s |
| Consent check | <50ms | <200ms | <400ms |
| ERP batch ingest (10K records) | <30s | <90s | <5min |
| Feature pipeline (full run) | — | <4h | <6h |
| Scoring (1K retailers) | — | <30min | <1h |
| Dimension | Target |
|---|---|
| Score API uptime | 99.9% monthly |
| Consent Service uptime | 99.95% (critical path) |
| Concurrent lenders | 50 simultaneous at 100 req/min each |
| Max retailers scored | 500K per nightly batch |
| ERP batches/day | 1,000+ distributor syncs |
| Horizontal scaling | All API services auto-scale on CPU >70% |
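
As a rough sanity check, the targets above imply the following peak loads. The arithmetic uses only figures from the tables, except for two labeled assumptions: that batch scoring time scales linearly with retailer count, and a hypothetical 8-hour nightly window.

```python
import math

# Score API: 50 lenders, each allowed 100 requests per minute.
CONCURRENT_LENDERS = 50
RATE_LIMIT_PER_MIN = 100
peak_requests_per_min = CONCURRENT_LENDERS * RATE_LIMIT_PER_MIN  # 5000
peak_requests_per_sec = peak_requests_per_min / 60               # about 83.3

# Batch scoring: 500K retailers, with 1K retailers scored in <30 min at p95.
RETAILERS_PER_BATCH = 500_000
CHUNK_SIZE = 1_000
CHUNK_P95_MINUTES = 30
chunks = RETAILERS_PER_BATCH // CHUNK_SIZE         # 500 chunks
serial_hours = chunks * CHUNK_P95_MINUTES / 60     # 250 hours if run one by one

# Assumption: the nightly window is 8 hours (not stated in the source).
ASSUMED_WINDOW_HOURS = 8
min_parallel_workers = math.ceil(serial_hours / ASSUMED_WINDOW_HOURS)  # 32
```

Under these assumptions the nightly batch cannot run serially and needs on the order of tens of parallel scoring workers. The exact number depends on the real window and on how throughput scales.
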
The following constraints are non-negotiable. Any proposed change to the architecture that would violate these requires explicit sign-off from the Head of Engineering + DPO + Head of Risk.
| # | Constraint | Rationale | Owner |
|---|---|---|---|
| C-01 | No PII (GST, phone) may cross the Bronze→Silver boundary in raw form | DPDP Act 2023; privacy-by-design | DPO |
| C-02 | Consent must be verified on every Score API call; no bypass | DPDP Act; fail-safe design | DPO + Engineering |
| C-03 | All data must remain in India-region cloud infrastructure | DPDP Act data localisation; RBI guidelines | DPO + CISO |
| C-04 | Guardrail thresholds are compile-time constants, not runtime config | GRD-ALTSCORE-001; tamper-resistance | Head of Risk + ML |
| C-05 | Audit log records may never be updated or deleted | Legal hold; DPDP; RBI | CISO + Legal |
| C-06 | Model artifacts must be SHA-256 verified before loading | ML supply-chain integrity; TM-ML-04 | ML Engineering |
| C-07 | Score API must not expose sequential retailer IDs (IDOR risk) | TM-API-01; privacy | Security + API Engineering |
| C-08 | No service may directly query another service's database | Domain isolation; reduces blast radius | Head of Engineering |
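
Constraint C-06 can be illustrated with a checksum gate in front of model loading. This is a sketch only: the function name is hypothetical, the expected digest is assumed to come from a trusted source such as the model registry, and the artifact is handled as in-memory bytes for simplicity.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, expected_sha256: str) -> bytes:
    """Return the artifact bytes only if their SHA-256 matches the expected hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    expected = expected_sha256.strip().lower()
    if not hmac.compare_digest(actual.encode(), expected.encode()):
        # Refuse to hand unverified bytes to the model loader.
        raise ValueError("model artifact checksum mismatch; refusing to load")
    return data
```

Hashing the same bytes that are then deserialized avoids a gap between verification and loading in which the file on disk could change.
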