The AltScore Credit Scoring Engine is a machine learning system that generates creditworthiness assessments for informal Indian retailers — small kiranas, pharmacies, and general merchants — who lack formal credit histories but have verifiable transactional relationships with FMCG distributors.
MC-ALTSCORE-001 · Model Purpose Statement
India's informal retail sector accounts for roughly ₹12 trillion in annual commerce, yet more than 12 million small retailers remain credit-invisible to traditional lenders. AltScore bridges this gap by transforming distributor ERP transaction data (invoices, payments, return rates, order cadence) into a structured creditworthiness signal that NBFC lending partners can underwrite against.
This Model Card documents the AltScore Credit Scoring Engine v1 in full compliance with RBI FLDG Guidelines (which mandate model transparency for co-lending arrangements), the DPDP Act 2023 (explainability obligations for automated decision-making), and emerging best practices for responsible AI in credit.
This card serves as an integration guide and model transparency disclosure for credit underwriting teams consuming the Score API.
It documents compliance with FLDG reason-code requirements, fair lending obligations, and automated decision-making transparency.
It is also the reference document for model monitoring, guardrail thresholds, drift detection, and the champion/challenger protocol.
This card covers the primary XGBoost classifier used for retailers with ≥ 6 months of distributor transaction history. The Croston's Method sub-model (sparse/seasonal routing) and the Isolation Forest anomaly detector are described in Sections 04 and 07 respectively. Cold-start fallback scoring is covered under Limitations (Section 07).
| Sub-Model | Algorithm | Trigger Condition | Output | Status |
|---|---|---|---|---|
| Primary Scorer | XGBoost v1.7 | ≥6 months transaction history, completeness ≥70% | PD (0–1), Score (300–900) | Production |
| Sparse/Seasonal Router | Croston's Method | Intermittent order pattern OR high seasonal amplitude | Adjusted order frequency; routes to modified XGB feature set | Production |
| Anomaly Detector | Isolation Forest | Every scoring request; async gaming checks | anomaly_score (0–1), flags {gaming, data_quality} | Production |
| Explainability Layer | SHAP TreeExplainer | Every scored retailer | reason_codes[] (≥5 categorical); SHAP values per feature | Production |
| Cold-Start Fallback | Rule-based conservative | <6 months history OR completeness <50% | Provisional score; ₹25K limit cap; low_confidence flag | Production |
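The trigger conditions in the sub-model table can be read as a routing function. The sketch below is illustrative only, and several parts of it are assumptions:

- The field names are hypothetical.
- Completeness is taken to be a 0-1 fraction.
- The table does not say how a retailer with ≥6 months of history but 50-70% completeness is handled, so the sketch routes that case to the conservative fallback.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PRIMARY_XGB = "primary_xgboost"
    SPARSE_SEASONAL = "croston_modified_xgb"
    COLD_START = "rule_based_fallback"


@dataclass
class RetailerProfile:
    # Hypothetical inputs; the card does not define the router's schema.
    history_months: int
    completeness: float             # share of expected ERP fields present, 0.0-1.0
    is_intermittent: bool           # assumed to come from an upstream order-pattern check
    high_seasonal_amplitude: bool


def route(p: RetailerProfile) -> Route:
    """Choose a sub-model using the trigger conditions from the sub-model table."""
    # Cold-start fallback: <6 months of history OR completeness <50%.
    if p.history_months < 6 or p.completeness < 0.50:
        return Route.COLD_START
    # Assumption: 50-70% completeness is not covered by the table, so fall back
    # conservatively instead of using a scorer whose entry condition (>=70%) is unmet.
    if p.completeness < 0.70:
        return Route.COLD_START
    # Intermittent or strongly seasonal ordering goes through Croston-adjusted features.
    if p.is_intermittent or p.high_seasonal_amplitude:
        return Route.SPARSE_SEASONAL
    return Route.PRIMARY_XGB
```

For example, `route(RetailerProfile(18, 0.85, False, False))` yields `Route.PRIMARY_XGB`. A retailer with 4 months of history yields `Route.COLD_START` regardless of completeness.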
The model is invoked exclusively via the Score API (POST /v1/score) by credentialed NBFC partners. It is not exposed to retailers directly and is never used in real-time decisioning without retailer consent being verified upstream. All scoring is performed in a managed cloud environment (AWS or Azure, India region) with no on-premises model deployment permitted.
The model is trained on historical distributor-to-retailer transaction records combined with ground-truth repayment outcomes sourced from NBFC lending partners under data-sharing agreements. All training data is processed through the AltScore Data Lake Bronze → Silver pipeline before feature engineering.
The quality of the AltScore model is directly bounded by the quality and representativeness of distributor ERP data. Retailers with a single distributor relationship, or those whose distributor uses non-standard ERP schema, will produce less reliable scores. See Limitations (Section 07).
| Source | Data Type | Coverage | Consent Basis | Retention in Training |
|---|---|---|---|---|
| Tally Prime ERP | Invoice history, payment ledger, GST filings | Primary; ~60% of distributor base | Retailer consent + distributor data partnership agreement | Up to 36 months per retailer |
| SAP Business One | Invoice history, payment terms, returns | ~25% of distributor base | Retailer consent + distributor data partnership agreement | Up to 36 months per retailer |
| CSV/Manual Upload | Structured invoice exports | ~15% of distributor base | Retailer consent + distributor data partnership agreement | Up to 24 months (lower freshness) |
| NBFC Repayment Outcomes | Loan disbursement + repayment status | Pilot cohort (Phase 02 shadow lending) | NBFC data sharing agreement; no retailer PII transferred | 24-month performance window |
Raw ERP data enters the Bronze layer as immutable append-only records protected by S3 Object Lock (WORM). The Bronze → Silver transformation applies pseudonymization (GST/phone replaced with UUIDs), outlier capping, schema normalization, and feature engineering via Apache Spark. No PII reaches the Silver layer or the model training pipeline.
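The card does not say how the replacement UUIDs are derived. One approach consistent with "GST/phone replaced with UUIDs" is to derive each UUID from a keyed HMAC of the normalized identifier. This keeps joins stable across loads, and without the key nobody can rebuild the mapping by hashing candidate GSTINs.

The key handling is an assumption (for example, a secret held in a cloud KMS). The outlier capping is shown as simple percentile winsorization, and its bounds are hypothetical defaults.

```python
import hashlib
import hmac
import uuid


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map a GSTIN or phone number to a stable pseudonymous UUID.

    Assumption: a keyed HMAC-SHA256 is used, so the mapping cannot be rebuilt by
    hashing every possible GSTIN/phone number without access to the key.
    """
    normalized = identifier.strip().upper().replace(" ", "")
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).digest()
    # First 16 bytes become the UUID; version bits are set only so it is well-formed.
    return str(uuid.UUID(bytes=digest[:16], version=4))


def cap_outliers(values: list[float], lower_pct: float = 0.01, upper_pct: float = 0.99) -> list[float]:
    """Winsorize: clip values to simple index-based lower/upper percentiles."""
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[round(lower_pct * last)]
    hi = ordered[round(upper_pct * last)]
    return [min(max(v, lo), hi) for v in values]
```

Because the pseudonym is deterministic, the same retailer keeps the same UUID across monthly loads. Rotating the key would break that linkage, which may or may not be desirable.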
| Split | Composition | Size | Purpose |
|---|---|---|---|
| Training | Historical retailers with ≥12mo history + known outcomes | 70% | Parameter fitting |
| Validation | Time-stratified holdout (most recent 6 months) | 15% | Hyperparameter tuning, early stopping |
| Test (OOT) | Out-of-time: retailers onboarded after training cutoff | 15% | Final unbiased performance assessment |
The out-of-time (OOT) test split is critical for credit models: using a temporal holdout prevents look-ahead bias that would inflate AUROC estimates. All reported metrics are OOT unless otherwise noted.
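Below is a minimal sketch of a leakage-free temporal split. It assumes each labeled record carries the retailer's onboarding date and the date its features were snapshotted; both field names are hypothetical.

Records are assigned by date rather than at random, so training never sees information from a later period than the one being evaluated. How post-cutoff snapshots of existing retailers are treated is also an assumption: this sketch excludes them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabeledRecord:
    retailer_id: str
    onboarded_on: date      # first appearance in distributor data (hypothetical field)
    snapshot_date: date     # date the features were computed (hypothetical field)
    defaulted: bool


def temporal_split(records, training_cutoff: date, validation_start: date):
    """Split into train / validation / out-of-time test purely by date.

    - test: retailers onboarded after the training cutoff (out-of-time)
    - validation: other records snapshotted in [validation_start, training_cutoff]
    - train: other records snapshotted before validation_start
    """
    train, valid, test = [], [], []
    for r in records:
        if r.onboarded_on > training_cutoff:
            test.append(r)
        elif r.snapshot_date > training_cutoff:
            continue  # assumption: post-cutoff snapshots of existing retailers are excluded
        elif r.snapshot_date >= validation_start:
            valid.append(r)
        else:
            train.append(r)
    return train, valid, test
```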
The model uses 40+ engineered features grouped into five families. Features are computed in the Silver layer via dbt transformations applied over Apache Spark. All features are deterministic — the same input data always produces the same features, facilitating auditability and reproducibility.
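The card does not list the five feature families. The sketch below therefore uses hypothetical feature names for payment-delay, order-cadence, and return signals.

Its purpose is to show one way the determinism property can hold: features are computed by a pure function, and the input is sorted first so that row arrival order cannot change the output. In the actual pipeline this logic is described as living in dbt over Spark, not in Python.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, median
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    # Hypothetical Silver-layer invoice record (identifiers already pseudonymized).
    invoice_date: date
    due_date: date
    paid_date: Optional[date]      # None while the invoice is unpaid
    amount: float
    returned_amount: float = 0.0


def invoice_features(invoices: list[Invoice]) -> dict[str, Optional[float]]:
    """Illustrative payment-delay, order-cadence and return features for one retailer."""
    # Sorting first makes the output independent of the order rows were loaded in.
    invs = sorted(invoices, key=lambda i: (i.invoice_date, i.due_date, i.amount))
    # Days paid after the due date; early payments count as zero delay.
    delays = [max((i.paid_date - i.due_date).days, 0) for i in invs if i.paid_date is not None]
    # Days between consecutive orders.
    gaps = [(b.invoice_date - a.invoice_date).days for a, b in zip(invs, invs[1:])]
    billed = sum(i.amount for i in invs)
    return {
        # None marks "undefined" rather than silently imputing a value.
        "median_payment_delay_days": float(median(delays)) if delays else None,
        "max_payment_delay_days": float(max(delays)) if delays else None,
        "mean_order_gap_days": float(mean(gaps)) if gaps else None,
        "return_rate": sum(i.returned_amount for i in invs) / billed if billed else None,
    }
```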
All features are reviewed quarterly for potential demographic proxy correlation. Features such as seasonal_amplitude and festival_spike_count are monitored for differential impact across religious community proxies. See Fairness section (06). No features encode geographic granularity below state level.
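A quarterly proxy audit could start with something as simple as a point-biserial correlation: the Pearson correlation between each feature and a 0/1 indicator for membership in a proxy cohort. This is a sketch under assumptions:

- The 0.30 flagging threshold is hypothetical.
- The card does not state which correlation measure is actually used.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def proxy_audit(features: dict[str, list[float]], cohort: list[int],
                threshold: float = 0.30) -> dict[str, float]:
    """Flag features whose |correlation| with a 0/1 proxy-cohort indicator exceeds threshold."""
    indicator = [float(c) for c in cohort]
    flagged = {}
    for name, values in features.items():
        r = pearson(values, indicator)  # point-biserial correlation for a binary cohort
        if abs(r) > threshold:
            flagged[name] = round(r, 3)
    return flagged
```

A flagged feature is not automatically removed. Flagging only marks it for the differential-impact review described above.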
All performance metrics are computed on the out-of-time (OOT) test split. For credit scoring models this is the only acceptable evaluation methodology, because look-ahead bias would render in-sample metrics misleading. The minimum thresholds (AUROC ≥ 0.72, KS ≥ 0.35, AIR ≥ 0.80, as listed in the champion/challenger validation stage) are production promotion gates: a model that fails any threshold is returned to experimentation.
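AUROC and KS can be computed from OOT predicted PDs and observed default labels as follows. This is a plain-Python sketch for clarity:

- The pairwise AUROC loop is quadratic, so a production job would use a rank-based or library implementation.
- `meets_promotion_gates` is a hypothetical helper that simply encodes the gate values stated in the lifecycle section.

```python
from bisect import bisect_right


def _split(pd_scores, defaulted):
    pos = [s for s, d in zip(pd_scores, defaulted) if d]
    neg = [s for s, d in zip(pd_scores, defaulted) if not d]
    if not pos or not neg:
        raise ValueError("need both defaulters and non-defaulters")
    return pos, neg


def auroc(pd_scores: list[float], defaulted: list[bool]) -> float:
    """P(random defaulter has higher PD than random non-defaulter); ties count half."""
    pos, neg = _split(pd_scores, defaulted)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(pd_scores: list[float], defaulted: list[bool]) -> float:
    """Maximum gap between the PD CDFs of defaulters and non-defaulters."""
    pos, neg = _split(pd_scores, defaulted)
    pos.sort()
    neg.sort()
    return max(
        abs(bisect_right(neg, t) / len(neg) - bisect_right(pos, t) / len(pos))
        for t in set(pd_scores)
    )


def meets_promotion_gates(auc: float, ks: float, air: float) -> bool:
    # Gate values as stated in the champion/challenger lifecycle.
    return auc >= 0.72 and ks >= 0.35 and air >= 0.80
```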
| Band | Score Range | Risk Label | Expected PD Range | Monotonicity Requirement | Typical Recommended Limit |
|---|---|---|---|---|---|
| A | 750–900 | Low Risk | < 5% | PD(A) < PD(B) — enforced in test suite | Up to 30% of 6-month GMV |
| B | 600–749 | Medium-Low Risk | 5–12% | PD(B) < PD(C) — enforced in test suite | 15–25% of 6-month GMV |
| C | 450–599 | Medium-High Risk | 12–25% | PD(C) < PD(D) — enforced in test suite | 10–15% of 6-month GMV |
| D | 300–449 | High Risk | > 25% | Highest PD cohort — enforced in test suite | Decline or <₹25K conservative |
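The card does not state how PD is mapped onto the 300-900 scale. This sketch therefore starts from the score and encodes only two things:

- the band boundaries from the table;
- the monotonicity requirement PD(A) < PD(B) < PD(C) < PD(D), checked as a test-suite-style assertion on observed default rates per band.

The function names are hypothetical.

```python
BANDS = [  # (band, min_score, max_score), taken from the band table
    ("A", 750, 900),
    ("B", 600, 749),
    ("C", 450, 599),
    ("D", 300, 449),
]


def assign_band(score: int) -> str:
    for band, lo, hi in BANDS:
        if lo <= score <= hi:
            return band
    raise ValueError(f"score {score} is outside 300-900")


def check_monotonic(scores: list[int], defaulted: list[bool]) -> bool:
    """True if observed default rates strictly increase from band A to band D.

    Bands with no observations are skipped rather than failing the check.
    """
    counts = {band: [0, 0] for band, _, _ in BANDS}   # band -> [defaults, total]
    for s, d in zip(scores, defaulted):
        c = counts[assign_band(s)]
        c[0] += int(d)
        c[1] += 1
    rates = [counts[b][0] / counts[b][1] for b, _, _ in BANDS if counts[b][1] > 0]
    return all(a < b for a, b in zip(rates, rates[1:]))
```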
| Metric | Frequency | Green Threshold | Amber Threshold | Red — Action Required |
|---|---|---|---|---|
| Population Stability Index (PSI) | Monthly | < 0.10 | 0.10 – 0.20 | > 0.20 → Model refresh triggered |
| KS Statistic | Monthly (when outcomes available) | ≥ 0.35 | 0.25 – 0.35 | < 0.25 → Champion/challenger triggered |
| PD Calibration Divergence | Quarterly | Within ±2% | ±2–5% | > ±5% → Model freeze, re-calibration |
| Score Distribution Shift | Monthly | P50 within ±30 pts of baseline | ±30–60 pts | > ±60 pts → Investigation required |
| Reason Code Coverage | Daily | 100% | — | < 100% → Score blocked (RBI FLDG) |
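PSI compares the binned score distribution of the current month (a) with the baseline (e):

PSI = Σᵢ (aᵢ − eᵢ) · ln(aᵢ / eᵢ)

Two choices in the sketch below are common conventions but are assumptions here: binning on baseline deciles, and flooring empty bins at a small epsilon. The status bands follow the monitoring table.

```python
import math
from bisect import bisect_right


def psi(baseline: list[float], current: list[float], n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index with quantile bins derived from the baseline."""
    ordered = sorted(baseline)
    # Interior cut points at (simple, index-based) baseline quantiles.
    edges = sorted({ordered[int(len(ordered) * k / n_bins)] for k in range(1, n_bins)})

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(baseline), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def psi_status(value: float) -> str:
    if value < 0.10:
        return "green"
    if value <= 0.20:
        return "amber"
    return "red"  # model refresh triggered
```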
AltScore commits to the 80% Rule (Adverse Impact Ratio ≥ 0.80) across all tested demographic and geographic cohorts. A model that fails fairness thresholds is frozen for affected cohorts until the disparity is remediated. Fairness is not optional — it is a production gate.
Guardrail GR-1 · Fairness & Bias Detection
Since the model does not directly observe demographic attributes (by design; see Features, Section 04), fairness analysis uses indirect proxies available in the data: geographic region (state-level), retailer category (pharmacy, kirana, general merchant), and distributor affiliation as a proxy for supply-chain community. Analyses are performed quarterly by an independent reviewer on a pseudonymized dataset.
AIR = Approval rate of group ÷ Approval rate of most-favored group. Minimum acceptable: 0.80 (80% Rule). Values shown are design targets; actual monitoring occurs against live scoring data.
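The AIR definition above translates directly into code. In this sketch the input shape (approved count and scored count per cohort) and the function names are hypothetical. The cohort labels in the usage below reuse the retailer categories named in GR-1, and the counts are purely illustrative.

```python
def adverse_impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """AIR per cohort: approval rate / approval rate of the most-favored cohort.

    outcomes maps cohort name -> (approved_count, scored_count).
    """
    rates = {c: approved / scored for c, (approved, scored) in outcomes.items() if scored > 0}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no approvals in any cohort; AIR undefined")
    return {c: r / best for c, r in rates.items()}


def cohorts_to_freeze(outcomes: dict[str, tuple[int, int]], threshold: float = 0.80) -> list[str]:
    """Cohorts below the 80% Rule, which the freeze protocol would suspend from scoring."""
    return sorted(c for c, air in adverse_impact_ratios(outcomes).items() if air < threshold)
```

With approval rates of 80%, 90% and 60% for kirana, pharmacy and general merchant, the AIRs are about 0.89, 1.0 and 0.67. Only the general merchant cohort would fall below the 0.80 threshold.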
For matched cohorts (retailers with similar GMV, tenure, and payment history), score gaps should not exceed 15 percentage points across any demographic-proxy cohort. Statistically significant disparities (p < 0.05 by Mann-Whitney U) trigger immediate review.
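The significance check named above can be sketched with the large-sample normal approximation to the Mann-Whitney U test. The approximation is this sketch's assumption:

- It applies no tie or continuity correction, so it suits large cohorts.
- For small samples an exact test (for example `scipy.stats.mannwhitneyu`) is more appropriate.
- The function names are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for sample x, two-sided p-value via the normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In use, `x` and `y` would be the scores of two matched proxy cohorts. A p-value below 0.05 triggers the review described above.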
| Control | Implementation | Frequency |
|---|---|---|
| Feature proxy audit | Correlation analysis of all features against demographic proxies | Quarterly |
| Disparate impact reporting | AIR computed for 8+ cohort groupings, reported to Guardrail Committee | Quarterly |
| SHAP-based explanation audit | Distribution of reason codes across cohorts checked for anomalous concentrations | Monthly |
| Model freeze protocol | Affected cohort frozen from scoring if AIR < 0.80; DPO + Board notified if >1,000 affected retailers | Triggered |
| Independent fairness reviewer | External audit of quarterly fairness report before board sign-off | Annually |
Honest disclosure of model limitations is required under both the DPDP Act 2023 and RBI FLDG Guidelines, and is fundamental to responsible AI deployment in credit. Each limitation below is paired with its operational mitigation.
The XGBoost primary model requires a minimum 6-month transaction window to produce a reliable payment delay distribution and trend signal. Retailers below this threshold cannot be scored using the primary model. Instead, a conservative rule-based fallback provides a provisional score, caps the credit limit at ₹25,000, and sets a low_confidence flag.
Retailers who source exclusively from one distributor are more susceptible to biased scores if that distributor has data quality issues or if the relationship is atypical. The model cannot cross-validate behavior against a second distributor signal. This cohort also shows depressed AIR (see Section 06).
Standard payment delay and order frequency features lose predictive power for retailers who order infrequently or only during festival seasons (e.g., Diwali-only buyers). For these retailers, Croston's Method routing adjusts the order-frequency signal and sends the retailer to a modified XGBoost feature set instead of the standard primary model features. This sub-model, however, has a smaller training sample and lower confidence estimates.
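For reference, the classic Croston estimator smooths two series separately: the sizes of non-zero orders (z) and the intervals between them (p). It then forecasts the average demand per period as z / p, with 1 / p as the expected order frequency.

This is a textbook sketch, not the AltScore sub-model:

- The smoothing constant α = 0.1 is hypothetical.
- Periods are assumed to be months.
- How the output feeds the modified XGBoost feature set is not specified in the card.

```python
def croston(demand: list[float], alpha: float = 0.1) -> float:
    """Croston's method: per-period demand-rate forecast for an intermittent series."""
    z = p = None          # smoothed order size and smoothed inter-order interval
    periods_since = 0
    for d in demand:
        periods_since += 1
        if d > 0:
            if z is None:
                # Initialize on the first non-zero order.
                z, p = float(d), float(periods_since)
            else:
                z += alpha * (d - z)
                p += alpha * (periods_since - p)
            periods_since = 0
    if z is None:
        return 0.0        # no orders observed at all
    return z / p
```

A retailer that orders 10 units every third period gets a forecast of 10 / 3 units per period. A monthly-average feature computed over the same history would look erratic by comparison.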
The model's repayment outcome labels are derived from historical lending under the economic conditions prevailing during training. A significant economic shock (pandemic, commodity spike, supply-chain disruption) may cause the model's PD estimates to diverge from actual default rates faster than the monthly monitoring cycle can detect.
A distributor who is aware of the model's feature logic could, in theory, systematically manipulate invoice timing or payment recording to inflate retailer scores. The Isolation Forest anomaly detector addresses isolated gaming attempts but may not detect highly sophisticated long-horizon coordination.
AltScore scores reflect the retailer's behavior up to the most recent ERP sync. If ERP data is more than 14 days old, the score may not reflect recent payment behavior changes (positive or negative). This is especially relevant for retailers in financial stress who may have recently deteriorated.
Phase 01 and 02 distributor onboarding is concentrated in Maharashtra, Karnataka, Tamil Nadu, and Delhi NCR. Retailers in Northeast India, tribal regions, or states with lower formal FMCG distribution penetration may be underrepresented in training data, potentially reducing model accuracy for those cohorts.
The AltScore Credit Scoring Engine is purpose-built for a specific, constrained use case. Deployment outside these boundaries is not authorized, and AltScore Technologies bears no liability for misuse of model outputs beyond intended scope.
NBFC partners using AltScore scores to make lending decisions for retailers within the agreed target segment (B2B informal retail, India, distributor-relationship verified, retailer consent obtained).
Using the recommended_limit field (capped at 30% of trailing 6-month GMV) as an input into lender's own credit committee decisions. Lenders may set lower limits but may not exceed the AltScore ceiling without documented override.
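The ceiling rule can be expressed as two small functions. In this sketch:

- Amounts are whole rupees, kept as integers so the 30% computation is exact.
- The names, and the idea of passing a documented override as a reference string, are hypothetical.

```python
from typing import Optional


def altscore_ceiling(trailing_6m_gmv_inr: int, recommended_limit_inr: int) -> int:
    """The hard ceiling: recommended_limit, never above 30% of trailing 6-month GMV."""
    return min(recommended_limit_inr, trailing_6m_gmv_inr * 30 // 100)


def final_limit(ceiling_inr: int, lender_limit_inr: int, override_ref: Optional[str] = None) -> int:
    """Lenders may set a lower limit freely; exceeding the ceiling needs a documented override."""
    if lender_limit_inr <= ceiling_inr or override_ref:
        return lender_limit_inr
    return ceiling_inr
```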
Providing SHAP-derived reason codes to retailers who request an explanation of their AltScore, via the WhatsApp grievance portal or distributor dashboard, in accordance with RBI FLDG Guidelines and DPDP Act rights.
Running AltScore in parallel with existing lender credit processes for benchmarking purposes during Phase 02 shadow mode (≥30 days required before live lending use).
AltScore scores are derived from B2B distributor transaction behavior and are not validated for consumer (personal) credit decisions. Using these scores for personal loans, consumer durables, or mortgages is prohibited.
Insurance risk is a materially different actuarial problem. Using AltScore outputs as an input to insurance premium pricing or policy approval is not authorized without a new model validation exercise.
Using AltScore scores, reason codes, or underlying features to infer demographic characteristics, religious affiliation, caste, or community identity of retailers is strictly prohibited. Scores reflect only transactional behavior.
Scoring any retailer who has not provided unambiguous, DPDP-compliant affirmative consent is prohibited. The system enforces this at the API level; any attempt to score a non-consented retailer returns a 403 error.
NBFC partners may not resell, sublicense, or otherwise transfer AltScore values or reason codes to any third party not named in the original retailer consent. Each lender is individually named in the consent scope.
For Band D scores (High Risk), or when low_confidence: true is returned, a human credit officer must review the decision before a loan decline is communicated to the retailer. Fully automated adverse decisions without human-in-the-loop review violate DPDP Act automated decision-making provisions.
The AltScore API enforces all use restrictions programmatically. Attempting to score a retailer without valid consent, to access scores cross-lender, or to exceed rate limits results in 4xx error responses. Technical enforcement does not substitute for contractual compliance — NBFC partners are required to maintain their own controls under the Partner Agreement.
Every AltScore response includes a mandatory set of SHAP-derived reason codes explaining the key drivers of the score. This is a hard system requirement — scores without reason codes are blocked and not returned to lenders. This design ensures compliance with RBI FLDG Guidelines and the DPDP Act's right to explanation.
| Rule | Requirement | Enforcement |
|---|---|---|
| Minimum codes per response | ≥ 5 reason codes (≥3 positive, ≥2 negative) | Hard block — score not returned if <5 codes generated |
| SHAP coverage | 100% of scored retailers must have SHAP values computed | Daily monitoring; <100% triggers alert |
| Language support | All 24 codes have verified Hindi translations | Translation table versioned in code repository |
| Code taxonomy stability | Reason codes are stable across model versions | New codes require Guardrail Committee sign-off + lender notification |
| Adverse action reason codes | Band D decisions and limit reductions must surface top 3 negative codes | Enforced in API response serialization |
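The rules above can be sketched as a selection step over per-feature SHAP values. The sketch makes several assumptions:

- SHAP values are on the PD (log-odds) output, so a negative value lowers predicted PD and counts as a positive factor for the retailer.
- The feature-to-code mapping is hypothetical; several features may share one code.
- Limit-reduction handling is not modeled.

```python
class ReasonCodeBlocked(Exception):
    """The response cannot carry the mandatory reason codes, so no score is returned."""


def select_reason_codes(shap_pd: dict[str, float], code_for: dict[str, str], band: str,
                        min_pos: int = 3, min_neg: int = 2) -> dict[str, list[str]]:
    """Map per-feature SHAP values to categorical reason codes.

    Assumes SHAP values are on the PD (log-odds) output, so a negative value lowers
    predicted PD and counts as a positive factor for the retailer.
    """
    ranked = sorted(shap_pd.items(), key=lambda kv: kv[1])   # most PD-lowering first
    # dict.fromkeys de-duplicates while keeping rank order (several features may share a code).
    positives = list(dict.fromkeys(code_for[f] for f, v in ranked if v < 0))
    negatives = list(dict.fromkeys(code_for[f] for f, v in reversed(ranked) if v > 0))
    need_neg = 3 if band == "D" else min_neg   # adverse (Band D) decisions surface top 3 negatives
    if len(positives) < min_pos or len(negatives) < need_neg:
        raise ReasonCodeBlocked("insufficient reason codes; score withheld")
    return {"positive": positives[:min_pos], "negative": negatives[:need_neg]}
```

Raising an exception rather than returning a partial list mirrors the "hard block" rule: the caller cannot accidentally serialize a score with fewer than five codes.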
Retailers may request an explanation of their score via the WhatsApp grievance portal at any time. The explanation is rendered in plain Hindi using the reason code translations and is reviewed by a human credit officer within 24 hours of the request. Retailers are not shown raw SHAP values or feature weights, only the categorical reason code descriptions.
Actual SHAP numeric values and feature weights are not exposed to retailers, lenders, or distributors via the API. This is intentional: exposing numeric weights would enable sophisticated gaming of the model's scoring logic. Reason codes convey the directional signal without exposing the precise decision boundary.
The AltScore engine follows a structured champion/challenger model lifecycle. The in-production champion model is continuously monitored via PSI, KS, and PD calibration metrics. When thresholds are breached, a challenger model is trained and validated before promotion.
Challenger model trained on refreshed data. OOT test split evaluated. Fairness analysis performed. All metrics must exceed promotion gates (AUROC ≥0.72, KS ≥0.35, AIR ≥0.80).
Challenger scores computed alongside champion but not used for lending decisions. Rank-order accuracy vs. champion must reach ≥75% concordance before promotion is even considered.
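The card does not define "rank-order concordance". One reasonable reading is the share of retailer pairs that champion and challenger order the same way. The sketch below uses that interpretation, which is an assumption, and skips pairs tied in either model.

```python
from itertools import combinations


def rank_concordance(champion: list[float], challenger: list[float]) -> float:
    """Share of retailer pairs ordered the same way by both models (ties skipped)."""
    agree = total = 0
    for i, j in combinations(range(len(champion)), 2):
        a = champion[i] - champion[j]
        b = challenger[i] - challenger[j]
        if a == 0 or b == 0:
            continue  # a tie in either model gives no ordering information
        total += 1
        agree += (a > 0) == (b > 0)
    if total == 0:
        raise ValueError("no comparable pairs")
    return agree / total
```

Under this reading, promotion would be considered only when `rank_concordance(...) >= 0.75` over the shadow-scoring period.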
Head of Risk + CTO (or delegate) must both approve promotion. Model artifact SHA-256 hash recorded in immutable audit log. NBFC partners notified of model version change with 72-hour advance notice.
Monthly PSI and score distribution monitoring. Quarterly KS and fairness audit. Drift detection triggers return to Stage 1 or emergency rollback.
Previous champion model version retained in WORM storage with hash verification. Rollback restores previous model artifact and all routing configuration. No re-training required for rollback.
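Hash verification before rollback can be as simple as recomputing the artifact's SHA-256 and comparing it with the value recorded at promotion. This is a minimal sketch:

- The function names are hypothetical.
- Reading the recorded hash from the audit log is out of scope.
- Streaming in chunks avoids loading large model files into memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, recorded_hash: str) -> None:
    """Refuse to restore an artifact whose hash differs from the audit-log record."""
    actual = sha256_of(path)
    if actual != recorded_hash.strip().lower():
        raise RuntimeError(f"hash mismatch for {path.name}; rollback aborted")
```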
| Trigger | Required Update | Owner |
|---|---|---|
| New model version promoted to production | Full card refresh — all sections | ML Engineering + Credit Risk |
| New feature family added | Section 04 (Features) updated; fairness analysis re-run | ML Engineering |
| Fairness audit finds new at-risk cohort | Section 06 updated; limitation added to Section 07 | Head of Risk + DPO |
| Regulatory change | Section 10 compliance map updated; legal review | Legal & DPO |
| Annual review (minimum) | All sections reviewed; sign-off from ML Lead + Head of Risk + DPO | Cross-functional |