
The Prediction Model

TL;DR

The app uses a calibrated logistic regression model to estimate each employee’s probability of leaving. It’s trained on historical snapshots up to 2022 and validated on 2023 to mimic how it would perform in the next year. Results are turned into actionable savings estimates using your assumptions (intervention cost, effectiveness, replacement cost).

What the model does

Think of the model like a weather forecast for attrition:

  • Each person gets a risk score (0–100%) — "how likely is rain (attrition) for this person?"
  • We don't claim certainty for any individual. Instead, we use these probabilities to prioritise attention (like taking an umbrella if rain is likely)
  • The app lets you test what-if levers (e.g., reduce workload) and instantly see how the forecast and savings might change

Analogy

You could think of logistic regression as a weighted checklist. For example, heavier workload and long commute may push risk up; higher pay competitiveness may push risk down. The model learns how much each factor matters based on past outcomes.
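
To make the checklist idea concrete, here is a toy sketch with made-up weights (the feature names and numbers below are illustrative, not the trained model's actual coefficients):

```python
import numpy as np

# Illustrative only: hypothetical weights, not the trained model's coefficients.
# A positive weight pushes risk up; a negative weight pushes it down.
weights = {"workload_score": 0.8, "commute_km": 0.3, "pay_competitiveness": -0.6}
intercept = -1.5

def attrition_risk(person):
    # Weighted checklist: each factor times its learned weight, plus an intercept...
    log_odds = intercept + sum(w * person[name] for name, w in weights.items())
    # ...then the logistic function squashes the total into a 0-100% probability.
    return 1 / (1 + np.exp(-log_odds))

print(f"{attrition_risk({'workload_score': 2.0, 'commute_km': 1.5, 'pay_competitiveness': 0.5}):.0%}")
```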


What the Model Does NOT Do

  • Decision Making: it doesn't decide who to keep or let go; it's decision support
  • Future Prediction: it doesn't know the future; it extrapolates from patterns in your historical snapshots
  • Post-Decision Learning: it doesn't see post-decision outcomes (e.g., exit interviews, tickets) that would leak future information into training

Key Design Choices

Transparency by Design

Logistic regression is easy to inspect and explain, which makes it a better fit for HR decision-making than black-box models when accuracy is similar.

Calibrated Probabilities

We calibrate the model so "30% risk" roughly means "30 out of 100 similar employees leave."

Time-Aware Validation

Train on ≤2022, evaluate on 2023. This prevents "peeking" into the future and gives a realistic performance estimate.

Scenario-Ready

We engineered a feature (mgmt_workload_score) so what-if changes to workload and manager quality flow through to risk.


Model Inputs (Features)

From the cleaned dataset (data/processed/hr_attrition_clean.csv), the model draws on the following candidate features (items marked "dropped" were considered but excluded from the final model):

Job & Pay

  • base_salary (pay level)
  • salary_band (dropped to avoid duplication)
  • compa_ratio (dropped in final model due to collinearity)

Performance & Progression

  • performance_rating
  • avg_raise_3y (dropped)
  • internal_moves_last_2y (dropped)
  • time_since_last_promo_yrs

Work Patterns & Wellbeing

  • workload_score
  • overtime_hours_month
  • sick_days
  • pto_days_taken

Engagement

  • engagement_score
  • manager_quality (missingness is informative; also used in interaction below)

Logistics

  • commute_km
  • onsite/remote (if present)
  • night_shift

Organizational Context

  • department
  • role
  • team_id (as categorical signals)

Engineered Features

  • mgmt_workload_score = (10 − manager_quality) × workload_score
    Captures that high workload under poorer management is especially risky

Explicitly Excluded Features

Category Features Reason
Identifiers employee_id Not predictive
Post-Decision Signals exit_interview_scheduled, offboarding_ticket_created Data leakage
Duplicates/Collinear salary_band, high-VIF pay proxies (compa_ratio, avg_raise_3y, benefit_score) Redundancy/collinearity
Split Key snapshot_year Used to split, not to predict

Missing Values

Missing manager_quality can itself be a signal. The pipeline handles missing values during encoding/scaling, and the engineered interaction applies a conservative floor (e.g., missing manager_quality is treated as 1 for the quality term in the what-if mechanism).
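
A minimal sketch of how this interaction could be computed, assuming the column names from the cleaned dataset and mirroring the conservative floor described above:

```python
import pandas as pd

# Minimal sketch: column names are taken from the cleaned dataset described above;
# treating missing manager_quality as 1 mirrors the conservative floor used in the
# what-if mechanism (an assumption about the exact mechanics).
df = pd.read_csv("data/processed/hr_attrition_clean.csv")

quality = df["manager_quality"].fillna(1)                        # conservative floor for missing quality
df["mgmt_workload_score"] = (10 - quality) * df["workload_score"]
```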


Technical Pipeline

1. Preprocessing

  • Numerical: StandardScaler(with_mean=False)
  • Categorical: OneHotEncoder(handle_unknown="ignore", sparse_output=True)
  • Combined: ColumnTransformer

2. Estimator

  • LogisticRegression(max_iter=1000) trained on data ≤2022

3. Calibration

  • CalibratedClassifierCV(cv="prefit") with isotonic or sigmoid chosen by lower Brier score on the 2023 set

4. Persistence

  • Model: models/attrition_lr_calibrated_train_to_2022_skl171.pkl
  • Metrics: models/attrition_lr_calibrated_metrics_train_to_2022_skl171.json
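
A minimal sketch of the pipeline described above (the column lists, the attrited target name, and other variable names are placeholders; see src/train_calibrated_lr.py for the real code):

```python
# Sketch only: placeholder feature lists and an assumed "attrited" target column.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

df = pd.read_csv("data/processed/hr_attrition_clean.csv")
train, holdout = df[df["snapshot_year"] <= 2022], df[df["snapshot_year"] == 2023]

num_cols = ["base_salary", "workload_score", "overtime_hours_month", "commute_km"]  # placeholder subset
cat_cols = ["department", "role", "team_id"]

pre = ColumnTransformer([
    ("num", StandardScaler(with_mean=False), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])
pipe = Pipeline([("pre", pre), ("lr", LogisticRegression(max_iter=1000))])
pipe.fit(train[num_cols + cat_cols], train["attrited"])

# Calibrate on the 2023 hold-out; keep whichever method gives the lower Brier score.
# (Newer scikit-learn releases may prefer FrozenEstimator over cv="prefit".)
best = None
for method in ("isotonic", "sigmoid"):
    cal = CalibratedClassifierCV(pipe, method=method, cv="prefit")
    cal.fit(holdout[num_cols + cat_cols], holdout["attrited"])
    brier = brier_score_loss(holdout["attrited"],
                             cal.predict_proba(holdout[num_cols + cat_cols])[:, 1])
    if best is None or brier < best[0]:
        best = (brier, cal)

joblib.dump(best[1], "models/attrition_lr_calibrated_train_to_2022_skl171.pkl")
```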

Why Calibration?

Uncalibrated models can be over- or under-confident. Calibration aligns predicted probabilities with observed rates — crucial when you convert probabilities into expected savings.


Validation & Performance

Data Split

  • Train: Snapshots ≤2022
  • Calibrate/Evaluate: 2023

Key Metrics

  • ROC AUC: overall ranking quality (closer to 1 is better)
  • PR AUC: useful when attrition is rare
  • Lift by decile: business-friendly view of the top 10% risk vs the average rate
  • Calibration: assessed implicitly through the chosen calibration method (isotonic vs sigmoid, selected by Brier score)

Interpreting Lift

If Decile 1 shows 3× lift, your top-risk 10% is three times as likely to contain real leavers as a random 10%. That's a strong targeting signal.
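
A hedged sketch of how a lift-by-decile table can be computed, assuming y_true holds the actual 2023 leavers (0/1) and y_prob the calibrated risk scores for the same employees:

```python
import pandas as pd

def lift_by_decile(y_true, y_prob):
    # Decile 1 = highest-risk 10%; lift = decile attrition rate / overall rate.
    scored = pd.DataFrame({"actual": y_true, "risk": y_prob})
    scored["decile"] = pd.qcut(scored["risk"].rank(method="first", ascending=False),
                               10, labels=range(1, 11))
    overall = scored["actual"].mean()
    table = (scored.groupby("decile", observed=True)["actual"].mean()
                   .rename("attrition_rate").to_frame())
    table["lift"] = table["attrition_rate"] / overall
    return table
```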


From Probability to Business Value

Two complementary savings views power the app:

1. Threshold-Based Net Savings

Pick a risk threshold; treat everyone above it.

Saved cost = (True Positives × effectiveness × replacement_cost)
Spend = (Flagged × intervention_cost)
Net = Saved − Spend

Good for: Operational reporting, precision/recall trade-offs
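
A minimal sketch of this calculation, assuming arrays of actual outcomes and calibrated risk scores plus your assumption values:

```python
import numpy as np

# Sketch: y_true = actual leavers (0/1), y_prob = calibrated risk scores.
def net_savings(y_true, y_prob, threshold, intervention_cost, effectiveness, replacement_cost):
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    flagged = y_prob >= threshold
    true_positives = np.sum(flagged & (y_true == 1))
    saved = true_positives * effectiveness * replacement_cost   # leavers we expect to retain
    spend = np.sum(flagged) * intervention_cost                 # everyone flagged gets the intervention
    return saved - spend
```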

2. Threshold-Free Expected Value

Sum expected value across the treated cohort (Top-K or Threshold with Coverage):

Per person EV = effectiveness × replacement_cost × predicted_risk − intervention_cost

Good for: Comparing levers & strategies without being sensitive to a single threshold
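
A companion sketch for the threshold-free view, here using a Top-K treated cohort (variable names are illustrative):

```python
import numpy as np

# Sketch: per-person expected value summed over the K highest-risk employees.
def expected_savings_top_k(y_prob, k, intervention_cost, effectiveness, replacement_cost):
    top_k_risk = np.sort(np.asarray(y_prob))[::-1][:k]
    per_person_ev = effectiveness * replacement_cost * top_k_risk - intervention_cost
    return per_person_ev.sum()
```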


Why Not a Complex Model?

We tried Random Forest and XGBoost variants; on this dataset, logistic regression performed comparably (AUC ~0.65) while offering much better interpretability and governance. When accuracy differences are marginal, simpler + explainable is the right HR choice.


Limits & Caveats

Causality

The model captures associations, not guaranteed causes. Use it to prioritise conversations and support, not as an automated decision engine.

Data Drift

If your org changes (hybrid policies, comp structure), refresh training and re-calibrate. The app's time-aware split is a safeguard, not a guarantee.

Group Effects

team_id can encode manager/team culture. If governance requires, add GroupKFold validation by team to quantify sensitivity.
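
A sketch of what grouped validation could look like, reusing the pipe, train, num_cols, and cat_cols names from the pipeline sketch above (an assumption, not the project's actual code):

```python
from sklearn.model_selection import GroupKFold, cross_val_score

# Groups = team_id, so no team appears in both a training and a validation fold.
X, y, groups = train[num_cols + cat_cols], train["attrited"], train["team_id"]
scores = cross_val_score(pipe, X, y, groups=groups, cv=GroupKFold(n_splits=5), scoring="roc_auc")
print(f"Grouped AUC: {scores.mean():.3f} (+/- {scores.std():.3f})")
```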

Fairness

Always review risk & intervention rates by relevant groups (e.g., function, location). Add governance checks before production use.


Model Interpretation

Coefficients

In logistic regression, each feature has a weight:

  • Positive weight → increases the log-odds of attrition
  • Negative weight → decreases the log-odds of attrition

You can export coefficients and a per-feature report for HR/legal review.
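
A sketch of such an export, assuming the fitted Pipeline from the pipeline sketch above; the output path is illustrative:

```python
import pandas as pd

# Sketch: map each transformed feature name to its coefficient and save for review.
feature_names = pipe.named_steps["pre"].get_feature_names_out()
coefs = pipe.named_steps["lr"].coef_.ravel()
report = (pd.DataFrame({"feature": feature_names, "coefficient": coefs})
            .sort_values("coefficient", ascending=False))
report.to_csv("models/coefficient_report.csv", index=False)   # hypothetical output path
```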

Partial Effects

Use decile tables or partial dependence for top drivers (e.g., workload ↑, manager quality ↓).

Calibration Check

Compare predicted vs actual rates in bins (the app's lift, AUC, and calibration-method choice act as proxies; you can add a reliability curve if needed).
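
If you want the reliability curve, here is a sketch using scikit-learn's calibration_curve, reusing the best, holdout, num_cols, and cat_cols names from the pipeline sketch above (an assumption):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Reliability check on the 2023 hold-out: predicted risk vs observed attrition rate per bin.
y_prob = best[1].predict_proba(holdout[num_cols + cat_cols])[:, 1]
prob_true, prob_pred = calibration_curve(holdout["attrited"], y_prob, n_bins=10)

plt.plot(prob_pred, prob_true, marker="o", label="calibrated model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfect calibration")
plt.xlabel("Predicted risk")
plt.ylabel("Observed attrition rate")
plt.legend()
plt.show()
```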


Reproducibility & Versioning

  • Training code: src/train_calibrated_lr.py
  • Serving code: app/main.py
  • Data: data/processed/hr_attrition_clean.csv
  • Environment: requirements.txt and runtime.txt (Python 3.11)
  • Artifacts: model .pkl and metrics .json, versioned by train cutoff year and sklearn tag

Enterprise Tip

For enterprise, log training runs (data hash, params, metrics) in MLflow or DVC, and schedule periodic re-calibration.
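
A sketch of what run logging with MLflow could look like (the run, parameter, and metric names here are illustrative assumptions, not the project's tracking code):

```python
import mlflow

# Sketch: roc_auc and brier are assumed to come from the evaluation step.
with mlflow.start_run(run_name="attrition_lr_train_to_2022"):
    mlflow.log_params({"train_cutoff_year": 2022, "model": "logistic_regression", "max_iter": 1000})
    mlflow.log_metrics({"roc_auc_2023": roc_auc, "brier_2023": brier})
    mlflow.log_artifact("models/attrition_lr_calibrated_train_to_2022_skl171.pkl")
```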


Governance Checklist

Before moving to production, confirm:

  • Data lineage documented (source, refresh cadence, snapshot definition)
  • Feature list reviewed (no prohibited attributes; leakage removed)
  • Validation includes time-based and, if required, grouped CV
  • Calibration validated on out-of-time data; reliability plot archived
  • Fairness report (rates by segment, false positive/negative parity where applicable)
  • Monitoring plan (drift, performance drop alerts)
  • Human-in-the-loop SOP (how HR acts on risk, audit trail)

Frequently Asked Questions

Is an AUC of ~0.65 "good enough"?

For HR attrition with limited features, ~0.6–0.7 is common. Value comes from targeting and what-if planning, not perfect prediction.

Why does Δ Net Savings sometimes show 0 while threshold-free savings changes?

Your scenario may shift probabilities without moving many people across the chosen threshold. The threshold-free view captures that subtle improvement.

Can we add more drivers?

Yes — especially manager signals, work pattern telemetry, career progression, and comp competitiveness vs market (with governance).