The Prediction Model
TL;DR
The app uses a calibrated logistic regression model to estimate each employee’s probability of leaving. It’s trained on historical snapshots up to 2022 and validated on 2023 to mimic how it would perform in the next year. Results are turned into actionable savings estimates using your assumptions (intervention cost, effectiveness, replacement cost).
What the model does
Think of the model like a weather forecast for attrition:
- Each person gets a risk score (0–100%) — "how likely is rain (attrition) for this person?"
- We don't claim certainty for any individual. Instead, we use these probabilities to prioritise attention (like taking an umbrella if rain is likely)
- The app lets you test what-if levers (e.g., reduce workload) and instantly see how the forecast and savings might change
Analogy
You could think of logistic regression as a weighted checklist. For example, heavier workload and long commute may push risk up; higher pay competitiveness may push risk down. The model learns how much each factor matters based on past outcomes.
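If it helps to see the checklist as code, here is a toy sketch with made-up weights; the real coefficients are learned during training, and the feature names and numbers below are purely illustrative:

```python
# Toy "weighted checklist": made-up weights, for intuition only.
import math

weights = {
    "workload_score": 0.40,        # heavier workload pushes risk up (hypothetical)
    "commute_km": 0.02,            # long commute pushes risk up (hypothetical)
    "pay_competitiveness": -0.60,  # competitive pay pushes risk down (hypothetical)
}
intercept = -1.5  # baseline log-odds (hypothetical)

employee = {"workload_score": 8, "commute_km": 35, "pay_competitiveness": 0.9}

# Weighted sum of factors -> log-odds -> probability via the logistic function
log_odds = intercept + sum(weights[f] * employee[f] for f in weights)
risk = 1 / (1 + math.exp(-log_odds))
print(f"Estimated attrition risk: {risk:.0%}")
```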
What the Model Does NOT Do
Limitation | Description |
---|---|
Decision Making | It doesn't decide who to keep or let go — it's decision support |
Future Prediction | It doesn't know the future. It extrapolates from patterns in your historical snapshots |
Post-Decision Learning | It doesn't see post-decision outcomes (e.g., exit interviews, tickets) that would leak future information into training |
Key Design Choices
Transparency by Design
Logistic regression is easy to inspect and explain, which makes it a better fit for HR decision-making than black-box models when accuracy is similar.
Calibrated Probabilities
We calibrate the model so "30% risk" roughly means "30 out of 100 similar employees leave."
Time-Aware Validation
Train on ≤2022, evaluate on 2023. This prevents "peeking" into the future and gives a realistic performance estimate.
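A minimal sketch of that split, assuming the cleaned dataset and a label column named `attrited` (the label name is an assumption for illustration):

```python
# Time-aware split: fit on snapshots <= 2022, calibrate/evaluate on 2023.
import pandas as pd

df = pd.read_csv("data/processed/hr_attrition_clean.csv")

train = df[df["snapshot_year"] <= 2022]    # training window
holdout = df[df["snapshot_year"] == 2023]  # out-of-time hold-out

X_train, y_train = train.drop(columns=["attrited"]), train["attrited"]
X_holdout, y_holdout = holdout.drop(columns=["attrited"]), holdout["attrited"]
```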
Scenario-Ready
We engineered a feature (`mgmt_workload_score`) so what-if changes to workload and manager quality flow through to risk.
Model Inputs (Features)
From the cleaned dataset (`data/processed/hr_attrition_clean.csv`), the model uses:
Job & Pay
- `base_salary` (pay level)
- `salary_band` (dropped to avoid duplication)
- `compa_ratio` (dropped in final model due to collinearity)
Performance & Progression
- `performance_rating`
- `avg_raise_3y` (dropped)
- `internal_moves_last_2y` (dropped)
- `time_since_last_promo_yrs`
Work Patterns & Wellbeing
- `workload_score`
- `overtime_hours_month`
- `sick_days`
- `pto_days_taken`
Engagement
- `engagement_score`
- `manager_quality` (missingness is informative; also used in the interaction below)
Logistics
- `commute_km`
- `onsite/remote` (if present)
- `night_shift`
Organizational Context
- `department`
- `role`
- `team_id` (as categorical signals)
Engineered Features
mgmt_workload_score = (10 − manager_quality) × workload_score
Captures that high workload under poorer management is especially risky
Explicitly Excluded Features
| Category | Features | Reason |
|---|---|---|
| Identifiers | `employee_id` | Not predictive |
| Post-Decision Signals | `exit_interview_scheduled`, `offboarding_ticket_created` | Data leakage |
| Duplicates/Collinear | `salary_band`, high-VIF pay proxies (`compa_ratio`, `avg_raise_3y`, `benefit_score`) | Redundancy/collinearity |
| Split Key | `snapshot_year` | Used to split, not to predict |
Missing Values
Missing `manager_quality` can itself be a signal. The pipeline treats missing values via encoding/scaling, and the engineered interaction uses a conservative floor (e.g., treats missing as 1 for the "quality" term in the what-if mechanism).
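A minimal sketch of the engineered interaction with that conservative floor; column names follow the feature list above, and the exact handling in `src/train_calibrated_lr.py` may differ:

```python
import pandas as pd

def add_mgmt_workload_score(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Treat missing manager quality as the lowest score (1) so missingness
    # pushes the interaction towards "higher risk" rather than dropping rows.
    quality = df["manager_quality"].fillna(1)
    df["mgmt_workload_score"] = (10 - quality) * df["workload_score"]
    return df
```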
Technical Pipeline
1. Preprocessing
   - Numerical: `StandardScaler(with_mean=False)`
   - Categorical: `OneHotEncoder(handle_unknown="ignore", sparse=True)`
   - Combined: `ColumnTransformer`
2. Estimator
   - `LogisticRegression(max_iter=1000)` trained on data ≤2022
3. Calibration
   - `CalibratedClassifierCV(cv="prefit")` with isotonic or sigmoid chosen by the lower Brier score on the 2023 set
4. Persistence
   - Model: `models/attrition_lr_calibrated_train_to_2022_skl171.pkl`
   - Metrics: `models/attrition_lr_calibrated_metrics_train_to_2022_skl171.json`
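Putting those steps together, here is a condensed sketch of the pipeline, reusing `X_train`/`y_train` and `X_holdout`/`y_holdout` from the split sketch above; the column lists are an illustrative subset and `src/train_calibrated_lr.py` remains the authoritative version:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

numeric_cols = ["base_salary", "workload_score", "commute_km"]  # illustrative subset
categorical_cols = ["department", "role", "team_id"]            # illustrative subset

preprocess = ColumnTransformer([
    ("num", StandardScaler(with_mean=False), numeric_cols),
    # newer scikit-learn releases name the sparse flag sparse_output
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=True), categorical_cols),
])

base = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X_train, y_train)  # snapshots <= 2022

# Calibrate on 2023 and keep whichever method gives the lower Brier score.
candidates = {}
for method in ("isotonic", "sigmoid"):
    cal = CalibratedClassifierCV(base, method=method, cv="prefit").fit(X_holdout, y_holdout)
    candidates[method] = (brier_score_loss(y_holdout, cal.predict_proba(X_holdout)[:, 1]), cal)

best_method = min(candidates, key=lambda m: candidates[m][0])
model = candidates[best_method][1]
```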
Why Calibration?
Uncalibrated models can be over- or under-confident. Calibration aligns predicted probabilities with observed rates — crucial when you convert probabilities into expected savings.
Validation & Performance
Data Split
- Train: Snapshots ≤2022
- Calibrate/Evaluate: 2023
Key Metrics
Metric | Purpose |
---|---|
ROC AUC | Overall ranking quality (closer to 1 is better) |
PR AUC | Useful when attrition is rare |
Lift by Decile | Business-friendly: top 10% risk vs average rate |
Calibration | Assessed implicitly via the Brier-score choice of calibration method |
Interpreting Lift
If Decile 1 shows 3× lift, your top-risk 10% is three times as likely to contain real leavers as a random 10%. That's a strong targeting signal.
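A minimal sketch of the decile-lift calculation, assuming a series of predicted risks and the observed outcomes on the 2023 hold-out:

```python
import pandas as pd

def lift_by_decile(risk: pd.Series, left: pd.Series) -> pd.DataFrame:
    df = pd.DataFrame({"risk": risk, "left": left})
    # Decile 1 = top 10% predicted risk
    df["decile"] = pd.qcut(
        df["risk"].rank(method="first", ascending=False),
        10, labels=list(range(1, 11)),
    )
    overall = df["left"].mean()
    out = df.groupby("decile", observed=True)["left"].mean().rename("attrition_rate").to_frame()
    out["lift"] = out["attrition_rate"] / overall  # e.g. 3.0 = 3x the average rate
    return out
```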
From Probability to Business Value
Two complementary savings views power the app:
1. Threshold-Based Net Savings
Pick a risk threshold; treat everyone above it.
Saved cost = (True Positives × effectiveness × replacement_cost)
Spend = (Flagged × intervention_cost)
Net = Saved − Spend
Good for: Operational reporting, precision/recall trade-offs
2. Threshold-Free Expected Value
Sum expected value across the treated cohort (Top-K or Threshold with Coverage):
Per person EV = effectiveness × replacement_cost × predicted_risk − intervention_cost
Good for: Comparing levers & strategies without being sensitive to a single threshold
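A minimal sketch of both views with hypothetical assumption values (the app takes these from your inputs):

```python
import numpy as np

risk = np.array([0.62, 0.35, 0.18, 0.81])  # predicted probabilities (illustrative)
left = np.array([1, 0, 0, 1])              # actual outcomes, for the threshold view
intervention_cost, effectiveness, replacement_cost = 1_500, 0.30, 60_000

# 1. Threshold-based net savings
threshold = 0.5
flagged = risk >= threshold
true_positives = int((flagged & (left == 1)).sum())
saved = true_positives * effectiveness * replacement_cost
spend = int(flagged.sum()) * intervention_cost
net = saved - spend

# 2. Threshold-free expected value over the treated cohort (here: everyone flagged)
per_person_ev = effectiveness * replacement_cost * risk - intervention_cost
expected_value = per_person_ev[flagged].sum()
```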
Why Not a Complex Model?
We tried Random Forest and XGBoost variants; on this dataset, logistic regression performed comparably (AUC ~0.65) but offered much better interpretability and governance. When accuracy differences are marginal, simpler + explainable is the right HR choice.
Limits & Caveats
Causality
The model captures associations, not guaranteed causes. Use it to prioritise conversations and support, not as an automated decision engine.
Data Drift
If your org changes (hybrid policies, comp structure), refresh training and re-calibrate. The app's time-aware split is a safeguard, not a guarantee.
Group Effects
`team_id` can encode manager/team culture. If governance requires, add GroupKFold validation by team to quantify sensitivity.
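A minimal sketch of that check, reusing the pipeline and training frame from the sketches above:

```python
from sklearn.model_selection import GroupKFold, cross_val_score

cv = GroupKFold(n_splits=5)
scores = cross_val_score(
    base, X_train, y_train,
    groups=X_train["team_id"],  # no team appears in both a train and a test fold
    scoring="roc_auc", cv=cv,
)
print(f"Grouped AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```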
Fairness
Always review risk & intervention rates by relevant groups (e.g., function, location). Add governance checks before production use.
Model Interpretation
Coefficients
In logistic regression, each feature has a weight:
- Positive weight → increases log-odds of attrition
- Negative weight → decreases log-odds of attrition
You can export coefficients and a per-feature report for HR/legal review.
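As a starting point, here is a minimal sketch that pulls coefficients with readable feature names from the fitted pipeline in the training sketch; note the weights apply to the scaled and one-hot-encoded features, and the output path is hypothetical:

```python
import pandas as pd

feature_names = base.named_steps["prep"].get_feature_names_out()
coefs = base.named_steps["clf"].coef_.ravel()

report = (
    pd.DataFrame({"feature": feature_names, "coefficient": coefs})
    .sort_values("coefficient", ascending=False)  # most risk-increasing first
)
report.to_csv("models/coefficient_report.csv", index=False)  # hypothetical output path
```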
Partial Effects
Use decile tables or partial dependence for top drivers (e.g., workload ↑, manager quality ↓).
Calibration Check
Compare predicted vs actual rate in bins (the app's lift + AUC and calibration choice are proxies; you can add a reliability curve if needed).
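If you want that explicit check, here is a minimal sketch of a reliability curve on the 2023 hold-out, assuming the calibrated `model` and hold-out frames from the training sketch; the output path is hypothetical:

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

proba = model.predict_proba(X_holdout)[:, 1]
frac_pos, mean_pred = calibration_curve(y_holdout, proba, n_bins=10)

plt.plot(mean_pred, frac_pos, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Predicted probability")
plt.ylabel("Observed attrition rate")
plt.legend()
plt.savefig("models/reliability_curve.png")  # hypothetical output path
```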
Reproducibility & Versioning
Component | Location |
---|---|
Training Code | src/train_calibrated_lr.py |
Serving Code | app/main.py |
Data | data/processed/hr_attrition_clean.csv |
Environment | requirements.txt and runtime.txt (Python 3.11) |
Artifacts | Model .pkl and metrics .json versioned by train cutoff year and sklearn tag |
Enterprise Tip
For enterprise, log training runs (data hash, params, metrics) in MLflow or DVC, and schedule periodic re-calibration.
Governance Checklist
Ready for productionization:
- Data lineage documented (source, refresh cadence, snapshot definition)
- Feature list reviewed (no prohibited attributes; leakage removed)
- Validation includes time-based and, if required, grouped CV
- Calibration validated on out-of-time data; reliability plot archived
- Fairness report (rates by segment, false positive/negative parity where applicable)
- Monitoring plan (drift, performance drop alerts)
- Human-in-the-loop SOP (how HR acts on risk, audit trail)
Frequently Asked Questions
Is 65% AUC "good enough"?
For HR attrition with limited features, ~0.6–0.7 is common. Value comes from targeting and what-if planning, not perfect prediction.
Why does Δ Net Savings sometimes show 0 while threshold-free savings changes?
Your scenario may shift probabilities without moving many people across the chosen threshold. The threshold-free view captures that subtle improvement.
Can we add more drivers?
Yes — especially manager signals, work pattern telemetry, career progression, and comp competitiveness vs market (with governance).