CoatingIQ

Wrap-up


Lesson 11 · Running Forge Dies

Die change-out timing: when to push, when to pull

A working die is always asking one question: keep running, or pull now. The answer lives in scrap-rate trend, wear-progression rate, and the asymmetry between a planned change-out and a catastrophic in-press failure.

6 min read · Lesson 11 of 13

Tying it together

Why change-out timing is the operator-controlled lever that pays back most

The build sets the ceiling on die life. Preheat, lube, and inspection determine how close the shop gets to it. Change-out timing decides whether the shop spends the last 10-15% of die life on its own schedule or on the die's schedule. The die is going to be pulled either way. The only question is whether it is pulled at a planned shift change with a warm spare ready and a maintenance crew on hand, or pulled out of a press that just produced a damaged forging, a broken die, and a half-shift of downtime nobody scheduled.

Most shops over-push. The reasons are predictable: a tired die is still hitting tolerance most of the time, production has line-throughput targets to make, and a planned change-out is a concrete, visible loss of running hours while a catastrophic failure is statistical until it isn't. The asymmetry hides in the statistics, but it dominates the math.

The four leading indicators

A die in the back half of its life broadcasts what it is going to do. Four indicators move before scrap-rate panic sets in. Track all four and the pull decision is rarely a surprise.

Dimensional drift on the part. Pull a sample part every shift on a critical dimension and trend it. A die in stable wear drifts 0.0005-0.001 inch per 5K hits in the high-flow regions. A die that is losing the compound layer or starting to plastically deform drifts faster, sometimes 0.003-0.005 inch in the same 5K hits. The absolute number depends on the geometry. The trend is what matters. A drift rate that triples is the same signal whether the dimensions are still inside the band or not.
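A minimal sketch of the drift-trend check, assuming one sample measurement per shift logged as (hit count, dimension in inches). The function names, the least-squares fit, and the sample values are illustrative assumptions, not part of the lesson's procedure; the 3x factor is the "drift rate that triples" signal described above.

```python
def slope_per_5k(samples):
    """Least-squares drift in inches per 5K hits.

    samples: list of (hit_count, measured_dimension_inch) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(samples)
    mean_h = sum(h for h, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    sxx = sum((h - mean_h) ** 2 for h, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one hit count")
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in samples)
    return sxy / sxx * 5000


def drift_has_tripled(baseline, recent, factor=3.0):
    """True when the recent drift rate is at least `factor` times the baseline rate."""
    base = abs(slope_per_5k(baseline))
    now = abs(slope_per_5k(recent))
    if base == 0:
        return now > 0
    return now >= factor * base


# Illustrative numbers only: stable wear near 0.0008 in per 5K hits, then faster.
baseline = [(30_000, 2.0000), (35_000, 2.0008), (40_000, 2.0016)]
recent = [(60_000, 2.0050), (65_000, 2.0080), (70_000, 2.0110)]
print(drift_has_tripled(baseline, recent))
```

The check compares rates, not absolute dimensions, so it fires while the part is still inside the tolerance band.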

Scrap-rate slope. Track scrap rate per 1K hits, not per shift. A die at 0.5% in steady state and 0.5% over the last 5K is fine. A die at 0.5% historical average and 1.8% over the last 5K is inflecting, even if the average across the die's life is still under 1%. The slope is the leading indicator. Wait for the average to move and you have already paid for the inflection.
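The arithmetic behind that example can be sketched as follows. The 70K-hit die and the scrap counts are assumed for illustration; the 2x jump threshold is the low end of the 2x-5x range used later in this lesson, and the function names are hypothetical.

```python
def window_vs_baseline(total_scrapped, total_hits, window_scrapped, window_hits):
    """Return (baseline_rate, window_rate, lifetime_rate) as fractions.

    The baseline excludes the recent window so the window cannot dilute
    the number it is being compared against.
    """
    if not 0 < window_hits < total_hits:
        raise ValueError("window must be a non-empty part of the run")
    baseline = (total_scrapped - window_scrapped) / (total_hits - window_hits)
    window = window_scrapped / window_hits
    lifetime = total_scrapped / total_hits
    return baseline, window, lifetime


def is_inflecting(baseline, window, jump=2.0):
    """True when the recent window runs at least `jump` times the baseline rate."""
    if baseline == 0:
        return window > 0
    return window >= jump * baseline


# Assumed die: 65K hits at 0.5% (325 scrapped), then 5K hits at 1.8% (90 scrapped).
baseline, window, lifetime = window_vs_baseline(415, 70_000, 90, 5_000)
print(f"baseline {baseline:.2%}, last 5K {window:.2%}, lifetime {lifetime:.2%}")
print(is_inflecting(baseline, window))
```

The lifetime average in this case stays under 1% while the last 5K hits run at more than three times the baseline, which is why the window, not the average, is the number to watch.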

Visible wear progression between inspections. Photograph the same regions of the die face under the same lighting at every inspection. Lesson 7 covered the cadence. The trend across photos catches things the eye misses on a single visit: a heat-check crack that was 0.5 mm at the last inspection and is 1.5 mm now has grown 1 mm. If the inspections are 5K hits apart, that is 200 µm per 1K hits, the rate at which the next inspection finds a runaway crack, not a measurable one. Photo-trend is also what separates "the die looks tired" (subjective) from "the die has changed since last week" (objective).
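The crack-growth figure is simple arithmetic once the hit counts at both inspections are known. The sketch below assumes the two inspections are 5K hits apart, an interval the lesson's numbers imply but do not state. The function names and the "stable / watch / propagating" labels are hypothetical; the 100 and 200 µm per 1K hits cut-offs are the ones this lesson uses for push and pull.

```python
def crack_growth_um_per_1k(prev_len_mm, curr_len_mm, prev_hits, curr_hits):
    """Crack growth between two inspections, in micrometres per 1K hits."""
    if curr_hits <= prev_hits:
        raise ValueError("current inspection must be at a higher hit count")
    growth_um = (curr_len_mm - prev_len_mm) * 1000.0
    return growth_um / ((curr_hits - prev_hits) / 1000.0)


def classify_growth(rate_um_per_1k, slow_below=100.0, fast_from=200.0):
    """Label a growth rate against the push and pull thresholds."""
    if rate_um_per_1k < slow_below:
        return "stable"
    if rate_um_per_1k >= fast_from:
        return "propagating"
    return "watch"


# 0.5 mm at the previous inspection, 1.5 mm now, inspections assumed 5K hits apart.
rate = crack_growth_um_per_1k(0.5, 1.5, 65_000, 70_000)
print(rate, classify_growth(rate))
```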

Thermal-trace drift. Bulk die temperature at steady state should be stable shift over shift for a given cycle time. A die that ran a steady 280°C at hit 30K and is now running 340°C at hit 70K for the same cycle, lube setting, and ambient is telling you that the heat balance the process was tuned for no longer holds. Lesson 3 covered why. By the time the die is shedding heat poorly, the surface gradient per cycle is harsher than the process is dosed for, and heat-check growth accelerates. Cycle time creep (Gold-Set Q25) is the operator's compensating response and lags the thermal drift by hours to shifts.

None of the four indicators alone forces a pull. Two of the four crossing at the same time on the same die is the pattern that does.

When to push

The right call is to keep running when the die is in the band the process was tuned for and the indicators are stable. Concretely: scrap rate inside the baseline band (defined per job from the die's own history, often 0.3-1.0% on hot-forge work), wear progression slow (photo-trend showing crack length growing under 100 µm per 1K hits), no propagating cracks beyond the heat-check network, and dimensions drifting within their established slope. A die at 60K hits with a mature, stable heat-check pattern and a flat scrap-rate trend typically has 20-30K hits of useful life left, and the right answer is to run it. Pulling early on a die that is still in its stable wear phase wastes the run.

Push, in shift-schedule terms, means keep running through the next planned change-out window and re-evaluate at that window. It does not mean push past the next scheduled inspection. The inspection is what generates the indicators that decide the push.

When to pull

The right call is to pull on the next planned shift change when any two of the following are true at the same inspection: scrap-rate slope has inflected (a clean 2x-5x jump over a 3-5K hit window, distinct from shift-to-shift noise), a single crack is propagating out of the heat-check network (anything growing faster than 100-200 µm per 1K hits, particularly in a structurally loaded region), dimensions are drifting at a rate that will leave the tolerance band within the next 5-10K hits, or the thermal trace has drifted enough that cycle time is being extended to compensate.
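The any-two-of-four rule can be written down as a small check. This is a sketch under stated assumptions: the class, field, and function names are hypothetical, the default thresholds take one end of each range quoted above (2x scrap jump, 100 µm per 1K hits, a 10K-hit exit horizon), and the band-exit projection is a plain linear extrapolation toward an upper limit. A real run book would set these per job.

```python
from dataclasses import dataclass


@dataclass
class InspectionSnapshot:
    scrap_jump: float              # recent-window scrap rate divided by baseline rate
    crack_growth_um_per_1k: float  # fastest single crack leaving the heat-check network
    hits_to_band_exit: float       # projected hits until the dimension leaves tolerance
    cycle_time_extended: bool      # cycle being stretched to compensate for thermal drift


def project_hits_to_exit(current, upper_limit, drift_per_1k_hits):
    """Linear projection of hits left before `current` reaches `upper_limit`."""
    if drift_per_1k_hits <= 0:
        return float("inf")
    return max(upper_limit - current, 0.0) / drift_per_1k_hits * 1000.0


def tripped_indicators(s, scrap_jump_min=2.0, crack_rate_max=100.0, exit_horizon=10_000):
    """Names of the indicators that have crossed their thresholds."""
    flags = {
        "scrap_slope": s.scrap_jump >= scrap_jump_min,
        "crack_propagation": s.crack_growth_um_per_1k > crack_rate_max,
        "dimensional_drift": s.hits_to_band_exit <= exit_horizon,
        "thermal_trace": s.cycle_time_extended,
    }
    return [name for name, hit in flags.items() if hit]


def pull_at_next_window(s):
    """Any two indicators at the same inspection trigger a planned pull."""
    return len(tripped_indicators(s)) >= 2


snapshot = InspectionSnapshot(
    scrap_jump=3.6,
    crack_growth_um_per_1k=200.0,
    hits_to_band_exit=project_hits_to_exit(2.0110, 2.0150, 0.0006),
    cycle_time_extended=False,
)
print(tripped_indicators(snapshot), pull_at_next_window(snapshot))
```

Returning the names of the tripped indicators, not just a yes or no, keeps the pull decision traceable to the data that triggered it.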

The phrase "on the next planned shift change" matters. Pulling now-now (mid-shift, mid-run) costs more than pulling at the next handoff and gains very little, unless one of the indicators is severe enough to be a safety question (a deep crack in a structurally loaded region, a piece of the die actually missing, a hard-stop dimensional excursion). For the standard inflection pattern, the pull is at the next change-out window, with the spare die preheated and the maintenance crew on hand. That window is usually 4-12 hours away. The die is not going to fall apart in 4 hours. It is going to fall apart in the next 5-20K hits, and the only question is whether the pull is on the shop's schedule or the die's.

The asymmetric-risk argument is the whole reason early-pull decisions are usually correct

A planned die change-out on a closed-die hot-forge press takes roughly 6-8 hours, end to end: cool the press, unbolt the dies, move the spare in, align, preheat the spare, run first articles, return to production. The number is shop-dependent (a hammer is faster than a screw press; a complicated multi-cavity die is slower than a single-impression) but the order of magnitude is hours, not days. The maintenance crew is on shift, the spare die is ready, and the engineering data captured from the pulled die (photos, dye-pen results, dimensional traverse) feeds the next die's life record.

A catastrophic die failure in production is a different category of event. A die that fractures in the press at the bottom of a stroke takes 24-72 hours of downtime to recover from, depending on what broke. The press has to be powered down hot, the broken die has to be removed without damaging the bolster, the bolster and tooling around the die have to be inspected (a fractured die often damages the upper or lower retaining hardware, sometimes the slide alignment), the part-in-press has to be cleared (a half-formed forging welded to the die surface is its own retrieval problem), and the spare die change has to happen on top of all of that. If the failure event throws a piece of die into the working space, the shop is also looking at an incident investigation and possibly a safety stand-down.

The ratio between the two outcomes is what makes the decision. 6-8 hours of planned downtime vs. 24-72 hours of catastrophic downtime is a 3:1 to 12:1 asymmetry (24/8 at the low end, 72/6 at the high end), and that is before counting the damaged die (lost for the job, possibly unrecoverable for the next), the collateral tooling damage, and the half-shift of forgings produced at elevated scrap rate in the run-up to the failure.
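The bounds on that ratio follow from the two ranges. A quick sketch, taking the extremes of each range as quoted in this lesson (the function name is hypothetical):

```python
PLANNED_HOURS = (6, 8)        # planned change-out, end to end
CATASTROPHIC_HOURS = (24, 72)  # recovery from an in-press fracture


def downtime_ratio_bounds(planned, catastrophic):
    """Smallest and largest catastrophic-to-planned downtime ratios."""
    low = min(catastrophic) / max(planned)   # shortest failure vs. longest planned
    high = max(catastrophic) / min(planned)  # longest failure vs. shortest planned
    return low, high


print(downtime_ratio_bounds(PLANNED_HOURS, CATASTROPHIC_HOURS))
```

A shop would substitute its own logged durations for both ranges.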

The asymmetry compounds with frequency. A shop that runs to catastrophic failure on 1 die in 10 over a year burns those 24+ hour incidents into the calendar. A shop that pulls early on the same die population trades 1-2 hours of "could have run longer" per planned pull for never paying the 24+ hour incident. Over a year of die changes that is the difference between a maintained press schedule and a press schedule that is one bad week from missing a customer commitment.
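A rough way to put numbers on the frequency argument is to total downtime hours over ten die changes under each policy. The sketch below is a simplification with assumptions of its own: it treats the "could have run longer" cost as 1-2 hours added to each planned pull, it assumes early pulls eliminate catastrophic failures entirely, and it counts hours only, leaving out the broken die, collateral tooling damage, and elevated scrap. The function name is hypothetical.

```python
def hours_per_ten_changes(planned_h, catastrophic_h, failures_in_ten, early_pull_cost_h):
    """Downtime hours over ten die changes: (run-to-failure, early-pull)."""
    run_to_failure = (10 - failures_in_ten) * planned_h + failures_in_ten * catastrophic_h
    early_pull = 10 * (planned_h + early_pull_cost_h)
    return run_to_failure, early_pull


# Low end of every range: 6 h planned, 24 h catastrophic, 1 h given up per pull.
print(hours_per_ten_changes(6, 24, 1, 1))
# High end of every range: 8 h planned, 72 h catastrophic, 2 h given up per pull.
print(hours_per_ten_changes(8, 72, 1, 2))
```

On hours alone the gap is narrow at the low end and wide at the high end, and the result is sensitive to the inputs, so the comparison is only as good as the durations a shop has actually logged. The costs left out of this sketch all fall on the run-to-failure side.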

The 6-8 vs. 24+ asymmetry is the engineering ratio that justifies early-pull decisions in almost every borderline case. The downside of pulling slightly early is bounded (a few thousand hits of life left on the floor). The downside of pulling too late is not bounded at all.

What this means on the shop floor

The pull decision belongs to whoever has the data. In most shops that is the die maintenance supervisor, working from the inspection log and the scrap-rate trend, not the production supervisor working from the line schedule. Production owns the schedule. Maintenance owns the pull criterion. The two conversations are different, and conflating them is how shops end up in the run-to-failure pattern.

The decision also belongs to the data, not the operator's feel. "The die looks done" is not a pull criterion. "Scrap rate climbed from 0.5% to 3.1% over the last 5K hits and the heat-check network is propagating a 2 mm crack" is a pull criterion. The test is whether the pull can be defended at next month's review. That is the difference between a habit and a process.

The pull criterion should be written down per job, not per die. A connecting-rod job and a gear-blank job on the same press with the same H13 dies have different criteria because the parts have different tolerance bands and the dies see different stress patterns. The criterion lives in the job's run book alongside the preheat spec, the lube dose, and the inspection cadence. Lesson 13 covers the records.

How AIAG CQI-9 records support the calculus

AIAG CQI-9 does not dictate when to pull a die. The standard is a heat-treat process-audit framework, not a die-management framework. What CQI-9 requires (and IATF 16949 reinforces) are records the audit can inspect: heat-treat lot, instrumentation calibration, TUS data, witness coupon results, traceability from part back to die.

The same records, kept over a year of production, are what let a shop build its own pull criterion. A shop that logs hit count, inspection findings (with photos), scrap rate, lube dose, preheat profile, and thermal trace per die per shift can look back over 10-20 die-life cycles on the same job and see exactly where the inflection lives for that die family. The pull threshold stops being a guess and becomes an evidence-based number: "on this connecting-rod job, our H13 dies inflect at 65-75K hits, scrap rate climbs 2-3x over a 5K window, and the median useful life past inflection before catastrophic risk becomes unacceptable is 4K hits."
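As a sketch of how those logs turn into a number, the code below finds the hit count at which each die's scrap rate first reaches a multiple of the job baseline, then takes the median across dies. The record layout, field names, die IDs, and sample values are all hypothetical, and using the per-shift scrap rate directly is a simplification of the per-window slope described earlier.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ShiftRecord:
    die_id: str
    hit_count: int                 # cumulative hits at end of shift
    scrap_rate: float              # fraction scrapped during the shift
    thermal_trace_c: float = 0.0   # steady-state bulk die temperature
    lube_dose: str = ""            # free-text or coded lube setting
    findings: str = ""             # inspection notes, photo references


def inflection_hit_count(records, baseline_rate, jump=2.0):
    """First hit count where scrap rate reaches `jump` times baseline, else None."""
    for rec in sorted(records, key=lambda r: r.hit_count):
        if rec.scrap_rate >= jump * baseline_rate:
            return rec.hit_count
    return None


def median_inflection(histories, baseline_rate, jump=2.0):
    """Median inflection hit count across dies that inflected."""
    points = [inflection_hit_count(h, baseline_rate, jump) for h in histories.values()]
    points = [p for p in points if p is not None]
    return median(points) if points else None


# Made-up histories for three dies on one job, baseline scrap rate 0.5%.
histories = {
    "A": [ShiftRecord("A", 60_000, 0.005), ShiftRecord("A", 65_000, 0.006),
          ShiftRecord("A", 70_000, 0.012)],
    "B": [ShiftRecord("B", 60_000, 0.005), ShiftRecord("B", 65_000, 0.011)],
    "C": [ShiftRecord("C", 65_000, 0.005), ShiftRecord("C", 70_000, 0.006),
          ShiftRecord("C", 75_000, 0.015)],
}
print(median_inflection(histories, baseline_rate=0.005))
```

With 10-20 die-life cycles in place of three, the same calculation gives the per-job inflection range the paragraph above describes.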

Without the records, every pull decision is the first one. With them, the decision is policy.

Pushback questions for your own shop

The adversary on this lesson is the production-vs-maintenance tug of war and the shop habit of running to failure because the failure cost is statistical and the pull cost is concrete. Ask these questions inside your own four walls.

  1. Who decides when a die is pulled, and what is the written criterion they use? If the answer is "the production supervisor, when the line trips," the shop is running to failure as a policy, not a problem.
  2. What is the baseline scrap rate per job per die family, what is the slope threshold that triggers a pull review, and where is that documented? Without the baseline, every scrap-rate climb is "we will watch it."
  3. What is the documented planned-change-out duration on this press for this die, and what is the longest catastrophic-failure recovery this shop has logged in the last year? If the second number is not in the same database as the first, the asymmetry is not visible to the people making pull decisions.
  4. Over the last 12 months, how many dies on this job were pulled planned vs. pulled catastrophic? A ratio of 4:1 or better planned-to-catastrophic is the working number. A ratio under 2:1 is a shop running too hot on its dies and paying for it in unscheduled downtime.
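Question 4 reduces to a ratio a shop can compute from its pull log. A minimal sketch, using the 4:1 and 2:1 thresholds from the question; the function names and the labels for each band (including the unnamed band between the two thresholds) are assumptions.

```python
def planned_to_catastrophic(planned, catastrophic):
    """Ratio of planned pulls to catastrophic pulls over the review period."""
    if planned == 0 and catastrophic == 0:
        raise ValueError("no pulls logged for this job")
    if catastrophic == 0:
        return float("inf")
    return planned / catastrophic


def assess_pull_ratio(planned, catastrophic):
    """Label the ratio against the 4:1 and 2:1 thresholds."""
    ratio = planned_to_catastrophic(planned, catastrophic)
    if ratio >= 4.0:
        return "working"
    if ratio < 2.0:
        return "running too hot"
    return "between thresholds"


print(assess_pull_ratio(12, 3))
print(assess_pull_ratio(5, 3))
```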

A shop that can answer all four for one die family on one job has a change-out process. A shop that cannot is running on the production supervisor's instinct, and the next 24-hour incident is already on the calendar.

Common confusions

A die that is still in tolerance is not necessarily a die that should keep running. Tolerance is the floor on the part, not the floor on the die. A die can produce parts inside the band and still be 3K hits from a catastrophic crack. The leading indicators (slope, photo-trend, thermal drift) move before the tolerance line is crossed. Pulling on tolerance excursion alone is pulling at the last possible moment.

A planned change-out is not "lost production." It is production deferred by 6-8 hours, scheduled into a window the shop chose. A catastrophic change-out is lost production. The first goes on the calendar. The second goes on the incident log.

Scrap-rate noise is not slope. A single shift at 1.2% on a die that has run 0.5% for 30K hits is not a pull signal. Three consecutive shifts at 1.2% with no other change to the process is the same number telling a different story. Track per 1K hits, not per shift, and the noise filters out.

"We will pull it at the next planned change" only counts if there is a next planned change on the calendar that is closer than the projected catastrophic-failure window. If the next planned change is 20K hits away and the indicators are inflecting, the planned change just became the next shift, not next month.

Up next: operator and crew procedures.
