How to Evaluate Carrier Performance: 7 KPIs That Actually Matter
Apr 2, 2026
15 mins read

Key Takeaways
- Most enterprises track carrier performance reactively — reviewing it quarterly, if at all. The carriers that erode your margins and SLAs do so gradually, not dramatically.
- Seven KPIs form the foundation of a carrier scorecard that’s worth building: OTIF rate, first-attempt delivery success, cost-per-delivery by lane, capacity utilization, SLA adherence, exception rate, and settlement accuracy.
- The real value of a carrier scorecard isn’t the report. It’s what happens when the scorecard feeds your allocation engine — routing volume toward carriers who perform and away from those who don’t, in real time.
- Locus’s dispatch management platform connects carrier performance data to allocation decisions continuously — so your best carriers get more volume, and your worst carriers get a conversation.
There’s a pattern that plays out quietly at most logistics operations. A carrier starts strong — competitive rates, decent on-time numbers, responsive account management. Six months later, their first-attempt delivery rate has drifted down a few points. Exception rates have crept up on certain lanes. Settlement disputes are taking longer to resolve. Nobody sounds the alarm, because no single metric has crossed a red line. But the cumulative cost in re-attempts, customer complaints, manual workarounds, and eroded SLAs is real.
The problem is rarely that the carrier is “bad.” It’s that nobody is measuring what matters, consistently, at the level of granularity where the drift becomes visible before it becomes expensive.
This post lays out the seven KPIs that form the backbone of a carrier scorecard worth building. For each one, we’ll cover what it measures, why it matters, what “good” looks like, and what to do when the number tells you something is off. The goal is practical: by the end, you should be able to build a working carrier scorecard or evaluate whether the one you have is actually doing its job.
What Is a Carrier Scorecard, and Why Are Most Underperforming?
A carrier scorecard is a structured framework for measuring and comparing the performance of your logistics partners across a consistent set of metrics. It sounds straightforward. In practice, most carrier scorecards fall into one of two failure modes.
The first is the spreadsheet that gets updated quarterly. Someone pulls data from three different systems, normalises it manually, and produces a report that arrives two months after the performance it describes. By the time it reaches the procurement review meeting, the underperforming carrier has already handled another 10,000 shipments.
The second is the dashboard that measures the wrong things. It tracks total volume and on-time percentage at the aggregate level, which makes every carrier look roughly the same. The lane-level, time-slot-level, and exception-type-level differences — where the actual cost and service gaps live — never surface.
A well-built scorecard does something different. It tracks the right KPIs at the right granularity, updates continuously, and — most importantly — connects to allocation decisions. A scorecard that exists only as a reporting artefact is a waste of analytical effort. A scorecard that feeds your dispatch engine is a competitive advantage.
Related: Logistics KPIs & Metrics That Matter Most in 2026
The 7 KPIs That Belong on Every Carrier Scorecard
1. On-Time In-Full (OTIF) Rate by Carrier
What it measures: The percentage of orders a carrier delivers both on time and in complete quantity. Partial deliveries and late arrivals both count as failures — which is what makes OTIF a more demanding metric than simple on-time delivery.
Why it matters: OTIF is the single best proxy for carrier reliability as experienced by the customer. A carrier that’s fast but frequently delivers incomplete orders isn’t reliable. A carrier that’s complete but consistently late isn’t either. OTIF captures both dimensions. According to RXO’s research across 1,000 shippers and carriers, 94% of respondents said OTIF performance has directly impacted their carrier procurement decisions. Walmart famously requires a 98% OTIF rate from suppliers and charges a 3% penalty on cost of goods for failures — a signal of just how seriously the market’s largest buyers treat this metric.
What “good” looks like: Industry benchmarks vary, but top-performing logistics operations target 95–98% OTIF. Average performance sits around 90–92%. Anything consistently below 90% on a specific lane or for a specific carrier signals a structural issue, not a bad week.
What to do when it’s off: Don’t look at the aggregate first. Break OTIF down by lane, time slot, and product category. A carrier that’s 96% OTIF overall but 82% on your Tuesday evening grocery runs in Zone 3 has a specific problem that’s being masked by the average. Investigate root causes: is it a capacity issue on certain days? A routing problem? A warehouse handoff delay that the carrier inherits? Then decide whether to reroute volume, renegotiate terms, or replace.
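To make the lane-level breakdown concrete, here is a minimal sketch of computing OTIF per (carrier, lane) from raw delivery records. The record keys (`carrier`, `lane`, `on_time`, `in_full`) are illustrative assumptions, not a prescribed schema:

```python
from collections import defaultdict

def otif_by_lane(shipments):
    """Compute the OTIF rate per (carrier, lane).

    Each record is a dict with hypothetical keys: carrier, lane,
    on_time (bool), in_full (bool). Both flags must be True for a
    shipment to count as OTIF.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s in shipments:
        key = (s["carrier"], s["lane"])
        totals[key] += 1
        if s["on_time"] and s["in_full"]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

# A carrier can look fine in aggregate while failing on one lane.
records = (
    [{"carrier": "A", "lane": "Zone1", "on_time": True, "in_full": True}] * 96
    + [{"carrier": "A", "lane": "Zone1", "on_time": False, "in_full": True}] * 4
    + [{"carrier": "A", "lane": "Zone3", "on_time": True, "in_full": True}] * 8
    + [{"carrier": "A", "lane": "Zone3", "on_time": True, "in_full": False}] * 2
)
rates = otif_by_lane(records)
```

In this sample, carrier A sits at roughly 94.5% in aggregate while running 80% on Zone3, which is exactly the masking effect the average creates.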
2. First-Attempt Delivery Success Rate
What it measures: The percentage of deliveries completed successfully on the first attempt, without requiring a re-attempt, reschedule, or return to depot.
Why it matters: Failed first attempts are among the most expensive events in last-mile logistics. Each failure generates a direct re-delivery cost (fuel, driver time, vehicle utilisation), a customer service contact (the WISMO call or complaint), and, less visibly, an erosion of customer trust that compounds over time. Locus’s own published content references a cost multiplier of nearly 4x for failed deliveries when all downstream costs are included.
A carrier with a 92% first-attempt rate sounds acceptable. But at 1,000 deliveries per day, that’s 80 failures daily — each one costing time, money, and goodwill. Over a month, that’s 2,400 re-attempts.
What “good” looks like: Best-in-class last-mile operations target 95%+ first-attempt success. For carriers handling time-window deliveries (grocery, food, appointment-based), the bar is even higher — because a missed window often means the customer isn’t home for the second attempt either.
What to do when it’s off: Segment by cause. Failed attempts driven by “customer not available” may point to a time-window accuracy problem upstream, not a carrier performance issue. Failures driven by “address not found” suggest a geocoding gap. Failures driven by “vehicle arrived late” are carrier-attributable. The fix depends entirely on the diagnosis.
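The segmentation by cause can be sketched in a few lines. The record fields (`delivered`, `reason`) are hypothetical; real attempt logs will use their own schema:

```python
from collections import Counter

def failure_breakdown(attempts):
    """Share of all delivery attempts attributable to each failure cause.

    Each record is a dict with hypothetical keys: delivered (bool)
    and reason (str, present only when delivered is False).
    """
    failed = [a["reason"] for a in attempts if not a["delivered"]]
    counts = Counter(failed)
    total = len(attempts)
    return {reason: n / total for reason, n in counts.items()}

# 92% first-attempt success, with the 8% of failures split by cause.
attempts = (
    [{"delivered": True}] * 92
    + [{"delivered": False, "reason": "customer_not_available"}] * 5
    + [{"delivered": False, "reason": "address_not_found"}] * 2
    + [{"delivered": False, "reason": "vehicle_late"}] * 1
)
shares = failure_breakdown(attempts)
```

Here only the `vehicle_late` slice is clearly carrier-attributable; the larger `customer_not_available` slice points upstream, to time-window accuracy.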
Related: What Is Transportation Cost? Formula, Examples & How to Reduce It
3. Cost-Per-Delivery by Lane
What it measures: The fully loaded cost of a single delivery on a specific lane — including base rate, fuel surcharges, accessorials, re-attempt costs, and settlement adjustments. Not the contracted rate. The actual cost.
Why it matters: Contracted rates are what you agreed to pay. Cost-per-delivery is what you actually pay. The difference between the two is where margin leaks. A carrier with a competitive base rate but high accessorial charges, frequent surcharges, and slow settlement accuracy may cost more per delivery than a carrier with a higher headline rate but cleaner execution.
Tracking cost-per-delivery by lane also reveals network-level insights. If Lane A costs 40% more per delivery than Lane B with comparable volumes and distances, the question is why — and whether a different carrier, route structure, or consolidation strategy could close the gap.
What “good” looks like: This is inherently network-specific — geography, density, and product type all influence what “good” means. The value isn’t in an absolute benchmark. It’s in relative comparison: carrier A vs. carrier B on the same lane, this quarter vs. last quarter, peak vs. off-peak. The trends matter more than the number.
What to do when it’s off: Decompose the cost. Is the base rate the problem, or is it accessorials that are inflating the figure? Are re-attempt costs pulling up the average? Is the carrier billing for services that should be included in the contracted rate? This analysis often reveals that the cheapest carrier on paper is not the cheapest carrier in practice — and that insight alone can justify a scorecard investment.
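A minimal decomposition sketch, assuming hypothetical cost components (`base`, `fuel_surcharge`, `accessorials`, `reattempts`) summed over a period on one lane:

```python
def cost_per_delivery(lane_costs, deliveries):
    """Fully loaded cost per delivery for one lane, plus a per-delivery
    breakdown by component so the driver of any increase is visible."""
    total = sum(lane_costs.values())
    return total / deliveries, {k: v / deliveries for k, v in lane_costs.items()}

# A carrier can look cheap on base rate while accessorials and
# re-attempts push the fully loaded figure well above the headline rate.
costs = {
    "base": 50_000.0,
    "fuel_surcharge": 4_000.0,
    "accessorials": 6_500.0,
    "reattempts": 3_500.0,
}
cpd, breakdown = cost_per_delivery(costs, deliveries=10_000)
```

Here the contracted rate works out to 5.00 per delivery, but the fully loaded figure is 6.40, a 28% gap that never appears on the rate card.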
4. Capacity Utilization or Load Fill Rate
What it measures: How effectively a carrier’s available capacity — whether in weight, volume, or pallet positions — is used during transportation. It reflects how full a vehicle is when it is dispatched, across shipments, routes, or the overall network.
Why it matters: In large-scale logistics, such as CPG, transportation costs are largely fixed per trip. Underutilized vehicles therefore mean higher cost per unit, wasted capacity, and increased emissions. Even small inefficiencies, multiplied across thousands of shipments, can significantly erode margins. That makes utilization a critical lever for both cost optimization and sustainability performance.
What “good” looks like: High-performing networks typically maintain 85–95%+ utilization, depending on demand variability and delivery constraints. More importantly, strong performance is consistent across lanes, with minimal unused space (“air movement”) and tight alignment between shipment profiles and vehicle types.
What to do when it’s off: Low utilization is usually a symptom of fragmented planning. Address it by enabling dynamic load consolidation across orders and time windows, improving routing logic to combine shipments more efficiently, and right-sizing vehicles against demand. Over time, analyzing lane-level patterns helps identify structural inefficiencies and redesign the network for better consolidation.
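A capacity-weighted calculation keeps small vehicles from skewing the network average. This sketch assumes each trip is recorded as a (used, capacity) pair in one consistent unit, which is an illustrative simplification:

```python
def fill_rate(used, capacity):
    """Utilization of a single dispatched vehicle. The caller chooses
    the binding dimension (weight, volume, or pallet positions)."""
    return used / capacity

def network_utilization(trips):
    """Capacity-weighted utilization across trips.

    Each trip is a (used, capacity) pair in the same unit. Weighting
    by capacity prevents a nearly empty small van from counting as
    much as a nearly empty full-size truck.
    """
    total_used = sum(u for u, _ in trips)
    total_cap = sum(c for _, c in trips)
    return total_used / total_cap

trips = [(9.0, 10.0), (7.0, 10.0), (4.0, 5.0)]
util = network_utilization(trips)  # 20 units moved across 25 units of capacity
```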
5. SLA Adherence Rate
What it measures: The percentage of shipments that meet the specific service-level commitments defined in the carrier contract — which may include delivery time windows, pickup punctuality, proof-of-delivery capture, temperature compliance, or other parameters beyond simple “on time.”
Why it matters: OTIF tells you whether the carrier delivered on time and in full. SLA adherence tells you whether they met the full set of contractual commitments you’re paying for. A carrier can be on time but fail to capture ePOD, miss a temperature threshold, or arrive outside the agreed pickup window. These failures don’t show up in OTIF but they show up in compliance audits, customer complaints, and contract disputes.
This metric also matters because it’s the basis for contract renegotiation. If a carrier is meeting their SLA 88% of the time, you have a data-backed position for adjusting terms, introducing performance-based pricing, or reallocating volume. Without the data, you’re negotiating on anecdote.
What “good” looks like: This depends entirely on what’s in the SLA. For a simple on-time commitment, 95%+ is standard. For multi-parameter SLAs that include ePOD, time-window precision, and compliance documentation, 90%+ is strong. The key is to measure what you’ve contractually agreed to — not a simplified version of it.
What to do when it’s off: Identify which SLA parameters are failing. If it’s time-window adherence, the issue may be dispatch timing or route planning. If it’s ePOD compliance, it may be a driver training or app adoption issue. If it’s pickup punctuality, it may be a warehouse scheduling problem that the carrier is inheriting. Not every SLA failure is the carrier’s fault — but every SLA failure is measurable.
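Multi-parameter adherence is stricter than any single check, because a shipment must pass every contracted parameter at once. A sketch, with the parameter names and record fields as illustrative assumptions:

```python
def sla_adherence(shipments, checks):
    """Share of shipments passing every contractual check, plus the
    per-parameter failure rate so the failing clause is identifiable.

    checks maps a parameter name to a predicate over a shipment record.
    """
    passed = 0
    fails = {name: 0 for name in checks}
    for s in shipments:
        ok = True
        for name, pred in checks.items():
            if not pred(s):
                fails[name] += 1
                ok = False
        passed += ok
    n = len(shipments)
    return passed / n, {name: f / n for name, f in fails.items()}

# Hypothetical two-parameter SLA: time-window hit plus ePOD capture.
checks = {
    "in_window": lambda s: s["window_ok"],
    "epod": lambda s: s["epod_captured"],
}
data = (
    [{"window_ok": True, "epod_captured": True}] * 90
    + [{"window_ok": True, "epod_captured": False}] * 7
    + [{"window_ok": False, "epod_captured": True}] * 3
)
rate, fail_rates = sla_adherence(data, checks)
```

In this sample the carrier is 97% on time-windows but only 90% on the full SLA, with ePOD capture doing most of the damage.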
6. Exception Rate
What it measures: The frequency of delivery exceptions — delays, damages, failed attempts, mis-routes, partial deliveries, or customer complaints — as a percentage of total shipments handled by a carrier.
Why it matters: Exception rate is the metric that tells you how much operational noise a carrier creates. A carrier with a low exception rate is one your dispatch team rarely has to think about. A carrier with a high exception rate consumes disproportionate coordinator time, generates customer service contacts, and introduces unpredictability into your operations.
The highest-value version of this metric segments exceptions by type and by customer cohort. If a carrier’s exceptions are concentrated among your high-LTV customers, the retention impact is disproportionate to the exception count. If exceptions cluster on specific days or routes, the cause is likely systemic and fixable.
What “good” looks like: Top-performing carriers typically maintain exception rates below 3–5% of total shipments. For premium SLA tiers (same-day, time-window), lower thresholds are appropriate. The absolute number matters less than the trend — a carrier whose exception rate is rising quarter over quarter deserves scrutiny, even if the rate itself looks acceptable today.
What to do when it’s off: Pattern analysis first. Exceptions that cluster by geography, day of week, or product type point to a specific operational issue. Exceptions that are distributed randomly across the network suggest a broader capability gap. For the first type, targeted intervention (rerouting specific zones, adjusting dispatch timing) can resolve the issue. For the second, it may be time to evaluate the carrier relationship more fundamentally.
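A simple way to start the pattern analysis is to count exceptions by (type, weekday) alongside the overall rate. The field names here are assumptions for illustration:

```python
from collections import Counter

def exception_summary(shipments):
    """Overall exception rate plus counts keyed by (type, weekday),
    so clustering on specific days or causes is visible at a glance.

    Records use hypothetical keys: exception (str, or None when the
    shipment was clean) and weekday (str).
    """
    n = len(shipments)
    flagged = [s for s in shipments if s["exception"]]
    clusters = Counter((s["exception"], s["weekday"]) for s in flagged)
    return len(flagged) / n, clusters

data = (
    [{"exception": None, "weekday": "Mon"}] * 480
    + [{"exception": "delay", "weekday": "Tue"}] * 15
    + [{"exception": "damage", "weekday": "Fri"}] * 5
)
rate, clusters = exception_summary(data)
```

A 4% overall rate looks acceptable, but three quarters of the exceptions landing as Tuesday delays is the kind of concentration that points to a specific, fixable operational issue.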
Related: Hub and Spoke Distribution Model: A Logistics Guide
7. Settlement Accuracy
What it measures: The percentage of carrier invoices that match the agreed rates, terms, and shipment records without requiring manual disputes, adjustments, or audit cycles.
Why it matters: This is the KPI nobody talks about in carrier performance reviews, and it’s quietly one of the most expensive. Invoice discrepancies between carriers and shippers are remarkably common — overcharges, accessorial mismatches, duplicate billing, incorrect surcharges. Each dispute consumes finance and operations time. Across a high-volume network with dozens of carriers, inaccurate settlement can leak meaningful cost.
Settlement accuracy also serves as a proxy for carrier operational discipline. A carrier that consistently invoices correctly tends to be operationally tighter across other dimensions as well. A carrier whose invoices require constant correction is usually exhibiting the same loose execution in delivery performance.
What “good” looks like: Target 98%+ invoice accuracy — meaning 98% of invoices match agreed terms without requiring manual intervention. For carriers below this threshold, investigate whether the issue is systemic (their billing system doesn’t capture your rate structure correctly) or incidental (occasional errors on specific service types).
What to do when it’s off: Automate what you can. Digital freight invoicing and audit tools that match shipment records against carrier invoices reduce manual reconciliation effort and surface discrepancies faster. If a specific carrier’s settlement accuracy is persistently low despite feedback, include settlement accuracy as a contractual KPI with financial consequences.
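The matching logic at the heart of a settlement audit can be sketched simply: compare each billed amount against the rate card and flag anything outside tolerance. The record shapes and field names are hypothetical:

```python
def audit_invoices(invoices, rate_card, tolerance=0.01):
    """Match invoices against agreed lane rates.

    invoices: dicts with hypothetical keys shipment_id, lane, billed;
    rate_card: lane -> agreed per-delivery rate.
    Returns settlement accuracy and the disputed shipment ids.
    """
    disputes = []
    for inv in invoices:
        expected = rate_card[inv["lane"]]
        if abs(inv["billed"] - expected) > tolerance:
            disputes.append(inv["shipment_id"])
    accuracy = 1 - len(disputes) / len(invoices)
    return accuracy, disputes

rate_card = {"Zone1": 6.40, "Zone3": 8.10}
invoices = [
    {"shipment_id": 1, "lane": "Zone1", "billed": 6.40},
    {"shipment_id": 2, "lane": "Zone1", "billed": 6.40},
    {"shipment_id": 3, "lane": "Zone3", "billed": 9.25},  # overcharge
    {"shipment_id": 4, "lane": "Zone3", "billed": 8.10},
]
accuracy, disputes = audit_invoices(invoices, rate_card)
```

A real audit would also match accessorials, surcharges, and shipment records, but the shape is the same: automate the comparison, then spend human time only on the flagged lines.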
From Scorecard to Allocation: Where the Real Value Lives
Building a carrier scorecard is useful. Reviewing it quarterly is better than not reviewing it at all. But the real competitive advantage emerges when the scorecard stops being a report and starts being an input to allocation decisions.
Consider the difference: a logistics team that reviews carrier performance every quarter and makes manual adjustments to their carrier mix versus a dispatch platform that uses carrier performance data continuously — routing more volume toward carriers with strong OTIF, high first-attempt success, and clean settlement records, and routing less volume toward those trending in the wrong direction. The first approach adjusts every 90 days. The second adjusts every shipment.
This is where performance-based carrier allocation becomes operationally real. The scorecard data feeds the dispatch engine. The dispatch engine makes allocation decisions informed by how each carrier is actually performing right now — not how they performed when the contract was signed.
For enterprises managing hybrid fleet models with dozens of carrier partners, this capability isn’t a nice-to-have. It’s the mechanism that prevents your carrier mix from gradually degrading as the network scales.
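One way to picture performance-based allocation is a weighted composite score per carrier driving a proportional volume split. The KPI names, the weights, and the proportional rule are deliberate simplifications of what a real dispatch engine would do:

```python
def carrier_score(kpis, weights):
    """Weighted composite score in [0, 1] from normalized KPI values.

    kpis and weights use hypothetical KPI names; weights should sum to 1.
    """
    return sum(kpis[k] * w for k, w in weights.items())

def allocate(scores, volume):
    """Split volume proportionally to each carrier's score — a
    deliberately simple stand-in for real allocation logic."""
    total = sum(scores.values())
    return {name: volume * s / total for name, s in scores.items()}

weights = {"otif": 0.4, "first_attempt": 0.3, "settlement": 0.3}
scores = {
    "CarrierA": carrier_score(
        {"otif": 0.97, "first_attempt": 0.95, "settlement": 0.99}, weights
    ),
    "CarrierB": carrier_score(
        {"otif": 0.88, "first_attempt": 0.85, "settlement": 0.90}, weights
    ),
}
split = allocate(scores, volume=1000)
```

Because the scores update with every shipment, the split shifts continuously: a carrier whose OTIF drifts down sees its share shrink long before the quarterly review would have noticed.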
Related: Traveling Salesman Problem: What Is It and How to Solve It?
Frequently Asked Questions (FAQs)
How often should I update my carrier scorecard?
Ideally, continuously — with formal reviews monthly or quarterly. The data should update in real time; the conversation about what to do with it should happen on a regular cadence. Quarterly-only updates create a dangerous lag where underperformance goes unaddressed for months.
Should I share scorecard results with my carriers?
Yes. The most productive carrier relationships are built on transparency. Sharing performance data creates accountability, opens the door to joint problem-solving (some issues are caused by your operations, not the carrier), and gives high-performing carriers the recognition that helps retain their commitment to your network.
What if my TMS doesn’t support lane-level carrier analytics?
This is a common limitation of legacy TMS platforms. If you can only track carrier performance at the aggregate level, you’re missing the lane-level and time-slot-level insights where the actionable intelligence lives. It may be worth evaluating platforms that natively connect carrier performance data to dispatch decisions.
How do I handle a carrier that performs well on some KPIs but poorly on others?
Every carrier will have a mixed profile. The question is whether the weak areas are improving, stable, or deteriorating — and whether they affect your highest-priority operations. A carrier with strong OTIF but poor settlement accuracy may be worth keeping if the settlement issues are addressable. A carrier with strong cost-per-delivery but declining first-attempt success may be costing you more than the rate savings suggest.
Can these KPIs be applied to gig-economy and hyperlocal delivery partners?
Yes, with some adaptation. Gig-economy carriers may not have traditional SLA structures, but first-attempt success, exception rate, and cost-per-delivery are just as measurable. The challenge is data capture — ensure your platform can track performance across all carrier types, not just traditional contracted partners.
The Bottom Line
Carrier performance evaluation isn’t a procurement exercise that happens once a year during contract negotiations. It’s an operational discipline that should run continuously, at the granularity where problems become visible before they become expensive.
The seven KPIs in this post — OTIF, first-attempt success, cost-per-delivery by lane, capacity utilization, SLA adherence, exception rate, and settlement accuracy — give you a scorecard that covers the dimensions that actually matter for service quality, cost control, and operational agility. But the scorecard’s value is only as good as what you do with it. The organisations that are pulling ahead are those that connect carrier performance data to dispatch decisions in real time — so the scorecard doesn’t just describe what happened last quarter, but shapes what happens with the next shipment.
Locus’s dispatch management platform connects carrier performance data to allocation decisions continuously — across 1,000+ pre-integrated carriers, with dynamic scoring, automated tendering, and real-time exception management. If your carrier scorecard is currently a quarterly spreadsheet and you think it should be something more, our team can show you what that looks like in practice.
Ishan, a knowledge navigator at heart, has spent more than a decade crafting content strategies for B2B tech, with a strong focus on logistics SaaS. He blends AI with human creativity to turn complex ideas into compelling narratives.