Portfolio Performance Measures
Performance measurement is heavily tested because it separates investor return from manager skill and exposes where each falls short. Five sections: return calculations (total, real, geometric vs arithmetic, after-tax, yield types); time-weighted vs money-weighted returns including a hands-on calculator showing how cash-flow timing creates divergence between the manager's performance and the investor's experience; risk-adjusted measures (Sharpe, Treynor, Jensen's alpha — formulas canonical in M3.4, M3.12 covers their use), tracking error, and information ratio; benchmark selection with style-matching discipline; and attribution analysis, GIPS compliance, and survivorship bias. By the end you can read a performance report critically and explain to a client why their actual return diverges from the fund's reported return.
Return calculations
Real vs nominal returns, and geometric vs arithmetic returns
Two pairs of return measures are regularly confused:
NOMINAL vs REAL RETURNS
- Nominal return: The actual return earned, unadjusted for inflation. The number on the statement.
- Real return: Inflation-adjusted return, reflecting actual PURCHASING POWER growth. Approximation: real return ≈ nominal return − inflation rate. Precise (Fisher equation): (1 + nominal) ÷ (1 + inflation) − 1.
- Example. Nominal return 8%, inflation 3%. Real return ≈ 5% (approximation); precise: 1.08/1.03 − 1 = 4.85%. Over multi-decade periods, even small inflation differences compound to large purchasing-power differences.
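The approximation and the exact Fisher calculation can be compared directly. The sketch below is illustrative only; the function names are assumptions introduced here, not part of the course material.

```python
def real_return(nominal: float, inflation: float) -> float:
    """Exact real return: (1 + nominal) / (1 + inflation) - 1."""
    return (1 + nominal) / (1 + inflation) - 1


def real_return_approx(nominal: float, inflation: float) -> float:
    """Quick approximation: nominal return minus inflation."""
    return nominal - inflation


# The example above: 8% nominal return, 3% inflation.
exact = real_return(0.08, 0.03)
approx = real_return_approx(0.08, 0.03)
print(f"exact: {exact:.2%}, approximation: {approx:.2%}")
```

The gap between the two widens as inflation rises, which is why the exact form matters for long horizons and high-inflation periods.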
GEOMETRIC vs ARITHMETIC RETURNS
- Arithmetic mean return. Simple average: (R1 + R2 + ... + Rn) ÷ n. Easy to calculate. OVERSTATES actual growth when returns are volatile because it doesn't account for compounding.
- Geometric mean return (CAGR). ((1+R1) × (1+R2) × ... × (1+Rn))^(1/n) − 1. Reflects the COMPOUND growth actually experienced. The correct measure for multi-period returns.
- Example. Returns of +50% then −50%. Arithmetic mean = 0%. Geometric: ((1.50)(0.50))^(1/2) − 1 = (0.75)^0.5 − 1 = −13.4%. The investor ended up with $75 from $100 — the geometric mean is correct; arithmetic is misleading.
- When to use which. Geometric for HISTORICAL performance (what actually happened). Arithmetic for EXPECTED future returns in single-period contexts (academic models).
- Volatility drag. The gap between arithmetic and geometric mean grows with VOLATILITY. Approximate relationship: geometric ≈ arithmetic − (variance ÷ 2). High-volatility strategies suffer significant volatility drag.
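The +50% / −50% example can be reproduced in a few lines. This is a minimal sketch; using the population variance (`pvariance`) for the volatility-drag estimate is an assumption made here, and the variance ÷ 2 rule is only a rough approximation for swings this large.

```python
from statistics import mean, pvariance


def geometric_mean_return(returns):
    """Compound the period returns, then take the n-th root."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1


returns = [0.50, -0.50]

arithmetic = mean(returns)                     # simple average
geometric = geometric_mean_return(returns)     # compound annual growth rate
drag_estimate = arithmetic - pvariance(returns) / 2  # rough approximation

print(f"arithmetic: {arithmetic:.1%}")
print(f"geometric:  {geometric:.1%}")
print(f"arithmetic - variance/2: {drag_estimate:.1%}")
```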
After-tax returns and yield types
AFTER-TAX RETURN. The return remaining after paying federal, state, and (where applicable) local taxes on dividends, interest, and realized capital gains. Materially different from nominal returns for taxable accounts.
- Calculation: Pre-tax return × (1 − effective tax rate). The effective tax rate depends on the character of income (ordinary, qualified dividend, LTCG, tax-exempt) and the investor's tax bracket.
- Tax-equivalent yield (TEY). Used to compare TAX-EXEMPT munis to TAXABLE bonds. TEY = tax-exempt yield ÷ (1 − marginal tax rate). Example: 3% muni for an investor in 37% bracket has TEY = 3% / (1 − 0.37) = 4.76%. A taxable bond would need to yield 4.76% to leave the same after-tax amount.
- After-tax return varies by ACCOUNT TYPE. Tax-deferred accounts (Traditional IRA, 401(k)): no annual tax drag; tax at withdrawal. Tax-free accounts (Roth, HSA for medical): no tax. Taxable accounts: annual tax on dividends and interest; capital gains tax on realized gains.
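The two tax formulas above are inverses of each other, which a short sketch makes visible. The function names are illustrative assumptions, and the single flat rate ignores the mix of income types a real account would have.

```python
def after_tax_return(pre_tax_return: float, effective_tax_rate: float) -> float:
    """Pre-tax return x (1 - effective tax rate)."""
    return pre_tax_return * (1 - effective_tax_rate)


def tax_equivalent_yield(tax_exempt_yield: float, marginal_tax_rate: float) -> float:
    """Taxable yield needed to match a tax-exempt yield after tax."""
    return tax_exempt_yield / (1 - marginal_tax_rate)


# The example above: 3% muni, 37% bracket.
tey = tax_equivalent_yield(0.03, 0.37)

# Taxing the equivalent taxable yield brings you back to the muni yield.
round_trip = after_tax_return(tey, 0.37)

print(f"TEY: {tey:.2%}, after tax: {round_trip:.2%}")
```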
YIELD TYPES (review)
- Current yield: Annual income ÷ current price. For a bond: coupon ÷ price. Quick measure but ignores capital appreciation/depreciation toward maturity.
- Yield to maturity (YTM): Total return assuming the bond is held to maturity, including coupon reinvestment at the same yield. The most comprehensive bond yield measure.
- Yield to call (YTC): Similar to YTM but assumes the bond is called at the first call date. For premium bonds, YTC is often LOWER than YTM — the conservative (lower) of the two is the “yield to worst” (YTW).
- Dividend yield: Annual dividend ÷ current stock price. Most useful for income-focused equity investing.
Worked Example — Time-Weighted vs. Dollar-Weighted
Scenario: A client invests $100,000 in a fund on Jan 1.
- Year 1: Fund returns +20% → value = $120,000
- Client adds $100,000 on Jan 1 of Year 2 → total = $220,000
- Year 2: Fund returns −10% → value = $198,000
Time-weighted return: (1.20) × (0.90) − 1 = +8% (over 2 years). Ignores the timing of the $100K addition. Measures the fund manager's performance.
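Dollar-weighted return: Much lower, because the client had more money invested during the losing year ($220K) than during the winning year ($100K). Solving 0 = −100,000 − 100,000/(1+r) + 198,000/(1+r)^2 gives a money-weighted return of approximately −0.7% per year: the client contributed $200,000 in total and ended with $198,000, a loss. This reflects the investor's actual experience.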
Key takeaway: The same fund, same manager, same returns — but the investor's experience was dramatically different from the fund's reported performance because of poor timing on the additional investment.
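Because this scenario has only two periods, the money-weighted return can be solved in closed form as a quadratic. The sketch below works through the scenario's own numbers; the variable names are assumptions introduced for illustration.

```python
import math

# Scenario cash flows, from the investor's point of view.
initial = 100_000   # invested at the start of Year 1
deposit = 100_000   # added at the start of Year 2
ending = 198_000    # value at the end of Year 2

# Time-weighted: chain-link the fund's two annual returns.
twrr_cumulative = 1.20 * 0.90 - 1

# Money-weighted: solve 0 = -initial - deposit/(1+r) + ending/(1+r)^2.
# Multiplying by (1+r)^2 and writing x = 1 + r gives
# initial*x^2 + deposit*x - ending = 0, a quadratic in x.
a, b, c = initial, deposit, -ending
x = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
mwrr = x - 1

print(f"TWRR (2-year cumulative): {twrr_cumulative:.2%}")
print(f"MWRR (per year):          {mwrr:.2%}")
```

The money-weighted figure comes out slightly negative (about −0.7% per year), consistent with $200,000 of contributions ending at $198,000.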
An investor's portfolio has a nominal return of 9% for the year. Inflation during the same period was 4%. The REAL (inflation-adjusted) return, using the precise Fisher equation, is approximately:
A portfolio earned +30% in Year 1 and −20% in Year 2. The ARITHMETIC mean return is 5% per year. The GEOMETRIC mean return (CAGR) is approximately:
An investor in the 37% federal tax bracket is choosing between a TAX-EXEMPT municipal bond yielding 3.5% and a corporate bond. To leave the investor with the SAME AFTER-TAX YIELD, the corporate bond must yield approximately:
TWRR vs MWRR
Time-Weighted Return (TWRR) — the manager's measure
TWRR ELIMINATES the impact of cash flows (deposits and withdrawals) controlled by the investor. Isolates the PORTFOLIO MANAGER'S investment decisions from the investor's allocation timing.
- Mechanics. Divide the measurement period into SUB-PERIODS bounded by external cash flows. Calculate the period return for each sub-period. Chain-link (multiply) the sub-period returns to get the total TWRR.
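- Formula: cumulative TWRR = (1+R1) × (1+R2) × ... × (1+Rn) − 1, where each R is a SUB-PERIOD return based on starting and ending values within that sub-period (not the total period). To annualize over a span of Y years: annualized TWRR = (1 + cumulative TWRR)^(1/Y) − 1. The exponent uses the number of years, not the number of sub-periods.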
- Why it's the manager's measure. The manager doesn't control client deposits/withdrawals. By isolating sub-period returns, TWRR shows what the manager's actual investment decisions produced — comparable across managers and benchmarks.
- Used by GIPS standards. The CFA Institute's Global Investment Performance Standards (GIPS) require TWRR for performance reporting to fairly compare managers regardless of client cash-flow patterns.
- Reported on mutual fund prospectuses. Mutual fund “total return” figures are TWRR. Same investor in the same fund can experience very different actual returns based on the timing of their purchases and sales.
Example. Fund returns +20% in Year 1, then −10% in Year 2. TWRR = (1.20)(0.90) − 1 = 8% over 2 years (annualized 3.92%). This is the fund's reported return regardless of when individual investors entered or exited.
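The chain-linking mechanics can be sketched as a small function that takes the beginning and ending value of each sub-period. The values reuse the $100,000 worked example from earlier in the chapter; the function name and input layout are assumptions made for illustration.

```python
def twrr(sub_periods, years):
    """Chain-link sub-period returns.

    sub_periods: list of (begin_value, end_value) pairs, where each
    begin_value already includes any cash flow that opened the sub-period.
    Returns (cumulative, annualized).
    """
    growth = 1.0
    for begin_value, end_value in sub_periods:
        growth *= end_value / begin_value
    cumulative = growth - 1
    annualized = growth ** (1 / years) - 1
    return cumulative, annualized


# Year 1: $100,000 grows to $120,000.
# A $100,000 deposit opens Year 2 at $220,000, which falls to $198,000.
cumulative, annualized = twrr([(100_000, 120_000), (220_000, 198_000)], years=2)

print(f"cumulative: {cumulative:.2%}, annualized: {annualized:.2%}")
```

The deposit changes the starting value of the second sub-period but not its return, which is why the result depends only on the fund's +20% and −10%.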
Money-Weighted Return (MWRR / IRR) — the investor's actual experience
MONEY-WEIGHTED RETURN (also called Dollar-Weighted Return or Internal Rate of Return / IRR) REFLECTS the impact of cash-flow timing on the investor's actual experience. Larger cash flows have larger weight in the calculation.
- Mechanics. MWRR is the DISCOUNT RATE that makes the NET PRESENT VALUE of all cash flows (initial investment, additional deposits, withdrawals, ending value) equal to ZERO. It's the IRR of the investor's personal cash-flow series.
- Mathematically: 0 = CF0 + CF1/(1+r) + CF2/(1+r)^2 + ... + CFn/(1+r)^n. Solved iteratively (financial calculator or spreadsheet).
- Sensitive to timing. Larger cash flows during favorable periods boost MWRR; larger cash flows during unfavorable periods drag MWRR. The investor's timing skills (or luck) are captured.
- Reflects the investor's ACTUAL experience. Two investors in the same fund can have very different MWRRs based on when they invested or withdrew. Same fund returns — different personal results.
- Used for INDIVIDUAL investor performance reporting and PRIVATE EQUITY funds (where cash flows are concentrated and irregular). Buffett's legendary returns are TWRR-style; LP returns in PE funds use IRR.
The cardinal test rule: If the question asks about EVALUATING THE MANAGER — use TWRR. If the question asks about WHAT THE INVESTOR ACTUALLY EARNED — use MWRR.
TWRR vs MWRR calculator — see the divergence
Same fund returns, different cash-flow timing. Watch how big a deposit before a bad year creates a large MWRR-vs-TWRR gap.
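A rough stand-in for the calculator: the sketch below holds the fund's returns fixed (+20%, then −10%) and varies the size of the deposit made before the bad year. The IRR is found by bisection, which assumes the net present value changes sign exactly once in the search range; the function names, search bounds, and deposit sizes are all illustrative assumptions.

```python
def npv(rate, cash_flows):
    """Net present value of annual cash flows, first flow at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def mwrr(cash_flows, low=-0.99, high=1.0, tol=1e-10):
    """IRR by bisection (assumes a single sign change between low and high)."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("no sign change in the search range")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid                # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid   # root lies in [mid, high]
    return (low + high) / 2


def compare(deposit, initial=100_000, r1=0.20, r2=-0.10):
    """Return (annualized TWRR, MWRR) for a deposit made before Year 2."""
    end_value = (initial * (1 + r1) + deposit) * (1 + r2)
    twrr_annual = ((1 + r1) * (1 + r2)) ** 0.5 - 1
    return twrr_annual, mwrr([-initial, -deposit, end_value])


for deposit in (0, 100_000, 500_000):
    t, m = compare(deposit)
    print(f"deposit ${deposit:>7,}: TWRR {t:.2%}  MWRR {m:.2%}")
```

With no deposit the two measures coincide; the larger the deposit ahead of the losing year, the further the MWRR falls below the unchanged TWRR.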
A portfolio earned +10% in Q1, −5% in Q2, +8% in Q3, and +3% in Q4 of a given year. There were no external cash flows. The TIME-WEIGHTED RETURN for the full year is:
To evaluate a PORTFOLIO MANAGER'S SKILL independently of client cash flows, the BEST measure is:
A fund's TIME-WEIGHTED RETURN is +12% for the year, but a particular INVESTOR'S MONEY-WEIGHTED RETURN in the same fund is only +3%. The MOST LIKELY explanation is:
Risk-adjusted performance measures
Risk-adjusted return measures — interpretation
The formulas for Sharpe, Treynor, and Jensen's alpha are canonical in M3.4 capital-market-theory. This section focuses on their INTERPRETATION and USE in performance evaluation.
- SHARPE RATIO = (Portfolio return − risk-free rate) ÷ portfolio standard deviation. Measures EXCESS RETURN per unit of TOTAL RISK (volatility). Use when evaluating a STANDALONE portfolio or comparing portfolios with different risk levels. Higher is better. Doesn't require the portfolio to be diversified.
- TREYNOR RATIO = (Portfolio return − risk-free rate) ÷ portfolio beta. Measures EXCESS RETURN per unit of SYSTEMATIC RISK (market exposure). Use when the portfolio is well-DIVERSIFIED (idiosyncratic risk already eliminated). Higher is better. Less meaningful for poorly-diversified portfolios where beta doesn't fully capture risk.
- JENSEN'S ALPHA = Portfolio return − expected return predicted by CAPM. Measures EXCESS RETURN BEYOND what CAPM predicts given the portfolio's beta. Positive alpha = MANAGER ADDED VALUE; negative alpha = manager destroyed value; zero alpha = manager performed in line with risk taken. The most direct measure of manager skill.
Use them together. A high alpha with high Sharpe and Treynor strongly suggests genuine manager skill. High alpha with low Sharpe suggests excessive risk-taking. Inconsistency between Sharpe (total risk) and Treynor (systematic risk) suggests the portfolio is poorly diversified.
Important interpretation caveat: all three measures use HISTORICAL data. Past risk-adjusted performance doesn't guarantee future results. Statistical significance requires multiple years of data; short-term Sharpe/alpha figures are noisy.
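To see how the three measures relate, here is a minimal sketch that computes all of them for one portfolio. The inputs are hypothetical numbers chosen for illustration, and the CAPM expected return is written in its standard form, risk-free rate + beta × (market return − risk-free rate).

```python
def sharpe_ratio(portfolio_return, risk_free, std_dev):
    """Excess return per unit of total risk."""
    return (portfolio_return - risk_free) / std_dev


def treynor_ratio(portfolio_return, risk_free, beta):
    """Excess return per unit of systematic risk."""
    return (portfolio_return - risk_free) / beta


def jensens_alpha(portfolio_return, risk_free, beta, market_return):
    """Actual return minus the CAPM-predicted return."""
    capm_expected = risk_free + beta * (market_return - risk_free)
    return portfolio_return - capm_expected


# Hypothetical inputs (not from the course material).
rp, rf, rm = 0.12, 0.03, 0.10
sigma, beta = 0.18, 1.2

sharpe = sharpe_ratio(rp, rf, sigma)
treynor = treynor_ratio(rp, rf, beta)
alpha = jensens_alpha(rp, rf, beta, rm)

print(f"Sharpe {sharpe:.2f}, Treynor {treynor:.3f}, alpha {alpha:.2%}")
```

In this hypothetical case the portfolio beat the market by 2 percentage points, yet alpha is well under 1% because most of the outperformance is explained by a beta above 1.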
Tracking error and information ratio
For ACTIVELY MANAGED portfolios benchmarked to an index, two additional measures evaluate the active-management quality:
- TRACKING ERROR. The STANDARD DEVIATION of the difference between portfolio returns and benchmark returns. Low tracking error = portfolio closely follows benchmark; high tracking error = significant active bets vs benchmark. Index funds: tracking error near zero (by design). Concentrated active funds: tracking error of 5-15% common.
- INFORMATION RATIO (IR) = (Portfolio return − benchmark return) ÷ Tracking error. Measures EXCESS RETURN per unit of ACTIVE RISK. Similar in concept to Sharpe but uses BENCHMARK (not risk-free rate) as the comparison and TRACKING ERROR (not total volatility) as the risk measure.
- IR interpretation: IR > 0.5 considered good for an active manager; IR > 1.0 considered exceptional and rare over multi-year periods. Most active managers struggle to consistently produce positive IR after fees.
- Why IR matters. For active management to be worthwhile, the manager must produce excess return WORTH the active risk taken. A manager who beats the benchmark by 1% with 5% tracking error has IR = 0.2 — close to luck given the active risk. A manager who beats by 1% with 1% tracking error has IR = 1.0 — meaningful skill if sustained.
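Tracking error and the information ratio are both computed from the series of active returns (portfolio minus benchmark). The sketch below uses hypothetical annual returns and the sample standard deviation (`statistics.stdev`); both are assumptions made here, and reporting conventions differ on sample versus population deviation and on annualization.

```python
from statistics import mean, stdev


def active_returns(portfolio, benchmark):
    """Period-by-period difference between portfolio and benchmark."""
    return [p - b for p, b in zip(portfolio, benchmark)]


def tracking_error(portfolio, benchmark):
    """Standard deviation of active returns."""
    return stdev(active_returns(portfolio, benchmark))


def information_ratio(portfolio, benchmark):
    """Mean active return per unit of tracking error."""
    active = active_returns(portfolio, benchmark)
    return mean(active) / stdev(active)


# Hypothetical annual returns for five years.
portfolio = [0.12, 0.05, -0.02, 0.15, 0.08]
benchmark = [0.10, 0.06, -0.04, 0.12, 0.07]

te = tracking_error(portfolio, benchmark)
ir = information_ratio(portfolio, benchmark)

print(f"tracking error {te:.2%}, information ratio {ir:.2f}")
```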
Active share. A complementary measure: the percentage of the portfolio, by weight, that DIFFERS from the benchmark. Higher active share = more genuine active management. Funds with low active share but high fees are “closet indexers” and have been a focus of recent regulatory scrutiny.
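Active share is commonly computed as half the sum of absolute weight differences between portfolio and benchmark. That formula is not stated in this chapter, so treat the sketch below as an assumption-based illustration; the tickers and weights are hypothetical.

```python
def active_share(portfolio_weights, benchmark_weights):
    """Half the sum of absolute weight differences across all holdings."""
    names = set(portfolio_weights) | set(benchmark_weights)
    return 0.5 * sum(
        abs(portfolio_weights.get(n, 0.0) - benchmark_weights.get(n, 0.0))
        for n in names
    )


# Hypothetical three-stock benchmark.
benchmark = {"AAA": 0.40, "BBB": 0.30, "CCC": 0.30}

# A genuinely active portfolio: overweights AAA, drops CCC, adds DDD.
active_fund = {"AAA": 0.50, "BBB": 0.30, "DDD": 0.20}

# A closet indexer: weights barely differ from the benchmark.
closet_indexer = {"AAA": 0.41, "BBB": 0.29, "CCC": 0.30}

print(f"active fund:    {active_share(active_fund, benchmark):.0%}")
print(f"closet indexer: {active_share(closet_indexer, benchmark):.0%}")
```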
An investor compares Fund A (Sharpe ratio 1.2) and Fund B (Sharpe ratio 0.6) for inclusion as a STANDALONE investment. Both have similar return levels. The RELEVANT INSIGHT is:
An ACTIVE manager has a return of 12% vs a benchmark return of 10% (excess return 2%). Their TRACKING ERROR is 1%. Their INFORMATION RATIO is:
Benchmark selection
Benchmark selection — the SAMURAI properties
The CFA Institute teaches the SAMURAI properties of a good benchmark. An appropriate benchmark should be:
- S — Specified in advance. Identified BEFORE the period being measured. Choosing a benchmark after the fact is cherry-picking.
- A — Appropriate. Same asset class, market cap range, geographic exposure, and investment style as the portfolio.
- M — Measurable. Performance can be calculated and reported on a regular and timely basis.
- U — Unambiguous. The identity and weights of constituent securities are clearly defined.
- R — Reflective of current investment opinion. The manager has views (positive, negative, or neutral) on the constituents.
- A — Accountable. The manager accepts the benchmark and agrees to be held accountable for performance measured against it.
- I — Investable. The investor could actually purchase the benchmark as a passive alternative. If the benchmark isn't investable, performance comparison is theoretical.
Common benchmark mismatches the exam tests:
- SMALL-CAP fund benchmarked to S&P 500 (large-cap). Mismatched. Use Russell 2000 or S&P 600.
- VALUE fund benchmarked to broad-market index. Mismatched. Use Russell 1000 Value or S&P 500 Value.
- INTERNATIONAL fund benchmarked to S&P 500. Mismatched. Use MSCI EAFE or MSCI ACWI ex-US.
- BALANCED fund benchmarked only to S&P 500. Mismatched. Use blended benchmark (e.g., 60% S&P + 40% Bloomberg Agg).
- EMERGING MARKETS fund benchmarked to MSCI EAFE (developed). Mismatched. Use MSCI Emerging Markets.
Benchmark mismatch creates ARTIFICIAL alpha during periods when the portfolio's style is favored relative to the mismatched benchmark, and artificially NEGATIVE alpha when the style is out of favor. Neither reflects manager skill. Always confirm benchmark appropriateness when reviewing performance.
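A blended benchmark is a weighted average of index returns, and comparing against the wrong benchmark can flip the sign of the apparent excess return. The sketch below uses the 60/40 blend mentioned above; the index and fund returns are hypothetical numbers chosen for illustration.

```python
def blended_benchmark_return(weights, index_returns):
    """Weighted average of index returns; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("benchmark weights must sum to 1")
    return sum(w * index_returns[name] for name, w in weights.items())


# Hypothetical one-year returns.
index_returns = {"S&P 500": 0.10, "Bloomberg Agg": 0.04}
balanced_fund_return = 0.08

blend = blended_benchmark_return(
    {"S&P 500": 0.60, "Bloomberg Agg": 0.40}, index_returns
)

excess_vs_sp500 = balanced_fund_return - index_returns["S&P 500"]
excess_vs_blend = balanced_fund_return - blend

print(f"vs S&P 500 only: {excess_vs_sp500:+.2%}")
print(f"vs 60/40 blend:  {excess_vs_blend:+.2%}")
```

Against the equity-only index the balanced fund appears to lag; against the style-matched blend it is slightly ahead.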
An EMERGING MARKETS equity fund focused on Chinese, Indian, and Brazilian stocks should be benchmarked against:
A SMALL-CAP GROWTH equity fund should be benchmarked against:
A portfolio outperformed its benchmark by 4%. Attribution analysis shows: sector allocation effect +3.5%, security selection effect +0.3%, interaction effect +0.2%. The INSIGHT this provides:
A small investment advisory firm wants to advertise its performance to attract institutional clients. To claim GIPS COMPLIANCE, the firm must:
Attribution, GIPS, and pitfalls
Performance attribution — sector vs security selection
PERFORMANCE ATTRIBUTION decomposes a portfolio's excess return (vs benchmark) into the SOURCES of that excess: sector allocation, security selection, and interaction effects. Helps diagnose where a manager added or destroyed value.
- Sector (allocation) effect. Excess return from being OVERWEIGHT or UNDERWEIGHT sectors relative to the benchmark. If the portfolio was overweight tech and tech outperformed, the allocation effect is positive. Captures top-down macro/sector bets.
- Security selection effect. Excess return WITHIN each sector from picking individual securities that outperformed the sector index. Captures bottom-up stock-picking skill.
- Interaction effect. The cross-term: being overweight a sector AND picking outperforming securities within it compounds the impact. Usually a small residual.
Example. Portfolio beats benchmark by 3%. Attribution: sector allocation +1.5% (the manager was overweight a winning sector), security selection +1.2% (picked above-average stocks), interaction +0.3%. Conclusion: roughly equal contribution from top-down and bottom-up — balanced manager.
Brinson-Fachler model. The standard methodology for attribution analysis. Most performance reports break down attribution by sector/region using this framework.
Why attribution matters. Two managers can both beat the benchmark by 3% but with very different drivers. A manager whose alpha comes entirely from sector bets (allocation effect) is a top-down strategist; a manager whose alpha comes from security selection is a bottom-up stock picker. Different styles deserve different evaluation criteria and produce different patterns of future returns.
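The three effects can be sketched with the usual Brinson-Fachler arithmetic: allocation = (portfolio weight − benchmark weight) × (benchmark sector return − total benchmark return), selection = benchmark weight × (portfolio sector return − benchmark sector return), and interaction = weight difference × return difference. The two-sector data below is hypothetical, and the function is an illustrative assumption rather than a reporting-grade implementation.

```python
def brinson_fachler(sectors):
    """sectors: dict name -> (w_port, w_bench, r_port, r_bench).

    Returns (allocation, selection, interaction, excess_return).
    """
    bench_total = sum(wb * rb for _, wb, _, rb in sectors.values())
    port_total = sum(wp * rp for wp, _, rp, _ in sectors.values())

    allocation = selection = interaction = 0.0
    for wp, wb, rp, rb in sectors.values():
        allocation += (wp - wb) * (rb - bench_total)
        selection += wb * (rp - rb)
        interaction += (wp - wb) * (rp - rb)
    return allocation, selection, interaction, port_total - bench_total


# Hypothetical data: overweight tech, with stock picks ahead of each sector.
sectors = {
    "Tech":  (0.60, 0.40, 0.15, 0.12),
    "Other": (0.40, 0.60, 0.05, 0.04),
}

allocation, selection, interaction, excess = brinson_fachler(sectors)
print(f"allocation {allocation:.2%}, selection {selection:.2%}, "
      f"interaction {interaction:.2%}, total excess {excess:.2%}")
```

The three effects sum to the total excess return whenever both sets of weights sum to 1, which is a useful check on any attribution report.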
GIPS — Global Investment Performance Standards
The Global Investment Performance Standards (GIPS) are voluntary, industry-standard guidelines for CALCULATING AND REPORTING investment performance. Maintained by the CFA Institute. Designed to ensure FAIR REPRESENTATION and FULL DISCLOSURE of performance and to enable APPLES-TO-APPLES COMPARISON between investment managers.
- Voluntary but widely adopted. GIPS compliance is a powerful credibility signal for institutional managers. Most reputable institutional managers claim GIPS compliance; failure to comply is a competitive disadvantage in institutional mandates.
- Composite reporting. Managers report on COMPOSITES — groupings of all DISCRETIONARY accounts with similar strategies. Cannot cherry-pick best-performing accounts. Composite returns are weighted aggregates of all accounts in the strategy.
- TWRR required. Performance must be calculated using TIME-WEIGHTED returns to remove the impact of client cash flows. (MWRR may be reported as supplemental information.)
- Five-year minimum. A firm must initially present at least five years of GIPS-compliant performance history (or since inception if shorter), then add each subsequent year until at least ten years are shown. Managers can't selectively present favorable shorter periods.
- Survivorship-bias avoidance. Closed or terminated portfolios must remain in composite returns for the period they were active. Cannot retroactively remove discontinued strategies that performed poorly.
- Required disclosures. Composite description, fees, currency, dispersion of returns, three-year ex-post standard deviation, gross and net returns, benchmark.
- Verification. Optional but valued: an independent verification firm reviews the manager's GIPS compliance. Verified compliance is a higher signal than self-claimed.
Survivorship bias and other performance pitfalls
Several systematic biases distort performance comparison if not properly handled:
- SURVIVORSHIP BIAS. When studying historical fund performance, only the funds that SURVIVED are typically available. Funds that closed (often due to poor performance) drop out of databases. The remaining sample looks better than the original universe did, OVERSTATING historical returns. Estimated impact: 1-2% per year overstatement for actively managed equity funds.
- BACKFILL BIAS. Hedge funds and others can choose when to start reporting to a database. They typically start reporting AFTER a period of strong returns, then have those returns BACKFILLED into the database history. Inflates apparent historical performance.
- SELECTION BIAS. Voluntary reporting databases include only managers who CHOOSE to report. Poor-performing managers may withdraw, leaving the dataset skewed positive.
- END-OF-PERIOD BIAS. Performance can be cherry-picked by choosing a start and end date favorable to the conclusion. Reputable analyses use multiple start dates or rolling periods.
- FEE TREATMENT. Comparing GROSS returns (before fees) of one fund to NET returns (after fees) of another distorts comparison. Always compare like-to-like.
- BENCHMARK SHIFTING. Managers occasionally change benchmarks after a poor period. Reputable performance reporting flags benchmark changes and shows both old and new comparison.
The exam tests recognition of these biases and the principle that PERFORMANCE COMPARISONS REQUIRE CARE — standardized methodologies like GIPS exist to address these distortions.
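Survivorship bias is easy to see with a toy fund universe: average everyone, then average only the funds still operating. The five funds and their returns below are hypothetical and deliberately exaggerated to make the effect visible.

```python
from statistics import mean

# (name, annual return, still operating?) -- hypothetical data.
funds = [
    ("Fund A", 0.11, True),
    ("Fund B", 0.09, True),
    ("Fund C", 0.07, True),
    ("Fund D", 0.02, False),   # closed after poor performance
    ("Fund E", -0.04, False),  # closed after poor performance
]

full_universe = mean(r for _, r, _ in funds)
survivors_only = mean(r for _, r, alive in funds if alive)
overstatement = survivors_only - full_universe

print(f"all funds:      {full_universe:.1%}")
print(f"survivors only: {survivors_only:.1%}")
print(f"overstatement:  {overstatement:.1%}")
```

A database that keeps only currently operating funds reports the second figure, even though an investor choosing at random from the original universe would have expected the first.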
A research study claims actively managed equity mutual funds returned 9% per year over the past 20 years — outperforming the S&P 500's 8% return. The DATABASE used includes only funds CURRENTLY OPERATING. The conclusion most likely suffers from:
A financial adviser presents their performance: “Over the past 5 years, my portfolio recommendations returned 11% annually vs the S&P 500's 9%.” To CRITICALLY EVALUATE this claim, the prospect should ask:
Chapter summary
Types of returns — baseline overview
Understanding different return calculations is critical for evaluating portfolio performance:
- Total return: Includes both income (dividends, interest) and capital appreciation. The most comprehensive return measure.
- Holding period return: Total gain or loss over the period an investment is held.
- Annualized return: Holding period return converted to a per-year basis for comparison across investments.
- Cumulative return: Total return over the entire period (not annualized). Useful for long-term comparisons.
Time-weighted vs. dollar-weighted returns — baseline
Time-Weighted Return (TWRR)
- Eliminates the impact of cash flows (deposits and withdrawals)
- Best measure of a portfolio manager's performance
- Used by the CFA Institute's GIPS standards
Dollar-Weighted Return (IRR / MWRR)
- Reflects the impact of the timing and size of cash flows
- Best measure of the investor's actual experience
- Equivalent to the internal rate of return (IRR) on the cash flow series
Other performance concepts — baseline
- Expected return: Probability-weighted average of possible outcomes.
- Inflation-adjusted (real) return: Approximately nominal return minus inflation; precisely (1 + nominal) ÷ (1 + inflation) − 1. Reflects actual purchasing power growth.
- After-tax return: Return after accounting for taxes on dividends, interest, and capital gains.
- Current yield: Annual income (dividends or interest) divided by current price.
- Yield to maturity (YTM): Total return assuming a bond is held to maturity, including reinvested coupons.
Returns — complete reference
| Return Measure | What It Captures | When to Use |
|---|---|---|
| Total return | Income + capital appreciation | Most comprehensive measure; standard for reporting |
| Time-weighted | Manager's investment skill (removes cash-flow effects) | Evaluating fund/portfolio managers; GIPS standard |
| Dollar-weighted (IRR) | Investor's actual experience (includes cash-flow timing) | Evaluating client's actual return; private equity LP returns |
| Real return | Nominal return adjusted for inflation | Long-horizon planning; purchasing-power analysis |
| After-tax return | Return net of taxes on income and gains | Comparing taxable vs tax-advantaged accounts |
Benchmarking — baseline
An appropriate benchmark should be:
- Investable: A real, accessible alternative the investor could choose instead
- Style-matched: Same asset class, market cap range, and investment style as the portfolio
- Consistent: Applied over time without cherry-picking favorable comparisons
- Specified in advance: Identified before the period being measured
- Unambiguous: Clear constituent securities and weights
A SMALL-CAP GROWTH fund benchmarked to the S&P 500 (large-cap blend) creates artificial alpha when small-caps outperform large-caps regardless of manager skill — a benchmark-mismatch trap. Use the Russell 2000 Growth Index for small-cap growth funds.
- “Use TWRR to report individual investor performance.” WRONG — TWRR evaluates the manager. MWRR/IRR reflects the investor's actual experience.
- “Arithmetic mean is the correct measure of historical multi-period return.” WRONG — geometric mean (CAGR) reflects compounding. Arithmetic overstates for volatile returns.
- “Sharpe ratio uses beta as the risk measure.” WRONG — Sharpe uses TOTAL risk (standard deviation). Treynor uses beta.
- “Positive Jensen's alpha is automatic if the portfolio outperformed.” WRONG — alpha measures excess return BEYOND CAPM's prediction given the portfolio's beta. A portfolio with high beta should outperform in up markets even without manager skill.
- “Low tracking error means the manager added value.” WRONG — low tracking error means the manager closely followed the benchmark. Index funds have near-zero tracking error by design. High tracking error with positive excess return is meaningful active management.
- “The S&P 500 is an appropriate benchmark for any US equity fund.” WRONG — benchmark must MATCH the fund's style (cap, growth/value). Small-cap funds use Russell 2000; growth funds use a growth index, etc.
- “GIPS compliance is required by law for US investment advisers.” WRONG — GIPS is VOLUNTARY. Required only if the firm claims compliance. Institutional clients often require it.
- “Survivorship bias inflates HEDGE FUND returns more than mutual funds.” WRONG as a tested rule. Both fund types are affected (hedge fund databases often severely); the exam tests recognition of the concept, not relative magnitude.
- “Tax-equivalent yield is used to compare two taxable bonds.” WRONG — TEY is for comparing a TAX-EXEMPT muni to a taxable bond. Two taxable bonds compare directly on stated yields.
- “A manager with positive sector allocation effect must also be a good stock picker.” WRONG — allocation effect and selection effect are SEPARATE. A manager can be skilled at one but not the other.
A fund's time-weighted return is 12% but a particular investor's dollar-weighted return in the same fund is only 3%. The MOST likely explanation is: