Section 3 Intervention and Practice

Practice Evaluation and Research

This domain (IIIC) asks two questions: can you evaluate your own work with an individual client, and can you evaluate an agency's programs at the system level? The 2026 outline leans into two areas it used to underplay: the specific instruments and techniques for measuring practice effectiveness (IIIC.2), and the full toolkit for evaluating programs (IIIC.3: needs assessment, formative and summative evaluation, process and outcome evaluation, cost-effectiveness, and cost-benefit). Four sections carry it: EBP and research-design foundations, practice evaluation with individual clients, agency program evaluation, and research ethics and design.

Evidence-Based Practice

Evidence-based practice rests on three equally weighted parts, and the common misread is to treat it as research applied rigidly. It is the opposite: research used to inform a decision while you account for the particular client in front of you.

  1. Best research evidence: peer-reviewed studies and clinical trials
  2. Clinical expertise: professional judgment and experience
  3. Client values: cultural background, goals, preferences

Section 1 of 4 ~5 min

EBP & research design foundations

Evaluating Practice Outcomes

Good intentions are not evidence. At some point you have to ask whether your intervention actually worked, and that means measuring it:

  • Single-system design (N=1): tracking one client's progress over time with baseline and intervention phases.
  • Pre-post testing: measuring outcomes before and after treatment to determine change.
  • Goal Attainment Scaling: measuring progress toward individualized treatment goals on a defined scale.
  • Client satisfaction measures: gathering feedback on service delivery quality and client experience.

Concept Check

A social worker reviews the research literature on PTSD treatment but notes the client expresses strong preferences for a different modality grounded in her cultural and spiritual practices. Under evidence-based practice principles, the MOST appropriate response is to:

(Cognitive Level: Application) The MOST appropriate response integrates research evidence, professional judgment, and client values in collaborative decision-making, the three-component framework that defines evidence-based practice. Privileging research alone treats EBP as rigid application of literature, the framing the model explicitly rejects. Privileging client preference alone abandons the evidence dimension. Reflexive referral on cultural grounds bypasses the worker's responsibility to integrate the client's cultural context into evidence-informed care.

Research design basics: quantitative, qualitative, mixed methods

QUANTITATIVE research uses numbers and statistical analysis to test hypotheses (experiments, surveys, structured observation). QUALITATIVE research explores meaning through words, themes, and narrative (in-depth interviews, focus groups, ethnography, case studies). MIXED METHODS combines both, leveraging the strengths of each. The exam tests whether you can match design to question: "what is the prevalence of X" calls for a quantitative survey; "how do clients experience X" calls for qualitative interviews; "does this intervention work AND how do clients experience it" calls for mixed methods. A single design fits a single, focused question; complex questions often require mixed approaches.

Concept Check

A research team wants to understand the lived experience of clients completing a substance use recovery program: how clients describe their journey, the meaning they ascribe to setbacks, and what aspects of the program felt most transformative. The MOST appropriate research approach is:

(Cognitive Level: Application) The MOST appropriate approach is qualitative research using in-depth interviews and thematic analysis. The question seeks meaning, lived experience, and subjective accounts, which qualitative methods are designed to capture. Quantitative survey research with standardized scales would miss the meaning-making the research team is asking about. Quasi-experimental comparison answers a different question about treatment effectiveness. Single-system design tracks individual change over time but does not surface the meaning the team is investigating.

Data Collection and Analysis Methods

If you touch research or program evaluation, you need a working sense of how data gets gathered and what each method costs you:

  • Surveys and questionnaires reach large samples but lean on self-report, which can be inaccurate.
  • Interviews yield rich qualitative detail but take time and can pick up interviewer bias.
  • Direct observation captures real behavior, though the observer's presence can change the very behavior being watched (the Hawthorne effect).
  • Record review draws on existing documentation like case files and medical records, where the data may be incomplete or inconsistently kept.
  • Focus groups spark group discussion but can be skewed by a dominant participant.

On the analysis side: descriptive statistics (mean, median, mode) summarize data; inferential statistics (t-tests, chi-square) test whether a finding is statistically significant; and qualitative analysis (coding and theme identification) organizes non-numerical data into patterns.
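As a small illustration of the descriptive side, the sketch below summarizes a made-up set of intake scores using Python's standard `statistics` module. The scores are hypothetical and chosen only so the three summaries come out differently enough to compare.

```python
import statistics

# Hypothetical intake scores for nine clients (illustrative data only).
scores = [4, 7, 7, 9, 10, 12, 12, 12, 17]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value once sorted
mode_score = statistics.mode(scores)      # most frequent value

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

Inferential tests such as a t-test or chi-square go a step further and ask whether a difference this size is likely to be due to chance; they are not shown here.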

Concept Check

A researcher notices that participants in a study change their behavior simply because they know they are being observed. This phenomenon is BEST described as:

(Cognitive Level: Recall) The phenomenon is BEST described as the Hawthorne effect: participants modify their behavior because they are aware of being observed, regardless of the specific variable being studied. Selection bias relates to how participants are chosen for the study and the systematic differences between included and excluded individuals. Social desirability bias refers to participants giving responses they believe are socially acceptable rather than accurate. The Hawthorne effect specifically attaches to awareness of observation, not the sample or the response style.

Section 2 of 4 ~8 min

Evaluating practice with individual clients

Concept Check

Administering a standardized depression scale before a 12-week intervention and again at its conclusion, a social worker is using the evaluation method BEST described as:

(Cognitive Level: Recall) The evaluation method is BEST described as pre-post testing: a single measure administered before and after an intervention to detect change in the outcome variable. Single-system design typically involves repeated measures across multiple baseline and intervention phases, not just two points. Qualitative analysis uses non-numerical data. A randomized controlled trial requires random assignment to treatment and control conditions, which is not described here. Pre-post is the simplest outcome-evaluation design and the one most commonly used in routine practice.

Practice evaluation instruments and techniques (IIIC.2)

The outline wants the specific tools workers use to evaluate practice, not just the broad frameworks above. The categories to know:

Rapid Assessment Instruments (RAIs)

Brief, validated self-report tools designed for repeat administration. Used to track change session-by-session or week-by-week. Examples: PHQ-9 (depression), GAD-7 (anxiety), OQ-45 (general distress), URICA (readiness for change). The defining features: short, easy to score, sensitive to change over short periods.

Standardized outcome measures

Validated instruments with established reliability and validity, often used in pre-post designs. Examples: Beck Depression Inventory (BDI-II), Outcome Questionnaire (OQ-45), Children's Depression Inventory (CDI), Trauma Symptom Inventory. Selected to match the target outcome.

Treatment fidelity measures

Track whether the intervention was delivered as intended (also called treatment integrity). Includes checklists of required intervention components, session audio review by an independent rater, and supervisor review of session notes against a model fidelity rubric. Critical for evaluating evidence-based treatments.

Behavioral observation

Structured tracking of observable behaviors over time (frequency counts, duration recording, interval sampling). Often paired with single-system designs. Strength: not vulnerable to self-report bias. Weakness: observer effects, requires training and inter-rater reliability checks.

Triangulation across instruments. No single instrument captures the full picture. Best practice combines: a standardized symptom measure (e.g., PHQ-9), a functional measure (e.g., work attendance, sleep), a relational measure (e.g., satisfaction, alliance), and clinician observation. Convergence across sources increases confidence; divergence is itself data.

Treatment fidelity is one of the most often overlooked concepts in this section. When a manualized intervention does not produce expected outcomes, the first question is whether the intervention was actually delivered as designed. Without fidelity data, a lack of effect could mean the treatment does not work OR that the treatment was never really delivered.

Concept Check

Tracking a single client's anxiety scores weekly for four weeks during a baseline phase, then weekly for eight weeks during a CBT intervention phase, then weekly for four weeks during a withdrawal phase, the social worker's evaluation design is BEST described as:

(Cognitive Level: Application) The design is BEST described as a single-system (N=1) design comparing baseline and intervention phases across time. The defining features (a single case, repeated measures, distinct phases including baseline, intervention, and withdrawal) match the single-system design used for evaluating practice with individual clients. Pre-post testing uses only two measurement points, not phase-based repeated measures. A qualitative case study collects narrative rather than scaled data. Goal attainment scaling rates progress on individualized goal levels and does not by itself create a phased experimental structure.
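The baseline, intervention, and withdrawal structure in this scenario can be summarized by comparing phase means. The sketch below is one simple way to do that, assuming made-up weekly scores; the `phase_means` helper and the phase labels are hypothetical names. In practice, single-system data are usually also graphed and inspected visually, which this sketch does not attempt.

```python
from statistics import mean

# Hypothetical weekly anxiety scores for one client (higher = more anxious).
phases = {
    "A1 baseline": [18, 17, 19, 18],                # 4 weeks
    "B intervention": [17, 15, 14, 12, 11, 10, 9, 8],  # 8 weeks of CBT
    "A2 withdrawal": [9, 10, 11, 10],               # 4 weeks
}

def phase_means(data):
    """Return the mean score for each phase, keeping phase order."""
    return {label: mean(scores) for label, scores in data.items()}

summary = phase_means(phases)
for label, value in summary.items():
    print(f"{label}: mean = {value:.1f}")
```

A pre-post design would keep only two of these twenty data points, which is why it cannot show the trend within each phase.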
Reliability and validity: the relationship the exam tests

RELIABILITY = consistency of the measurement (does the tool give similar results when administered repeatedly under similar conditions?). VALIDITY = whether the tool measures what it is intended to measure (does the depression scale actually measure depression, not anxiety or general distress?). The exam tests the asymmetry: a tool can be RELIABLE without being VALID (a broken bathroom scale that always reads five pounds high is highly reliable, not valid). A tool CANNOT be valid without being reliable (a scale that gives wildly different readings cannot be measuring weight accurately). Reliability is necessary but not sufficient for validity. Common reliability types: test-retest, inter-rater, internal consistency (Cronbach's alpha). Common validity types: face, content, construct, criterion (concurrent and predictive).
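The bathroom-scale example can be made concrete with a few numbers. The sketch below assumes a known true weight and two sets of made-up readings; `spread` stands in for consistency (reliability) and `bias` for average error against the true value (one ingredient of validity). Both helper names are hypothetical.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 150.0  # hypothetical known weight, in pounds

# Hypothetical readings from two faulty scales.
biased_scale = [155.0, 155.1, 154.9, 155.0, 155.0]   # consistent, about 5 lb high
erratic_scale = [138.0, 161.0, 149.0, 171.0, 131.0]  # wildly inconsistent

def spread(readings):
    """Population standard deviation: a small spread means consistent readings."""
    return pstdev(readings)

def bias(readings, true_value):
    """Average error of the readings relative to the known true value."""
    return mean(readings) - true_value

for name, readings in [("biased", biased_scale), ("erratic", erratic_scale)]:
    print(f"{name}: spread = {spread(readings):.2f}, "
          f"bias = {bias(readings, TRUE_WEIGHT):+.2f}")
```

The biased scale shows a tiny spread and a steady error of about five pounds: reliable, not valid. The erratic scale happens to average out near the true weight, but any single reading can be off by twenty pounds, so it cannot be trusted to measure weight at all.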

Concept Check

A measurement tool is BEST described as RELIABLE when it:

(Cognitive Level: Recall) A measurement tool is BEST described as RELIABLE when it produces consistent results across repeated administrations under similar conditions. Validity, not reliability, refers to whether a tool measures what it is intended to measure (the construct it claims to assess). IRB approval relates to research ethics review and does not describe the psychometric properties of the instrument. A tool can be reliable without being valid (consistent but measuring the wrong thing), but cannot be valid without being reliable.

Case Recording and Documentation

Documentation is two things at once: a clinical tool and a legal safeguard. Good case recording follows a few principles:

  • Timeliness: write it up as soon as you can after the session, while the details are fresh.
  • Accuracy: keep facts ("client stated...") separate from clinical impressions ("this suggests...").
  • Relevance: include only what bears on service delivery and the treatment goals.
  • SOAP format: a common structure, Subjective (the client's report), Objective (observable data), Assessment (your clinical analysis), and Plan (next steps).
  • Plain language: records should make sense to other professionals who may read them.
  • Client access: records can be reviewed by clients, subpoenaed in court, or audited by regulators.

Concept Check

Working with a client recovering from a traumatic injury, a social worker sets specific individualized goals (returning to work part-time, walking unassisted for ten minutes, sleeping six hours per night) and creates a five-level scale for each goal from much worse than expected to much better than expected. The evaluation method BEST described is:

(Cognitive Level: Application) The evaluation method BEST described is Goal Attainment Scaling: each client-specific goal is operationalized into a five-level scale from much worse to much better than expected, and progress is measured against the individualized scale rather than a standardized norm. A standardized symptom inventory uses fixed validated items applied to all clients, not individualized goals. A client satisfaction measure captures service experience rather than goal-based outcomes. A single-system AB design specifies phases and is a study structure, not a scaling approach.
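For readers who want to see how the individual goal ratings are usually rolled up, the sketch below computes the summary T-score commonly attributed to Kiresuk and Sherman. It assumes the conventional coding of the five levels as -2 to +2 (0 = expected outcome) and the conventional inter-goal correlation of 0.3; neither detail is stated in this lesson, and the ratings used are hypothetical.

```python
import math

def gas_t_score(levels, weights=None, rho=0.3):
    """Goal Attainment Scaling summary T-score.

    levels  : attainment level per goal, coded -2..+2 (0 = expected outcome)
    weights : relative importance per goal (defaults to equal weights)
    rho     : assumed average correlation among goal scores (0.3 by convention)
    """
    if weights is None:
        weights = [1.0] * len(levels)
    weighted_sum = sum(w * x for w, x in zip(weights, levels))
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)
    denom = math.sqrt((1 - rho) * sum_w_sq + rho * sum_w ** 2)
    return 50 + 10 * weighted_sum / denom

# Hypothetical ratings for the three goals in the scenario:
# part-time work = +1, walking = 0, sleep = +1
t_score = gas_t_score([1, 0, 1])
print(round(t_score, 1))
```

A score of 50 means goals were met exactly as expected on average; scores above 50 mean better than expected.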

Interdisciplinary and Intradisciplinary Collaboration

Social workers rarely work alone, and the exam expects you to know the language of collaboration:

  • Interdisciplinary teams bring together different disciplines (social work, medicine, nursing, psychology, education) around shared client goals.
  • Intradisciplinary collaboration is social workers working with other social workers across settings or specializations.
  • Your unique contribution on the team is the person-in-environment perspective; no other discipline brings that lens.
  • Shared terminology: you need a basic grasp of legal, medical, and educational vocabulary to communicate well across a team.
  • Ethics: your primary obligation stays with the client, even when the team's recommendation runs against the client's wishes, and you advocate for those wishes.

Concept Check

Documenting a session in SOAP format, a social worker writes: 'Client reported sleeping three hours per night and feeling hopeless. Affect flat; client tearful when discussing job loss. Symptoms consistent with major depressive episode; suicide risk low. Will schedule weekly sessions and refer for psychiatric evaluation.' The entry BEST illustrates which SOAP element ordering?

(Cognitive Level: Application) The entry BEST illustrates Subjective, then Objective, then Assessment, then Plan. The client's self-report of three hours of sleep and feeling hopeless is Subjective (S). The observed flat affect and tearfulness are Objective (O). The clinical formulation of major depressive episode and suicide risk level is Assessment (A). The session-end actions of weekly sessions and psychiatric referral are Plan (P). The other orderings rearrange these into sequences that do not match the SOAP convention.

Section 3 of 4 ~12 min

Evaluating agency programs

Needs assessment: the front end of program evaluation (IIIC.3)

Before anyone designs a program, a needs assessment asks what a community actually requires. It answers a chain of questions: who needs what, in what quantity, against what existing supply, leaving what gap?

Standard methods:

  • Key informant interviews. Structured conversations with community leaders, service providers, and people with direct knowledge of the population. Strong for context; vulnerable to the informants' perspective.
  • Community forums and town halls. Open meetings where community members describe needs. Strengths: voice, transparency, organizing potential. Limitations: dominant voices, attendance bias.
  • Surveys. Structured questionnaires distributed to the population. Reach more people; vulnerable to response rates and self-report bias.
  • Focus groups. Facilitated discussions with small groups of community members. Generate nuanced data on shared concerns and language.
  • Secondary data analysis. Existing data sources: census data, health department statistics, school records, social service utilization, hotline calls. Cheap, broad, lagged.
  • Service utilization data. Which services are full, which have waiting lists, who is being turned away, what populations are absent from the caseload despite presumed need.

The IIIC.3 framework distinguishes:

  • Normative need. A standard set by experts or professional bodies (e.g., a recommended caseload size, a target screening rate).
  • Felt need. What people say they need when asked.
  • Expressed need. Demand for services actually used or sought (utilization data, waiting lists).
  • Comparative need. Need identified by comparing this community to similar communities with different service levels.

The four types of need rarely agree. A community may have high normative need (per expert standards), low felt need (residents do not name it as a problem), low expressed need (no one is asking for services), but high comparative need (similar communities have much more service). The disagreement itself shapes program design: a program addressing normative need without acknowledging felt need will fail to engage; a program responding only to expressed demand will miss invisible populations.

Concept Check

A community health center plans to launch a new mental health program but is unsure which populations to target, what services to offer, or what existing resources already meet community needs. Before designing the program, the MOST appropriate evaluation activity is to:

(Cognitive Level: Reasoning) The MOST appropriate activity is a needs assessment documenting population characteristics, service gaps, and existing resources to inform program design. Needs assessment precedes program design and answers the question of what the program should look like. A pilot with outcome data presumes the program is already designed. Formative evaluation of a logic model presumes the program has been conceptualized in a logic model already. Cost-effectiveness analysis compares program options but is downstream of identifying needs.

Formative vs. summative evaluation

This is the most basic split in program evaluation, and the outline names both. The exam checks whether you can tell which one a scenario is describing.

Formative evaluation

  • Purpose: improve the program WHILE it is running
  • Timing: early and ongoing
  • Audience: program staff and managers
  • Output: recommendations for adjustment
  • Examples: mid-program staff focus groups, six-month process review, fidelity checks, pilot evaluation

Summative evaluation

  • Purpose: judge the program's overall impact
  • Timing: at the end (or end of cycle)
  • Audience: funders, policymakers, agency leadership
  • Output: verdict on continuation, expansion, or termination
  • Examples: end-of-grant outcome report, three-year impact study, cost-effectiveness analysis

The exam trap: a program reports outcomes mid-cycle to a board considering early termination. Is this formative or summative? It is summative in PURPOSE (judging continuation) even though it is mid-cycle in TIMING. Purpose drives the classification, not timing alone.

Concept Check

Six months into a two-year pilot program, an evaluation team conducts interviews with staff and clients to identify what is working, what is not, and what should be adjusted before the program continues. This activity is BEST described as:

(Cognitive Level: Reasoning) The activity is BEST described as formative evaluation: it is conducted DURING implementation, gathers information about what is and is not working, and informs ongoing program adjustment rather than rendering a final judgment. Summative evaluation occurs at the end of a program to assess overall outcomes, not at the midpoint to guide course corrections. Process evaluation specifically examines operational fidelity and implementation quality, which is one type of formative activity but a narrower frame than the broad mid-implementation review described here.

Process vs. outcome evaluation

The second core split. Process evaluation asks what is happening; outcome evaluation asks what changed.

  • Process evaluation. Documents what the program ACTUALLY does. Captures activities, services delivered, populations reached, fidelity to model, dosage of services, dropout patterns. Answers: Are we doing what we said we would do? Are we reaching whom we said we would reach? Are we delivering at the intensity we planned? Process evaluation is also called IMPLEMENTATION evaluation. Critical when interpreting any outcome evaluation: if outcomes are weak but process data show low fidelity, the issue is implementation, not the model.
  • Outcome evaluation. Measures the CHANGES the program produced. Short-term outcomes (knowledge, attitudes, skills); intermediate outcomes (behavior, status); long-term outcomes (sustained change, system-level shifts). Often visualized in a LOGIC MODEL: inputs → activities → outputs → outcomes (short, intermediate, long-term) → impact.
  • Impact evaluation. A specific kind of outcome evaluation that tries to attribute change to the program (rather than to other factors). Strongest when it uses a comparison group (randomized or matched). Without a comparison, observed outcomes may be due to maturation, history, or other influences, not the program itself.

Inputs → Activities → Outputs → Outcomes: the logic-model vocabulary the exam may use. INPUTS are resources (funding, staff, facilities, partnerships). ACTIVITIES are what the program does (workshops, counseling sessions, case management). OUTPUTS are units of activity (number of clients served, sessions delivered, materials distributed). OUTCOMES are the resulting changes in clients, families, communities. Outputs are NOT outcomes: "we served 500 clients" is an output; "client depression scores decreased by an average of 6 points" is an outcome. This is a frequently-tested distinction.
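The output/outcome split is easy to see when both are computed from the same records. The sketch below assumes a made-up list of (pre, post) depression scores; the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical (pre, post) depression scores for clients who completed services.
records = [(18, 11), (15, 10), (20, 13), (12, 8), (16, 9)]

# OUTPUT: a unit of activity (how many clients the program served).
clients_served = len(records)

# OUTCOME: the change in those clients (average drop in score).
avg_score_drop = mean([pre - post for pre, post in records])

print(f"Output: {clients_served} clients served")
print(f"Outcome: scores dropped by {avg_score_drop} points on average")
```

The first number would be the same even if no client improved; only the second says anything about change.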

Cost-effectiveness and cost-benefit analysis

Two related economic methods the outline names, close enough to confuse and distinct enough to test.

Cost-effectiveness analysis

  • What it compares: program cost vs. outcomes in NATURAL units (lives saved, recidivism reduction, depression score drop)
  • Output: cost per unit of outcome (e.g., $2,400 per re-arrest avoided)
  • When to use: comparing programs with the SAME outcome; outcome cannot easily be converted to dollars

Cost-benefit analysis

  • What it compares: program cost vs. benefits all converted to DOLLAR terms
  • Output: net benefit (benefit minus cost) or benefit-to-cost ratio (e.g., $3.50 returned per $1 invested)
  • When to use: comparing programs with DIFFERENT outcomes; outcomes can be monetized

The exam-relevant distinction. Cost-effectiveness uses NATURAL OUTCOME UNITS (cost per X outcome achieved). Cost-benefit converts ALL outcomes TO DOLLARS. A cost-benefit analysis of a suicide prevention program tries to monetize a life saved (e.g., via human-capital or willingness-to-pay methods). Cost-effectiveness avoids that step and just reports cost per life saved, leaving the value judgment outside the analysis.
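The arithmetic behind the two methods can be sketched in a few lines. The program figures below are hypothetical, picked so the results line up with the $2,400 and $3.50 examples used in this section; the function names are illustrative, not a standard API.

```python
def cost_per_outcome(total_cost, outcome_units):
    """Cost-effectiveness: dollars spent per natural unit of outcome."""
    return total_cost / outcome_units

def net_benefit(total_benefit, total_cost):
    """Cost-benefit: monetized benefit minus cost."""
    return total_benefit - total_cost

def benefit_cost_ratio(total_benefit, total_cost):
    """Cost-benefit: dollars returned per dollar invested."""
    return total_benefit / total_cost

# Hypothetical reentry program figures.
program_cost = 240_000       # total program cost in dollars
rearrests_avoided = 100      # outcome counted in natural units
monetized_benefit = 840_000  # same outcome converted to dollars (assumed value)

cea = cost_per_outcome(program_cost, rearrests_avoided)
net = net_benefit(monetized_benefit, program_cost)
bcr = benefit_cost_ratio(monetized_benefit, program_cost)

print(f"Cost-effectiveness: ${cea:,.0f} per re-arrest avoided")
print(f"Cost-benefit: net ${net:,.0f}, ${bcr:.2f} returned per $1")
```

Note that only the cost-benefit lines depend on `monetized_benefit`, the one figure that requires putting a dollar value on the outcome.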

Caveats both methods share. What counts as a "cost" and a "benefit" depends on perspective (program, government, society). Long-term benefits are often discounted to present value. Many social-work benefits are notoriously hard to quantify (family stability, dignity, community trust). The exam rewards recognizing that economic evaluation is one input among many, not the final word on program value.
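Discounting to present value is the one piece of arithmetic in these caveats. A minimal sketch, assuming annual compounding and a hypothetical 5% discount rate (the choice of rate is itself a judgment call and is not specified in this lesson):

```python
def present_value(amount, rate, years):
    """Discount a benefit received `years` from now to today's dollars."""
    return amount / (1 + rate) ** years

def discounted_total(benefits_by_year, rate):
    """Sum a stream of yearly benefits; the first entry arrives in year 1."""
    return sum(
        present_value(benefit, rate, year)
        for year, benefit in enumerate(benefits_by_year, start=1)
    )

# Hypothetical: $10,000 of benefit arriving in each of the next three years.
stream = [10_000, 10_000, 10_000]
total_today = discounted_total(stream, rate=0.05)
print(f"Undiscounted: ${sum(stream):,.0f}; present value: ${total_today:,.2f}")
```

The further in the future a benefit arrives, the less it counts, which is one reason programs with slow, long-term payoffs can look weaker in a cost-benefit analysis than they are.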

Section 4 of 4 ~7 min

Research foundations & ethics

Research ethics

Research, like practice, runs under ethical rules. The Belmont Report (1979) set out three principles that still anchor every IRB review: respect for persons, beneficence, and justice. In operational terms:

  • Informed consent. Participants understand the study purpose, procedures, risks, benefits, alternatives, and right to withdraw, and agree voluntarily. Consent is a PROCESS, not a single signature. Special protections apply to children, prisoners, pregnant participants, and adults with diminished capacity.
  • Confidentiality and data security. Identifiable participant data is protected through coded identifiers, secure storage, limited access, and de-identification before publication. Certificates of Confidentiality (federal) can protect researchers from being compelled to disclose identifying information.
  • Voluntary participation. Participants may decline initially or withdraw at any time without penalty. This is especially important when researcher and participant share an institutional context (clinician researching their own clients, supervisor researching supervisees) where implicit coercion is a concern.
  • IRB (Institutional Review Board) approval. Federal regulations (45 CFR 46, the Common Rule) require IRB review of federally funded or conducted research involving human subjects, and many institutions apply the same review to all of their human-subjects research. The IRB classifies studies as exempt, expedited, or full-board review based on risk. Practice evaluation may or may not require IRB review, depending on whether it is designed to produce findings that will be generalized beyond the program.
  • Minimization of harm and the beneficence principle. Research should not cause physical, psychological, social, or economic harm to participants. Where some risk is unavoidable, it must be justified by anticipated benefits and minimized through study design.
  • Justice in subject selection. Burdens and benefits of research are distributed fairly across populations. Recruiting only marginalized groups for risky studies (or only privileged groups for beneficial ones) is unjust.

The historical anchors for these rules: Tuskegee Syphilis Study (1932-1972), Henrietta Lacks (HeLa cells, 1951), Willowbrook Hepatitis Studies (1956-1970), the Stanford Prison Experiment (1971). Together, cases like these shaped the regulatory responses that followed (Belmont Report, Common Rule, HIPAA, modern IRB structure). Many of the populations exploited in these studies were Black, disabled, incarcerated, or institutionalized.

Concept Check

A doctoral student social worker wants to interview adults in a domestic violence shelter about their experiences with services. The shelter director, citing client confidentiality, asks the student to obtain only verbal consent and not record identifying information. The MOST appropriate next step is to:

(Cognitive Level: Reasoning) The MOST appropriate next step is to submit the protocol to an Institutional Review Board for review, presenting both federal informed-consent requirements and the shelter's confidentiality concerns for an ethics-grounded resolution. IRB review reconciles federal research standards with site-specific concerns and population vulnerability. Proceeding without IRB review violates research ethics. Declining the study entirely is premature; research on intimate partner violence (IPV) is sensitive but can be conducted ethically with appropriate protections. Recruiting before IRB approval is unethical.

Consultation vs. Supervision: Consultation is a voluntary process where a social worker seeks expert advice on a specific case or issue. The consultant provides recommendations, but the social worker retains decision-making authority. Supervision is a formal, ongoing relationship where the supervisor has authority over the supervisee's practice and may be legally liable for the supervisee's actions. On the exam, know which situation calls for consultation (peer advice) versus supervision (authority, accountability).

Concept Check

An LMSW with five years of post-licensure experience encounters a clinically complex case involving severe trauma history and active substance use. She seeks the advice of a senior LCSW with extensive trauma expertise to discuss assessment and treatment options. The relationship between the two social workers is BEST described as:

(Cognitive Level: Reasoning) The relationship is BEST described as consultation: the LMSW voluntarily seeks specialized expertise and retains full decision-making authority over her own practice. Supervision is a formal ongoing relationship where the supervisor has authority over and may bear legal liability for the supervisee's decisions, which is not the structure described. Peer review is a formal quality assurance process examining practice, not a request for advice on a specific case. Co-therapy means both clinicians work directly with the client in joint sessions, which is not described here.

Research design quick reference (IIIC.4)

The exam tests basic recognition of the major designs, not advanced methodology. The categories to know:

  • Experimental (randomized controlled trial, RCT). Random assignment to treatment and control groups. The gold standard for establishing causality because random assignment balances unmeasured differences across groups. Often impractical or unethical in social work settings.
  • Quasi-experimental. Comparison groups exist but assignment is not random (matched groups, waitlist controls, pre-existing comparison sites). Stronger than no comparison; weaker than RCT because unmeasured differences may bias results.
  • Pre-experimental (one-group pre-post, posttest-only). No comparison group. Cannot rule out alternative explanations (maturation, history, regression to the mean). Common in agency practice; limited for causal claims.
  • Descriptive. Documents what exists without testing causal claims. Includes surveys, prevalence studies, case studies. Useful for understanding scope; not designed to test interventions.
  • Qualitative designs. Phenomenology (lived experience), grounded theory (theory generation from data), ethnography (cultural immersion), narrative (story analysis), case study (in-depth single case or small set). Strong for meaning-making and theory generation.

Key threats to internal validity the exam may name: HISTORY (events outside the study affect outcomes), MATURATION (natural change over time), TESTING (the pretest itself influences posttest), INSTRUMENTATION (measure changes during the study), REGRESSION TO THE MEAN (extreme scores drift toward average), SELECTION (groups differ at baseline), ATTRITION (differential dropout). Random assignment is the most powerful defense; comparison groups address several; longitudinal repeated-measure designs address others.
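Regression to the mean is the least intuitive threat on this list, so a small simulation may help. The sketch below assumes each person has a stable true score plus random measurement noise, selects people who scored high on a first test, and retests them with no intervention at all. All parameters (means, noise level, cutoff, seed) are hypothetical.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n=5000, cutoff=65.0, seed=0):
    """Return (first-test mean, retest mean) for people selected as extreme."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    # Each test = stable true score + independent measurement noise.
    test1 = [t + rng.gauss(0, 5) for t in true_scores]
    test2 = [t + rng.gauss(0, 5) for t in true_scores]
    # Select only people whose FIRST score was extreme (at or above cutoff).
    extreme = [(a, b) for a, b in zip(test1, test2) if a >= cutoff]
    first_mean = mean([a for a, _ in extreme])
    retest_mean = mean([b for _, b in extreme])
    return first_mean, retest_mean

first_mean, retest_mean = simulate_regression_to_mean()
print(f"Selected group, first test: {first_mean:.1f}")
print(f"Selected group, retest:     {retest_mean:.1f}")
```

The selected group scores lower on retest even though nothing was done to them. In a one-group pre-post study that enrolls clients because of extreme intake scores, part of the apparent improvement can come from this effect alone, which is why a comparison group matters.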

Summary Cram aid & consolidated traps

Lesson summary

Exam essentials, at a glance
  • EBP, 3 components: research + clinical expertise + client values
  • Reliability ≠ validity: reliable without valid possible; valid without reliable impossible
  • SSD vs pre-post: many time points (SSD) vs. two time points (pre-post)
  • Fidelity matters: no effect could mean it does not work OR was not delivered as designed
  • Four types of need: normative, felt, expressed, comparative; they rarely agree
  • Formative vs summative: improve (formative) vs. judge (summative); PURPOSE drives
  • Outputs ≠ outcomes: sessions delivered (output) vs. client change (outcome)
  • Cost-effective vs cost-benefit: natural units (cost-effective) vs. dollarized (cost-benefit)
  • IRB & Belmont: respect, beneficence, justice; informed consent is a process

Common traps the exam plants

  • "The worker should follow the manualized treatment exactly even though the client objects." No: EBP integrates research, clinical expertise, AND client values; rigid manual-following over client objection misapplies the framework.
  • "A reliable instrument is also valid." No: reliability is necessary but not sufficient. A consistently wrong measure is reliable but not valid.
  • "Tracking outcomes weekly over six months is pre-post testing." No: pre-post is two time points (before and after). Repeated measurement over time is single-system design.
  • "The program reported strong outcomes, so the model works." Maybe, maybe not: without fidelity data, you do not know whether the model was actually delivered. Outcomes are not interpretable without process data.
  • "The 300 sessions we delivered prove the program is working." No: 300 sessions is an OUTPUT, not an outcome. Outputs measure activity; outcomes measure CHANGE.
  • "A mid-program review is always formative." No: TIMING does not classify the evaluation; PURPOSE does. A mid-program review used to decide whether to terminate the program is summative in purpose.
  • "The community said they needed X, so the program should address X." Felt need is one of four types; comparing felt need to normative, expressed, and comparative need produces a fuller picture.
  • "Cost-benefit analysis is more comprehensive than cost-effectiveness because it dollarizes benefits." Not necessarily: cost-benefit requires monetizing outcomes that often cannot be ethically or accurately converted to dollars (life saved, family stability). Cost-effectiveness avoids that step.
  • "Practice evaluation does not require IRB review." Sometimes true (when findings stay internal), sometimes false (when findings will be published or generalized). The defining test is whether the activity meets the federal definition of research, not whether it is called "evaluation."