In this section, the committee presents its conclusions and recommendations on defining future evaluation objectives, strengthening the output assessment, and improving use of the APR to capture data for future evaluations. The goal is to address aspects of the process that might be reconsidered to improve future evaluations and to ensure that evaluation results optimally inform NIDRR's efforts to maximize the impact of its research grants.
Defining Future Evaluation Objectives
The primary focus of the summative evaluation was on assessing the quality of research and development outputs produced by grantees. The evaluation did not include in-depth examination or comparison of the larger context of the funding programs, grants, or projects within which the outputs were produced. Although capacity building is a major thrust of NIDRR's center and training grants, assessment of training outputs, such as the number of trainees moving into research positions, was also beyond the scope of the committee's charge.
NIDRR's program mechanisms vary substantially in both size and duration, with grant amounts ranging from under $100,000 (fellowship grants) to more than $4 million (center grants) and durations ranging from 1 to more than 5 years. Programs also differ in their objectives, so the expectations of grantees under different programs vary widely. For example, a Switzer training grant is designed to increase the number of qualified researchers active in the field of disability and rehabilitation research. In contrast, center grants and Model System grants have multiple objectives that include research, technical assistance, training, and dissemination. Model System grants (BMS, TBIMS, SCIMS) have the added expectation of contributing patient-level data to a pooled set of data on the targeted condition.
The number of grants to be reviewed was predetermined by the committee's charge as 30, which represented about one-quarter of the pool of 111 grants from which the sample was drawn. The committee's task included drawing a sample of grants that reflected NIDRR's program mechanisms. The number of grants reviewed for any of the nine program mechanisms included in the sample was small—the largest number for any single program was 10 (FIP). Therefore, the committee made no attempt to compare the quality of outputs by program mechanism.
NIDRR directed the committee to review two outputs for each of the grantee's projects. A grantee with a single project had two outputs reviewed, a grantee with three projects had six outputs reviewed, and so on. Although larger grants with more projects also had more outputs reviewed, the evaluation design did not consider grant size, duration, or the relative importance of a given project within a grant.
The committee was asked to produce an overall grant rating based on the outputs reviewed. Results at the grant level are subject to more limitations than those at the output level because of the general lack of information about how the outputs did or did not interrelate; whether, and if so how, grant objectives were accomplished; and the relative priority placed on the various outputs. In addition, for larger, more complex grants, such as center grants, a number of expectations for the grants, such as capacity building, dissemination, outreach, technical assistance, and training, are unlikely to be adequately reflected in the committee's approach, which focused exclusively on specific outputs. The relationship of outputs to grants is more complex than this approach could address.
Recommendation 6-3: NIDRR should determine whether assessment of the quality of outputs should be the sole evaluation objective.
Considering other evaluation objectives might offer NIDRR further opportunities to continuously assess and improve its performance and achieve its mission. Alternative designs would be needed to evaluate the quality of grants or to allow comparison across program mechanisms. For example, if one goal of an evaluation were to assess the larger outcomes of grants (i.e., the overall impact of their full set of activities), in addition to the methods used in the current output assessment, the evaluation would need to include interviewing grantees about their original objectives to learn about how the grant was implemented and any changes that may have occurred in the projected pathway, how various projects were tied into the overall grant objectives, and how the outputs demonstrated the achievement of the grant and project objectives. The evaluation would also involve conducting bibliometric or other analyses of all publications and examining documentation of the grant's activities and self-assessments, including cumulative APRs over time. Focusing at the grant level would provide evidence of movement along the research and development pathway (e.g., from theory to measures, from prototype testing to market), as well as allow for assessment of other aspects of the grant, such as training and technical assistance and the possible synergies of multiple projects within one grant.
If the goal of an evaluation were to assess and compare the impact of program mechanisms, the methods might vary across different program mechanisms depending on the expectations for each, but would include those mentioned above and also stakeholder surveys to learn about the specific ways in which individual grants have affected their intended audiences. With regard to sampling methods, larger grant sample sizes that allowed for generalization and comparison across program mechanisms would be needed. An alternative would be to increase the grant sample size in a narrower area by focusing on grants working in specific research areas across different program mechanisms or on grants with shared objectives (e.g., product development, knowledge translation, capacity building).
NIDRR's own pressing questions would of course drive future evaluations, but other levels of analysis on which NIDRR might focus include the portfolio level (e.g., Model System grants, research and development, training grants), which NIDRR has addressed in the past; the program priority level (i.e., grants funded under certain NIDRR funding priorities) to answer questions regarding the quality and impact of NIDRR's priority setting; and institute-level questions aimed at evaluating the net impact of NIDRR grants to test assumptions embedded in NIDRR's logic model. For example, NIDRR's logic model targets adoption and use of new knowledge leading to changes/improvements in policy, practice, behavior, and system capacity for the ultimate benefit of persons with disabilities (National Institute on Disability and Rehabilitation Research, 2006). The impact of NIDRR grants might also be evaluated by comparing grant proposals that were and were not funded. Did applicants that were not funded by NIDRR go on to receive funding from other agencies for projects similar to those for which they did not receive NIDRR funding? Were they successful in achieving their objectives with that funding? What outputs were produced?
The number of outputs reviewed should depend on the unit of analysis. At the grant level, it might be advisable to assess all outputs to examine their development, their interrelationships, and their impacts. A case study methodology could be used for related subsets of outputs. If NIDRR aimed its evaluation at the program mechanism or portfolio level, sampling grants and assessing all outputs would be the preferred method. For output-level evaluation, having grantees self-nominate their best outputs, as was done for the present evaluation, is a good approach.
Although assessing grantee outputs is valuable, the committee believes that the most meaningful results would come from assessing outputs in the context of a more comprehensive grant-level and program mechanism-level evaluation. More time and resources would be required to trace a grant's progress over time toward accomplishing its objectives; to understand its evolution, which may have altered the original objectives; and to examine the specific projects that produced the various outputs. However, examining more closely the inputs and grant implementation processes that produced the outputs would yield broader implications for the value of grants, their impact, and future directions for NIDRR.
Strengthening the Output Assessment
The committee was able to develop and implement a quantifiable expert review process for evaluating the outputs of NIDRR grantees, which was based on criteria used in assessing federal research programs in both the United States and other countries. With refinements, this method could be applied to the evaluation of future outputs even more effectively. Nonetheless, in implementing this method, the committee encountered challenges and issues related to the diversity of outputs, the timing of evaluations, sources of information, and reviewer expertise.
Diversity of Outputs
The quality rating system used for the summative evaluation worked well for publications in particular, which made up 70 percent of the outputs reviewed. Using the four criteria outlined earlier in this chapter, the reviewers were able to identify varying levels of quality and the characteristics associated with each. However, the quality criteria were not as easily applied to such outputs as websites, conferences, and interventions; these outputs require more individualized criteria for assessing specialized technical elements, and sometimes more in-depth evaluation methods. A single set of criteria, however broad and flexible, could not be guaranteed to apply sufficiently and appropriately to every type of output.
Timing of Evaluations
The question arises of when best to perform an assessment of outputs. Technical quality can be assessed immediately, but assessment of the impact of outputs requires the passage of time between the release of the outputs and their eventual impact. Evaluation of outputs during the final year of an award may not allow sufficient time for the outputs to have full impact. For example, some publications will be forthcoming at this point, and others will not have had sufficient time to have an impact. The trade-off of waiting a year or more after the end of a grant before performing an evaluation is the likelihood that staff involved with the original grant may not be available, recollection of grant activities may be compromised, and engagement or interest in demonstrating results may be reduced. However, publications can be tracked regardless of access to the grantee. Outputs other than publications, such as technology products, could undergo an interim evaluation to enable examination of the development of outputs.
Sources of Information
Committee members were provided with structured briefing books containing the outputs to be reviewed. They were also provided with supplemental information on which members could draw as necessary to assign quality scores. These other sources included information submitted through the grantees' APRs and information provided in a questionnaire developed by the committee (presented in Appendix B). The primary source of information used by committee members in assigning scores was direct review of the outputs themselves. The supplemental information played a small role in assessing publications, whereas for outputs such as newsletters and websites, this information sometimes provided needed context and additional evidence helpful in assigning quality scores. However, it is important to note that the supplemental information represented grantees' self-reports, which may have been susceptible to social desirability bias. Therefore, committee members were cautious in using this information to serve as the basis for boosting output scores. Moreover, the APR is designed as a grant monitoring tool rather than as a source of information for a program evaluation, and the information it supplied was not always sufficient to inform the quality ratings.
To illustrate the limitations of the information available to the committee, the technical quality of a measurement instrument was difficult to assess if there was insufficient information about its conceptual base or its development and testing. Likewise, for conferences, workshops, and websites, it would have been preferable for the grantee to identify the intended audience so that the committee might have better assessed whether the described dissemination activities were successful in reaching that audience. For the output categories of tools, technology, and informational products, grantees sometimes provided a publication that did not necessarily describe the output. In addition, some outputs were difficult to assess when no corroborating evidence was provided to support grantees' claims about technical quality, advancement of the field, impact, or dissemination efforts.
The committee did not use standardized reporting guidelines, such as CONSORT (Schulz et al., 2010) or PRISMA (Moher et al., 2009), used by journals in their peer review processes for selecting manuscripts for publication. The committee members generally assumed that publications that had been peer reviewed warranted a minimum score of 4 for technical quality. (In some cases, peer-reviewed publications were ultimately given technical quality scores above or below 4 following committee discussion.) Had reporting guidelines been used in the review of research publications, it is possible that the committee's ratings would have changed.
The committee was directed to assess the quality of four types of prespecified outputs. While the most common output type was publications, NIDRR grants produce a range of other outputs, including tools and measures, technology devices and standards, and informational products. These outputs vary widely in their complexity and the investment needed to produce them. For example, a newsletter is a more modest output than a new technology or device. To assess the quality of outputs, the committee members used criteria based on the cumulative literature reviewed and their own expertise in diverse areas of rehabilitation and disability research, medicine, and engineering, as well as their expertise in evaluation, economics, knowledge translation, and policy. However, the committee's combined expertise did not include every possible content area in the broad field of disability and rehabilitation research.
Recommendation 6-4: If future evaluations of output quality are conducted, the process developed by the committee should be implemented with refinements to strengthen the design related to the diversity of outputs, timing of evaluations, sources of information, and reviewer expertise.
Corresponding to the above points, these refinements include the following.
Diversity of outputs
The dimensions of the quality criteria should be tailored and appropriately operationalized for different types of outputs, such as devices, tools, and informational products (including newsletters, conferences, and websites) and should be field tested with grants under multiple program mechanisms and refined as needed.
For example, the technical quality criterion includes the dimension of accessibility and usability. The questionnaire asked grantees to provide evidence of these traits. However, the dimensions should be better operationalized for different types of outputs. For tools, such as measurement instruments, the evidence to be provided should pertain to pilot testing and psychometrics. For informational products, such as websites, the evidence should include, for example, results of user testing, assessment of usability features, and compliance with Section 508 standards (regulations from the 1998 amendment to the Rehabilitation Act of 1973 requiring the accessibility of federal agencies' electronic and information technology to people with disabilities). For technology devices, the evidence should document the results of research and development tests related to such attributes as human factors, ergonomics, universal design, product reliability, and safety.
The quality criterion related to dissemination provides other clear examples of the need for further specification and operationalization of the dimensions. For example, the dissemination of technology devices should be assessed by examining progress toward commercialization; grantees' partnerships with relevant stakeholders, including consumers and manufacturers; and the delivery of information through multiple media types and sources tailored to intended audiences for optimal reach and accessibility.
Timing of evaluations
The committee suggests that the timing of an output evaluation should vary by the output type. Publications would best be assessed at least 2 years after the end of the grant. However, plans for publications and dissemination and the audience for scientific papers could be included in the final report. As stated earlier, other outputs developed during the course of the grant should be evaluated on an interim basis to assess the development and evolution of products. Outputs that have the potential to generate change in practice or policy may require more time to pass before impact materializes and can be measured, and so would best be evaluated on an interim basis as well.
Sources of information
A more proactive technical assistance approach is needed to ensure that grantees provide the data necessary to assess the specific dimensions of each quality criterion. As stated earlier, the information supplied in the APR and the questionnaire was not always sufficient to inform the quality ratings. (See also the above discussion of information requested on the grantee questionnaire and the discussion below of the APR.)
The committee suggests that for future output evaluations, NIDRR should consider developing an accessible pool of experts in different technical areas who can be called upon to review selected grants and outputs. In addition, it is essential that future review panels include scientists with disabilities. Consumers could also play a vital role as review panel members by addressing key criteria related to impact and dissemination.
Improving Use of the Annual Performance Report
NIDRR's APR system has many strengths, but the committee identified some improvements the agency should consider in building greater potential for use of these data in evaluations. The APR system (Research Triangle International, 2009) includes the grant abstract, funding information, descriptions of the research and development projects, and outcome domains targeted by projects, as well as a range of variables for reporting on the four different types of grantee outputs, as shown in Table 6-5. The system is tailored to different program mechanisms as needed. All of the descriptive information listed above, plus the output-specific variables listed in Table 6-5, were utilized in the committee's evaluation. The data were provided in electronic databases and in the form of individual grant reports.
Table 6-5: Data Elements Related to Outputs That Are Covered in the APR.
The APR data set NIDRR provided to the committee at the outset of its work was helpful in profiling the grants for sampling and in listing all of the grantees' projects and outputs. Because it enabled the committee to generate comprehensive lists of all reported projects and outputs, it made the task of output selection less burdensome for the grantees. If grantees had more recent outputs originating from their NIDRR grants that they wished to nominate as their top two for the committee's review, they had the option of doing so.
NIDRR also provided grantees' narrative APRs from the last year of their grants, as well as their final reports. These narratives were highly useful to the committee for compiling descriptions of the grants.
Beyond the Essay, III
Summative Assignments: Authentic Alternatives to the Essay
Metaphor Maps || Student Anthologies || Poster Presentations
The essay is often the go-to assignment in humanities courses, and rightfully so. Especially in the text-based disciplines, the craft of the essay is highly valued as part of practicing the work of the field. More broadly, developing effective writing skills is a universal learning objective in higher education and, to varying degrees, often depends on these humanities classes. There are, however, alternative assignments in which students can creatively perform their understandings in summative projects that can be rigorously assessed, while still practicing, and even calling attention to, the habits of mind of the discipline.
Metaphor Maps
Students synthesize and unify multiple themes or concepts through metaphors, and then explicate their own thinking
This assignment encourages students to practice and perform a variety of ways of thinking:
- think creatively about a text, concept, or unit (or several) by thinking metaphorically,
- synthesize varied pieces of a complex concept or text, and
- articulate their thinking in new and self-authored ways.
It involves two parts: first, students draw an image of a single metaphor they use to make sense of a concept, text, or unit (or several), and then, more importantly, they explicate their drawing.
Ultimately, metaphor maps are less about the drawing and more about how students synthesize and unify complex, multidimensional thinking around a single metaphor–and how clearly and effectively they explain these ideas. This strategy stretches them beyond the typical modes of learning and challenges them to organize their thoughts in a new way.
Some suggested criteria for assessing metaphor maps include the following:
- Unity & Synthesis
- In “Using ‘Frameworks’ To Enhance Teaching and Learning” (2012), Patrice W. Hallock describes an assignment in which her students draw their thinking about how they make sense of course content. One student used the metaphor of a camera. Although her assignment doesn’t include an essay explication, this student-generated and -drawn metaphor for a concept is the beginning of a metaphor map.
- In a philosophy class, a sample metaphor for critical thinking is a ship at sea surrounded by ethical mountains (Pierce, 2013).
- In a multicultural literature class, a student drew a baseball field in the final inning. The teams represented two of the cultures he’d read about in the class, the baseball field represented the All-American setting where they were in conflict, and the final inning suggested a time of crisis.
- These “Minimalist Fairy Tales” drawings offer great examples of a slight alternative to metaphor maps. Students can be asked to draw a simple image from the text that captures the essential meaning of the whole text. Synecdoche Maps!
Student Anthologies
Students perform the work of editors or curators
A significant genre in the humanities is the anthology: a collection of poems, stories, essays, artwork, or other works selected, researched, and annotated by an editor. Students can take on this role of editor, acting as curator and commentator as they establish a sense of authority and ownership over the material (Chick, 2002). They make intentional decisions about which pieces to include, what contexts to provide in their editorial notes, and even what paper, binding, font, and illustrations to use. If the pieces are short enough, as in a poetry anthology, students can be required to write or type the pieces themselves “to engage with every letter, every punctuation mark, every capital or lower-case letter, and every line break, and to consider the meanings of these choices by the poet” (p. 420). They include a title page, table of contents, prologue, and epilogue framing their anthology.
Giving students guidance for their editorial responses to each selection is helpful. Some possibilities include the following:
- Argue for its significance
- Interpret its meaning
- Describe its historical and cultural context
- Write a biographical headnote using details most relevant to the selection
- Explain how it illustrates an important disciplinary theory or concept
Ultimately, students are “defining their own aesthetics” and becoming “aware of the ramifications of making aesthetic choices” by creating their anthologies (p. 422). This analysis can then be connected to the formation of the canon, revealing the subjective nature of “what they may have thought were universal or unquestioned notions” of quality and significance in the field.
Poster Presentations
Students visually showcase their learning and present it to wider audiences
Lee Shulman, President Emeritus of the Carnegie Foundation for the Advancement of Teaching, has written of the importance of engaging with our teaching as we do our research, as “community property”: “We close the classroom doors and experience pedagogical solitude, whereas in our life as scholars, we are members of active communities: communities of conversation, communities of evaluation, communities in which we gather with others in our invisible colleges to exchange our findings, our methods, and our excuses” (2004, p. 140). What if we asked our students to do the same with their learning, as fellow citizens of the university, emerging scholars and researchers and producers of their own knowledge? In this model of making learning community property, the audience for student learning extends beyond the instructor and often even classmates, reaching out to a larger community that remains authentic to disciplinary and learning goals.
The genre of the academic poster is a staple in the natural and social sciences, displayed at conferences and other meetings to share research findings with peers, and students in these fields begin practicing these ways of going public fairly early. As Hess, Tosney, and Liegel demonstrate in “Creating Effective Poster Presentations” (2013), these visual representations of knowledge “operate on multiple levels”: “source of information, conversation starter, advertisement of your work, and summary of your work.” Poster sessions can be lively sites of conversations about new and interesting work in the field, but few (if any) disciplines in the humanities use this genre. In this way, assigning posters may feel inauthentic; however, the genre’s attention to sharing content in a concise, visual, and public format can be adapted to more closely reflect the meaning-making in the humanities. In fact, given many humanities disciplines’ appreciation of form reflecting content, the poster can make visible specific rhetorical moves, encouraging students to think not only about their ideas but also how they form their ideas.
It’s important to note that the poster is a conversation starter: it doesn’t have to present the project in its entirety. Instead, it can highlight part of the project, which the presenter uses to begin an oral explanation of the rest of the project.
For example, a poster might capture the processes of close reading and analysis of a single key passage of text (or image of a work of art), part of a larger project but focused on a key moment.
A poster could also illustrate responding to an earlier thought, such as correcting a misconception, refuting an argument, revising a theory, and the like.
Or a poster may graphically represent the thinking processes used to arriveve at an interpretation or conclusion, such as an inductive analysis, interpretation, or argument.
Posters can also effectively represent the common humanities move of offering a new way of thinking about a time period, text, idea, person, etc., often framed as filling in a gap in earlier explanations.
CFT Graduate Teaching Fellow Jessica Riviere (German) developed this model and the next while preparing for a conference.
Posters can also make clear the move of proposing new ideas that lie at the intersection of others. This common thinking process of synthesis lends itself to Venn diagrams.
Finally, a poster may also represent the narrative of a project, such as an analysis of a novel, a chronology of interactions, or a conversation between theorists or even between hypothetical figures of different eras–anything that might be told in narrative form, along with an overarching analysis.
“Rethinking the Design of Presentation Slides: The Assertion-Evidence Structure” similarly re-envisions what slides can do for engineers and scientists. (The rhetorical move of making an assertion and supporting it with evidence is of course used in the humanities as well.) The focus of that work is PowerPoint slides for presentations, but the framework can be applied to a single slide for a poster by stating the assertion and then illustrating it with visual images (or text boxes).
The infographics on Sidney Eve Matrix’s “Visualizing Social Media Culture” page and on Maria Popova’s “The Lives of 10 Famous Painters, Visualized as Minimalist Infographic Biographies” illustrate additional possibilities, although they’re made with more sophisticated software than PowerPoint. Using them as examples, though, will inspire some students to go further than the simpler models described above.
- There are plenty of websites with step-by-step instructions on how to make academic posters using the traditional scientific poster model. (Simply Google “academic posters.”) For our purposes, though, most of their instructions don’t apply, except for their useful explanations of how to use PowerPoint. (All of the above posters are simply individual PowerPoint slides.)
- Chick, Nancy L. (2002). Anthologizing transformation: Breaking down students’ ‘private theories’ about poetry. Teaching English in the Two-Year College, 29(4), 418-423.
- Hallock, Patrice W. (2012, September 17). Using ‘frameworks’ to enhance teaching and learning. Faculty Focus. Magna Publications.
- Hess, George, Tosney, Kathryn, & Liegel, Leon. (2013). Creating effective poster presentations. North Carolina State University.
- Pierce, K. (2013). Concept maps in philosophy courses. In Socrates’ Wake: A Blog about Teaching Philosophy.
- Shulman, Lee S. (2004). Teaching as community property: Putting an end to pedagogical solitude. In Teaching as Community Property: Essays on Higher Education (pp. 140-144). San Francisco: Jossey-Bass.