RATING AND RANKING
In past NRC assessments of doctoral programs, rankings have been provided for doctoral programs in specified fields. In 1982, rankings for each program in a field were presented in a list arranged alphabetically by university. In 1995, rankings based on peer assessment of the scholarly quality of program faculty were arranged in numerical order for each program within a field. In 2001 the NRC commissioned a study to address the methodology questions raised by responses to the 1982 and 1995 assessments. The methodology study proposed a variety of new approaches to address these questions and outlined methods to put the rating and ranking tasks on a sounder statistical basis. The NRC accepted the resulting report, which was released in late 2003, and it has been taken as the foundation of the current Ph.D. assessment study.
Charge to the Committee
When it authorized the current study in 2005, the National Research Council gave the Committee the following charge:
An assessment of the quality and characteristics of research-doctorate programs in the United States will be conducted. The study will consist of 1) the collection of quantitative data through questionnaires administered to institutions, programs, faculty, and admitted to candidacy students (in selected fields), 2) collection of program data on publications, citations, and dissertation keywords, and 3) the design and construction of program ratings using the collected data including quantitatively based estimates of program quality. These data will be released through a web-based, periodically updatable database and accompanied by an analytic summary report. Following this release, further analyses will be conducted by the committee and other researchers and discussed at a conference focusing on doctoral education in the United States. The methodology for the study will be a refinement of that described by the Committee to Examine the Methodology for the Assessment of Research-Doctorate Programs, which recommended that a new assessment be conducted.
A major issue, then, is to define the technical procedures for conducting task (3). This necessary task is a difficult one: the Committee must specify a set of weights that, when multiplied by the collected data, will produce "quantitatively based estimates of program quality." The Committee has adopted two primary approaches to this task; they vary according to how the rating task is specified.
1. Explicit assignment of weights to the quantitative variables.
In this approach, raters are asked to assign weights to the quantitative variables. It is unmanageable to assign weights to every variable, so key variables will be identified, either empirically (through factor analysis) or judgmentally. Once the weights are determined, the program rating (and range of ratings) is computed by applying the weights to the program-specific value of each variable. The range of ratings is determined from the variability, across raters, of the weight assigned to each variable. This intrinsic uncertainty in the process is carried through all subsequent stages of the analysis, so that all ratings and rankings are reported with quartile ranges of uncertainty.
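As an illustrative sketch of this first approach, the fragment below uses entirely made-up numbers (the rater weights, variable values, and three-variable setup are assumptions, not study data) to show how rater-to-rater variability in the weights produces a median rating and a quartile range for a program:

```python
# Sketch: combine rater-assigned weights with standardized program
# variables; the spread across raters yields the quartile range.
import statistics

# Hypothetical: each of four raters assigns weights to three key variables.
rater_weights = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
]

# Hypothetical standardized values of those variables for one program.
program_values = [1.2, -0.4, 0.8]

def rate(weights, values):
    """Rating = weighted sum of the program's variable values."""
    return sum(w * v for w, v in zip(weights, values))

# One rating per rater; rater variability becomes rating uncertainty.
ratings = sorted(rate(w, program_values) for w in rater_weights)
q1, q2, q3 = statistics.quantiles(ratings, n=4)  # quartile cut points
print(f"median rating {q2:.2f}, interquartile range [{q1:.2f}, {q3:.2f}]")
```

Reporting the interquartile range rather than a single number is what lets every subsequent ranking carry the intrinsic uncertainty forward.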
2. Implicit derivation of weights.
This is the "anchoring study." A sample of raters in each field is asked to rate a sample of programs on the basis of the perceived quality of the Ph.D. program. Raters are asked how familiar they are with each program and, if they are familiar with it, to rate it on a five-point scale. The ratings are then regressed on the quantitative variables, and the regression coefficients are used as weights. This method was detailed in Appendix G of the 2003 NRC study "Assessing Research-Doctorate Programs: A Methodology Study" and appears to have given useful results. Again, a range of ratings is determined from the variability, across raters, of the weight assigned to each variable. The Program Quality Questionnaire has been approved by The National Academies Institutional Review Board.
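A minimal sketch of the regression step, using simulated rather than real data (the variable count, sample size, and "true" weights are assumptions for illustration): reputational ratings are regressed on the quantitative variables via ordinary least squares, and the fitted coefficients serve as the implied weights.

```python
# Sketch: derive implicit weights by regressing simulated ratings
# on program variables (ordinary least squares).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: 40 rated programs, 3 quantitative variables each
# (e.g. publications/faculty, citations/faculty, awards/faculty).
X = rng.normal(size=(40, 3))
true_weights = np.array([0.5, 0.3, 0.2])
# Simulated five-point-scale ratings: weighted variables plus rater noise.
y = 3.0 + X @ true_weights + rng.normal(scale=0.1, size=40)

# OLS fit with an intercept column; lstsq returns the coefficients.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, implied_weights = coef[0], coef[1:]
print("implied weights:", np.round(implied_weights, 2))
```

With modest rater noise, the recovered coefficients track the weights that generated the ratings, which is what makes them usable as field-specific weights for programs the raters never saw.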
Either exercise can be conducted with a stratified random sample of faculty or with a group of experts within a broad field.
Additional Analyses of Graduate Program Performance
In addition to the above analyses, the Committee plans to produce subsidiary ratings designed to highlight specific components of program quality, which, given the study’s many audiences, might best be broken out and treated separately. Examples of what may be done follow:
I) One sub-component would be labeled "Research Impact" (RA), and its metrics might include the following measures: citations/faculty member, publications/faculty member, honors and awards, etc.
II) A second sub-component would be labeled "Student Support and Outcomes" (RB), and its metrics might include the following: fraction of students with full support, time to degree, attrition rate, fraction holding a position in a relevant field on graduation, etc.
III) A third sub-component would be labeled "Diversity of the Academic Environment" (RC), and its metrics might include the following: fractions of students and faculty who are female or members of minority groups, etc.
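One plausible way to form such a sub-component score, sketched here with made-up numbers (the programs, metric values, and equal-weight averaging are all assumptions, not the study's method), is to standardize each metric across programs and average the z-scores:

```python
# Sketch: a "Research Impact" (RA)-style composite as the mean of
# standardized metrics for each program.
import statistics

# Hypothetical raw metrics per program: (citations/faculty, pubs/faculty).
programs = {
    "A": (25.0, 4.0),
    "B": (10.0, 2.5),
    "C": (40.0, 6.0),
}

def standardize(values):
    """z-scores across programs, so metrics on different scales combine."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

names = list(programs)
cols = list(zip(*programs.values()))           # one tuple per metric
zcols = [standardize(list(c)) for c in cols]   # standardize each metric
ra = {n: statistics.mean(z) for n, z in zip(names, zip(*zcols))}
print(sorted(ra, key=ra.get, reverse=True))    # programs ordered by RA
```

Standardizing first keeps a metric with large raw values (citations) from swamping one with small raw values (publications per faculty member).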
Data Analysis Available to Users
An important product of the study will be a website containing quantitative data for each program. Software will be provided so that users can construct their own ratings of selected programs, based on the variables they view as important and weights they have chosen. The Committee cannot duck its rating task, but users who disagree with the assigned weights will be able to construct and justify an alternative approach.
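The kind of user-driven re-ranking described above can be sketched as follows; the programs, variables, and scores are hypothetical, and the point is only that different weight choices reorder the same underlying data:

```python
# Sketch: re-rank programs under user-supplied weights.
data = {  # program -> standardized (pubs, support, diversity) scores
    "A": (1.0, -0.5, 0.2),
    "B": (0.2, 1.0, 0.5),
    "C": (-0.3, 0.4, 1.2),
}

def rank(weights):
    """Order programs by the weighted sum of their variable scores."""
    scores = {p: sum(w * v for w, v in zip(weights, vals))
              for p, vals in data.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank((1.0, 0.0, 0.0)))  # a user who values only publications
print(rank((0.0, 0.5, 0.5)))  # one who values support and diversity
```

The same three programs come out in different orders under the two weightings, which is exactly the flexibility the website is meant to give users.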