Examinations: Quality assurance in education

Examinations are used for a number of purposes. These purposes are as follows:

Diagnosis of weaknesses and strengths in curriculum areas
Promotion
Selection and placement
Assessment of school achievement
Human resource creation

From examination results, the teacher can observe the areas of the curriculum where pupils/students show weaknesses and areas where they demonstrate greater performance. With this information, the teacher could then plan remedial instruction for the curriculum areas of weakness.

Second, examinations are used to provide data for promoting pupils/students to higher classes of instruction. Third, the results of examinations are used for selection and placement of pupils/students for different types of treatment. Fourth, examinations are used for assessing the level of achievement of pupils/students for purposes of certificate award. Fifth, examinations administered at the end of school completion provide the data which school authorities and the government could use to assess whether or not the country is achieving its goal of human resource creation.

Following the issues discussed on curriculum development in the August edition of this magazine, this article focuses on examinations used for assessing how far pupils/students have achieved the objectives of instruction over a specified period of teaching and learning. The article is aimed at end-of-school examination boards set up to administer examinations to primary and secondary school pupils/students and to provide certification as evidence of the level of success the pupils/students reached by the end of their years of schooling. School teachers and school authorities will also benefit from reading this article.

It will not be possible to exhaust in one article all that needs to be written on the setting, administration and scoring of examinations, the release of examination results and of Reports to Schools, and the handling of requests from schools for remarking of the scripts of some of their candidates because of dissatisfaction with the examination results. The article will consider only the essential aspects of examination setting and grading. Part 2 of this article, to appear in the October edition, will deal with examination malpractices, grades clean-up during the grade compensation processes and the uses of reports to schools.

The article will use the words “tests” and “examinations” interchangeably where the context requires, and will also use the words “candidates,” “examinees” and “students” interchangeably.

There are two types of examinations or tests for assessing school achievement of pupils/students. These are Norm Referenced Tests and Criterion Referenced Tests.

Norm-Referenced Tests

The test movement that began nearly ninety years ago started with norm-referenced tests. Norm-referenced tests are designed to allow examiners and school authorities to compare the performance of a body of pupils/students who take a particular test to the performance of a norm group for the purpose of awarding grades. The norm group is generally a sample of pupils/students taken from a representative sample of schools in the country. The examination papers are tried out in the selected norm schools, and the performance of these schools is used for establishing the benchmarks or grade levels for A, B, C etc.

When the performance levels for grades A, B, C etc have been established using examination data from the norm group, the established performance levels are then used as the benchmarks for awarding examination grades to all subsequent cohorts of students. The practice of comparing the performance of a candidate to the performance of a norm group, rather than comparing a candidate’s performance directly with the requirements of the examination paper or papers, became a matter of contention in the late 1960s.

Norm-referenced tests are suitable for assessment in aptitude testing and in tests for selection where construct validity is of importance. For school achievement examinations, there are two major requirements: first, the examination papers should have high content validity, not construct validity; second, a candidate’s performance should be interpreted directly against the curriculum objectives tested. In this way, a candidate who scores 80%, for example, in an examination should be considered as having done very well and deserving of grade A, not because the candidate performed to the level of the top group in the norm group, but rather because the candidate performed very well on the body of curriculum objectives examined.

Norm-referenced examinations do not meet these two requirements and consequently faded out by 1975. From the mid-1970s, the criterion-referenced examination system replaced the norm-referenced examination system in many countries.

Criterion-Referenced Tests

In the 1960s, test specialists in America and Europe began to point out that when a curriculum has been prepared for schools in a country, in a state or a region, it would be unfair for the performance of every student in the country to be compared to a selected norm group for the award of examination performance grades. Since the curriculum consists of a body of specific objectives, and since not all curriculum objectives can be examined at any one time, the examination should consist of a sample of selected curriculum objectives. Where each selected objective is considered a criterion for assessing the performance of all students in a country or state, examination papers developed by inclusion of a sample of criterion objectives are referred to as criterion-referenced tests or criterion-referenced examinations.

Decisions on number of test papers

Before writing test items, the number of test papers must first be specified. Objective test items are also referred to as multiple choice questions (MCQ) by some education measurement specialists. This terminology is not accurate because Paper 1, the objective test paper, does not necessarily consist of only multiple choice questions. The paper may consist of multiple choice questions and supply-type questions. For ease of reference, the term “item” will be used to cover every question type that requires the examinee to select an answer from a list of possible answers or to write a single word or a single statement as an answer.

The examination bodies for the General Certificate of Education (GCE) and the Certificate of Secondary Education (CSE) in Britain decided in the mid-1980s to merge the two examinations into one examination called the General Certificate of Secondary Education (GCSE), offering two sets of examination papers: one set for relatively high-ability students and the other for relatively low-ability students.

For simplicity in the examinations administered at the end-of-school completion, this article will use a three-paper system for all students with the following examination paper weights:

Type of paper                          No. of items/questions    Weight of paper
1. Paper 1: Objective test paper       60                        20%
2. Paper 2: Essay-type test paper      10-15                     50%
3. School-based assessment (SBA)       -                         30%

Speed tests: Paper 1 is intended as a speed test and should be answered in 60 minutes; one item per minute. The paper generally consists of items in the low-ability thinking dimensions: knowledge and understanding (the latter also referred to as comprehension). It is suggested that Paper 1 should be given a weight of 20% in the examination total marks.

Power tests: Paper 2 is intended as a power test, giving the examinee sufficient time to give their best in answering the questions. The paper is allocated 50% weight in the total marks awarded in the subject examination.

Paper 2 should consist of 10 to 15 questions based on high-ability thinking, from which the candidate is required to select and answer 5 questions. To make scoring the paper easier, it is suggested that each question be written to test one of the four high-ability thinking skills: application, analysis, evaluation or creative thinking. Where a question has two or more parts, a), b) and c) for instance, each part should test the same thinking skill. Each test paper could be scored out of 100 and scaled down to the percentage weights suggested above.
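The scaling of component scores to the suggested weights can be illustrated with a short sketch in Python. It assumes that each component, including the SBA, has already been scored out of 100; the function name and the sample scores are hypothetical and used only for illustration.

```python
# Weights (in percent) of the three-paper system suggested in this article.
WEIGHTS = {"paper1": 20, "paper2": 50, "sba": 30}


def weighted_total(scores_out_of_100):
    """Scale each component score (out of 100) to its weight and add them up."""
    total = 0.0
    for component, weight in WEIGHTS.items():
        score = scores_out_of_100[component]
        if not 0 <= score <= 100:
            raise ValueError(f"{component} score must lie between 0 and 100")
        total += score * weight / 100
    return total


# Hypothetical candidate: 75 on Paper 1, 60 on Paper 2, 80 on SBA.
candidate = {"paper1": 75, "paper2": 60, "sba": 80}
print(weighted_total(candidate))  # 15 + 30 + 24 = 69.0
```

A candidate who scores full marks on every component obtains a total of 100, so the weighted total can be read directly as a percentage.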

School-Based Assessment (SBA)

The SBA paper will consist of practical tasks and some written examinations, including mid-term examinations, administered at least in the last two years before completion of the particular school level. Where the school system consists of Junior Secondary and Senior Secondary sectors, also referred to as Junior High and Senior High Schools, SBA performance scores should be collected from Year 1 up to the end of the completion year for each of the two education levels.

Weights for SBA scores

Since some candidates are more capable of doing very well on tasks that require well-developed skills, it is suggested that the 30% weight allocation, as well as the actual raw marks, should be distributed as follows:

Component                     Marks    Weight
1. Written examinations       35       10%
2. Assignment on SBA tasks    65       20%

SBA tasks must be designed to test application of knowledge, analysis, evaluation and the ability to create new things.
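As a rough sketch of how the SBA distribution above could be applied, the Python example below scales each SBA component from its raw marks to its suggested weight. It assumes each component is scaled separately (35 raw marks to 10%, 65 raw marks to 20%); the function name and the sample marks are hypothetical.

```python
# Raw marks and weights (in percent) for the two SBA components.
SBA_COMPONENTS = {
    "written": {"max_marks": 35, "weight": 10},
    "tasks": {"max_marks": 65, "weight": 20},
}


def sba_contribution(raw_marks):
    """Return the SBA contribution (out of 30) to the subject total."""
    total = 0.0
    for name, spec in SBA_COMPONENTS.items():
        raw = raw_marks[name]
        if not 0 <= raw <= spec["max_marks"]:
            raise ValueError(f"{name} marks must lie between 0 and {spec['max_marks']}")
        # Scale the raw marks of this component to its weight.
        total += raw * spec["weight"] / spec["max_marks"]
    return total


# Hypothetical candidate: 28 out of 35 written, 52 out of 65 on tasks.
print(sba_contribution({"written": 28, "tasks": 52}))  # 8.0 + 16.0 = 24.0
```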

Content validity consideration

Unlike norm-referenced tests, the validity of concern in school achievement tests is content validity. To be content valid, an achievement test must consist of items and questions based on a representative sample of the curriculum objectives students studied in school.

For purposes of content validity, the team of examiners for each subject should read as many as possible of the textbooks and other supplementary texts used in the school system to get an accurate picture of the level of knowledge and skills required of candidates in the examinations. They should also read a selection of other learning materials generally required in the school system to develop the right frame of mind for writing test items, and to be able to judge the level of difficulty of the items and questions they will write for the examination papers; that is, whether some items or essay questions would be too difficult or too easy.

Test items selection and score interpretation

There are two conditions for setting criterion-referenced examination papers. First, the test items and essay-type questions based on the curriculum objectives should be randomly selected such that the examination paper would consist of a representative sample of the body of objectives contained in the curriculum. Secondly, the performance of each candidate in the examination must be directly interpreted to indicate what the candidate knows and could do in the examination. The total score of each candidate must be interpreted in terms of their capability over the curriculum objectives studied. The curriculum objectives are hence the criteria for assessing the performance of candidates or examinees.

However, where performance levels describing the types of knowledge and skills required have been specified by examiners and school authorities, a candidate’s performance in an examination is referenced to the specified performance levels, called Grade Related Criteria or simply Grade Criteria, for the award of the appropriate grade.

In a criterion-referenced examination, a candidate is faced with the situation of “every one for himself or herself.” It is the candidate’s performance in a criterion-referenced examination that determines their grade; it is not the performance of a norm-group that determines the grades of candidates as happens in norm-referenced examinations, and as indicated earlier.

Test specifications table

The test specifications table, also referred to as Test Blueprint, is developed to show the following:

Topics to be covered in the examination
Number of objective items and essay-type questions to be allocated to topics and their sub-parts
Types of thinking skills required in each topic, test items and questions
Percentage of objective items and essay-type questions allocated to the topics selected from the school curriculum for the examination
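One possible way of holding a test blueprint in a computer is sketched below in Python. The topics, thinking skills and numbers are hypothetical (they are not taken from any real syllabus); they are chosen only so that the cells add up to the 60 objective items and 15 essay-type questions used in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlueprintCell:
    """One cell of the blueprint: a topic crossed with a thinking skill."""
    topic: str
    thinking_skill: str
    objective_items: int
    essay_questions: int


# Hypothetical blueprint: 60 objective items and 15 essay-type questions.
BLUEPRINT = [
    BlueprintCell("Number and numeration", "knowledge", 12, 0),
    BlueprintCell("Number and numeration", "understanding", 8, 0),
    BlueprintCell("Number and numeration", "application", 0, 3),
    BlueprintCell("Algebra", "knowledge", 10, 0),
    BlueprintCell("Algebra", "understanding", 10, 0),
    BlueprintCell("Algebra", "analysis", 0, 4),
    BlueprintCell("Geometry", "knowledge", 10, 0),
    BlueprintCell("Geometry", "understanding", 10, 0),
    BlueprintCell("Geometry", "evaluation", 0, 4),
    BlueprintCell("Statistics", "creative thinking", 0, 4),
]


def topic_percentages(blueprint):
    """Return {topic: (% of objective items, % of essay questions)}."""
    total_items = sum(cell.objective_items for cell in blueprint)
    total_essays = sum(cell.essay_questions for cell in blueprint)
    counts = {}
    for cell in blueprint:
        items, essays = counts.get(cell.topic, (0, 0))
        counts[cell.topic] = (items + cell.objective_items,
                              essays + cell.essay_questions)
    return {topic: (100 * items / total_items, 100 * essays / total_essays)
            for topic, (items, essays) in counts.items()}


for topic, (item_pct, essay_pct) in topic_percentages(BLUEPRINT).items():
    print(f"{topic}: {item_pct:.1f}% of items, {essay_pct:.1f}% of essay questions")
```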

The Item bank

For purposes of setting objective examination papers, about 50 times the number of items generally used in Paper 1 should be written. This means about 3000 items should be written, edited and stored in the item bank.

Each item that gets into the item bank should undergo group review carried out by the subject examining team, together with another person from a related cognate subject area, and should then be edited by the examining board’s editor.

Test try-out

After test items and questions have been edited, all items and samples of the essay-type questions should go through the test try-out process.

The reliability of each set of examination papers for each subject depends on whether the test try-out sample is heterogeneous or not. The try-out samples should hence include students from high performing schools, middle level performing schools and low level performing schools.

Statistics on test tryout data

The usual item statistics, including difficulty indices and item discrimination indices, and statistics from 2-parameter or 3-parameter item calibration, should be archived.

In criterion-referenced examinations, the item statistics are useful mainly for the purpose of identifying items and questions that may be ambiguous and will therefore require revision.

The statistics are also helpful for identifying items and questions that do not fit in the domain of items or questions required. For instance, an item or question that was initially written to test application of knowledge may be found to have been written for analysis of knowledge. Such items/questions are then culled out, revised and added to the appropriate pool of items or pool of questions.
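The two classical item statistics mentioned above can be computed from try-out data as in the Python sketch below. It assumes items scored 1 (correct) or 0 (wrong), takes the difficulty index as the proportion of examinees answering correctly, and takes the discrimination index as the difference between the proportions correct in the top and bottom 27% of examinees by total score, which is one common convention. The function name and the small data set are hypothetical, and the 2-parameter and 3-parameter calibration is not covered.

```python
def item_statistics(responses, group_fraction=0.27):
    """Return a (difficulty, discrimination) pair for each item.

    responses: one row per examinee, each row a list of 0/1 item scores.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    # Rank examinees from highest to lowest total score.
    order = sorted(range(n_examinees), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n_examinees * group_fraction))
    upper, lower = order[:k], order[-k:]
    stats = []
    for j in range(n_items):
        difficulty = sum(row[j] for row in responses) / n_examinees
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        stats.append((difficulty, p_upper - p_lower))
    return stats


# Hypothetical try-out data: 6 examinees, 4 items.
tryout = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
for number, (difficulty, discrimination) in enumerate(item_statistics(tryout), start=1):
    print(f"Item {number}: difficulty {difficulty:.2f}, discrimination {discrimination:.2f}")
```

In this small example the last item has a discrimination index of zero, since it is answered correctly as often in the bottom group as in the top group; such an item would be a candidate for review.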

Encryption: After completion of item editing and statistical calculations, the items should be encrypted and stored in the item bank.

Question bank

If the number of essay-type questions used in the actual examination paper is 15, then the total number of questions written should be at least 15 x 50, that is, 750 questions.

As in the case of test items, all questions should go through group review, be edited by the department editor, go through the required revisions, and then be encrypted and stored in the Question Bank.

Illustrations and images required in the objective test paper as well as the essay-type paper should be linked appropriately and archived in both the question and item banks.

Examination papers selection

At the appropriate time in the year when examination paper setting should be completed for printing, a computerized test items selection system should be used in the paper setting process. The computerized paper setting system is based on the test blueprint showing the number of items needed for each specified topic as well as the thinking dimensions required. Using the blueprint, the computerized system will select 60 items for Paper 1 and 15 essay questions for Paper 2 in just a few minutes from the item and question banks.

Stratified randomly parallel examination papers

Any two or three sets of examination papers selected in the manner described above are referred to as stratified randomly parallel tests (or examination papers). The tests are stratified because the percentages of items and questions required in the examination are based on the percentages determined in the test blueprint for each topic, where related topics in a unit of the curriculum constitute a stratum for purposes of instruction and test item selection.

Two or three test papers selected from the item banks are, secondly, randomly parallel because the items and questions are randomly selected by the computerized system described above. The sets of test papers are parallel because the sets of items and questions written and tried out for the same topic and thinking dimension are expected to have equal means, equal standard deviations and equal reliabilities; the characteristics required for parallel tests.
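The blueprint-driven random selection described above could be sketched in Python as follows. This is only an illustration and not the system of any particular examining board: the item bank is a hypothetical list of 3000 records, each tagged with a topic and a thinking skill, and the blueprint is a hypothetical mapping from each (topic, skill) stratum to the number of items required.

```python
import random
from collections import defaultdict


def build_paper(item_bank, blueprint, seed=None):
    """Select items at random within each (topic, skill) stratum of the blueprint."""
    rng = random.Random(seed)
    pools = defaultdict(list)
    for item in item_bank:
        pools[(item["topic"], item["skill"])].append(item)
    paper = []
    for stratum, needed in blueprint.items():
        pool = pools[stratum]
        if len(pool) < needed:
            raise ValueError(f"Not enough items in the bank for {stratum}")
        # Random sampling without replacement inside the stratum.
        paper.extend(rng.sample(pool, needed))
    rng.shuffle(paper)  # mix the strata in the final paper
    return paper


# Hypothetical bank: 3 topics x 2 thinking skills x 500 items = 3000 items.
topics = ["Topic A", "Topic B", "Topic C"]
skills = ["knowledge", "understanding"]
bank = [{"id": f"{t}/{s}/{n}", "topic": t, "skill": s}
        for t in topics for s in skills for n in range(500)]

# Hypothetical blueprint: 10 items per stratum, 60 items in all for Paper 1.
blueprint = {(t, s): 10 for t in topics for s in skills}

paper_a = build_paper(bank, blueprint, seed=1)
paper_b = build_paper(bank, blueprint, seed=2)
print(len(paper_a), len(paper_b))  # 60 60
```

Both papers follow the same blueprint, so each stratum contributes the same number of items to each paper while the actual items differ from one paper to the other.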

Final edit

To avoid having to issue “errata” to correct errors in the question papers just before the administration of the examinations, or sometimes even during the examination itself, it is important that all sets of question papers selected from the item and question banks be decrypted and edited again before the papers are sent for the Chief Examiner’s signature.

Preventing possible examination leakage

After the Chief Examiner’s signature, the examination papers should be encrypted again and stored on a secure laptop for delivery to the printer. The officer who takes the signed question papers on the laptop should be the one to decrypt them on arrival at the office of the printer. The printer will be required to sign the necessary papers to attest that the examination papers were received in secure packages etc. An examining board may, however, use its own ways of sending examination papers to the printer if the board finds those ways very secure.

The steps described above are taken to ensure that only very few people see the final examination papers before they get to the printer. The head of the department for test development, the responsible subject officer and the chief examiner are the persons who would invariably see the final copies of the examination papers before they get to the printer.

Standards Setting

Standard setting in examinations consists of a number of conditions in the award of grades for candidates’ performance. There are basically four conditions for standards setting. These are:

Determination of grade criteria
Determination of grade cut-off scores
Consideration of application of examination paper error margin
Consideration of application of hurdles in the grades award process

Grading systems

To start with, it is noted that different examination boards have different sets of grades. A couple of boards may use a nine-point grade award system; others may use a five or six-point system and some others a seven-point system. The following examples show six-grade and seven-grade systems. The six-grade system could be as follows:

A Very Good
B Good
C Credit
D Satisfactory
E Fair
No assessment

The seven-grade system could be as follows:

A Excellent
B1 Very Good
B2 Good
C Credit
D Satisfactory
E Fair
No assessment

Grade related criteria

Whatever the grading system adopted, it is important to note that the specification of grade criteria is one of the critical features of criterion-referenced testing. The grade criteria should be written by the curriculum development team in conjunction with the team of examiners and some other persons selected from industry. The products of the school system are essentially used by industry, and the participation of industry in the determination of the qualities required for the award of end-of-school grades is therefore a matter of importance. The qualities required in each grade should be defined by statements that describe the capability or performance behaviours exhibited by candidates for the award of each grade.

An example of the performance behaviours that school authorities could expect of candidates in the English Language examination is described below.

English language is generally examined in three examination papers:

Grammar (Assessed by requiring the candidate to correct some incorrect sentences)
Reading (Assessed by a comprehension passage)
Writing (Assessed by an essay or composition)

In some other cases, English language may be examined in two examination papers:

Reading
Writing

It is assumed in this case that Grammar is adequately covered in the essay component of the examination. Grammar belongs to the knowledge and comprehension dimensions, and it is noted that while students could score high marks in a test on grammar, they may still be incapable of writing good essays.

The main component of the English Language examination is the essay. The essay then becomes the focus for the grade award process. For the award of Grade A, the grading team could have specified the following criteria:

Grade A

The candidate:

1. Uses correct grammatical structures

2. Uses commas, full stops etc correctly

3. Uses at least three idiomatic expressions

4. Writes the beginning and concluding parts with great care

5. Writes in a fluent style making the essay interesting to read.

The above criteria or descriptors give clear indication of the ability of the candidate for the award of Grade A.

The grade criteria team should then go ahead and write the criteria for the award of the other grades.

Setting grade cut-offs

After writing the grade criteria, the same team should then set the cut-off scores for the award of Grades A, B, C etc. The six-point grading system is used as an example below.

Grade award    Grade criteria                        Grade cut-off score
1. A           Very Good (Already specified)         80% and above
2. B           Good                                  70-79%
3. C           Credit                                55-69%
4. D           Satisfactory                          45-54%
5. E           Fair                                  35-44%
6. -           No assessment or Below performance    Less than 35%

Note that Grade C, the middle grade, has a wider class interval of 15 points; Grades B, D and E have class intervals of 10 points; while Grade A and the “No assessment” class are open-ended classes.

It should also be noted that the grade cut-off scores do not necessarily apply to all subjects. Examiners and curriculum specialists may set different conditions and requirements for grade criteria and associated grade cut-off scores.
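The award of grades from the example cut-off scores above can be expressed as a simple lookup, sketched below in Python. The function name is hypothetical, and the treatment of fractional scores is an assumption: a score such as 79.5% is placed in the grade whose lower cut-off it has reached, which is Grade B here.

```python
# Lower cut-off score (in percent) for each grade in the six-point example.
GRADE_CUTOFFS = [(80, "A"), (70, "B"), (55, "C"), (45, "D"), (35, "E")]


def award_grade(percentage):
    """Return the grade for a total score expressed as a percentage."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    # Cut-offs are listed from highest to lowest; the first one reached wins.
    for cutoff, grade in GRADE_CUTOFFS:
        if percentage >= cutoff:
            return grade
    return "No assessment"


for score in (92, 79, 55, 44, 20):
    print(score, award_grade(score))
```

Since cut-off scores may differ from subject to subject, a board could keep a separate list of cut-offs for each subject and pass the appropriate list to the same lookup.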

Source

Quansah, K. B. The Method of Examining in the West Indies (With preliminary discussion on curriculum development in the West Indies). Paper presented at the joint meeting of the Florida State Task Force on Education Review and the Florida Senate Select Committee on Education, Tallahassee, Florida, July 30th, 1992.