Thursday, February 27, 2014

Blog 8 - Backward Design


In this course, you have worked within a given unit and grade level to create a variety of assessments for an eLearning course. One significant part of assessing students in a variety of ways is to get a good picture of what the students actually have learned. How does Backward Design provide a way to see whether students are actually learning what you want to teach them by looking at the assessment results? Hint: Revisit lesson 1 and apply what you have learned throughout the course to the concept of Backward Design.

When educators use a backward design lesson-planning process, they must think about what student achievement looks like before teaching even begins. Backward design forces the educator to look at the learning outcomes and performance objectives to ensure that they are based on the standards and the appropriate grade-level curriculum. In other words, backward design forces the educator to begin with the end in mind and then create the curriculum from the performances required for the standard and the teaching needed to prepare students to perform.
Educators should focus on answering these questions during the design phase (Tasmanian Department of Education):
  • What is worthy and requiring of understanding?
  • What is evidence of understanding?
  • What learning experiences and teaching promote understanding, interest and excellence?
Backward design first clarifies the learning you want to see (answering the question “What is worthy and requiring of understanding?”). Then the educator must think about the evidence needed to show that students achieved the desired learning (answering the question “What is evidence of understanding?”). Finally, the educator plans the teaching and learning activities and resources to help the students reach those goals (answering the question “What learning experiences and teaching promote understanding, interest and excellence?”).

What is worthy and requiring of understanding?
Per McTighe & Wiggins (2004), educators must consider the goals – what students should know, understand, and be able to do. They need to consider the big ideas (e.g., essential questions) and any specific knowledge and skills that are targeted in the goal and needed for effective performance. As McTighe & Wiggins (2004, p. 18) suggest, they might ask the following questions:

        What are the goals (e.g., what would be seen in classrooms, schools, and the district if designing, teaching, and assessing for understanding were the norm)?

        To achieve [the] goals what understandings will be needed (e.g., by teachers, administrators, policymakers, parents, students)?

        What essential questions will focus [the] goals, stimulate conversation, and guide [the] actions?

        To achieve [the] goals, what knowledge and skills will be needed (e.g., by teachers, administrators, policymakers, parents, students)?
What is evidence of understanding?
Educators must consider the evidence of learning – how do they know if the student has achieved the desired results and met the learning standard? Educators need to make sure that they can identify whether the student really understands the “big idea” (e.g., essential question) as well as what acceptable evidence of proficiency looks like. McTighe & Wiggins (2004) suggest that educators look at backward design as a way to document and validate that the desired results were achieved. McTighe & Wiggins (2004, p. 18) suggest they might ask the following questions:

        What will count as evidence of success?

        What baseline data … should be collected?

        What are key indicators of [the] short-term and long-term progress?
What learning experiences and teaching promote understanding, interest and excellence?
Once the educator has identified the desired results and the appropriate evidence of understanding, they can plan the learning activities. The educator should identify the sequence of activities that will best produce the desired results. The learning activities need to be engaging and effective. This is especially true in an eLearning environment, where the environment is student-centered. McTighe & Wiggins (2004, p. 18) suggest they might ask the following questions:

        What actions will help … realize [the] goals efficiently?

        What short- and long-term actions will we take?

        Who should be involved? Informed? Responsible?

        What predictable concerns will be raised? How will [they be addressed]?

Through backward design, educators are forced to look at the desired results and assessment evidence even before creating the action plan (e.g., the lesson). Teachers already have a clearly identified goal for what they want the student to be able to do by the end of the lesson. So they establish a framework for assessing the students, determine an assessment plan, and produce the assessment (whether it is constructed-response, fixed-response, written, performance, or interactive and collaborative) even before creating the learning activities.
With a backward design model, educators will know what should be assessed prior to teaching, and students will know what they will be measured on to show mastery of the concept. In an eLearning environment, students can receive instant feedback on their assessments (e.g., feedback on how they answered a multiple-choice question, a score posted immediately at the conclusion of the assessment, and qualitative feedback). The teacher can review the assessment results and identify which objectives the students have mastered and which ones might need an intervention. This all ensures that students are actually learning.

Works Cited

McTighe, J., & Wiggins, G. (2004). Understanding by Design Professional Development Workbook. Alexandria, VA: Association for Supervision and Curriculum Development.
Oosterhof, A., Conrad, R.-M., & Ely, D. P. (2008). Assessing Learners Online. Upper Saddle River: Merrill/Prentice Hall.
Tasmanian Department of Education. (n.d.). Principles of Backward Design. Retrieved February 27, 2014, from http://www.wku.edu/library/dlps/infolit/documents/designing_lesson_plans_using_backward_design.pdf

Sunday, February 23, 2014

Blog 7 - Performance Assessments

In your blog, define performance assessment. Describe how this type of assessment can be used in eLearning. What makes performance assessment truly worthwhile in eLearning? What can you foresee as the pitfalls and problems with performance assessment in the eLearning environment?

What is a Performance Assessment?

According to Oosterhof, Conrad, & Ely (2008), performance assessments involve the learner performing a task to create a product. The task often involves several steps and requires specific skills. These skills may be educationally based and/or learned through on-the-job training and evaluation. The assessment is not the “typical” fixed-response (multiple choice, true/false) or constructed-response (fill-in-the-blank) assessment. The performance assessment can be written, but does not have to be. For example, a performance assessment can evaluate the student’s ability to complete a tax return or to apply English grammar rules in a story – these would be written. On the other hand, a non-written assessment might evaluate the learner’s ability to play a musical instrument or throw a football. This is because performance assessments often evaluate motor skills, as in the examples of speech and language, performing arts, and sports.

To assist an educator in identifying whether an assessment is a performance-based assessment or not, they should use the following criteria (Oosterhof, Conrad, & Ely, 2008, p. 144):
  1. Specific behaviors or outcomes of behaviors are to be observed.
  2. The behaviors represent performance objectives or goals of the course.
  3. It is possible to judge the appropriateness of the learners’ actions, or at least to identify whether one possible response is more appropriate than some alternative response.
  4. The behavior or outcome cannot be directly measured using a paper-and-pencil test, such as a test involving a multiple-choice, essay, or other written format.
There are two types of performance assessments: (1) single-task performance assessments, which measure knowledge of a concept or rule, and (2) complex-task performance assessments, which are used when problem-solving skills are involved. No matter which type is used, they are developed using three basic steps (Oosterhof, Conrad, & Ely, 2008):
  1. Identify the capability to be assessed. This involves identifying the goal to be assessed and the type of capability involved.
  2. Establish the performance to be observed. This involves a summary description of tasks associated with the goal being assessed, a description of the tasks to be performed, identification of whether to focus on process or product, identification of prerequisite skills, and the learner’s instructions.
  3. Define a scoring plan.
Use in an eLearning Environment
The products eLearning learners create play a critical role in determining whether they have learned (or not). Performance assessments play a major role in an eLearning environment; however, they apply “…only to situations where the learners’ actions are being used to directly measure an explicit goal of instruction” (Oosterhof, Conrad, & Ely, 2008, p. 158). The performance an educator observes is only an indication of the learner’s knowledge; it is not itself the knowledge the educator hopes the learner is achieving. Therefore, performance assessments are not used to measure declarative knowledge (in Gagné’s terms, http://www.icels-educators-for-learning.ca/index.php?option=com_content&view=article&id=54&Itemid=73#des; what Webb refers to as Level 1, http://www.aps.edu/rda/documents/resources/Webbs_DOK_Guide.pdf; and what Bloom’s Taxonomy refers to as remembering, http://ww2.odu.edu/educ/roverbau/Bloom/blooms_taxonomy.htm). They are often used instead to measure procedural knowledge and problem-solving skills.

In eLearning performance assessments, a product is favored over a process. Even though a process is involved, the end result is that the online learner is able to produce a product. For example, a student studying computer programming would produce a program that does “x,” and in order to produce that program they must follow the coding process.
An important thought is that the eLearning environment is pushing educators to change from a teacher-centered model to a student-centered model. In this model, learners have greater opportunities to demonstrate what they know and what they can do – that’s performance assessments!

Not every content area is ideal for an eLearning environment. Even when a content area is delivered in an eLearning environment, not all assessments can be “paper-and-pencil” (or electronic online testing) and/or performance based. Let’s take the example of a foreign language class. Yes, the instructor can assess the learner’s ability through the traditional “paper-and-pencil” test, and this can be done electronically with automatic scoring. Grammatical rules, vocabulary, and other declarative knowledge can be tested. But in a foreign language class it is important for the learner to be able to SPEAK the language – not just phrases, but to be able to carry on a conversation. A performance assessment can be utilized to assess whether a student can dialog with another learner or even the instructor in a given scenario. Doing such an assessment in a traditional classroom would be relatively easy since the educator can listen and watch the conversation unfold. In an eLearning environment this scenario would pose a challenge. However, with today’s technology of video, video conferencing, Skype, and other two-way communication programs, an educator can observe two or more learners in such a conversation.
There are four elements in online performance assessments (Oosterhof, Conrad, & Ely, 2008, p. 182):
  1. communicating the tasks and expectations to learners,
  2. conveying products and other responses from each learner to the instructor and often to other learners,
  3. providing feedback to each learner, and
  4. maintaining records.

An educator (and educational institution) can address these four elements by selecting an appropriate learning management system, identifying the options for file transfer and the use of detached or integrated feedback between the educator and learners, and implementing techniques for efficiency (such as file-naming conventions, a scoring process, and the use of hotkeys).
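One of those efficiency techniques, a file-naming convention, is simple to automate. Here is a minimal sketch; the course/task/learner pattern is an invented example, not one prescribed by Oosterhof, Conrad, & Ely (2008):

```python
# A sketch of a standardized file-naming convention for learner submissions.
# The course/task/learner pattern below is an invented example.
from datetime import date

def clean(part: str) -> str:
    """Lowercase a name part and replace spaces so names stay filesystem-friendly."""
    return part.strip().lower().replace(" ", "-")

def submission_filename(course: str, task: str, learner: str, ext: str = "pdf") -> str:
    """Build a predictable name so submissions sort by course, then task, then learner."""
    return f"{clean(course)}_{clean(task)}_{clean(learner)}_{date.today().isoformat()}.{ext}"

print(submission_filename("ELN 501", "Blog 7", "Jane Doe"))
# e.g., eln-501_blog-7_jane-doe_2014-02-27.pdf
```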
Advantages and Pitfalls in eLearning
While there are many advantages to performance assessments, there are also disadvantages. Performance assessments can measure skills that paper-and-pencil tests cannot measure, such as writing, foreign language, music, art, and sports – skills that are often based on motor skills. The educator’s challenge is to figure out which content areas can be taught using an eLearning environment and which ones cannot.

Performance assessments promote the teaching and learning of complex skills; this is true for both the traditional face-to-face and eLearning environments.
Written assessments traditionally focus on the product that results from doing a task, whereas performance assessments focus on the process a learner must follow to get to the product. Summative and formative assessments rely on observations of a learner completing a task. In the eLearning environment this observation is often not feasible.

In the traditional educator-student environment, the educator can easily observe what the learner knows through how the student performs and through their facial expressions and body language. In an eLearning environment, this insight depends more on the formal assessments.
In the educational environment, test security is a concern no matter what type of test is administered. There is much greater test security in paper-and-pencil tests than in performance assessments. In performance assessments, learners are asked to produce a product, and the scoring plan and even product models (such as past student work) are provided to the learner ahead of time.
There are three limitations of performance assessments: (1) they are less efficient, (2) their scoring is subjective, and (3) they have problems with generalizability.

Performance assessments take more time to prepare, administer, and score (compared to written tests). However, once the assessment is created, that same assessment can be administered for several school years; this is not the case with a written test. It takes more time for a learner to complete a performance assessment than a written test, and additional time for the educator to score it. An educator cannot rely on computer algorithms to score the assessment, as is possible with a fixed-response or constructed-response test.
As with long constructed-response items, the educator must be aware of scoring bias. The reliability of the assessment results might be questioned if the educator is scoring their own learners on a performance-based or essay assessment. Therefore, it is important to have a well-defined scoring plan that is consistently followed. Scoring options include (1) comparison with a model, (2) checklists, (3) rating scales, and (4) rubrics. The scoring plan can be applied analytically or holistically and may or may not involve numbers. The important part of a scoring plan is that qualitative feedback and a numerical score are given to the learner.
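To make one of those scoring options concrete, here is a minimal sketch of a checklist scorer that returns both a numerical score and qualitative feedback. The checklist entries and point value are invented examples, not drawn from Oosterhof, Conrad, & Ely:

```python
# A minimal sketch of checklist scoring: points for each observed behavior plus
# qualitative notes on what was missed. The checklist entries are invented examples.

def score_checklist(observations: dict[str, bool], points_each: int = 1) -> tuple[int, list[str]]:
    """Return (score, feedback) for a set of observed/not-observed behaviors."""
    score = points_each * sum(observations.values())
    feedback = [f"Not observed: {step}" for step, done in observations.items() if not done]
    return score, feedback

score, notes = score_checklist({
    "States the forecast assumptions": True,
    "Interprets the cold-front data": True,
    "Explains the precipitation estimate": False,
})
print(score, notes)  # 2 ['Not observed: Explains the precipitation estimate']
```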
A performance assessment does not generalize across domains, since the educator is evaluating the learner’s ability to perform each task. In the case of a meteorology student, can they create a precipitation forecast involving a cold front as well as one involving a tropical storm?

Work Cited
Oosterhof, A., Conrad, R.-M., & Ely, D. P. (2008). Assessing Learners Online. Upper Saddle River: Merrill/Prentice Hall.

Thursday, February 20, 2014

Blog 6 - Constructed-response and Fixed-response

Describe the differences between constructed-response and fixed-response written assessments. Describe the benefits of using both in eLearning. Finally, describe the necessity for a balance between teacher-graded and computer-graded assessment items.

As stated in blog #5 (http://maj-eln.blogspot.com/2014/02/blog-5-constructed-response-and-fixed.html), there are differences between constructed-response and fixed-response assessments. Constructed-response items, often known as “fill-in-the-blank” items, require the student to enter or write out their answer, whereas in fixed-response items the student selects their answer from the response options. The most common types of fixed-response items are multiple-choice and true-false test items. However, variations of fixed-response include matching, ranking, multiple true-false, and embedded-choice items. (Refer to blog #5, http://maj-eln.blogspot.com/2014/02/blog-5-constructed-response-and-fixed.html, for further differences between these two response types.)
Both of these test construction options have benefits in the eLearning environment. With both constructed-response and fixed-response items, the student enters their answer and the computer can automatically evaluate whether the student has answered the test item correctly and assign the appropriate score to that item. The drawback with constructed-response items, however, is that computer programs often use an algorithm based on letter recognition. If the student misspells the word or inserts an extra space, the computer will mark this as an incorrect answer. A way around this is for the teacher to have the ability to override the computer’s score and assign the appropriate points following the test item’s scoring plan. In the online assessment program I’m familiar with (Galileo, www.ati-online.com), this is an easy process, as I can see the students who have the incorrect answer and quickly evaluate each student’s response and enter the appropriate points.
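To make the letter-recognition drawback concrete, here is a minimal sketch of that kind of scoring, with a simple normalization step and a teacher override. It is an illustration only, not Galileo’s actual algorithm:

```python
# A minimal sketch of letter-recognition scoring with normalization and a teacher
# override. Illustration only, not Galileo's actual algorithm.

def normalize(text: str) -> str:
    """Lowercase and collapse extra whitespace so trivial typing slips still match."""
    return " ".join(text.lower().split())

def auto_score(response: str, accepted: list[str], points: int) -> int:
    """Award full points on a normalized exact match, otherwise zero."""
    return points if normalize(response) in {normalize(a) for a in accepted} else 0

def teacher_override(awarded_points: int) -> int:
    """The teacher reviews items marked wrong and applies the scoring plan by hand."""
    return awarded_points

print(auto_score("  Photosynthesis ", ["photosynthesis"], points=2))  # 2: extra spaces forgiven
print(auto_score("fotosynthesis", ["photosynthesis"], points=2))      # 0: misspelling needs review
print(teacher_override(1))  # teacher grants partial credit per the scoring plan
```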

In addition to a “fill-in-the-blank” type of constructed response, a teacher may create a short or long essay to assess the student’s mastery of the stated performance objective. A short essay should be answerable in less than 10 minutes (Oosterhof, Conrad, & Ely, 2008). “How does the Moon’s appearance change during a four-week lunar cycle?” is an example of a short essay. An example of a long essay test item is “What important contributions has space exploration made to our everyday life? Provide at least two of these contributions. Explain why and how each contribution has impacted everyday life.”
There can be many fixed-response test items on a single test, so a teacher can assess many instructional objectives, since both declarative and procedural knowledge can be assessed. The computer scores these items quickly and accurately. Unfortunately, a student can easily guess and/or cheat on this type of test.

Even though a constructed response measures instructional objectives more directly, a problem is that only so many essay questions/prompts can fit on one test. Another problem with essay-style test items in an eLearning environment is that students must type their answer, and they might not be good typists. Students need to know how they are being assessed when answering such prompts, so a scoring plan should be provided to them upfront. The scoring plan must be well defined, not only for the student but also for the teacher (or whoever is reviewing and scoring the student’s response). High-stakes assessments, such as state standardized tests, have long written responses which computer algorithms can help score, but human intervention is still required. For a formative assessment, the teacher often must manually review and score the student’s response. This can be a time-consuming process with a large class size. Additionally, inter-rater reliability is a factor. If four teachers give the same test containing items that must be manually scored, will all four teachers faithfully follow the defined scoring plan? Should the teachers share the scoring (e.g., teacher A scores teacher B’s tests)? Will the scoring rotation delay the feedback to the student? Providing the student with timely feedback is important, not only in an eLearning environment but also in the traditional “brick and mortar” environment.

Speaking of feedback, the student can get immediate feedback on each test item or at the end of the assessment. However, the test creator must ensure that the feedback is appropriate. If, in a constructed-response item, the student is asked “What is 2 plus 3?” and answers “5,” the computer can display “Great!” or “You’ve answered correctly.” If the student answers incorrectly, a message such as “Sorry, the answer is 5” can be displayed along with an explanation of why. Note that bare correct/incorrect feedback can communicate misleading information to students who select the correct answer but for the wrong reason. Alternatively, a final score can be provided to the student at the end of the test.
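A minimal sketch of that feedback logic is below; the messages and the arithmetic item are illustrative only, and a real system would draw them from its item bank:

```python
# A minimal sketch of immediate item feedback. The messages and the arithmetic
# item are illustrative examples.

def item_feedback(response: str, correct: str, explanation: str) -> str:
    """Return an immediate feedback message for one item."""
    if response.strip() == correct:
        return "You've answered correctly."
    # Pairing the correct answer with an explanation avoids the misleading
    # bare correct/incorrect message described above.
    return f"Sorry, the answer is {correct}. {explanation}"

print(item_feedback("5", "5", "2 plus 3 equals 5."))
print(item_feedback("6", "5", "2 plus 3 equals 5."))
```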
The teacher must keep time constraints in mind. A constructed-response test is going to take longer for the student to complete than a fixed-response test. A teacher must decide when it is better to have a test administered online with all the scoring conducted electronically and when teacher intervention is needed. Creating a test online takes time, depending on the teacher’s experience with test construction and their learning management system.

Some assessments are not suitable for eLearning, such as when a performance-based skill (e.g., dancing) needs to be measured. A teacher needs to balance constructed- and fixed-response test types against those that are more performance based. Performance-based items will need to be teacher-graded with a clearly defined scoring plan.
When a test is created, the teacher should consider placing constructed-response items, especially essay-type items, at the end of the test. This way the student can answer the easier items first and the more challenging questions later. A written test should have a variety of fixed-response items followed by more challenging constructed-response items.

The test creator needs to strike a balance between computer-graded and teacher-graded assessment items. This balance helps not only the educator but also the student. With a mixture of the two, the computer can quickly and accurately grade fixed-response and “fill-in-the-blank” constructed-response items, leaving the teacher time to review only the incorrect constructed-response items and to review and score the short and long essay constructed-response items. Students experience easier to more challenging test items as they progress through their test. It is important that both types of assessment provide the students with clear, meaningful, and timely feedback.
Work Cited
Oosterhof, A., Conrad, R.-M., & Ely, D. P. (2008). Assessing Learners Online. Upper Saddle River: Merrill/Prentice Hall.

Friday, February 14, 2014

Blog 5 - Constructed-response and Fixed-response


Describe the differences between constructed-response and fixed-response assessments. When would you use each type of assessment in eLearning? Why?

As stated in blog #4 (http://maj-eln.blogspot.com/2014/02/blog-4-pros-and-cons-of-constructed.html), there are two different types of constructed-response test items – completion items and essay items. A completion item is often known as a “fill-in-the-blank” item; the student often just completes the sentence. In an essay item (short or long format) the student provides a narrative response to the test item. In a constructed-response test item, the student must enter or write out their answer.
Fixed-response test items prompt the student to select their answer from the response options. The most common types of fixed-response items are multiple-choice and true-false test items. However, variations of fixed-response include matching, ranking, multiple true-false, and embedded-choice items.

Time Considerations

Students can answer more multiple-choice questions in a shorter period of time than constructed-response items. Students generally can answer 1 multiple-choice item per minute and 2 true-false test items per minute (Oosterhof, Conrad, & Ely, 2008). Fixed-response items would be a great way to quickly assess students at the beginning of a concept to measure prior knowledge, or to quickly check for understanding in the middle or at the end of a concept.
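Those pacing figures also make it easy to estimate how long a fixed-response test should take. A quick sketch (the item counts are invented):

```python
# A quick sketch using the pacing figures above (one multiple-choice item per
# minute, two true-false items per minute) to estimate test length.

def estimated_minutes(multiple_choice: int, true_false: int) -> float:
    """Estimate completion time for a fixed-response test."""
    return multiple_choice * 1.0 + true_false * 0.5

print(estimated_minutes(multiple_choice=20, true_false=10))  # 25.0 minutes
```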

For both types of assessment options, it takes time to write a well-crafted test item that measures the intended performance objective. The teacher’s challenge is to write constructed-response items so that there is only one correct answer, since otherwise there can be multiple acceptable answers. A teacher may take only a few minutes to create a constructed-response item, such as a short answer or essay, but they need to make sure that the question measures the instructional objectives and that the scoring plan is well defined. Creating a well-defined scoring plan can be time consuming.
A fixed-response test item also takes time to write, since the teacher must create an appropriate stem and response options. The response options should measure not only what the student knows; the distractors should also help the teacher identify the student’s thought process. For example, when creating a multiple-choice item about the American Revolution, the teacher might include some prominent historical figures from the time who were not directly associated with the Revolutionary War. All the wrong answers are plausible, but not valid for that particular test item.

With online assessments, scoring can be instant. With fixed-response items, computers can quickly score and grade a student’s test. Scoring is consistent and objective, since the computer does it quickly and the teacher doesn’t have to be involved. The computer checks the student’s selection (e.g., A, B, C, etc. for multiple-choice, or true/false) and assigns the correct point value for the correct answer. Students can get immediate feedback on how they did on their test.
Constructed-response items can also be scored using technology. A response textbox is part of the test item; the student types their answer in the provided textbox. The computer program automatically scores and grades the response using a letter-recognition algorithm. Since a student can make typing mistakes while their response is still recognizable, the teacher has the ability to override the computer’s score and assign partial credit (points) following the test item’s scoring plan.

The scoring plan should be clearly defined. Oosterhof, Conrad, and Ely (2008, p. 92) state that a scoring plan should have three characteristics: (1) the total number of points assigned to the item based on its importance relative to other items, (2) the specific attributes to be evaluated in students’ responses, and (3) for each attribute, the criteria for awarding points, including partial credit. A scoring plan is essential for constructed response (especially short answer or essay).
Student Knowledge
With fixed-response items, when students know something of the subject, they have a better chance of getting the answer correct than with a constructed-response item. Part of this is due to guessing parameters (guessing is discussed later in this blog) and part of it can be attributed to recognition of terms or concepts.

Immediate student feedback is possible with both fixed-response and constructed response. As mentioned above, the scoring is done using technology, so the student can get immediate feedback on fixed-response test items: the student selects “A” for their answer and is told whether they have it right or wrong. For a constructed response, the student can still receive feedback, but the feedback may be delayed if the teacher must review the test item (especially in a fill-in-the-blank scenario where the computer is looking for letter-to-letter recognition) or when there is a scoring plan that requires teacher review.
Both fixed-response and constructed-response items can measure declarative and procedural knowledge – what Webb refers to as Level 1 (http://www.aps.edu/rda/documents/resources/Webbs_DOK_Guide.pdf) and Bloom’s Taxonomy refers to as remembering (http://ww2.odu.edu/educ/roverbau/Bloom/blooms_taxonomy.htm). A well-crafted test item (either fixed-response or constructed response) can measure various procedural knowledge capabilities. Where both fall short is in the higher knowledge areas that involve more complex skills, such as problem solving. So, based on the type of information the teacher is looking to get about their students’ mastery of the subject, it may be necessary to use both fixed-response and constructed-response items.

Measurable Items

Both types of test items can be used in a variety of subject areas (e.g., English, math, science, social studies). A teacher can easily write a constructed response on factual knowledge of space exploration and write fixed-response multiple-choice items on the same topic. However, both test item types may not be suitable for other subject areas such as music, and even for certain concepts/aspects of science and math. In music, for example, a teacher cannot write a constructed response or fixed response for Arizona’s music standard Strand 1: Create, Concept 2: Playing instruments, alone and with others, music from various genres and diverse cultures (http://www.azed.gov/standards-practices/files/2011/09/music.pdf). Likewise, one cannot write such items to address certain science problems and math computations. Other test item formats must be used to address these concepts.

Fixed-response items are susceptible to guessing. For example, on a four-alternative test item, the student has a 25% chance of selecting the correct answer; on a true-false item, the student has a 50% chance of selecting the correct answer. Test reliability increases when multiple-choice, alternate-choice, and essay test items appear in the same assessment.
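To see how much guessing can inflate scores, the binomial distribution gives the probability of reaching a passing mark by guessing alone. A short sketch (the ten-item test and passing mark of six are invented for illustration):

```python
# A short sketch of how guessing affects fixed-response scores: the binomial
# probability of reaching a passing mark by guessing alone. The ten-item test
# and passing mark of six are invented for illustration.

from math import comb

def p_pass_by_guessing(items: int, p_correct: float, passing: int) -> float:
    """P(at least `passing` correct of `items`) when each guess succeeds with p_correct."""
    return sum(
        comb(items, k) * p_correct**k * (1 - p_correct) ** (items - k)
        for k in range(passing, items + 1)
    )

# Four-option multiple choice (p = .25) vs. true-false (p = .5):
print(round(p_pass_by_guessing(10, 0.25, 6), 4))  # ~0.0197
print(round(p_pass_by_guessing(10, 0.50, 6), 4))  # ~0.377
```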
A constructed-response test in the completion format can include many items in one test. This allows more adequate sampling of content, although it also increases the number of test items that must be included. Compared to short answers or essays, fixed-response items allow for better sampling of content.

Format

It goes without saying, but both types of test items need to be free of grammatical errors and extraneous wording. A constructed-response item must be written in such a way that there is only a single or homogeneous set of responses. Multiple-choice fixed-response items must have a stem that clearly presents the problem to be addressed, and the grammar in each option must be consistent with the stem.

The teacher (that is, the test writer) must keep in mind the reading level of the students. Is the student being assessed on their reading skills or on their knowledge of the learning objective? Using vocabulary and wording at a higher reading level than the student’s prevents the educator from knowing whether the student missed the objective because of their reading level or because of the subject matter.
Constructed response may be more suitable for younger students, since they can just “fill in the blank” as they read on the computer. Fixed response may be more challenging for younger students, since there is often much more reading involved. Therefore, it is important to note that both of these testing options may not be appropriate for younger students whose reading skills have not yet fully developed.

Conclusion

In an eLearning environment, the educator needs to use both fixed-response and constructed-response test items. However, it is important to utilize them appropriately. When looking to see whether the student can perform a realistic task, such as those required in the workforce, students should not be asked to answer multiple-choice or true/false questions, but rather extended constructed-response items, so that they can demonstrate that they can organize and communicate their thoughts.

 
Work Cited
Oosterhof, A., Conrad, R.-M., & Ely, D. P. (2008). Assessing Learners Online. Upper Saddle River: Merrill/Prentice Hall.

Tuesday, February 4, 2014

Blog 4 - Pros and Cons of Constructed-response


Describe the various types of constructed-response assessments. What are the advantages and disadvantages of using these types of assessments? Include pros and cons of making the exam as well as grading and giving feedback.
In a written assessment, there are two categories of test items – constructed-response and fixed-response. Constructed-response items include completion and essay formats; students enter or write their response rather than selecting the answer from given options, as in the case of multiple choice. Most educators and students are used to a multiple-choice or true-false test – that is what fixed-response test items are.
There are two types of constructed-response items – completion items and essay items. A completion item is often known as a “fill-in-the-blank” item; the student often just completes the sentence. In an essay item the student provides a narrative response to the test item.

Developing Assessments: A Guide to Multiple Choice, Constructed-Response, Thematic Essays, and Document Based Questions (http://www.edteck.com/michigan/guides/Assess_guide.pdf) provides the foundations of creating test items (e.g., tests aligned to school district standards that assess a variety of cognitive levels, use authentic materials, and assess a range of skills). This document further provides guidelines for constructed-response test items, including scoring.

There are three advantages to the completion format: (1) ease of construction, (2) student-generated answers, and (3) the ability to include many items in one test. With advantages there are also disadvantages, which are (1) being limited to measuring recall of information – what Webb refers to as Level 1 (http://www.aps.edu/rda/documents/resources/Webbs_DOK_Guide.pdf) and Bloom’s Taxonomy refers to as remembering (http://ww2.odu.edu/educ/roverbau/Bloom/blooms_taxonomy.htm) – and (2) scoring errors occurring more often than with objectively scored items.
Completion Format Advantages and Limitations

Advantages
  • Ease of construction – readily measures recall of information; there is no detailed scoring plan.
  • Student generates the answer – minimizes guessing, and reliability is better than multiple-choice (with fixed-response items, students do not have to solve the problem presented); however, the test item must be carefully constructed to ensure that the student’s response is identical to the desired response (especially when the test is electronically scored).
  • Many items can be included in one test – more adequate sampling of content (an increased number of test items must be included), which increases the generalizability of test scores.

Limitations
  • Limited to measuring recall of information – often does not measure procedural knowledge.
  • Can be scored erroneously – since there can be a variety of responses; by contrast, fixed-response answer choices are multiple-choice, true-false, or other alternate-choice formats, where the guessing parameter for multiple-choice is .25 (for a four-option item). Errors can also occur in electronically scored responses.

According to Oosterhof, Conrad, and Ely (2008, p. 88), when writing completion items, it is important for the educator to apply the following eight criteria:
  1. Does this item measure the specified skill?
  2. Is the reading skill required by this item below the students’ ability?
  3. Will only a single or very homogeneous set of responses provide a correct response to the item?
  4. Does the item use grammatical structure and vocabulary that is different from that contained in the source of instruction?
  5. If the item requires a numerical response, does the question state the unit of measure to be used in the answer?
  6. Does the blank represent a key word?
  7. Are blanks placed at or near the end of the item?
  8. Is the number of blanks sufficiently limited?


When scoring the completion format, scoring is less objective than with other test item formats (e.g., multiple-choice or true/false), since the student supplies their own response. An educator’s challenge is to write completion items so that there is only one correct answer, since otherwise there can be multiple acceptable answers. The educator should include in their scoring plan the correct answer and, when applicable, a list of other acceptable alternatives. The scoring plan ensures that the educator scores consistently, as it is not fair to accept an answer as right on one student’s test and reject the same answer on another student’s test.

Essay items have a number of strengths over the completion format, since they (1) measure instructional objectives more directly, (2) allow the educator to gain insight into the student’s thoughts, (3) are less time-consuming to construct, and (4) provide a more realistic task for the student. The limitations of essay items, however, are that they (1) provide less adequate sampling of assessed content, (2) raise reliability issues in how they are scored, and (3) involve a time factor.
Essay Test Item Advantages and Limitations

Advantages
  • Measures instructional objectives more directly – measures the behavior of the performance objective.
  • Student insight – measures higher-level cognitive objectives; the student selects, organizes, and integrates information in a logical way; the aim is not to measure the student’s writing skills but rather their mastery of the content (if writing skills are assessed, the writing score should be reported separately).
  • Less time-consuming to construct – constructing the test itself takes less time, although time must be spent to ensure an accurate scoring plan.
  • Realistic task – in the workforce, people are not asked to perform a task with multiple-choice or true/false questions, but rather have to organize and communicate their thoughts.

Limitations
  • Less adequate sampling of assessed content – the student must take the time to read and answer each test item, so the test cannot include all learned content; one broad essay question should not cover too great a percentage of the skills.
  • Reliability issues – educator bias in scoring could affect test reliability; teachers differ in how they score; educator scoring fatigue (papers at the top of the pile may be scored differently than those at the end); scores can be influenced by the educator’s expectations of the student; writing conventions and presentation affect the score (although handwriting would not be a factor in an online test).
  • Time factor – the educator must take time to read and score the test (even if automatic scoring is available); if other educators are scoring as well (inter-rater reliability), it may take additional time for them to read and score tests; producing a well-defined scoring plan is time-consuming; and the student must take the time to read and answer each test item (a student should be able to answer within 10 minutes).

According to Oosterhof, Conrad, and Ely (2008, p. 96), when writing essay items, it is important for the educator to apply the following six criteria:
  1. Does this item measure the specified skill?
  2. Is the level of reading skill required by this item below the learners’ ability?
  3. Will all or almost all students answer this item in less than 10 minutes?
  4. Will the scoring plan result in different readers (scorers) assigning similar scores to a given student’s response?
  5. Does the scoring plan describe a correct and complete response?
  6. Is the item written in such a way that the scoring plan will be obvious to knowledgeable learners?



An educator needs to set time aside to score an essay test, whether or not they are using technology to automatically score it. Even with automation, the educator should review the students’ answers. Educators may find that essay tests are easier to prepare, since fewer questions are included in a test. However, they need to consider not just the test-writing component but the test-scoring component as well.

The student should have access to the scoring plan prior to answering the essay item. This ensures that they have clear expectations of what is expected of them and provides guidance for responding to the essay. It is important that essay test items are scored consistently, ensuring that all answers are given the correct point value. The scoring plan should be clearly defined. Oosterhof, Conrad, and Ely (2008, p. 92) state that a scoring plan should have three characteristics: (1) the total number of points assigned to the item based on its importance relative to other items, (2) the specific attributes to be evaluated in students’ responses, and (3) for each attribute, the criteria for awarding points, including partial credit.
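Those three characteristics map naturally onto a small data structure. Below is a minimal sketch; the attribute names and point values are invented examples, not taken from Oosterhof, Conrad, and Ely:

```python
# A minimal sketch of a scoring plan embodying the three characteristics above:
# total points per item, the attributes to evaluate, and per-attribute criteria
# that allow partial credit. Attribute names and point values are invented.

from dataclasses import dataclass

@dataclass
class Attribute:
    name: str            # what is evaluated in the response
    full_points: int     # criterion met completely
    partial_points: int  # criterion partially met (partial credit)

@dataclass
class ScoringPlan:
    item: str
    attributes: list[Attribute]  # specific attributes to be evaluated

    @property
    def total_points(self) -> int:
        """Characteristic 1: the item's total points relative to other items."""
        return sum(a.full_points for a in self.attributes)

plan = ScoringPlan(
    item="How does the Moon's appearance change during a four-week lunar cycle?",
    attributes=[
        Attribute("Names the phases in order", full_points=4, partial_points=2),
        Attribute("Explains why the phases occur", full_points=4, partial_points=2),
    ],
)
print(plan.total_points)  # 8
```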
The use of rubrics (either holistic or analytic) is important. An analytic scoring plan includes a description and point value for each necessary element, along with point values and criteria for partial answers; an overall score is then assigned to the student. The holistic approach considers the student’s answer as a whole; there is no one correct response. This type of approach focuses on quality and understanding of the content/skills (http://www.uni.edu/chfasoa/analyticholisticrubrics.pdf).

Work Cited
Oosterhof, A., Conrad, R.-M., & Ely, D. P. (2008). Assessing Learners Online. Upper Saddle River: Merrill/Prentice Hall.