Folios
0123-4870
Universidad Pedagógica Nacional
https://doi.org/10.17227/folios.62-20705

Received: January 24, 2024; Accepted: January 28, 2025

Designing an Assessment Course for In-Service English Language Teachers in Colombia


Desenvolvimento de um curso de avaliação para professores de inglês em serviço na Colômbia


Diseño de un curso de evaluación para docentes de inglés en servicio en Colombia

F. Giraldo

M.A. in Teaching English as a Second Language. Departamento de Lenguas Extranjeras, Universidad de Caldas, Manizales, Colombia. frank.giraldo@ucaldas.edu.co

Abstract

Research on language assessment literacy (lal) has focused on describing this construct for different stakeholders involved in language assessment, while paying secondary attention to pedagogical initiatives aimed at fostering lal among teachers. This research article reports on a diagnostic study which sought to describe the lal of forty Colombian English language teachers; the purpose of this diagnostic was to use the teachers’ feedback to design an online language assessment course. Through a mixed-methods research design, a questionnaire and a content analysis scheme were used for data collection and analysis. The convergent findings indicated issues in the design of assessment instruments and a related need to develop the teachers’ design dimension of their lal. Implications for the design of the course are presented, based on the data generated in the study.

Keywords:

language assessment, language assessment literacy, language testing, teacher professional development, validity in language assessment.

Resumo

As pesquisas sobre avaliação em contexto de línguas (acl) têm se concentrado em descrever esse constructo para diferentes atores envolvidos na avaliação de línguas, enquanto têm dado atenção secundária às iniciativas pedagógicas voltadas para o fortalecimento do acl entre os professores. Este artigo de pesquisa relata um estudo diagnóstico que buscou descrever o acl de quarenta professores colombianos de inglês; o objetivo do diagnóstico foi utilizar o feedback dos professores para projetar um curso online de avaliação linguística. Por meio de um desenho de pesquisa de métodos mistos, foram utilizados um questionário e um esquema de análise de conteúdo para a coleta e análise dos dados. Os achados convergentes indicaram problemas no desenho de instrumentos de avaliação e uma necessidade relacionada de desenvolver a dimensão de design do acl dos professores. As implicações para o desenho do curso são apresentadas com base nos dados gerados no estudo.

Palavras-chave:

avaliação de línguas, avaliação em contexto de línguas, testes de língua, desenvolvimento profissional docente, validade na avaliação de línguas.

Resumen

La investigación sobre la Literacidad en Evaluación de Lenguas (lel) se ha centrado en describir este constructo para diferentes actores involucrados en la evaluación lingüística, prestando una atención secundaria a las iniciativas pedagógicas destinadas a fomentar la lel entre los docentes. Este artículo de investigación presenta un estudio diagnóstico que buscó describir la lel de cuarenta docentes colombianos de inglés; el propósito de este diagnóstico fue utilizar la retroalimentación de los docentes para diseñar un curso en línea sobre evaluación de lenguas. A través de un diseño de investigación de métodos mixtos, se emplearon un cuestionario y un esquema de análisis de contenido para la recolección y el análisis de datos. Los hallazgos convergentes indicaron problemas en el diseño de instrumentos de evaluación y una necesidad relacionada de fortalecer la dimensión de diseño en la lel de los docentes. Se presentan las implicaciones para el diseño del curso, basadas en los datos generados en el estudio.

Palabras clave:

evaluación de lenguas, literacidad en evaluación de lenguas, pruebas de lengua, desarrollo profesional docente, validez en la evaluación de lenguas.

Introduction

Language assessment literacy (henceforth lal) refers to the knowledge, skills, and principles that various stakeholders (e.g., language testers, applied linguists, language teachers) need to participate in assessment-related tasks, namely the design and evaluation of language tests (Davies, 2008; Fulcher, 2012; Kremmel & Harding, 2020). Existing research on lal has thoroughly described what this construct entails, even though it differs across stakeholder groups (Kremmel & Harding, 2020; Pill & Harding, 2013; Taylor, 2013). Regarding language teachers, scholars (Giraldo, 2022; Inbar-Lourie, 2013; Scarino, 2013; Stabler-Havener, 2018; Taylor, 2013) have suggested that teachers’ lal comprises, among other elements, the following:

Theoretical knowledge: models and frameworks of language ability; language pedagogy; concepts in language testing.

Contextual knowledge: personal beliefs, experiences, and needs; local assessment guidelines and practices.

Practical skills: designing assessment instruments for all language skills; analyzing assessment data; evaluating language performance reliably and validly; connecting assessment to teaching and learning.

Principles: ethics, fairness, democracy, and transparency.

Research on teachers’ lal has also focused on the skills and needs these stakeholders have (Fulcher, 2012; Sultana, 2019; Tsagari & Vogt, 2017; Vogt & Tsagari, 2014). Findings in this area have shown that, in general, teachers do not feel sufficiently prepared for assessment. As authors have suggested, this may be a result of inappropriate or non-existent training in language assessment at the pre-service level (Lam, 2015; Vogt & Tsagari, 2014). Thus, a research avenue that has been solidifying in recent years is the development and implementation of lal training for teachers.

Overall, as might be expected, lal training has had an overwhelmingly positive impact on teachers. Specifically, they develop an awareness of concepts related to language testing (Baker & Riches, 2018; Boyd & Donnarumma, 2018; Kremmel et al., 2018); learn how to design professional assessment instruments (Koh et al., 2018; Montee et al., 2013; Restrepo, 2021); and analyze how assessment can positively impact teaching and learning (Arias et al., 2012; Baker & Riches, 2018).

Although research has clearly shown the impact of lal training on teachers, a gap remains in research on how such lal training initiatives—whether in the form of workshops or courses—have been designed. Montee et al. (2013), for example, mentioned that they used a questionnaire and the analysis of assessment instruments to create an assessment course for teachers of less commonly taught languages. Similarly, Arias and Maturana (2005) conducted a diagnostic study on language teachers’ assessment practices and found problems regarding teachers’ theoretical and practical knowledge for language assessment. The findings of this study led to a language assessment training initiative, reported in Arias et al. (2012); however, Arias et al. did not discuss how such lal training was designed.

Perhaps more importantly, research on the design of lal courses could provide valuable feedback to advance lal pedagogies—a current call in lal research (Fulcher, 2020; Giraldo, 2022). Thus, the purpose of this article is to report on the design of an online language assessment course for in-service English language teachers in Colombia. Two related research questions guided the design of the course:

1. What are the language assessment literacy needs of a group of English teachers in Colombia?

2. How can these needs inform the design of an assessment course for these teachers?

This article begins with a literature review focusing on pedagogical approaches to fostering teachers’ lal. It then presents the research context and methodology, followed by findings and related discussions. The article concludes with final reflections and, most importantly, implications for the design of the online assessment course for the teachers in this study.

Literature Review

lal Training for Teachers

Davies (2008) and Malone (2017) explain that a major source of lal is language testing textbooks. As these authors suggest, the field has shown increasing concern with providing lal for classroom teachers, which is evident in textbooks that directly target these stakeholders. Additionally, they explain that while there remains an emphasis on theoretical and technical aspects of testing, there is also growing attention to the critical and social dimensions of assessment—i.e., the principles side of lal (e.g., ethics and fairness).

Language testing courses in undergraduate and graduate programs are another source of lal reported in the literature. Findings in this aspect show that these courses, like textbooks, tend to focus on theoretical and practical aspects of assessment, while not placing sufficient emphasis on principles (Brown & Bailey, 2008; Fulcher, 2012; Lam, 2015; O’Loughlin, 2006).

The Nature of Assessment Courses for Language Teachers

A review of assessment courses for language teachers (for in-depth reviews, see Giraldo, 2021; Gan & Lam, 2022) suggests three trends in the literature. First, courses for teachers traditionally focus on assessment qualities such as reliability and validity, as well as the purposes that assessment serves (Arias et al., 2012; Delgado & Rodriguez, 2022; Kleinsasser, 2005; Montee et al., 2013; Nier et al., 2009; O’Loughlin, 2006). Second, most courses place a strong emphasis on analyzing assessment instruments to promote lal, followed by the design of contextualized assessments (Baker & Riches, 2018; Kleinsasser, 2005; Giraldo et al., 2023; Koh et al., 2018; Kremmel et al., 2018; Levi & Inbar-Lourie, 2020; Montee et al., 2013; Restrepo, 2021). Finally, in conjunction with the practical aspect of these courses, teachers’ lal is evaluated through the analysis of the test tasks and instruments they create (Baker & Riches, 2018; Levi & Inbar-Lourie, 2020).

As explained earlier, there are few studies reporting—in a detailed manner—on the actual design of assessment courses for language teachers. Thus, in the following sections, this article describes how an online language assessment course for in-service English teachers in Colombia was developed.

Methodology

A convergent mixed-methods approach was employed, whereby quantitative and qualitative data were collected to strengthen the findings and support informed decisions for course design. Mixed-methods approaches allow for thorough descriptions of phenomena, as data from various angles support broader and more nuanced interpretations (Edmonds & Kennedy, 2017; Ivankova & Greer, 2015). In the present study, qualitative data came from the analysis of assessment instruments that teachers shared, while the quantitative data were gathered through a questionnaire completed by the teachers. The research approach and findings from this study may provide useful insights for a larger audience: Other researchers interested in designing online assessment courses for language teachers may draw on these findings to inform their own lal contexts.

Context of the Study

Various scholars in Colombia have called for the professional development of language teachers in lal, at both the pre- and in-service levels (Giraldo & Murcia, 2019; Hernández-Ocampo, 2022; Herrera & Macías, 2015; López & Bernal, 2009). However, few studies report on lal training specifically for in-service language teachers in this country. In fact, most existing research has focused on lal courses for pre-service teachers (Giraldo & Murcia, 2019; Jaramillo-Delgado & Gil-Bedoya, 2019; Restrepo, 2020), although a growing body of lal research on in-service teacher training is emerging (Giraldo, 2024; Restrepo, 2021).

In response to the need to foster in-service teachers’ lal in Colombia, a project was proposed. The project was divided into two major stages: 1) a diagnostic stage to collect data and design the lal course, and 2) an implementation stage to deliver it. The findings presented in this article are based on the diagnostic stage, which served as feedback for the design of the Online Language Assessment Course (henceforth olac).

To gather participants for the olac, information about it was emailed to the Secretarías de Educación (Offices of Education) of all 32 departments of Colombia. Additionally, the information was shared via academic contacts (professors and academics) and through personal social media (Facebook and WhatsApp). Any English language teacher in Colombia was eligible to participate, as the olac was initially envisioned as a mooc (massive open online course).

A comprehensive informed consent letter—available in both English and Spanish—was included in the call for participants to ensure that teachers were fully aware of the project’s scope and the implications of their participation in the olac. Data collection took place in March 2023, and data analysis—used to inform the olac design—took place between May and July 2023.

Participants

Forty English language teachers from various regions of Colombia participated in the diagnostic stage of the aforementioned project. They were English language teachers in various educational contexts, ranging from elementary schools to universities. Figures 1 to 3 present information about participants’ locations (by department), their years of experience teaching English, and the kinds of learners they taught at the time. Figure 4 provides information regarding teachers’ prior training in language assessment.

Figure 1: Number of Teachers Participating in Diagnostic Stage (per Department)

Note: n/a means not applicable.

Figure 2: Experience Teaching English

Figure 3: Types of Learners the Participants Teach (ranked)

Figure 4: Prior Training in Language Assessment

Corpus

To participate in the olac, the teachers were asked to share two assessment instruments: one for receptive skills (listening or reading) and one for productive skills (speaking or writing). In total, the teachers shared 80 documents, of which 40 were used in the final analysis.

Instrument and Procedures

Questionnaire. All participants completed an online questionnaire (designed using Google Forms), which was divided into five sections:

  • Section 1 (9 items) gathered background information (e.g., name, email, place of work).

  • Section 2 (8 items) focused on prior training in language assessment.

  • Section 3 (9 items) asked teachers about the types of assessment activities in which they had been involved.

  • Section 4 (3 open-ended items) explored teachers’ learning goals, challenges in assessment, and expectations for the course.

  • Section 5 (32 items) presented a list of suggested course topics—31 closed-ended and one open-ended item. The response scale ranged from 1 = Not important at all to 5 = Extremely important.

Appendix A includes the descriptive statistics for all 31 closed-ended items in Section 5.

The questionnaire used in this study was based on Giraldo and Yan (2023) and piloted with 19 teachers. During the pilot, Cronbach’s alpha was calculated at 0.92, which suggests a satisfactory level of internal consistency (Dörnyei & Taguchi, 2010). The teachers in the pilot exercise also provided comments on the usefulness of the questionnaire for designing the olac. They agreed that the instrument was fit for purpose and that it included topics they found relevant to their assessment practice. One minor change was made to the questionnaire based on the teachers’ feedback: an age range that was missing in Section 1 was added. In the final administration of this questionnaire with the 40 teachers, Cronbach’s alpha was 0.90, which reiterated the satisfactory internal consistency of the instrument.
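For reference, Cronbach’s alpha for a scale with k items is conventionally computed as follows (this is the standard formula, not one reported in the article; k here would presumably be the 31 closed-ended items in Section 5):

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)$$

where $\sigma^{2}_{Y_i}$ is the variance of item $i$ and $\sigma^{2}_{X}$ is the variance of respondents’ total scores. Values around .90, as obtained in both administrations, are conventionally interpreted as high internal consistency.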

Analysis grid for the assessment instruments. A simple table was used to evaluate the strengths and areas for improvement in the assessment documents the teachers shared. This table included three columns: a code for each assessment instrument (e.g., Teacher1L), a section to record strengths, and a section to record aspects to improve. Analyzing language assessments at face value is not an easy task. To approach it objectively, a set of design guidelines was used, the most important of which are summarized below (based on Alderson et al., 1995; Brown, 2011; Carr, 2011; Fulcher, 2010, among others):

For receptive skills, all the items in the reading/listening test:

  • have the potential to collect information about these skills and not others;

  • do not overlap, i.e., one item does not give away the answer to another;

  • are unambiguous and written in clear and correct language;

  • do not have more than one correct answer in multiple-choice or true-false formats, unless stated otherwise in the instrument.

For productive skills, the assessments for speaking/writing:

  • include language-based criteria;

  • do not have overlapping criteria;

  • are based on a realistic or performance-based task;

  • avoid the assessment of construct-irrelevant aspects;

  • include clear scoring, where applicable.

Of the 80 documents shared, 40 instruments were appropriate for content analysis—that is, they were skills-based language assessments. Table 1 shows how the 80 documents were categorized.

Table 1: Categorization of Assessment Documents

Data Analysis

The qualitative data in this study came from the content in the corpus of language assessments and the open-ended items in the questionnaire. To analyze these data, qualitative content analysis (qca) was used. According to Schreier (2012), in qca, a coding frame can be used, which is composed of main categories (or dimensions), subcategories, and units of coding. The frame can be arrived at through data-driven and concept-driven (theoretical) analysis. Table 2 is an example of how qca was used for the data in the present research report.

Table 2: Approach for Qualitative Content Analysis

The quantitative data were analyzed using descriptive statistics, including mean, median, range, and ranks. These calculations were done for Section 5 of the questionnaire, which asked teachers to rate the importance of various topics to be included in the olac. In line with the mixed-methods approach of this study, data from the questionnaire and the qca were grouped to account for the major findings, which are presented and discussed in the next section.
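As an illustration only (the article does not include analysis scripts), the descriptive statistics and ranks for the Section 5 ratings could be computed along the following lines; the topic names and ratings below are hypothetical:

```python
# Illustrative sketch: hypothetical ratings for three of the 31 Section 5 topics,
# on the questionnaire's scale of 1 (Not important at all) to 5 (Extremely important).
import pandas as pd

ratings = pd.DataFrame({
    "assessing_receptive_skills": [5, 5, 4, 5, 5],
    "assessing_productive_skills": [5, 4, 5, 3, 5],
    "creating_test_items": [4, 4, 5, 5, 4],
})

# Descriptive statistics of the kind reported in the study: mean, median, range.
summary = pd.DataFrame({
    "mean": ratings.mean().round(2),
    "median": ratings.median(),
    "range": ratings.max() - ratings.min(),
})

# Rank topics by mean importance (1 = most important).
summary["rank"] = summary["mean"].rank(ascending=False, method="min").astype(int)
print(summary.sort_values("rank"))
```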

Findings and Discussion

This section presents and discusses the findings emerging from the qualitative and quantitative data on the Colombian English teachers’ feedback used to design an online language assessment course. To structure the report, results from the qualitative content analysis are presented first, followed by quantitative results from the questionnaire. Each finding is then discussed against relevant existing literature.

Issues with the Design of Receptive Skills Assessments

The analysis of the 34 assessment instruments for receptive skills led to the identification of design issues that could make the use of these instruments problematic for professional language assessment. Construct irrelevance was a common design issue across the instruments and test methods: various items and/or tasks for reading/listening seemed to assess constructs unrelated to these receptive skills. For example, Sample 1 below comes from a reading instrument, yet it is designed to assess how to use the words more and than in comparative sentences. Sample 2, on the other hand, was meant to assess listening comprehension; however, the item can be answered without listening, as it assesses the ability to identify a synonym for a word in isolation.

Sample 1, from Teacher8R (gap filling):

Complete with more or than.

  • Stella thinks Simon’s homework was _________ interesting _________ hers.

Sample 2, from Teacher20L (multiple choice):

A synonym for “Snooze” is

  • a. Take a nap

  • b. Scamper

  • c. Sunny Back Porch

Another common issue in the instruments, particularly in multiple-choice and true-false items, was that items did not have one unambiguously correct answer, as they should. For example, in Sample 3, the teacher includes options that overlap.

Sample 3, from Teacher9L (multiple choice):

The 5 newly industrialised countries mentioned are:

  • a. Brazil, China, Taiwan, South Korea and India

  • b. India, China, Brazil, South Korea and Japan

  • c. China, South Korea, Taiwan, Brazil and Vietnam

  • d. India, China, Thailand, Brazil and South Korea

Let us suppose that the right answer is (d). However, the countries mentioned in (d) are also mentioned in (a), (b), and (c), which means these other options are partially correct. Similarly, in Sample 4 below, the right answer is not clear because there is none: the item asks for information that is not stated in the text students were supposed to read, so it can be marked neither true nor false.

Sample 4, from Teacher11R (true-false):

Text to be read (only the segment that mentions information about Gustavo):

  • My dad Gustavo isn’t a doctor, he’s a teacher. My mother is very good treating patients and my father is the best teaching English, Spanish and French.

Item:

  • Gustavo speaks German, Italian and Latin. True or False

According to the information in the text, one cannot state whether it is true or false that Gustavo speaks any of the three languages.

The teachers’ answers in the questionnaire reiterated a need regarding the design of assessments for receptive skills. Table 3 below shows the results from ranking the 31 topics for the online assessment course, where assessing receptive skills was ranked third and creating test items eighth. Furthermore, the range of 1 for assessing receptive skills means that the teachers’ ratings varied by at most one point; that is, the 40 teachers agreed on the importance of this topic for the course. The first-ranked topic is included in the table because it directly relates to assessing receptive skills.

Table 3: Top Ranked Topics Related to Assessing Receptive Skills

Note: Table 3 only includes those rankings that are relevant for this finding.

Other research studies have suggested that teacher-made assessments of receptive language skills tend to have the same issues described above: construct irrelevance and ambiguity in answer keys (Arias & Maturana, 2005; Frodden et al., 2005; Giraldo et al., 2023). The design of useful reading or listening tests involves adherence to numerous guidelines, like the ones summarized in the analysis grid above or those discussed by other authors (Alderson et al., 1995; Brown, 2011; Carr, 2011); thus, the teachers who designed these assessments may not have had the training needed to design useful tests of reading or listening comprehension (see Figure 4 for details) or may simply have overlooked specific design guidelines. Finally, based on the content analysis and the results in Table 3, the design of assessments for receptive language skills seemed to be a must for the design of the online course.

Issues with the Design of Productive Skills Assessments

Whereas 34 assessment instruments for receptive skills were appropriate for analysis, only six assessments of productive skills were analyzed: four speaking tests and two writing tests. Three main reasons explain this low number. First, out of the 40 documents shared for productive skills, 21 did not assess these skills: nine sought to assess knowledge of vocabulary (e.g., providing the correct written translation for words in Spanish); seven targeted knowledge of grammar (e.g., forming syntactically correct sentences through unscrambling); three were exercises for students to translate sentences from Spanish into English; and two were lesson plans (see Table 1 for how the documents were classified). Second, nine assessments only included the task for learners to perform but did not specify assessment criteria in any way; thus, these nine documents were, in reality, tasks and not assessments. Finally, four teachers did not provide any document related to the assessment of productive language skills.

Issues with construct definition in speaking or writing were common across all six analyzed assessments. On the one hand, four tests included the criteria to be judged in either writing or speaking performances; however, there was no clear description of what aspects of the stated criteria were to be judged. For instance, in Sample 5 below, the teacher sought to assess writing skills within a tale-like genre. The assessment criteria were content, accuracy, vocabulary, and punctuation. However, the rubric does not specify what aspects of these criteria were assessed; for example, there were no details regarding which punctuation marks were the targets for this assessment task.

Sample 5, from Teacher15W

WRITING TEST SCORE:

Read the Canterville’s Ghost again. Then, write a paragraph giving a different end to the story.

Grading criteria:

Content: ___/10 | Accuracy: __/10 | Vocabulary: __/10 | Punctuation: __/10

On the other hand, the other two tests included criteria unrelated to the construct of language ability; in other words, they assessed other aspects in addition to speaking or writing skills. Sample 6 is a rubric that was used to assess speaking; in this instrument, two criteria, visual aids and preparedness, are not related to the construct of language ability or communicative competence. Suppose, hypothetically, that a student is placed at the Superior level on all the criteria. Because two of the rubric’s four criteria (the others being grammar and pronunciation) are unrelated to language, 50% of the performance would be attributed to constructs that are not language skills.

Sample 6, from Teacher25S

Note: The descriptors above were translated from Spanish. The descriptors for grammar and pronunciation were not included as they were not relevant to the present analysis and discussion.

In the open answers to the questionnaire, teachers reported that assessing productive skills and rubric creation were aspects they expected to see in the course. For example, Teacher35 stated that they wanted to learn about “establishing criteria when assessing speaking”, whereas Teacher21 reported that one of their challenges “is how to evaluate the speaking skill.” Regarding rubric creation, Teacher17 stated that they “would like to learn more about qualitative evaluation, how to create rubrics”. Finally, assessing productive skills was ranked fourth among the 31 topics in the questionnaire; the mean for this item was 4.7, the median 5, the mode 5, and the range 2, suggesting that the forty teachers found this topic very important for the course.

Other research studies have shown that, when it comes to assessing productive language skills, teachers do seem to have trouble clarifying the construct to be assessed (Arias et al., 2012; Giraldo et al., 2023; Levi & Inbar-Lourie, 2020). As the data above suggests, this issue was also present in the instruments that were scrutinized. However, when it comes to including constructs unrelated to language (e.g., preparedness), two perspectives must be considered.

On the one hand, teachers in this study may need to include these other aspects as part of institutional or even classroom policies for assessment. While these other criteria may pose validity issues for interpreting results from language assessments, Brookhart (2003) argues that in classroom assessment, context is construct-relevant. On the other hand, in general educational assessment, assessing skills unrelated to a class or course is called score pollution, which may be considered an unethical practice (Green et al., 2007; Rasooli et al., 2019); in language testing, this is called construct irrelevance and is considered a threat to validity (Fulcher, 2010; Messick, 1989). In conclusion, the assessment of productive language skills must be addressed in the course, with specific attention given to construct definition; most importantly, the course should address to what extent assessing constructs unrelated to language ability is valid and ethical within teachers’ specific institutional and/or personal assessment ecosystems.

Instrument Design as a Major lal Need

Especially in the open items of the questionnaire, the most common theme revolved around the need to learn or develop skills for designing assessment instruments. This lal need was expressed as a learning want, a challenge, or a course expectation. The teachers used words such as tools or instruments when providing this feedback for designing the online course. For instance, the sample below shows how one teacher connects the design of assessments to language learning and development within their educational context:

Teacher19

I would like to create tools that allow me to indisputably assess what my students learn as the foreign language lessons unfold. I also want to create tools to assess the students’ learning performance after they finish modules and lessons set in any foreign language learning course.

Additionally, Teacher22 comments on their challenge when it comes to test design. According to this teacher, a challenge for them is “establishing criteria when assessing speaking, writing and designing a listening test.” Notice that assessing productive skills comes up as a need again. Finally, Teacher31 states that they expect to “learn to create practical and reliable assessment tools”; this answer further suggests the teachers’ need to have a design-based assessment course.

The quantitative results from the questionnaire may also be interpreted as feedback for designing a course in which the design of assessment instruments is a pillar. Table 4 presents three top-ranked topics that involve the design of language assessments; the low range of 2 suggests agreement among the teachers, who considered these topics very important for the course.

Table 4: Top Ranked Topics Related to Design as a Core Course Topic

Note: Table 4 only includes those rankings that are relevant for this finding.

Perhaps not surprisingly, language teachers expect their learning and professional development to be grounded in a practical approach. This expectation has, in fact, been observed in lal research studies, which suggest that teachers want to learn about practical aspects of assessment (Fulcher, 2012; Malone, 2013). lal courses, as the literature has shown, have responded to this call and have been based mostly on practical test analysis and test design tasks (see the section titled The Nature of Assessment Courses for Language Teachers for a review).

Limitations

Two limitations emerging from this research must be addressed. First, the project with which this report is aligned was called “Designing a Massive Online Open Course (mooc) on Foreign Language Assessment”. The goal was to create a course that would welcome a high number of Colombian English language teachers, which is why the invitation to participate was sent to the Offices of Education in all 32 departments of Colombia. However, given that only 40 teachers participated in the diagnostic stage, the mooc became a small-scale assessment course. Therefore, the results from this sample are relevant to the ecology of this course, and no generalizations can be made to other contexts. Notwithstanding the small scale of this research, the findings may provide feedback for stakeholders interested in designing online assessment courses for language teachers elsewhere and, more specifically, in the Colombian context of language teacher education.

The second limitation relates to the nature of the research design. The lal of the 40 teachers in this study was examined through a questionnaire and a content analysis of the 80 assessment documents they shared, of which 40 instruments (34 for receptive skills and six for productive skills) were appropriate for scrutiny. The information from these two research methods can in no way provide a complete picture of teachers’ lal, especially as this construct is developmental (Baker, 2021; Yan & Fan, 2021); thus, the data in this study, while practical and useful for lal course design, are somewhat limited. Perhaps in-depth, individual interviews might have provided more data for course design and more nuance as to the teachers’ situated lal. However, additional research methods would have made the initial project (the mooc explained above) highly resource-heavy and time-consuming.

Conclusions and Implications

The purpose of the present research article was to report on the description of 40 Colombian English language teachers’ lal. The information provided by these teachers became the basis for designing an online assessment course for these key stakeholders in lal research. The convergent analysis of 40 assessment instruments and the results from the questionnaire led to two conclusions: on the one hand, there was a lack of clarity in the constructs to be assessed and how they were assessed; on the other hand, the assessment of constructs irrelevant to language ability was common. Therefore, the professional design of assessments emerged as a major need from the data yielded by the two research methods.

Considering the data in this study, the online assessment course should be designed with these instructional principles in mind:

Course sessions should address the design of assessment instruments for receptive and productive language skills.

Guidelines for designing items to assess receptive skills are to be part of the content in the course; for example, how to make items construct relevant; how to design options that do not overlap; and how to spot faulty items and argue why they are faulty.

Special consideration should be given to the theory and practice involved in designing useful rubrics to assess productive skills.

The theoretical concept of validity should be addressed from a contextual, technical, and ethical perspective; in other words, teachers should be taught how to make their assessments more construct-relevant while considering their assessment contexts. Additionally, the course should touch upon how construct irrelevance may lead to unethical or unfair practices in assessment.

Appendix B is the schedule with the weekly topics for the olac. Some commentary accompanies the schedule to substantiate the decision-making process that bridged research and instruction for the present lal initiative.