Evaluation of Technical Description Writing: An Assessment for ESP Learners in Engineering Programs
Abstract
This paper reports an empirical evaluation of a closed-book test (CBT) designed to assess technical description writing skills among first-year engineering students enrolled in an English for Specific Purposes (ESP) module. Grounded in Bachman and Palmer’s (1996) test usefulness framework, the study examines the assessment in terms of its validity, reliability, practicality, authenticity, interactiveness, and impact. The CBT required students to produce a written description of an electronic object using appropriate terminology, to evaluate the product critically, and to suggest improvements. Test development involved content expert validation, internal and external moderation, and alignment with ESP module outcomes. Data were collected through test scripts from the entire student cohort (N = 34), expert content validity index (CVI) ratings, post-test survey responses (Likert-scale and open-ended items), and moderators’ comments. Analysis included blind marking of all test scripts by two examiners using a standardised analytic rubric, a paired-samples t-test for inter-rater reliability (p = 0.163), and exploratory factor analysis for construct validity. The mixed-methods approach combined quantitative analysis (survey ratings, statistical tests) with qualitative analysis of open-ended survey responses and moderator feedback. The post-test student survey yielded consistently high mean scores (4.1–4.5) across all six usefulness dimensions. The evaluation confirmed the CBT’s overall usefulness across all six dimensions through multiple validation methods, with 85% of students affirming its effectiveness in improving their technical writing skills. Limitations include the small sample size, single-institution context, and potential response bias. Future research should focus on scaling the CBT model across institutions and disciplines, implementing hybrid automated scoring systems, refining rubric analytics, and conducting longitudinal studies to examine skill transfer to professional contexts.
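For readers unfamiliar with the reported statistics, the sketch below illustrates how a scale-level CVI (in the Polit and Beck, 2006, convention) and a paired-samples t-test on two examiners’ marks are typically computed. It is a minimal, illustrative Python example, not the authors’ analysis code: the expert ratings and examiner scores are hypothetical placeholder data, and the numpy/scipy calls simply mirror the procedures named in the abstract.

# Illustrative sketch only; all data below are hypothetical placeholders.
import numpy as np
from scipy.stats import ttest_rel

# --- Content validity index (Polit & Beck, 2006 convention) ---
# Rows = test items/criteria, columns = expert panellists; ratings on a
# 4-point relevance scale, where a rating of 3 or 4 counts as "relevant".
expert_ratings = np.array([
    [4, 4, 3, 4],   # item 1
    [3, 4, 4, 4],   # item 2
    [4, 3, 4, 3],   # item 3
])
relevant = (expert_ratings >= 3).astype(float)
i_cvi = relevant.mean(axis=1)      # item-level CVI: proportion of experts rating 3 or 4
s_cvi_ave = i_cvi.mean()           # scale-level CVI, averaging method
print(f"I-CVI per item: {i_cvi}, S-CVI/Ave: {s_cvi_ave:.2f}")

# --- Inter-rater reliability: paired-samples t-test on the two examiners' marks ---
# One total score per script and per examiner (N = 34 scripts in the study; dummy data here).
rng = np.random.default_rng(0)
examiner_1 = rng.normal(loc=65, scale=8, size=34).round()
examiner_2 = examiner_1 + rng.normal(loc=0, scale=3, size=34).round()
t_stat, p_value = ttest_rel(examiner_1, examiner_2)
# A non-significant p-value (the study reports p = 0.163) indicates no systematic
# difference between the two examiners' marking.
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")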
References
Adams, D. J. (2014). Clarity, organisation, precision, economy: A technical writing guide for engineers. University of New Haven, Civil Engineering Faculty Book Series. https://digitalcommons.newhaven.edu/cgi/viewcontent.cgi?article=1000&context=civilengineering-books
Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L. III, & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22(7), 861–876. https://doi.org/10.1002/acp.1391
Ahmadjavaheri, Z., & Zeraatpishe, M. (2020). The impact of construct-irrelevant factors on the validity of reading comprehension tests. International Journal of Language Testing, 10(1), 1–10. https://www.ijlt.ir/article_114277_2cf1a67513f425e6dea47b4b181aae97.pdf
Alqurashi, F. (2022). ESP writing teachers’ beliefs and practices on WCF: Do they really meet? Journal of Language and Linguistic Studies, 18, 569–593. http://www.jlls.org
Anthony, L. (2018). Introducing English for specific purposes. Routledge.
Artemeva, N. (2009). Stories of becoming: A study of novice engineers learning genres of their profession. In C. Bazerman, A. Bonini, & D. Figueiredo (Eds.), Genre in a changing world: Perspectives on writing (pp. 158–178). The WAC Clearinghouse and Parlor Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford University Press.
Badjadi, N. E. I. (2013). Conceptualising essay tests’ reliability and validity: From research to theory. ERIC. https://files.eric.ed.gov/fulltext/ED542099.pdf
Barkaoui, K. (2008). Effects of scoring method and rater experience on ESL essay rating processes and outcomes [Doctoral dissertation, University of Toronto]. TSpace. https://utoronto.scholaris.ca/items/5a310a23-8dd2-473d-bf47-17d5103612a7
Bartlett, M. S. (1954). A note on the multiplying factors for various χ2 approximations. Journal of the Royal Statistical Society. Series B (Methodological), 16(2), 296–298. https://doi.org/10.1111/j.2517-6161.1954.tb00174.x
Bazerman, C. (1988). Shaping written knowledge: The genre and activity of the experimental article in science. University of Wisconsin Press.
Bobek, E., & Tversky, B. (2016). Creating visual explanations improves learning. Cognitive Research: Principles and Implications, 1, 1–14. https://doi.org/10.1186/s41235-016-0031-6
Carifio, J., & Perla, R. J. (2009). A critique of the theoretical and empirical literature of the use of diagrams, graphs, and other visual aids in the learning of scientific-technical content from expository texts and instruction. Interchange, 40(4), 403–436. https://doi.org/10.1007/s10780-009-9102-7
Cheng, L. (2004). Washback or backwash: A review of the impact of testing on teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.), Washback in language testing (pp. 25–40). Routledge. https://doi.org/10.4324/9781410609731-9
Comrey, A. L., & Lee, H. B. (2013). A first course in factor analysis. Psychology Press.
Davis, L. L. (1992). Instrument review: Getting the most from a panel of experts. Applied Nursing Research, 5(4), 194–197. https://doi.org/10.1016/S0897-1897(05)80008-4
Dobrin, D. N. (2019). What’s technical about technical writing? In P. V. Anderson, R. J. Brockmann, & C. R. Miller (Eds.), New essays in technical and scientific communication (pp. 227–250). Routledge.
Douglas, D. (2000). Assessing languages for specific purposes. Cambridge University Press.
Dudley-Evans, T., & St. John, M. J. (1998). Developments in English for specific purposes. Cambridge University Press.
Ewer, J. R., & Latorre, G. (1969). A course in basic scientific English. Longman.
Faseeh, M., Nadeem, M., & Arif, M. (2024). Hybrid approach to automated essay scoring: Integrating deep learning embeddings with handcrafted linguistic features for improved accuracy. Mathematics, 12(21), Article 3416. https://doi.org/10.3390/math12213416
Flowerdew, J. (2016). English for specific academic purposes (ESAP) writing: Making the case. Writing & Pedagogy, 8(1), 5–32. https://doi.org/10.1558/wap.v8i1.30051
Fulcher, G., & Davidson, F. (2007). Language testing and assessment. Routledge.
Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019). When to use and how to report the results of PLS-SEM. European Business Review, 31(1), 2–24. https://doi.org/10.1108/EBR-11-2018-0203
Hughes, A. (2020). Testing for language teachers. Cambridge University Press.
Hyland, K. (2006). English for academic purposes: An advanced resource book. Routledge.
Hyland, K. (2019). Second language writing. Cambridge University Press.
Hyland, K., & Jiang, F. K. (2017). Is academic writing becoming more informal? English for Specific Purposes, 45, 40–51. https://doi.org/10.1016/j.esp.2016.09.001
Jönsson, A., Balan, A., & Hartell, E. (2021). Analytic or holistic? A study about how to increase the agreement in teachers’ grading. Assessment in Education: Principles, Policy & Practice, 28(3), 212–227. https://doi.org/10.1080/0969594X.2021.1884041
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36. https://doi.org/10.1007/BF02291575
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275–304. https://doi.org/10.1177/0265532208101008
Knoch, U., & Elder, C. (2010). Validity and fairness implications of varying time conditions on a diagnostic test of academic English writing proficiency. System, 38(1), 63–74. https://doi.org/10.1016/j.system.2009.12.006
Knoch, U., & Macqueen, S. (2020). Assessing English for professional purposes. Routledge.
Korolyova, L. Y. (2017). The discursive approach to developing tests for ESP assessment in higher educational institutions. Educational Studies Moscow, 2017(4), 167–172. https://doi.org/10.17277/voprosy.2017.04
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84–99. https://doi.org/10.1037/1082-989X.4.1.84
Malmström, H., Pecorari, D., & Shaw, P. (2018). Words for what? Contrasting university students’ receptive and productive academic vocabulary needs. English for Specific Purposes, 50, 28–39. https://doi.org/10.1016/j.esp.2017.11.002
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Naqvi, S., Srivastava, R., Al Damen, T., Al Aufi, A., Al Amri, A., & Al Adawi, S. (2023). Establishing reliability and validity of an online placement test in an Omani higher education institution. Languages, 8(1), Article 61. https://doi.org/10.3390/languages8010061
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
Quah, B., Zheng, L., Sng, T. J. H., Yong, C. W., & Islam, I. (2024). Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations. BMC Medical Education, 24, Article 962. https://doi.org/10.1186/s12909-024-05881-6
Rachmawati, D. L., & Hastari, S. (2022). Formative assessment as an innovative strategy to develop ESP students’ writing skills. Voices of English Language Education Society, 6(1), 78–90. https://doi.org/10.29408/veles.v6i1.5174
Rahman, K. A., Seraj, P. M. I., Hasan, M. K., & Rahman, M. M. (2021). Washback of assessment on English teaching-learning practice at secondary schools. Language Testing in Asia, 11(1), Article 12. https://doi.org/10.1186/s40468-021-00129-2
Ramineni, C., & Williamson, D. (2018). Understanding mean score differences between the e-rater® automated scoring engine and humans for demographically based groups in the GRE® general test. ETS Research Report Series, 2018(1), 1–31. https://doi.org/10.1002/ets2.12192
Rashtchi, M., & Khoshnevisan, B. (2020). Lessons from critical thinking: How to promote thinking skills in EFL writing classes. European Journal of Foreign Language Teaching, 5(1), Article 3153. https://doi.org/10.46827/ejfl.v5i1.3153
Sèna, U. O. (2022). Appraising the challenges related to the teaching of ESP to advanced learners in Beninese higher education. Journal of English Language and Literature, 9(1), 118–136. https://doi.org/10.54513/JOELL.2022.9114
Stevens, J. P. (2012). Applied multivariate statistics for the social sciences. Taylor & Francis. https://doi.org/10.4324/9780203843130
Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge University Press.
Tabachnick, B. G., Fidell, L. S., & Ullman, J. B. (2019). Using multivariate statistics (7th ed.). Pearson.
Tomas, C., Whitt, E., Lavelle-Hill, R., & Severn, K. (2019). Modeling holistic marks with analytic rubrics. Frontiers in Education, 4, Article 89. https://doi.org/10.3389/feduc.2019.00089
Wang, Q., & Gayed, J. M. (2024). Effectiveness of large language models in automated evaluation of argumentative essays: Finetuning vs. zero-shot prompting. Computer Assisted Language Learning. Advance online publication. https://doi.org/10.1080/09588221.2024.2371395
Weigle, S. C. (2002). Assessing writing. Cambridge University Press.
Williamson, D. M., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13. https://doi.org/10.1111/j.1745-3992.2011.00223.x
Winke, P., & Lim, H. (2017). The effects of test preparation on second-language listening test performance. Language Assessment Quarterly, 14(4), 380–397. https://doi.org/10.1080/15434303.2017.1399396