Evaluation of Technical Description Writing: An Assessment for ESP Learners in Engineering Programs
Abstract
This paper reports an empirical evaluation of a closed-book test (CBT) designed to assess technical description writing skills among first-year engineering students enrolled in an English for Specific Purposes (ESP) module. Grounded in Bachman and Palmer’s (1996) test usefulness framework, the study examines the assessment in terms of its validity, reliability, practicality, authenticity, interactiveness, and impact. The CBT required students to produce a written description of an electronic object using appropriate terminology, to evaluate the product critically, and to suggest improvements. Test development involved content expert validation, internal and external moderation, and alignment with ESP module outcomes. Data were collected through test scripts from the entire student cohort (N = 34), expert content validity index (CVI) ratings, post-test survey responses (Likert-scale and open-ended items), and moderators’ comments. Analysis included blind marking of all test scripts by two examiners using a standardised analytic rubric, a paired-samples t-test for inter-rater reliability (p = 0.163), and exploratory factor analysis for construct validity. The mixed-methods approach combined quantitative analysis (survey ratings, statistical tests) with qualitative analysis of open-ended survey responses and moderator feedback. The post-test student survey yielded consistently high mean scores (4.1–4.5) across all six usefulness dimensions. The evaluation confirmed the CBT’s overall usefulness across all six dimensions through multiple validation methods, with 85% of students affirming its effectiveness in improving their technical writing skills. Limitations include the small sample size, single-institution context, and potential response bias. Future research should focus on scaling the CBT model across institutions and disciplines, implementing hybrid automated scoring systems, refining rubric analytics, and conducting longitudinal studies to examine skill transfer to professional contexts.
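For readers unfamiliar with the reported statistics, the sketch below illustrates how a scale-level CVI (in the Polit and Beck, 2006, convention) and a paired-samples t-test on two examiners’ marks are typically computed. It is a minimal, illustrative Python example, not the authors’ analysis code: the expert ratings and examiner scores are hypothetical placeholder data, and the numpy/scipy calls simply mirror the procedures named in the abstract.

# Illustrative sketch only; all data below are hypothetical placeholders.
import numpy as np
from scipy.stats import ttest_rel

# --- Content validity index (Polit & Beck, 2006 convention) ---
# Rows = test items/criteria, columns = expert panellists; ratings on a
# 4-point relevance scale, where a rating of 3 or 4 counts as "relevant".
expert_ratings = np.array([
    [4, 4, 3, 4],   # item 1
    [3, 4, 4, 4],   # item 2
    [4, 3, 4, 3],   # item 3
])
relevant = (expert_ratings >= 3).astype(float)
i_cvi = relevant.mean(axis=1)      # item-level CVI: proportion of experts rating 3 or 4
s_cvi_ave = i_cvi.mean()           # scale-level CVI, averaging method
print(f"I-CVI per item: {i_cvi}, S-CVI/Ave: {s_cvi_ave:.2f}")

# --- Inter-rater reliability: paired-samples t-test on the two examiners' marks ---
# One total score per script and per examiner (N = 34 scripts in the study; dummy data here).
rng = np.random.default_rng(0)
examiner_1 = rng.normal(loc=65, scale=8, size=34).round()
examiner_2 = examiner_1 + rng.normal(loc=0, scale=3, size=34).round()
t_stat, p_value = ttest_rel(examiner_1, examiner_2)
# A non-significant p-value (the study reports p = 0.163) indicates no systematic
# difference between the two examiners' marking.
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")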
References
Adams, D. J. (2014). Clarity, organisation, precision, economy: A technical writing guide for engineers. University of New Haven, Civil Engineering Faculty Book Series. https://digitalcommons.newhaven.edu/cgi/viewcontent.cgi?article=1000&context=civilengineering-books
Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L. III, & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22(7), 861–876. https://doi.org/10.1002/acp.1391
Ahmadjavaheri, Z., & Zeraatpishe, M. (2020). The impact of construct-irrelevant factors on the validity of reading comprehension tests. International Journal of Language Testing, 10(1), 1–10. https://www.ijlt.ir/article_114277_2cf1a67513f425e6dea47b4b181aae97.pdf
Alqurashi, F. (2022). ESP writing teachers’ beliefs and practices on WCF: Do they really meet? Journal of Language and Linguistic Studies, 18, 569–593. http://www.jlls.org
Anthony, L. (2018). Introducing English for specific purposes. Routledge.
Artemeva, N. (2009). Stories of becoming: A study of novice engineers learning genres of their profession. In C. Bazerman, A. Bonini, & D. Figueiredo (Eds.), Genre in a changing world: Perspectives on writing (pp. 158–178). The WAC Clearinghouse and Parlor Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford University Press.
Badjadi, N. E. I. (2013). Conceptualising essay tests’ reliability and validity: From research to theory. ERIC. https://files.eric.ed.gov/fulltext/ED542099.pdf
Barkaoui, K. (2008). Effects of scoring method and rater experience on ESL essay rating processes and outcomes [Doctoral dissertation, University of Toronto]. TSpace. https://utoronto.scholaris.ca/items/5a310a23-8dd2-473d-bf47-17d5103612a7
Bartlett, M. S. (1954). A note on the multiplying factors for various χ2 approximations. Journal of the Royal Statistical Society. Series B (Methodological), 16(2), 296–298. https://doi.org/10.1111/j.2517-6161.1954.tb00174.x
Bazerman, C. (1988). Shaping written knowledge: The genre and activity of the experimental article in science. University of Wisconsin Press.
Bobek, E., & Tversky, B. (2016). Creating visual explanations improves learning. Cognitive Research: Principles and Implications, 1, 1–14. https://doi.org/10.1186/s41235-016-0031-6
Carifio, J., & Perla, R. J. (2009). A critique of the theoretical and empirical literature of the use of diagrams, graphs, and other visual aids in the learning of scientific-technical content from expository texts and instruction. Interchange, 40(4), 403–436. https://doi.org/10.1007/s10780-009-9102-7
Cheng, L. (2004). Washback or backwash: A review of the impact of testing on teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.), Washback in language testing (pp. 25–40). Routledge. https://doi.org/10.4324/9781410609731-9
Comrey, A. L., & Lee, H. B. (2013). A first course in factor analysis. Psychology Press.
Davis, L. L. (1992). Instrument review: Getting the most from a panel of experts. Applied Nursing Research, 5(4), 194–197. https://doi.org/10.1016/S0897-1897(05)80008-4
Dobrin, D. N. (2019). What’s technical about technical writing? In P. V. Anderson, R. J. Brockmann, & C. R. Miller (Eds.), New essays in technical and scientific communication (pp. 227–250). Routledge.
Douglas, D. (2000). Assessing languages for specific purposes. Cambridge University Press.
Dudley-Evans, T., & St. John, M. J. (1998). Developments in English for specific purposes. Cambridge University Press.
Ewer, J. R., & Latorre, G. (1969). A course in basic scientific English. Longman.
Faseeh, M., Nadeem, M., & Arif, M. (2024). Hybrid approach to automated essay scoring: Integrating deep learning embeddings with handcrafted linguistic features for improved accuracy. Mathematics, 12(21), Article 3416. https://doi.org/10.3390/math12213416
Flowerdew, J. (2016). English for specific academic purposes (ESAP) writing: Making the case. Writing & Pedagogy, 8(1), 5–32. https://doi.org/10.1558/wap.v8i1.30051
Fulcher, G., & Davidson, F. (2007). Language testing and assessment. Routledge.
Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019). When to use and how to report the results of PLS-SEM. European Business Review, 31(1), 2–24. https://doi.org/10.1108/EBR-11-2018-0203
Hughes, A. (2020). Testing for language teachers. Cambridge University Press.
Hyland, K. (2006). English for academic purposes: An advanced resource book. Routledge.
Hyland, K. (2019). Second language writing. Cambridge University Press.
Hyland, K., & Jiang, F. K. (2017). Is academic writing becoming more informal? English for Specific Purposes, 45, 40–51. https://doi.org/10.1016/j.esp.2016.09.001
Jönsson, A., Balan, A., & Hartell, E. (2021). Analytic or holistic? A study about how to increase the agreement in teachers’ grading. Assessment in Education: Principles, Policy & Practice, 28(3), 212–227. https://doi.org/10.1080/0969594X.2021.1884041
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36. https://doi.org/10.1007/BF02291575
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275–304. https://doi.org/10.1177/0265532208101008
Knoch, U., & Elder, C. (2010). Validity and fairness implications of varying time conditions on a diagnostic test of academic English writing proficiency. System, 38(1), 63–74. https://doi.org/10.1016/j.system.2009.12.006
Knoch, U., & Macqueen, S. (2020). Assessing English for professional purposes. Routledge.
Korolyova, L. Y. (2017). The discursive approach to developing tests for ESP assessment in higher educational institutions. Educational Studies Moscow, 2017(4), 167–172. https://doi.org/10.17277/voprosy.2017.04
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84–99. https://doi.org/10.1037/1082-989X.4.1.84
Malmström, H., Pecorari, D., & Shaw, P. (2018). Words for what? Contrasting university students’ receptive and productive academic vocabulary needs. English for Specific Purposes, 50, 28–39. https://doi.org/10.1016/j.esp.2017.11.002
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
Naqvi, S., Srivastava, R., Al Damen, T., Al Aufi, A., Al Amri, A., & Al Adawi, S. (2023). Establishing reliability and validity of an online placement test in an Omani higher education institution. Languages, 8(1), Article 61. https://doi.org/10.3390/languages8010061
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
Quah, B., Zheng, L., Sng, T. J. H., Yong, C. W., & Islam, I. (2024). Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations. BMC Medical Education, 24, Article 962. https://doi.org/10.1186/s12909-024-05881-6
Rachmawati, D. L., & Hastari, S. (2022). Formative assessment as an innovative strategy to develop ESP students’ writing skills. Voices of English Language Education Society, 6(1), 78–90. https://doi.org/10.29408/veles.v6i1.5174
Rahman, K. A., Seraj, P. M. I., Hasan, M. K., & Rahman, M. M. (2021). Washback of assessment on English teaching-learning practice at secondary schools. Language Testing in Asia, 11(1), Article 12. https://doi.org/10.1186/s40468-021-00129-2
Ramineni, C., & Williamson, D. (2018). Understanding mean score differences between the e-rater® automated scoring engine and humans for demographically based groups in the GRE® general test. ETS Research Report Series, 2018(1), 1–31. https://doi.org/10.1002/ets2.12192
Rashtchi, M., & Khoshnevisan, B. (2020). Lessons from critical thinking: How to promote thinking skills in EFL writing classes. European Journal of Foreign Language Teaching, 5(1), Article 3153. https://doi.org/10.46827/ejfl.v5i1.3153
Sèna, U. O. (2022). Appraising the challenges related to the teaching of ESP to advanced learners in Beninese higher education. Journal of English Language and Literature, 9(1), 118–136. https://doi.org/10.54513/JOELL.2022.9114
Stevens, J. P. (2012). Applied multivariate statistics for the social sciences. Taylor & Francis. https://doi.org/10.4324/9780203843130
Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge University Press.
Tabachnick, B. G., Fidell, L. S., & Ullman, J. B. (2019). Using multivariate statistics (7th ed.). Pearson.
Tomas, C., Whitt, E., Lavelle-Hill, R., & Severn, K. (2019). Modeling holistic marks with analytic rubrics. Frontiers in Education, 4, Article 89. https://doi.org/10.3389/feduc.2019.00089
Wang, Q., & Gayed, J. M. (2024). Effectiveness of large language models in automated evaluation of argumentative essays: Finetuning vs. zero-shot prompting. Computer Assisted Language Learning. Advance online publication. https://doi.org/10.1080/09588221.2024.2371395
Weigle, S. C. (2002). Assessing writing. Cambridge University Press.
Williamson, D. M., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13. https://doi.org/10.1111/j.1745-3992.2011.00223.x
Winke, P., & Lim, H. (2017). The effects of test preparation on second-language listening test performance. Language Assessment Quarterly, 14(4), 380–397. https://doi.org/10.1080/15434303.2017.1399396