Corpora of Vietnamese Texts (CVT)

Introduction

The Corpora of Vietnamese Texts was completed by Giang Pham (formerly Giang Tang) under the supervision of Kathryn Kohnert, Ph.D., CCC-SLP. Funding was provided by the Graduate Research Partnership Program in the Department of Speech-Language-Hearing Sciences at the University of Minnesota.

Please cite this work using the following reference:

Pham, G., Kohnert, K., & Carney, E. (2008). Corpora of Vietnamese Texts: Lexical Effects of Intended Audience and Publication Place. Behavior Research Methods, 40, 154-163.

Acknowledgments

I would like to express my deepest gratitude and admiration to Dr. Kathryn Kohnert for her guidance and support throughout this research project. I am very thankful to my friend and colleague, Nguyễn Hải Anh, who spent countless hours purchasing children’s books while in Vietnam, borrowing books from her school library, and typing and proofreading over 350 texts. I am grateful to my parents, Tăng Tiến Đức and Tăng Trần Xuân, for their support in all my work and, in particular, for their assistance in creating this website as well as typing and proofreading over 50 texts. I appreciate Nguyễn Hoàng Nam for his technical support on the Research section of this website as well as for designing and creating the entire section on Clinical Materials. I would like to thank Hillcrest Elementary School in Orlando, FL, for allowing me to borrow more than 200 Vietnamese children’s books from their library to complete this project.

Thank you to all those who volunteered to type texts, especially Phạm Đức Tiến and Nguyễn Hoa. Many thanks to Pui Fong Kan, Mahmoud Sadrai, and Bryan Gordon for technical advice related to corpus linguistics. Thanks to Nguyễn Hoàng Nam and Trần Lọc for helpful suggestions about Vietnamese newspaper selection.

Summary

The following table summarizes the composition of the Corpora of Vietnamese Texts.

Composition of CVT

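| Corpus | Source | Published | # words |
|---|---|---|---|
| 1. Children’s literature | 78 books | Abroad | 42,690 |
| | 279 books | VN | 161,793 |
| | SUBTOTAL | | 204,443 |
| 2. Newspaper articles | Thanh Niên | VN | 114,099 |
| | Tuổi Trẻ | VN | 151,183 |
| | VNN | USA | 542,834 |
| | VOA | USA | 43,058 |
| | SUBTOTAL | | 851,174 |
| TOTAL WORDS | | | 1,055,617 |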