Data collection
Because text-scanning software for the Vietnamese language was unavailable at the time, every children’s book in the Vietnamese children’s literature corpus was typed into a word processor and saved as a text file. (See Vietnamese children’s literature corpus for a complete list of books.) I used all Vietnamese picture books available to me, excluding chapter books and comics. This amounted to more than 350 books, borrowed from elementary school and public libraries or purchased from bookstores in Viet Nam. All texts were typed in Vietnamese, using VPS Keys to enter Vietnamese characters. For more information about VPS Keys by the Vietnamese Professional Society © 1993-2003, please refer to VPSKEYS Software.
The newspaper articles in the Vietnamese newspaper corpus were all collected from online sources. My intention was to gather articles published in Viet Nam as well as in the United States, since applications of CVT primarily target Vietnamese American populations. Articles were selected from a variety of categories to obtain a broad representation of daily language use. (See Vietnamese newspaper corpus for a detailed description of newspaper categories.) Online articles were copied and pasted into a word processor and saved as text files.
Data analysis
A concordance program, MonoConc Pro 2.2 © 1996, 2004 Michael Barlow, was used to analyze the data. Although MonoConc Pro 2.2 can read multiple languages, it has not yet been set up for Vietnamese. I therefore needed to recode language-specific characters, such as vowels carrying tone marks, so that MonoConc Pro 2.2 could read them. See Font coding system for the coding scheme created specifically for this project. For more information about MonoConc software, please refer to www.athel.com or write to info@athel.com.
Font coding system
Letter | Replaced with |
ả | a3 |
ạ | a5 |
ấ | â1 |
ầ | â2 |
ẩ | â3 |
ẫ | â4 |
ậ | â5 |
ă | a( |
ắ | a(1 |
ằ | a(2 |
ẳ | a(3 |
ẵ | a(4 |
ặ | a(5 |
ế | ê1 |
ề | ê2 |
ể | ê3 |
ễ | ê4 |
ệ | ê5 |
ẻ | e3 |
ẽ | e4 |
ẹ | e5 |
ỉ | i3 |
ĩ | i4 |
ị | i5 |
ỏ | o3 |
ọ | o5 |
ố | ô1 |
ồ | ô2 |
ổ | ô3 |
ỗ | ô4 |
ộ | ô5 |
ơ | o’ |
ớ | o’1 |
ờ | o’2 |
ở | o’3 |
ỡ | o’4 |
ợ | o’5 |
Ở | O’3 |
ủ | u3 |
ũ | u4 |
ụ | u5 |
Ủ | U3 |
ư | u’ |
ứ | u’1 |
ừ | u’2 |
ử | u’3 |
ữ | u’4 |
ự | u’5 |
đ | dd |
ỳ | y2 |
ỷ | y3 |
ỹ | y4 |
ỵ | y5 |
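The recoding in the table above can be sketched in Python as a simple character-for-string substitution. This is only an illustration of the scheme, not the procedure actually used for the project: the names `HORN`, `CODING`, and `encode` are hypothetical, the horn mark is assumed to be the right single quotation mark (U+2019) as it appears in the table, and the Unicode normalization step is an added assumption so that composed and decomposed spellings of the same letter are treated alike.

```python
import unicodedata

# Assumption: the horn mark in the table is U+2019 (right single quotation mark).
HORN = "\u2019"

# Pairs copied from the font coding table. For readability the horn is typed
# here as an ASCII apostrophe and swapped for HORN when the table is built.
_RAW = {
    "ả": "a3", "ạ": "a5",
    "ấ": "â1", "ầ": "â2", "ẩ": "â3", "ẫ": "â4", "ậ": "â5",
    "ă": "a(", "ắ": "a(1", "ằ": "a(2", "ẳ": "a(3", "ẵ": "a(4", "ặ": "a(5",
    "ế": "ê1", "ề": "ê2", "ể": "ê3", "ễ": "ê4", "ệ": "ê5",
    "ẻ": "e3", "ẽ": "e4", "ẹ": "e5",
    "ỉ": "i3", "ĩ": "i4", "ị": "i5",
    "ỏ": "o3", "ọ": "o5",
    "ố": "ô1", "ồ": "ô2", "ổ": "ô3", "ỗ": "ô4", "ộ": "ô5",
    "ơ": "o'", "ớ": "o'1", "ờ": "o'2", "ở": "o'3", "ỡ": "o'4", "ợ": "o'5",
    "Ở": "O'3",
    "ủ": "u3", "ũ": "u4", "ụ": "u5", "Ủ": "U3",
    "ư": "u'", "ứ": "u'1", "ừ": "u'2", "ử": "u'3", "ữ": "u'4", "ự": "u'5",
    "đ": "dd",
    "ỳ": "y2", "ỷ": "y3", "ỹ": "y4", "ỵ": "y5",
}


def _nfc(text: str) -> str:
    """Return text in composed (NFC) form, one code point per accented letter."""
    return unicodedata.normalize("NFC", text)


# str.maketrans needs single-character keys, so keys are normalized first.
CODING = str.maketrans(
    {_nfc(letter): _nfc(code).replace("'", HORN) for letter, code in _RAW.items()}
)


def encode(text: str) -> str:
    """Replace each letter listed in the table with its code; leave the rest as is."""
    return _nfc(text).translate(CODING)


if __name__ == "__main__":
    print(encode("Việt Nam"))
```

Letters that do not appear in the table, including most capital letters and vowels such as á or à, pass through unchanged in this sketch, since the table lists no codes for them.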