Malaysia, New Zealand, the Philippines, Singapore, South Africa, Sri Lanka, the United States, Tanzania and Trinidad and Tobago. The corpus design is identical, and therefore full comparability can be achieved across the different data sets. Each individual country's corpus consists of 500 data samples. Three hundred instances should be of speech and 200 instances should be of writing. A wide range of samples should be collected by research teams from a range of different contexts to give good coverage of language usage in a variety of situations. Each corpus should be composed of private and public dialogue as well as scripted and non-scripted monologues. The written material should consist of the following: non-printed, non-professional writing, including student work; printed materials, including academic and non-academic writing; reportage (news reports); instructional writing, including administrative writing and writing on skills and hobbies; and persuasive writing. Several sets of corpora which have already been completed are freely available online for research and teaching purposes (see this book's website for further details). A range of different levels of linguistic detail can be described and compared across the corpora: owing to the systematic manner in which the data have been collected, researchers can examine and compare morphology, lexis, grammar, syntax, discourse and pragmatics. The results that emerge from this project will represent a big step forward in our knowledge of varieties of World Englishes, and thus in our ability to describe, compare and codify them. This could eventually lead to outcomes such as a suite of new teaching materials being developed using authentic language data. Alongside these highly effective attempts at data uniformity, it is important also to bear in mind that data collection needs to be viewed in the light of the particular historical time period in which it