English database hits 1 billion words

A language research database that brought words such as "podcast" and "celebutante" to the pages of the Oxford dictionaries has officially reached a total of 1 billion words, researchers at Oxford University Press say.

    Drawing on sources including weblogs, chatrooms, newspapers, magazines and fiction, the Oxford English Corpus spots emerging trends in language usage to help guide lexicographers as they compile the latest editions of dictionaries.

     

    Oxford University Press publishes the Oxford English Dictionary, considered the most comprehensive dictionary of the language, whose most recent edition, released in August, added words such as "supersize", "wiki" and "retail politics".

     

    Catherine Soanes, a lexicographer at Oxford University Press, said the database was not a collection of a billion different words, but of sentences and other examples of usage and spelling.

     

    Wiki:

    From the Hawaiian "wiki wiki" (quick): a collaborative website whose content can be edited by anyone who has access to it.

     

    "The Corpus is purely 21st-century English," said Judy Pearsall, the publishing manager of English dictionaries.

     

    "You're looking at current English and seeing what's happening right now. That's language at the cutting edge."

     

    As add-on and hybrid words such as "geek-chic", "inner-child" and "gabfest" grow in usage, Pearsall said part of the research project's goal was to identify words with lasting power.

     

    "English gets really creative, really fun. What we're putting in dictionaries is words that will stick around," she said.

     

    Launched in January 2000, the Oxford English Corpus is part of the world's largest-funded language research project, costing $90,000 to $107,000 per year.

     

    Geek-chic: the popularity of people considered to be geeks (people, often of an intellectual bent, who are looked down upon), and the subversion and embrace of normally unpopular characteristics such as glasses and interests such as comic books and computer games.

    Over the years, the corpus has helped identify how the spellings of common phrases have changed, such as "fazed by" becoming "phased by" and "free rein" becoming "free reign".

     

    Similarly, "buck naked" has evolved into "butt naked".

     

    The corpus collects evidence from all the places where English is spoken, including North America, Britain, the Caribbean, Australia and India, to reflect the most current and common usage of the English language.

    SOURCE: Aljazeera + Agencies

