MARCS Research Seminar

Event Name
MARCS Research Seminar
Date
14 June 2013
Time
01:30 pm - 02:30 pm
Location
Bankstown Campus

Address (Room): Building 3, Seminar room 3.G.55

Description
Dr Erich Round from the University of Queensland will be presenting "Preparing the Genome Project of language: foundational issues for data design".

Abstract: Biostatistical methods have dramatically increased our ability to infer the genetic history of Earth's species, and since languages also possess a genealogical past, there is a strong impetus to extend these methods from genetic to linguistic data. However, statistical methods place stringent requirements on their input data. Using papers by Dunn and colleagues (2005, 2007, 2009) in Science, Language and PLoS Biology as a prompt for discussion, I identify issues which linguists, and in particular phonological typologists, will need to grapple with if we are to design datasets that meet the mathematical requirements of biostatistical methods. The questions raised begin to delimit a theory of linguistic data design for a young and expanding field of research. Three foundational, methodological principles are proposed:

1. Use micro variables rather than macro. Answers to ‘macro’ variables in linguistics, e.g. “are there prenasalised stops?” are often arrived at by weighing up answers to multiple, antecedent ‘micro’ questions, e.g. “does [NC] appear word initially?”, “does /NC/ contrast with /N+C/?”, among others. Confusingly, analyses of two languages may value macro variables identically while having none of their micro variables' values in common. Micro variables are more informative.

2. Identify and minimize dependencies. In addition to variables whose definitions contain overt logical dependencies, ‘covert’ dependencies can also arise in datasets, and identifying them may require considerable effort. Macro variables, for example, may share common underlying micro variables. E.g. “are there prestopped nasals?” and “are there closed syllables?” are both sensitive to micro variables about intervocalic clusters, giving rise to dependencies.

3. Track dependencies. Where dependencies do remain in a dataset, they must be recorded. Doing so enables one to sample subsets of the data which are reliably independent.

Adherence to principles such as these will provide a more mathematically coherent basis for the application of advanced statistical methods to typological linguistic datasets.
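
Principles 2 and 3 above can be illustrated with a small sketch. It is not taken from the talk: the variable names, the mapping from macro to micro variables, and the greedy selection strategy are all assumptions made for illustration. The sketch records which micro variables each macro variable rests on, reports pairs of macro variables that share a micro variable (a covert dependency), and picks a subset of macro variables with no micro variables in common.

```python
from itertools import combinations

# Hypothetical mapping from macro variables to the micro variables they
# rest on. Only "intervocalic_clusters" and the two prenasalised-stop
# questions echo the abstract; the other names are invented placeholders.
MACRO_TO_MICRO = {
    "has_prestopped_nasals": {"intervocalic_clusters", "nasal_release"},
    "has_closed_syllables": {"intervocalic_clusters", "word_final_consonants"},
    "has_prenasalised_stops": {"nc_word_initial", "nc_vs_n_plus_c_contrast"},
}


def find_dependencies(macro_to_micro):
    """Return {(a, b): shared_micros} for macro pairs sharing a micro variable."""
    deps = {}
    for a, b in combinations(sorted(macro_to_micro), 2):
        shared = macro_to_micro[a] & macro_to_micro[b]
        if shared:
            deps[(a, b)] = shared
    return deps


def independent_subset(macro_to_micro):
    """Greedily keep macro variables whose micro variables are not yet used."""
    chosen, used = [], set()
    for name in sorted(macro_to_micro):
        if not (macro_to_micro[name] & used):
            chosen.append(name)
            used |= macro_to_micro[name]
    return chosen


dependencies = find_dependencies(MACRO_TO_MICRO)
subset = independent_subset(MACRO_TO_MICRO)
print(dependencies)
print(subset)
```

Sharing no micro variables is only a bookkeeping notion of independence. It does not by itself establish statistical independence, and the greedy pass is order-dependent, so it need not return the largest possible subset.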
Contact
Name: Sonya O'Shanna

Email: s.oshanna@uws.edu.au

School / Department: The MARCS Institute