Linguistics research report

APA 7th style referencing. Additional files will be uploaded, and an example research report will also be included. PLEASE READ ALL ATTACHED FILES, DATA INCLUDED.

Task Files

Column Name Description
TokenN unique number to identify that specific token in the dataset (each row is numbered)
Filename Sydney Speaks ID label that indicates the subcorpus, participant pseudonym and speaker number, and demographic code
Speaker Name of the speaker for easy identification
Community Categorical variable.

Ethnic community background:

Anglo, Chinese-CN, Greek, Italian

Age Speaker’s age as at 2021.
Gender Categorical variable.

Female, Male

beforeMatch transcribed speech that appears before the portion that matches our search criteria (up to 5 words before)
Text stretch of speech that matches our search criteria (1-2 words) as it appears in the transcript – contains symbols such as (!) for noise/overlaps, and pauses
afterMatch transcribed speech that appears after the portion that matches our search criteria (up to 5 words after)
participantIU this is how the speech appears transcribed as a single utterance (IU = intonation unit)
targetWords stretch of speech that matches our search criteria and is of interest to us. It appears in plain text without accompanying symbols, making it easier to sort
linkingR Categorical variable.

Realisation of (r):

present – R

absent – NO
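
As a rough illustration of how the linkingR codes might be summarised, the sketch below computes the proportion of tokens coded R out of all tokens coded R or NO. The function name and the toy codes are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def linking_r_rate(codes):
    """Return the proportion of tokens coded "R" among those coded "R" or "NO"."""
    counts = Counter(codes)
    total = counts["R"] + counts["NO"]
    if total == 0:
        raise ValueError("no tokens coded R or NO")
    return counts["R"] / total


# Toy codes invented for illustration only (not real data).
example_codes = ["R", "NO", "NO", "R", "NO"]
print(linking_r_rate(example_codes))
```

Any value other than R or NO is ignored here; whether that is appropriate depends on how the actual data file is coded.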

birthYear Continuous variable.

Participant’s year of birth

postcode postcode of participant’s principal suburb of residence during their life
suburbIndex Continuous variable.

Index of Relative Socio-economic Disadvantage (IRSD) – an index from the 2016 Census (ABS) that measures how relatively disadvantaged a suburb is, based on a number of variables. This is based on the participant’s specific suburb.

Lower values indicate more disadvantaged suburbs; higher values indicate greater wealth/advantage.

suburbRank Ranked variable.

Using the IRSD, suburbs fall into ten deciles (1-10).

Ranks are assigned to collapse categories:

1: deciles 1-4 (most disadvantaged suburbs)

2: deciles 5-8

3: deciles 9-10 (least disadvantaged suburbs)
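
The collapse from IRSD deciles to suburbRank described above can be sketched as a small function. This is a minimal sketch that assumes the decile is already available as an integer from 1 to 10; the function name is hypothetical and the dataset itself supplies suburbRank directly.

```python
def decile_to_suburb_rank(decile):
    """Collapse an IRSD decile (1-10) into the three-level suburbRank."""
    if not isinstance(decile, int) or not 1 <= decile <= 10:
        raise ValueError("decile must be an integer from 1 to 10")
    if decile <= 4:
        return 1  # deciles 1-4: most disadvantaged suburbs
    if decile <= 8:
        return 2  # deciles 5-8
    return 3      # deciles 9-10: least disadvantaged suburbs


# Show the rank assigned to each decile.
for d in range(1, 11):
    print(d, decile_to_suburb_rank(d))
```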

occupation qualitative description of participant’s occupation
AUSEI06 Continuous variable.

Index of occupational socioeconomic status based on the Australian Socioeconomic Index 2006 (McMillan et al., 2009).

Range 0-100 (higher scores indicate occupations of higher socioeconomic status).

educationLevel qualitative description of participant’s highest level of education
educationRank Ranked variable.

5-point Sydney Speaks ranking of education level, from 1 (lowest) to 5 (highest)

1: High school only

2: Trade, Certificate, TAFE

3: Diploma, or Bachelor in progress (uni students)

4: Bachelor/Honours completed

5: Postgraduate level study

schoolType Categorical variable.

Type of high school attended by the participant

State, Catholic, Independent (including Independent Catholic schools), State-selective

schoolRank Ranked variable.

4-point Sydney Speaks ranking of high school type to denote social prestige (1 = lowest social prestige, 4 = highest social prestige)

1: State

2: Catholic (systemic)

3: Independent Catholic, State-selective

4: Independent
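
One way the ranked variables above might be used is to compare the rate of R across groups. The sketch below assumes the data can be read as CSV with the column names listed in this table; the helper name and the toy rows are invented for illustration and are not real data.

```python
import csv
import io
from collections import defaultdict


def r_rate_by_group(rows, group_column):
    """Return {group value: proportion of tokens coded "R"} for one column."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [R count, total count]
    for row in rows:
        code = row["linkingR"]
        if code not in ("R", "NO"):
            continue  # skip anything not coded R or NO
        tally = tallies[row[group_column]]
        tally[1] += 1
        if code == "R":
            tally[0] += 1
    return {group: r / n for group, (r, n) in tallies.items()}


# Toy rows invented for illustration only (not real data).
toy_csv = """TokenN,schoolRank,linkingR
1,1,R
2,1,NO
3,4,NO
4,4,NO
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))
print(r_rate_by_group(rows, "schoolRank"))
```

Because csv.DictReader reads every field as text, the group keys here are strings such as "1" rather than integers. The same helper could be pointed at Community, Gender, suburbRank or educationRank by changing the column name.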