"all models of language are wrong but some are useful"
"Once the research question is defined and the researcher is performing measurement, we can assess how accurate a method is at recovering the specific concept of interest to the researcher. Because there were many different things about the document that the researcher could have chosen to measure, we do not assume there is a single model that captures a true data generating process, but hopefully there are options that help us capture what we need for our research question. In other words, we are agnostic about which particular model is used for measurement, as long as the model can accurately and reliably measure the concept of interest."
(Grimmer et al. 2022, 19)
→ How do you make sure you have a useful model?
Know your data
- the better you know your data, the easier it is to judge whether a model works
Know your model
see what is behind your model predictions, e.g.:
which features does a dictionary actually pick up?
which features are most predictive in a machine learning model?
how does your complex model classify example sentences?
use quantitative and qualitative validation techniques and present the evidence
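The first question above can be checked directly: apply the dictionary to your corpus and look at which entries actually fire. A minimal sketch in pure Python, with a made-up mini lexicon and example documents (not a published dictionary):

```python
import re
from collections import Counter

# Hypothetical mini sentiment lexicon, for illustration only
POSITIVE = {"good", "great", "useful", "accurate"}
NEGATIVE = {"bad", "wrong", "biased", "poor"}

docs = [
    "All models are wrong but some are useful",
    "A good model gives accurate and reliable measurement",
    "Complex models can hide biased or wrong predictions",
]

def matched_terms(doc, lexicon):
    """Return the lexicon entries that actually match in a document."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return Counter(t for t in tokens if t in lexicon)

for doc in docs:
    print(doc)
    print("  positive hits:", dict(matched_terms(doc, POSITIVE)))
    print("  negative hits:", dict(matched_terms(doc, NEGATIVE)))
```

Inspecting the hit lists per document (rather than only aggregate scores) is what reveals surprises, e.g. a "positive" term firing in an ironic or negated context.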
Explain your model
make sure you understand what is happening
use only as much complexity as needed for addressing a task
→ no inherent benefits of more complex models, though performance statistics for some tasks have become very impressive!
Upsides of complex models
- significantly improved performance for many tasks
Downsides
understandability - Do you understand BERT? Does your reviewer?
complexity - hidden biases
environmental impact and computational power
- change in scale between word embeddings and transformer models
- "Training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight." (Bender et al. 2021)