Using Multiple Factor Analysis for Historical Research
In humanities research projects, big data is the next big thing. Where I study, at UCL, there is an entire department devoted to digital humanities. I don’t fully understand the relationship between the DH department and the history and German departments – all I know is that I want to experiment with big data methodologies in my own research. And so, I have been making charts in Excel and using RStudio to run analyses and generate more charts.
I study memoirs written by members of the Hitler Youth generation. Over the course of my research, I’ve noticed differences between the memoirs depending on the sex of the author and on when and where each memoir was written. I wanted to test my findings using data analysis. As Christof Schöch states in ‘Big? Smart? Clean? Messy? Data in the Humanities’, the purpose behind digital humanities is to make the presentation of humanities data more objective – using scientific methodology and statistical analysis. Digital tools allow us to reconfigure the data, presenting it in clusters.
The example Schöch provides is interesting, although it is a sciences-based example, as opposed to one from the humanities. Regrettably, the dataset is not available on the website, nor is the actual code.
I spent a harrowing evening watching over an hour’s worth of YouTube videos on RStudio and cluster analysis. They didn’t seem particularly relevant; the examples were largely from the biological sciences, involving biochemical datasets. The datasets made no sense to me, and I couldn’t figure out why cos2 was needed.
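For anyone equally puzzled by cos2: it is the squared cosine between a data point and a factor axis, and it measures how well that point is represented on the axis. A point can sit near the centre of a two-dimensional plot simply because most of its distance from the origin lies along axes that are not being plotted. The sketch below is a minimal illustration in Python rather than R, and the coordinates are invented for the purpose, not taken from my dataset.

```python
# Hypothetical coordinates of three memoirs on four factor axes.
# These numbers are invented for illustration only.
coords = {
    "memoir_A": [2.0, 0.5, 0.1, 0.2],
    "memoir_B": [0.3, 0.2, 1.8, 1.1],
    "memoir_C": [1.0, 1.0, 1.0, 1.0],
}


def cos2(coordinates):
    """Return the squared cosine of a point with each axis.

    For each axis this is coordinate^2 divided by the squared
    distance of the point from the origin, so the values sum to 1.
    """
    total = sum(c * c for c in coordinates)
    if total == 0:
        return [0.0 for _ in coordinates]
    return [c * c / total for c in coordinates]


for name, point in coords.items():
    # Quality of representation on the first two axes (the usual 2D plot).
    quality = sum(cos2(point)[:2])
    print(f"{name}: cos2 on axes 1-2 = {quality:.2f}")
```

A value close to 1 suggests the point's position on the plot can be trusted; a value close to 0 suggests the point is mostly explained by axes that the plot does not show.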
Building my own matrix and running Factor Analysis of Mixed Data (FAMD) – or Multiple Factor Analysis (MFA) – required the expertise of my bioinformatician friend and fellow HTTPer, Richard. This, I suspect, is the fatal flaw of the humanities and the push towards big data: if you aren’t aware of FAMD or MFA, why would you even try this methodology? More importantly, running these programmes requires proper training – training that you undertake, ideally, before Master’s-level research.
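To give a sense of what such a matrix looks like, here is a rough sketch of the preparation step that FAMD performs before running a principal component analysis: numeric columns are standardised, and each category becomes an indicator column divided by the square root of that category's share of the rows, then centred. It is written in Python rather than R, it is not the code Richard and I used, and the rows, column names, and region labels are all hypothetical placeholders.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical matrix: one row per memoir. All values are placeholders.
memoirs = [
    {"sex": "F", "region": "region_1", "year": 1955},
    {"sex": "M", "region": "region_1", "year": 1978},
    {"sex": "F", "region": "region_2", "year": 1992},
    {"sex": "M", "region": "region_2", "year": 2004},
    {"sex": "M", "region": "region_1", "year": 1986},
]


def standardise(values):
    """Centre a numeric column and scale it to unit variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def encode_category(values):
    """Turn one categorical column into weighted, centred indicator columns."""
    n = len(values)
    columns = {}
    for level in sorted(set(values)):
        share = values.count(level) / n  # proportion of rows in this category
        raw = [(1.0 if v == level else 0.0) / sqrt(share) for v in values]
        centre = mean(raw)
        columns[level] = [r - centre for r in raw]
    return columns


def famd_prepare(rows, numeric, categorical):
    """Build the table on which the PCA step of FAMD would then be run."""
    table = {}
    for name in numeric:
        table[name] = standardise([row[name] for row in rows])
    for name in categorical:
        encoded = encode_category([row[name] for row in rows])
        for level, column in encoded.items():
            table[f"{name}={level}"] = column
    return table


prepared = famd_prepare(memoirs, numeric=["year"], categorical=["sex", "region"])
for column, values in prepared.items():
    print(column, [round(v, 2) for v in values])
```

The weighting is what lets years and categories share one analysis: without it, a variable with many categories would dominate the result simply because it contributes more columns.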
As I’ve added to the matrix table I created, a few interesting glitches have emerged. FAMD grouped my memoirs in startling new ways; it drew connections between pieces of writing that I never would have seen through traditional close reading.
It also became apparent that separating the memoirs by year is important. This was good news, as I had already made that judgement call, based on my traditional close readings of the texts.
The dataset is based upon the readings I did for a specific dissertation chapter, and so it is necessarily biased. Essentially, the dataset is a subjective selection made by me, a human being. Perhaps if all my primary sources were pushed through quantitative textual analysis, the results would be objective.
Still, my results will help me in writing my chapter on family life. In that section, I will compare the Hitler Youths’ memories of their parents’ political leanings.
If I’ve learnt anything from this big data exercise, it’s that academic researchers ought to have formal DH training, starting in undergrad. With my time limited by funding, I wish I had learnt RStudio at an earlier stage. I had no idea it would be so useful in historical research.