I began a new research partnership this week! It is about Brazilian Cliometrics, and my lead is Renato Colistete. He has an interesting blog with a lot of references to historical data beyond the usual World Bank and other mainstream datasets, which only have good resolution for developed countries and the post-WW2 period.

We had a long talk this Wednesday, and it was clear that the topic has a lot of issues that are technically challenging in a multidisciplinary way, as well as a lot of potential results that could be game-changing for our understanding of our history.

The motivation for this collaboration is closely related to my old desire to do research in the new area of Cliodynamics, in which one of the most influential figures is Peter Turchin. Those who know me closely know how influential his research program has been in shaping my worldview and my view of what we need from science in order to have a truly sustainable society, in a holistic and systemic sense.

Having said that, scientific inquiry and its need for falsifiability require data for advancing or discarding hypotheses and models, and the main obstacle when testing cliodynamic models is mostly the data, especially in the Brazilian context, where our readily available historical datasets cover quite a short time span.

Renato’s research, then, is about creating new historical datasets based on digitized and non-digitized archives. His main declared focus is to reconstruct the economic history of Brazil, but as Renato’s training as a historian reveals, he has a lot of interest in reconstructing history in a rigorous and bias-free manner, which leads him to investigate educational and social history in a quantitative manner too.

The sum of Renato’s programme and interests has made me conclude that a partnership with him could pay great dividends in producing what I now believe could be the most meaningful knowledge for our future, and I’m very happy to have found him.

My first homework, then, is to analyze the applicability of OCR to some digitized data that contains rich insights into Brazilian history: the Census of 1920, the first one which contains info about the respondents’ color and wages. Self-explanatory.

Also, I’m going to do an exploratory study of word frequency in 19th-century Brazilian newspapers. This kind of analysis is used a lot for extracting the “sentiment of the epoch”.
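
To give an idea of what a word-frequency count looks like in practice, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function name `word_frequencies`, the tiny stopword list and the sample sentence are all made up for this post, and real newspaper text coming out of OCR would need much more cleaning (old spellings, broken hyphenation, recognition errors) before the counts mean anything.

```python
import re
from collections import Counter

# Hypothetical, very small Portuguese stopword list, for illustration only;
# a real study would need a much fuller (and period-appropriate) one.
STOPWORDS = {"a", "o", "e", "de", "do", "da", "em", "que", "os", "as"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each non-stopword appears in `text`."""
    # Lowercase, then keep runs of letters only. The pattern [^\W\d_]+
    # matches Unicode letters, so accented characters are preserved.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Made-up sample sentence, not taken from any actual newspaper.
sample = "A liberdade e a república: a república do povo."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Comparing such counts across years or across newspapers is the usual next step, but that depends on how much of the archive turns out to be usable.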