Wikiaves data now on Kaggle!

A very topical subject nowadays is Citizen Science: the voluntary collection of data by ordinary citizens, who range from hobbyists to technicians and from experts to the merely curious. Birdwatching is one field where this concept has been present since its inception.

The main repository for Brazilian birdwatching is Wikiaves: an open, collaborative repository with social features, where you upload your observations together with photos and audio recordings.

Content-wise, it has about 3M records and 36k users, which makes it the largest voluntary citizen-science-like repository in Brazil. However, one essential ingredient is missing for it to be truly a tool for science: easy and accessible use of the available data with the scalpel of scientific knowledge, which is data analysis towards hypothesis testing.

Wikiaves doesn’t allow you to download the available data in an analysis-friendly way. It provides only summaries, which are of no real use for scientific hypothesis testing beyond a very macroscopic overview and contextualization. Not all is lost, however: anything that is public can also be downloaded through web scraping.

I’ll spare you the details, but this was done with some clever code, and the whole download took 8h by parallelizing everything that could be parallelized. All the metadata, together with some example analysis code, is on Kaggle: https://www.kaggle.com/danlessa/brazilian-bird-observation-metadata-from-wikiaves
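The scraping code itself is not reproduced here. As a rough illustration of the parallelization idea only, here is a minimal sketch that runs a download function over many observation IDs with a thread pool and retries failures with exponential backoff. The names `fetch_all` and `fetch` are hypothetical; `fetch` stands in for whatever function downloads and parses a single page, and the defaults (16 workers, 3 retries) are arbitrary assumptions, not the values used for the Kaggle dataset.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_all(
    ids: Iterable[int],
    fetch: Callable[[int], dict],
    max_workers: int = 16,
    retries: int = 3,
    backoff: float = 1.0,
) -> Dict[int, Optional[dict]]:
    """Call the (hypothetical) `fetch` for every id on a thread pool, retrying failures."""

    def fetch_with_retry(obs_id: int) -> Optional[dict]:
        for attempt in range(retries):
            try:
                return fetch(obs_id)
            except Exception:
                # Exponential backoff between attempts (no wait after the last one)
                if attempt < retries - 1:
                    time.sleep(backoff * 2 ** attempt)
        return None  # every attempt failed

    id_list = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with id_list
        results = list(pool.map(fetch_with_retry, id_list))
    return dict(zip(id_list, results))


# Usage with a stand-in fetch function (no network access involved)
fake_fetch = lambda obs_id: {"id": obs_id}
print(fetch_all(range(5), fake_fetch, max_workers=4))
```

Threads suit this kind of job because the work is dominated by waiting on the network rather than CPU. When scraping a real site, a modest worker count keeps the load on the server reasonable.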

The scraping code is also available in a repository: https://gitlab.com/danlessa/wikiaves

What can be done with all this data then?

Some figure suggestions:

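As a starting point for this kind of analysis, here is a minimal sketch that counts records per species and per year from a CSV file, using only the standard library. The column names `species` and `date` and the ISO date format (e.g. 2019-05-31) are assumptions, so check the actual column names in the Kaggle dataset before using it. `observations_per_year` is a hypothetical helper, not part of the dataset's example code, and the sample rows are toy input.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Mapping


def observations_per_year(
    rows: Iterable[Mapping[str, str]],
    species_col: str = "species",  # assumed column name
    date_col: str = "date",        # assumed column name, ISO-formatted dates
) -> Counter:
    """Count records per (species, year), skipping rows with missing metadata."""
    counts: Counter = Counter()
    for row in rows:
        species = row.get(species_col) or ""
        date = row.get(date_col) or ""
        year = date[:4]
        if not species or not year.isdigit():
            continue  # skip rows without a species or a parseable year
        counts[(species, int(year))] += 1
    return counts


# Toy input standing in for the real CSV file
sample = io.StringIO(
    "species,date\n"
    "Pitangus sulphuratus,2018-03-02\n"
    "Pitangus sulphuratus,2019-07-15\n"
    "Turdus rufiventris,2019-01-20\n"
)
counts = observations_per_year(csv.DictReader(sample))
print(counts.most_common())
```

The resulting counter can be fed into any plotting library to draw, for example, the number of records of a species over the years.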