Data

60 mins

A Gentle Introduction to Scraping Data

School of Data
Abstract:
What happens when the data you need for your advocacy or social development project is a jumbled mess? No download button, no CSV file, no structured dataset. You can see the information you want locked in PDF reports, on social networks (like Twitter and Instagram), or in web pages, but you can't really do anything with it. No more! We will show you how to quickly and easily extract information from these unstructured sources into useful datasets. You will get data you thought was out of reach, giving your projects a new level of refinement and relevance.
About this course:
This course is designed for human rights activists and journalists who would like to use data to support their advocacy work or to tell stories. You will learn the fundamental concepts of scraping and discover how to use free and easy-to-use tools to scrape data from web pages (using Google Sheets and a web browser extension called Web Scraper), social networks like Twitter and Instagram (using a web service called IFTTT) and PDF files (using both a web service called ABBYY FineReader Online and Tabula, a free application made by journalists that you can download to your computer).
What do I learn:
By the end of the course you will have a basic understanding of what scraping is and be able to perform basic scraping routines on web pages, social networks and PDF files. You will be able to get data from places not traditionally available to people without programming skills, which will broaden the spectrum of your data collection efforts and give more juice to your advocacy, journalistic or social development projects.
What do I need to know:
This course is suitable for anyone who has completed School of Data's Data Analysis & Data Gathering courses. It requires some familiarity with basic data concepts, such as types of data and how a dataset is organised. You will need an internet connection and a computer, and you will be asked to create accounts on a few web services, such as Google Sheets, Twitter, Instagram and IFTTT. You don't need any coding, special technical skills or advanced knowledge of how to work with spreadsheets.

Trainers

Marco Túlio Pires

Marco Túlio Pires is Google News Lab’s Lead for Brazil and Latin America. He was previously the School of Data’s Programme Manager and has worked at the intersection of computer science, journalism and education. Marco has helped newsrooms and students in multiple countries become more data literate.

1.1 Introduction to the course
1.2 What is scraping?
1.3 My first scraper: Me!
1.4 Quiz
2.1 Introduction
2.2 Using ABBYY FineReader Online to extract data from PDFs
2.3 Using Tabula to extract tabular data from PDFs
2.4 My second scraper: unlocking PDF files!
2.5 Quiz
3.1 Introduction
3.2 Scraping Twitter data using IFTTT
3.3 My third scraper: Twitter & Instagram
4.1 Introduction
4.2 A brief introduction to HTML
4.3 Using the web inspector
4.4 Quiz
5.1 Introduction
5.2 Using Google Sheets formulas to scrape data
5.3 Scraping data from webpages using Google Sheets
5.4 Scraping Wikipedia
6.1 Introduction
6.2 Scraping data from webpages using Web Scraper
6.3 Web scraping
7.1 Wrap up video

Related courses

  • 90 mins

    Data

    Cleaning and Analysing Data

    School of Data

  • 60 mins

    Data

    Data Gathering for Beginners

    School of Data

