Data

60 mins

A Gentle Introduction to Scraping Data

School of Data

Abstract: What happens when the data you need for your advocacy or social development project is a jumbled mess? No download button, no CSV file, no structured dataset. You can see the information you want locked in PDF reports, in social networks (like Twitter and Instagram), or even in web pages, but you can't really do anything with it. No more! We will show you how to quickly and easily extract information from these unstructured sources into useful datasets. You will get data you thought was inaccessible, giving your projects a new level of refinement and relevance.
About this course: This course is designed for human rights activists and journalists who would like to use data to support their advocacy work or to tell stories. You will learn the fundamental concepts of scraping and discover how to use free and easy-to-use tools to scrape data from web pages (using Google Sheets and a web browser extension called Web Scraper), social networks like Twitter and Instagram (using a web service called IFTTT) and PDF files (using both a web service called ABBYY FineReader and Tabula, a free application made by journalists that you can download to your computer).
What do I learn: By the end of the course you will have a basic understanding of what scraping is and be able to perform basic scraping routines on web pages, social networks and PDF files. You will be able to get data from places not traditionally available to people without programming skills. This will broaden the spectrum of your data collection efforts, giving more juice to your advocacy, journalistic or social development projects.
What do I need to know: This course is suitable for anyone who has completed School of Data's Data Analysis & Data Gathering courses. It requires some familiarity with basic data concepts, such as types of data and how a dataset is organised. You will need an internet connection and a computer, and you will be asked to create accounts with a few web services, such as Google Sheets, Twitter, Instagram and IFTTT. You don't need any coding skills, special technical skills or advanced knowledge of how to work with spreadsheets.

Trainers

Marco Túlio Pires

Marco Túlio Pires is Google News Lab’s Lead for Brazil and Latin America. He was previously the School of Data’s Programme Manager and has worked at the intersection of computer science, journalism and education. Marco has helped newsrooms and students in multiple countries become more data literate.

Related courses

90 mins
Data
Cleaning and Analysing Data
September 16, 2015
School of Data
60 mins
Data
Data Gathering for Beginners
September 16, 2015
School of Data

Course index

1. Scraping, an introduction

2. Unlocking data from PDF files

3. Scraping data from Twitter & Instagram

4. Uncovering the secrets of web pages

5. Scraping webpages, Part 1

6. Scraping webpages, Part 2

7. Conclusion

