A Data Science Glossary: Terms you need to know

Oct. 25, 2017

Data science is helping journalism tell complex stories more than ever. Journalists used to write about what they felt was important; now data scientists, developers, designers and journalists work as a team to make the most of data storytelling. This collaboration is a valuable resource and a critical force for social change.

That’s why we’ve put together a glossary to help you get to know the basic terms in data science.

1. Data scraping: Scraping data involves extracting data from sources that were designed for people to read, such as web pages or PDFs, and converting it into a machine-readable format.

2. Cleaning data: Data cleaning (also called data cleansing) is the effort to improve the overall quality of a data set by detecting and then correcting or removing inaccurate, incomplete, duplicate or irrelevant records.

3. Data visualisation: Data visualisation is a general term for any effort to help people understand the significance of data by placing it in a visual context. Patterns, trends and correlations that might go unnoticed in text-based data can be exposed and recognised more easily with data visualisation.

4. Data analytics: Data analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialised systems and software. 

5. Metadata: Metadata is data about data. It is the behind-the-scenes information that describes a piece of content; for example, a digital photo's metadata can include the date it was taken and the camera that took it. Metadata is used by every industry, in many ways, and can be found in information systems, social media, websites, software, images, music services, online retailing and more.

6. Data scientist: A data scientist is an expert in extracting insights and value from data. This role involves using skills in analytics, computer science, mathematics, statistics, creativity, data visualisation and communication as well as business and strategy. 

7. Correlation: Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. Correlation is often confused with causation. Remember: just because two things correlate does not mean one causes the other.

8. Big data: Big data describes collections of data sets so large and complex that they become difficult to process using basic database management tools or traditional data processing applications.

9. Data mining: Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. It uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. 

10. Normalisation: Normalising data means rescaling values onto a common scale, which removes their units of measurement and makes it easier to compare data from different sources or places.
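To make the Data scraping entry more concrete, here is a minimal sketch in Python that pulls the rows out of an HTML table (a format built for human readers) and turns them into CSV (a machine-readable format). It uses only the standard library's `html.parser` and `csv` modules. The `TableScraper` class, the helper functions and the sample page with its invented figures are all hypothetical; a real scraper would first download the page and would have to cope with much messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; the towns and figures are invented for illustration.
SAMPLE_PAGE = """
<table>
  <tr><th>Town</th><th>Population</th></tr>
  <tr><td>Town A</td><td>1200</td></tr>
  <tr><td>Town B</td><td>850</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Hypothetical helper: collects the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell; ignore whitespace between tags.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise rows to CSV text, a machine-readable format."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Running `print(rows_to_csv(scrape_table(SAMPLE_PAGE)))` prints the header `Town,Population` followed by one line per town, ready to open in a spreadsheet.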
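The Cleaning data entry can be illustrated with a small Python sketch, assuming hypothetical records that each carry a `name` and an `age`. The `clean_records` function below (also hypothetical) trims stray whitespace, drops incomplete rows, drops rows whose age isn't a whole number, and removes duplicates. Real cleaning rules depend on the data set, and libraries such as pandas offer far more tools for the job.

```python
def clean_records(records):
    """Trim, validate and de-duplicate hypothetical {"name", "age"} records."""
    cleaned = []
    seen = set()
    for record in records:
        # Correct formatting: strip stray whitespace from text fields.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        # Remove incomplete rows: a required field is missing or empty.
        if not row.get("name") or not row.get("age"):
            continue
        # Remove inaccurate rows: age must convert to a whole number.
        try:
            row["age"] = int(row["age"])
        except (TypeError, ValueError):
            continue
        # Remove duplicates (compared after trimming and type conversion).
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"name": " Ana ", "age": "34"},
    {"name": "Ana", "age": "34"},     # duplicate once whitespace is trimmed
    {"name": "Ben", "age": ""},       # incomplete
    {"name": "Chloe", "age": "n/a"},  # not a valid age
    {"name": "Dev", "age": "51"},
]
```

With this sample, `clean_records(raw)` keeps only two rows: Ana aged 34 and Dev aged 51.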
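The Correlation entry describes a measure of how closely variables move together. A common version is Pearson's correlation coefficient, which runs from -1 (perfectly opposite) through 0 (no linear relationship) to 1 (perfectly in step). Here is a minimal Python sketch of it for two variables; the `pearson_r` name is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do the two variables deviate from their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 and `pearson_r([1, 2, 3, 4], [8, 6, 4, 2])` returns -1.0. A value near 1 only says the two series rise and fall together; it says nothing about why, which is exactly the correlation-versus-causation trap. On Python 3.10 or later, the standard library's `statistics.correlation` computes the same coefficient.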
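The Normalisation entry can be sketched in Python with two common techniques, min-max scaling and z-scores, both of which strip out the original units. This is one reasonable reading of the definition, not the only kind of normalisation; journalists also often normalise by population, for example reporting figures per 100,000 residents. The function names below are our own.

```python
import statistics


def min_max(values):
    """Rescale values to the range 0-1: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a list where every value is the same")
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Express each value as standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a list where every value is the same")
    return [(v - mean) / sd for v in values]
```

Heights in centimetres, `min_max([150, 175, 200])`, and weights in kilograms, `min_max([50, 65, 80])`, both become `[0.0, 0.5, 1.0]`: once the units are gone, the two measurements sit on the same scale and can be compared directly.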

What about you? Had you heard of these terms? Are there any other words you'd like us to add to our glossary? Tweet us at @Advocassembly.

Are you interested in understanding the basics of data science? You can sign up for our free mini-courses from The School of Data today!