Welcome to Data Fundamentals!
We’re Colby and Serge, founders of The San Francisco Data Company. We’re on an entrepreneurial endeavor to create the data education that we wish we had when we first started. We power Data Field Guide, a digital community and knowledge base for data enthusiasts, as well as San Francisco Data School, which offers access to premium data tools and content, along with a private Slack community.
Over the next ten days, we’ll explore the most fundamental aspects of data without ever having to employ a data tool! Today, we’ll start with context. Finding a straightforward explanation for the most basic yet fundamental questions about data is surprisingly difficult.
Why Data, Why Now?
As in, what’s with the data craze?
First, the volume and variety of data are growing at an astounding rate. Volume refers to the amount of data being produced. Variety refers to the types of data being produced.
Second, the cost of storing this rising amount of data is declining rapidly over time. The barriers to accessing that stored data are also being drastically reduced.
Finally, data has the ability to create a tremendous amount of value in our world. Data is and will continue to be the backbone of improving things in both our overall society and within businesses.
So, why data, why now? Because data holds potential, and that potential is growing larger and becoming easier to unlock over time.
What Is Data?
This is a big, important question. It’s big in that it’s incredibly open-ended. It’s important in that it’s so fundamental yet so simple in composition that it often gets overlooked.
Practically speaking, data is all the things being recorded with technology. In 1989, Tim Berners-Lee invented what ultimately became the World Wide Web, which has since led to the creation of countless technologies that record and store data.
Where Does the Data Come From?
Let’s dive into a few examples:
• Every time a Facebook Like button is clicked, data is being recorded (i.e., stored in a database)—the date, the user account that did the clicking, the actual piece of content that was liked, etc.
• Every time a Wikipedia page is authored or updated, there’s data being logged to account for the edit.
• Each time a company like 23andMe runs a DNA test, it produces and records data about your ancestry and genetic health risks.
• Blackboard is geared toward helping schools connect teachers, students, and parents. The sharing of grades, uploading of homework, and messages sent all produce data.
• The United States Census Bureau and the surveys it conducts help inform us about things such as population growth. Each completed survey is a recording of data.
There are countless examples we could share, but just through these five, the intersection of data and technology is evident.
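To make the first example a little more concrete, here is a minimal sketch in Python of what one recorded Like might look like. Everything in it is an assumption for illustration: the `LikeEvent` name, its fields, and the `record_like` helper are hypothetical, and a plain list stands in for the database. It is not how Facebook actually stores this data.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class LikeEvent:
    """One hypothetical record written when a Like button is clicked."""
    clicked_at: datetime  # the date (and time) of the click
    user_id: str          # the user account that did the clicking
    content_id: str       # the piece of content that was liked


def record_like(log: list, user_id: str, content_id: str) -> LikeEvent:
    """Append a new Like record to the log (a stand-in for a database)."""
    event = LikeEvent(datetime.now(timezone.utc), user_id, content_id)
    log.append(event)
    return event


# Two clicks produce two stored records.
log = []
record_like(log, "user_42", "post_1001")
record_like(log, "user_7", "post_1001")

for event in log:
    print(asdict(event))
```

Each click adds one row with the same three pieces of information, which is what lets someone later count Likes per post or per user.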
Who Uses Data?
Looking at our examples again, if we abstract our thinking up one layer, we see Facebook as simply an internet company. Wikipedia is a nonprofit. 23andMe is in the biotechnology industry. Blackboard is used by education systems. The US Census Bureau is a government agency.
In just five examples, we see that there’s a huge range with respect to who’s using data—from governments to internet companies, nonprofits, and beyond.
What Is Data Used For?
Data is primarily used to do three things:
1. Make better decisions.
2. Build better products.
3. Enrich our stories.
Let’s use our US Census survey as an example. This survey data is used by:
• local governments to make decisions regarding public transportation
• national agencies to build models to better predict population trends
• journalists to support their articles and/or documentaries
We’ll stop here for now, as you’ll see these questions continue to come to life throughout the rest of the course. Tomorrow, we’ll discuss data professions!
—Colby and Serge
Dataclysm, in which Christian Rudder uses data to show us who we truly are.