As Duke Economics professor Dan Ariely once famously said, big data is a lot like teenage sex: everyone talks about it, no one really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
As you might have heard, or felt in your own recruiting efforts, data scientists are rare birds: hard to find, and harder still to recruit. However, one could argue that a large part of this perceived scarcity comes from the fact that many people treat data science as a catch-all role. They are consequently looking for data scientists who can build and maintain a data warehouse, set up a data pipeline for analysis, run analyses that reveal groundbreaking insights every time, then turn around and build resilient production systems running perfectly optimized machine learning algorithms that automate said analyses and predictions, and run them seamlessly every time a customer logs in.
Talk about wishful thinking! Given this job description, it is no wonder that positions remain unfilled for months, and even years at a time, while companies sit on unproductive big data assets that could otherwise give them a solid competitive edge in the marketplace.
On the contrary, in my experience, great data scientists are rarely lone wolves, working in solitude and only emerging occasionally to utter brilliant insights. In most companies that use it successfully, data science is a team sport, where data analysts, engineers and scientists work together.
Here is a brief overview of what each of these professionals does, and how they complement each other.
What Does a Data Analyst Do?
Data analysts are interpreters of structured data. They are spreadsheet whizzes, and they write SQL queries to extract data from relational databases. Data analysts can be found in many functional groups within a company, including finance, marketing, operations, and business intelligence.
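To make the SQL part concrete, here is a minimal sketch of the kind of query an analyst might write. The `orders` table, its columns, and the numbers in it are made up for illustration, and Python's built-in `sqlite3` module stands in for whatever relational database a company actually uses.

```python
import sqlite3

def revenue_by_region(conn):
    """Return (region, total_amount) rows, largest total first."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()

# Build a tiny in-memory database with made-up data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 30.0), ("West", 45.5)],
)

rows = revenue_by_region(conn)
for region, total in rows:
    print(region, total)
```

In practice, the result of a query like this would often end up in a spreadsheet or a dashboard rather than on the console.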
While this role does not get nearly the same level of publicity as the more glamorous-sounding data scientist and data engineer, it can be a very fulfilling job in the right company, as well as a gateway to higher-level analytics positions. Data analysts can have a lot of freedom in choosing the directions they take in their analyses, and they often have the opportunity to see their work directly informing management decisions on a daily basis, which can be very satisfying, and a great source of professional pride.
What's more, as a data analyst, you will develop highly transferable skills, which you can apply to many other roles in a variety of industries if you ever wish to change tracks in your career. In addition, this role offers exposure to a variety of tools and analysis techniques, which not only increase your marketability as a data analyst, but are also useful in data science work, which can often be the next step on the career ladder for a seasoned analyst.
Data analysts typically have a bachelor's degree, though a degree is not always required if you can show that the skills you acquired in previous jobs are relevant to this role.
Data Analyst Further Reading
Some good sources for further reading on doing analytics are the following:
- Data Analytics for Beginners by Paul Kinley
- Data Analysis Made Accessible by Anil Maheshwari
- Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic
(Kinley's and Knaflic's books are available on Kindle Unlimited with a subscription, and Maheshwari's book is only available in ebook format, though it is totally worth a read.)
What Does a Data Scientist Do?
It's been said that a data scientist is someone who is better at statistics than any software engineer, and better at software engineering than any statistician. This saying is actually not very far from the truth, in my opinion. However, I would add that a data scientist should also be able to make their results and findings accessible to non-technical audiences, so that business stakeholders can rally around the findings and data products put forth, and see to it that they are used effectively for the benefit of the organization.
In short, data scientists are interpreters of unstructured data. A data scientist is typically able to fetch data from public APIs, integrate heterogeneous data from multiple sources, clean it, and extrapolate from it to fill in missing values. Afterwards, they are able to formulate hypotheses and test them through the use of math, statistics, visualization and predictive modeling. Once they see results, data scientists then communicate them to stakeholders, working with them to translate these results into business action items.
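The workflow above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the two data sources are made-up dictionaries, missing values are filled with a simple per-group median (one of many possible imputation strategies), and the hypothesis "the two plans differ in average spend" is checked with a basic permutation test rather than any particular method a given team might prefer.

```python
import random
import statistics

# Two made-up sources keyed by customer id (hypothetical data).
plans = {1: "basic", 2: "basic", 3: "basic", 4: "premium", 5: "premium", 6: "premium"}
spend = {1: 10.0, 2: None, 3: 14.0, 4: 30.0, 5: 26.0, 6: None}

def integrate(plans, spend):
    """Join the two sources on customer id."""
    return [{"id": cid, "plan": plan, "spend": spend.get(cid)}
            for cid, plan in plans.items()]

def impute_median(records):
    """Fill missing spend with the median of the same plan's known values."""
    known = {}
    for r in records:
        if r["spend"] is not None:
            known.setdefault(r["plan"], []).append(r["spend"])
    medians = {plan: statistics.median(vals) for plan, vals in known.items()}
    return [dict(r, spend=r["spend"] if r["spend"] is not None else medians[r["plan"]])
            for r in records]

def permutation_p_value(a, b, trials=2000, seed=0):
    """Estimate how often a random relabeling gives a mean gap at least as large."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        gap = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    return hits / trials

records = impute_median(integrate(plans, spend))
basic = [r["spend"] for r in records if r["plan"] == "basic"]
premium = [r["spend"] for r in records if r["plan"] == "premium"]
p = permutation_p_value(basic, premium)
print("basic:", basic, "premium:", premium, "estimated p-value:", p)
```

With only six customers, no test can be very conclusive, which is itself a finding worth communicating to stakeholders. In real work, libraries such as pandas and SciPy would typically replace the hand-written helpers.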
Many data scientists working in the industry have a Ph.D. or other advanced degrees, but I have also met many accomplished data science practitioners who started in the job with only a bachelor’s degree and relevant work experience.
Data Scientist Further Reading
In terms of further reading, the following books cover everything from intro to advanced topics in data science. Master the concepts in these three books, and you will know more than 99% of all Data Scientists out there.
- Data Science from Scratch: First Principles with Python by Joel Grus
- Programming Collective Intelligence by Toby Segaran
- Doing Data Science: Straight Talk from the Frontline by Cathy O'Neil and Rachel Schutt
What Does a Data Engineer Do?
Data engineers are usually data infrastructure engineers who are responsible for building and maintaining the infrastructure that transports and houses big data. A data engineer is the one setting up and configuring a Hadoop cluster, building a Spark Streaming pipeline, or migrating a company’s data assets to a public cloud service such as AWS.
In some companies, machine learning engineers are also called data engineers, though the role requirements can be vastly different. Most of the data engineers I’ve met started out as back-end or full-stack developers who developed an interest in data technologies and taught themselves Hadoop, Spark, and AWS before transitioning to data engineering. Advanced degrees are typically not required for this role.
Data Engineer Further Reading
For further reading on data engineering technologies and how to get started with each, here are some books I recommend:
- Hadoop: The Definitive Guide by Tom White
- Learning Spark: Lightning-Fast Big Data Analysis by Holden Karau et al.
- Amazon Web Services in Action by Andreas and Michael Wittig