5 Essential Books to Improve Your Skills in Data Science and Machine Learning

Five open-source books you need to read to improve your skills in Machine Learning and Data Science

Syam Kakarla
Published in The Startup
7 min read · Nov 24, 2020

Photo by Jaredd Craig on Unsplash

This article introduces five books worth reading to start your career or sharpen your skills in Data Science and Machine Learning. Some are free to read in full, and the rest come with open-source code or notebooks.

The annual Stack Overflow survey draws on a diverse range of programmers and developers across the globe, with this year’s poll taken by nearly 65,000 people. The survey details which languages developers enjoy using, which are associated with the best-paid jobs, and which are most commonly used, as well as developers’ preferred frameworks, databases, and integrated development environments.

We have not seen a technology that largely grows so fast ever, in the history of Stack Overflow.

— Julia Silge, Data Scientist at Stack Overflow

Among these languages, Python’s versatility continues to fuel its rise through Stack Overflow’s rankings of the “most popular” languages, which list the languages most widely used by developers. This year’s survey finds Python to be the fastest-growing major programming language, with Python edging out Android and enterprise workhorse Java to become the fourth most commonly used language.

Python’s flexibility lets programmers solve a wide range of problems quickly. Most Data Scientists and Machine Learning Engineers use Python to create machine learning algorithms, perform data mining, build data models, create web services, and classify data sets.

From building efficient models that reduce the complexity of large data sets to producing results quickly and affordably, Python plays a central role in data science.

So, the first book focuses on building fundamentals and improving your knowledge of practical programming with Python.

1. Automate the Boring Stuff with Python

Source - ABSP

This book teaches the key concepts through practical programming, which makes Python easier to understand.

The best part of programming is the triumph of seeing the machine do something useful. Automate the Boring Stuff with Python frames all of the programming as these small triumphs; it makes the boring fun.

Hilary Mason, Founder of Fast Forward Labs and Data Scientist in Residence at Accel

The book is divided into two parts. Part one covers the basics of Python programming, including

  • Flow Control
  • Lists
  • Dictionaries
  • Structuring Data
  • String Manipulation

Part two covers practical programming with automation, including

  • Pattern Matching with Regular Expressions
  • File Handling
  • File Organizing
  • Debugging
  • Web Scraping
  • Working with Excel spreadsheets, Portable Document Format (PDF) files, Word documents, Comma-Separated Values (CSV) files, and JavaScript Object Notation (JSON) files
  • Manipulating Images
  • Automating Python Programs
  • Graphical User Interface (GUI) Automation
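To give a feel for the kind of task part two deals with, here is a minimal sketch of pattern matching with Python's built-in `re` module. It is not taken from the book: the phone-number pattern, the `find_phone_numbers` helper, and the sample text are all made up for illustration.

```python
import re

# Hypothetical pattern for US-style phone numbers such as 415-555-1234.
# The word boundaries (\b) stop it from matching inside longer digit runs.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def find_phone_numbers(text):
    """Return every phone-number-like substring found in text."""
    return [match.group(0) for match in PHONE_PATTERN.finditer(text)]


# Made-up sample text, used only to exercise the pattern.
sample = "Call 415-555-1234 during the day or 212-555-0000 after 6 pm."
numbers = find_phone_numbers(sample)
print(numbers)
```

Compiling the pattern once and reusing it keeps the helper short, and the same idea extends to e-mail addresses, dates, or any other text with a regular shape.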

Open Source

The book “Automate the Boring Stuff with Python” is free to read online on its official website and is also available as a PDF on GitHub. Use the official link below to read the book.

2. Mathematics for Machine Learning

Source — MML

This book covers the fundamental mathematical tools needed to understand machine learning, including Linear Algebra, Analytic Geometry, Matrix Decompositions, Vector Calculus, Optimization, Probability, and Statistics. This self-contained textbook bridges the gap between Mathematics and Machine Learning texts, introducing the mathematical concepts with a minimum of prerequisites.

It uses these concepts to derive four central machine learning methods: Linear Regression, Principal Component Analysis, Gaussian Mixture Models, and Support Vector Machines (SVMs). For students and others with a mathematical background, these derivations provide a starting point for Machine Learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts.

The book is divided into two parts. Part one covers the mathematical foundations of Machine Learning:

  1. Introduction and Motivation
  2. Linear Algebra
  3. Analytic Geometry
  4. Matrix Decompositions
  5. Vector Calculus
  6. Probability and Distributions
  7. Continuous Optimization

Part two covers classic machine learning problems:

  1. When Models Meet Data
  2. Linear Regression
  3. Dimensionality Reduction with Principal Component Analysis
  4. Density Estimation with Gaussian Mixture Models
  5. Classification with Support Vector Machines
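As a small taste of how the mathematics turns into a method, here is a sketch of ordinary least-squares linear regression for a single input variable, using the standard closed-form solution (slope = covariance of x and y divided by variance of x). This is an illustration written for this article, not code from the book; the `fit_line` helper and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With several input variables the same idea is usually written in matrix form, which is where the linear algebra and vector calculus chapters come in.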

Open Source

The book is openly available on its official GitHub page, and a PDF version is available to download. Use the link below to read the book.

3. Hands-On Machine Learning with Scikit-Learn and TensorFlow

Source — HandsonML

Hands-On Machine Learning with Scikit-Learn and TensorFlow is a popular book that teaches the concepts of machine learning through hands-on examples using Scikit-Learn and TensorFlow.

This book helps you understand a wide range of techniques, starting with simple linear regression and progressing to deep neural networks, with exercises in each chapter to help you apply what you’ve learned. The key concepts of the book are:

  • Explore the machine learning landscape, particularly neural nets
  • Use Scikit-Learn to track an example machine-learning project end-to-end
  • Explore several training models, including support vector machines, decision trees, random forests, and ensemble methods
  • Use the TensorFlow library to build and train neural nets
  • Dive into neural net architectures, including convolutional nets, recurrent nets, and deep reinforcement learning
  • Learn techniques for training and scaling deep neural nets

Open Source

The book is not openly available to download, but the hands-on exercises from the book are open-sourced on GitHub. Here is the link to the author’s GitHub page.

4. Practical Statistics for Data Scientists

Source — gdeck

The book combines statistics and machine learning with hands-on exercises in both Python and R. It covers the following topics:

  1. Exploratory Data Analysis
  2. Data and Sampling Distributions
  3. Statistical Experiments and Significance Testing
  4. Regression and Prediction
  5. Classification
  6. Statistical Machine Learning
  7. Unsupervised Learning
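To illustrate the flavor of the sampling topics listed above, here is a rough sketch of a bootstrap percentile confidence interval for a mean, using only the Python standard library. It is not an exercise from the book: the `bootstrap_mean_ci` helper, its default parameters, and the sample values are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Estimate a percentile confidence interval for the mean by bootstrap."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same interval
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements (for example, session times in minutes).
data = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.8, 12.5, 11.1, 9.5]
lower, upper = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"95% CI      = ({lower:.2f}, {upper:.2f})")
```

The appeal of the bootstrap is that it needs no formula for the sampling distribution: the interval comes straight from repeated resampling of the data you already have.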

Open Source

The book is not openly available to download, but the hands-on exercises from the book, in both Python and R, are open-sourced on GitHub. Here is the link to the author’s GitHub page.

5. Deep Learning for Coders with Fastai and PyTorch

PyTorch is one of the fastest-growing open-source Machine Learning libraries. It is based on Torch and is used for Computer Vision, Natural Language Processing, etc.

fastai is a framework built on top of PyTorch. It provides a simpler API for a variety of machine learning functionality, especially neural networks. Much of the library sits atop PyTorch, which makes building neural networks with this lower-level library easier and more flexible for machine learning coders of all skill levels.

Source - O’Reilly

This book is the best choice if you want to learn with a top-down approach. It covers a wide range of topics, from Computer Vision to Recommendation Systems. Here is a glimpse of the contents of the book.

  1. Your Deep Learning Journey
  2. From Model to Production
  3. Data Ethics
  4. Under the Hood: Training a Digit Classifier
  5. Image Classification
  6. Other Computer Vision Problems
  7. Training a State-of-the-Art Model
  8. Collaborative Filtering Deep Dive
  9. Tabular Modeling Deep Dive
  10. Data Munging with fastai’s Mid-Level API
  11. A Language Model from Scratch
  12. Convolutional Neural Networks
  13. ResNets
  14. Application Architectures Deep Dive
  15. The Training Process
  16. A Neural Net from the Foundations
  17. CNN Interpretation with CAM
  18. A fastai Learner from Scratch
  19. Concluding Thoughts

Open Source

The book itself is not available for free, but you can learn the same material through the video lectures given by the author, Jeremy Howard, in the fastai courses, and the notebooks are available on GitHub. Here are the links:

Conclusion

This article covered five essential books to consider reading if you want to excel in Machine Learning and Data Science. Some of the books are free to read in full, and the others come with open-source code or notebooks on GitHub. I hope these books help you take a further step in your Machine Learning and Data Science career.

Here are some articles on Machine Learning and Data Science that you may find interesting.

Syam Kakarla

Data Engineer, Machine Learning Practitioner and Data Science Enthusiast, https://www.linkedin.com/in/syam-kakarla/