First, thanks to MIT Press and Ethem Alpaydin for giving us this gem. Here we go.
Chapter 1
- ML is about a program that learns from data and acquires intelligence beyond that of the programmer
- Learning from data is not new; it is at the heart of science. Scientists have always made observations, collected data, and then come up with laws that explain those data
- We are at a point where we want to automate this process of going from data to knowledge, because now we have much more data
- Inference of a hidden model (namely the hidden factors and their interaction) from the observed data is at the core of machine learning
- Personal computers became popular in the 1980s, allowing our lives to be recorded digitally. A significant step, as this is data we can analyze and learn from
- Then came smartphones, now rarely used as mere phones but for much more. A smartphone is a mobile sensor that makes us detectable, traceable, and recordable
- Then there is ubiquitous computing, i.e. using computers without realizing we are using them
- Furthermore, if these computers are online, we call them smart objects or talk about the Internet of Things
- Thousands of years ago, you needed to be a god or goddess to be painted, sculpted, or have your story remembered. Hundreds of years ago, you needed to be a king or a queen. Later, being a rich merchant was enough. Now anybody can
- With social media, each of us is now a celebrity whose life is worth following and we are our own paparazzi
- With ML, data starts to drive the operations; it defines what to do next
- Data mining is one type of ML where a large volume of data is mined and processed to construct a model with high predictive accuracy
- Data today comes from different modalities — it is multimedia i.e. text, touch, sound, video etc.
- The brain still surpasses current engineering products in a few things, e.g. vision, speech recognition, and learning
- Neurons in the brain have connections, called synapses, to tens of thousands of other neurons, and they all operate in parallel
- It is believed that both processing and memory are distributed together over the network; processing is done by neurons and memory occurs in the synapses
- ML, i.e. a computer program that learns. Learning means getting better with experience. Experience is data collected in the past. 'Better' implies a performance criterion that is optimized
- A lot of data does not necessarily mean there are underlying rules that can be learned. E.g. from a phone book of people's names and numbers, we cannot predict a new person's number
- Going from particular examples to general concepts is called induction. E.g. we see many trees at different times and in different places, all slightly different from each other, yet they have something in common that our brain grasps. This commonness is the 'treeness' with which our brain can look at a new object and identify whether or not it is a tree
- Neural network research, which later fed into ML, was revived in the 1980s, partly due to advances in VLSI technology
Chapter 2
- No matter how many properties we list as input, there are always other factors that affect the output; we cannot possibly record and take all of them as input, and all these neglected factors introduce uncertainty
- One of the most critical points in learning is the model that defines the template of the relationship between the inputs and the output
- Selecting the right model is a more difficult task than optimizing the parameters of a given model
- The task of estimating a numeric output value from a set of input values is called regression
- In ML, regression is one type of supervised learning
- How well a model trained on the training set predicts the right output for new instances is called the generalization ability of the model and the learning algo. E.g. a student may solve all the exercises taught previously, but we want them to acquire a general understanding from those so that they can also solve new questions on the same topic
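To make the generalization note above concrete, here is a minimal sketch in Python (my own illustration, not code from the book; the polynomial setup and the numbers are assumptions): hold out part of the data as the "new questions", fit only on the rest, and compare errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an underlying smooth function
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 30)

# First 20 points for training, last 10 held out as "new" instances
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

A model that generalizes well keeps test error close to training error; a very flexible model can drive training error near zero while test error grows.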
- ML, and prediction, is possible because the world has regularities. Things in the world change smoothly. This is Leibniz's dictum that Nature does not make jumps. Objects occupy a continuous block of space in the world. To travel from point A to point B, we need to move through the points in between and cannot (yet) just beam from A to B
- The assumptions that any learning algo makes to find a unique model is called the inductive bias of the learning algo
- Learning also performs compression. Once we learn the rule underlying the data, we do not need the data anymore, requiring less memory to store and less computation to process. E.g. once we learn the rules of multiplication, we do not need to remember the product of every possible pair of numbers
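As a toy version of this compression point (my example, not the book's): a rote lookup table of products grows with the number of pairs, while the rule itself stays constant size and even covers pairs never seen.

```python
# Rote memorization: store every product of pairs up to 100 x 100
table = {(a, b): a * b for a in range(1, 101) for b in range(1, 101)}  # 10,000 entries

# The learned rule: constant size, works for any pair, seen or unseen
def multiply(a: int, b: int) -> int:
    return a * b

print(table[(17, 23)])     # 391, but only for pairs we stored
print(multiply(17, 23))    # 391
print(multiply(123, 456))  # 56088, a pair the table does not contain
```

The rule is smaller and also generalizes, which is exactly what the lookup table cannot do.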
- In regression, the task is to find a line that passes as close as possible to the data points
- In classification, it is to fit a separating boundary between the data points from different classes
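A short sketch of these two notes in Python with numpy (minimal versions I wrote for illustration, not the book's code; all data here is synthetic): least squares finds the line closest to the points, and a nearest-class-mean rule implies a simple separating boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Regression: fit y = w*x + b by least squares ---
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 50)      # noisy samples of a line
A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes sum of squared errors
print(f"fitted line: y = {w:.2f} x + {b:.2f}")

# --- Classification: separate two classes with a nearest-mean rule ---
class0 = rng.normal([0, 0], 1.0, (50, 2))        # class 0 clustered at (0, 0)
class1 = rng.normal([4, 4], 1.0, (50, 2))        # class 1 clustered at (4, 4)
m0, m1 = class0.mean(axis=0), class1.mean(axis=0)

def classify(p):
    # The implied boundary is the set of points equidistant from the two means
    return 0 if np.linalg.norm(p - m0) < np.linalg.norm(p - m1) else 1

print(classify(np.array([1.0, 1.0])))  # -> 0
print(classify(np.array([3.5, 4.5])))  # -> 1
```

The boundary here is the perpendicular bisector of the segment joining the two class means; richer models differ mainly in the shapes of boundaries they can fit.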
- FP = false positives, FN = false negatives, TP = true positives, TN = true negatives; P = TP + FN and N = TN + FP are the total numbers of actual positives and negatives
- Classification error = (FP + FN) / (P + N)
- Classification accuracy = (TP + TN) / (P + N)
- Precision = TP / (TP + FP) i.e. what percentage of retrieved instances are really relevant
- Recall = TP / P i.e. what percentage of relevant instances are retrieved
- We want both precision and recall to be as close to 1 as possible
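Putting the formulas above into code, a minimal sketch assuming 0/1 label arrays (the function name and the example labels are my own):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute the Chapter 2 metrics from true and predicted 0/1 labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    p, n = tp + fn, tn + fp          # total actual positives and negatives
    return {
        "error": (fp + fn) / (p + n),
        "accuracy": (tp + tn) / (p + n),
        "precision": tp / (tp + fp),  # of retrieved, how many were relevant
        "recall": tp / p,             # of relevant, how many were retrieved
    }

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])
print(binary_metrics(y_true, y_pred))  # error 0.25, accuracy/precision/recall 0.75
```

Note the trade-off: predicting positive more aggressively raises recall but usually lowers precision, which is why the two are reported together.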
Chapter 3
- <coming soon>. I am still at page 82/224, so this document is a work in progress. Next update expected in a couple of weeks.
PS: I read this during the period Mar 04, 2023 to <still reading>.