First, thanks to MIT Press and Ethem Alpaydin for giving us this gem. Here we go.
Chapter 1
- ML is about a program that learns from data and acquires intelligence beyond that of the programmer
- Learning from data is not new; it is at the heart of science. Scientists have always made observations, collected data, and then come up with laws that explain those data
- We are at a point where we want to automate this process of going from data to knowledge, because now we have much more data
- Inference of a hidden model (namely the hidden factors and their interaction) from the observed data is at the core of machine learning
- Personal computers became popular in the 1980s, allowing our lives to be recorded digitally. A significant step, as this is data we can analyze and learn from
- Then came smartphones, now rarely used as mere phones but for much more. A smartphone is a mobile sensor that makes us detectable, traceable, and recordable
- Then there is ubiquitous computing, i.e. using computers without realizing we are using them
- Furthermore, if these computers are online, we call them smart objects or talk about the Internet of Things
- Thousands of years ago, you needed to be a god or goddess to be painted, sculpted, or have your story remembered. Hundreds of years ago, you needed to be a king or a queen. Later, being a rich merchant was enough. Now anybody can
- With social media, each of us is now a celebrity whose life is worth following and we are our own paparazzi
- With ML, data starts to drive the operations; it defines what to do next
- Data mining is one type of ML where a large volume of data is mined and processed to construct a model with high predictive accuracy
- Data today comes from different modalities — it is multimedia i.e. text, touch, sound, video etc.
- The brain still surpasses current engineering products in a few things, e.g. vision, speech recognition, and learning
- Neurons in the brain have connections, called synapses, to tens of thousands of other neurons, and they all operate in parallel
- It is believed that both processing and memory are distributed together over the network; processing is done by neurons and memory occurs in the synapses
- ML, i.e. a computer program that learns. Learning means getting better with experience. Experience is data collected in the past. 'Better' implies a performance criterion that is optimized
- A lot of data does not necessarily mean there are underlying rules that can be learned. E.g. from a phone book of people's names and numbers, we cannot predict a new person's number
- Going from particular examples to general concepts is called induction. E.g. we see many trees at different times and in different places, all slightly different from each other, yet they have something in common that our brain grasps. This commonness is the 'treeness' with which our brain can look at a new object and identify whether or not it is a tree
- Neural network research, which later fed into ML, was revived in the 1980s, partly due to advances in VLSI technology
Chapter 2
- No matter how many properties we list as input, there are always other factors that affect the output; we cannot possibly record and take all of them as input, and all these neglected factors introduce uncertainty
- One of the most critical points in learning is the model that defines the template of the relationship between the inputs and the output
- Selecting the right model is a more difficult task than optimizing the parameters of a given model
- The task of estimating a numeric output value from a set of input values is called regression
- In ML, regression is one type of supervised learning
- How well a model trained on the training set predicts the right output for new instances is called the generalization ability of the model and the learning algo. E.g. a student may solve all the exercises taught previously, but we want them to acquire a general understanding from those so that they can also solve new questions on the same topic
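To make the generalization note above concrete, here is a minimal sketch in Python (my own illustration, not code from the book; the polynomial setup and the numbers are assumptions): hold out part of the data as the "new questions", fit only on the rest, and compare errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an underlying smooth function
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 30)

# First 20 points for training, last 10 held out as "new" instances
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

A model that generalizes well keeps test error close to training error; a very flexible model can drive training error near zero while test error grows.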
- ML, and prediction, is possible because the world has regularities. Things in the world change smoothly. This is Leibniz's dictum that Nature does not make jumps. Objects occupy a continuous block of space in the world. To travel from point A to point B, we need to move through the points in between and cannot (yet) just beam from A to B
- The assumptions that any learning algo makes to find a unique model is called the inductive bias of the learning algo
- Learning also performs compression. Once we learn the rule underlying the data, we do not need the data anymore, requiring less memory to store and less computation to process. E.g. once we learn the rules of multiplication, we do not need to remember the product of every possible pair of numbers
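As a toy version of this compression point (my example, not the book's): a rote lookup table of products grows with the number of pairs, while the rule itself stays constant size and even covers pairs never seen.

```python
# Rote memorization: store every product of pairs up to 100 x 100
table = {(a, b): a * b for a in range(1, 101) for b in range(1, 101)}  # 10,000 entries

# The learned rule: constant size, works for any pair, seen or unseen
def multiply(a: int, b: int) -> int:
    return a * b

print(table[(17, 23)])     # 391, but only for pairs we stored
print(multiply(17, 23))    # 391
print(multiply(123, 456))  # 56088, a pair the table does not contain
```

The rule is smaller and also generalizes, which is exactly what the lookup table cannot do.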
- In regression, the task is to find a line that passes as close as possible to the data points
- In classification, it is to fit a separating boundary between the data points from different classes
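A short sketch of these two notes in Python with numpy (minimal versions I wrote for illustration, not the book's code; all data here is synthetic): least squares finds the line closest to the points, and a nearest-class-mean rule implies a simple separating boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Regression: fit y = w*x + b by least squares ---
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 50)      # noisy samples of a line
A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes sum of squared errors
print(f"fitted line: y = {w:.2f} x + {b:.2f}")

# --- Classification: separate two classes with a nearest-mean rule ---
class0 = rng.normal([0, 0], 1.0, (50, 2))        # class 0 clustered at (0, 0)
class1 = rng.normal([4, 4], 1.0, (50, 2))        # class 1 clustered at (4, 4)
m0, m1 = class0.mean(axis=0), class1.mean(axis=0)

def classify(p):
    # The implied boundary is the set of points equidistant from the two means
    return 0 if np.linalg.norm(p - m0) < np.linalg.norm(p - m1) else 1

print(classify(np.array([1.0, 1.0])))  # -> 0
print(classify(np.array([3.5, 4.5])))  # -> 1
```

The boundary here is the perpendicular bisector of the segment joining the two class means; richer models differ mainly in the shapes of boundaries they can fit.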
- FP = false positives, FN = false negatives, TP = true positives, TN = true negatives; P = TP + FN and N = TN + FP are the total numbers of actual positives and negatives
- Classification error = (FP + FN) / (P + N)
- Classification accuracy = (TP + TN) / (P + N)
- Precision = TP / (TP + FP) i.e. what percentage of retrieved instances are really relevant
- Recall = TP / P i.e. what percentage of relevant instances are retrieved
- We want both precision and recall to be as close to 1 as possible
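Putting the formulas above into code, a minimal sketch assuming 0/1 label arrays (the function name and the example labels are my own):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute the Chapter 2 metrics from true and predicted 0/1 labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    p, n = tp + fn, tn + fp          # total actual positives and negatives
    return {
        "error": (fp + fn) / (p + n),
        "accuracy": (tp + tn) / (p + n),
        "precision": tp / (tp + fp),  # of retrieved, how many were relevant
        "recall": tp / p,             # of relevant, how many were retrieved
    }

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])
print(binary_metrics(y_true, y_pred))  # error 0.25, accuracy/precision/recall 0.75
```

Note the trade-off: predicting positive more aggressively raises recall but usually lowers precision, which is why the two are reported together.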
Chapter 3
- <coming soon>. I am still at page 82/224, so this document is a work in progress. Next update expected in a couple of weeks.
PS: I read this during the period Mar 04, 2023 to <still reading>.