A (machine) learning journal
https://lpalmieri.com/
Recent content on A (machine) learning journal. Last updated: Fri, 14 Sep 2018 09:59:06 +0000.

Machine Learning: it's time to embrace version control [DataOps]
https://lpalmieri.com/posts/2018-09-14-machine-learning-version-control-is-all-you-need/
Fri, 14 Sep 2018 09:59:06 +0000
At data science meetups there is a recurring horror story: projects where code and data were passed around between teammates (and clients) as zip-file attachments to emails.
If you have ever worked on a PowerPoint presentation before the era of Google Slides, you know what I am talking about. The most common source of trouble in those situations is versioning: knowing how each zip file relates to the others, so that you can tell what was changed, by whom, and, more importantly, which version of the project you are supposed to be working on.

Reinforcement Learning: a comprehensive introduction [Part 2]
https://lpalmieri.com/posts/rl-introduction-02/
Wed, 11 Jul 2018 12:00:00 +0000
Recap: In the previous post we introduced state-value and history-value functions for a policy $\pi$, which allow us to compute the expected return from different starting points in time. They can be used to compare the effectiveness of different policies, which fits our goal of finding the optimal policy for the task at hand.
How do we compute them? We derived a generalized form of the Bellman equation which, under a set of stronger hypotheses on the agent and the environment, simplifies to a manageable expression that we can confidently solve.
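For reference, the standard form of the Bellman equation for the state-value function of a policy $\pi$ in a Markov Decision Process reads as follows (this is the textbook version, with discount factor $\gamma$ and transition kernel $p$; the generalized form derived in the post may differ):

$$
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_\pi(s') \,\bigr]
$$

For a finite state space this is a linear system in the unknowns $v_\pi(s)$, one equation per state, which is what makes it solvable in practice.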
Reinforcement Learning: a comprehensive introduction [Part 1]
https://lpalmieri.com/posts/rl-introduction-01/
Mon, 11 Jun 2018 12:00:00 +0000
Recap: In the previous post we introduced:
states, $\{S_t\}_{t=1}^{T}$; actions, $\{A_t\}_{t=1}^{T}$; rewards, $\{R_{t+1}\}_{t=1}^{T}$. We remarked that states and rewards are environment-related random variables: the agent has no way to interfere with the reward mechanism or to modify the state transition that results from one of its actions. Actions are the only domain entirely under the agent's control: specifying the probability distribution of $A_t$ conditioned on all the possible values of $S_t, \, A_{t-1}, \, \dots, S_{1}$ for every $t\in\mathbb{N}$ is exactly equivalent to a full specification of the agent's behaviour. We take a closer look at this issue in the post.
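The agent-environment interaction described above can be sketched as a simple loop. This is a minimal illustration, not code from the post: the toy environment, the Gym-style `reset`/`step` methods, and the tabular policy are all hypothetical names introduced here.

```python
import random

class ToyEnv:
    """Hypothetical two-state environment; its dynamics and rewards
    are environment-controlled, so the agent cannot alter them."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # State transition and reward are decided by the environment.
        next_state = (self.state + action) % 2
        reward = 1.0 if next_state == 1 else 0.0
        self.state = next_state
        return next_state, reward

# A policy is a conditional distribution over actions given the state,
# pi(a | s) -- here a simple tabular stochastic policy.
policy = {
    0: [0.5, 0.5],  # P(A=0 | S=0), P(A=1 | S=0)
    1: [0.9, 0.1],  # P(A=0 | S=1), P(A=1 | S=1)
}

def sample_action(state):
    """Sample A_t from pi(. | S_t)."""
    return random.choices([0, 1], weights=policy[state])[0]

env = ToyEnv()
state = env.reset()
trajectory = []  # (S_t, A_t, R_{t+1}) triples
for t in range(5):
    action = sample_action(state)          # the agent's only lever
    next_state, reward = env.step(action)  # the environment's response
    trajectory.append((state, action, reward))
    state = next_state
```

Specifying the `policy` table fully determines the agent: everything else in the loop belongs to the environment.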
Reinforcement Learning: a comprehensive introduction [Part 0]
https://lpalmieri.com/posts/rl-introduction-00/
Fri, 11 May 2018 12:00:00 +0000
You might be tired of hearing about it by now, but it is impossible to start a blog series on Reinforcement Learning without mentioning the game of Go in the first five lines. It all started in March 2016: AlphaGo, a computer program developed by DeepMind (a Google company), won 4 Go games in a series of 5 against Lee Sedol, one of the strongest players in the world. (link)
Calling the event "a historic achievement" is an understatement: the game of Go proved to be far more difficult for machines than chess, with too many possible configurations of the game board for brute force alone to succeed.