Blog posts

2019

Spark introduction

1 minute read

Updated:

This lecture is a high-level overview. We will discuss:

  • Spark
  • Spark vs MapReduce
  • Spark RDDs
  • Spark DataFrames

Hadoop

1 minute read

Updated:

When data gets too large to be processed in memory (most computers usually have at most 32 GB of RAM), it is possible to use a distributed system.

Link prediction

7 minute read

Updated:

In this lecture, we learn the basics of how to perform unsupervised and supervised link prediction. We will cover the following techniques:

Link prediction

less than 1 minute read

Updated:

Spreading through Network

2018

Language modelling

7 minute read

Updated:

“But it must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”

Dialogue

20 minute read

Updated:

Word2Vec rocks!

First order scattering transform

9 minute read

Updated:

Much like the Fourier transform expresses periodic functions as a sum of sines and cosines, the wavelet transform expresses signals as a weighted sum of a special kind of function, wavelets. Both use some inner product (scalar product and convolution) of an input signal and a given kernel / mask. The difference lies in the kernel of the transformation.

Statistics basics

less than 1 minute read

Updated:

1. Introduction

2. Generalities

3. Robust estimation

3.1. L-Statistics

3.2. M-estimates

M-estimates of Location

3.3. M-estimates of scale

4. Robust regression

Statistics basics

13 minute read

Updated:

1. Statistical modelling

Statistics basics

4 minute read

Updated:

1. Convergence of random variables

Gabor filters

2 minute read

Updated:

In image processing, a Gabor filter (named after Dennis Gabor) is a linear filter used for texture analysis.

Language modelling

1 minute read

Updated:

1. Social Media Analytics: Introduction

Low-level vision framework

8 minute read

Updated:

The problem of image analysis and understanding has gained high prominence over the last decade, and has emerged at the forefront of signal and image processing research (read more here). Consequently, my first post on computer vision will deal with the basics of image understanding.

Filters

7 minute read

Updated:

This post deals with the basics of filtering (see a family of methods), which is extremely useful in computer vision.

Filters

less than 1 minute read

Updated:


Predictions in Reinforcement Learning

3 minute read

Updated:

In a model-free setting, the transition probabilities are unknown and the agent must interact with the environment. An additional challenge is that the control policy may differ from the policy being evaluated. This is called off-policy learning, but this blog post focuses on the on-policy case. We will leverage a simulator and a policy $\pi$, coupled with our knowledge of $S, A, \gamma$, to run episodes and improve the latter from sampled data.

Introduction to reinforcement learning

7 minute read

Updated:

Reinforcement learning finds its roots in several scientific fields, including (but not limited to) Deep learning, Psychology, Control, and Statistics. It typically consists of taking suitable actions to maximize reward in a particular situation. Below is a common illustration of its core idea:

Introduction to dynamic programming

7 minute read

Updated:

Dynamic Programming is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving (often recursively) each of those subproblems just once, and storing their solutions using a memory-based data structure (array, map, etc.).
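As a minimal sketch of this idea (the Fibonacci example and the `fib` helper are illustrative assumptions, not taken from the post), each subproblem below is solved only once and its solution is stored in a map:

```python
def fib(n, memo=None):
    """Return the n-th Fibonacci number using memoization (hypothetical example)."""
    if memo is None:
        memo = {}  # map from subproblem index to its stored solution
    if n < 2:
        return n  # base cases: fib(0) = 0, fib(1) = 1
    if n not in memo:
        # Solve the subproblem once, then store the result for reuse.
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]


print(fib(10))  # 55
```

Without the `memo` map, the same recursion would recompute identical subproblems an exponential number of times; with it, each value is computed once.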

Recurrent Neural Networks

1 minute read

Updated:

A major limitation of vanilla neural networks and convolutional neural networks is that their API is rather constrained:

Utility theory

9 minute read

Updated:

This blog post is the first of a series that aims both to introduce Decision Modeling and to build the corresponding mathematical framework. Decision models are very important, particularly in business, as they reduce stress and deal with uncertainty. Supported by ever-increasing amounts of data and sophisticated algorithms, their growing power has captured plenty of C-suite attention in recent years. From very accurate predictions to guiding knotty optimization choices, decision models are essential and worthy of interest.

Social choice theory

7 minute read

Updated:

Social choice theory is an established field and a cornerstone of countless others: Economics, Political science, Computer science, Applied mathematics, Operations Research. As for AI applications, it turns out to be really useful for developing multi-agent systems.

Outranking methods

7 minute read

Updated:

A collective decision problem

Multiple criteria decision

9 minute read

Updated:

Multiple-criteria decision analysis (MCDA) is a sub-discipline of operations research that explicitly evaluates multiple conflicting criteria in decision making (both in daily life and in settings such as business, government and medicine).

Interpretability in Deep Learning

22 minute read

Updated:

Deep learning has been all the rage for the last few years. Powered by ever-increasing computing power, memory and bigger datasets, neural networks are easier to train than before. However, despite the tremendous success of certain applications (computer vision, Go, Dota…), the theoretical understanding of what a neural net actually does is taking more time to develop.

Optimizing Deep Learning

16 minute read

Updated:

  • Architectures as priors on function space, initializations as random nonlinear projections

Word embeddings

5 minute read

Updated:

An embedding is a representation of an object (word, image) as a continuous vector. Embeddings are constructed so that similar objects have similar embeddings (metric learning). Usually, embeddings are not the final goal but are rather used as features (feature learning).
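One common way to quantify "similar embeddings" is cosine similarity. The sketch below is only illustrative: the `cosine_similarity` helper and the tiny two-dimensional vectors are made-up assumptions, not embeddings from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, chosen by hand for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}

cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])
print(cat_dog > cat_car)  # True: "cat" is closer to "dog" than to "car"
```

When embeddings are used as features, such a similarity score (or the raw vectors themselves) would typically be fed to a downstream model.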

Word embeddings

5 minute read

Updated:

Goal: Use word embeddings to embed larger chunks of text!

Optimizing Deep Learning

7 minute read

Updated:

First-order methods: gradient descent and variants.

Multi-Layer Perceptron

2 minute read

Updated:

This post covers the history of Deep Learning, from the Perceptron to the Multi-Layer Perceptron Network.

Auto-encoder

4 minute read

Updated:

Autoencoder

Community detection

14 minute read

Updated:

The notion of community structure captures the tendency of nodes to be organized into communities, where members of a community are more similar to one another.

The scale free property

5 minute read

Updated:

Hubs are encountered in most real networks. They represent a signature of a deeper organizing principle that we call the scale-free property.

Barabási-Albert Model

5 minute read

Updated:

Hubs represent the most striking difference between a random and a scale-free network.

The Random Network Model

4 minute read

Updated:

Network science aims to build models that reproduce the properties of real networks. Most networks we encounter are irregular and look as if they were spun randomly.

Naive Bayes

1 minute read

Updated:

Naïve Bayes is a generative learning algorithm for discrete-valued inputs. In particular, it is known to work well on text classification tasks like spam detection.

Probabilistic classifiers

1 minute read

Updated:

Discriminative learning algorithms are algorithms that try to learn $p(y \mid x)$ directly (such as logistic regression), or algorithms that try to learn mappings directly from the space of inputs $X$ to the labels $\{0, 1\}$ (such as the perceptron algorithm).

Linear Regression

6 minute read

Updated:

In layman's terms, a Linear Regression consists of predicting a continuous dependent variable $Y$ as a linear combination of the independent variables $X = \{x_{1}, \dots, x_{n}\}$. The contribution of each variable $x_{i}$ is expressed by a parameter $\beta_{i}$. Altogether, it is a simple weighted sum:
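As a minimal sketch of that weighted sum in code (the `predict` helper, the optional `intercept` argument and the numbers are assumptions made for illustration, not the post's own notation):

```python
def predict(betas, xs, intercept=0.0):
    """Weighted sum of the inputs: intercept + sum_i beta_i * x_i."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return intercept + sum(beta * x for beta, x in zip(betas, xs))


# Hypothetical parameters and inputs: 2.0 * 3.0 + (-1.0) * 4.0 = 2.0
print(predict([2.0, -1.0], [3.0, 4.0]))  # 2.0
```

Fitting a linear regression then amounts to choosing the `betas` (and the intercept, if one is used) that make these predictions closest to the observed values of $Y$.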

Decision trees

7 minute read

Updated:

Decision Tree Induction

Logistic Regression

1 minute read

Updated:

TL;DR: Logistic Regression is a simple but often effective algorithm that tackles binary classification.

Introduction to optimization

4 minute read

Updated:

A gentle introduction to convexity, derivatives, and the taxonomy of problems in optimization.

Agents in AI

7 minute read

Updated:

Agents

Constrained optimization

3 minute read

Updated:

Goal: Make use of the Lagrangian and other methods to accommodate constraints.

Unconstrained optimization

3 minute read

Updated:

Main assumption: $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is $\mathcal{C}^{1}$ or $\mathcal{C}^{2}$.

Introduction to Machine Learning

9 minute read

Updated:

In a nutshell, Machine Learning is a sub-field of Artificial Intelligence whose aim is to convert experience into expertise or knowledge. Born in the 1960s, it quickly grew into a separate field because of a shift in focus away from decisional AI (the logical, knowledge-based approach); its aim is not General Artificial Intelligence but rather to tackle solvable problems of a practical nature. Ever since the 1990s, Machine Learning has been progressing and flourishing rapidly. This success can largely be attributed to two factors:

Map Reduce

7 minute read

Updated:

MapReduce is:

  1. A simple programming model for processing huge data sets in a distributed way.
  2. A framework that runs these programs on clusters of commodity servers, automatically handling the details of distributed computing:
    • Division of labor.
    • Distribution.
    • Synchronization.
    • Fault-tolerance.
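The programming model can be illustrated on a single machine with a word count. This is only a sketch: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real framework would run the phases in parallel across a cluster while handling distribution, synchronization and fault tolerance.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially (no actual distribution)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

Because each call to `map_phase` only sees one document and each call to `reduce_phase` only sees one key, both phases could in principle be spread over many machines.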

Graph theory

7 minute read

Updated:

Networks are a central aspect of any system, and Graph theory, a branch of Mathematics, is fundamental to grasping and representing those networks. We go from degrees to degree distributions and from paths to distances, and learn to distinguish weighted, directed and bipartite networks.