Saturday, June 9, 2018

10/06/2018 Convolutional Neural Networks for Visual Recognition



Lecture 1 Introduction


Lecture 2 Image Classification Pipeline


Lecture 3 Loss Functions and Optimization


Lecture 4 Introduction to Neural Networks


Lecture 5 Convolutional Neural Networks


Lecture 6 Training Neural Networks


Lecture 7 Training Neural Networks II


Lecture 8 Deep Learning Software


Lecture 9 CNN Architectures


Lecture 10 Recurrent Neural Networks


Lecture 11 Detection and Segmentation


Lecture 12 Visualizing and Understanding


Lecture 13 Generative Models


Lecture 14 Deep Reinforcement Learning


Lecture 15 Efficient Methods and Hardware for Deep Learning


Lecture 16 Adversarial Examples and Adversarial Training



The above materials are provided by http://cs231n.stanford.edu/.
This blog post organizes them only for personal learning purposes.

Book:
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
http://www.deeplearningbook.org/



Wednesday, June 6, 2018

06/06/2018 Integrate GitHub in Visual Studio

1. Open a GitHub account
2. Create a repository on GitHub
3. Copy the HTTPS clone link of the repository
4. In the Visual Studio terminal (or any other terminal), type:
  • git config --global user.name xxxxx
  • git clone urllink (the link from step 3)

https://www.theregister.co.uk/2015/12/07/visual_studio_code_git_integration/

5. git pull urllink (fetch and merge changes from the remote)
6. git remote add name urllink (register the remote under a short name)
7. git push name (push local commits to that remote)

Saturday, June 2, 2018

Machine Learning, Big Data, AI, Deep Learning

Machine Learning

Machine learning systems learn how to combine inputs to produce useful predictions on never-before-seen data.

Fundamental concepts in machine learning

Feature (x) - an input variable
Label (y) - the thing being predicted
Example - a particular instance of data, x
  • labeled example - an instance of features (x) together with its label (y)
  • unlabeled example - an instance of features (x) without a label (y)
Model - defines the relationship between features and the label
Training - creating or learning the model; showing the model labeled examples lets it gradually learn the relationship between the features and the label
Inference - applying the trained model to unlabeled examples to make predictions (illustrated in the sketch below)
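
A toy sketch of these terms in code (the feature names, values, and model below are made up purely for illustration):

# A labeled example: features (x) paired with a label (y).
labeled_example = {"sqft": 1500, "rooms": 3, "price": 310000}  # price is the label

# An unlabeled example: features only; the label must be predicted.
unlabeled_example = {"sqft": 2000, "rooms": 4}

# A hypothetical, already-trained model: maps features to a predicted label.
def model(example):
    return 180 * example["sqft"] + 5000 * example["rooms"]

# Inference: applying the trained model to an unlabeled example.
predicted_price = model(unlabeled_example)
print(predicted_price)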

Model types

  • Regression - predicts a continuous value
  • Classification - predicts a discrete value

Linear Regression

y = mx + c
y = b + wx

y - labels
x - feature
b - bias
w - weight of the feature

To infer (predict) y, substitute a value for x.
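
For example (with made-up numbers), if b = 1, w = 2, and x = 3, the predicted value is y = 1 + 2 × 3 = 7.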

Empirical risk minimization - a process in supervised learning: a machine learning algorithm builds a model by examining many examples and attempting to find a model that minimizes loss.

Loss - a number showing how bad the model's prediction was on a single example. If the prediction is perfect, the loss is zero; otherwise, it is greater.

Squared loss - the square of the difference between the label and the prediction: (label - prediction)^2
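
A minimal Python sketch of the squared loss and its mean over a data set for the linear model above (all values are made up for illustration):

# Toy data: features x with labels y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# A candidate linear model y' = b + w*x.
b, w = 1.0, 2.0

def squared_loss(x, y):
    prediction = b + w * x
    return (y - prediction) ** 2

# Mean squared error (MSE) over the whole data set.
mse = sum(squared_loss(x, y) for x, y in zip(xs, ys)) / len(xs)
print(mse)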

Interpretation
http://www.leeds.ac.uk/educol/documents/00003759.htm
https://www.khanacademy.org/math/calculus-home/taking-derivatives-calc
https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivative-and-gradient-articles/a/introduction-to-partial-derivatives

Reducing Loss

An iterative approach to training a model

A Machine Learning model is trained by starting with an initial guess for the weights and bias and iteratively adjusting those guesses until learning the weights and bias with the lowest possible loss.

Iterate until the overall loss stops changing or at least changes extremely slowly. When that happens, we say that the model has converged. (The brute-force alternative would be to examine every possible value of w1, which is impractical.)

Convex problems have only one minimum; that is, only one place where the slope is exactly 0.


Gradient descent for training a model

The first stage in gradient descent is to pick a starting value (a starting point) for w1.  The starting point doesn't matter much; therefore, many algorithms simply set w1 to 0 or pick a random value.
The gradient descent algorithm then calculates the gradient of the loss curve at the starting point. The gradient of the loss curve is equal to the derivative (slope) of the curve, and tells you which way is "warmer" or "colder." When there are multiple weights, the gradient is a vector of partial derivatives with respect to the weights.

 A gradient is a vector with two characteristics:

  • a direction
  • a magnitude
Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (also sometimes called step size) to determine the next point. Because the gradient points in the direction of steepest increase, the algorithm steps in the direction of the negative gradient.
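
A minimal sketch of gradient descent on the weight w of the toy model above, holding the bias fixed (the learning rate and starting point are arbitrary choices for the example):

# Toy data and bias from the squared-loss sketch above.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b = 1.0

def loss_gradient(w):
    # Derivative of the mean squared error with respect to w:
    # average of -2 * x * (y - (b + w*x)) over the data set.
    return sum(-2 * x * (y - (b + w * x)) for x, y in zip(xs, ys)) / len(xs)

learning_rate = 0.05
w = 0.0  # arbitrary starting point
for step in range(100):
    w = w - learning_rate * loss_gradient(w)  # step against the gradient

print(w)  # converges to roughly 2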

Hyperparameters are the knobs that programmers tweak in machine learning algorithms.

If the learning rate is too small, learning will take too long.
If a learning rate is too large, the next point will perpetually bounce haphazardly across the bottom of the well like a quantum mechanics experiment gone horribly wrong.


The Goldilocks learning rate is related to how flat the loss function is. If you know the gradient of the loss function is small, you can safely try a larger learning rate, which compensates for the small gradient and results in a larger step size.


Stochastic gradient descent (SGD)

In gradient descent, a batch is the total number of examples you use to calculate the gradient in a single iteration.  A large data set with randomly sampled examples probably contains redundant data. In fact, redundancy becomes more likely as the batch size grows. Some redundancy can be useful to smooth out noisy gradients, but enormous batches tend not to carry much more predictive value than large batches.

Stochastic gradient descent (SGD) gets the right gradient on average for much less computation by choosing examples at random from the data set. We can estimate (albeit noisily) a big average from a much smaller one. SGD uses only a single example (a batch size of 1) per iteration. Given enough iterations, SGD works but is very noisy. The term "stochastic" indicates that the one example comprising each batch is chosen at random.

Mini-batch stochastic gradient descent (mini-batch SGD)  is a compromise between full-batch iteration and SGD. A mini-batch is typically between 10 and 1,000 examples, chosen at random. Mini-batch SGD reduces the amount of noise in SGD but is still more efficient than full-batch.
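
A minimal sketch of mini-batch SGD for the same kind of linear model (the data, batch size, and learning rate are made-up choices; batch_size = 1 gives plain SGD, and batch_size = len(data) gives full-batch gradient descent):

import random

random.seed(0)

# Toy data set: y = 2x + 1 plus a little noise (made up for illustration).
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.05))
        for x in (random.uniform(0.0, 1.0) for _ in range(200))]

b, w = 0.0, 0.0
learning_rate = 0.2
batch_size = 10

for step in range(3000):
    batch = random.sample(data, batch_size)  # examples chosen at random
    # Gradients of the mean squared error over the mini-batch.
    grad_b = sum(-2 * (y - (b + w * x)) for x, y in batch) / batch_size
    grad_w = sum(-2 * x * (y - (b + w * x)) for x, y in batch) / batch_size
    b -= learning_rate * grad_b
    w -= learning_rate * grad_w

print(w, b)  # should end up close to 2 and 1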


Tensorflow
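
A minimal sketch of fitting the same kind of linear model with TensorFlow's tf.keras API (assuming a TensorFlow 2.x install; the data and hyperparameters are made up for illustration):

import numpy as np
import tensorflow as tf

# Toy data: y = 2x + 1 plus a little noise.
x = np.random.uniform(0.0, 1.0, size=(200, 1))
y = 2.0 * x + 1.0 + np.random.normal(0.0, 0.05, size=(200, 1))

# A single Dense unit is exactly the linear model y' = b + wx.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.2), loss="mse")

# Keras runs mini-batch SGD: batch_size examples per gradient step.
model.fit(x, y, epochs=100, batch_size=10, verbose=0)

weights, bias = model.layers[0].get_weights()
print(weights, bias)  # should be close to [[2.]] and [1.]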

Big Data


Big data is not about the size of the data; it is about the value within the data.

Big Data Analytics

A collection of frameworks for generating valuable equations (e.g., regression models) from large data sets:
  • MapReduce Framework (a minimal map/reduce sketch follows this list)
  • Hadoop Distributed File System (HDFS)
  • Cluster
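
To make the MapReduce idea concrete, here is a toy, single-machine word-count sketch (not actual Hadoop code):

from collections import defaultdict

documents = ["big data is not small data", "data has value"]

# Map phase: emit (key, value) pairs from each input record.
def map_phase(doc):
    for word in doc.split():
        yield word, 1

# Shuffle phase: group all values by key.
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

# Reduce phase: combine the values for each key.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)  # e.g. {'big': 1, 'data': 3, ...}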

Data in Big Data

  • Structured data
  • Semi-structured data
  • Unstructured data

Big Data 4V 


Justification for deploying big data along four directions

  1. Volume  - Data Quantity
  2. Velocity - Data Speed
  3. Variety - Data Types
  4. Veracity - Data Quality (Analytics)

Justification of the 4Vs via two levels

  1. Distributed Computation
  2. Distributed Storage

Data Characteristics

  • Activity Data
  • Conversation Data
  • Photo and Video image data
  • Sensor Data
  • The Internet of Things Data

AI - Artificial Intelligence

Machine Learning

Creating algorithms that learn from data

Deep Learning

Using deep neural networks (NN) to automatically learn hierarchical representations


Machine Learning Tasks

  1. Classification - predict the class of an object
  2. Regression - predict a continuous value for an object
  3. Clustering - group similar objects together
  4. Dimensionality reduction - "compress" data from a high-dimensional representation into a lower-dimensional one
  5. Ranking
  6. Recommendations - filter a small subset of objects from a large collection and recommend them to a user

Deep Learning

Commonly used deep learning toolkits

Caffe
CNTK
Tensorflow
Theano
Torch

Commonly used networks
  • ConvNets: AlexNet, OxfordNet, GoogleNet
  • RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
  • Sequential modeling with attention.
Toolkit comparison reference:
https://github.com/zer0n/deepframeworks/blob/master/README.md?utm_source=tuicool&utm_medium=referral

Reference

Convolutional Neural Networks for Visual Recognition
http://derekwaikl.blogspot.com/2018/06/convolutional-neural-networks-for.html


Deep Learning Tutorials
http://deeplearning.net/tutorial/

Stanford Deep Learning
http://deeplearning.stanford.edu/




Books to read:

Bengio Y. Learning Deep Architectures for AI[J]. Foundations & Trends® in Machine Learning, 2009, 2(1):1-127.

Hinton G E, Salakhutdinov R R. Reducing the Dimensionality of Data with Neural Networks[J]. Science, 2006, 313(5786):504-507.

He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015.

Srivastava R K, Greff K, Schmidhuber J. Highway networks. arXiv:1505.00387, 2015.