Sunday, January 18, 2015

Neural Network And Deep Learning Ebook

http://neuralnetworksanddeeplearning.com/index.html

Neural Networks and Deep Learning is a free online book. The book will teach you about:
  • Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
  • Deep learning, a powerful set of techniques for learning in neural networks
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you many of the core concepts behind neural networks and deep learning. The book is currently an incomplete beta draft, and more chapters will be added over the coming months. For more details about the approach taken in the book, see here. Or you can jump directly to Chapter 1 and get started.
 

Fast Domain Generalization with Kernel Methods – Part 3 (PCA, KPCA, TCA)

Although the theory behind Principal Component Analysis (PCA), whether viewed from a statistics or an information-theory standpoint, can be traced back to 1901, its practical use only took off in the 1980s alongside advances in computer technology. Today PCA is a very popular method in fields ranging from signal processing, neuroscience, machine learning, and finance to the social sciences. In general, PCA is used to find or analyze patterns in high-dimensional data. One of the things people like about PCA is its relatively efficient computation.
In many applications, a high-dimensional dataset can often be explained by just a few variables (from here on we will call these latent variables). Take heart disease prediction as an example: detecting whether someone has heart disease, given a set of variables. There are 76 variables to analyze, including age, sex, blood pressure, heart rate, and so on. The question usually asked first is whether the prediction can be made with fewer variables by selecting only the most informative ones. This could be done manually by human experts, but it would be more elegant if a computer could identify the latent variables automatically. In machine learning, this problem is studied intensively in the subfield known as Dimensionality Reduction.
PCA has since evolved into many variants. In this post I play around with PCA and two of its variants: Kernel Principal Component Analysis (KPCA) and Transfer Component Analysis (TCA). The latter was designed specifically for transfer learning / domain adaptation applications.
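As a quick illustration of the dimensionality-reduction idea above, here is a minimal Python sketch using scikit-learn's PCA and KernelPCA on synthetic data standing in for the 76-variable heart disease example. The synthetic data, the RBF kernel, and the choice of two components are assumptions made for the example; TCA is omitted because it is not part of scikit-learn.

# Minimal sketch: reduce a high-dimensional dataset to a few latent components
# with PCA and kernel PCA. The synthetic data and the two-component choice are
# assumptions for illustration only.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.RandomState(0)
X = rng.randn(300, 76)                 # e.g. 300 patients, 76 clinical variables
X[:, :3] += 4 * rng.randn(300, 1)      # a few directions carry most of the variance

# Linear PCA: project onto the top two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Kernel PCA: the same idea after an implicit non-linear (RBF) feature map
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.01)
X_kpca = kpca.fit_transform(X)
print("reduced shapes:", X_pca.shape, X_kpca.shape)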

‘Soft’ Artificial Intelligence Is Suddenly Everywhere

http://blogs.wsj.com/cio/2015/01/16/soft-artificial-intelligence-is-suddenly-everywhere/

“Artificial intelligence is suddenly everywhere. It’s still what the experts call soft A.I., but it is proliferating like mad.” So starts an excellent Vanity Fair article, Enthusiasts and Skeptics Debate Artificial Intelligence, by author and radio host Kurt Andersen. Artificial intelligence is indeed everywhere, but these days the term is used in so many different ways that it’s almost like saying that computers are now everywhere. It’s true, but so general a statement that we must probe a bit deeper to understand its implications, starting with what is meant by soft AI, versus its counterpart, strong AI.
Soft, weak or narrow AI is inspired by, but doesn’t aim to mimic, the human brain. These are generally statistically oriented, computational intelligence methods for addressing complex problems based on the analysis of vast amounts of information using powerful computers and sophisticated algorithms, whose results exhibit qualities we tend to associate with human intelligence.
Soft AI was behind Deep Blue, IBM Corp.’s chess-playing supercomputer, which in 1997 won a celebrated chess match against then-reigning champion Garry Kasparov, as well as Watson, IBM’s question-answering system, which in 2011 won the Jeopardy! Challenge against the two best human Jeopardy! players. And, as Mr. Andersen notes in his article, it’s why “We’re now accustomed to having conversations with computers: to refill a prescription, make a cable-TV-service appointment, cancel an airline reservation – or, when driving, to silently obey the instructions of the voice from the G.P.S.”
This engineering-oriented AI is indeed everywhere, and being increasingly applied to activities requiring intelligence and cognitive capabilities that not long ago were viewed as the exclusive domain of humans. AI-based tools are enhancing our own cognitive powers, helping us process vast amounts of information and make ever more complex decisions.
Soft AI was nicely discussed in a recent Wired article, The Three Breakthroughs That Have Finally Unleashed AI on the World, by author and publisher Kevin Kelly, who called it a kind of “cheap, reliable, industrial-grade digital smartness running behind everything, and almost invisible except when it blinks off.”
“It will enliven inert objects, much as electricity did more than a century ago. Everything that we formerly electrified we will now cognitize. This new utilitarian AI will also augment us individually as people (deepening our memory, speeding our recognition) and collectively as a species. There is almost nothing we can think of that cannot be made new, different, or interesting by infusing it with some extra IQ…  Like all utilities, AI will be supremely boring, even as it transforms the Internet, the global economy, and civilization.”
“In the past, we would have said only a superintelligent AI could drive a car, or beat a human at Jeopardy! or chess,” writes Mr. Kelly. “But once AI did each of those things, we considered that achievement obviously mechanical and hardly worth the label of true intelligence. Every success in AI redefines it.” Such a redefinition is now taking place with data science, one of the hottest new professions and academic disciplines. It’s hard to tell where data science stops and AI starts. We’ve started to view former AI achievements as mere data science applications. The two disciplines are evolving in tandem, with AI leading the way and data science commercializing its advances.
Strong AI, on the other hand, aims to develop machines with a kind of artificial general intelligence that can successfully match or exceed human intelligence in cognitive tasks such as reasoning, planning, learning, vision and natural language conversations on any subject. Mr. Andersen’s Vanity Fair article discusses a group of strong AI advocates he refers to as the Singularitarians, who believe that beyond exceeding human intelligence, machines will some day become sentient–displaying a consciousness or self-awareness and the ability to experience sensations and feelings.
He turned to Siri to help him define Singularity. “What is the Singularity?,” he asked her. Siri answered: “A technological singularity is a predicted point in the development of a civilization at which technological progress accelerates beyond the ability of present-day humans to fully comprehend or predict.”
The term is most closely associated with Ray Kurzweil, author, computer scientist, inventor and presently director of engineering at Google. In his 2005 book The Singularity is Near: When Humans Transcend Biology, Mr. Kurzweil predicted that the Singularity will be reached around 2045, at which time “machine intelligence will be infinitely more powerful than all human intelligence combined.”
“And, if the Singularity is near, will it bring about global techno-Nirvana or civilizational ruin?,” Mr. Andersen asks.  “Since the turn of this century, big-time tech-industry figures have taken sides: ultra-geeky masters of the tech universe versus other ultra-geeky masters of the tech universe. It’s a kind of Great Schism separating skeptics from true believers, dystopians from utopians, the cautious men from the giddy boys.”
Personally, I’m on the side of the skeptics, even though the true believers include a number of brilliant technologists and successful entrepreneurs. But I can understand neither Mr. Kurzweil’s utopian visions nor the dystopian fears of Tesla’s founder Elon Musk, who calls AI “our biggest existential threat” and “a demon” being summoned by foolish scientists and technologists. A similar concern has been expressed by world-renowned physicist Stephen Hawking, who recently told the BBC: “The development of full artificial intelligence could spell the end of the human race… Humans, who are limited by slow biological evolution, couldn’t compete, and would be superseded.”
Frankly, the potential advent of super-intelligent, sentient machines is not high on the list of things I worry about. What really concerns me are the highly complex IT systems that we’re increasingly dependent on in our everyday lives. I worry whether we’ve taken the proper care in designing the powerful computer systems that have now penetrated just about every nook and cranny of our economies and societies.
Technology advances have enabled us to develop systems with seemingly unlimited capabilities. Highly sophisticated, software-intensive smart systems are being deployed in industry after industry, from energy and transportation to finance and entertainment. These complex systems are composed of many different kinds of components, intricate organizations and highly different structures, all highly interconnected and interacting with each other. They exhibit dynamic, unpredictable behaviors as a result of the interactions of their various components, making them hard to understand and control.
Even more complex are socio-technical systems, which involve people as well as technology. Such systems have to deal not only with tough hardware and software issues, but with the even tougher issues involved in human behaviors, business organizations and economies. We are increasingly developing highly complex socio-technical systems in areas like health care, education, government and cities.
The very flexibility of software means that all the interactions between a system’s various components, including people, cannot be adequately planned, anticipated or tested. That means that even if all the components are highly reliable, problems can still occur if a rare set of interactions arises that compromises the overall behavior and safety of the system.
How can we best manage the risks involved in the design and operation of complex, software intensive, socio-technical systems?  How do we deal with a system that is working as designed but whose unintended consequences we do not like? How can we protect our mission critical systems from cyberattacks? How can we make these systems as resilient as possible?
Human intelligence has evolved over millions of years. But humans have only been able to survive long enough to develop intelligence because of an even more fundamental evolution-inspired capability that’s been a part of all living organisms for hundreds of millions of years – the autonomic nervous system. This is the largely unconscious biological system that keeps us alive by controlling key vital functions, including heart rate, digestion, breathing and protections against disease.
Our highly complex IT systems must become much more autonomic and resilient, capable of self-healing when failures occur and self-protecting when attacked. Only then will they be able to evolve and incorporate increasingly advanced capabilities, including those we associate with human-like intelligence.
Irving Wladawsky-Berger worked at IBM for 37 years and was then strategic advisor to Citigroup for 6 years. He is affiliated with MIT, NYU and Imperial College, and is a regular contributor to CIO Journal.

FAIR open sources deep-learning modules for Torch

https://research.facebook.com/blog/879898285375829/fair-open-sources-deep-learning-modules-for-torch/


Progress in science and technology accelerates when scientists share not just their results, but also their tools and methods. This is one of the reasons why Facebook AI Research (FAIR) is committed to open science and to open sourcing its tools.
Many research projects on machine learning and AI at FAIR use Torch, an open source development environment for numerics, machine learning, and computer vision, with a particular emphasis on deep learning and convolutional nets. Torch is widely used at a number of academic labs as well as at Google/DeepMind, Twitter, NVIDIA, AMD, Intel, and many other companies.
Today, we're open sourcing optimized deep-learning modules for Torch. These modules are significantly faster than the default ones in Torch and have accelerated our research projects by allowing us to train larger neural nets in less time.
This release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as networks with sparse activations that are commonly used in Natural Language Processing applications. Our ConvNet modules include a fast FFT-based convolutional layer using custom CUDA kernels built around NVIDIA's cuFFT library. We'll discuss a few more details about this module lower in this post; for a deeper dive, have a look at this paper.
In addition to this module, the release includes a number of other CUDA-based modules and containers, including:
  • Containers that allow the user to parallelize training over multiple GPUs using either the data-parallel model (mini-batch split over GPUs) or the model-parallel model (network split over multiple GPUs).
  • An optimized Lookup Table that is often used when learning embeddings of discrete objects (e.g., words) and in neural language models.
  • A Hierarchical SoftMax module to speed up training over an extremely large number of classes.
  • Cross-map pooling (sometimes known as MaxOut), often used for certain types of visual and text models.
  • A GPU implementation of 1-bit SGD based on the paper by Frank Seide, et al.
  • A significantly faster Temporal Convolution layer, which computes the 1-D convolution of an input with a kernel, typically used in ConvNets for speech recognition and natural language applications. Our version improves upon the original Torch implementation by utilizing the same BLAS primitives in a significantly more efficient regime. Observed speedups range from 3x to 10x on a single GPU, depending on the input sizes, kernel sizes, and strides. A small NumPy sketch of the convolution-as-matrix-multiplication idea appears after this list.
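As a rough illustration of why a temporal convolution maps well onto BLAS, the NumPy sketch below unfolds the input sequence into overlapping windows so that the whole 1-D convolution becomes a single matrix multiplication (GEMM). The sizes are illustrative assumptions; this is not the fbcunn implementation.

# Illustrative sketch: a 1-D (temporal) convolution expressed as one matrix
# multiplication, the kind of BLAS GEMM call the optimized layer relies on.
# All sizes below are made-up toy values, not fbcunn internals.
import numpy as np

T, D = 10, 4          # sequence length, input feature dimension
K, F = 3, 5           # kernel width, number of output features
x = np.random.randn(T, D)
w = np.random.randn(F, K * D)   # one flattened K*D weight vector per output feature

# Unfold the input into overlapping windows (an "im2col" for sequences)...
windows = np.stack([x[t:t + K].ravel() for t in range(T - K + 1)])   # (T-K+1, K*D)

# ...so the entire convolution is a single GEMM.
y = windows @ w.T               # (T-K+1, F)

# Reference computation, window by window, to confirm the reshaping is correct.
y_ref = np.array([[w[f] @ x[t:t + K].ravel() for f in range(F)]
                  for t in range(T - K + 1)])
assert np.allclose(y, y_ref)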

FFT-based convolutional layer code

The most significant part of this release involves the FFT-based convolutional layer code because convolutions take up the majority of the compute time in training ConvNets. Since improving training time of these models translates to faster research and development, we've spent considerable engineering effort to improve the GPU convolution layers. The work has produced notable results, achieving speedups of up to 23.5x compared to the fastest publicly available code. As far as we can tell, our code is faster than any other publicly available code when used to train popular architectures such as typical deep ConvNets for object recognition on the ImageNet data set.
The improvements came from building on insights provided by our partners at NYU who showed in an ICLR 2014 paper, for the first time, that doing convolutions via FFT can give a speedup in the context of ConvNets. It is well known that convolutions turn into point-wise multiplications when performed in the Fourier domain, but exploiting this property in the context of a ConvNet where images are small and convolution kernels are even smaller was not easy because of the overheads involved. The sequence of operations involves taking an FFT of the input and kernel, multiplying them point-wise, and then taking an inverse Fourier transform. The back-propagation phase, being a convolution between the gradient with respect to the output and the transposed convolution kernel, can also be performed in the Fourier domain. The computation of the gradient with respect to the convolution kernels is also a convolution between the input and the gradient with respect to the output (seen as a large kernel).
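The convolution theorem described above is easy to check numerically: take FFTs of the zero-padded input and kernel, multiply them point-wise, and apply the inverse FFT. The NumPy/SciPy sketch below only illustrates that identity on toy sizes; it is not the released cuFFT-based CUDA code.

# Numerical check of the convolution theorem behind the FFT-based layer:
# convolution in the spatial domain equals a point-wise product in the Fourier
# domain. Toy sizes, plain NumPy/SciPy; the real code does this with cuFFT on GPUs.
import numpy as np
from scipy.signal import convolve2d

img = np.random.randn(32, 32)
ker = np.random.randn(5, 5)

direct = convolve2d(img, ker, mode="full")          # direct 2-D convolution, shape (36, 36)

H, W = direct.shape
fft_prod = np.fft.fft2(img, s=(H, W)) * np.fft.fft2(ker, s=(H, W))
via_fft = np.real(np.fft.ifft2(fft_prod))           # inverse transform of the point-wise product

assert np.allclose(direct, via_fft)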
We've used this core idea and combined it with a dynamic auto-tuning strategy that explores multiple specialized code paths. The current version of our code is built on top of NVIDIA's cuFFT library. We are working on an even faster version using custom FFT CUDA kernels.
The visualizations shown here are color-coded maps that show the relative speed up of Facebook's ConvolutionFFT vs NVIDIA's CuDNN when timed over an entire round trip of the forward and back propagation stages. The heat map is red when we are slower and green when we are faster, with the color amplified according to the magnitude of speedup.
For small kernel sizes (3x3), the speedup is moderate, with a top speed of 1.84x faster than CuDNN.
For larger kernel sizes, starting from (5x5), the speedup is considerable. With larger kernel sizes (13x13), we have a top speed that is 23.5x faster than CuDNN's implementations.
Moreover, for use cases where you convolve with fairly large kernels (as in this paper from Jonathan J. Tompson et al., which uses 128x128 convolution kernels), this path is a practically viable strategy.
The result you see is some of the fastest convolutional layer code available (as of the writing of this post), and the code is now open sourced for all to use. For more technical details on this work, you are invited to read our Arxiv paper.

Parallelization over Multiple GPUs

From the engineering side, we've also been working on the ability to parallelize training of neural network models over multiple GPU cards simultaneously. We worked on minimizing the parallelization overhead while making it extremely simple for researchers to use the data-parallel and model-parallel modules (that are part of fbcunn). Once the researchers push their model into these easy-to-use containers, the code automatically schedules the model over multiple GPUs to maximize speedup. We've showcased this in an example that trains a ConvNet over Imagenet using multiple GPUs.

Links

We hope that these high-quality code releases will be a catalyst to the research community and we will continue to update them from time to time.
To get started with using Torch, and our packages for Torch, visit the fbcunn page which has installation instructions, documentation and examples to train classifiers over ImageNet.
In case you missed it, we just released iTorch, a great interface for Torch using iPython. You can also check out our smaller releases fbnn and fbcuda, as well as our past release fblualib.
Many people worked on this project. Credit goes to: Keith Adams, Tudor Bosman, Soumith Chintala, Jeff Johnson, Yann LeCun, Michael Mathieu, Serkan Piantino, Andrew Tulloch, Pamela Vagata, and Nicolas Vasilache.





Friday, January 16, 2015

How to get started with the Data Science Bowl

http://blog.dominodatalab.com/how-to-get-started-with-the-data-science-bowl/

I am thrilled to share a Domino project we’ve created with starter code in R and Python for participating in the Data Science Bowl. Our starter project can give you a jump start in the competition by letting you train your models on massive hardware and by letting you run multiple experiments in parallel while keeping track of your results.

Introduction

The Data Science Bowl is a Kaggle competition — with $175,000 in prize money and an opportunity to help improve the health of our oceans — to classify images of plankton.
Domino is a platform that lets you build and deploy your models faster, using R, Python, and other languages. To help Data Science Bowl competitors, we have packaged some sample code into a Domino project that you can easily fork and use for your own work.
This post describes how our sample project can help you compete in the Bowl, or do other open-ended machine learning projects. First, we give an overview of the code we've packaged up. Then we describe three capabilities Domino offers: easily scalable infrastructure; a powerful experimentation workflow; and a way to turn your models into self-service web forms.

Contents

  1. Three starter scripts you can use: an IPython Notebook for interactive work, a python script for long-running training, and an R script for long-running training.
  2. Scalable infrastructure and parallelism to train models faster.
  3. Experimenting in parallel while tracking your work so you can iterate on your models faster.
  4. Building a self-service Web diagnostic tool to test the trained model(s).
  5. How to fork our project and use it yourself to jumpstart your own work.


R & Python starter scripts

IPython Notebook

We took Aaron Sander’s fantastic tutorial and turned it into an actual IPython Notebook. You can download the full notebook from our Domino project or view a calculated, rendered version.

Python batch script

Next, we extracted the key training parts of Aaron’s tutorial and turned them into a batch script. Most of the code is the same as what’s in the IPython Notebook, but we excluded the diagnostic code for visualizing sample images.
The result is train.py. You can see the output of running this code here

R batch script

For an R example, we used Jeff Hebert’s PlanktonClassification project. In our Domino project, you can find this code in train.R or see the results from running it here.
As I’ll describe more below, there’s a separate parallel version of this R code, in train_parallel.R


Train faster


Domino lets you train your models much faster by scaling up your hardware with a single click. For example, you can use 8-, 16-, or even 32-core machines. To take advantage of this, we needed to generalize some of the code to better utilize multiple cores.
As you can see from the different experiments we ran, we had some significant speed boosts. For example:
  • The Python code took 50 min on a single core machine. With our parallelized version, it took 6.5 min on a 32-core machine
  • The R code took 14 min on a single core machine. With our parallelized version, it took 4 min on a 32-core machine

Python

Both in the IPython Notebook and in the train.py batch script, we modified the calls that actually train the RF classifier. The original code used n_jobs=3 which would use three cores. We changed this to n_jobs=-1 which will use all cores on the machine.
The original, non-parallel code
kf = KFold(y, n_folds=5)  
y_pred = y * 0  
for train, test in kf:  
    X_train, X_test, y_train, y_test = X[train,:], X[test,:], y[train], y[test]
    clf = RF(n_estimators=100, n_jobs=3)
    clf.fit(X_train, y_train)
    y_pred[test] = clf.predict(X_test)
print classification_report(y, y_pred, target_names=namesClasses)  
Our parallel version
kf = KFold(y, n_folds=5)  
y_pred = y * 0  
for train, test in kf:  
    X_train, X_test, y_train, y_test = X[train,:], X[test,:], y[train], y[test]
    clf = RF(n_estimators=100, n_jobs=-1)
    clf.fit(X_train, y_train)
    y_pred[test] = clf.predict(X_test)
print classification_report(y, y_pred, target_names=namesClasses)  

R

There are two places in the R code that benefited from parallelism.
First, training the random forest classifier. We use the foreach package with the doParallel backend to train parts of the forest in parallel and combine them all. It looks like a lot more code, but most of it is ephemera from loading and initializing the parallel libraries.
The original, non-parallel code
plankton_model <- randomForest(y = y_dat, x = x_dat)  
Our parallel version
library(foreach)  
library(doParallel)  
library(parallel)

numCores <- detectCores()  
registerDoParallel(cores = numCores)

trees_per_core = floor(num_trees / numCores)  
plankton_model <- foreach(num_trees=rep(trees_per_core, numCores), .combine=combine, .multicombine=TRUE, .packages='randomForest') %dopar% {  
                  randomForest(y = y_dat, x = x_dat, ntree = num_trees)
                }
A second part of the R code is also time-consuming and easily parallelized: processing the test images to extract their features before generating test statistics. We use a parallel for loop to process the images across all our cores.
The original, non-parallel code
test_data <- data.frame(image = rep("a",test_cnt), length=0,width=0,density=0,ratio=0, stringsAsFactors = FALSE)  
    idx <- 1
    #Read and process each image
for(fileID in test_file_list){  
    working_file <- paste(test_data_dir,"/",fileID,sep="")
    working_image <- readJPEG(working_file)

    # Calculate model statistics       
    working_stats <- extract_stats(working_image)
    working_summary <- array(c(fileID,working_stats))
    test_data[idx,] <- working_summary
    idx <- idx + 1
    if(idx %% 10000 == 0) cat('Finished processing', idx, 'of', test_cnt, 'test images', '\n')
}
Our parallel version
# assumes cluster is already set up from use above
names_placeholder <- data.frame(image = rep("a",test_cnt), length=0,width=0,density=0,ratio=0, stringsAsFactors = FALSE)  
    #Read and process each image
working_summaries <- foreach(fileID = test_file_list, .packages='jpeg') %dopar% {  
    working_file <- paste(test_data_dir,"/",fileID,sep="")
    working_image <- readJPEG(working_file)

    # Calculate model statistics

    working_stats <- extract_stats(working_image)
    working_summary <- array(c(fileID,working_stats))
}
library(plyr)  
test_data = ldply(working_summaries, .fun = function(x) x, .parallel = TRUE)  
# a bit of a hack -- use the column names from the earlier dummy frame we defined
colnames(test_data) = colnames(names_placeholder)  


Experiment & track results

Domino helps you develop your models faster by letting you experiment in parallel while keeping your results automatically tracked. Whenever you run your code, Domino keeps a record of it, and keeps a record of the result that you produced, so you can track your process and reproduce past work whenever you want.
For example, since our R code saves a submission.csv file when it runs, we get automatic records of each submission we generate, whenever we run our code. If we need to get back to an old one, we can just find the corresponding run and view its results, which will have a copy of the submission.
Each run that you start on Domino gets its own machine, too (of whatever hardware type you selected) so you can try multiple different techniques or parameters in parallel.



Build self-service tools

Have you ever been interrupted by non-technical folks who ask you to run things for them because they can’t use your scripts on their own? We used Domino’s Launchers feature to build a self-service web form to classify different plankton images. Here’s how it works:
  1. Anyone can visit our project and go to the Launchers section, where they’ll find a “Classify plankton image” launcher. This will pop up a form that lets you upload a file from your computer.

  2. When you select a file and click “Run”, Domino will pass your image to a classification script (which uses the RF model trained by the Python code) to predict the class of plankton in the image. Classification just takes a second, and you’ll see results when it finishes, including a diagnostic image and the printout of the predicted class. For example:

Try it yourself

  1. Visit the launcher
  2. Upload an image. If you need an example, download this one to your computer

Implementation

To implement this, we made some additional modifications to the Python training script. Specifically, when the training task finishes, we pickle the model (and class names) so we can load them back later.
joblib.dump(clf, 'dump/classifier.pkl')  
joblib.dump(namesClasses, 'dump/namesClasses.pkl')  
Then we created a separate classify.py script that loads the pickled files and makes a prediction with them. The script also generates a diagnostic image, but the essence of it is this:
file_name = sys.argv[1]  
clf = joblib.load('dump/classifier.pkl')  
namesClasses = joblib.load('dump/namesClasses.pkl')

predictedClassIndex = clf.predict(image_to_features(file_name)).astype(int)  
predictedClassName = namesClasses[predictedClassIndex[0]]

print "most likely class is: " + predictedClassName  
Note that our classify script expects an image file name to be passed at the command line. This lets us easily build a Launcher to expose a UI web form around this script:



Getting started

  1. Sign up for a Domino account and install our command-line tool
  2. Fork the project by clicking the button in the left area of the project.
  3. Clone your new project by running domino get {your_username}/plankton
  4. Run code, e.g., domino run train.py. Or use the web interface to start an IPython Notebook session and open the starter notebook. See our notebook documentation if you need more help.

Implementation notes

  • Our project contains the zipped data sets, but it explicitly ignores the unzipped contents (you can see this inside the .dominoignore file). Because Domino tracks changes whenever you run your code, having a huge number of files (160,000 images, in this case) can slow it down. To speed things up, we store the zip files and let the code unzip them before running. Unzipping takes very little time, so this doesn’t impact overall performance.
  • In the Python code, scikit-learn uses joblib under the hood for parallelizing its random forest training task. joblib, in turn, defaults to using /dev/shm to store pickled data. On Domino's machines, /dev/shm may not have enough space for these training sets, so we set an environment variable in our project’s settings that tells joblib to use /tmp, which will have plenty of space. A short sketch of that setting follows this list.
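For reference, joblib reads the JOBLIB_TEMP_FOLDER environment variable to decide where to place its temporary memmapped/pickled data, so pointing it at /tmp looks roughly like the snippet below. The blog's project sets this through Domino's settings UI rather than in code, so treat this as an equivalent sketch rather than the exact mechanism used.

# Point joblib's temporary folder at /tmp instead of the default /dev/shm.
# JOBLIB_TEMP_FOLDER is joblib's documented environment variable; setting it
# before the parallel training starts is one way to apply the fix described above.
import os
os.environ["JOBLIB_TEMP_FOLDER"] = "/tmp"

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)   # temp data now spills to /tmp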

Caffe deep learning framework

http://caffe.berkeleyvision.org/

Caffe

Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Caffe is released under the BSD 2-Clause license.
Check out our web image classification demo!

Why use Caffe?

Clean architecture enables rapid deployment. Networks are specified in simple config files, with no hard-coded parameters in the code. Switching between CPU and GPU is as simple as setting a flag – so models can be trained on a GPU machine, and then used on commodity clusters.
Readable & modifiable implementation fosters active development. In Caffe’s first year, it has been forked by over 600 developers on Github, and many have pushed significant changes.
Speed makes Caffe perfect for industry use. Caffe can process over 40M images per day with a single NVIDIA K40 or Titan GPU*. That’s 5 ms/image in training, and 2 ms/image in test. We believe that Caffe is the fastest CNN implementation available.
Community: Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. There is an active discussion and support community on Github.
* When files are properly cached, and using the ILSVRC2012-winning SuperVision model. Consult performance details.
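As a back-of-the-envelope check (plain arithmetic, not a Caffe benchmark), the per-image timings quoted above are consistent with the claimed daily throughput:

# Sanity-check the quoted Caffe throughput figures with simple arithmetic.
ms_per_day = 24 * 60 * 60 * 1000
print(ms_per_day / 2)   # 2 ms/image at test time -> 43,200,000 images/day (the ">40M" claim)
print(ms_per_day / 5)   # 5 ms/image in training  -> 17,280,000 images/day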

Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)

http://image-net.org/challenges/LSVRC/2012/index

Facebook open sources its cutting-edge deep learning tools

http://venturebeat.com/2015/01/16/facebook-opens-up-about-more-of-its-cutting-edge-deep-learning-tools/

In the spirit of helping everyone do artificial intelligence more efficiently, Facebook is giving away some of its seriously powerful deep learning tools for free in the Torch open-source library.
More specifically, the social networking company is open-sourcing code that helps it run complex algorithms for deep learning, an increasingly popular type of artificial intelligence. The company will announce the news in a blog post today.
Deep learning involves training systems, called artificial neural networks, on lots of information derived from audio, images, and other inputs, and then presenting the systems with new information and receiving inferences about it in response. Torch, an open-source library that’s been around since 2002, contains a framework for building and training neural networks.
Facebook is making several modules available via Torch, including one with a convolutional neural network layer featuring highly customized kernels — templates that slide over images to recognize certain objects — that could help researchers and engineers at other companies catch up with some of the performance improvements Facebook has made internally.
“For everyone out there, these kernels are faster than anything else in the open-source community,” Soumith Chintala, a Facebook artificial intelligence researcher and software engineer, told VentureBeat in an interview.
What’s more, Facebook is releasing “containers” to help distribute the work of training neural networks across multiple graphics processors. “You’ll get a good speed-up,” Chintala explained. And the supercharged convolution layer code is as much as 23.5 times faster than the fastest system that was publicly available until this point, Chintala wrote in his blog post.
Facebook has previously contributed a wide variety of other software to the rest of the world under open-source licenses. But these new components demonstrate a serious commitment from within the Facebook Artificial Intelligence Research arm, which the company first formed in 2013 with the arrival of deep learning luminary Yann LeCun. More recently, that group brought on Vladimir Vapnik, known for his work on the Support Vector Machine algorithm.
Google, Twitter, Spotify, Netflix, and others have been quickly bringing in their own talent in the domain of deep learning. But open-source contributions in the area have been uncommon, making Facebook’s move notable.
Facebook could also get people outside of the company to improve the Torch modules and thus find new researchers to bring aboard.
Hiring motivations aside, Chintala believes the Torch project holds serious merit and that the new components should make it still more powerful.
“It’s like building some kind of electronic contraption or, like, a Lego set,” Chintala said. “You just can plug in and plug out all these blocks that have different dynamics and that have complex algorithms within them.”
At the same time, he said, Torch is actually not extremely difficult to learn — unlike, say, the Theano library.
“We’ve made it incredibly easy to use,” Chintala said. “We introduce someone to Torch, and they start churning out research really fast.”

Thursday, January 15, 2015

Analyzing Works of Art with Artificial Intelligence

The task of classifying pieces of fine art is hugely complex. When examining a painting, an art expert can usually determine its style, its genre, the artist and the period to which it belongs. Art historians often go further by looking for the influences and connections between artists, a task that is even trickier.

https://medium.com/the-physics-arxiv-blog/when-a-machine-learning-algorithm-studied-fine-art-paintings-it-saw-things-art-historians-had-never-b8e4e7bf7d3e

Pattern Classification Tutorials

A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks

https://github.com/rasbt/pattern_classification

Tuesday, January 13, 2015

Online Machine Learning Courses on Coursera

Here are several online courses in machine learning offered on Coursera. The courses are free to take; you only pay if you want the certificate.

Artificial Intelligence Planning: https://www.coursera.org/course/aiplan

The course aims to provide a foundation in artificial intelligence techniques for planning, with an overview of the wide spectrum of different problems and approaches, including their underlying theory and their applications.

The Data Scientist’s Toolbox: https://www.coursera.org/course/datascitoolbox
Part of the Data Science Specialization »

Get an overview of the data, questions, and tools that data analysts and data scientists work with. This is the first course in the Johns Hopkins Data Science Specialization.

Practical Machine Learning: https://www.coursera.org/course/predmachlearn
Part of the Data Science Specialization »

Learn the basic components of building and applying prediction functions with an emphasis on practical applications. This is the eighth course in the Johns Hopkins Data Science Specialization.

Wednesday, January 7, 2015

Kaggle InClass: Stanford's "Getting a Handel on Data Science" Winners' Report

http://blog.kaggle.com/2015/01/05/kaggle-inclass-stanfords-getting-a-handel-on-data-science-winners-report/

Excerpt:

On December 3rd, Stanford's Data Mining & Analysis course (STATS202) wrapped up a heated Kaggle InClass competition, "Getting a Handel on Data Science". Beating out 92 other teams, "TowerProperty" came in first place. Below, TowerProperty outlines the competition and their journey to the top of the leaderboard.
 
Kaggle InClass is provided free of charge to academics as a statistical and data mining learning tool for students. Instructors from any course dealing with data analysis may get involved!

Monday, January 5, 2015

CIFAR-10 Competition Winners: Interviews with Dr. Ben Graham, Phil Culliton, & Zygmunt Zając

http://blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmunt-zajac/

Dr. Ben Graham

Dr. Ben Graham is an Assistant Professor in Statistics and Complexity at the University of Warwick. With a categorization accuracy of 0.95530, he took first place.

Congratulations on winning the CIFAR-10 competition! How do you feel about your victory?

Thank you! I am very pleased to have won, and quite frankly pretty amazed at just how competitive the competition was.
When I first saw the competition, I did not think the test error would go below about 8%. I assumed 32x32 pixels just wasn't enough information to identify objects very reliably. As it turned out, everyone in the top 10 got below 7%, which is roughly on a par with human performance.

Friday, January 2, 2015

Rattle: A Graphical User Interface for Data Mining using R

http://rattle.togaware.com/

Rattle: A Graphical User Interface for Data Mining using R
See OnePageR for a suite of guides for the Data Scientist using R.
Version 3.4.1 dated 2014-12-29.
> install.packages("rattle", repos="http://rattle.togaware.com", type="source")
$ wget http://togaware.com/access/rattle_3.4.1.tar.gz
Rattle (the R Analytical Tool To Learn Easily) presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.

Reviews

"Rattle is a tab-oriented user interface that is similar to Microsoft Office’s ribbon interface. It makes getting started with data mining in R very easy. This book covers both Rattle, the R code that Rattle creates, and writing some R code from scratch. Therefore it will appeal to both people seeking the ease-of-use that is very much missing from R, and people looking to learn R programming."
"The book is very enjoyable reading and is filled with useful information. It is aimed at both students learning data mining and data miners who are using or learning R. People are likely to read it through the first time as a text book and then later use it as a reference, especially about the details of the R language. One of the strongest aspects of this book is Dr. Williams’ ability to simplify complex topics and explain them clearly. His descriptions of bagging and boosting are the most clear that I have ever read."
Bob Muenchen, author of R for SAS and SPSS Users, 30 June 2011
From Amazon:
For anyone looking to learn more about R, this would be a great introduction. Brian Tvenstrup (5 reviewers made a similar statement).
This book covers both Rattle, the R code that Rattle creates, and writing some R code from scratch. Robert A. Muenchen (2 reviewers made a similar statement).
In summary, I found the book very readable, the examples easy to follow, and the explanations and reasons for why different processes are done. G3N1U5 (2 reviewers made a similar statement).

Information

Through a simple and logical graphical user interface based on Gnome, Rattle can be used by itself to deliver data mining projects. Rattle also provides an entry into sophisticated data mining using the open source and free statistical language R.
Rattle runs under GNU/Linux, Macintosh OS/X, and MS/Windows. The aim is to provide an intuitive interface that takes you through the basic steps of data mining, as well as illustrating the R code that is used to achieve this. Whilst the tool itself may be sufficient for all of a user's needs, it also provides a stepping stone to more sophisticated processing and modelling in R itself, for sophisticated and unconstrained data mining.
Rattle is in daily use by Australia's largest team of data miners and by a variety of government and commercial enterprises worldwide. A number of international consultants also use Rattle in their daily business. Users include the Australian Taxation Office, Australian Department of Immigration, Ulster Bank, Toyota Australia, US Geological Survey, Carat Media Network, Institute of Infection and Immunity of the University Hospital of Wales, US National Institutes of Health, AIMIA Loyalty Marketing, Added Value, and many more. It is or has been used for teaching by McMaster University, Australian National University, University of Canberra, University of Technology Sydney, Yale University, University of Southern Queensland, Revolution Analytics, Harbin Institute of Technology Graduate School Shenzhen, and many more.
The author of Rattle received a 2007 Australia Day Medallion, presented by the Commissioner of Taxation, for leadership and mentoring in Data Mining in the Australian Taxation Office and in Australia, and particularly cited the development and sharing of the Rattle system.
Rattle is also used to teach the practice of data mining. It was the primary tool of instruction for a Data Mining Workshop in Canberra, and at Harbin Institute of Technology, Shenzhen Graduate School (2006), and for the Australian National University's course on Data Mining (since 2006), University of Canberra (since 2010) and University of South Australia (since 2009). It has been used in courses at Yale University, University of Liège Belgium (since 2011), University of Wollongong (since 2010), University of Southern Queensland (since 2010), Australian Consortium for Social and Political Research (2011), University of Technology, Sydney (since 2012), Revolution Analytics (since 2012) and many others.
Further data mining resources are also available from Togaware.
Rattle is open source and freely available from Togaware. You can download Rattle and get familiar with its functionality without any obligation, except for the obligation to freely share! Organisations are also welcome to purchase Rattle, including support for installation and initial training, and ongoing data mining support. Email rattle@togaware.com for details.

Citation

If you use Rattle and report on using it please cite it according to citation("rattle"). You might also reference one of the following:
Graham Williams (2011). Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Springer, Use R!.
or
Graham Williams (2009). Rattle: A Data Mining GUI for R, The R Journal, 1(2):45-55.

Discussion Group and Suggestions

The Rattle Users mailing list is hosted by Google Groups. Questions and suggestions can be posted there, and the discussion archive is available to browse.