Nine times out of ten, when you hear about deep learning breaking a new technological barrier,
Convolutional Neural Networks are involved. Also called CNNs or ConvNets, these are the workhorse
of the deep neural network field. They have learned to sort images into categories even better
than humans in some cases. If there’s one method out there that justifies the hype, it is CNNs.
What’s especially cool about them is that they are easy to
understand, at least when you
break them down into their basic parts. I’ll walk you through
it. There's also a video that walks through these ideas in greater detail.
X's and O's
To help guide our walk through a Convolutional Neural Network,
we’ll stick with a very simplified example: determining whether an
image is of an X or an O. This example is just rich enough to illustrate
the principles behind CNNs, but still simple enough to avoid getting
bogged down in non-essential details. Our CNN has one job. Each time we
hand it a picture, it has to decide whether the picture contains an X or an O. It
assumes there is always one or the other.
A naïve approach to solving this problem is to save an image
of an X and an O and compare every new image to our exemplars to see
which is the better match. What makes this task tricky is that
computers are extremely literal. To a computer, an image looks like a
two-dimensional array of pixels (think giant checkerboard) with a number
in each position. In our example a pixel value of 1 is white, and -1 is
black. When comparing two images, if any pixel values don’t match,
then the images don’t match, at least to the computer. Ideally, we would
like to be able to see X’s and O’s even if they’re shifted, shrunken,
rotated or deformed. This is where CNNs come in.
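To make the computer's literal-mindedness concrete, here is a minimal sketch in Python with NumPy. The tiny 3x3 arrays and names are made up for illustration; they stand in for the article's example images.

```python
import numpy as np

# Tiny made-up 3x3 images: 1 is white, -1 is black.
exemplar_x = np.array([[ 1, -1,  1],
                       [-1,  1, -1],
                       [ 1, -1,  1]])

almost_x = exemplar_x.copy()
almost_x[0, 0] = -1            # a single pixel differs

def naive_match(a, b):
    """Literal comparison: the images 'match' only if every pixel agrees."""
    return bool(np.all(a == b))

print(naive_match(exemplar_x, exemplar_x))  # True
print(naive_match(exemplar_x, almost_x))    # False, even though it is nearly the same X
```

A one-pixel change, shift, or rotation is enough to break this kind of whole-image comparison, which is exactly the problem features and convolution address.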
Features
CNNs compare images piece by piece. The pieces they look for are called features. By finding rough feature matches in roughly the
same positions in two images, CNNs get a lot better at seeing
similarity than whole-image matching schemes.
Each feature is like a mini-image—a small two-dimensional
array of values. Features match common aspects of the images. In the
case of X images, features consisting of diagonal lines and a crossing
capture all the important characteristics of most X’s. These features
will probably match up to the arms and center of any image of an X.
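As a concrete illustration, here are two hypothetical 3x3 features of the kind described above. These particular arrays are my own stand-ins for the features shown in the article's figures.

```python
import numpy as np

# Hypothetical 3x3 features (1 = white stroke, -1 = black background).
diagonal_feature = np.array([[ 1, -1, -1],
                             [-1,  1, -1],
                             [-1, -1,  1]])   # a "\" diagonal stroke

crossing_feature = np.array([[ 1, -1,  1],
                             [-1,  1, -1],
                             [ 1, -1,  1]])   # the crossing at the center of an X
```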
Convolution
When presented with a new image, the CNN doesn’t know exactly
where these features will match, so it tries them everywhere, in every possible position. Sliding a feature across the whole image to calculate its match at every position turns that feature into a filter. The math we use to do this is called
convolution, from which Convolutional Neural Networks take their name.
The math behind convolution is nothing that would make a
sixth-grader uncomfortable. To calculate the match of a feature to a
patch of the image, simply multiply each pixel in the feature by the
value of the corresponding pixel in the image. Then add up the answers
and divide by the total number of pixels in the feature. If both pixels
are white (a value of 1) then 1 * 1 = 1. If both are black, then (-1) *
(-1) = 1. Either way, every matching pixel results in a 1. Similarly,
any mismatch is a -1. If all the pixels in a feature match, then adding
them up and dividing by the total number of pixels gives a 1. Similarly,
if none of the pixels in a feature match the image patch, then the
answer is a -1.
To complete our convolution, we repeat this process, lining up
the feature with every possible image patch. We can take the answer
from each convolution and make a new two-dimensional array from it,
based on where in the image each patch is located. This map of matches
is also a filtered version of our original image. It’s a map of where in
the image the feature is found. Values close to 1 show strong matches,
values close to -1 show strong matches for the photographic negative of
our feature, and values near zero show no match of any sort.
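Here is a minimal sketch of that arithmetic in NumPy, assuming an image and a feature filled with 1s and -1s as above. The function name is my own.

```python
import numpy as np

def convolve(image, feature):
    """Slide the feature over every patch of the image and record a match score.

    Each score is the mean of the element-wise products: multiply each feature
    pixel by the corresponding image pixel, add them up, and divide by the
    number of pixels in the feature. A perfect match gives 1, a perfect
    mismatch gives -1, and no relationship at all gives roughly 0.
    """
    img_h, img_w = image.shape
    f_h, f_w = feature.shape
    out = np.zeros((img_h - f_h + 1, img_w - f_w + 1))
    for row in range(out.shape[0]):
        for col in range(out.shape[1]):
            patch = image[row:row + f_h, col:col + f_w]
            out[row, col] = np.mean(patch * feature)
    return out
```

The returned array is the filtered image: a map of where in the original image the feature is found.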
The next step is to repeat the convolution process in its
entirety for each of the other features. The result is a set of filtered
images, one for each of our filters. It’s convenient to think of this
whole collection of convolution operations as a single processing step.
In CNNs this is referred to as a convolution layer, hinting at the fact
that it will soon have other layers added to it.
It’s easy to see how CNNs get their reputation as computation
hogs. Although we can sketch our CNN on the back of a napkin, the number
of additions, multiplications and divisions can add up fast. In math
speak, they scale linearly with the number of pixels in the image, with
the number of pixels in each feature and with the number of features.
With so many factors, it’s easy to make this problem many millions of
times larger without breaking a sweat. Small wonder that microchip
manufacturers are now making specialized chips in an effort to keep up
with the demands of CNNs.
Pooling
Another power tool that CNNs use is called pooling. Pooling is
a way to take large images and shrink them down while preserving the
most important information in them. The math behind pooling is
second-grade level at most. It consists of stepping a small window
across an image and taking the maximum value from the window at each
step. In practice, a window 2 or 3 pixels on a side and steps of 2
pixels work well.
After pooling, an image has about a quarter as many pixels as
it started with. Because it keeps the maximum value from each window, it
preserves the best fits of each feature within the window. This means
that it doesn’t care so much exactly where the feature fit as long as it
fit somewhere within the window. The result of this is that CNNs can
find whether a feature is in an image without worrying about where it
is. This helps solve the problem of computers being hyper-literal.
A pooling layer is just the operation of performing pooling on
an image or a collection of images. The output will have the same
number of images, but they will each have fewer pixels. This is also
helpful in managing the computational load. Taking an 8 megapixel image
down to a 2 megapixel image makes life a lot easier for everything
downstream.
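A sketch of that windowed maximum, under the same NumPy assumptions as the earlier convolution sketch:

```python
import numpy as np

def max_pool(image, window=2, stride=2):
    """Step a small window across the image, keeping the maximum value at each step."""
    out_h = (image.shape[0] - window) // stride + 1
    out_w = (image.shape[1] - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row * stride:row * stride + window,
                          col * stride:col * stride + window]
            out[row, col] = patch.max()
    return out
```

With a 2-pixel window and a stride of 2, the output has about a quarter as many pixels as the input.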
Rectified Linear Units
A small but important player in this process is the Rectified
Linear Unit or ReLU. It’s math is also very simple—wherever a negative
number occurs, swap it out for a 0. This helps the CNN stay
mathematically healthy by keeping learned values from getting stuck near
0 or blowing up toward infinity. It’s the axle grease of CNNs—not
particularly glamorous, but without it they don’t get very far.
The output of a ReLU layer is the same size as whatever is put into it, just with all the negative values removed.
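In code, the whole operation is one line (a NumPy sketch):

```python
import numpy as np

def relu(image):
    """Swap every negative value for 0; the output is the same size as the input."""
    return np.maximum(image, 0)
```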
Deep learning
You’ve probably noticed that the input to each layer
(two-dimensional arrays) looks a lot like the output (two-dimensional
arrays). Because of this, we can stack them like Lego bricks. Raw images
get filtered, rectified and pooled to create a set of shrunken,
feature-filtered images. These can be filtered and shrunken again and
again. Each time, the features become larger and more complex, and the
images become more compact. This lets lower layers represent simple
aspects of the image, such as edges and bright spots. Higher layers can
represent increasingly sophisticated aspects of the image, such as
shapes and patterns. These tend to be readily recognizable. For
instance, in a CNN trained on human faces, the highest layers represent
patterns that are clearly face-like.
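Putting the pieces together, stacking layers is just function composition. This sketch reuses the convolve, relu, and max_pool helpers sketched above; the names and structure are my own, and it is deliberately simplified.

```python
def conv_layer(images, features):
    """Apply every feature to every incoming image, one filtered map per pair.

    (A simplification: in real CNNs each filter spans all incoming maps at once.)
    """
    return [convolve(img, f) for img in images for f in features]

def forward(image, features_per_layer):
    """Repeated rounds of filter -> rectify -> pool, as described above."""
    maps = [image]
    for features in features_per_layer:
        maps = [max_pool(relu(m)) for m in conv_layer(maps, features)]
    return maps
```

Each pass through the loop produces a smaller, more feature-filtered set of images, matching the stacking described above.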
Fully connected layers
CNNs have one more arrow in their quiver. Fully connected
layers take the high-level filtered images and translate them into
votes. In our case, we only have to decide between two categories, X and
O. Fully connected layers are the primary building block of traditional
neural networks. Instead of treating the inputs as a two-dimensional array, they treat them as a single list, with every value handled identically. Every value gets its own vote on whether the current image is an X or an O.
However, the process isn’t entirely democratic. Some values are much
better than others at knowing when the image is an X, and some are
particularly good at knowing when the image is an O. These get larger
votes than the others. These votes are expressed as weights, or
connection strengths, between each value and each category.
When a new image is presented to the CNN, it percolates
through the lower layers until it reaches the fully connected layer at
the end. Then an election is held. The answer with the most votes wins
and is declared the category of the input.
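A sketch of the voting arithmetic described here, with hypothetical names; the weights are the learned connection strengths between each value and each category.

```python
import numpy as np

def fully_connected(values, weights):
    """Each input value casts a weighted vote for each category.

    values:  flattened list of high-level filtered-image values, shape (n,)
    weights: connection strengths, shape (n, number_of_categories)
    """
    return values @ weights                     # total votes per category

def classify(values, weights, categories=("X", "O")):
    votes = fully_connected(values, weights)
    return categories[int(np.argmax(votes))]    # the answer with the most votes wins
```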
Fully connected layers, like the rest, can be stacked because
their outputs (a list of votes) look a whole lot like their inputs (a
list of values). In practice, several fully connected layers are often
stacked together, with each intermediate layer voting on phantom
“hidden” categories. In effect, each additional layer lets the network
learn ever more sophisticated combinations of features that help it make
better decisions.
Backpropagation
Our story is filling in nicely, but it still has a huge
hole: where do features come from? And how do we find the weights in our
fully connected layers? If these all had to be chosen by hand, CNNs
would be a good deal less popular than they are. Luckily, a bit of
machine learning magic called backpropagation does this work for us.
To make use of backpropagation, we need a collection of images
that we already know the answer for. This means that some patient soul
flipped through thousands of images and assigned them a label of X or O.
We use these with an untrained CNN, which means that every pixel of
every feature and every weight in every fully connected layer is set to a
random value. Then we start feeding images through it, one after another.
Each image the CNN processes results in a vote. The amount of
wrongness in the vote, the error, tells us how good our features and
weights are. The features and weights can then be adjusted to make the
error less. Each value is adjusted a little higher and a little lower,
and the new error computed each time. Whichever adjustment makes the
error less is kept. After doing this for every feature pixel in every
convolutional layer and every weight in every fully connected layer, the
new weights give an answer that works slightly better for that image.
This is then repeated with each subsequent image in the set of labeled
images. Quirks that occur in a single image are quickly forgotten, but
patterns that occur in lots of images get baked into the features and
connection weights. If you have enough labeled images, these values
stabilize to a set that works pretty well across a wide variety of
cases.
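The nudge-and-check idea described above can be sketched like this. It is a toy illustration of the adjustment loop as described, with a hypothetical predict function standing in for the whole CNN; real backpropagation computes the same kind of adjustments analytically from gradients rather than by trying each value up and down.

```python
def error(predicted_votes, true_votes):
    """How wrong the vote was: sum of squared differences (one common choice)."""
    return sum((p - t) ** 2 for p, t in zip(predicted_votes, true_votes))

def nudge(params, predict, image, true_votes, step=0.01):
    """Try each adjustable value a little higher and lower; keep whatever helps.

    params:  flat list of feature pixels and connection weights
    predict: function that runs the CNN with the given params on the image
    """
    for i in range(len(params)):
        best = error(predict(params, image), true_votes)
        for delta in (+step, -step):
            params[i] += delta
            e = error(predict(params, image), true_votes)
            if e < best:
                best = e             # this adjustment made the error less; keep it
            else:
                params[i] -= delta   # otherwise undo it
    return params
```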
As is probably apparent, backpropagation is another expensive
computing step, and another motivator for specialized computing
hardware.
Hyperparameters
Unfortunately, not every aspect of CNNs can be learned in so
straightforward a manner. There is still a long list of decisions that a
CNN designer must make.
- For each convolution layer: how many features? How many pixels in each feature?
- For each pooling layer: what window size? What stride?
- For each extra fully connected layer: how many hidden neurons?
In addition to these, there are also higher-level architectural decisions to make: how many of each layer to include? In what order?
Some deep neural networks can have over a thousand layers, which opens
up a lot of possibilities.
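As a sketch of what those decisions might look like when written down, here is a hypothetical architecture spec for our X-versus-O task. The particular numbers are illustrative, not recommendations.

```python
# A hypothetical CNN architecture spec for the X-vs-O task.
architecture = [
    {"layer": "conv", "num_features": 4, "feature_size": 3},
    {"layer": "relu"},
    {"layer": "pool", "window": 2, "stride": 2},
    {"layer": "conv", "num_features": 8, "feature_size": 3},
    {"layer": "relu"},
    {"layer": "pool", "window": 2, "stride": 2},
    {"layer": "fully_connected", "hidden_neurons": 32},
    {"layer": "fully_connected", "hidden_neurons": 2},   # one vote each for X and O
]
```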
With so many combinations and permutations, only a small
fraction of the possible CNN configurations have been tested. CNN
designs tend to be driven by accumulated community knowledge, with
occasional deviations showing surprising jumps in performance. And while
we’ve covered the building blocks of vanilla CNNs, there are lots of
other tweaks that have been tried and found effective, such as new layer
types and more complex ways to connect layers with each other.
Beyond images
While our X and O example involves images, CNNs can be used to
categorize other types of data too. The trick is to transform whatever data you start with so that it looks like an image. For
instance, audio signals can be chopped into short time chunks, and then
each chunk broken up into bass, midrange, treble, or finer frequency
bands. This can be represented as a two-dimensional array where each
column is a time chunk and each row is a frequency band. “Pixels” in
this fake picture that are close together are closely related. CNNs work
well on this. Researchers have gotten quite creative. They have adapted
text data for natural language processing and even chemical data for
drug discovery.
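A sketch of that audio-to-image idea, assuming a one-dimensional NumPy array of samples; the chunk size and number of bands are arbitrary choices for illustration.

```python
import numpy as np

def audio_to_image(samples, chunk_size=256, num_bands=32):
    """Chop audio into short time chunks and split each chunk into frequency bands.

    Returns a 2-D array where each column is a time chunk and each row is a
    frequency band, so nearby "pixels" are closely related in time and pitch.
    """
    num_chunks = len(samples) // chunk_size
    image = np.zeros((num_bands, num_chunks))
    for t in range(num_chunks):
        chunk = samples[t * chunk_size:(t + 1) * chunk_size]
        spectrum = np.abs(np.fft.rfft(chunk))          # energy at each frequency
        bands = np.array_split(spectrum, num_bands)    # coarse frequency bands
        image[:, t] = [band.mean() for band in bands]
    return image
```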
An example of data that doesn’t fit this format is customer
data, where each row in a table represents a customer, and each column
represents information about them, such as name, address, email,
purchases and browsing history. In this case, the location of rows and
columns doesn’t really matter. Rows can be rearranged and columns can be
re-ordered without losing any of the usefulness of the data. In
contrast, rearranging the rows and columns of an image makes it largely
useless.
A rule of thumb: If your data is just as useful after swapping
any of your columns with each other, then you can’t use Convolutional
Neural Networks.
However if you can make your problem look like finding patterns in an image, then CNNs may be exactly what you need.
Learn more
If you'd like to dig deeper into deep learning, check out my
Demystifying Deep Learning post.
I also recommend the
notes from the Stanford CS 231 course
by Justin Johnson and Andrej Karpathy that provided inspiration for this post,
as well as the writings of
Christopher Olah, an exceptionally clear writer on the subject of neural networks.
If you are one who loves to learn by doing, there are a number
of popular deep learning tools available. Try them all! And then tell
us what you think.
I hope you've enjoyed our walk through the neighborhood of
Convolutional Neural Networks. Feel free to start up a conversation.
Brandon
August 18, 2016