Artificial intelligence is reshaping business—though not at the blistering pace many assume. True, AI is now guiding decisions on everything from crop harvests to bank loans, and once pie-in-the-sky prospects such as totally automated customer service are on the horizon. The technologies that enable AI, like development platforms and vast processing power and data storage, are advancing rapidly and becoming increasingly affordable. The time seems ripe for companies to capitalize on AI. Indeed, we estimate that AI will add $13 trillion to the global economy over the next decade.
Yet, despite the promise of AI, many organizations’ efforts with it are falling short. We’ve surveyed thousands of executives about how their companies use and organize for AI and advanced analytics, and our data shows that only 8% of firms engage in core practices that support widespread adoption. Most firms have run only ad hoc pilots or are applying AI in just a single business process.
Why the slow progress? At the highest level, it reflects a failure to rewire the organization. In our surveys and our work with hundreds of clients, we've seen that AI initiatives face formidable cultural and organizational barriers. But we've also seen that leaders who take steps at the outset to break down those barriers can effectively capture AI's opportunities.

Making the Shift
One of the biggest mistakes leaders make is to view AI as a plug-and-play technology with immediate returns. Deciding to get a few projects up and running, they begin investing millions in data infrastructure, AI software tools, data expertise, and model development. Some of the pilots manage to eke out small gains in pockets of organizations. But then months or years pass without bringing the big wins executives expected. Firms struggle to move from the pilots to companywide programs—and from a focus on discrete business problems, such as improved customer segmentation, to big business challenges, like optimizing the entire customer journey.
Leaders also often think too narrowly about AI requirements. While cutting-edge technology and talent are certainly needed, it’s equally important to align a company’s culture, structure, and ways of working to support broad AI adoption. But at most businesses that aren’t born digital, traditional mindsets and ways of working run counter to those needed for AI.
To scale up AI, companies must make three shifts:

From siloed work to interdisciplinary collaboration.
AI has the biggest impact when it's developed by cross-functional teams with a mix of skills and perspectives. Having business and operational people work side by side with analytics experts will ensure that initiatives address broad organizational priorities, not just isolated business issues. Diverse teams can also think through the operational changes new applications may require—they're likelier to recognize, say, that the introduction of an algorithm that predicts maintenance needs should be accompanied by an overhaul of maintenance workflows. And when development teams involve end users in the design of applications, the chances of adoption increase dramatically.

From experience-based, leader-driven decision making to data-driven decision making at the front line.
When AI is adopted broadly, employees up and down the hierarchy will augment their own judgment and intuition with algorithms’ recommendations to arrive at better answers than either humans or machines could reach on their own. But for this approach to work, people at all levels have to trust the algorithms’ suggestions and feel empowered to make decisions—and that means abandoning the traditional top-down approach. If employees have to consult a higher-up before taking action, that will inhibit the use of AI.
Decision processes shifted dramatically at one organization when it replaced a complex manual method for scheduling events with a new AI system. Historically, the firm's event planners had used colored tags, pins, and stickers to track conflicts, participants' preferences, and other considerations. They'd often relied on gut instinct and on input from senior managers, who also were operating on their instincts, to make decisions. The new system rapidly analyzed the vast range of scheduling permutations, using first one algorithm to distill hundreds of millions of options into millions of scenarios, and then another algorithm to boil down those millions into just hundreds, ranking the optimal schedules for each participant. Experienced human planners then applied their expertise to make final decisions supported by the data, without the need to get input from their leaders. The planners adopted the tool readily, trusting its output because they'd helped set its parameters and constraints and knew that they themselves would make the final call.
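To see that two-stage pattern in miniature, here is an illustrative Python sketch. The events, conflict rules, and preference weights are our own assumptions rather than the firm's actual system, but the structure is the same: enumerate candidate schedules, discard those that violate hard constraints, then score and rank the survivors.

```python
# Illustrative filter-then-rank scheduling, loosely mirroring the
# two-algorithm pipeline described above. All data here is hypothetical.
from itertools import permutations

def hard_constraints_ok(schedule, conflicts):
    """Stage 1: reject any schedule containing a forbidden adjacent pair."""
    return all((a, b) not in conflicts for a, b in zip(schedule, schedule[1:]))

def preference_score(schedule, preferences):
    """Stage 2: reward preferred events placed in earlier slots."""
    n = len(schedule)
    return sum((n - i) * preferences.get(ev, 0) for i, ev in enumerate(schedule))

def top_schedules(events, conflicts, preferences, k=3):
    candidates = (s for s in permutations(events)
                  if hard_constraints_ok(s, conflicts))   # many options -> fewer
    return sorted(candidates, key=lambda s: preference_score(s, preferences),
                  reverse=True)[:k]                       # fewer -> ranked top k

events = ["keynote", "workshop", "panel", "demo"]
conflicts = {("demo", "keynote")}   # hypothetical hard constraint
preferences = {"keynote": 3, "panel": 2, "workshop": 1, "demo": 1}
print(top_schedules(events, conflicts, preferences))
```

At production scale the first stage would use far smarter pruning than brute-force enumeration, but the planners' role is unchanged: they review the ranked shortlist and make the final call.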
From rigid and risk-averse to agile, experimental, and adaptable.

Organizations must shed the mindset that an idea needs to be fully baked or a business tool must have every bell and whistle before it's deployed. On the first iteration, AI applications rarely have all their desired functionality. A test-and-learn mentality will reframe mistakes as a source of discoveries, reducing the fear of failure. Getting early user feedback and incorporating it into the next version will allow firms to correct minor issues before they become costly problems. Development will speed up, enabling small AI teams to create minimum viable products in a matter of weeks rather than months.

Such fundamental shifts don't come easily. They require leaders to prepare, motivate, and equip the workforce to make a change. But leaders must first be prepared themselves. We've seen failure after failure caused by the lack of a foundational understanding of AI among senior executives. (Further on, we'll discuss how analytics academies can help leaders acquire that understanding.)
Setting Up for Success
To get employees on board and smooth the way for successful AI launches, leaders should devote early attention to several tasks:

Explaining why.
A compelling story helps organizations understand the urgency of change initiatives and how all will benefit from them. This is particularly critical with AI projects, because fear that AI will take away jobs increases employees’ resistance to it.
Leaders have to provide a vision that rallies everyone around a common goal. Workers must understand why AI is important to the business and how they’ll fit into a new, AI-oriented culture. In particular, they need reassurance that AI will enhance rather than diminish or even eliminate their roles. (Our research shows that the majority of workers will need to adapt to using AI rather than be replaced by AI.)
When a large retail conglomerate wanted to get its employees behind its AI strategy, management presented it as an existential imperative. Leaders described the threat that digital retailers posed and how AI could help fend it off by improving the firm’s operational efficiency and responsiveness. By issuing a call to arms in a fight for survival, management underscored the critical role that employees had to play.
In sharing their vision, the company's leaders put a spotlight on workers who had piloted a new AI tool that helped them optimize stores' product assortments and increase revenue. That inspired other workers to imagine how AI could augment and elevate their performance.

Anticipating unique barriers to change.
Some obstacles, such as workers’ fear of becoming obsolete, are common across organizations. But a company’s culture may also have distinctive characteristics that contribute to resistance. For example, if a company has relationship managers who pride themselves on being attuned to customer needs, they may reject the notion that a machine could have better ideas about what customers want and ignore an AI tool’s tailored product recommendations. And managers in large organizations who believe their status is based on the number of people they oversee might object to the decentralized decision making or reduction in reports that AI could allow.
In other cases, siloed processes can inhibit the broad adoption of AI. Organizations that assign budgets by function or business unit may struggle to assemble interdisciplinary agile teams, for example.
Some solutions can be found by reviewing how past change initiatives overcame barriers. Others may involve aligning AI initiatives with the very cultural values that seem like obstacles. At one financial institution with a strong emphasis on relationship banking, for example, leaders highlighted AI’s ability to enhance ties with customers. The bank created a booklet for relationship managers that showed how combining their expertise and skills with AI’s tailored product recommendations could improve customers’ experiences and increase revenue and profit. The AI adoption program also included a contest for sales conversions driven by using the new tool; the winners’ achievements were showcased in the CEO’s monthly newsletter to employees.
A relatively new class of expert, analytics translators, can play a role in identifying roadblocks. These people bridge the data engineers and scientists from the technical realm with the people from the business realm—marketing, supply chain, manufacturing, risk personnel, and so on. Translators help ensure that the AI applications developed address business needs and that adoption goes smoothly. Early in the implementation process, they may survey end users, observe their habits, and study workflows to diagnose and fix problems.
Understanding the barriers to change can not only inform leaders about how to communicate with the workforce but also help them determine where to invest, what AI initiatives are most feasible, what training should be offered, what incentives may be necessary, and more.

Budgeting as much for integration and adoption as for technology (if not more).
In one of our surveys, nearly 90% of the companies that had engaged in successful scaling practices had spent more than half of their analytics budgets on activities that drove adoption, such as workflow redesign, communication, and training. Only 23% of the remaining companies had committed similar resources.
Consider one telecom provider that was launching a new AI-driven customer-retention program in its call center. The company invested simultaneously in AI model development and in helping the center's employees transition to the new approach. Instead of just reacting to calls from customers canceling service, employees would proactively reach out to customers at risk of defection, giving them AI-generated recommendations on new offers they'd be likely to accept. The employees got training and on-the-job coaching in the sales skills needed to close the business. Coaches and managers listened in on their calls, gave them individualized feedback, and continually updated the training materials and call scripts. Thanks to those coordinated efforts, the new program reduced customer attrition by 10%.
Balancing feasibility, time investment, and value.
Pursuing initiatives that are unduly difficult to implement or require more than a year to launch can sabotage both current and future AI projects.
Organizations needn’t focus solely on quick wins; they should develop a portfolio of initiatives with different time horizons. Automated processes that don’t need human intervention, such as AI-assisted fraud detection, can deliver a return in months, while projects that require human involvement, such as AI-supported customer service, are likely to pay off over a longer period. Prioritization should be based on a long-term (typically three-year) view and take into consideration how several initiatives with different time lines could be combined to maximize value. For example, to achieve a view of customers detailed enough to allow AI to do microsegmentation, a company might need to set up a number of sales and marketing initiatives. Some, such as targeted offers, might deliver value in a few months, while it might take 12 to 18 months for the entire suite of capabilities to achieve full impact.
An Asia-Pacific retailer determined that an AI initiative to optimize floor space and inventory placement wouldn't yield its complete value unless the company refurbished all its stores, reallocating the space for each category of goods. After much debate, the firm's executives decided the project was important enough to future profitability to proceed—but not without splitting it in two. Part one produced an AI tool that gave store managers recommendations for a few incremental items that would sell well in their outlets. The tool provided only a small fraction of the total return anticipated, but the managers could get the new items into stores immediately, demonstrating the project's benefits and building enthusiasm for the multiyear journey ahead.

Organizing for Scale
There’s a lot of debate about where AI and analytics capabilities should reside within organizations. Often leaders simply ask, “What organizational model works best?” and then, after hearing what succeeded at other companies, do one of three things: consolidate the majority of AI and analytics capabilities within a central “hub”; decentralize them and embed them mostly in the business units (“the spokes”); or distribute them across both, using a hybrid (“hub-and-spoke”) model. We’ve found that none of these models is always better than the others at getting AI up to scale; the right choice depends on a firm’s individual situation.
Consider two large financial institutions we've worked with. One consolidated its AI and analytics teams in a central hub, with all analytics staff reporting to the chief data and analytics officer and being deployed to business units as needed. The second decentralized nearly all its analytics talent, having teams reside in and report to the business units. Both firms developed AI on a scale at the top of their industry; the second organization grew from 30 to 200 profitable AI initiatives in just two years. And both selected their model after taking into account their organizations' structure, capabilities, strategy, and unique characteristics.

The hub.
A small handful of responsibilities are always best handled by a hub and led by the chief analytics or chief data officer. These include data governance, AI recruiting and training strategy, and work with third-party providers of data and AI services and software. Hubs should nurture AI talent, create communities where AI experts can share best practices, and lay out processes for AI development across the organization. Our research shows that companies that have implemented AI on a large scale are three times as likely as their peers to have a hub and 2.5 times as likely to have a clear methodology for creating models, interpreting insights, and deploying new AI capabilities.
Hubs should also be responsible for systems and standards related to AI. These should be driven by the needs of a firm's initiatives, which means they should be developed gradually rather than set up in one fell swoop before business cases have been determined. We've seen many organizations squander significant time and money—spending hundreds of millions of dollars—up front on companywide data-cleaning and data-integration projects, only to abort those efforts midway, realizing little or no benefits.
In contrast, when a European bank found that conflicting data-management strategies were hindering its development of new AI tools, it took a slower approach, making a plan to unify its data architecture and management over the next four years as it built various business cases for its AI transformation. This multiphase program, which also includes an organizational redesign and a revised talent strategy, is expected to have an annual impact of more than $900 million.

The spokes.
Another handful of responsibilities should almost always be owned by the spokes, because they’re closest to those who will be using the AI systems. Among them are tasks related to adoption, including end-user training, workflow redesign, incentive programs, performance management, and impact tracking.
To encourage customers to embrace the AI-enabled services offered with its smart, connected equipment, one manufacturer’s sales and service organization created a “SWAT team” that supported customers using the product and developed a pricing plan to boost adoption. Such work is clearly the bailiwick of a spoke and can’t be delegated to an analytics hub.
Organizing AI for Scale
AI-enabled companies divide key roles between a hub and spokes. A few tasks are always owned by the hub, and the spokes always own execution. The rest of the work falls into a gray area, and a firm’s individual characteristics determine where it should be done.
The gray area.
Much of the work in successful AI transformations falls into a gray area in terms of responsibility. Key tasks—setting the direction for AI projects, analyzing the problems they’ll solve, building the algorithms, designing the tools, testing them with end users, managing the change, and creating the supporting IT infrastructure—can be owned by either the hub or the spoke, shared by both, or shared with IT. Deciding where responsibility should lie within an organization is not an exact science, but it should be influenced by three factors:
The maturity of AI capabilities. When a company is early in its AI journey, it often makes sense for analytics executives, data scientists, data engineers, user interface designers, visualization specialists who graphically interpret analytics findings, and the like to sit within a hub and be deployed as needed to the spokes. Working together, these players can establish the company’s core AI assets and capabilities, such as common analytics tools, data processes, and delivery methodologies. But as time passes and processes become standardized, these experts can reside within the spokes just as (or more) effectively.
Business model complexity. The greater the number of business functions, lines of business, or geographies AI tools will support, the greater the need to build guilds of AI experts (of, say, data scientists or designers). Companies with complex businesses often consolidate these guilds in the hub and then assign them out as needed to business units, functions, or geographies.
The pace and level of technical innovation required. When they need to innovate rapidly, some companies put more gray-area strategy and capability building in the hub, so they can monitor industry and technology changes better and quickly deploy AI resources to head off competitive challenges.
Let’s return to the two financial institutions we discussed earlier. Both faced competitive pressures that required rapid innovation. However, their analytics maturity and business complexity differed.
The institution that placed its analytics teams within its hub had a much more complex business model and relatively low AI maturity. Its existing AI expertise was primarily in risk management. By concentrating its data scientists, engineers, and many other gray-area experts within the hub, the company ensured that all business units and functions could rapidly access essential know-how when needed.
The second financial institution had a much simpler business model that involved specializing in fewer financial services. This bank also had substantial AI experience and expertise. So it was able to decentralize its AI talent, embedding many of its gray-area analytics, strategy, and technology experts within the business-unit spokes.
As these examples suggest, some art is involved in deciding where responsibilities should live. Every organization has distinctive capabilities and competitive pressures, and the three key factors must be considered in totality, rather than individually. For example, an organization might have high business complexity and need very rapid innovation (suggesting it should shift more responsibilities to the hub) but also have very mature AI capabilities (suggesting it should move them to the spokes). Its leaders would have to weigh the relative importance of all three factors to determine where, on balance, talent would most effectively be deployed. Talent levels (an element of AI maturity) often have an outsize influence on the decision. Does the organization have enough data experts that, if it moved them permanently to the spokes, it could still fill the needs of all business units, functions, and geographies? If not, it would probably be better to house them in the hub and share them throughout the organization.

Oversight and execution.
While the distribution of AI and analytics responsibilities varies from one organization to the next, those that scale up AI have two things in common:
A governing coalition of business, IT, and analytics leaders. Fully integrating AI is a long journey. Creating a joint task force to oversee it will ensure that the three functions collaborate and share accountability, regardless of how roles and responsibilities are divided. This group, which is often convened by the chief analytics officer, can also be instrumental in building momentum for AI initiatives, especially early on.
Assignment-based execution teams. Organizations that scale up AI are twice as likely to set up interdisciplinary teams within the spokes. Such teams bring a diversity of perspectives together and solicit input from frontline staff as they build, deploy, and monitor new AI capabilities. The teams are usually assembled at the outset of each initiative and draw skills from both the hub and the spokes. Each generally includes the manager in charge of the new AI tool’s success (the “product owner”), translators, data architects, engineers and scientists, designers, visualization specialists, and business analysts. These teams address implementation issues early and extract value faster.
For example, at the Asia-Pacific retailer that was using AI to optimize store space and inventory placement, an interdisciplinary execution team helped break down walls between merchandisers (who determined how items would be displayed in stores) and buyers (who chose the range of products). Previously, each group had worked independently, with the buyers altering the AI recommendations as they saw fit. That led to a mismatch between inventory purchased and space available. By inviting both groups to collaborate on the further development of the AI tool, the team created a more effective model that provided a range of weighted options to the buyers, who could then choose the best ones with input from the merchandisers. At the end of the process, gross margins on each product category that had applied the tool increased by 4% to 7%.

Educating Everyone
To ensure the adoption of AI, companies need to educate everyone, from the top leaders down. To this end some are launching internal AI academies, which typically incorporate classroom work (online or in person), workshops, on-the-job training, and even site visits to experienced industry peers. Most academies initially hire external faculty to write the curricula and deliver training, but they also usually put in place processes to build in-house capabilities.
Every academy is different, but most offer four broad types of instruction:

Leadership.
Most academies strive to give senior executives and business-unit leaders a high-level understanding of how AI works and ways to identify and prioritize AI opportunities. They also provide discussions of the impact on workers' roles, barriers to adoption, and talent development, and offer guidance on instilling the underlying cultural changes required.

Analytics.
Here the focus is on constantly sharpening the hard and soft skills of data scientists, engineers, architects, and other employees who are responsible for data analytics, data governance, and building the AI solutions.

Translator.
Analytics translators often come from the business staff and need fundamental technical training—for instance, in how to apply analytical approaches to business problems and develop AI use cases. Their instruction may include online tutorials, hands-on experience shadowing veteran translators, and a final "exam" in which they must successfully implement an AI initiative.

10 Ways to Derail an AI Program

Despite big investments, many organizations get disappointing results from their AI and analytics efforts. What makes programs go off track? Companies set themselves up to fail when:
1. They lack a clear understanding of advanced analytics, staffing up with data scientists, engineers, and other key players without realizing how advanced and traditional analytics differ.
2. They don't assess feasibility, business value, and time horizons, and launch pilots without thinking through how to balance short-term wins in the first year with longer-term payoffs.
3. They have no strategy beyond a few use cases, tackling AI in an ad hoc way without considering the big-picture opportunities and threats AI presents in their industry.
4. They don't clearly define key roles, because they don't understand the tapestry of skill sets and tasks that a strong AI program requires.
5. They lack "translators," or experts who can bridge the business and analytics realms by identifying high-value use cases, communicating business needs to tech experts, and generating buy-in with business users.
6. They isolate analytics from the business, rigidly centralizing it or locking it in poorly coordinated silos, rather than organizing it in ways that allow analytics and business experts to work closely together.
7. They squander time and money on enterprisewide data cleaning instead of aligning data consolidation and cleanup with their most valuable use cases.
8. They fully build out analytics platforms before identifying business cases, setting up architectures like data lakes without knowing what they'll be needed for and often integrating platforms with legacy systems unnecessarily.
9. They neglect to quantify analytics' bottom-line impact, lacking a performance management framework with clear metrics for tracking each initiative.
10. They fail to focus on ethical, social, and regulatory implications, leaving themselves vulnerable to potential missteps when it comes to data acquisition and use, algorithmic bias, and other risks, and exposing themselves to social and legal consequences.
For more details, read "Ten Red Flags Signaling Your Analytics Program Will Fail" on McKinsey.com.

End user.
Frontline workers may need only a general introduction to new AI tools, followed by on-the-job training and coaching in how to use them. Strategic decision makers, such as marketers and finance staff, may require higher-level training sessions that incorporate real business scenarios in which new tools improve decisions about, say, product launches.

Reinforcing the Change
Most AI transformations take 18 to 36 months to complete, with some taking as long as five years. To prevent them from losing momentum, leaders need to do four things:

Walk the talk.
Role modeling is essential. For starters, leaders can demonstrate their commitment to AI by attending academy training.
But they also must actively encourage new ways of working. AI requires experimentation, and often early iterations don’t work out as planned. When that happens, leaders should highlight what was learned from the pilots. That will help encourage appropriate risk taking.
The most effective role models we’ve seen are humble. They ask questions and reinforce the value of diverse perspectives. They regularly meet with staff to discuss the data, asking questions such as “How often are we right?” and “What data do we have to support today’s decision?”
The CEO of one specialty retailer we know is a good example. At every meeting she goes to, she invites attendees to share their experience and opinions—and offers hers last. She also makes time to meet with business and analytics employees every few weeks to see what they've done—whether it's launching a new pilot or scaling up an existing one.

Make businesses accountable.
It’s not uncommon to see analytics staff made the owners of AI products. However, because analytics are simply a means of solving business problems, it’s the business units that must lead projects and be responsible for their success. Ownership ought to be assigned to someone from the relevant business, who should map out roles and guide a project from start to finish. Sometimes organizations assign different owners at different points in the development life cycle (for instance, for proof of value, deployment, and scaling). That’s a mistake too, because it can result in loose ends or missed opportunities.
A scorecard that captures project performance metrics for all stakeholders is an excellent way to align the goals of analytics and business teams. One airline company, for instance, used a shared scorecard to measure rate of adoption, speed to full capability, and business outcomes for an AI solution that optimized pricing and booking.
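To make that concrete, here is a minimal sketch of what such a shared scorecard might look like in code. The field names, thresholds, and figures are illustrative assumptions, not the airline's actual framework; only the three metric categories come from the example above.

```python
# Hypothetical shared AI-project scorecard covering the three metric
# families named above: adoption, speed to capability, business outcomes.
from dataclasses import dataclass

@dataclass
class AIProjectScorecard:
    project: str
    adoption_rate: float           # share of target users actively using the tool
    weeks_to_full_capability: int  # speed from pilot to full rollout
    business_outcome_usd: float    # e.g., incremental revenue from optimized pricing

    def on_track(self, min_adoption=0.7, max_weeks=26):
        """One set of thresholds that business and analytics teams share."""
        return (self.adoption_rate >= min_adoption
                and self.weeks_to_full_capability <= max_weeks)

card = AIProjectScorecard("pricing-and-booking", adoption_rate=0.82,
                          weeks_to_full_capability=20, business_outcome_usd=4.1e6)
print(card.on_track())  # True -> both teams get credit against the same goals
```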
Track and facilitate adoption.

Comparing the results of decisions made with and without AI can encourage employees to use it. For example, at one commodity company, traders learned that their non-AI-supported forecasts were typically right only half the time—no better than guessing. That discovery made them more open to AI tools for improved forecasting.
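A with-and-without comparison like that one is simple to run. The sketch below uses invented directional calls and outcomes, not the commodity firm's data, to show the basic arithmetic of comparing hit rates side by side:

```python
# Hypothetical comparison of human vs. AI-assisted directional forecasts.

def hit_rate(predictions, outcomes):
    """Fraction of up/down calls that matched what actually happened."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)

outcomes     = ["up", "down", "up", "up",   "down", "up",   "down", "down"]
trader_calls = ["up", "up",   "down", "up", "up",   "down", "down", "down"]
model_calls  = ["up", "down", "up", "down", "down", "up",   "down", "up"]

print(f"Trader hit rate: {hit_rate(trader_calls, outcomes):.0%}")  # 50% -- a coin flip
print(f"Model hit rate:  {hit_rate(model_calls, outcomes):.0%}")   # 75%
```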
Teams that monitor implementation can correct course as needed. At one North American retailer, an AI project owner saw store managers struggling to incorporate a pilot's output into their tracking of store performance results. The AI's user interface was difficult to navigate, and the AI insights generated weren't integrated into the dashboards the managers relied on every day to make decisions. To fix the issue, the AI team simplified the interface and reconfigured the output so that the new data stream appeared in the dashboard.

Provide incentives for change.
Acknowledgment inspires employees for the long haul. The CEO of the specialty retailer starts meetings by shining a spotlight on an employee (such as a product manager, a data scientist, or a frontline worker) who has helped make the company’s AI program a success. At the large retail conglomerate, the CEO created new roles for top performers who participated in the AI transformation. For instance, he promoted the category manager who helped test the optimization solution during its pilot to lead its rollout across stores—visibly demonstrating the career impact that embracing AI could have.
Finally, firms have to check that employees' incentives are truly aligned with AI use. This was not the case at a brick-and-mortar retailer that had developed an AI model to optimize discount pricing so that it could clear out old stock. The model revealed that sometimes it was more profitable to dispose of old stock than to sell it at a discount, but the store personnel had incentives to sell everything, even at steep discounts. Because the AI recommendations contradicted their standard, rewarded practice, employees became suspicious of the tool and ignored it. Since their sales incentives were also closely tied to contracts and couldn't easily be changed, the organization ultimately updated the AI model to recognize the trade-off between profits and the incentives, which helped drive user adoption and lifted the bottom line.

Conclusion
The actions that promote scale in AI create a virtuous circle. The move from functional to interdisciplinary teams initially brings together the diverse skills and perspectives and the user input needed to build effective tools. In time, workers across the organization absorb new collaborative practices. As they work more closely with colleagues in other functions and geographies, employees begin to think bigger—they move from trying to solve discrete problems to completely reimagining business and operating models. The speed of innovation picks up as the rest of the organization begins to adopt the test-and-learn approaches that successfully propelled the pilots.
As AI tools spread throughout the organization, those closest to the action become increasingly able to make decisions once made by those above them, flattening organizational hierarchies. That encourages further collaboration and even bigger thinking.
The ways AI can be used to augment decision making keep expanding. New applications will create fundamental and sometimes difficult changes in workflows, roles, and culture, which leaders will need to shepherd their organizations through carefully. Companies that excel at implementing AI throughout the organization will find themselves at a great advantage in a world where humans and machines working together outperform either humans or machines working on their own.
For a country with 17,508 islands, it’s quite surprising to think
that only three other countries in the world have more people than
Indonesia does. It’s a country with a rich history, a growing
population, and four unicorns grazing its lush island pastures. That’s
right, of the world’s 311 unicorns (per CB Insights as of this writing),
four of them have roots in Indonesia. Late last year, we sent one of
our MBAs over to spend a few weeks in the capital city, Jakarta, looking
at the Indonesian tech scene. That research spawned an article on GO-JEK, a simply fascinating company. It also led to lots of research around how Indonesia is competing in the global artificial intelligence (AI) race.
At
first, we were thinking about naming this article “all the AI startups
we could find in Indonesia,” but then we’d get dozens of emails for the
rest of the year about all the hidden gems we “missed.” Instead, we sat
down and did some Crunchbase searches, combed through company websites,
did some asking around, talked to some of the local startup founders,
and as a result, we have below what is our best estimation of the top AI
startups in Indonesia today. If you are one of the below startups, feel
free to celebrate your acceptance to this top-11 list by emailing this
article to every single person you know.
Name | Application | City | Funding (USD millions)
Snapcart | Smart receipts | Jakarta | 14.7
Kata.ai | Conversational AI | Jakarta | 3.5
BJtech | Conversational AI | Jakarta | 1.2
Sonar Platform | Social media monitoring | Jakarta | 0.15
Nodeflux | Computing platform | Jakarta | N/A
Bahasa.ai | Conversational AI | Jakarta | N/A
Prosa.ai | Conversational AI | Jakarta | N/A
Dattabot | General big data | Jakarta | N/A
Eureka.ai | Telecom big data | Jakarta | N/A
AiSensum | Robotic process automation | Jakarta | N/A
Deligence.ai | General AI | Jakarta | N/A
The
above startups should be proud of what they’ve accomplished because
each of them stood out, in some way, against the total number of
companies our foreign correspondent pored over while relaxing in some of
North Jakarta’s finest health spas. Let’s take a closer look at each of
these startups.
If the name Snapcart rings a bell, it could be because you read about them in our article last month on Smart Receipts and Why We Should Use Them. Founded in 2015, Indonesian startup Snapcart has taken in $14.7 million in
funding so far to create a mobile application that gives shoppers
cashback for scanning their receipts. This allows the company to collect
massive amounts of purchase data, then analyze it and offer real-time
insights to big names like Johnson & Johnson, Unilever, P&G, and
Nestle. Snapcart currently operates in Indonesia, the Philippines,
Singapore, and Brazil. (Sounds like something GO-JEK might be interested in getting their hands on.)
With high retention and engagement rates, Snapcart is also able to
send targeted surveys to customers asking them relevant questions at the
right time.
A survey on face wash – Source: Snapcart
The system can also capture transactions from independent chains that existing solutions miss, and to date they've processed over half a billion receipts.

Founded in 2015, Jakarta startup Kata.ai has taken in $3.5 million to build Indonesia's number one conversational AI
platform. A case study they published talks about the success Unilever
had when deploying a chatbot to engage with customers. The female
chatbot persona was named Jemma, and was deployed on Line messenger, one
of Indonesia’s most popular messaging apps. Less than a year after its
deployment, Jemma managed to acquire 1.5 million friends, with more than
50 million incoming messages in 17 million sessions. “Some of them even
tried to confide their dreams and problems to her,” said the case
study, and the longest conversation recorded exceeded four hours.
Another
case study discusses a chatbot deployment by Telkomsel, Indonesia’s
largest cellular operator with more than 120 million subscribers (that’s almost half of Indonesia’s population).
Turns out 96% of customer inquiries can actually be handled by the
chatbot with minimal human interaction. In order to scale more quickly,
the company built a very slick platform that makes it easy for anyone to
build a bot.
A tool for building chatbots – Source: Kata.ai
We
talked with Kata.ai’s CEO and Co-Founder, Irzan Raditya, about why
conversational AI is so popular in Indonesia. He said it’s largely
because the big tech players are behind the game when it comes to Natural Language Processing (NLP) for Bahasa Indonesia (the language spoken in most of Indonesia).
It’s not an easy task when you’re trying to understand a language that
has 13 different ways to say “I.” When companies like Accenture partner
up with a “small” firm like Kata.ai to bid on projects, it helps
demonstrate that they’re best-of-breed. Moving
on to our second conversational AI startup that speaks Bahasa
Indonesia, we have BJtech. Founded in 2015, the company has taken in $1.5 million
in funding so far to develop an easy-to-use platform that helps you
create chatbots for your business. Their first products are a virtual friend that does things for you and expects nothing in return, and an intelligent banking app. Clients include Uber, Skyscanner, and Zomato,
though we have no idea what Uber is doing speaking the Indonesian
language after GO-JEK showed them the door. There’s a fair amount of Engrish on their website, so they may want to sort that out because that’s not the best look for a language processing company.
Founded in 2015, Sonar Platform has taken in just $150,000 in funding to develop a social media monitoring platform that – you guessed it – speaks Bahasa Indonesia. As an example, Unilever
Indonesia certainly doesn’t want some loudmouth influencer bad-mouthing
their latest skin-whitening product, and in order to see what people
are saying about their products, they might use a platform like this
one. The platform allows you to monitor social media in real-time, and
they process over 1 million conversations a day, all of which can be
mined later for insights. Their platform can gauge sentiment as well,
and Air Asia uses it to monitor how pissed off people get when their
flights are delayed. Moving away from the Bahasa Indonesia theme for a moment, we have a startup called Nodeflux that was founded in 2016 with an undisclosed
amount of funding which they’re using to develop Indonesia’s first
intelligent video analytics platform. Backed by Telkom Indonesia,
they’ve also partnered with NVIDIA to offer video analytics services to
companies like GO-JEK
which uses their service to monitor CCTV cameras on the streets of Jakarta and track where GO-JEK's fleet of more than a million scooters is at any given time.
They also offer services like facial recognition, license plate reading, flood monitoring, and trash detection.

And
we’re back, on to more conversational AI for Bahasa Indonesia with the
aptly named Bahasa.ai, a startup that was founded in 2017 and which has
taken in an undisclosed amount of funding to “build the most robust NLP
modules for Bahasa Indonesia.” Based on the AI research focus we
observed at Kata.ai, they have their work cut out for them. Since our
own Bahasa skills are lacking, and they haven’t translated their website
(can’t they get some of their algos to do it?), that’s about
all we can tell you about Bahasa.ai. Oh, and one of their competitors
vouched for their capabilities which was awful nice of them. In other
words, they’re not just a company that creates chatbot scripts and says
they use AI when they actually don’t. (We’re told there are some of those out there in Jakarta but we’re not naming names.)

Our next company we know little about because they’re so new. Prosa.ai was founded in 2018 by Indonesian experts in AI for NLP in text and speech. They already have subscription pricing on their website, so
we can only assume that they have developed a product. We saw that
they’re backed by a notable Indonesian venture capitalist, so we can
also assume that someone vetted their business model against the
plethora of NLP startups that are already tackling this problem.

Founded in 2003, Indonesian startup Dattabot – formerly known as Mediatrac – is a big data analytics company with an undisclosed
amount of funding that has assembled the most comprehensive data
library in Indonesia. We sat down with the founders, Regi Wahyu and
Imron Zuhri, who told us how they started out scanning Indonesia’s dark
world of data, largely offline and in printed form. In 2010, they began
scaling their data offering and in 2015, pivoted to become the company
they are today that targets a number of industry verticals.
Dattabot’s core technology – Source: Dattabot
Their
first project involved a large FMCG company with three disparate databases and no desire to spend money on building a data warehouse. Dattabot
used some clever AI algorithms to solve that problem, and revenues
soared as they optimized various aspects of the operation like the “traveling salesman problem”
we discussed before. Then came one of Indonesia’s largest telecom
providers with a big problem. More than 90% of accounts were prepaid.
How can you know the customer? Dattabot used AI to solve that problem
too. That’s when they realized that an even bigger opportunity could be
found in Indonesian farming, an industry that consists of 49 million
farmers that represent 41% of the country’s total labor force. Their
subsidiary Hara.ag was then born, and the story behind it is so interesting we’re going to dedicate an entire article to it. Stay tuned.

We actually don’t know when our next company, Eureka.ai, was founded, or how much funding they’ve taken in, but we do know their PR company is asleep at the wheel because they never responded to our email asking for more info. That’s okay though, because when you’re busy kicking ass and taking names, who needs PR anyway?
The man at the helm is Benjamin Soemartopo, previously with McKinsey
& Company for 12 years as Managing Partner and CEO for Indonesia and
before that, Managing Director for Standard Chartered’s Bank Private
Equity in Indonesia for six years. The company enables partnerships
between mobile operators and companies in industries including banking,
insurance, transportation, and consumer goods, and has a global presence. That’s the who/what/where, and about all we can tell you for now.

Our second-to-last startup, AiSensum, was somewhat difficult to understand until
the company emailed us to clear things up. Their main source of revenue
is data monetization partnerships through their platform called Octopi,
a machine-learning-driven SaaS dashboard that creates business
intelligence insights. The firm also offers Robotic Process Automation (RPA)
that they describe as “low cost bets for companies who are unwilling or
unable to invest in fully automated AI platforms.” They also let us
know that they didn’t appreciate us making fun of their octopus,
something we blamed on our ethnocentric tendencies to make fun of things
we don’t understand – like this diagram.
If what you do is tough to explain, try using a cephalopod to make things clearer – Source: AiSensum
Joking
aside, they’re enthusiastic about what they’re doing so we may go visit
them when we’re back in Jakarta. They also have a sister company called
Neurosensum which uses AI for consumer research and which may have some
toys we can play with.
Last
but not least is a startup called Deligence.ai. We know almost nothing
about them because they’ve been so busy doing AI stuff that they haven’t
even created a profile on Crunchbase. The only reason they made this
top-11 list is because a founder we talked to vouched for them. (See how important networking is, kids?)
According to the website, they provide “organizations the most optimal
access to the cutting-edge computer vision, machine learning, and big
data technology.” We’ve also reached our word limit on this article so
time for a conclusion.
Conclusion
Forgetting about AI
for a minute, we were simply floored by the opportunity that we saw in
the world’s fourth-most-populous country, the talented and passionate people
we spoke to who could see the opportunity, the astounding success of startups like GO-JEK, and conversely, how isolated and relatively untapped the tech scene seemed. (We’re
trying desperately to find emerging technology startups of any kind in
the country’s second largest city, Surabaya, and have come up empty
handed so far.) In the future, we’re going to take a closer look at
what sort of investment opportunities might exist for retail investors
in Indonesia – largely in the area of ETFs – and also deep-dive into the
fascinating world of Indonesia’s “big” data problem and how it’s being
solved.
Changing company culture is the key—and often the biggest challenge—to scaling artificial intelligence across your organization.
It’s an exciting time for leaders.
Artificial intelligence (AI) capabilities are on the cusp of
revolutionizing the way we work, reshaping businesses, industries,
economies, the labor force, and our everyday lives. We estimate AI-powered applications will add $13 trillion in value to the global economy in the coming decade, and leaders are energizing their agendas and investing handsomely in AI to capitalize on the opportunity—to the tune of $26 billion to $39 billion in 2016 alone.
Meanwhile,
AI enablers such as data generation, storage capacity, computer
processing power, and modeling techniques are all on exponential
upswings and becoming increasingly affordable and accessible via the
cloud.
Conditions seem ripe for companies to succeed with AI. Yet, the
reality is that many organizations’ efforts are falling short, with a majority of companies only piloting AI or using it in a single business process—and thus gaining only incremental benefits.
Why the disappointing results?
Many organizations aren’t spending the necessary (and significant)
time and resources on the cultural and organizational changes required
to bring AI to a level of scale capable of delivering meaningful
value—where every pilot enjoys widespread end-user adoption and pilots
across the organization are produced in a consistent, fast, and
repeatable manner. Without addressing these changes up front, efforts to
scale AI can quickly derail.
Making the shift
To scale up AI, companies must make three shifts. First, they must transition from siloed work to interdisciplinary collaboration,
where business, operational, and analytics experts work side by side,
bringing a diversity of perspectives to ensure initiatives address broad
organizational priorities and to surface user needs and necessary
operational changes early on.
Second, they must switch from experience-based, leader-driven decision making to data-driven decision making,
where employees augment their judgment and intuition with algorithms’
recommendations to arrive at better answers than either humans or
machines could reach on their own.
Finally, they must move from rigid and risk-averse to agile, experimental, and adaptable, embracing the test-and-learn mentality that’s critical for creating a minimum viable product in weeks rather than months.
Such fundamental shifts don’t come easily. In our recent article, “Building the AI-powered organization,” published in Harvard Business Review,
we discuss in depth how leaders can prepare, motivate, and equip their
workforce to make a change. Here we summarize the four key areas in
which leaders should focus their efforts.
Set up for success
To get employees on board and smooth the way for successful AI
launches, leaders should devote early attention to several tasks,
including the following:
Explaining why AI is important and how workers will fit into a new AI-oriented culture.
Anticipating and addressing from the start their firm’s unique barriers to change.
Budgeting as much for AI integration and adoption as for technology (if not more). One of our surveys
revealed that 90 percent of the companies that engaged in critical
scaling practices spent more than half of their analytics budgets on
activities that drove adoption, such as workflow redesign,
communication, and training.
Balancing feasibility, time investment, and value to pursue a
portfolio of AI initiatives with different time horizons (typically over
three years) and combining complementary efforts with different timelines for maximum value.
Organize for scale
In our experience, AI-enabled companies have two things in common
when it comes to structuring roles and responsibilities—both in terms of
who “owns” the work and how the work is executed.
First, they divide key roles between a central analytics “hub”
(typically led by a chief analytics officer or chief data officer) and
“spokes” (business units, functions, or geographies). A few tasks—such
as data governance, managing AI systems and standards, and establishing
AI recruiting and training strategies—are always best owned by the hub.
And a handful of responsibilities, including end-user training, workflow
redesign, and impact tracking, are almost always best owned by the
spokes. The rest of the work—which includes, among other
responsibilities, setting the direction for AI projects; building,
designing, and testing the tools; and managing the change—falls in a
gray area and is assigned to either the hub or spokes based on each
firm’s AI maturity, business-model complexity, and pace of innovation.
(Generally speaking, the greater the AI maturity and more data experts
available, the more these responsibilities can be shifted to the spokes,
while higher complexity and a need to innovate rapidly may shift these
responsibilities to the hub.)
Second, when it comes to execution, they put in place a governing coalition
of business, IT, and analytics leaders that shares accountability for
AI initiatives and sets up interdisciplinary teams within the
spokes—drawing from talent in both the hub and spokes to build, deploy,
and monitor new AI capabilities.
Educate everyone
To ensure the adoption of AI, companies need to educate everyone,
from the top leaders down. To this end, some companies are launching
internal “analytics academies,” which provide leaders a foundational
understanding of AI, enable analytics experts to continue sharpening
their hard and soft skills, build translator expertise
to bridge technical and business requirements, and prepare both
frontline workers and strategic decision makers, such as marketers, to
use new AI tools in their daily work.
Reinforce the change
With most AI transformations taking 18 to 36 months to complete (and
some lasting up to five years), leaders must also take steps to keep the
momentum for AI going. Following are some of the best ways we’ve found
to do this:
Role modeling. For example, leaders can (and should)
attend analytics academies as well as actively encourage new agile ways
of working and appropriate risk taking by highlighting what was learned
from pilots.
Making the businesses accountable. A scorecard that
captures project-performance metrics for all stakeholders, for example,
is an excellent way to align the goals of analytics and business teams.
Tracking adoption so teams can correct course as needed.
Providing incentives for change, such as shining a spotlight on employees who have helped make the company’s AI program a success.
All this work (from the initial setup activities to the reinforcement
mechanisms) not only helps organizations get more value from AI in the
near term but also creates a virtuous cycle: the growth of
interdisciplinary teams, test-and-learn approaches, and data-driven
decision making that comes with the building and adoption of new AI
capabilities leads to more collaborative practices among employees,
flatter organizations, and greater agility. This provides fertile ground
for even greater innovation, enabling companies to thrive as AI
advancements barrel full speed ahead.
For a deeper look at how leaders can drive the cultural and organizational changes necessary for scaling AI, read “Building the AI-powered organization,” on hbr.org.

Tim Fountaine is a partner in McKinsey’s Sydney office and leads QuantumBlack, a McKinsey company, in Australia; Brian McCarthy is a partner in the Atlanta office and coleads the knowledge development agenda for McKinsey Analytics; and Tamim Saleh is a senior partner in the London office and heads McKinsey Analytics in Europe.
In our recent surveys AI Adoption in the Enterprise and Machine Learning Adoption in the Enterprise,
we found growing interest in AI technologies among companies across a
variety of industries and geographic locations. Our findings align with
other surveys and studies—in fact, a recent study by the World Intellectual Property Organization (WIPO)
found that the surge in research in AI and machine learning (ML) has
been accompanied by an even stronger growth in AI-related patent
applications. Patents are one sign that companies are beginning to take
these technologies very seriously.
When we asked
what held back their adoption of AI technologies, respondents cited a
few reasons, including some that pertained to culture, organization, and
skills:
[23%] Company culture does not yet recognize the need for AI
[18%] Lack of skilled people / difficulty hiring the required roles
[17%] Difficulties in identifying appropriate business use cases
Implementing and incorporating AI and machine learning technologies
will require retraining across an organization, not just technical
teams. Recall that the rise of big data and data science necessitated a
certain amount of retraining across an entire organization:
technologists and analysts needed to familiarize themselves with new
tools and architectures, but business experts and managers also needed
to reorient their workflows to adjust to data-driven processes and
data-intensive systems. AI and machine learning will require a similar
holistic approach to training. Here are a few reasons why:
As noted from our survey, identifying appropriate business use
cases remains an ongoing challenge. Domain experts and business owners
need to develop an understanding of these technologies in order to be
able to highlight areas where they are likely to make an impact within a
company.
Members of an organization will need to understand—even at a
high-level—the current state of AI and ML technologies so they know the
strengths and limitations of these new tools. For instance, in the case
of robotic process automation (RPA), it’s really the people closest to tasks (“bottom up”) who can best identify areas where it is most suitable.
AI and machine learning depend on data (usually labeled training
data for machine learning models), and in many instances, a certain
amount of domain knowledge will be needed to assemble high-quality data.
Machine learning and AI involve end-to-end pipelines, so
development/testing/integration will often cut across technical roles
and technical teams.
AI and machine learning applications and solutions often interact
with (or augment) users and domain experts, so UX/design remains
critical.
At our upcoming Artificial Intelligence conferences in San Jose and London,
we have assembled a roster of two-day training sessions, tutorials, and
presentations to help individuals (across job roles and functions)
sharpen their skills and understanding of AI and machine learning. We
return to San Jose with a two-day Business Summit
designed specifically for executives, business leaders, and
strategists. This Business Summit includes a popular two-day training—AI for Managers—and tutorials—Bringing AI into the enterprise and Design Thinking for AI—along
with 12 executive briefings designed to provide in-depth overviews of
important topics in AI. We are also debuting a new half-day tutorial
that will be taught by Ira Cohen (Product management in the Machine Learning era), which, given the growing importance of AI and ML, every manager should consider attending.
We will also have our usual strong slate of technical training,
tutorials, and talks. Here are some two-day training sessions and
tutorials that I am excited about:
Deep learning remains a new topic for many companies, and
organizations are interested in augmenting or replacing their existing
ML systems with this class of techniques. Neil Conway and Yoav Zimmerman
are teaching an important new half-day tutorial—Modern Deep Learning: Tools and Techniques—designed
to provide concrete takeaways and best practices for developers,
researchers, ML engineers, and technical managers. If your organization
is serious about using deep learning, this is a tutorial that you and
your colleagues should consider attending.
Reinforcement learning (RL) remains a popular topic at our AI conference. We have a new tutorial—ML problem-solving with a game engine—that will help participants get started using RL with the Unity engine. A team from RISE Lab will teach an updated tutorial on Ray, an open source distributed computing framework that includes a popular library for RL (RLlib). As I noted in a recent post, Ray continues to grow impressively along multiple fronts, including number of users, contributors, and libraries.
AI and ML are going to impact and permeate most aspects of a
company’s operations, products, and services. To succeed in implementing
and incorporating AI and machine learning technologies, companies need
to take a more holistic approach toward retraining their workforces.
This will be an ongoing endeavor as research results continue to be
translated into practical systems that companies can use. Individuals
will need to continue to learn new skills as technologies continue to
evolve and because many areas of AI and ML are increasingly becoming
democratized.
Included:
Learning Machine Learning from scratch, hardware options, finding
mentorship, who’s important to know in the field, freelancing as a
machine learning engineer, concepts that make you difficult to replace,
preparing for interviews, interviewing with big Silicon Valley tech
companies, adopting the best productivity habits, and a few other
things.
Credentials:
I graduated with a degree in molecular biology and worked in biotech
after college. Within a year of leaving that industry, I was working
with the Tensorflow team at Google on probabilistic programming tools. I
later joined a security startup as a machine learning engineer.
Disclaimer:
Much of this is based on my own experience, peppered with insights from
friends of mine who have been in similar boats. Your experience might
not be identical. The main value is giving you a roadmap of the space so
you can navigate it if you have no idea what you’re doing. If you have
your own methods for learning ML that are working better than the ones
listed here (like, if you’re literally in school learning about this
stuff), keep on using them.
In
a span of about one year, I went from quitting biomedical research
to becoming a paid Machine Learning Engineer, all without having a
degree in CS or Math. I’ve worked on side-projects that have been shared
with tens of thousands on Twitter, worked with startups in facial
recognition and distributed apps, sold a side-project, and even worked
with Google’s Tensorflow Team on new additions to Tensorflow. Again,
this was all without having a computer science degree.
This
post, while long, is a compilation of all the important concepts, tips,
and resources for getting into a machine learning career. From readers
who are not yet in college, to readers who have been out of college for a
while and are looking to make a switch, I’ve tried to distill the most
generally applicable points from my own journey that would be beneficial
to a wide array of people.
Enjoy.
Part 1: Introductions, Motivations, and Roadmap
Part 2: Skills of a (Marketable) Machine Learning Engineer
Part 3: Immersion and Finding Mentors
Part 4: Software and Hardware Resources
Part 5: Reading Research Papers (and a few that everyone should know)
Part 6: Groups and People you should be Familiar with
Part 7: Problem-Solving Approaches and Workflows
Part 8: Building your portfolio
Part 9: Freelancing as an ML developer
Part 10: Interviewing for Full-time Machine Learning Engineer Positions
Part 11: Career trajectory and future steps
Part 12: Habits for Improved Productivity & Learning
Part 1: Introductions, Motivations, and Roadmap
Introductions
If you’ve been following the news at all, chances are you’ve seen the headlines about how much demand there is for machine learning talent. In the recent LinkedIn Economic Graph
report, “Machine Learning Engineer” and “Data Scientist” were the two
fastest growing jobs of 2018 (9.8x and 6.5x growth, respectively).
Medium itself is rife with example projects, tutorials, reviews of software, and tales of interesting applications.
Despite the apparent demand, there seem to be few resources on actually
entering this field as an outsider, compared with the resources available
for other areas of software engineering. That’s why I’m writing this
mega-post: to serve as a condensed resource for the lessons of my journey
to becoming a Machine Learning Engineer from a non-CS background.
“But Matt”, you must be saying, “That’s not at all unusual, lots of people go into machine learning from other fields.”
It’s
true that many non-CS majors go into the field. However, I was not a
declared statistics, mathematics, physics, or electrical engineering
major in college. My background is in molecular biology, which some of
you may have noticed is frequently omitted from lists of examples of
STEM fields.
While
I was slightly more focused on statistics and programming during my
undergrad than most bio majors, this is still an unusual path compared
to a physicist entering the field (as this lovely post from Nathan Yau’s FlowingData illustrates).
Backstory
I
don’t think it’s wise to focus too much on narratives (outside of
preparing for interviews, which we will get to). There are many ways I
could spin a narrative for my first steps into the machine learning
field, both heroic and anti-heroic, so here’s one of the more common
ones I use:
Since
high school, I had an almost single-minded obsession with diseases of
aging. A lot of my introduction to machine learning was during my
undergraduate research in this area. This was in a lab that was fitting
discrete fruit fly death data to continuous equations like Gompertz and
Weibull distributions, as well as using image-tracking to measure the
amounts of physical activity of said fruit flies. Outside of this
research, I was working on projects like a Google Scholar scraper to
expedite the search for papers for literature reviews. Machine learning
seemed like just another useful tool at the time for applying to
biomedical research. Like everyone else, I eventually realized that this
was going to become much bigger, an integral technology of everyday
life in the coming decade. I knew I had to get serious about becoming as
skilled as I could in this area.
But why switch away from aging completely?
To answer that, I’d like to bring up a presentation I saw by Dr. David
Sinclair from Harvard Medical School. Before talking about
his lab’s exciting research developments, he described a common struggle
in the field of aging. Many labs are focused on narrow aspects of the
process, whether it be specific enzyme activity, nutrient signalling,
genetic changes, or any of the other countless areas. Dr. Sinclair
brought up the analogy of the blind men and the elephant, with respect
to many researchers looking at narrow aspects of aging, without spending
as much time recognizing how different the whole is from the part. I
felt like the reality was slightly different (that it was more like
sighted people trying to identify an elephant in the dark while using laser
pointers instead of flashlights), but the conclusion was still spot-on:
we need better tools and approaches to addressing problems like aging.
This,
along with several other factors, made me realize that using the
wet-lab approach to the biological sciences alone was incredibly
inefficient. Much of the low-hanging fruit in the search space of cures
and treatments was picked long ago. The challenges that remain
encompass diseases and conditions that might require troves of data to
even diagnose, let alone treat (e.g., genomically diverse cancers,
rapidly mutating viruses like HIV). Yes, I agree with many others that aging is definitely a disease, but it is also a nebulously defined one that affects people in wildly varying ways.
I
decided that if I was going to make a large contribution to this, or
any other field I decided to go into, the most productive approach would
be working on the tools for augmenting and automating data analysis. At
least for the near future, I had to focus on making sure my foundation
in Machine Learning was solid before I could return my focus to specific
cases like aging.
“So…what exactly is this long-a** post about again?”
There
are plenty of listicles and video tutorials for specific machine
learning techniques, but there isn’t quite the same level of
career-guide-style support like there is for web or mobile developers.
That’s why this is more than just compiling lists of resources I have
turned to for studying. I also tried to document the best practices I’ve
found for creating portfolio projects, finding both short-term and
long-term work in the field, and keeping up with the rapidly-changing
research landscape. I will also compile nuggets of wisdom from others I
have interviewed who are further along this path than I am.
The
level of technical ability you need to show is not lowered; if anything,
it’s higher when you don’t have the educational background. But it’s
totally possible.
Ultimately,
I want whoever reads this to get a detailed map of the space, so if
they decide to go down my path, they can get through the valley of the
Dunning-Kruger effect much more quickly.
With
that in mind, we’ll start with a rough overview of the skills needed to
master in order to become an (employable) machine learning engineer:
Part 2: Skills of a (Marketable) Machine Learning Engineer
Becoming
a machine learning engineer still isn’t quite as straightforward as
becoming a web or mobile engineer, as we discussed in the
previous section. This is despite all of the new programs geared toward
machine learning both inside and outside of traditional schools. If you
ask many people with the title of “Machine Learning Engineer” what they
do, you’ll often get wildly different answers.
The goal of this section is to help you put together the beginnings of a mental semantic tree (Khan Academy’s example of such a tree) for learning machine learning (à la Elon Musk’s now famous method).
Based on my own experiences, as well as reaching out to hundreds of
machine learning engineers in both academia and industry, here’s an
overview of the soft skills, basic technical skills, and more
specialized skills you’ll need.
Soft Skills
We
need to cover a few non-technical skills that you should keep in mind
before diving into the deep end. Yes, machine learning is mainly math
and computer science knowledge. However, you’ll most likely need to find
ways of applying this to solve real problems.
Learning new skills:
The field is rapidly changing. Every month, new neural network models
come out that outperform previous architectures. GPU manufacturers are in
an arms race. 2017 saw just about every major tech giant release their own machine learning frameworks.
There’s a lot to keep up with, but luckily the ability to quickly learn
things is something you can improve on (Growth mindsets for the win!).
Courses like Coursera’s Learning How to Learn are great for this. If you have dyslexia, ADD, or anything similar, the Speechify app
can offer a bit of a productivity boost (this is one app that I used a
bunch to make as much use of my time reading and re-reading papers).
Muad’Dib
learned rapidly because his first training was in how to learn. And the
first lesson of all was the basic trust that he could learn. It’s
shocking to find how many people do not believe they can learn, and how
many more believe learning to be difficult. Muad’Dib knew that every
experience carries its lesson.
Time-management: A lot of my friends have gone to elite schools like Brown, Harvard, and MIT. Out
of the ones that made it there and continued to succeed afterwards, it
seemed that skill in time management was a much bigger factor in their
success than any natural talent or innate intellect. The same pattern
will likely apply to you. When it comes to a cognitively-demanding
task like learning machine learning, RESIST THE URGE TO MULTI-TASK. Yes,
at some point you may need to run model-trainings in parallel if you
have the compute resources, but you should put your phone on airplane
mode when studying and avoid doing multiple tasks at the same time. I
cannot recommend highly enough Cal Newport’s book “Deep Work” (or his Study Hacks Blog). If you’re still in college or high school, Jessica Pointing’s Optimize Guide is also a great resource. I’ll go into more resources like this in the next post in this series.
Business/Domain knowledge:
The most successful machine learning projects out there are going to be
those that address real pain points. It will be up to you to make sure
your project is not the machine learning equivalent of Juicero.
In academia, the emphasis is more on the side of improving metrics of
algorithms. In industry, the focus is all about making those
improvements count towards solving customer or company problems. Beyond
taking classes in entrepreneurship while you’re in school, there are
plenty of classes online that can also help (Coursera has a pretty decent selection). If you want a more comprehensive overview, you can try the Smartly MBA.
Its creators impose an artificially low acceptance rate, but if you
get in, it’s free. At the very least, business or domain knowledge helps a lot with feature engineering (many of the top-ranking Kaggle teams often have at least one member whose role is to focus on feature engineering).
Communication:
You’ll need to explain ML concepts to people with little to no
expertise in the field. Chances are you’ll need to work with a team of
engineers, as well as many other teams. Oh, and you’ll need to get past
the dreaded interviews eventually. Communication is going to make all of
this much easier. If you’re still in school, I recommend taking at
least one course in rhetoric, acting, or speech. If you’re out of
school, I can personally attest to the usefulness of Toastmasters International.
Rapid Prototyping:
Iterating on ideas as quickly as possible is mandatory for finding one
that works. Throughout your learning process you should maximize the
amount of new, useful, and actionable information you are getting. In
machine learning, this applies to everything from picking the right
model, to working on projects such as A/B testing. I had the pleasure of
learning a lot about rapid prototyping from one of Tom Chi’s
prototyping workshops (he’s the former Head of Experience at GoogleX,
and he now has an online class version of his workshop). Udacity also has a great free class on rapid prototyping that I highly recommend.
Okay,
now that we’ve got the soft skills out of the way, let’s get to the
technical checklist you were most likely looking for when you first
clicked on this article.
The Basic Technical Skills
Python (at least intermediate level) — Python
is the lingua franca of Machine Learning. You may have had exposure to
Python even if you weren’t previously in a programming or CS-related
field (it’s commonly used across the STEM fields and is easy to
self-teach). However, it’s important to have a solid understanding of
classes and data structures (this will be the main focus of most coding
interviews). MITx’s Introduction to Computer Science
is a great place to start, or fill in any gaps. In addition to
intermediate Python, I also recommend familiarizing yourself with
libraries like Scikit-learn, Tensorflow (or Keras if you’re a beginner), and PyTorch, as well as how to use Jupyter notebooks.
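If you want a feel for what that baseline looks like in practice, here is a minimal sketch of the load/split/train/evaluate loop you will write constantly with Scikit-learn (the bundled iris dataset and the model choice are just illustrative defaults):

# A minimal scikit-learn workflow: load data, split, train, evaluate.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

If you can read, modify, and debug a snippet like this comfortably, you’re at the floor of where you need to be.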
C++ (at least intermediate level) — Sometimes
Python won’t be enough. Often you’ll encounter projects that need to
leverage hardware for speed improvements. Make sure you’re familiar with
basic algorithms, as well as classes, memory management, and linking.
If you also choose to do any machine learning involving Unity, knowing C++ will make learning C# much easier.
At the very least, having decent knowledge of a statically-typed
language like C++ will really help with interviews. Even if you’re
mostly using Python, understanding C++ will make using
performance-boosting Python libraries like Numba a lot easier. Learn C++ has been one of my favorite resources. I would also recommend Programming: Principles and Practice Using C++ by Bjarne Stroustrup.
Once you have the basics of either Python or C++ down, I would recommend checking out Leetcode or HackerRank
for algorithm practice; an example of the kind of exercise they serve up follows below. Quickly solving basic algorithms is kind of
like lifting weights. If you do a lot of manual labor (e.g., programming
by day), you might not necessarily be lifting a lot of weights. But, if
you can lift weights well, most people won’t doubt that you can do
manual labor.
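Here is one such classic: two-sum, solved in a single pass with a dictionary (a generic example for illustration, not one tied to either site):

def two_sum(nums, target):
    """Return indices of two numbers that add up to target, else None."""
    seen = {}  # value -> index of the values we've already passed
    for i, n in enumerate(nums):
        if target - n in seen:  # the needed complement appeared earlier
            return seen[target - n], i
        seen[n] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)

The dictionary turns a naive O(n^2) double loop into a single O(n) pass, which is exactly the kind of trade-off interviewers want to hear you explain.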
Calculus (at least basic level) — If
you have an understanding of derivatives and integrals, you should be
in the clear. Otherwise even simpler concepts like gradient descent will
elude you. If you need more practice, Khan Academy is likely the best
source of online practice problems out there for differential, integral, and multivariable calculus. Differential equations are also helpful for machine learning.
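To make the gradient descent point concrete, here is a bare-bones sketch minimizing f(x) = (x - 3)^2, whose derivative is 2(x - 3); the learning rate and step count are arbitrary illustrative choices:

# Gradient descent on f(x) = (x - 3)^2, with derivative f'(x) = 2 * (x - 3).
x = 0.0  # arbitrary starting point
learning_rate = 0.1
for step in range(50):
    grad = 2 * (x - 3)         # evaluate the derivative at the current x
    x -= learning_rate * grad  # step downhill, against the gradient
print(x)  # converges toward the minimum at x = 3

If the two commented lines make sense to you, you understand the core of how most of the models in this post get trained.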
Statistics (at least basic level) — Statistics
is going to come up a lot. At least make sure you’re familiar with
Gaussian distributions, means, and standard deviations. Every bit of
statistical understanding beyond this helps. Some good resources on
statistics can be found at, you probably guessed it, Khan Academy. Elements of Statistical Learning, by Hastie, Tibshirani, & Friedman, is also great if you’re looking for applications of statistics to machine learning.
BONUS: Numerical Analysis (at least basic level) — A
lot of machine learning techniques out there are just fancy types of
function approximation. These often get developed by theoretical
mathematicians, and then get applied by people who don’t understand the
theory at all. The result is that many developers might have a hard time
finding the best technique for their problem. If they do find a
technique, they might have trouble fine-tuning it to get the best
results. Even a basic understanding of numerical analysis will give you a
huge edge. I would seriously look into Deturk’s Lectures on Numerical Analysis from UPenn, which covers the important topics and also provides code examples.
All
this math might seem intimidating at first if you’ve been away from it
for a while. Yes, machine learning is much more math-intensive than
something like front-end development. Just like with any skill, getting
better at Math is a matter of focused practice. There are plenty of
tools you can use to get a more intuitive understanding of these
concepts even if you’re out of school. In addition to Khan Academy, Brilliant.org is a great place to go for practicing concepts such as linear algebra, differential equations, and discrete mathematics.
Common non-neural network Machine Learning Concepts — You
may have decided to go into machine learning because you saw a really
cool neural network demonstration, or wanted to build an artificial
general intelligence (AGI) someday. It’s important to know that there’s a
lot more to machine learning than neural networks. Many algorithms like
random forests, support vector machines (SVMs), and Naive Bayes Classifiers
can yield better performance for your hardware on some tasks. For
example, if you have an application where the priority is fast
classification of new test data, and you don’t have a lot of training
data at the start, an SVM might be the best approach for this. Even if
you are using a neural network for your main training, you might use a
clustering or dimensionality-reduction technique first to improve the
accuracy. Definitely check out Andrew Ng’s Machine Learning, as well as the Scikit-learn documentation.
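Here is a minimal sketch of that small-data SVM scenario using scikit-learn (the synthetic dataset and parameters are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small training set, where SVMs often hold their own against neural nets.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

svm = SVC(kernel="rbf", C=1.0)  # the RBF kernel is a common default
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))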
Common Neural Network Architectures — Of
course, there are still good reasons for the surge in popularity of
neural networks. Neural networks have been by far the most accurate way
of approaching many problems, like translation, speech recognition, and
image classification. Andrew Ng’s Machine Learning (and his more up-to-date Deep Learning specialization) are great starting points. Udacity’s Deep Learning is also a great resource that’s more focused on Python implementations.
Bear
in mind, these are mainly the skills you would need to meet the minimum
requirements for any machine learning job. However, chances are you’ll
be working on a very specific problem within Machine Learning. If you
really want to add value, it will help to specialize in some way beyond
the minimum qualifications.
Voice and Audio Processing — This
field has frequent overlap with natural language processing. However,
natural language processing can be applied to non-audio data like text.
Voice and Audio analysis involves extracting useful information from the
audio signals themselves. Being well versed in math will get you far in
this one (you should at least be familiar with concepts like fast
Fourier transforms). Knowledge of music theory also helps. I recommend
checking out the Kaggle kernels for the MLSP 2013 Bird Classification Challenge and TensorFlow Speech Recognition Challenge, as well as Google’s NSynth project.
Reinforcement Learning — Reinforcement
learning has been a driver behind many of the most exciting
developments in deep learning and artificial intelligence in 2017, from AlphaGo Zero to OpenAI’s Dota 2 bot to Boston Dynamics’s Backflipping Atlas.
This will be critical to understand if you want to go into robotics,
self-driving cars, or any other AI-related area. Georgia Tech has a
great primer course on this available on Udacity. However, there are so many different applications, that I’ll need to write a more in-depth article later in this series.
There
are definitely more subdisciplines to ML than this. Some are larger and
some have yet to reach maturity. Generative Adversarial Networks are
one of these. While there is definitely a lot of promise for their use in creative fields and drug discovery, they haven’t quite reached the same level of industry maturity as these other areas.
BONUS: Automatic Machine Learning (Auto-ML) — Tuning
networks with many different parameters can be a laborious process (in
fact, the phrase “graduate student descent” refers to getting hordes of
graduate students to tune a model over the course of months). Companies
like Nutonian (bought by DataRobot) and H2O.ai have recognized a massive need for this. At the very least, knowing how to use techniques like grid search (such as scikit-learn’s GridSearchCV) and random search will be helpful no matter your subdiscipline. Bonus points if you can implement techniques like Bayesian optimization or genetic algorithms.
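As a baseline, here is roughly what grid search looks like with scikit-learn’s GridSearchCV (the parameter grid is an arbitrary illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively try every combination in the grid, with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)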
Conclusions
With
this overview of machine learning skills, you should hopefully have a
better grasp on how the different parts of the field relate to one
another. If you want to get a quick, high-level understanding of any of
these technical skills, Siraj Raval’s YouTube channel and KDnuggets are good places to start.
It’s
not enough to just have this list of subjects in your head, though.
Certain approaches to studying this are more effective than others.
Part 3: Immersion and Finding Mentors
Self
study can be tricky, even for those of us without any kind of attention
deficit disorder. It’s especially important to note that not all self
study is equal in quality. Take studying a language, for example. Many
people have had the experience of learning a language for years in a
classroom setting. When they go spend a few weeks or months in a country
where that language is all that is spoken, they often describe
themselves as learning much more quickly than in the classroom setting.
This is often referred to as learning a language by immersion. This means that even the instructions for what you need to do with a language are in the language itself.
While
learning a subject like machine learning might be functionally
different than learning another spoken language (you’re not going to be
speaking in classes and functions, after all), the principle of
surrounding yourself with a subject and filling as many hours of the day
with it is important here. That is what we’re talking about when we
talk about immersion with respect to machine learning. What Cal Newport
might say is that immersion is the reason formal institutions so
consistently produce higher-quality results, even for non-language
subjects. People spend many hours per day in structured settings where
it’s almost difficult NOT to study a particular subject. The ones who find
more immersion (i.e., taking additional, more advanced classes, spending
more time studying the subject with others, involving themselves in
original research efforts) are the ones who succeed more.
If
you’re studying machine learning in a formal setting, good for you.
Much of the rest of the advice in this post still applies, but you’ve
got an edge. If you’re not studying machine learning in a formal
setting, or if you’re entering into the space from a different field,
your challenge is going to be building your own habits, commitments,
structures, and environments that make you spend as much time as possible
studying machine learning.
How
do you do this? First, you’re going to need to put together a schedule
for learning the different subjects listed in the previous section. How
varied this is, or how long it will take, will depend on your previous
familiarity with the mathematical concepts involved (try starting with one
week of review for each of the subjects to get a sense of the space,
and spend more or less time based on your previous familiarity).
You
should try to fit at least two hours of studying into each day. EVERY.
SINGLE. DAY. This spaced repetition will become stronger as your learning
streaks get longer (and you will be surprised at how rusty you can get
after taking just a single day off). If you can fit more than 2 hours
into certain days, like on weekends, that’s even better. Even when I was
working full time, I was making sure to fit at least 2 hours of
studying each day (part of this was the result of learning how to
effectively read papers, books, and tutorials while also riding a train
or bus). While there were occasionally holidays that I would use for
structured study-sessions, most of this found time came from
relentlessly optimizing what I spent my time doing.
You
should make sure to have a minimum amount of time each day scheduled in
your calendar (and I mean actually reserved in your calendar, in a slot
where nothing else can be scheduled over). Set up alerts for these
times, and find an accountability buddy (someone who can keep you
accountable if you do not study during these times. In my case I had
other friends that were studying subjects in machine learning and we
would present each other with our notes and/or github commits). 2 hours a
day minimum can sound like a lot, but if you remove the items from your
schedule that are less important (*cough* social media), you will be amazed at how much time you can find.
Now
at this point, much of the content has focused on what you as an
individual can to do improve your studying. There’s one more thing to
keep in mind when studying:
DON’T GO IT ALONE.
You’re
probably inexperienced in machine learning if you’re looking for advice
from this post. For self-study, it is absolutely critical that you
find a network of mentors (or at the very least one incredibly
experienced mentor). If you don’t find a mentor, you will have to put a
lot more time and effort into self-study to get the same results as
someone that had a mentor and put in less practice. Our culture is
flooded with the trope of the lone genius. Many may correctly point out
that people like Mozart and Einstein became masters in their fields by
putting in thousands of man-hours while they were still young. However,
many of the same people often ignore the critical roles that mentors
played in their careers (Mozart had his father, and Einstein had
professors in one of the best physics departments on the planet at the
time).
Why
is finding a mentor so important? Chances are they may have been down
the same road you’re travelling. They have a better map of the space,
and will probably have a better grasp of the common pitfalls that plague
people earlier in their careers. They’ll be able to tell you whether
the machine learning idea you’re working on is truly novel, or whether
it’s been done countless times with a non-ML implementation.
There are a few possible steps to acquiring a mentor:
Create a list of prospective mentors:
Create a list of experienced people in the field of interest (in this
case it might be computer science or machine learning). This list can be
updated as time goes on, and you get a better feel for how everyone is
connected in the space. You might be surprised at what a small world the machine learning space is.
Be indirect at first:
If you’re talking to a potential mentor for the first time, start out
with very specific questions about their work. Demonstrate your interest
by showing you’ve put thought into your questions (ask the kinds of
questions where it seems like you’ve exhausted other research resources,
and are coming to them because nobody else would have a good answer).
Also, for those on your list, I would avoid asking literally “will you
be my mentor?”. If the person in question is qualified to be a mentor,
then they may not have a lot of time to spare in their schedule
(especially not for something that sounds like it would require
committing a lot of time to a person they just met). That leads me to
the much better strategy…
Demonstrate value:
Again, if a person is experienced enough to be a good mentor, chances
are they will also have very little spare time on their hands. However,
they will often be willing to provide advice or mentorship if you’re
willing to help them out with a project of theirs. Offer to volunteer.
Yes, I know unpaid internships can be considered dubious, but at this
point in time getting a good mentor at all is more important. This can
be a short term project that could turn into a referral for a much more
rewarding one.
Use youth to your advantage (if possible):
If you are young, you might have an advantage. People are a lot more
willing to help simply if you are younger. You might be nervous about
approaching people a lot older, but you actually have a lot less to fear
than you realize. The worst they can do is say no.
Be open to reverse mentors:
By reverse mentors, I mean people that are younger than you, but that
are also much further ahead in their machine learning journeys. You may
have come across people that have been programming since they were 5
years old, built their first computer not from a kit, but completely
from scratch. They’ve started ML companies. They’re grad students at top
CS programs. They’re Thiel fellows or Forbes 30 under 30s. If you
happen to run into these people, do not be intimidated or envious. If
you have someone like this in your network as a machine learning expert,
try to learn from them. I’ve been fortunate enough to meet a bunch of
people like this, and they were invaluable in helping me find the next
steps.
Be humble and obedient:
It’s important that you remember you are coming to them for advice.
They are taking time out of their busy schedule to give you
recommendations. If you want someone to remain your mentor, then you
should defer to their judgement. If you do something different than what
they say or don’t do it at all, that will be a pretty good signal to them
that you either don’t value their advice, or that you aren’t that
serious about becoming a machine learning engineer.
If
you focus on making sure you get as much immersion as possible, and you
are able to find experienced machine learning engineers to provide
advice and guidance, you’re off to a fantastic start.
There is one last, minor detail to consider before you begin your learning journey: you need an actual computer to program on.
Part 4: Software and Hardware Resources
Programming
for machine learning often distinguishes itself from web programming by
the fact that it can be much more demanding in terms of hardware. When I
started out on my machine learning journey, I originally used a
3-year-old Windows laptop. For basic machine learning tutorials this may
be adequate, but once you spend 28 hours training a simple
low-resolution GAN, hearing your CPU scream in agony the whole time,
you will realize, as I did, that you need to expand your options.
The choice of environments can be daunting at first, but it can easily be split up into a parseable list.
The
first thing you may be wondering is whether you should pick Windows,
Mac, or Linux. Many more packages, like those you would see with Anaconda,
are compatible with Mac and Linux than with Windows. Tensorflow and
PyTorch are available on all three, but some less common yet still useful
packages like XGBoost can be trickier to install on Windows. Windows has
become more popular in recent years as a development platform for
machine learning, though this has largely been due to the emergence of
more cloud resources with Azure. You can still use a Windows machine to
run software that was developed for Mac or Linux, such as by setting up a
VirtualBox virtual machine. You could also use a Kaggle kernel or a
Databricks notebook, but that of course depends on having
a great internet connection. If you’re already used to using a Mac, you
should be fine. Regardless of which operating system you choose, you
should still try to add an understanding of Linux to your skill set (in
part because you will probably want to deploy trained models to servers
or larger systems of some kind).
For
your machine learning set up, you have four main options: 1) the Laptop
Option, 2) Cloud Resources/Options, 3) Desktop Option and 4) Custom/DIY
Machine Learning Rigs.
Laptop Option: Favoring portability & flexibility — If
you’re going for the machine learning freelancing route, this can be an
attractive option. It can be your best friend if you’re a digital nomad
(albeit, one that might feel like a panini press if you’re keeping it
in your actual lap when you’re using it for model training). With that
in mind, here are some features and system settings you should make sure
you have if you’re using your Laptop for Machine Learning.
RAM: Don’t settle for anything less than 16 GB of RAM.
GPU:
This is probably the most important feature. Having the right GPU (or
having any GPU instead of just CPUs) could mean the difference between
model training taking an hour or taking weeks or months (and making lots
of heat and noise in the process). Since many ML libraries make use of
CUDA, go with an NVIDIA graphics card (like a GeForce) for the least
amount of trouble. You may need to write some low-level code to get
your projects to run on an AMD card.
Processor: Go with an Intel i7 (if you’re a Mr. Krabs-esque penny-pincher, make sure you don’t go below an Intel i5).
Storage: Chances
are you’re going to be working on projects that require a lot of data.
Even if you have extra storage on something like Dropbox or an external
drive, make sure you have at least 1 TB.
Operating System: Sorry
Mac and Windows cultists, but Skynet is probably going to be running on
Linux when it comes out. If you don’t have a computer with Linux as the
main OS yet, you can make a temporary substitute by setting up a
virtual machine with Ubuntu on either your Mac or Windows machine.
When it comes to specific brands, there are many choices. Everyone from Acer to NVIDIA makes laptops.
Of
course, if you insist on using a Mac, you could always connect your
machine to an external GPU enclosure (housing something like an NVIDIA Pascal-series card).
But if you’re strapped for cash, don’t fear. You can always go with one of the cheapest laptops out there (e.g., a $249 Chromebook),
and then use all the money you saved for cloud computing time. You
won’t be able to do much on your local machine, but as long as you have a
decent internet connection you should be able to do plenty with cloud
computing.
Speaking of cloud computing…
Cloud Resources/Options — It’s
possible that even your powerful out-of-the-box or custom build won’t
be enough for a really big project. You’ve probably seen papers or press
releases on massive AI projects that use 32 GPUs over many days or weeks.
While most of your projects won’t be quite that demanding, it will be
nice to have at least some options for expanding your computing
resources. These also have the benefit of being usable alongside whatever
laptop you already have.
Microsoft
Azure is usually the cheapest for compute time (I might be fanning the
flames of a cloud-provider holy war here). Amazon usually has a lot
more options (including more obscure ones like combining compute with data
streaming via Kinesis or long-term storage in S3 or Glacier). If you’re
using a Tensorflow model, Google Cloud’s TPUs (custom chips designed
for tensor workloads) are optimized for it. Google also offers tools and
services for optimizing your hyperparameters, so you don’t have to set
up the Bayesian optimization yourself.
If
you’re relatively new to using cloud services, FloydHub offers the
simplest user experience and is by far the easiest one for a
beginner to set up.
Then
again, you might not relish the idea of shelling out a bunch of money for
GPU compute time on every project you want to do. At some point, you
may decide that the only compute cost you want to worry about
is the electric bill.
Desktop Option: Powerful and reliable — If
you don’t want to have variable costs due to cloud computing bills, and
you don’t want your important machine learning work to be at risk for
environmental damage, another option could be to set up a Desktop
environment. Lambda Labs and Puget Systems
make some really great high-end desktops. The hardware options
for a desktop can take a bit more skill to navigate, but here are some
general principles to keep in mind:
For
the GPU, go with an RTX 2070 or RTX 2080 Ti. If cost is a concern,
going with a cheaper GTX 1070, GTX 1080, GTX 1070 Ti, or GTX 1080 Ti can
also be a good option. However many GPUs you have, make sure you have
1–2 CPU cores per GPU (more if you’re doing a lot of preprocessing). As
long as you buy at least as much CPU RAM to match the RAM of your
largest GPU, go with the cheapest RAM that you can (Tim Dettmers has a post explaining how the clock rates make little meaningful difference). Make sure the hard drive is at least 3 TB in size.
If
possible, go with a solid-state drive to improve the speed for
preprocessing small datasets. Make sure your setup has adequate cooling
(this is a bigger concern for Desktops than for laptops). As for
monitors, I’d recommend putting together a dual-monitor setup (3 may be
excessive, but knock yourself out. I don’t know what you would use 4 for
though).
Downside?
Basic models will cost about $2,000 to $3,000, with high-end machines
costing around $8,897 to $23,000. This is much steeper than the laptop
option, and unless you’re training complex models on massive datasets,
it will probably cost more than your initial cloud computing budget.
However,
there is a big advantage that desktops have over laptops: Since desktop
computers are less restricted by design constraints such as
portability, or not turning your lap into a panini press from the heat radiating from it, it is far easier to build and customize your own. This can also be a fantastic way to cheaply build your ideal machine.
Custom/DIY Machine Learning Rigs: For the enthusiast — Chances
are if you’re in the field for a while, you’re going to start wanting
to build your own custom computer. It’s an inevitable consequence of
thinking about how much you’re spending on GPU resources for project
after project. Given that the development of the GPUs that made machine
learning cheap and effective was pretty much subsidized by the gaming
industry, there are plenty of resources out there on building your own
PC. Here is an example breakdown of a few components and their prices.
These were intentionally selected for being cheap, so you could easily
replace any of the parts with something higher-end.
It’s
entirely possible that your level of comfort with hardware might not be
on the same level as your software comfort. If that’s the case,
building your PC is certainly going to be a lot (and I mean A LOT)
trickier than it is in PC Building Simulator. If you do succeed, this
can be a fun project, and you’ll also save money on a desktop machine
learning rig.
With a custom build, you also have access to some pretty out-there setups as well…
Whichever
setup you choose, whether it be mainly Laptop, Cloud-based, Desktop, or
custom build, you should now be more than ready to run your own machine
learning projects.
Of
course, becoming a machine learning engineer is about more than just setting up
your hardware/software environment correctly. Since the field is
changing so much, you’re going to need to learn how to read research
papers on the subject.
Part 5: Reading Research Papers (and a few that everyone should know)
In
order to have a proper understanding of machine learning, you need to
get acquainted with the current research in the space. It’s not enough
to accept claims about what AI can do just because they got enough hype on social media.
If you have GPU resources, you need to know how to properly utilize
them or else they’ll be useless. You need to learn to be critical and
balanced in your assessment. This is what PhD students learn how to do,
but luckily you can also learn how to do this.
For finding new papers to read, you can often find them by following machine learning engineers and researchers on Twitter. The machine learning subreddit
is also another fantastic resource. Countless papers are available for
free on Arxiv (and if navigating that is too intimidating, Andrej
Karpathy put together Arxiv Sanity to make it easier to navigate). Journals like Nature and Science can also be good sources (though Arxiv often has much more, and without paywalls).
Usually there are about two or three papers that are particularly popular in any given week.
For
getting through a paper, it usually helps if you have some kind of
motivation for getting through it. For example, if I want to learn about
influence functions or Neural ODEs,
I will search through the papers and read them until I understand them.
As was mentioned before with the immersion, how far you get is going to
be a function of discipline, which in turn is going to be influenced
even further by your motivation.
For any given paper, there are certain techniques you can use to make the information easier to digest and understand. The book “How to Read a Book”
is a fantastic resource that describes this in detail. In short, you
can use what is known as a three-pass approach. In the first pass
through the paper, you can just skim through the paper to see if it is
interesting. This means you first read the title, and if it’s appealing
move onto the abstract. The abstract is the short summary at the
beginning that covers the main points of the paper. If that seems good,
you move onto the introduction, read through that, then read the
section and subsection headers, but not the content of those sections.
In the first pass, you can temporarily ignore the math (assume it’s
sound for now). Once you go through the section headers, you read the
conclusion, and then skim through the references. In the references, if
you see any papers that you’ve read before, you can mark those. The
whole purpose of this first pass is to understand what the purpose of
the paper is, what the authors are trying to do, what problem they are
trying to solve. After the first pass, I will usually turn to Twitter
(or whatever source the paper came from), and compare what others are
saying about the paper to my initial assumptions.
If
after all this I have determined that the paper is interesting enough
to read more in-depth, I’ll take another pass through it. I’ll try to
get a high level understanding of the math in the paper. I’ll read the
thicker descriptions, the plots, and try to understand the high-level
algorithm. I’ll usually pay more attention to the math this time around.
However, there may be times when the author factors much of
the math out into derivations. On the second pass, I’m still not going through those
factorizations and derivations just yet. When I read the experiments, I
will try to evaluate whether the experiments seem reproducible. If there
is code available on Github for this paper, I will usually follow the
Github link and read through the code, and perhaps even try running some
part of it on my own device. Usually, comments in the code help with
understanding. I will also read through other online resources
that aid understanding (the more popular papers often have plenty
of high-level summaries, such as on sites like ML Explained).
On
the third pass, you try to understand the math itself. At
this point, you will be going through the paper with a pen and notepad,
and following along with the math itself. If there are any mathematical
terms or concepts that you do not understand, this is the point where
you search online for better explanations. If you’re really ambitious,
you can also try replicating the paper in code form, complete with the
parameters and data that they use in the paper.
If
at any point you feel stuck or frustrated, just remember not to give
up. Persistence will get you very far, and reading papers gets much
easier the more times you do it. If you’re still stuck on the math,
don’t hesitate to turn to Khan Academy or Wikipedia. If you’re looking
for even more help, try reaching out on the Machine Learning Subreddit,
or join a journal club meetup group in your city.
As
for which papers to start with, I would try applying the technique
above to some of the classic papers in machine learning. A lot of the
papers you read (especially the avalanche of GAN papers out there) will
have many concepts from these. I’ve listed a few of the big ones by
subject and included links to the papers.
These
papers are a great starting point for a conceptual understanding of
where these large, daunting, machine learning models come from. While
this will take you very far in building projects and following the
latest developments, it also helps to know who is creating these developments.
Part 6: Groups and People you should be Familiar with
As
I mentioned before, finding mentors and reading papers are important.
However, it’s also worth paying attention to the work of specific
researchers.
Depending
on which subfield you go into, following certain individuals might be
more important than others, but generally speaking being familiar with
these ones will reduce the risk of you getting into an awkward moment at
NIPS. Since many of these groups are also the most heavily-connected,
you can probably navigate the increasingly crowded machine learning
research space by traversing a mental graph of who is connected to whom,
and through whom.
These
companies often get a lot of attention for research in the ML space
because they often have much more computing resources (and can pay
researchers more) than academia. However, that’s not to say there
aren’t plenty of academic research centers you should be aware of. These
include (but again, are not limited to) IDSIA (the Dalle Molle Institute for Artificial Intelligence Research, Juergen Schmidhuber’s lab), MILA — the Montreal Institute for Learning Algorithms (where researchers like Ian Goodfellow trained), the University of Toronto (as a whole, since so many researchers, including Geoffrey Hinton, have come out of there), and Gatsby.
For researchers, Demis Hassabis (co-founder of DeepMind), Shane Legg (co-founder of DeepMind), Mustafa Suleyman (head of product at DeepMind), Jeff Dean (Google), Greg Corrado (Google AI research scientist), Andrew Ng (Stanford, Coursera), Ray Kurzweil (transhumanism, computer vision, and too much else to list here), Dileep George (Vicarious), D. Scott Phoenix (Vicarious and Numenta), Yann LeCun (a pioneer of CNNs, you should probably make sure you know this guy), Jeff Hawkins (Numenta, Palm Computing, and Handspring), and Richard Socher
(Salesforce, Stanford) are good ones to keep in mind. Like the list of
companies, this should not be considered a comprehensive list. Rather,
since many of these people are superconnectors within the machine
learning space, you can gradually build up a graph to connect the most
prominent people. If you want to stay connected and aware without
information overload, Twitter is a fantastic tool (just keep the number
of people you’re following to under 1,500 and triage accordingly), as
well as newsletters like Papers with Code, O’Reilly Data Newsletter, KDNuggets News, and the Artificial Intelligence Podcast by Lex Fridman.
Of
course, it’s not enough to be familiar with the current celebrities of
machine learning. You should probably also make yourself familiar with
historical figures such as Charles Babbage, Ada Lovelace, and Alan Turing. I
recommend Walter Isaacson’s “The Innovators” for an overview of the connections among all of them.
Again,
I should stress that your map of the organizations and prominent
researchers here should not be limited by this list. As with anything in
machine learning, you are going to need to continually update your
knowledge-base, and figure things out for yourself.
Speaking of figuring things out for yourself…
Part 7: Problem-Solving Approaches and Workflows
The
ultimate goal behind reading many research papers, working on many
projects, and understanding the works of top researchers is to better
develop your own approaches. While the workflows of top researchers can
be attributed at least partially to intuition from having seen so much,
there are still some general patterns and steps you can take for
undertaking a machine learning project. Many of these apply for
everything from original research to developing models for freelance
clients.
Determine if Machine learning is actually necessary:
It’s of course not as simple as throwing a neural network at
everything. First off, you might want to make sure that, for the problem
you’re working on, machine learning will actually be an improvement over
some other algorithm. You wouldn’t use a neural network to solve FizzBuzz, riiiiiiiight?
Understanding the type of problem: Once
you’ve determined that using machine learning would be beneficial, you
probably want to determine what specific type of machine learning is
useful (or even if a pipeline with multiple steps would be useful). Are
you trying to get a model that matches patterns in known data? You’re
probably using Supervised learning. Are you trying to uncover patterns
you’re not sure exist? It’s likely unsupervised learning that you’re
working on. Are you working with data that changes after each output
from your model? You’re probably going down the reinforcement learning
path.
Check Previous work: It
is a wise precaution to see what previous work has been done on a
problem. Take a scan of Github to get some ideas. It’s also worth
looking into existing literature on a specific problem.
Image processing, for example, has so many solutions that some refer to
it as a solved problem. Facebook’s AIs can already recognize human faces
with accuracy rivaling that of most humans. That being said, it’s
likely you will get to a point where even the best existing solutions
are inadequate (i.e., pretty much the state of the entire field of NLP
for many tasks). When it comes to that, there are a variety of different
steps you can incorporate into solving a problem.
Preprocessing and Exploratory Data Analysis: Before
you input the data into your model, you should always stop to make sure
your dataset is up to snuff. This can involve everything from checking
for missing data, to rescaling and filtering the data, to looking at the
relationships between parts of the data at a basic level.
For
preprocessing, one common technique is to use a zero mean (subtract the
mean from each predictor) to center the data, which can be combined
with dividing by standard deviation to scale the data. This can be used
for anything from tabular data to RGB values in images. Dates and times
should be put into a consistent DateTime format. If you have a lot of
categorical variables, it is more often than not crucially important to
one-hot encode them. At this stage, you should also strive to resolve any
outliers (and, if possible, understand their meaning). If your model is
sensitive to outliers, you can try applying a spatial sign transform. You
should also make an effort to handle any missing data; simply dropping it
can be problematic if missingness is somehow predictive. Tree-based models
are great at dealing with missing data, or if you don’t have time for that
you can use imputation/interpolation (KNN or an intermediate regression
model).
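Here is a sketch of several of those steps with scikit-learn; the column names and data are hypothetical:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 51],           # numeric, with a missing value
    "income": [40_000, 65_000, 58_000, None],
    "city": ["NYC", "SF", "NYC", "LA"],  # categorical
})

preprocess = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ]), ["age", "income"]),
    ("categorical", OneHotEncoder(), ["city"]),
])
X = preprocess.fit_transform(df)
print(X.shape)  # 4 rows: 2 scaled numeric columns + 3 one-hot city columns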
The
exploratory data analysis can also be useful for getting an intuitive
sense for what kinds of models or data reduction techniques could be
useful. This is important for finding possible relationships between any
and all of the features you might be working with. Calculations such as
Maximal Information Coefficients can be useful. Building correlation
matrices for the features (i.e., box-charting everything),
scatter-plotting and histogram-plotting every combination of features
can expand this even more. Don’t get so excited about jumping into using
a k-NN classifier that you forget the techniques from simple Excel
tables, such as using pivot tables and grouping by particular features.
Some of your variables might need to be transformed (square, cube,
inverse, log, Optimus…wait…what?)
before they can be plotted or models can be trained on them. For
example, if you’re looking at river flow events or cryptocurrency
prices, it will probably be wise to plot values on a log scale. While
you’re putting together the boilerplate for automatically doing all
these steps for whatever dataset you find, don’t forget the classic
summary statistics (mean, mode, minimum, maximum, upper/lower quartiles,
identification of >2.5 SD outliers).
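The boilerplate for those steps can start as simply as this pandas sketch (the stand-in dataset is made up; swap in your own):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "flow": rng.lognormal(mean=3, sigma=1, size=500),  # heavy-tailed, like river flows
    "rain": rng.normal(50, 10, size=500),
})

print(df.describe())                 # the classic summary statistics
print(df.corr())                     # pairwise correlation matrix
df["log_flow"] = np.log(df["flow"])  # log-transform a heavy-tailed variable
outliers = df[np.abs(df["rain"] - df["rain"].mean()) > 2.5 * df["rain"].std()]
print(len(outliers), "outliers beyond 2.5 SD")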
Data Reduction:
When beginning a project, it’s a good first step to see if reducing the
amount of data to be processed will help with the training. There are
many techniques for this. You’ve probably heard of using Principal
Component Analysis (PCA) or Linear Discriminant Analysis (LDA, in the
case of classification). Feature selection, or using only the components
that account for a majority of the information when modeling, can be
another easy way to focus the model on the important information. How
do you decide what to remove? Removing low/zero variance predictors
(ones that don’t vary with the correct classification), or removing
multicollinear heavily correlated features (if there’s a 99% correlation
between two features, one of them is possibly useless) can be good
heuristics. Other techniques like Isomap or Lasso (in the case of
regression) can help even more.
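As an illustration, here's how those heuristics plus PCA might look with scikit-learn, on synthetic data built to contain exactly the problem cases described above:

```python
# A data-reduction sketch: drop near-zero-variance predictors, drop one of each
# heavily correlated pair, then keep enough principal components for 95% of
# the variance. All data here is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 3] = 0.001 * rng.normal(size=200)           # near-zero-variance predictor
X[:, 7] = X[:, 2] + 0.01 * rng.normal(size=200)  # ~99% correlated with col 2

# Drop near-zero-variance predictors.
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# Drop one of each heavily correlated pair.
corr = np.corrcoef(X_var, rowvar=False)
keep = [i for i in range(corr.shape[0])
        if not any(abs(corr[i, j]) > 0.99 for j in range(i))]
X_sel = X_var[:, keep]

# Keep only enough principal components to explain 95% of the variance.
X_reduced = PCA(n_components=0.95).fit_transform(X_sel)
print(X.shape, "->", X_reduced.shape)
```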
Parameter tuning: Once you do have your model running, it may not be performing exactly as you want. Fortunately, this can often be solved with clever parameter tuning. Unfortunately, there are often many parameters for models like neural networks, so techniques like grid search may take longer than anticipated. Counterintuitively, random search can give improvements over grid search, but even then the dimensionality problem can remain. There is an entire field focused on efficiently tuning large models, involving anything from Bayesian optimization, to training SVMs on datasets of model parameters, to genetic algorithms for architecture search. That being said, once you learn enough about the techniques you're using in a model (such as an Adam or AdaDelta optimizer), you'll often begin to have an intuition for how to quickly converge on ideal parameters based on the output of the training graphs.
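For instance, a random search with scikit-learn's RandomizedSearchCV might look like the following; the model and parameter ranges are placeholders, not recommendations:

```python
# A random-search sketch: sample 20 random hyperparameter combinations
# instead of exhaustively walking a grid. Synthetic classification data.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "learning_rate": loguniform(1e-3, 1e0),  # sample log-uniformly
        "n_estimators": randint(50, 300),
        "max_depth": randint(2, 6),
    },
    n_iter=20,   # 20 random draws instead of an exhaustive grid
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```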
Higher-level modelling techniques: We covered the importance of feature engineering. This can cover everything from basis expansions, to combining features, to properly scaling features based on average values, median values, variances, sums, differences, maximums or minimums, and counts. Algorithms such as random forests, boosters, and other tree-based models can be used for finding the important features. Clustering, or any model based on distances to class centroids, can also be useful for problems where a lot of feature engineering is needed.
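A minimal sketch of that idea: fit a random forest on synthetic data and read off its importance scores to see which features deserve engineering effort:

```python
# Ranking features with a random forest. Only a few of the synthetic
# features are informative; the importance scores should surface them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Higher importance scores point at features worth engineering around.
for i, score in sorted(enumerate(forest.feature_importances_),
                       key=lambda t: -t[1]):
    print(f"feature {i}: {score:.3f}")
```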
Another advanced technique is the use of stacking or blending. Stacking and blending are two similar approaches to combining classifiers (ensembling). I recommend reading the Kaggle Ensembling Guide for more detailed information.
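The Kaggle guide has the details, but as a bare-bones sketch, scikit-learn's StackingClassifier wires up the pattern directly (the choice of base models here is arbitrary):

```python
# A bare-bones stacking sketch: base classifiers feed their predictions
# into a final meta-learner. Synthetic data, arbitrary base models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # the "level 1" meta-learner
    cv=5,  # out-of-fold predictions avoid leaking training labels
)
print(stack.fit(X, y).score(X, y))
```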
However sophisticated your modelling techniques get, don't forget the importance of acquiring domain knowledge for feature engineering. This is a common strategy among Kaggle competition winners: thoroughly researching the subject of the competition to better inform their decisions about how to build their model. Even if you do not have a lot of domain knowledge, you should be able to account for missing data (missingness itself can be informative) or add additional external data (such as through APIs).
Reproducibility: This one is more a quality of workflows than a problem-solving strategy. You're probably not going to do an entire project in one sitting. It's important to be able to pick up where you left off, or to easily start over from the beginning with only a few clicks. For model training, make sure you set up your code with proper checkpointing and weight-saving. Reproducibility is one of the big reasons why Jupyter notebooks have gotten so popular in machine learning.
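As one way to set this up, here's a sketch of seed-setting plus per-epoch weight checkpointing in Keras; the file path and the tiny model are placeholders, and the exact checkpoint arguments vary a bit between Keras versions:

```python
# A reproducibility sketch: fix random seeds up front, then save weights
# after every epoch so you can pick up where you left off.
import os
import random
import numpy as np
import tensorflow as tf

random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)  # seeds make reruns start from the same place

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Checkpoint weights every epoch (placeholder path).
os.makedirs("checkpoints", exist_ok=True)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "checkpoints/model-{epoch:02d}.weights.h5", save_weights_only=True)

X, y = np.random.rand(100, 4), np.random.rand(100, 1)
model.fit(X, y, epochs=3, callbacks=[checkpoint])
```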
That was a bit of a mouthful. I encourage you to follow the links above to learn more about each subject. Once you have gotten a grasp of these different strategies and workflows, the inevitable question is
what you should apply them to. If you’re reading this, your goal might
be to enter into machine learning as a career. Whether you do this as a
freelancer or a full-time engineer, you’re going to need some kind of
track record of projects. That’s where the portfolio comes in.
Part 8: Building your portfolio
When
you’re transitioning into a new career as a machine learning engineer
(or any kind of software-tangential career, not just ML), you may be
faced with an all too common conundrum: you’re trying to get work to get
experience, but you need experience before you can get the work to get
experience. How does one solve this Catch-22? Simple: Portfolio projects.
You often hear about portfolios as something that front-end developers or designers put together. It turns out a portfolio can be a crucial career-booster for data scientists and machine learning engineers as well. Even if you're not in a position where you're looking for work just yet, the goal of building a portfolio can be incredibly useful on its own for learning machine learning.
What NOT to include in your portfolio
Before we get into examples, it's important to make it clear what should not be included in your ML portfolio. For the most part, you have a lot of flexibility when it comes to your portfolio. However, when it comes to projects that could result in your resume being thrown in the trash, there are 3 big ones that come to mind:
1. Survival classification on the Titanic dataset.
2. Handwritten digit classification on the MNIST dataset.
3. Flower species classification using the iris dataset.
These datasets are used so heavily in introductory machine learning and data science courses that having a project based on them will probably hurt you more than help you. These are the types of projects already used in the example folders of many machine learning libraries, so there probably aren't many original uses left for them.
Machine learning portfolio ideas
Now that we have that warning out of the way, here are some suggestions for projects you CAN add to your machine learning portfolio.
Kaggle Competitions
Beyond Kaggle, there are other similar competitions out there. Halite is an AI programming competition created by Two Sigma. It is somewhat more niche than Kaggle's competitions, but it can be great if you want to test your skills on reinforcement learning problems. The only downside is that the competition is seasonal and doesn't run as frequently as Kaggle's, but if you can get your bot high on the leaderboards when the next competition comes around, it can be a great addition to your portfolio.
Implementations of Algorithms in Papers
Many
of the newer machine learning algorithms out there are first reported
in the form of scientific papers. Reproducing a paper, or reimplementing
a paper in a novel setting or on an interesting dataset is a fantastic
way to demonstrate your command of the material. Being able to code the
usual ML algorithms is one thing, but being able to take a description
of an algorithm and then turn it into a working project is a skill
that's far too low in supply. This could involve reimplementing the project in a different language (e.g., Python to C++), in a different framework (e.g., if the code for the paper was written in TensorFlow, try reimplementing it in PyTorch or MXNet), or on different datasets (e.g., bigger or less publicly available ones).
Mobile Apps with Machine Learning (e.g., Not Hotdog Spinoffs)
If you're looking for work in machine learning, chances are you won't just be making standalone Jupyter notebooks. If you can demonstrate that you can integrate a model into an actual application, that's a strong signal to potential employers. Since libraries like TensorFlow.js have come out for doing machine learning in JavaScript, this is also a fantastic opportunity to try integrating ML into React or React Native applications. If you're really scraping the bottom of the barrel for ideas, there's always the classic "Not Hotdog" app from HBO's Silicon Valley.
Of course, copying the exact app probably won't be enough (after all, the joke was how poorly the app was prepared to handle anything other than hot dogs and not hot dogs).
What additional features can you add? Can you increase the accuracy?
Can you make it classify condiments as well? How big of a variety of
foods can you get it to classify? Can you also get it to provide
nutritional or allergy information?
Hackathons and other competitions
In the absence of anything else, projects are often judged based on the impact they've had or the notoriety they've received. One of the easiest ways to get an impressive project in this regard is to put a hackathon project into your portfolio. I've taken this approach in the past with projects I've done as part of hackathons at MassChallenge or the MIT Policy Hackathon. Being a track or prize winner can be a fantastic addition to your portfolio. The only downside is that hackathon projects are basically glorified demos; they often don't stand up to much scrutiny or handle edge cases well. You may want to polish your code a bit before adding it to your portfolio.
Don’t
feel the need to restrict yourself to these ideas too much. You can
also add any talks you’ve given, livestream demos you’ve recorded, or
even online classes you’ve taught. If you’re looking for any other
inspiration, you can take a look at my portfolio site as an example.
Above all else, it's important to remember that a portfolio is always a work in progress. It's never something you will 100% finish. If you wait until that point before you start applying to jobs and advertising your skills, you've waited too long.
Part 9: Freelancing as an ML developer
There may be many areas of machine learning you might be interested in researching, but when it comes to getting hands-on experience and immersion, working on paid ML projects is the next level up. It's also incredibly easy to get started.
For
sites to do freelancing on, I recommend turning to Upwork or
Freelancer. Freelancer requires payment for taking the skill tests on
their site, so Upwork may be superior in that sense (at least, that’s
why I chose it).
If you're looking to delegate more of the project management and client screening, Toptal might be a good option. Toptal screens potential clients for you and provides support on project management. The only downside is that they heavily screen freelancers as well (they advertise that they only hire the "top 3% of freelancers"; whether or not that exact statistic is true, they are nonetheless very selective). Becoming a freelancer with Toptal requires passing a timed coding test, as well as a few interviews.
You may have also built up a neat portfolio geared towards the ML subfield you're interested in. This portfolio solves one problem with getting hired as a "junior" machine learning developer, but another remains: few people or organizations are looking for anything other than "senior" ML developers. I've seen job postings that require 5+ years of experience with libraries like TensorFlow, despite the fact that TensorFlow has only been out for 3 years.
Why does this happen? Most places hiring for ML work, regardless of the specifics of the job description, are pretty much looking for the same thing: a Machine Learning Mary Poppins to come in and solve all their problems.
To increase your chances of convincing an organization you're the solution to their problems, it helps to build up a track record of successful projects. In my case, I met with my first clients in person and agreed on a project with them before the payment and contract were set up on Upwork. The advantage of this method is that if your first client is someone you know, you can build a starting reputation on the site and potentially get some constructive criticism at the same time.
The work you DO end up getting may be slightly different from the goals you had in mind when creating your portfolio. Back then, your goal may have been to demonstrate that you could code well, implement a research paper, or do a cool project. Freelance clients will only care about one thing: can you use ML to solve their problems?
They
say it’s better to learn from the mistakes of others instead of just
relying on your own. You can find such freelancing horror stories
curated at Clients From Hell.
While most of these examples are from freelance artists, designers, and
web developers, you may encounter some similar types (e.g., poor
communicators, clients who overestimate the capabilities of even
state-of-the-art machine learning, people with tiny or even nonexistent
budgets, and even the occasional racist).
While it's amusing to poke fun at some of the more extreme cases, it's also important to hold yourself to a high standard when working for your clients. If a client is proposing something that is not possible with the current state of ML as a field, do not try to prey on their ignorance (that WILL come back to bite you). For example, I had one client reach out to me about original content summarization and how they wanted to integrate it into their project. After doing some research, I presented them with the performance results of some of Google Brain's summarization experiments. I told them that even with the resources of Google, those results were still far below human performance on summarization, and that I could not guarantee better performance than the world's state of the art. The client thanked me for my honesty. Obviously I did not get that particular contract, but if I had lied and said it was possible, I would have been faced with an impossible task, one that likely would have resulted in an incomplete project (and it would have taken a long time to get that stain off my reputation). When it comes to expectations, be absolutely transparent.
They say that trust has a half-life of six weeks, but that figure really only applies when you're working in an office environment. If you're doing remote work, trust can have a much shorter half-life: think six days instead of six weeks.
Over
time, as you get new clients and grow your reputation, you will be able
to earn more as a freelancer and transition to more and more
interesting projects.
At
some point, however, you may decide that you prefer something with more
stability. This is a conclusion I eventually came to, even after working with a company like Google as a contractor (I was the very first machine learning contractor the TensorFlow team ever hired). When I did, I decided to take the leap and interview for full-time machine learning engineer positions.
Part 10: Interviewing for Full-time Machine Learning Engineer Positions
This
is by far the most intense part of the machine learning journey.
Interviewing with companies is often much more intense than interviewing
with individual freelance clients (though most companies that hire
freelancers will do pretty thorough interviews for contract work as
well). If you're interviewing with smaller startups, they may be much more flexible with their hiring process (compared to companies like Facebook or Amazon, around which an entire sub-industry has sprung up teaching people how to interview). Regardless of who you're interviewing with, just remember the following general steps.
The first step is to come up with a compelling "why": what do you actually want? Take time to reflect on your own thoughts and motivations. This will allow you to focus on what you are looking for, and will help you answer interview questions about it.
The
next phase is to put together a study plan for your interview. I would
plan for about 3 to 6 months of studying the subjects from earlier in
this post. This assumes you’ve already put together some kind of
portfolio from either projects, or doing freelance work. For this phase,
you should spend at least 2 hours per day studying algorithms and data
structures, as well as additional time for reviewing the requisite math,
machine learning concepts, and statistics. Put together flashcards for important concepts, but make sure to combine them with solving actual coding problems.
Make
sure you put together a resume and portfolio. The resume should be one
page. You can follow the steps from earlier to put together your
portfolio. Once your resume is together, you can start reaching out to
companies.
Sites like Angel.co and VentureLoop can provide listings of openings available at startups. Y Combinator also has a page with job listings for their companies.
Don’t feel like you just need to rely on these listing sites. Ask
friends on social media if they’re aware of companies looking for
machine learning engineers, or perhaps even ask if they know about
specific companies. You can also find technical recruiters for specific
companies by searching “site:linkedin.com <COMPANY NAME> technical
recruiter". It's also possible that, depending on how much freelancing you've done before applying, you may get far more recruiters reaching out to you. This was the case for me: after many months of freelancing for clients like Google, I was getting on average 3.5 messages from recruiters per day. This is one advantage of transitioning from freelancing to full-time work as a machine learning engineer.
Once you've got an interview (or several) with your company of choice, you need to pass the actual interview. In the early stages, there will likely be a lot of behavioral questions: what motivates you, what you would do in a variety of given scenarios, examples of times you've struggled and overcome that struggle. If you pass this part, you'll often come to the technical interview. For the most part, do the technical interview in whichever language is strongest for you. Answering the questions in Python is usually well tolerated, as it's the lingua franca of machine learning. You will likely need to handle both standard data structures and algorithms questions, as well as things like implementing machine learning algorithms such as linear regression or image convolution from scratch. Much like an iOS engineer would be asked about model-view-controller, or a back-end developer would be asked about system design, you're going to be asked a lot about how to approach specific problems in ML. For the machine-learning-specific questions, if you've studied enough of the material referred to in the previous parts of this blog post, you should have some level of preparation. For the algorithms interviews, I recommend practicing every day. Some great resources for this include LeetCode, InterviewCake, and interviewing.io (the last of which provides mock interviews with actual engineers).
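To give a feel for those "from scratch" questions, here's one way linear regression trained by gradient descent might be written in plain NumPy (a sketch of the idea, not a canonical interview answer):

```python
# Linear regression from scratch: fit y ~ Xw + b by gradient descent on
# mean squared error, using only NumPy.
import numpy as np

def linear_regression(X, y, lr=0.1, epochs=500):
    """Return weights w and bias b minimizing mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        error = X @ w + b - y               # prediction residuals
        w -= lr * (2 / n) * (X.T @ error)   # dMSE/dw
        b -= lr * (2 / n) * error.sum()     # dMSE/db
    return w, b

# Sanity check on synthetic data with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)
w, b = linear_regression(X, y)
print(w, b)   # should come out close to [1.5, -2.0, 0.5] and 3.0
```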
The interview process can take a long time. For small companies it may be two weeks or less; for larger companies it can take much longer. Try to set up interviews with companies you are less interested in first, for the sake of practice. It's often the case that someone will interview with 10 companies, and by the 9th have gotten so used to the interview process itself that the 10th ends up being a breeze.
Once you do pass the interview, you will come to the negotiation phase. Don't be afraid to negotiate (there's a pretty compelling overview of how and why you should negotiate in this blog post). You will be surprised at how flexible many companies are. Just make sure you don't try to negotiate AFTER you've already signed an agreement.
Once you’re past the negotiation stage and you’ve accepted an offer, congratulations!
Part 11: Career trajectory and future steps
So you've now got an established career as a machine learning engineer. After months or years in this space, you might begin to ask yourself, "What's next?"
Tenure at tech companies is notoriously short; I believe the average for companies like Google is about 3.2 years, and for many companies it's even less. At some point, as you're figuring out new ways of solving data problems for whatever company or group you're part of, you'll start to wonder what you want to do with your new skills for the next decade or so.
If you're feeling like you want to apply your skills toward the public good, there are many options there as well. Check out Code for America, or CodeFellows if you have your eyes outside the U.S.
Effective
Altruism may also be a good resource. If you cannot decide on a
specific issue, or you prefer to just focus on the fun machine-learning
tasks in front of you, you could always take up the earning-to-give
pledge. Machine Learning Engineers are often high-earners, so you could
do a lot of good by pledging a certain amount to optimal charities.
Whichever path you take, keep in mind that machine learning is one of those areas where you can learn the landscape in a very short amount of time, but true mastery takes much longer. Make sure you keep the attitude of always being a student, always looking to improve, and, no matter how far you get in your ML career, NEVER resting on your laurels (I recommend reading Google's Peter Norvig on this: Teach Yourself Programming in Ten Years).
Part 12: Habits for Improved Productivity & Learning
We’re
not quite done here. It’s worth also listing some general habits that
are important to keep while studying, even after you’ve attained
whatever academic or professional status you were looking for. Learning
machine learning is going to be a marathon, not a sprint. This applies
whether you’re in or out of school.
Get a full night’s sleep
If you follow any advice from this post, even if you ignore the machine learning checklist from earlier, follow this: get your sleep cycle in order. Becoming a machine learning engineer is as much about stamina as it is about speed and efficiency. Not only will your mood and cognitive abilities improve, but you'll have a much better chance of staving off dementia and Alzheimer's in the long term. If you maintain your sleep schedule even as your daily schedule gets more complex, you'll find everything becomes much easier and more satisfying.
I remember a friend of mine recommending Qualia to help with productivity. One of the recommendations was that I use it while getting a full night's sleep. Using Qualia while also getting a full night's sleep definitely yielded interesting results. However, it's unknown how much of that productivity was due to the Qualia and how much was due to the sleep. It's entirely possible that most, if not all, of it was due to sleep, and that this is more of a "stone soup" situation. Nonetheless, if you want to experiment with it with greater rigor than I had time for, go ahead.
Stay away from Social Media
This might be controversial, considering that so many machine learning developers and researchers are active on Twitter, but you should probably limit the amount of time you spend on sites like Facebook. Ask yourself: "When was the last time a news article shared in my feed actually impacted my life?" Quite possibly never. If you're worried about keeping in touch with friends and family, chances are you can give the people closest to you other contact info, like your phone number or email address. Those other connections that are effectively ghosts? You can reconnect with them later if you want. If you don't want to fall into temptation, use a Chrome extension to block your Facebook feed (Messenger is more likely to be worth keeping). Delete Snapchat if you haven't already.
Granted, this is not a universal rule. Twitter can be a useful feed and often surfaces many useful resources; here are some of the people I'm following, whom I highly recommend. Quora is another maybe. Definitely take it in stride: if you can, spend more time answering questions related to deep learning than reading the 1,001st motivational post from another 20-year-old self-proclaimed "millionaire entrepreneur" trying to sell you "5 secrets to becoming just like them" (a possible goal for you: getting a job at Quora and helping them cut down on spam posts).
One of the best ways I've found to deal with the short-term social media withdrawal that comes early on was to replace it with something similar yet more in line with my long-term goals. Specifically, I replaced the time I used to spend on Facebook with time spent on GitHub: finding interesting developers and projects to follow, and cool repos to fork and work on. If you need some more time to fully wean yourself off of your Facebook feed, this substitution approach may help (disclaimer: sample size of n=1; your results may vary).
Eat a healthier diet
Another
important consideration for optimizing your learning is to maintain a
healthy diet. If you’re subsisting on junk food, it’s going to catch up
to you. The sugar rushes and sluggishness are going to hinder you in the
long run (and in many cases, in the short-run as well). If you’re just
eating nothing but the cheapest coffee and ramen that you can get, guess
what, you’re going to get what you pay for (which is not going to be
much at all).
As
a general rule, stay away from carbohydrates. There are many variants
on this strategy (e.g., the increasingly popular ketogenic diet, the
Bulletproof diet, etc.), but the idea is basically the same. If you can
get your body to rely more on proteins and fats for energy than sugars,
you will be less subject to the insulin spikes that can mess with your
energy levels throughout the day, and take you out of the state of flow
and concentration that helps you perform your best.
Of course, going completely cold turkey on anything carbohydrate-related might not be practical given the stress of machine learning work; the temptation for stress-eating can be pretty strong. One compromise might be the "slow-carb" diet that Tim Ferriss famously described. This
approach may sound great, but a word of warning: this approach works
because you’re consuming massive amounts of fiber, i.e., whatever you
eat on your cheat day, be prepared for it to come out the other end in
roughly the same quantity…and probably all at once…the next day. If
you’re mentally prepared for that, go right ahead.