Artificial intelligence is reshaping business—though not at the blistering pace many assume. True, AI is now guiding decisions on everything from crop harvests to bank loans, and once pie-in-the-sky prospects such as totally automated customer service are on the horizon. The technologies that enable AI, like development platforms and vast processing power and data storage, are advancing rapidly and becoming increasingly affordable. The time seems ripe for companies to capitalize on AI. Indeed, we estimate that AI will add $13 trillion to the global economy over the next decade.
Yet, despite the promise of AI, many organizations’ efforts with it are falling short. We’ve surveyed thousands of executives about how their companies use and organize for AI and advanced analytics, and our data shows that only 8% of firms engage in core practices that support widespread adoption. Most firms have run only ad hoc pilots or are applying AI in just a single business process.
Why the slow progress? At the highest level, it reflects a failure to rewire the organization. In our surveys and our work with hundreds of clients, we've seen that AI initiatives face formidable cultural and organizational barriers. But we've also seen that leaders who take steps at the outset to break down those barriers can effectively capture AI's opportunities.

Making the Shift
One of the biggest mistakes leaders make is to view AI as a plug-and-play technology with immediate returns. Deciding to get a few projects up and running, they begin investing millions in data infrastructure, AI software tools, data expertise, and model development. Some of the pilots manage to eke out small gains in pockets of organizations. But then months or years pass without bringing the big wins executives expected. Firms struggle to move from the pilots to companywide programs—and from a focus on discrete business problems, such as improved customer segmentation, to big business challenges, like optimizing the entire customer journey.
Leaders also often think too narrowly about AI requirements. While cutting-edge technology and talent are certainly needed, it’s equally important to align a company’s culture, structure, and ways of working to support broad AI adoption. But at most businesses that aren’t born digital, traditional mindsets and ways of working run counter to those needed for AI.
To scale up AI, companies must make three shifts:

From siloed work to interdisciplinary collaboration.
AI has the biggest impact when it's developed by cross-functional teams with a mix of skills and perspectives. Having business and operational people work side by side with analytics experts will ensure that initiatives address broad organizational priorities, not just isolated business issues. Diverse teams can also think through the operational changes new applications may require—they're likelier to recognize, say, that the introduction of an algorithm that predicts maintenance needs should be accompanied by an overhaul of maintenance workflows. And when development teams involve end users in the design of applications, the chances of adoption increase dramatically.

From experience-based, leader-driven decision making to data-driven decision making at the front line.
When AI is adopted broadly, employees up and down the hierarchy will augment their own judgment and intuition with algorithms’ recommendations to arrive at better answers than either humans or machines could reach on their own. But for this approach to work, people at all levels have to trust the algorithms’ suggestions and feel empowered to make decisions—and that means abandoning the traditional top-down approach. If employees have to consult a higher-up before taking action, that will inhibit the use of AI.
Decision processes shifted dramatically at one organization when it replaced a complex manual method for scheduling events with a new AI system. Historically, the firm's event planners had used colored tags, pins, and stickers to track conflicts, participants' preferences, and other considerations. They'd often relied on gut instinct and on input from senior managers, who also were operating on their instincts, to make decisions. The new system rapidly analyzed the vast range of scheduling permutations, using first one algorithm to distill hundreds of millions of options into millions of scenarios, and then another algorithm to boil down those millions into just hundreds, ranking the optimal schedules for each participant. Experienced human planners then applied their expertise to make final decisions supported by the data, without the need to get input from their leaders. The planners adopted the tool readily, trusting its output because they'd helped set its parameters and constraints and knew that they themselves would make the final call.
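To see that two-stage pattern in miniature, here is an illustrative Python sketch. The events, conflict rules, and preference weights are our own assumptions rather than the firm's actual system, but the structure is the same: enumerate candidate schedules, discard those that violate hard constraints, then score and rank the survivors.

```python
# Illustrative filter-then-rank scheduling, loosely mirroring the
# two-algorithm pipeline described above. All data here is hypothetical.
from itertools import permutations

def hard_constraints_ok(schedule, conflicts):
    """Stage 1: reject any schedule containing a forbidden adjacent pair."""
    return all((a, b) not in conflicts for a, b in zip(schedule, schedule[1:]))

def preference_score(schedule, preferences):
    """Stage 2: reward preferred events placed in earlier slots."""
    n = len(schedule)
    return sum((n - i) * preferences.get(ev, 0) for i, ev in enumerate(schedule))

def top_schedules(events, conflicts, preferences, k=3):
    candidates = (s for s in permutations(events)
                  if hard_constraints_ok(s, conflicts))   # many options -> fewer
    return sorted(candidates, key=lambda s: preference_score(s, preferences),
                  reverse=True)[:k]                       # fewer -> ranked top k

events = ["keynote", "workshop", "panel", "demo"]
conflicts = {("demo", "keynote")}   # hypothetical hard constraint
preferences = {"keynote": 3, "panel": 2, "workshop": 1, "demo": 1}
print(top_schedules(events, conflicts, preferences))
```

At production scale the first stage would use far smarter pruning than brute-force enumeration, but the planners' role is unchanged: they review the ranked shortlist and make the final call.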
From rigid and risk-averse to agile, experimental, and adaptable.

Organizations must shed the mindset that an idea needs to be fully baked or a business tool must have every bell and whistle before it's deployed. On the first iteration, AI applications rarely have all their desired functionality. A test-and-learn mentality will reframe mistakes as a source of discoveries, reducing the fear of failure. Getting early user feedback and incorporating it into the next version will allow firms to correct minor issues before they become costly problems. Development will speed up, enabling small AI teams to create minimum viable products in a matter of weeks rather than months.

Such fundamental shifts don't come easily. They require leaders to prepare, motivate, and equip the workforce to make a change. But leaders must first be prepared themselves. We've seen failure after failure caused by the lack of a foundational understanding of AI among senior executives. (Further on, we'll discuss how analytics academies can help leaders acquire that understanding.)
Setting Up for Success
To get employees on board and smooth the way for successful AI launches, leaders should devote early attention to several tasks:

Explaining why.
A compelling story helps organizations understand the urgency of change initiatives and how all will benefit from them. This is particularly critical with AI projects, because fear that AI will take away jobs increases employees’ resistance to it.
Leaders have to provide a vision that rallies everyone around a common goal. Workers must understand why AI is important to the business and how they’ll fit into a new, AI-oriented culture. In particular, they need reassurance that AI will enhance rather than diminish or even eliminate their roles. (Our research shows that the majority of workers will need to adapt to using AI rather than be replaced by AI.)
When a large retail conglomerate wanted to get its employees behind its AI strategy, management presented it as an existential imperative. Leaders described the threat that digital retailers posed and how AI could help fend it off by improving the firm’s operational efficiency and responsiveness. By issuing a call to arms in a fight for survival, management underscored the critical role that employees had to play.
In sharing their vision, the company's leaders put a spotlight on workers who had piloted a new AI tool that helped them optimize stores' product assortments and increase revenue. That inspired other workers to imagine how AI could augment and elevate their performance.

Anticipating unique barriers to change.
Some obstacles, such as workers’ fear of becoming obsolete, are common across organizations. But a company’s culture may also have distinctive characteristics that contribute to resistance. For example, if a company has relationship managers who pride themselves on being attuned to customer needs, they may reject the notion that a machine could have better ideas about what customers want and ignore an AI tool’s tailored product recommendations. And managers in large organizations who believe their status is based on the number of people they oversee might object to the decentralized decision making or reduction in reports that AI could allow.
In other cases, siloed processes can inhibit the broad adoption of AI. Organizations that assign budgets by function or business unit may struggle to assemble interdisciplinary agile teams, for example.
Some solutions can be found by reviewing how past change initiatives overcame barriers. Others may involve aligning AI initiatives with the very cultural values that seem like obstacles. At one financial institution with a strong emphasis on relationship banking, for example, leaders highlighted AI’s ability to enhance ties with customers. The bank created a booklet for relationship managers that showed how combining their expertise and skills with AI’s tailored product recommendations could improve customers’ experiences and increase revenue and profit. The AI adoption program also included a contest for sales conversions driven by using the new tool; the winners’ achievements were showcased in the CEO’s monthly newsletter to employees.
A relatively new class of expert, analytics translators, can play a role in identifying roadblocks. These people bridge the data engineers and scientists from the technical realm with the people from the business realm—marketing, supply chain, manufacturing, risk personnel, and so on. Translators help ensure that the AI applications developed address business needs and that adoption goes smoothly. Early in the implementation process, they may survey end users, observe their habits, and study workflows to diagnose and fix problems.
Understanding the barriers to change can not only inform leaders about how to communicate with the workforce but also help them determine where to invest, what AI initiatives are most feasible, what training should be offered, what incentives may be necessary, and more.

Budgeting as much for integration and adoption as for technology (if not more).
In one of our surveys, nearly 90% of the companies that had engaged in successful scaling practices had spent more than half of their analytics budgets on activities that drove adoption, such as workflow redesign, communication, and training. Only 23% of the remaining companies had committed similar resources.
Consider one telecom provider that was launching a new AI-driven customer-retention program in its call center. The company invested simultaneously in AI model development and in helping the center's employees transition to the new approach. Instead of just reacting to calls from customers canceling service, employees would proactively reach out to customers at risk of defection, giving them AI-generated recommendations on new offers they'd be likely to accept. The employees got training and on-the-job coaching in the sales skills needed to close the business. Coaches and managers listened in on their calls, gave them individualized feedback, and continually updated the training materials and call scripts. Thanks to those coordinated efforts, the new program reduced customer attrition by 10%.
Balancing feasibility, time investment, and value.
Pursuing initiatives that are unduly difficult to implement or require more than a year to launch can sabotage both current and future AI projects.
Organizations needn’t focus solely on quick wins; they should develop a portfolio of initiatives with different time horizons. Automated processes that don’t need human intervention, such as AI-assisted fraud detection, can deliver a return in months, while projects that require human involvement, such as AI-supported customer service, are likely to pay off over a longer period. Prioritization should be based on a long-term (typically three-year) view and take into consideration how several initiatives with different time lines could be combined to maximize value. For example, to achieve a view of customers detailed enough to allow AI to do microsegmentation, a company might need to set up a number of sales and marketing initiatives. Some, such as targeted offers, might deliver value in a few months, while it might take 12 to 18 months for the entire suite of capabilities to achieve full impact.
An Asia-Pacific retailer determined that an AI initiative to optimize floor space and inventory placement wouldn't yield its complete value unless the company refurbished all its stores, reallocating the space for each category of goods. After much debate, the firm's executives decided the project was important enough to future profitability to proceed—but not without splitting it in two. Part one produced an AI tool that gave store managers recommendations for a few incremental items that would sell well in their outlets. The tool provided only a small fraction of the total return anticipated, but the managers could get the new items into stores immediately, demonstrating the project's benefits and building enthusiasm for the multiyear journey ahead.

Organizing for Scale
There’s a lot of debate about where AI and analytics capabilities should reside within organizations. Often leaders simply ask, “What organizational model works best?” and then, after hearing what succeeded at other companies, do one of three things: consolidate the majority of AI and analytics capabilities within a central “hub”; decentralize them and embed them mostly in the business units (“the spokes”); or distribute them across both, using a hybrid (“hub-and-spoke”) model. We’ve found that none of these models is always better than the others at getting AI up to scale; the right choice depends on a firm’s individual situation.
Consider two large financial institutions we've worked with. One consolidated its AI and analytics teams in a central hub, with all analytics staff reporting to the chief data and analytics officer and being deployed to business units as needed. The second decentralized nearly all its analytics talent, having teams reside in and report to the business units. Both firms developed AI on a scale at the top of their industry; the second organization grew from 30 to 200 profitable AI initiatives in just two years. And both selected their model after taking into account their organizations' structure, capabilities, strategy, and unique characteristics.

The hub.
A small handful of responsibilities are always best handled by a hub and led by the chief analytics or chief data officer. These include data governance, AI recruiting and training strategy, and work with third-party providers of data and AI services and software. Hubs should nurture AI talent, create communities where AI experts can share best practices, and lay out processes for AI development across the organization. Our research shows that companies that have implemented AI on a large scale are three times as likely as their peers to have a hub and 2.5 times as likely to have a clear methodology for creating models, interpreting insights, and deploying new AI capabilities.
Hubs should also be responsible for systems and standards related to AI. These should be driven by the needs of a firm's initiatives, which means they should be developed gradually rather than set up in one fell swoop before business cases have been determined. We've seen many organizations squander significant time and money—spending hundreds of millions of dollars—up front on companywide data-cleaning and data-integration projects, only to abort those efforts midway, realizing little or no benefits.
In contrast, when a European bank found that conflicting data-management strategies were hindering its development of new AI tools, it took a slower approach, making a plan to unify its data architecture and management over the next four years as it built various business cases for its AI transformation. This multiphase program, which also includes an organizational redesign and a revised talent strategy, is expected to have an annual impact of more than $900 million.

The spokes.
Another handful of responsibilities should almost always be owned by the spokes, because they’re closest to those who will be using the AI systems. Among them are tasks related to adoption, including end-user training, workflow redesign, incentive programs, performance management, and impact tracking.
To encourage customers to embrace the AI-enabled services offered with its smart, connected equipment, one manufacturer’s sales and service organization created a “SWAT team” that supported customers using the product and developed a pricing plan to boost adoption. Such work is clearly the bailiwick of a spoke and can’t be delegated to an analytics hub.
Organizing AI for Scale
AI-enabled companies divide key roles between a hub and spokes. A few tasks are always owned by the hub, and the spokes always own execution. The rest of the work falls into a gray area, and a firm’s individual characteristics determine where it should be done.
The gray area.
Much of the work in successful AI transformations falls into a gray area in terms of responsibility. Key tasks—setting the direction for AI projects, analyzing the problems they’ll solve, building the algorithms, designing the tools, testing them with end users, managing the change, and creating the supporting IT infrastructure—can be owned by either the hub or the spoke, shared by both, or shared with IT. Deciding where responsibility should lie within an organization is not an exact science, but it should be influenced by three factors:
The maturity of AI capabilities. When a company is early in its AI journey, it often makes sense for analytics executives, data scientists, data engineers, user interface designers, visualization specialists who graphically interpret analytics findings, and the like to sit within a hub and be deployed as needed to the spokes. Working together, these players can establish the company’s core AI assets and capabilities, such as common analytics tools, data processes, and delivery methodologies. But as time passes and processes become standardized, these experts can reside within the spokes just as (or more) effectively.
Business model complexity. The greater the number of business functions, lines of business, or geographies AI tools will support, the greater the need to build guilds of AI experts (of, say, data scientists or designers). Companies with complex businesses often consolidate these guilds in the hub and then assign them out as needed to business units, functions, or geographies.
The pace and level of technical innovation required. When they need to innovate rapidly, some companies put more gray-area strategy and capability building in the hub, so they can monitor industry and technology changes better and quickly deploy AI resources to head off competitive challenges.
Let’s return to the two financial institutions we discussed earlier. Both faced competitive pressures that required rapid innovation. However, their analytics maturity and business complexity differed.
The institution that placed its analytics teams within its hub had a much more complex business model and relatively low AI maturity. Its existing AI expertise was primarily in risk management. By concentrating its data scientists, engineers, and many other gray-area experts within the hub, the company ensured that all business units and functions could rapidly access essential know-how when needed.
The second financial institution had a much simpler business model that involved specializing in fewer financial services. This bank also had substantial AI experience and expertise. So it was able to decentralize its AI talent, embedding many of its gray-area analytics, strategy, and technology experts within the business-unit spokes.
As these examples suggest, some art is involved in deciding where responsibilities should live. Every organization has distinctive capabilities and competitive pressures, and the three key factors must be considered in totality, rather than individually. For example, an organization might have high business complexity and need very rapid innovation (suggesting it should shift more responsibilities to the hub) but also have very mature AI capabilities (suggesting it should move them to the spokes). Its leaders would have to weigh the relative importance of all three factors to determine where, on balance, talent would most effectively be deployed. Talent levels (an element of AI maturity) often have an outsize influence on the decision. Does the organization have enough data experts that, if it moved them permanently to the spokes, it could still fill the needs of all business units, functions, and geographies? If not, it would probably be better to house them in the hub and share them throughout the organization.

Oversight and execution.
While the distribution of AI and analytics responsibilities varies from one organization to the next, those that scale up AI have two things in common:
A governing coalition of business, IT, and analytics leaders. Fully integrating AI is a long journey. Creating a joint task force to oversee it will ensure that the three functions collaborate and share accountability, regardless of how roles and responsibilities are divided. This group, which is often convened by the chief analytics officer, can also be instrumental in building momentum for AI initiatives, especially early on.
Assignment-based execution teams. Organizations that scale up AI are twice as likely to set up interdisciplinary teams within the spokes. Such teams bring a diversity of perspectives together and solicit input from frontline staff as they build, deploy, and monitor new AI capabilities. The teams are usually assembled at the outset of each initiative and draw skills from both the hub and the spokes. Each generally includes the manager in charge of the new AI tool’s success (the “product owner”), translators, data architects, engineers and scientists, designers, visualization specialists, and business analysts. These teams address implementation issues early and extract value faster.
For example, at the Asia-Pacific retailer that was using AI to optimize store space and inventory placement, an interdisciplinary execution team helped break down walls between merchandisers (who determined how items would be displayed in stores) and buyers (who chose the range of products). Previously, each group had worked independently, with the buyers altering the AI recommendations as they saw fit. That led to a mismatch between inventory purchased and space available. By inviting both groups to collaborate on the further development of the AI tool, the team created a more effective model that provided a range of weighted options to the buyers, who could then choose the best ones with input from the merchandisers. At the end of the process, gross margins on each product category that had applied the tool increased by 4% to 7%.

Educating Everyone
To ensure the adoption of AI, companies need to educate everyone, from the top leaders down. To this end some are launching internal AI academies, which typically incorporate classroom work (online or in person), workshops, on-the-job training, and even site visits to experienced industry peers. Most academies initially hire external faculty to write the curricula and deliver training, but they also usually put in place processes to build in-house capabilities.
Every academy is different, but most offer four broad types of instruction:

Leadership.
Most academies strive to give senior executives and business-unit leaders a high-level understanding of how AI works and ways to identify and prioritize AI opportunities. They also provide discussions of the impact on workers' roles, barriers to adoption, and talent development, and offer guidance on instilling the underlying cultural changes required.

Analytics.
Here the focus is on constantly sharpening the hard and soft skills of data scientists, engineers, architects, and other employees who are responsible for data analytics, data governance, and building the AI solutions.

Translator.
Analytics translators often come from the business staff and need fundamental technical training—for instance, in how to apply analytical approaches to business problems and develop AI use cases. Their instruction may include online tutorials, hands-on experience shadowing veteran translators, and a final "exam" in which they must successfully implement an AI initiative.

10 Ways to Derail an AI Program

Despite big investments, many organizations get disappointing results from their AI and analytics efforts. What makes programs go off track? Companies set themselves up to fail when:
1. They lack a clear understanding of advanced analytics, staffing up with data scientists, engineers, and other key players without realizing how advanced and traditional analytics differ.
2. They don't assess feasibility, business value, and time horizons, and launch pilots without thinking through how to balance short-term wins in the first year with longer-term payoffs.
3. They have no strategy beyond a few use cases, tackling AI in an ad hoc way without considering the big-picture opportunities and threats AI presents in their industry.
4. They don't clearly define key roles, because they don't understand the tapestry of skill sets and tasks that a strong AI program requires.
5. They lack "translators," or experts who can bridge the business and analytics realms by identifying high-value use cases, communicating business needs to tech experts, and generating buy-in with business users.
6. They isolate analytics from the business, rigidly centralizing it or locking it in poorly coordinated silos, rather than organizing it in ways that allow analytics and business experts to work closely together.
7. They squander time and money on enterprisewide data cleaning instead of aligning data consolidation and cleanup with their most valuable use cases.
8. They fully build out analytics platforms before identifying business cases, setting up architectures like data lakes without knowing what they'll be needed for and often integrating platforms with legacy systems unnecessarily.
9. They neglect to quantify analytics' bottom-line impact, lacking a performance management framework with clear metrics for tracking each initiative.
10. They fail to focus on ethical, social, and regulatory implications, leaving themselves vulnerable to potential missteps when it comes to data acquisition and use, algorithmic bias, and other risks, and exposing themselves to social and legal consequences.
For more details, read "Ten Red Flags Signaling Your Analytics Program Will Fail" on McKinsey.com.

End user.
Frontline workers may need only a general introduction to new AI tools, followed by on-the-job training and coaching in how to use them. Strategic decision makers, such as marketers and finance staff, may require higher-level training sessions that incorporate real business scenarios in which new tools improve decisions about, say, product launches.

Reinforcing the Change
Most AI transformations take 18 to 36 months to complete, with some taking as long as five years. To prevent them from losing momentum, leaders need to do four things:

Walk the talk.
Role modeling is essential. For starters, leaders can demonstrate their commitment to AI by attending academy training.
But they also must actively encourage new ways of working. AI requires experimentation, and often early iterations don’t work out as planned. When that happens, leaders should highlight what was learned from the pilots. That will help encourage appropriate risk taking.
The most effective role models we’ve seen are humble. They ask questions and reinforce the value of diverse perspectives. They regularly meet with staff to discuss the data, asking questions such as “How often are we right?” and “What data do we have to support today’s decision?”
The CEO of one specialty retailer we know is a good example. At every meeting she goes to, she invites attendees to share their experience and opinions—and offers hers last. She also makes time to meet with business and analytics employees every few weeks to see what they've done—whether it's launching a new pilot or scaling up an existing one.

Make businesses accountable.
It’s not uncommon to see analytics staff made the owners of AI products. However, because analytics are simply a means of solving business problems, it’s the business units that must lead projects and be responsible for their success. Ownership ought to be assigned to someone from the relevant business, who should map out roles and guide a project from start to finish. Sometimes organizations assign different owners at different points in the development life cycle (for instance, for proof of value, deployment, and scaling). That’s a mistake too, because it can result in loose ends or missed opportunities.
A scorecard that captures project performance metrics for all stakeholders is an excellent way to align the goals of analytics and business teams. One airline company, for instance, used a shared scorecard to measure rate of adoption, speed to full capability, and business outcomes for an AI solution that optimized pricing and booking.
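To make that concrete, here is a minimal sketch of what such a shared scorecard might look like in code. The field names, thresholds, and figures are illustrative assumptions, not the airline's actual framework; only the three metric categories come from the example above.

```python
# Hypothetical shared AI-project scorecard covering the three metric
# families named above: adoption, speed to capability, business outcomes.
from dataclasses import dataclass

@dataclass
class AIProjectScorecard:
    project: str
    adoption_rate: float           # share of target users actively using the tool
    weeks_to_full_capability: int  # speed from pilot to full rollout
    business_outcome_usd: float    # e.g., incremental revenue from optimized pricing

    def on_track(self, min_adoption=0.7, max_weeks=26):
        """One set of thresholds that business and analytics teams share."""
        return (self.adoption_rate >= min_adoption
                and self.weeks_to_full_capability <= max_weeks)

card = AIProjectScorecard("pricing-and-booking", adoption_rate=0.82,
                          weeks_to_full_capability=20, business_outcome_usd=4.1e6)
print(card.on_track())  # True -> both teams get credit against the same goals
```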
Track and facilitate adoption.

Comparing the results of decisions made with and without AI can encourage employees to use it. For example, at one commodity company, traders learned that their non-AI-supported forecasts were typically right only half the time—no better than guessing. That discovery made them more open to AI tools for improved forecasting.
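A with-and-without comparison like that one is simple to run. The sketch below uses invented directional calls and outcomes, not the commodity firm's data, to show the basic arithmetic of comparing hit rates side by side:

```python
# Hypothetical comparison of human vs. AI-assisted directional forecasts.

def hit_rate(predictions, outcomes):
    """Fraction of up/down calls that matched what actually happened."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)

outcomes     = ["up", "down", "up", "up",   "down", "up",   "down", "down"]
trader_calls = ["up", "up",   "down", "up", "up",   "down", "down", "down"]
model_calls  = ["up", "down", "up", "down", "down", "up",   "down", "up"]

print(f"Trader hit rate: {hit_rate(trader_calls, outcomes):.0%}")  # 50% -- a coin flip
print(f"Model hit rate:  {hit_rate(model_calls, outcomes):.0%}")   # 75%
```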
Teams that monitor implementation can correct course as needed. At one North American retailer, an AI project owner saw store managers struggling to incorporate a pilot's output into their tracking of store performance results. The AI's user interface was difficult to navigate, and the AI insights generated weren't integrated into the dashboards the managers relied on every day to make decisions. To fix the issue, the AI team simplified the interface and reconfigured the output so that the new data stream appeared in the dashboard.

Provide incentives for change.
Acknowledgment inspires employees for the long haul. The CEO of the specialty retailer starts meetings by shining a spotlight on an employee (such as a product manager, a data scientist, or a frontline worker) who has helped make the company’s AI program a success. At the large retail conglomerate, the CEO created new roles for top performers who participated in the AI transformation. For instance, he promoted the category manager who helped test the optimization solution during its pilot to lead its rollout across stores—visibly demonstrating the career impact that embracing AI could have.
Finally, firms have to check that employees' incentives are truly aligned with AI use. This was not the case at a brick-and-mortar retailer that had developed an AI model to optimize discount pricing so that it could clear out old stock. The model revealed that sometimes it was more profitable to dispose of old stock than to sell it at a discount, but the store personnel had incentives to sell everything, even at steep discounts. Because the AI recommendations contradicted their standard, rewarded practice, employees became suspicious of the tool and ignored it. Since their sales incentives were also closely tied to contracts and couldn't easily be changed, the organization ultimately updated the AI model to recognize the trade-off between profits and the incentives, which helped drive user adoption and lifted the bottom line.

Conclusion
The actions that promote scale in AI create a virtuous circle. The move from functional to interdisciplinary teams initially brings together the diverse skills and perspectives and the user input needed to build effective tools. In time, workers across the organization absorb new collaborative practices. As they work more closely with colleagues in other functions and geographies, employees begin to think bigger—they move from trying to solve discrete problems to completely reimagining business and operating models. The speed of innovation picks up as the rest of the organization begins to adopt the test-and-learn approaches that successfully propelled the pilots.
As AI tools spread throughout the organization, those closest to the action become increasingly able to make decisions once made by those above them, flattening organizational hierarchies. That encourages further collaboration and even bigger thinking.
The ways AI can be used to augment decision making keep expanding. New applications will create fundamental and sometimes difficult changes in workflows, roles, and culture, which leaders will need to shepherd their organizations through carefully. Companies that excel at implementing AI throughout the organization will find themselves at a great advantage in a world where humans and machines working together outperform either humans or machines working on their own.
For a country with 17,508 islands, it’s quite surprising to think
that only three other countries in the world have more people than
Indonesia does. It’s a country with a rich history, a growing
population, and four unicorns grazing its lush island pastures. That’s
right, of the world’s 311 unicorns (per CB Insights as of this writing),
four of them have roots in Indonesia. Late last year, we sent one of
our MBAs over to spend a few weeks in the capital city, Jakarta, looking
at the Indonesian tech scene. That research spawned an article on GO-JEK, a simply fascinating company. It also led to lots of research around how Indonesia is competing in the global artificial intelligence (AI) race.
At
first, we were thinking about naming this article “all the AI startups
we could find in Indonesia,” but then we’d get dozens of emails for the
rest of the year about all the hidden gems we “missed.” Instead, we sat
down and did some Crunchbase searches, combed through company websites,
did some asking around, talked to some of the local startup founders,
and as a result, we have below what is our best estimation of the top AI
startups in Indonesia today. If you are one of the below startups, feel
free to celebrate your acceptance to this top-11 list by emailing this
article to every single person you know.
Name | Application | City | Funding (USD millions)
Snapcart | Smart receipts | Jakarta | 14.7
Kata.ai | Conversational AI | Jakarta | 3.5
BJtech | Conversational AI | Jakarta | 1.2
Sonar Platform | Social media monitoring | Jakarta | 0.15
Nodeflux | Computing platform | Jakarta | N/A
Bahasa.ai | Conversational AI | Jakarta | N/A
Prosa.ai | Conversational AI | Jakarta | N/A
Dattabot | General big data | Jakarta | N/A
Eureka.ai | Telecom big data | Jakarta | N/A
AiSensum | Robotic process automation | Jakarta | N/A
Deligence.ai | General AI | Jakarta | N/A
The
above startups should be proud of what they’ve accomplished because
each of them stood out, in some way, against the total number of
companies our foreign correspondent pored over while relaxing in some of
North Jakarta’s finest health spas. Let’s take a closer look at each of
these startups.
If the name Snapcart rings a bell, it could be because you read about them in our article last month on Smart Receipts and Why We Should Use Them. Founded in 2015, Indonesian startup Snapcart has taken in $14.7 million in
funding so far to create a mobile application that gives shoppers
cashback for scanning their receipts. This allows the company to collect
massive amounts of purchase data, then analyze it and offer real-time
insights to big names like Johnson & Johnson, Unilever, P&G, and
Nestle. Snapcart currently operates in Indonesia, the Philippines,
Singapore, and Brazil. (Sounds like something GO-JEK might be interested in getting their hands on.)
With high retention and engagement rates, Snapcart is also able to
send targeted surveys to customers asking them relevant questions at the
right time.
A survey on face wash – Source: Snapcart
The system can also capture transactions from independent chains that existing solutions miss, and to date they've processed over half a billion receipts.

Founded in 2015, Jakarta startup Kata.ai has taken in $3.5 million to build Indonesia's number one conversational AI
platform. A case study they published talks about the success Unilever
had when deploying a chatbot to engage with customers. The female
chatbot persona was named Jemma, and was deployed on Line messenger, one
of Indonesia’s most popular messaging apps. Less than a year after its
deployment, Jemma managed to acquire 1.5 million friends, with more than
50 million incoming messages in 17 million sessions. “Some of them even
tried to confide their dreams and problems to her,” said the case
study, and the longest conversation recorded exceeded four hours.
Another
case study discusses a chatbot deployment by Telkomsel, Indonesia’s
largest cellular operator with more than 120 million subscribers (that’s almost half of Indonesia’s population).
Turns out 96% of customer inquiries can actually be handled by the
chatbot with minimal human interaction. In order to scale more quickly,
the company built a very slick platform that makes it easy for anyone to
build a bot.
A tool for building chatbots – Source: Kata.ai
We
talked with Kata.ai’s CEO and Co-Founder, Irzan Raditya, about why
conversational AI is so popular in Indonesia. He said it’s largely
because the big tech players are behind the game when it comes to Natural Language Processing (NLP) for Bahasa Indonesia (the language spoken in most of Indonesia).
It’s not an easy task when you’re trying to understand a language that
has 13 different ways to say “I.” When companies like Accenture partner
up with a “small” firm like Kata.ai to bid on projects, it helps
demonstrate that they’re best-of-breed. Moving
on to our second conversational AI startup that speaks Bahasa
Indonesia, we have BJtech. Founded in 2015, the company has taken in $1.5 million
in funding so far to develop an easy-to-use platform that helps you
create chatbots for your business. Their first products are a virtual friend that does things for you and expects nothing in return, and an intelligent banking app. Clients include Uber, Skyscanner, and Zomato,
though we have no idea what Uber is doing speaking the Indonesian
language after GO-JEK showed them the door. There’s a fair amount of Engrish on their website, so they may want to sort that out because that’s not the best look for a language processing company.
Founded in 2015, Sonar Platform has taken in just $150,000 in funding to develop a social media monitoring platform that – you guessed it – speaks Bahasa Indonesia. As an example, Unilever
Indonesia certainly doesn’t want some loudmouth influencer bad-mouthing
their latest skin-whitening product, and in order to see what people
are saying about their products, they might use a platform like this
one. The platform allows you to monitor social media in real-time, and
they process over 1 million conversations a day, all of which can be
mined later for insights. Their platform can gauge sentiment as well,
and Air Asia uses it to monitor how pissed off people get when their
flights are delayed. Moving away from the Bahasa Indonesia theme for a moment, we have a startup called Nodeflux that was founded in 2016 with an undisclosed
amount of funding which they’re using to develop Indonesia’s first
intelligent video analytics platform. Backed by Telkom Indonesia,
they’ve also partnered with NVIDIA to offer video analytics services to
companies like GO-JEK
which uses their service to monitor CCTV cameras on the streets of Jakarta and track where GO-JEK's fleet of more than a million scooters is at any given time.
They also offer services like facial recognition, license plate reading, flood monitoring, and trash detection.

And
we’re back, on to more conversational AI for Bahasa Indonesia with the
aptly named Bahasa.ai, a startup that was founded in 2017 and which has
taken in an undisclosed amount of funding to “build the most robust NLP
modules for Bahasa Indonesia.” Based on the AI research focus we
observed at Kata.ai, they have their work cut out for them. Since our
own Bahasa skills are lacking, and they haven’t translated their website
(can’t they get some of their algos to do it?), that’s about
all we can tell you about Bahasa.ai. Oh, and one of their competitors
vouched for their capabilities which was awful nice of them. In other
words, they’re not just a company that creates chatbot scripts and says
they use AI when they actually don’t. (We’re told there are some of those out there in Jakarta but we’re not naming names.)

Our next company we know little about because they’re so new. Prosa.ai was founded in 2018 by Indonesian experts in AI for NLP in text and speech. They already have subscription pricing on their website, so
we can only assume that they have developed a product. We saw that
they’re backed by a notable Indonesian venture capitalist, so we can
also assume that someone vetted their business model against the
plethora of NLP startups that are already tackling this problem.

Founded in 2003, Indonesian startup Dattabot – formerly known as Mediatrac – is a big data analytics company with an undisclosed
amount of funding that has assembled the most comprehensive data
library in Indonesia. We sat down with the founders, Regi Wahyu and
Imron Zuhri, who told us how they started out scanning Indonesia’s dark
world of data, largely offline and in printed form. In 2010, they began
scaling their data offering and in 2015, pivoted to become the company
they are today that targets a number of industry verticals.
Dattabot’s core technology – Source: Dattabot
Their
first project involved a large FMCG company with three disparate databases and no desire to spend money on building a data warehouse. Dattabot
used some clever AI algorithms to solve that problem, and revenues
soared as they optimized various aspects of the operation like the “traveling salesman problem”
we discussed before. Then came one of Indonesia’s largest telecom
providers with a big problem. More than 90% of accounts were prepaid.
How can you know the customer? Dattabot used AI to solve that problem
too. That’s when they realized that an even bigger opportunity could be
found in Indonesian farming, an industry that consists of 49 million
farmers that represent 41% of the country’s total labor force. Their
subsidiary Hara.ag was then born, and the story behind it is so interesting we’re going to dedicate an entire article to it. Stay tuned.

We actually don’t know when our next company, Eureka.ai, was founded, or how much funding they’ve taken in, but we do know their PR company is asleep at the wheel because they never responded to our email asking for more info. That’s okay though, because when you’re busy kicking ass and taking names, who needs PR anyway?
The man at the helm is Benjamin Soemartopo, previously with McKinsey
& Company for 12 years as Managing Partner and CEO for Indonesia and
before that, Managing Director for Standard Chartered’s Bank Private
Equity in Indonesia for six years. The company enables partnerships
between mobile operators and companies in industries including banking,
insurance, transportation, and consumer goods, and has a global presence. That’s the who/what/where, and about all we can tell you for now.

Our second-to-last startup, AiSensum, was somewhat difficult to understand until
the company emailed us to clear things up. Their main source of revenue
is data monetization partnerships through their platform called Octopi,
a machine-learning-driven SaaS dashboard that creates business
intelligence insights. The firm also offers Robotic Process Automation (RPA)
that they describe as “low cost bets for companies who are unwilling or
unable to invest in fully automated AI platforms.” They also let us
know that they didn’t appreciate us making fun of their octopus,
something we blamed on our ethnocentric tendencies to make fun of things
we don’t understand – like this diagram.
If what you do is tough to explain, try using a cephalopod to make things clearer – Source: AiSensum
Joking
aside, they’re enthusiastic about what they’re doing so we may go visit
them when we’re back in Jakarta. They also have a sister company called
Neurosensum which uses AI for consumer research and which may have some
toys we can play with.
Last
but not least is a startup called Deligence.ai. We know almost nothing
about them because they’ve been so busy doing AI stuff that they haven’t
even created a profile on Crunchbase. The only reason they made this
top-11 list is because a founder we talked to vouched for them. (See how important networking is, kids?)
According to the website, they provide “organizations the most optimal
access to the cutting-edge computer vision, machine learning, and big
data technology.” We’ve also reached our word limit on this article so
time for a conclusion.
Conclusion
Forgetting about AI
for a minute, we were simply floored by the opportunity that we saw in
the world’s fourth-most-populous country, the talented and passionate people
we spoke to who could see the opportunity, the astounding success of startups like GO-JEK, and conversely, how isolated and relatively untapped the tech scene seemed. (We’re
trying desperately to find emerging technology startups of any kind in
the country’s second largest city, Surabaya, and have come up empty
handed so far.) In the future, we’re going to take a closer look at
what sort of investment opportunities might exist for retail investors
in Indonesia – largely in the area of ETFs – and also deep-dive into the
fascinating world of Indonesia’s “big” data problem and how it’s being
solved.
Changing company culture is the key—and often the biggest challenge—to scaling artificial intelligence across your organization.
It’s an exciting time for leaders.
Artificial intelligence (AI) capabilities are on the cusp of
revolutionizing the way we work, reshaping businesses, industries,
economies, the labor force, and our everyday lives. We estimate AI-powered applications will add $13 trillion in value to the global economy in the coming decade, and leaders are energizing their agendas and investing handsomely in AI to capitalize on the opportunity—to the tune of $26 billion to $39 billion in 2016 alone.
Meanwhile,
AI enablers such as data generation, storage capacity, computer
processing power, and modeling techniques are all on exponential
upswings and becoming increasingly affordable and accessible via the
cloud.
Conditions seem ripe for companies to succeed with AI. Yet, the
reality is that many organizations’ efforts are falling short, with a majority of companies only piloting AI or using it in a single business process—and thus gaining only incremental benefits.
Why the disappointing results?
Many organizations aren’t spending the necessary (and significant)
time and resources on the cultural and organizational changes required
to bring AI to a level of scale capable of delivering meaningful
value—where every pilot enjoys widespread end-user adoption and pilots
across the organization are produced in a consistent, fast, and
repeatable manner. Without addressing these changes up front, efforts to
scale AI can quickly derail.
Making the shift
To scale up AI, companies must make three shifts. First, they must transition from siloed work to interdisciplinary collaboration,
where business, operational, and analytics experts work side by side,
bringing a diversity of perspectives to ensure initiatives address broad
organizational priorities and to surface user needs and necessary
operational changes early on.
Second, they must switch from experience-based, leader-driven decision making to data-driven decision making,
where employees augment their judgment and intuition with algorithms’
recommendations to arrive at better answers than either humans or
machines could reach on their own.
Finally, they must move from rigid and risk-averse to agile, experimental, and adaptable, embracing the test-and-learn mentality that’s critical for creating a minimum viable product in weeks rather than months.
Such fundamental shifts don’t come easily. In our recent article, “Building the AI-powered organization,” published in Harvard Business Review,
we discuss in depth how leaders can prepare, motivate, and equip their
workforce to make a change. Here we summarize the four key areas in
which leaders should focus their efforts.
Set up for success
To get employees on board and smooth the way for successful AI
launches, leaders should devote early attention to several tasks,
including the following:
Explaining why AI is important and how workers will fit into a new AI-oriented culture.
Anticipating and addressing from the start their firm’s unique barriers to change.
Budgeting as much for AI integration and adoption as for technology (if not more). One of our surveys
revealed that 90 percent of the companies that engaged in critical
scaling practices spent more than half of their analytics budgets on
activities that drove adoption, such as workflow redesign,
communication, and training.
Balancing feasibility, time investment, and value to pursue a
portfolio of AI initiatives with different time horizons (typically over
three years) and combining complementary efforts with different timelines for maximum value.
Organize for scale
In our experience, AI-enabled companies have two things in common
when it comes to structuring roles and responsibilities—both in terms of
who “owns” the work and how the work is executed.
First, they divide key roles between a central analytics “hub”
(typically led by a chief analytics officer or chief data officer) and
“spokes” (business units, functions, or geographies). A few tasks—such
as data governance, managing AI systems and standards, and establishing
AI recruiting and training strategies—are always best owned by the hub.
And a handful of responsibilities, including end-user training, workflow
redesign, and impact tracking, are almost always best owned by the
spokes. The rest of the work—which includes, among other
responsibilities, setting the direction for AI projects; building,
designing, and testing the tools; and managing the change—falls in a
gray area and is assigned to either the hub or spokes based on each
firm’s AI maturity, business-model complexity, and pace of innovation.
(Generally speaking, the greater the AI maturity and more data experts
available, the more these responsibilities can be shifted to the spokes,
while higher complexity and a need to innovate rapidly may shift these
responsibilities to the hub.)
Second, when it comes to execution, they put in place a governing coalition
of business, IT, and analytics leaders that shares accountability for
AI initiatives and sets up interdisciplinary teams within the
spokes—drawing from talent in both the hub and spokes to build, deploy,
and monitor new AI capabilities.
Educate everyone
To ensure the adoption of AI, companies need to educate everyone,
from the top leaders down. To this end, some companies are launching
internal “analytics academies,” which provide leaders a foundational
understanding of AI, enable analytics experts to continue sharpening
their hard and soft skills, build translator expertise
to bridge technical and business requirements, and prepare both
frontline workers and strategic decision makers, such as marketers, to
use new AI tools in their daily work.
Reinforce the change
With most AI transformations taking 18 to 36 months to complete (and
some lasting up to five years), leaders must also take steps to keep the
momentum for AI going. Following are some of the best ways we’ve found
to do this:
Role modeling. For example, leaders can (and should)
attend analytics academies as well as actively encourage new agile ways
of working and appropriate risk taking by highlighting what was learned
from pilots.
Making the businesses accountable. A scorecard that
captures project-performance metrics for all stakeholders, for example,
is an excellent way to align the goals of analytics and business teams.
Tracking adoption so teams can correct course as needed.
Providing incentives for change, such as shining a spotlight on employees who have helped make the company’s AI program a success.
All this work (from the initial setup activities to the reinforcement
mechanisms) not only helps organizations get more value from AI in the
near term but also creates a virtuous cycle: the growth of
interdisciplinary teams, test-and-learn approaches, and data-driven
decision making that comes with the building and adoption of new AI
capabilities leads to more collaborative practices among employees,
flatter organizations, and greater agility. This provides fertile ground
for even greater innovation, enabling companies to thrive as AI
advancements barrel full speed ahead.
For a deeper look at how leaders can drive the cultural and organizational changes necessary for scaling AI, read “Building the AI-powered organization,” on hbr.org.

Tim Fountaine is a partner in McKinsey’s Sydney office and leads QuantumBlack, a McKinsey company, in Australia; Brian McCarthy is a partner in the Atlanta office and coleads the knowledge development agenda for McKinsey Analytics; and Tamim Saleh is a senior partner in the London office and heads McKinsey Analytics in Europe.
In our recent surveys AI Adoption in the Enterprise and Machine Learning Adoption in the Enterprise,
we found growing interest in AI technologies among companies across a
variety of industries and geographic locations. Our findings align with
other surveys and studies—in fact, a recent study by the World Intellectual Property Organization (WIPO)
found that the surge in research in AI and machine learning (ML) has
been accompanied by an even stronger growth in AI-related patent
applications. Patents are one sign that companies are beginning to take
these technologies very seriously.
When we asked
what held back their adoption of AI technologies, respondents cited a
few reasons, including some that pertained to culture, organization, and
skills:
[23%] Company culture does not yet recognize the need for AI
[18%] Lack of skilled people / difficulty hiring the required roles
[17%] Difficulties in identifying appropriate business use cases
Implementing and incorporating AI and machine learning technologies
will require retraining across an organization, not just technical
teams. Recall that the rise of big data and data science necessitated a
certain amount of retraining across an entire organization:
technologists and analysts needed to familiarize themselves with new
tools and architectures, but business experts and managers also needed
to reorient their workflows to adjust to data-driven processes and
data-intensive systems. AI and machine learning will require a similar
holistic approach to training. Here are a few reasons why:
As noted from our survey, identifying appropriate business use
cases remains an ongoing challenge. Domain experts and business owners
need to develop an understanding of these technologies in order to be
able to highlight areas where they are likely to make an impact within a
company.
Members of an organization will need to understand—even at a
high-level—the current state of AI and ML technologies so they know the
strengths and limitations of these new tools. For instance, in the case
of robotic process automation (RPA), it’s really the people closest to tasks (“bottom up”) who can best identify areas where it is most suitable.
AI and machine learning depend on data (usually labeled training
data for machine learning models), and in many instances, a certain
amount of domain knowledge will be needed to assemble high-quality data.
Machine learning and AI involve end-to-end pipelines, so
development/testing/integration will often cut across technical roles
and technical teams.
AI and machine learning applications and solutions often interact
with (or augment) users and domain experts, so UX/design remains
critical.
At our upcoming Artificial Intelligence conferences in San Jose and London,
we have assembled a roster of two-day training sessions, tutorials, and
presentations to help individuals (across job roles and functions)
sharpen their skills and understanding of AI and machine learning. We
return to San Jose with a two-day Business Summit
designed specifically for executives, business leaders, and
strategists. This Business Summit includes a popular two-day training—AI for Managers—and tutorials—Bringing AI into the enterprise and Design Thinking for AI—along
with 12 executive briefings designed to provide in-depth overviews of
important topics in AI. We are also debuting a new half-day tutorial
that will be taught by Ira Cohen (Product management in the Machine Learning era), which, given the growing importance of AI and ML, every manager should consider attending.
We will also have our usual strong slate of technical training,
tutorials, and talks. Here are some two-day training sessions and
tutorials that I am excited about:
Deep learning remains a new topic for many companies, and
organizations are interested in augmenting or replacing their existing
ML systems with this class of techniques. Neil Conway and Yoav Zimmerman
are teaching an important new half-day tutorial—Modern Deep Learning: Tools and Techniques—designed
to provide concrete takeaways and best practices for developers,
researchers, ML engineers, and technical managers. If your organization
is serious about using deep learning, this is a tutorial that you and
your colleagues should consider attending.
Reinforcement learning (RL) remains a popular topic at our AI conference. We have a new tutorial—ML problem-solving with a game engine—that will help participants get started using RL with the Unity engine. A team from RISE Lab will teach an updated tutorial on Ray, an open source distributed computing framework that includes a popular library for RL (RLlib). As I noted in a recent post, Ray continues to grow impressively along multiple fronts, including number of users, contributors, and libraries.
AI and ML are going to impact and permeate most aspects of a
company’s operations, products, and services. To succeed in implementing
and incorporating AI and machine learning technologies, companies need
to take a more holistic approach toward retraining their workforces.
This will be an ongoing endeavor as research results continue to be
translated into practical systems that companies can use. Individuals
will need to continue to learn new skills as technologies continue to
evolve and because many areas of AI and ML are increasingly becoming
democratized.
Included:
Learning Machine Learning from scratch, hardware options, finding
mentorship, who’s important to know in the field, freelancing as a
machine learning engineer, concepts that make you difficult to replace,
preparing for interviews, interviewing with big Silicon Valley tech
companies, adopting the best productivity habits, and a few other
things.
Credentials:
I graduated with a degree in molecular biology and worked in biotech
after college. Within a year of leaving that industry, I was working
with the Tensorflow team at Google on probabilistic programming tools. I
later joined a security startup as a machine learning engineer.
Disclaimer:
Much of this is based on my own experience, peppered with insights from
friends of mine who have been in similar boats. Your experience might
not be identical. The main value is giving you a roadmap of the space so
you can navigate it if you have no idea what you’re doing. If you have
your own methods for learning ML that are working better than the ones
listed here (like, if you’re literally in school learning about this
stuff), keep on using them.
In
a span of about one year, I went from quitting biomedical research
to becoming a paid Machine Learning Engineer, all without having a
degree in CS or Math. I’ve worked on side-projects that have been shared
with tens of thousands on Twitter, worked with startups in facial
recognition and distributed apps, sold a side-project, and even worked
with Google’s Tensorflow Team on new additions to Tensorflow. Again,
this was all without having a computer science degree.
This
post, while long, is a compilation of all the important concepts, tips,
and resources for getting into a machine learning career. From readers
who are not yet in college, to readers who have been out of college for a
while and are looking to make a switch, I’ve tried to distill the most
generally applicable points from my own journey that would be beneficial
to a wide array of people.
Enjoy.
Part 1: Introductions, Motivations, and Roadmap
Part 2: Skills of a (Marketable) Machine Learning Engineer
Part 3: Immersion and Finding Mentors
Part 4: Software and Hardware Resources
Part 5: Reading Research Papers (and a few that everyone should know)
Part 6: Groups and People you should be Familiar with
Part 7: Problem-Solving Approaches and Workflows
Part 8: Building your portfolio
Part 9: Freelancing as an ML developer
Part 10: Interviewing for Full-time Machine Learning Engineer Positions
Part 11: Career trajectory and future steps
Part 12: Habits for Improved Productivity & Learning
Part 1: Introductions, Motivations, and Roadmap
Introductions
If you’ve been following the news at all, chances are you’ve seen the headlines about how much demand there is for machine learning talent. In the recent LinkedIn Economic Graph
report, “Machine Learning Engineer” and “Data Scientist” were the two
fastest growing jobs of 2018 (9.8x and 6.5x growth, respectively).
Medium itself is rife with example projects, tutorials, reviews of software, and tales of interesting applications.
Despite the apparent demand, there seem to be few resources on actually
entering this field as an outsider, compared with the resources available
for other areas of software engineering. That’s why I’m writing this
mega-post: to serve as a condensed resource for the lessons of my journey
to becoming a Machine Learning Engineer from a non-CS background.
“But Matt”, you must be saying, “That’s not at all unusual, lots of people go into machine learning from other fields.”
It’s
true that many non-CS majors go into the field. However, I was not a
declared statistics, mathematics, physics, or electrical engineering
major in college. My background is in molecular biology, which some of
you may have noticed is frequently omitted from lists of examples of
STEM fields.
While
I was slightly more focused on statistics and programming during my
undergrad than most bio majors, this is still an unusual path compared
to a physicist entering the field (as this lovely post from Nathan Yau’s FlowingData illustrates).
Backstory
I
don’t think it’s wise to focus too much on narratives (outside of
preparing for interviews, which we will get to). There are many ways I
could spin a narrative for my first steps into the machine learning
field, both heroic and anti-heroic, so here’s one of the more common
ones I use:
Since
high school, I had an almost single-minded obsession with diseases of
aging. A lot of my introduction to machine learning was during my
undergraduate research in this area. This was in a lab that was fitting
discrete fruit fly death data to continuous equations like Gompertz and
Weibull distributions, as well as using image-tracking to measure the
amounts of physical activity of said fruit flies. Outside of this
research, I was working on projects like a Google Scholar scraper to
expedite the search for papers for literature reviews. Machine learning
seemed like just another useful tool at the time for applying to
biomedical research. Like everyone else, I eventually realized that this
was going to become much bigger, an integral technology of everyday
life in the coming decade. I knew I had to get serious about becoming as
skilled as I could in this area.
But why switch away from aging completely?
To answer that, I’d like to bring up a presentation I saw by Dr. David
Sinclair from Harvard Medical School. Before talking about
his lab’s exciting research developments, he described a common struggle
in the field of aging. Many labs are focused on narrow aspects of the
process, whether it be specific enzyme activity, nutrient signalling,
genetic changes, or any of the other countless areas. Dr. Sinclair
brought up the analogy of the blind men and the elephant, with respect
to many researchers looking at narrow aspects of aging, without spending
as much time recognizing how different the whole is from the part. I
felt like the reality was slightly different (that it was more like
sighted people trying to identify an elephant in the dark while using laser
pointers instead of flashlights), but the conclusion was still spot-on:
we need better tools and approaches to addressing problems like aging.
This,
along with several other factors, made me realize that using the
wet-lab approach to the biological sciences alone was incredibly
inefficient. Much of the low-hanging fruit in the search space of cures
and treatments was picked long ago. The challenges that remain
encompass diseases and conditions that might require troves of data to
even diagnose, let alone treat (e.g., genomically diverse cancers,
rapidly mutating viruses like HIV). Yes, I agree with many others that aging is definitely a disease, but it is also a nebulously defined one that affects people in wildly varying ways.
I
decided that if I was going to make a large contribution to this, or
any other field I decided to go into, the most productive approach would
be working on the tools for augmenting and automating data analysis. At
least for the near future, I had to focus on making sure my foundation
in Machine Learning was solid before I could return my focus to specific
cases like aging.
“So…what exactly is this long-a** post about again?”
There
are plenty of listicles and video tutorials for specific machine
learning techniques, but there isn’t quite the same level of
career-guide-style support like there is for web or mobile developers.
That’s why this is more than just compiling lists of resources I have
turned to for studying. I also tried to document the best practices I’ve
found for creating portfolio projects, finding both short-term and
long-term work in the field, and keeping up with the rapidly-changing
research landscape. I will also compile nuggets of wisdom from others I
have interviewed who are further along this path than I am.
The
level of technical ability you need to show is not lowered; if anything,
it’s higher when you don’t have the educational background. But it’s
totally possible.
Ultimately,
I want whoever reads this to get a detailed map of the space, so if
they decide to go down my path, they can get through the valley of the
Dunning-Kruger effect much more quickly.
With
that in mind, we’ll start with a rough overview of the skills needed to
master in order to become an (employable) machine learning engineer:
Part 2: Skills of a (Marketable) Machine Learning Engineer
Becoming
a machine learning engineer still isn’t quite as straightforward as
becoming a web or mobile engineer, as we discussed in the
previous section. This is despite all of the new programs geared toward
machine learning both inside and outside of traditional schools. If you
ask many people with the title of “Machine Learning Engineer” what they
do, you’ll often get wildly different answers.
The goal of this section is to help you put together the beginnings of a mental semantic tree (Khan Academy’s example of such a tree) for learning machine learning (à la Elon Musk’s now famous method).
Based on my own experiences, as well as reaching out to hundreds of
machine learning engineers in both academia and industry, here’s an
overview of the soft skills, basic technical skills, and more
specialized skills you’ll need.
Soft Skills
We
need to cover a few non-technical skills that you should keep in mind
before diving into the deep end. Yes, machine learning is mainly math
and computer science knowledge. However, you’ll most likely need to find
ways of applying this to solve real problems.
Learning new skills:
The field is rapidly changing. Every month, new neural network models
come out that outperform previous architectures. GPU manufacturers are in
an arms race. 2017 saw just about every major tech giant release their own machine learning frameworks.
There’s a lot to keep up with, but luckily the ability to quickly learn
things is something you can improve on (Growth mindsets for the win!).
Courses like Coursera’s Learning How to Learn are great for this. If you have dyslexia, ADD, or anything similar, the Speechify app
can offer a bit of a productivity boost (this is one app that I used a
bunch to make as much use of my time reading and re-reading papers).
Muad’Dib
learned rapidly because his first training was in how to learn. And the
first lesson of all was the basic trust that he could learn. It’s
shocking to find how many people do not believe they can learn, and how
many more believe learning to be difficult. Muad’Dib knew that every
experience carries its lesson.
Time-management: A lot of my friends have gone to elite schools like Brown, Harvard, and MIT. Out
of the ones that made it there and continued to succeed afterwards, it
seemed that skill in time management was a much bigger factor in their
success than any natural talent or innate intellect. The same pattern
will likely apply to you. When it comes to a cognitively-demanding
task like learning machine learning, RESIST THE URGE TO MULTI-TASK. Yes,
at some point you may need to run model-trainings in parallel if you
have the compute resources, but you should put your phone on airplane
mode when studying and avoid doing multiple tasks at the same time. I
cannot recommend highly enough Cal Newport’s book “Deep Work” (or his Study Hacks Blog). If you’re still in college or high school, Jessica Pointing’s Optimize Guide is also a great resource. I’ll go into more resources like this in the next post in this series.
Business/Domain knowledge:
The most successful machine learning projects out there are going to be
those that address real pain points. It will be up to you to make sure
your project is not the machine learning equivalent of Juicero.
In academia, the emphasis is more on the side of improving metrics of
algorithms. In industry, the focus is all about making those
improvements count towards solving customer or company problems. Beyond
taking classes in entrepreneurship while you’re in school, there are
plenty of classes online that can also help (Coursera has a pretty decent selection). If you want a more comprehensive overview, you can try the Smartly MBA.
Its creators impose an artificially low acceptance rate, but if you
get in, it’s free. At the very least, business or domain knowledge helps a lot with feature engineering (many of the top-ranking Kaggle teams often have at least one member whose role is to focus on feature engineering).
Communication:
You’ll need to explain ML concepts to people with little to no
expertise in the field. Chances are you’ll need to work with a team of
engineers, as well as many other teams. Oh, and you’ll need to get past
the dreaded interviews eventually. Communication is going to make all of
this much easier. If you’re still in school, I recommend taking at
least one course in rhetoric, acting, or speech. If you’re out of
school, I can personally attest to the usefulness of Toastmasters International.
Rapid Prototyping:
Iterating on ideas as quickly as possible is mandatory for finding one
that works. Throughout your learning process you should maximize the
amount of new, useful, and actionable information you are getting. In
machine learning, this applies to everything from picking the right
model, to working on projects such as A/B testing. I had the pleasure of
learning a lot about rapid prototyping from one of Tom Chi’s
prototyping workshops (he’s the former Head of Experience at GoogleX,
and he now has an online class version of his workshop). Udacity also has a great free class on rapid prototyping that I highly recommend.
Okay,
now that we’ve got the soft skills out of the way, let’s get to the
technical checklist you were most likely looking for when you first
clicked on this article.
The Basic Technical Skills
Python (at least intermediate level) — Python
is the lingua franca of Machine Learning. You may have had exposure to
Python even if you weren’t previously in a programming or CS-related
field (it’s commonly used across the STEM fields and is easy to
self-teach). However, it’s important to have a solid understanding of
classes and data structures (this will be the main focus of most coding
interviews). MITx’s Introduction to Computer Science
is a great place to start, or fill in any gaps. In addition to
intermediate Python, I also recommend familiarizing yourself with
libraries like Scikit-learn, Tensorflow (or Keras if you’re a beginner), and PyTorch, as well as how to use Jupyter notebooks.
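If you want a feel for what that baseline looks like in practice, here is a minimal sketch of the load/split/train/evaluate loop you will write constantly with Scikit-learn (the bundled iris dataset and the model choice are just illustrative defaults):

# A minimal scikit-learn workflow: load data, split, train, evaluate.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

If you can read, modify, and debug a snippet like this comfortably, you’re at the floor of where you need to be.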
C++ (at least intermediate level) — Sometimes
Python won’t be enough. Often you’ll encounter projects that need to
leverage hardware for speed improvements. Make sure you’re familiar with
basic algorithms, as well as classes, memory management, and linking.
If you also choose to do any machine learning involving Unity, knowing C++ will make learning C# much easier.
At the very least, having decent knowledge of a statically-typed
language like C++ will really help with interviews. Even if you’re
mostly using Python, understanding C++ will make using
performance-boosting Python libraries like Numba a lot easier. Learn C++ has been one of my favorite resources. I would also recommend Programming: Principles and Practice Using C++ by Bjarne Stroustrup.
Once you have the basics of either Python or C++ down, I would recommend checking out Leetcode or HackerRank
for algorithm practice; an example of the kind of exercise they serve up follows below. Quickly solving basic algorithms is kind of
like lifting weights. If you do a lot of manual labor (e.g., programming
by day), you might not necessarily be lifting a lot of weights. But, if
you can lift weights well, most people won’t doubt that you can do
manual labor.
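Here is one such classic: two-sum, solved in a single pass with a dictionary (a generic example for illustration, not one tied to either site):

def two_sum(nums, target):
    """Return indices of two numbers that add up to target, else None."""
    seen = {}  # value -> index of the values we've already passed
    for i, n in enumerate(nums):
        if target - n in seen:  # the needed complement appeared earlier
            return seen[target - n], i
        seen[n] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)

The dictionary turns a naive O(n^2) double loop into a single O(n) pass, which is exactly the kind of trade-off interviewers want to hear you explain.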
Calculus (at least basic level) — If
you have an understanding of derivatives and integrals, you should be
in the clear. Otherwise even simpler concepts like gradient descent will
elude you. If you need more practice, Khan Academy is likely the best
source of online practice problems out there for differential, integral, and multivariable calculus. Differential equations are also helpful for machine learning.
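To make the gradient descent point concrete, here is a bare-bones sketch minimizing f(x) = (x - 3)^2, whose derivative is 2(x - 3); the learning rate and step count are arbitrary illustrative choices:

# Gradient descent on f(x) = (x - 3)^2, with derivative f'(x) = 2 * (x - 3).
x = 0.0  # arbitrary starting point
learning_rate = 0.1
for step in range(50):
    grad = 2 * (x - 3)         # evaluate the derivative at the current x
    x -= learning_rate * grad  # step downhill, against the gradient
print(x)  # converges toward the minimum at x = 3

If the two commented lines make sense to you, you understand the core of how most of the models in this post get trained.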
Statistics (at least basic level) — Statistics
is going to come up a lot. At least make sure you’re familiar with
Gaussian distributions, means, and standard deviations. Every bit of
statistical understanding beyond this helps. Some good resources on
statistics can be found at, you probably guessed it, Khan Academy. Elements of Statistical Learning, by Hastie, Tibshirani, & Friedman, is also great if you’re looking for applications of statistics to machine learning.
BONUS: Numerical Analysis (at least basic level) — A
lot of machine learning techniques out there are just fancy types of
function approximation. These often get developed by theoretical
mathematicians, and then get applied by people who don’t understand the
theory at all. The result is that many developers might have a hard time
finding the best technique for their problem. If they do find a
technique, they might have trouble fine-tuning it to get the best
results. Even a basic understanding of numerical analysis will give you a
huge edge. I would seriously look into Deturk’s Lectures on Numerical Analysis from UPenn, which covers the important topics and also provides code examples.
All
this math might seem intimidating at first if you’ve been away from it
for a while. Yes, machine learning is much more math-intensive than
something like front-end development. Just like with any skill, getting
better at Math is a matter of focused practice. There are plenty of
tools you can use to get a more intuitive understanding of these
concepts even if you’re out of school. In addition to Khan Academy, Brilliant.org is a great place to go for practicing concepts such as linear algebra, differential equations, and discrete mathematics.
Common non-neural network Machine Learning Concepts — You
may have decided to go into machine learning because you saw a really
cool neural network demonstration, or wanted to build an artificial
general intelligence (AGI) someday. It’s important to know that there’s a
lot more to machine learning than neural networks. Many algorithms like
random forests, support vector machines (SVMs), and Naive Bayes Classifiers
can yield better performance for your hardware on some tasks. For
example, if you have an application where the priority is fast
classification of new test data, and you don’t have a lot of training
data at the start, an SVM might be the best approach for this. Even if
you are using a neural network for your main training, you might use a
clustering or dimensionality-reduction technique first to improve the
accuracy. Definitely check out Andrew Ng’s Machine Learning, as well as the Scikit-learn documentation.
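Here is a minimal sketch of that small-data SVM scenario using scikit-learn (the synthetic dataset and parameters are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small training set, where SVMs often hold their own against neural nets.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

svm = SVC(kernel="rbf", C=1.0)  # the RBF kernel is a common default
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))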
Common Neural Network Architectures — Of
course, there are still good reasons for the surge in popularity of
neural networks. Neural networks have been by far the most accurate way
of approaching many problems, like translation, speech recognition, and
image classification. Andrew Ng’s Machine Learning (and his more up-to-date Deep Learning specialization) are great starting points. Udacity’s Deep Learning is also a great resource that’s more focused on Python implementations.
Bear
in mind, these are mainly the skills you would need to meet the minimum
requirements for any machine learning job. However, chances are you’ll
be working on a very specific problem within Machine Learning. If you
really want to add value, it will help to specialize in some way beyond
the minimum qualifications.
Voice and Audio Processing — This
field has frequent overlap with natural language processing. However,
natural language processing can be applied to non-audio data like text.
Voice and Audio analysis involves extracting useful information from the
audio signals themselves. Being well versed in math will get you far in
this one (you should at least be familiar with concepts like fast
Fourier transforms). Knowledge of music theory also helps. I recommend
checking out the Kaggle kernels for the MLSP 2013 Bird Classification Challenge and TensorFlow Speech Recognition Challenge, as well as Google’s NSynth project.
Reinforcement Learning — Reinforcement
learning has been a driver behind many of the most exciting
developments in deep learning and artificial intelligence in 2017, from AlphaGo Zero to OpenAI’s Dota 2 bot to Boston Dynamics’s Backflipping Atlas.
This will be critical to understand if you want to go into robotics,
self-driving cars, or any other AI-related area. Georgia Tech has a
great primer course on this available on Udacity. However, there are so many different applications, that I’ll need to write a more in-depth article later in this series.
There
are definitely more subdisciplines to ML than this. Some are larger and
some have yet to reach maturity. Generative Adversarial Networks are
one of these. While there is definitely a lot of promise for their use in creative fields and drug discovery, they haven’t quite reached the same level of industry maturity as these other areas.
BONUS: Automatic Machine Learning (Auto-ML) — Tuning
networks with many different parameters can be a laborious process (in
fact, the phrase “graduate student descent” refers to getting hordes of
graduate students to tune a model over the course of months). Companies
like Nutonian (bought by DataRobot) and H2O.ai have recognized a massive need for this. At the very least, knowing how to use techniques like grid search (such as scikit-learn’s GridSearchCV) and random search will be helpful no matter your subdiscipline. Bonus points if you can implement techniques like Bayesian optimization or genetic algorithms.
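As a baseline, here is roughly what grid search looks like with scikit-learn’s GridSearchCV (the parameter grid is an arbitrary illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively try every combination in the grid, with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)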
Conclusions
With
this overview of machine learning skills, you should hopefully have a
better grasp on how the different parts of the field relate to one
another. If you want to get a quick, high-level understanding of any of
these technical skills, Siraj Raval’s YouTube channel and KDnuggets are good places to start.
It’s
not enough to just have this list of subjects in your head, though.
Certain approaches to studying this are more effective than others.
Part 3: Immersion and Finding Mentors
Self
study can be tricky, even for those of us without any kind of attention
deficit disorder. It’s especially important to note that not all self
study is equal in quality. Take studying a language, for example. Many
people have had the experience of learning a language for years in a
classroom setting. When they go spend a few weeks or months in a country
where that language is all that is spoken, they often describe
themselves as learning much more quickly than in the classroom setting.
This is often referred to as learning a language by immersion. This means that even the instructions for what you need to do with a language are in the language itself.
While
learning a subject like machine learning might be functionally
different than learning another spoken language (you’re not going to be
speaking in classes and functions, after all), the principle of
surrounding yourself with a subject and filling as many hours of the day
with it is important here. That is what we’re talking about when we
talk about immersion with respect to machine learning. What Cal Newport
might say is that immersion is the reason formal institutions so
consistently produce higher-quality results, even for non-language
subjects. People spend many hours per day in structured settings where
it’s almost difficult NOT to study a particular subject. The ones who find
more immersion (i.e., taking additional, more advanced classes, spending
more time studying the subject with others, involving themselves in
original research efforts) are the ones who succeed more.
If
you’re studying machine learning in a formal setting, good for you.
Much of the rest of the advice in this post still applies, but you’ve
got an edge. If you’re not studying machine learning in a formal
setting, or if you’re entering into the space from a different field,
your challenge is going to be building your own habits, commitments,
structures, and environments that make you spend as much time as possible
studying machine learning.
How
do you do this? First, you’re going to need to put together a schedule
for learning the different subjects listed in the previous section. How
varied this is, or how long it will take, will depend on your previous
familiarity with the mathematical concepts involved (try starting with one
week of review for each of the subjects to get a sense of the space,
and spend more or less time based on your previous familiarity).
You
should try to fit at least two hours of studying into each day. EVERY.
SINGLE. DAY. This spaced repetition will become stronger as your learning
streaks get longer (and you will be surprised at how rusty you can get
after taking just a single day off). If you can fit more than 2 hours
into certain days, like on weekends, that’s even better. Even when I was
working full time, I was making sure to fit at least 2 hours of
studying each day (part of this was the result of learning how to
effectively read papers, books, and tutorials while also riding a train
or bus). While there were occasionally holidays that I would use for
structured study-sessions, most of this found time came from
relentlessly optimizing what I spent my time doing.
You
should make sure to have a minimum amount of time each day scheduled in
your calendar (and I mean actually reserved in your calendar, in a slot
where nothing else can be scheduled over). Set up alerts for these
times, and find an accountability buddy (someone who can keep you
accountable if you do not study during these times. In my case I had
other friends that were studying subjects in machine learning and we
would present each other with our notes and/or github commits). 2 hours a
day minimum can sound like a lot, but if you remove the items from your
schedule that are less important (*cough* social media), you will be amazed at how much time you can find.
Now
at this point, much of the content has focused on what you as an
individual can to do improve your studying. There’s one more thing to
keep in mind when studying:
DON’T GO IT ALONE.
You’re
probably inexperienced in machine learning if you’re looking for advice
from this post. For self-study, it is absolutely critical that you
find a network of mentors (or at the very least one incredibly
experienced mentor). If you don’t find a mentor, you will have to put a
lot more time and effort into self-study to get the same results as
someone that had a mentor and put in less practice. Our culture is
flooded with the trope of the lone genius. Many may correctly point out
that people like Mozart and Einstein became masters in their fields by
putting in thousands of man-hours while they were still young. However,
many of the same people often ignore the critical roles that mentors
played in their careers (Mozart had his father, and Einstein had
professors in one of the best physics departments on the planet at the
time).
Why
is finding a mentor so important? Chances are they may have been down
the same road you’re travelling. They have a better map of the space,
and will probably have a better grasp of the common pitfalls that plague
people earlier in their careers. They’ll be able to tell you whether
the machine learning idea you’re working on is truly novel, or whether
it’s been done countless times with a non-ML implementation.
There are a few possible steps to acquiring a mentor:
Create a list of prospective mentors:
Create a list of experienced people in the field of interest (in this
case it might be computer science or machine learning). This list can be
updated as time goes on, and you get a better feel for how everyone is
connected in the space. You might be surprised at what a small world the machine learning space is.
Be indirect at first:
If you’re talking to a potential mentor for the first time, start out
with very specific questions about their work. Demonstrate your interest
by showing you’ve put thought into your questions (ask the kinds of
questions where it seems like you’ve exhausted other research resources,
and are coming to them because nobody else would have a good answer).
Also, for those on your list, I would avoid asking literally “will you
be my mentor?”. If the person in question is qualified to be a mentor,
then they may not have a lot of time to spare in their schedule
(especially not for something that sounds like it would require
committing a lot of time to a person they just met). That leads me to
the much better strategy…
Demonstrate value:
Again, if a person is experienced enough to be a good mentor, chances
are they will also have very little spare time on their hands. However,
they will often be willing to provide advice or mentorship if you’re
willing to help them out with a project of theirs. Offer to volunteer.
Yes, I know unpaid internships can be considered dubious, but at this
point in time getting a good mentor at all is more important. This can
be a short term project that could turn into a referral for a much more
rewarding one.
Use youth to your advantage (if possible):
If you are young, you might have an advantage. People are a lot more
willing to help simply if you are younger. You might be nervous about
approaching people a lot older, but you actually have a lot less to fear
than you realize. The worst they can do is say no.
Be open to reverse mentors:
By reverse mentors, I mean people that are younger than you, but that
are also much further ahead in their machine learning journeys. You may
have come across people that have been programming since they were 5
years old, built their first computer not from a kit, but completely
from scratch. They’ve started ML companies. They’re grad students at top
CS programs. They’re Thiel fellows or Forbes 30 under 30s. If you
happen to run into these people, do not be intimidated or envious. If
you have someone like this in your network as a machine learning expert,
try to learn from them. I’ve been fortunate enough to meet a bunch of
people like this, and they were invaluable in helping me find the next
steps.
Be humble and obedient:
It’s important that you remember you are coming to them for advice.
They are taking time out of their busy schedule to give you
recommendations. If you want someone to remain your mentor, then you
should defer to their judgement. If you do something different than what
they say or don’t do it at all, that will be a pretty good signal to them
that you either don’t value their advice, or that you aren’t that
serious about becoming a machine learning engineer.
If
you focus on making sure you get as much immersion as possible, and you
are able to find experienced machine learning engineers to provide
advice and guidance, you’re off to a fantastic start.
There is one last, minor detail to consider before you begin your learning journey: you need an actual computer to program on.
Part 4: Software and Hardware Resources
Programming
for machine learning often distinguishes itself from web programming by
the fact that it can be much more demanding in terms of hardware. When I
started out on my machine learning journey, I originally used a
3-year-old Windows laptop. For basic machine learning tutorials this may
be adequate, but once you spend 28 hours training a simple
low-resolution GAN, hearing your CPU scream in agony the whole time,
you will realize, as I did, that you need to expand your options.
The choice of environments can be daunting at first, but it can easily be split up into a parseable list.
The
first thing you may be wondering is whether you should pick Windows,
Mac, or Linux. Many more packages, like those you would see with Anaconda,
are compatible with Mac and Linux than with Windows. Tensorflow and
PyTorch are available on all three, but some less common yet still useful
packages like XGBoost can be trickier to install on Windows. Windows has
become more popular in recent years as a development platform for
machine learning, though this has largely been due to the emergence of
more cloud resources with Azure. You can still use a Windows machine to
run software that was developed for Mac or Linux, such as by setting up a
VirtualBox virtual machine. You could also use a Kaggle kernel or a
Databricks notebook, but that of course depends on having
a great internet connection. If you’re already used to using a Mac, you
should be fine. Regardless of which operating system you choose, you
should still try to add an understanding of Linux to your skill set (in
part because you will probably want to deploy trained models to servers
or larger systems of some kind).
For
your machine learning set up, you have four main options: 1) the Laptop
Option, 2) Cloud Resources/Options, 3) Desktop Option and 4) Custom/DIY
Machine Learning Rigs.
Laptop Option: Favoring portability & flexibility — If
you’re going for the machine learning freelancing route, this can be an
attractive option. It can be your best friend if you’re a digital nomad
(albeit, one that might feel like a panini press if you’re keeping it
in your actual lap when you’re using it for model training). With that
in mind, here are some features and system settings you should make sure
you have if you’re using your Laptop for Machine Learning.
RAM: Don’t settle for anything less than 16 GB of RAM.
GPU:
This is probably the most important feature. Having the right GPU (or
having any GPU instead of just CPUs) could mean the difference between
model training taking an hour or taking weeks or months (and making lots
of heat and noise in the process). Since many ML libraries make use of
CUDA, go with an NVIDIA graphics card (like a GeForce) for the least
amount of trouble. You may need to write some low-level code to get
your projects to run on an AMD card.
Processor: Go with an Intel i7 (if you’re a Mr. Krabs-esque penny-pincher, make sure you don’t go below an Intel i5).
Storage: Chances
are you’re going to be working on projects that require a lot of data.
Even if you have extra storage on something like Dropbox or an external
drive, make sure you have at least 1 TB.
Operating System: Sorry
Mac and Windows cultists, but Skynet is probably going to be running on
Linux when it comes out. If you don’t have a computer with Linux as the
main OS yet, you can make a temporary substitute by setting up a
virtual machine with Ubuntu on either your Mac or Windows machine.
When it comes to specific brands, there are many choices. Everyone from Acer to NVIDIA makes laptops.
Of
course, if you insist on using a Mac, you could always connect your
machine to an external GPU enclosure (housing something like an NVIDIA Pascal-series card).
But if you’re strapped for cash, don’t fear. You can always go with one of the cheapest laptops out there (e.g., a $249 Chromebook),
and then use all the money you saved for cloud computing time. You
won’t be able to do much on your local machine, but as long as you have a
decent internet connection you should be able to do plenty with cloud
computing.
Speaking of cloud computing…
Cloud Resources/Options — It’s
possible that even your powerful out-of-the-box or custom build won’t
be enough for a really big project. You’ve probably seen papers or press
releases on massive AI projects that use 32 GPUs over many days or weeks.
While most of your projects won’t be quite that demanding, it will be
nice to have at least some options for expanding your computing
resources. These also have the benefit of being usable alongside whatever
laptop you already have.
Microsoft
Azure is usually the cheapest for compute time (I might be fanning the
flames of a cloud-provider holy war here). Amazon usually has a lot
more options (including more obscure ones like combining compute with data
streaming via Kinesis or long-term storage in S3 or Glacier). If you’re
using a Tensorflow model, Google Cloud’s TPUs (custom chips designed
for tensor workloads) are optimized for it. Google also offers tools and
services for optimizing your hyperparameters, so you don’t have to set
up the Bayesian optimization yourself.
If
you’re relatively new to using cloud services, FloydHub offers the
simplest user experience and is by far the easiest one for a
beginner to set up.
Then
again, you might not relish the idea of shelling out a bunch of money for
GPU compute time on every project you want to do. At some point, you
may decide that the only compute cost you want to worry about
is the electric bill.
Desktop Option: Powerful and reliable — If
you don’t want to have variable costs due to cloud computing bills, and
you don’t want your important machine learning work to be at risk for
environmental damage, another option could be to set up a Desktop
environment. Lambda Labs and Puget Systems
make some really great high-end desktops. The hardware options
for a desktop can take a bit more skill to navigate, but here are some
general principles to keep in mind:
For
the GPU, go with an RTX 2070 or RTX 2080 Ti. If cost is a concern,
going with a cheaper GTX 1070, GTX 1080, GTX 1070 Ti, or GTX 1080 Ti can
also be a good option. However many GPUs you have, make sure you have
1–2 CPU cores per GPU (more if you’re doing a lot of preprocessing). As
long as you buy at least as much CPU RAM to match the RAM of your
largest GPU, go with the cheapest RAM that you can (Tim Dettmers has a post explaining how the clock rates make little meaningful difference). Make sure the hard drive is at least 3 TB in size.
If
possible, go with a solid-state drive to improve the speed for
preprocessing small datasets. Make sure your setup has adequate cooling
(this is a bigger concern for Desktops than for laptops). As for
monitors, I’d recommend putting together a dual-monitor setup (3 may be
excessive, but knock yourself out. I don’t know what you would use 4 for
though).
Downside?
Basic models will cost about $2,000 to $3,000, with high-end machines
costing around $8,897 to $23,000. This is much steeper than the laptop
option, and unless you’re training complex models on massive datasets,
it will probably cost more than your initial cloud computing budget.
However,
there is a big advantage that desktops have over laptops: Since desktop
computers are less restricted by design constraints such as
portability, or not turning your lap into a panini press from the heat radiating from it, it is far easier to build and customize your own. This can also be a fantastic way to cheaply build your ideal machine.
Custom/DIY Machine Learning Rigs: For the enthusiast — Chances
are if you’re in the field for a while, you’re going to start wanting
to build your own custom computer. It’s an inevitable consequence of
thinking about how much you’re spending on GPU resources for project
after project. Given that the development of the GPUs that made machine
learning cheap and effective was pretty much subsidized by the gaming
industry, there are plenty of resources out there on building your own
PC. Here is an example breakdown of a few components and their prices.
These were intentionally selected for being cheap, so you could easily
replace any of the parts with something higher-end.
It’s
entirely possible that your level of comfort with hardware might not be
on the same level as your software comfort. If that’s the case,
building your PC is certainly going to be a lot (and I mean A LOT)
trickier than it is in PC Building Simulator. If you do succeed, this
can be a fun project, and you’ll also save money on a desktop machine
learning rig.
With a custom build, you also have access to some pretty out-there setups as well…
Whichever
setup you choose, whether it be mainly Laptop, Cloud-based, Desktop, or
custom build, you should now be more than ready to run your own machine
learning projects.
Of
course, becoming a machine learning engineer is about more than just setting up
your hardware/software environment correctly. Since the field is
changing so much, you’re going to need to learn how to read research
papers on the subject.
Part 5: Reading Research Papers (and a few that everyone should know)
In
order to have a proper understanding of machine learning, you need to
get acquainted with the current research in the space. It’s not enough
to accept claims about what AI can do just because they got enough hype on social media.
If you have GPU resources, you need to know how to properly utilize
them or else they’ll be useless. You need to learn to be critical and
balanced in your assessment. This is what PhD students learn how to do,
but luckily you can also learn how to do this.
For finding new papers to read, you can often find them by following machine learning engineers and researchers on Twitter. The machine learning subreddit
is also another fantastic resource. Countless papers are available for
free on Arxiv (and if navigating that is too intimidating, Andrej
Karpathy put together Arxiv Sanity to make it easier to navigate). Journals like Nature and Science can also be good sources (though Arxiv often has much more, and without paywalls).
Usually there are about two or three papers that are particularly popular in any given week.
For
getting through a paper, it usually helps if you have some kind of
motivation for getting through it. For example, if I want to learn about
influence functions or Neural ODEs,
I will search through the papers and read them until I understand them.
As was mentioned before with the immersion, how far you get is going to
be a function of discipline, which in turn is going to be influenced
even further by your motivation.
For any given paper, there are certain techniques you can use to make the information easier to digest and understand. The book “How to Read a Book”
is a fantastic resource that describes this in detail. In short, you
can use what is known as a three-pass approach. In the first pass
through the paper, you can just skim through the paper to see if it is
interesting. This means you first read the title, and if it’s appealing
move onto the abstract. The abstract is the short summary at the
beginning that covers the main points of the paper. If that seems good,
you move onto the introduction, read through that, then read the
section and subsection headers, but not the content of those sections.
In the first pass, you can temporarily ignore the math (assume it’s
sound for now). Once you go through the section headers, you read the
conclusion, and then skim through the references. In the references, if
you see any papers that you’ve read before, you can mark those. The
whole purpose of this first pass is to understand what the purpose of
the paper is, what the authors are trying to do, what problem they are
trying to solve. After the first pass, I will usually turn to Twitter
(or whatever source the paper came from), and compare what others are
saying about the paper to my initial assumptions.
If
after all this I have determined that the paper is interesting enough
to read more in-depth, I’ll take another pass through it. I’ll try to
get a high level understanding of the math in the paper. I’ll read the
thicker descriptions, the plots, and try to understand the high-level
algorithm. I’ll usually pay more attention to the math this time around.
However, there may be times when the author factors much of
the math out into derivations. On the second pass, I’m still not going through those
factorizations and derivations just yet. When I read the experiments, I
will try to evaluate whether the experiments seem reproducible. If there
is code available on Github for this paper, I will usually follow the
Github link and read through the code, and perhaps even try running some
part of it on my own device. Usually, comments in the code help with
understanding. I will also read through other online resources
that aid understanding (the more popular papers often have plenty
of high-level summaries, such as on sites like ML Explained).
On
the third pass, you try to understand the math itself. At
this point, you will be going through the paper with a pen and notepad,
and following along with the math itself. If there are any mathematical
terms or concepts that you do not understand, this is the point where
you search online for better explanations. If you’re really ambitious,
you can also try replicating the paper in code form, complete with the
parameters and data that they use in the paper.
If
at any point you feel stuck or frustrated, just remember not to give
up. Persistence will get you very far, and reading papers gets much
easier the more times you do it. If you’re still stuck on the math,
don’t hesitate to turn to Khan Academy or Wikipedia. If you’re looking
for even more help, try reaching out on the Machine Learning Subreddit,
or join a journal club meetup group in your city.
As
for which papers to start with, I would try applying the technique
above to some of the classic papers in machine learning. A lot of the
papers you read (especially the avalanche of GAN papers out there) will
have many concepts from these. I’ve listed a few of the big ones by
subject and included links to the papers.
These
papers are a great starting point for a conceptual understanding of
where these large, daunting, machine learning models come from. While
this will take you very far in building projects and following the
latest developments, it also helps to know who is creating these developments.
Part 6: Groups and People you should be Familiar with
As
I mentioned before, finding mentors and reading papers are important.
However, it’s also worth paying attention to the work of specific
researchers.
Depending
on which subfield you go into, following certain individuals might be
more important than others, but generally speaking being familiar with
these ones will reduce the risk of you getting into an awkward moment at
NIPS. Since many of these groups are also the most heavily-connected,
you can probably navigate the increasingly crowded machine learning
research space by traversing a mental graph of who is connected to whom,
and through whom.
These
companies often get a lot of attention for research in the ML space
because they often have much more computing resources (and can pay
researchers more) than academia. However, that’s not to say there
aren’t plenty of academic research centers you should be aware of. These
include (but again, are not limited to) IDSIA (the Dalle Molle Institute for Artificial Intelligence Research, Juergen Schmidhuber’s lab), MILA — the Montreal Institute for Learning Algorithms (where researchers like Ian Goodfellow trained), the University of Toronto (as a whole, since so many researchers, including Geoffrey Hinton, have come out of there), and Gatsby.
For researchers, Demis Hassabis (co-founder of DeepMind), Shane Legg (co-founder of DeepMind), Mustafa Suleyman (head of product at DeepMind), Jeff Dean (Google), Greg Corrado (Google AI research scientist), Andrew Ng (Stanford, Coursera), Ray Kurzweil (transhumanism, computer vision, and too much else to list here), Dileep George (Vicarious), D. Scott Phoenix (Vicarious and Numenta), Yann LeCun (a pioneer of CNNs, you should probably make sure you know this guy), Jeff Hawkins (Numenta, Palm Computing, and Handspring), and Richard Socher
(Salesforce, Stanford) are good ones to keep in mind. Like the list of
companies, this should not be considered a comprehensive list. Rather,
since many of these people are superconnectors within the machine
learning space, you can gradually build up a graph to connect the most
prominent people. If you want to stay connected and aware without
information overload, Twitter is a fantastic tool (just keep the number
of people you’re following to under 1,500 and triage accordingly), as
well as newsletters like Papers with Code, O’Reilly Data Newsletter, KDNuggets News, and the Artificial Intelligence Podcast by Lex Fridman.
Of
course, it’s not enough to be familiar with the current celebrities of
machine learning. You should probably also make yourself familiar with
historical figures such as Charles Babbage, Ada Lovelace, and Alan Turing. I
recommend Walter Isaacson’s “The Innovators” for an overview of the connections among all of them.
Again,
I should stress that your map of the organizations and prominent
researchers here should not be limited by this list. As with anything in
machine learning, you are going to need to continually update your
knowledge-base, and figure things out for yourself.
Speaking of figuring things out for yourself…
Part 7: Problem-Solving Approaches and Workflows
The
ultimate goal behind reading many research papers, working on many
projects, and understanding the works of top researchers is to better
develop your own approaches. While the workflows of top researchers can
be attributed at least partially to intuition from having seen so much,
there are still some general patterns and steps you can take for
undertaking a machine learning project. Many of these apply for
everything from original research to developing models for freelance
clients.
Determine if Machine learning is actually necessary:
It’s of course not as simple as throwing a neural network at
everything. First off, you might want to make sure that, for the problem
you’re working on, machine learning will actually be an improvement over
some other algorithm. You wouldn’t use a neural network to solve FizzBuzz, riiiiiiiight?
Understanding the type of problem: Once
you’ve determined that using machine learning would be beneficial, you
probably want to determine what specific type of machine learning is
useful (or even if a pipeline with multiple steps would be useful). Are
you trying to get a model that matches patterns in known data? You’re
probably using Supervised learning. Are you trying to uncover patterns
you’re not sure exist? It’s likely unsupervised learning that you’re
working on. Are you working with data that changes after each output
from your model? You’re probably going down the reinforcement learning
path.
Check Previous work: It
is a wise precaution to see what previous work has been done on a
problem. Take a scan of Github to get some ideas. It’s also worth
looking into existing literature on a specific problem.
Image processing, for example, has so many solutions that some refer to
it as a solved problem. Facebook’s AIs can already recognize human faces
with accuracy rivaling that of most humans. That being said, it’s
likely you will get to a point where even the best existing solutions
are inadequate (i.e., pretty much the state of the entire field of NLP
for many tasks). When it comes to that, there are a variety of different
steps you can incorporate into solving a problem.
Preprocessing and Exploratory Data Analysis: Before
you input the data into your model, you should always stop to make sure
your dataset is up to snuff. This can involve everything from checking
for missing data, to rescaling and filtering the data, to looking at the
relationships between parts of the data at a basic level.
For
preprocessing, one common technique is to use a zero mean (subtract the
mean from each predictor) to center the data, which can be combined
with dividing by standard deviation to scale the data. This can be used
for anything from tabular data to RGB values in images. Dates and times
should be put into a consistent DateTime format. If you have a lot of
categorical variables, it is more often than not crucially important to
one-hot encode them. At this stage, you should also strive to resolve any
outliers (and, if possible, understand their meaning). If your model is
sensitive to outliers, you can try applying a spatial sign transform. You
should also make an effort to handle any missing data; simply dropping it
can be problematic if missingness is somehow predictive. Tree-based models
are great at dealing with missing data, or if you don’t have time for that
you can use imputation/interpolation (KNN or an intermediate regression
model).
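Here is a sketch of several of those steps with scikit-learn; the column names and data are hypothetical:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 51],           # numeric, with a missing value
    "income": [40_000, 65_000, 58_000, None],
    "city": ["NYC", "SF", "NYC", "LA"],  # categorical
})

preprocess = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ]), ["age", "income"]),
    ("categorical", OneHotEncoder(), ["city"]),
])
X = preprocess.fit_transform(df)
print(X.shape)  # 4 rows: 2 scaled numeric columns + 3 one-hot city columns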
The
exploratory data analysis can also be useful for getting an intuitive
sense for what kinds of models or data reduction techniques could be
useful. This is important for finding possible relationships between any
and all of the features you might be working with. Calculations such as
Maximal Information Coefficients can be useful. Building correlation
matrices for the features (i.e., box-charting everything),
scatter-plotting and histogram-plotting every combination of features
can expand this even more. Don’t get so excited about jumping into using
a k-NN classifier that you forget the techniques from simple Excel
tables, such as using pivot tables and grouping by particular features.
Some of your variables might need to be transformed (square, cube,
inverse, log, Optimus…wait…what?)
before they can be plotted or models can be trained on them. For
example, if you’re looking at river flow events or cryptocurrency
prices, it will probably be wise to plot values on a log scale. While
you’re putting together the boilerplate for automatically doing all
these steps for whatever dataset you find, don’t forget the classic
summary statistics (mean, mode, minimum, maximum, upper/lower quartiles,
identification of >2.5 SD outliers).
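The boilerplate for those steps can start as simply as this pandas sketch (the stand-in dataset is made up; swap in your own):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "flow": rng.lognormal(mean=3, sigma=1, size=500),  # heavy-tailed, like river flows
    "rain": rng.normal(50, 10, size=500),
})

print(df.describe())                 # the classic summary statistics
print(df.corr())                     # pairwise correlation matrix
df["log_flow"] = np.log(df["flow"])  # log-transform a heavy-tailed variable
outliers = df[np.abs(df["rain"] - df["rain"].mean()) > 2.5 * df["rain"].std()]
print(len(outliers), "outliers beyond 2.5 SD")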
Data Reduction:
When beginning a project, it’s a good first step to see if reducing the
amount of data to be processed will help with the training. There are
many techniques for this. You’ve probably heard of using Principal
Component Analysis (PCA) or Linear Discriminant Analysis (LDA, in the
case of classification). Feature selection, or using only the components
that account for a majority of the information when modeling, can be
another easy way to focus the model on the important information. How
do you decide what to remove? Removing low/zero variance predictors
(ones that don’t vary with the correct classification), or removing
multicollinear heavily correlated features (if there’s a 99% correlation
between two features, one of them is possibly useless) can be good
heuristics. Other techniques like Isomap or Lasso (in the case of
regression) can help even more.
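As an illustration, here's how those heuristics plus PCA might look with scikit-learn, on synthetic data built to contain exactly the problem cases described above:

```python
# A data-reduction sketch: drop near-zero-variance predictors, drop one of each
# heavily correlated pair, then keep enough principal components for 95% of
# the variance. All data here is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 3] = 0.001 * rng.normal(size=200)           # near-zero-variance predictor
X[:, 7] = X[:, 2] + 0.01 * rng.normal(size=200)  # ~99% correlated with col 2

# Drop near-zero-variance predictors.
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# Drop one of each heavily correlated pair.
corr = np.corrcoef(X_var, rowvar=False)
keep = [i for i in range(corr.shape[0])
        if not any(abs(corr[i, j]) > 0.99 for j in range(i))]
X_sel = X_var[:, keep]

# Keep only enough principal components to explain 95% of the variance.
X_reduced = PCA(n_components=0.95).fit_transform(X_sel)
print(X.shape, "->", X_reduced.shape)
```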
Parameter tuning: Once you do have your model running, it may not be performing exactly as you want. Fortunately, this can often be solved with clever parameter tuning. Unfortunately, there are often many parameters for models like neural networks, so techniques like grid search may take longer than anticipated. Counterintuitively, random search can give improvements over grid search, but even then the dimensionality problem can remain. There is an entire field focused on efficiently tuning large models, involving anything from Bayesian optimization, to training SVMs on datasets of model parameters, to genetic algorithms for architecture search. That being said, once you learn enough about the techniques you're using in a model (such as an Adam or AdaDelta optimizer), you'll often begin to have an intuition for how to quickly converge on ideal parameters based on the output of the training graphs.
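For instance, a random search with scikit-learn's RandomizedSearchCV might look like the following; the model and parameter ranges are placeholders, not recommendations:

```python
# A random-search sketch: sample 20 random hyperparameter combinations
# instead of exhaustively walking a grid. Synthetic classification data.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "learning_rate": loguniform(1e-3, 1e0),  # sample log-uniformly
        "n_estimators": randint(50, 300),
        "max_depth": randint(2, 6),
    },
    n_iter=20,   # 20 random draws instead of an exhaustive grid
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```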
Higher-level modelling techniques: We covered the importance of feature engineering. This can cover everything from basis expansions, to combining features, to properly scaling features based on average values, median values, variances, sums, differences, maximums or minimums, and counts. Algorithms such as random forests, boosters, and other tree-based models can be used for finding the important features. Clustering, or any model based on distances to class centroids, can also be useful for problems where a lot of feature engineering is needed.
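A minimal sketch of that idea: fit a random forest on synthetic data and read off its importance scores to see which features deserve engineering effort:

```python
# Ranking features with a random forest. Only a few of the synthetic
# features are informative; the importance scores should surface them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Higher importance scores point at features worth engineering around.
for i, score in sorted(enumerate(forest.feature_importances_),
                       key=lambda t: -t[1]):
    print(f"feature {i}: {score:.3f}")
```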
Another advanced technique is the use of stacking or blending. Stacking and blending are two similar approaches to combining classifiers (ensembling). I recommend reading the Kaggle Ensembling Guide for more detailed information.
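The Kaggle guide has the details, but as a bare-bones sketch, scikit-learn's StackingClassifier wires up the pattern directly (the choice of base models here is arbitrary):

```python
# A bare-bones stacking sketch: base classifiers feed their predictions
# into a final meta-learner. Synthetic data, arbitrary base models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # the "level 1" meta-learner
    cv=5,  # out-of-fold predictions avoid leaking training labels
)
print(stack.fit(X, y).score(X, y))
```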
However sophisticated your modelling techniques get, don't forget the importance of acquiring domain knowledge for feature engineering. This is a common strategy among Kaggle competition winners: thoroughly researching the subject of the competition to better inform their decisions about how to build their model. Even if you do not have a lot of domain knowledge, you should be able to account for missing data (missingness itself can be informative) or add additional external data (such as through APIs).
Reproducibility: This one is more a quality of workflows than a problem-solving strategy. You're probably not going to do an entire project in one sitting. It's important to be able to pick up where you left off, or to easily start over from the beginning with only a few clicks. For model training, make sure you set up your code with proper checkpointing and weight-saving. Reproducibility is one of the big reasons why Jupyter notebooks have gotten so popular in machine learning.
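As one way to set this up, here's a sketch of seed-setting plus per-epoch weight checkpointing in Keras; the file path and the tiny model are placeholders, and the exact checkpoint arguments vary a bit between Keras versions:

```python
# A reproducibility sketch: fix random seeds up front, then save weights
# after every epoch so you can pick up where you left off.
import os
import random
import numpy as np
import tensorflow as tf

random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)  # seeds make reruns start from the same place

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Checkpoint weights every epoch (placeholder path).
os.makedirs("checkpoints", exist_ok=True)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "checkpoints/model-{epoch:02d}.weights.h5", save_weights_only=True)

X, y = np.random.rand(100, 4), np.random.rand(100, 1)
model.fit(X, y, epochs=3, callbacks=[checkpoint])
```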
That was a bit of a mouthful. I encourage you to follow the links above to learn more about each subject. Once you have gotten a grasp of these different strategies and workflows, the inevitable question is
what you should apply them to. If you’re reading this, your goal might
be to enter into machine learning as a career. Whether you do this as a
freelancer or a full-time engineer, you’re going to need some kind of
track record of projects. That’s where the portfolio comes in.
Part 8: Building your portfolio
When
you’re transitioning into a new career as a machine learning engineer
(or any kind of software-tangential career, not just ML), you may be
faced with an all too common conundrum: you’re trying to get work to get
experience, but you need experience before you can get the work to get
experience. How does one solve this Catch-22? Simple: Portfolio projects.
You often hear about portfolios as something that front-end developers or designers put together. It turns out a portfolio can be a crucial career-booster for data scientists and machine learning engineers as well. Even if you're not in a position where you're looking for work just yet, the goal of building a portfolio can be incredibly useful on its own for learning machine learning.
What NOT to include in your portfolio
Before we get into examples, it's important to make it clear what should not be included in your ML portfolio. For the most part, you have a lot of flexibility when it comes to your portfolio. However, when it comes to projects that could result in your resume being thrown in the trash, there are 3 big ones that come to mind:
1. Survival classification on the Titanic dataset.
2. Handwritten digit classification on the MNIST dataset.
3. Flower species classification using the iris dataset.
These datasets are used so heavily in introductory machine learning and data science courses that having a project based on them will probably hurt you more than help you. These are the types of projects already used in the example folders of many machine learning libraries, so there probably aren't many original uses left for them.
Machine learning portfolio ideas
Now that we have that warning out of the way, here are some suggestions for projects you CAN add to your machine learning portfolio.
Kaggle Competitions
Beyond Kaggle, there are other similar competitions out there. Halite is an AI programming competition created by Two Sigma. It is somewhat more niche than Kaggle's competitions, but it can be great if you want to test your skills on reinforcement learning problems. The only downside is that the competition is seasonal and doesn't run as frequently as Kaggle's, but if you can get your bot high on the leaderboards when the next competition comes around, it can be a great addition to your portfolio.
Implementations of Algorithms in Papers
Many
of the newer machine learning algorithms out there are first reported
in the form of scientific papers. Reproducing a paper, or reimplementing
a paper in a novel setting or on an interesting dataset is a fantastic
way to demonstrate your command of the material. Being able to code the
usual ML algorithms is one thing, but being able to take a description
of an algorithm and then turn it into a working project is a skill
that's far too low in supply. This could involve reimplementing the project in a different language (e.g., Python to C++), in a different framework (e.g., if the code for the paper was written in TensorFlow, try reimplementing it in PyTorch or MXNet), or on different datasets (e.g., bigger or less publicly available ones).
Mobile Apps with Machine Learning (e.g., Not Hotdog Spinoffs)
If you're looking for work in machine learning, chances are you won't just be making standalone Jupyter notebooks. If you can demonstrate that you can integrate a model into an actual application, that's a strong signal to potential employers. Since libraries like TensorFlow.js have come out for doing machine learning in JavaScript, this is also a fantastic opportunity to try integrating ML into React or React Native applications. If you're really scraping the bottom of the barrel for ideas, there's always the classic "Not Hotdog" app from HBO's Silicon Valley.
Of course, copying the exact app probably won't be enough (after all, the joke was how poorly the app was prepared to handle anything other than hot dogs and not hot dogs).
What additional features can you add? Can you increase the accuracy?
Can you make it classify condiments as well? How big of a variety of
foods can you get it to classify? Can you also get it to provide
nutritional or allergy information?
Hackathons and other competitions
In the absence of anything else, projects are often judged based on the impact they've had or the notoriety they've received. One of the easiest ways to get an impressive project in this regard is to put a hackathon project into your portfolio. I've taken this approach in the past with projects I've done as part of hackathons at MassChallenge or the MIT Policy Hackathon. Being a track or prize winner can be a fantastic addition to your portfolio. The only downside is that hackathon projects are basically glorified demos; they often don't stand up to much scrutiny or handle edge cases well. You may want to polish your code a bit before adding it to your portfolio.
Don’t
feel the need to restrict yourself to these ideas too much. You can
also add any talks you’ve given, livestream demos you’ve recorded, or
even online classes you’ve taught. If you’re looking for any other
inspiration, you can take a look at my portfolio site as an example.
Above all else, it's important to remember that a portfolio is always a work in progress. It's never something you will 100% finish. If you wait until that point before you start applying to jobs and advertising your skills, you've waited too long.
Part 9: Freelancing as an ML developer
There may be many areas of machine learning you might be interested in researching, but when it comes to getting hands-on experience and immersion, working on paid ML projects is the next level up. It's also incredibly easy to get started.
For
sites to do freelancing on, I recommend turning to Upwork or
Freelancer. Freelancer requires payment for taking the skill tests on
their site, so Upwork may be superior in that sense (at least, that’s
why I chose it).
If you're looking to delegate more of the project management and client screening, Toptal might be a good option. Toptal screens potential clients for you and provides support on project management. The only downside is that they heavily screen freelancers as well (they advertise that they only hire the "top 3% of freelancers"; whether or not that exact statistic is true, they are nonetheless very selective). Becoming a freelancer with Toptal requires passing a timed coding test, as well as a few interviews.
You may have also built up a neat portfolio geared towards the ML subfield you're interested in. This portfolio solves one problem with getting hired as a "junior" machine learning developer, but another remains: few people or organizations are looking for anything other than "senior" ML developers. I've seen job postings that require 5+ years of experience with libraries like TensorFlow, despite the fact that TensorFlow has only been out for 3 years.
Why does this happen? Most places hiring for ML work, regardless of the specifics of the job description, are pretty much looking for the same thing: a Machine Learning Mary Poppins to come in and solve all their problems.
To increase your chances of convincing an organization you're the solution to their problems, it helps to build up a track record of successful projects. In my case, I met with my first clients in person and agreed on a project with them before the payment and contract were set up on Upwork. The advantage of this method is that if your first client is someone you know, you can build a starting reputation on the site and potentially get some constructive criticism at the same time.
The work you DO end up getting may be slightly different from the goals you had in mind when creating your portfolio. Back then, your goal may have been to demonstrate that you could code well, implement a research paper, or do a cool project. Freelance clients will only care about one thing: can you use ML to solve their problems?
They
say it’s better to learn from the mistakes of others instead of just
relying on your own. You can find such freelancing horror stories
curated at Clients From Hell.
While most of these examples are from freelance artists, designers, and
web developers, you may encounter some similar types (e.g., poor
communicators, clients who overestimate the capabilities of even
state-of-the-art machine learning, people with tiny or even nonexistent
budgets, and even the occasional racist).
While it's amusing to poke fun at some of the more extreme cases, it's also important to hold yourself to a high standard when working for your clients. If a client is proposing something that is not possible with the current state of ML as a field, do not try to prey on their ignorance (that WILL come back to bite you). For example, I had one client reach out to me about original content summarization and how they wanted to integrate it into their project. After doing some research, I presented them with the performance results of some of Google Brain's summarization experiments. I told them that even with the resources of Google, those results were still far below human performance on summarization, and that I could not guarantee better performance than the world's state of the art. The client thanked me for my honesty. Obviously I did not get that particular contract, but if I had lied and said it was possible, I would have been faced with an impossible task, one that likely would have resulted in an incomplete project (and it would have taken a long time to get that stain off my reputation). When it comes to expectations, be absolutely transparent.
They say that trust has a half-life of six weeks, but that figure really only applies when you're working in an office environment. If you're doing remote work, trust can have a much shorter half-life: think six days instead of six weeks.
Over
time, as you get new clients and grow your reputation, you will be able
to earn more as a freelancer and transition to more and more
interesting projects.
At
some point, however, you may decide that you prefer something with more
stability. This is a conclusion I eventually came to, even after working with a company like Google as a contractor (I was the very first machine learning contractor the TensorFlow team ever hired). When I did, I decided to take the leap and interview for full-time machine learning engineer positions.
Part 10: Interviewing for Full-time Machine Learning Engineer Positions
This
is by far the most intense part of the machine learning journey.
Interviewing with companies is often much more intense than interviewing
with individual freelance clients (though most companies that hire
freelancers will do pretty thorough interviews for contract work as
well). If you're interviewing with smaller startups, they may be much more flexible with their hiring process (compared to companies like Facebook or Amazon, around which an entire sub-industry has sprung up teaching people how to interview). Regardless of who you're interviewing with, just remember the following general steps.
The first step is to come up with a compelling "why": what do you actually want? Take time to reflect on your own thoughts and motivations. This will allow you to focus on what you are looking for, and will help you answer interview questions about it.
The
next phase is to put together a study plan for your interview. I would
plan for about 3 to 6 months of studying the subjects from earlier in
this post. This assumes you’ve already put together some kind of
portfolio from either projects, or doing freelance work. For this phase,
you should spend at least 2 hours per day studying algorithms and data
structures, as well as additional time for reviewing the requisite math,
machine learning concepts, and statistics. Put together flashcards for important concepts, but make sure to combine them with solving actual coding problems.
Make
sure you put together a resume and portfolio. The resume should be one
page. You can follow the steps from earlier to put together your
portfolio. Once your resume is together, you can start reaching out to
companies.
Sites like Angel.co and VentureLoop can provide listings of openings available at startups. Y Combinator also has a page with job listings for their companies.
Don’t feel like you just need to rely on these listing sites. Ask
friends on social media if they’re aware of companies looking for
machine learning engineers, or perhaps even ask if they know about
specific companies. You can also find technical recruiters for specific
companies by searching “site:linkedin.com <COMPANY NAME> technical
recruiter". It's also possible that, depending on how much freelancing you've done before applying, you may get far more recruiters reaching out to you. This was the case for me: after many months of freelancing for clients like Google, I was getting on average 3.5 messages from recruiters per day. This is one advantage of transitioning from freelancing to full-time work as a machine learning engineer.
Once you've got an interview (or several) with your company of choice, you need to pass the actual interview. In the early stages, there will likely be a lot of behavioral questions: what motivates you, what you would do in a variety of given scenarios, examples of times you've struggled and overcome that struggle. If you pass this part, you'll often come to the technical interview. For the most part, do the technical interview in whichever language is strongest for you. Answering the questions in Python is usually well tolerated, as it's the lingua franca of machine learning. You will likely need to handle both standard data structures and algorithms questions, as well as things like implementing machine learning algorithms such as linear regression or image convolution from scratch. Much like an iOS engineer would be asked about model-view-controller, or a back-end developer would be asked about system design, you're going to be asked a lot about how to approach specific problems in ML. For the machine-learning-specific questions, if you've studied enough of the material referred to in the previous parts of this blog post, you should have some level of preparation. For the algorithms interviews, I recommend practicing every day. Some great resources for this include LeetCode, InterviewCake, and interviewing.io (the last of which provides mock interviews with actual engineers).
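To give a feel for those "from scratch" questions, here's one way linear regression trained by gradient descent might be written in plain NumPy (a sketch of the idea, not a canonical interview answer):

```python
# Linear regression from scratch: fit y ~ Xw + b by gradient descent on
# mean squared error, using only NumPy.
import numpy as np

def linear_regression(X, y, lr=0.1, epochs=500):
    """Return weights w and bias b minimizing mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        error = X @ w + b - y               # prediction residuals
        w -= lr * (2 / n) * (X.T @ error)   # dMSE/dw
        b -= lr * (2 / n) * error.sum()     # dMSE/db
    return w, b

# Sanity check on synthetic data with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)
w, b = linear_regression(X, y)
print(w, b)   # should come out close to [1.5, -2.0, 0.5] and 3.0
```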
The interview process can take a long time. For small companies it may be two weeks or less; for larger companies it can take much longer. Try to set up interviews with companies you are less interested in first, for the sake of practice. It's often the case that someone will interview with 10 companies, and by the 9th have gotten so used to the interview process itself that the 10th ends up being a breeze.
Once you do pass the interview, you will come to the negotiation phase. Don't be afraid to negotiate (there's a pretty compelling overview of how and why you should negotiate in this blog post). You will be surprised at how flexible many companies are. Just make sure you don't try to negotiate AFTER you've already signed an agreement.
Once you’re past the negotiation stage and you’ve accepted an offer, congratulations!
Part 11: Career trajectory and future steps
So you've now got an established career as a machine learning engineer. After months or years in this space, you might begin to ask yourself, "What's next?"
Tenure at tech companies is notoriously short; I believe the average for companies like Google is about 3.2 years, and for many companies it's even less. At some point, as you're figuring out new ways of solving data problems for whatever company or group you're part of, you'll start to wonder what you want to do with your new skills for the next decade or so.
If you're feeling like you want to apply your skills toward the public good, there are many options there as well. Check out Code for America, or CodeFellows if you have your eyes outside the U.S.
Effective
Altruism may also be a good resource. If you cannot decide on a
specific issue, or you prefer to just focus on the fun machine-learning
tasks in front of you, you could always take up the earning-to-give
pledge. Machine Learning Engineers are often high-earners, so you could
do a lot of good by pledging a certain amount to optimal charities.
Whichever path you take, keep in mind that machine learning is one of those areas where you can learn the landscape in a very short amount of time, but true mastery takes much longer. Make sure you keep the attitude of always being a student, always looking to improve, and, no matter how far you get in your ML career, NEVER resting on your laurels (I recommend reading Google's Peter Norvig on this: Teach Yourself Programming in Ten Years).
Part 12: Habits for Improved Productivity & Learning
We’re
not quite done here. It’s worth also listing some general habits that
are important to keep while studying, even after you’ve attained
whatever academic or professional status you were looking for. Learning
machine learning is going to be a marathon, not a sprint. This applies
whether you’re in or out of school.
Get a full night’s sleep
If you follow any advice from this post, even if you ignore the machine learning checklist from earlier, follow this: get your sleep cycle in order. Becoming a machine learning engineer is as much about stamina as it is about speed and efficiency. Not only will your mood and cognitive abilities improve, but you'll have a much better chance of staving off dementia and Alzheimer's in the long term. If you maintain your sleep schedule even as your daily schedule gets more complex, you'll find everything becomes much easier and more satisfying.
I remember a friend of mine recommending Qualia to help with productivity. One of the recommendations was that I use it while getting a full night's sleep. Using Qualia while also getting a full night's sleep definitely yielded interesting results. However, it's unknown how much of that productivity was due to the Qualia and how much was due to the sleep. It's entirely possible that most, if not all, of it was due to sleep, and that this is more of a "stone soup" situation. Nonetheless, if you want to experiment with it with greater rigor than I had time for, go ahead.
Stay away from Social Media
This might be controversial, considering that so many machine learning developers and researchers are active on Twitter, but you should probably limit the amount of time you spend on sites like Facebook. Ask yourself: "When was the last time a news article shared in my feed actually impacted my life?" Quite possibly never. If you're worried about keeping in touch with friends and family, chances are you can give the people closest to you other contact info, like your phone number or email address. Those other connections that are effectively ghosts? You can reconnect with them later if you want. If you don't want to fall into temptation, use a Chrome extension to block your Facebook feed (Messenger is more likely to be worth keeping). Delete Snapchat if you haven't already.
Granted, this is not a universal rule. Twitter can be a useful feed and often surfaces many useful resources; here are some of the people I'm following, whom I highly recommend. Quora is another maybe. Definitely take it in stride: if you can, spend more time answering questions related to deep learning than reading the 1,001st motivational post from another 20-year-old self-proclaimed "millionaire entrepreneur" trying to sell you "5 secrets to becoming just like them" (a possible goal for you: getting a job at Quora and helping them cut down on spam posts).
One of the best ways I've found to deal with the short-term social media withdrawal that comes early on was to replace it with something similar yet more in line with my long-term goals. Specifically, I replaced the time I used to spend on Facebook with time spent on GitHub: finding interesting developers and projects to follow, and cool repos to fork and work on. If you need some more time to fully wean yourself off of your Facebook feed, this substitution approach may help (disclaimer: sample size of n=1; your results may vary).
Eat a healthier diet
Another
important consideration for optimizing your learning is to maintain a
healthy diet. If you’re subsisting on junk food, it’s going to catch up
to you. The sugar rushes and sluggishness are going to hinder you in the
long run (and in many cases, in the short-run as well). If you’re just
eating nothing but the cheapest coffee and ramen that you can get, guess
what, you’re going to get what you pay for (which is not going to be
much at all).
As
a general rule, stay away from carbohydrates. There are many variants
on this strategy (e.g., the increasingly popular ketogenic diet, the
Bulletproof diet, etc.), but the idea is basically the same. If you can
get your body to rely more on proteins and fats for energy than sugars,
you will be less subject to the insulin spikes that can mess with your
energy levels throughout the day, and take you out of the state of flow
and concentration that helps you perform your best.
Of course, going completely cold turkey on anything carbohydrate-related might not be practical given the stress of machine learning work; the temptation for stress-eating can be pretty strong. One compromise might be the "slow-carb" diet that Tim Ferriss famously described. This
approach may sound great, but a word of warning: this approach works
because you’re consuming massive amounts of fiber, i.e., whatever you
eat on your cheat day, be prepared for it to come out the other end in
roughly the same quantity…and probably all at once…the next day. If
you’re mentally prepared for that, go right ahead.