Monday, March 13, 2017

Clickbaits Revisited: Deep Learning on Title + Content Features to Tackle Clickbaits

https://www.linkedin.com/pulse/clickbaits-revisited-deep-learning-title-content-features-thakur

Principal Data Scientist @ Productsup

Some time back I wrote an article about detecting clickbaits (https://www.linkedin.com/pulse/identifying-clickbaits-using-machine-learning-abhishek-thakur). The article received a very good response, and also a lot of criticism: some people said I should have taken the website content into account, some wanted more samples from different sources, and some wanted me to try deep learning methods.
In this article, I will address all those issues and take clickbait detection to the next level.

No Free Lunch

After going through my own Facebook feed, I found that clickbaits cannot be classified just by looking at the title of a post; whether something is clickbait also depends on the content. If the content of the page is highly relevant to the title, the post shouldn’t be classified as clickbait. That said, it is very difficult to define what an actual clickbait is.
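Since the approach here combines title and content features, below is a minimal sketch of what such a two-branch deep learning model could look like in Keras. The vocabulary size, sequence lengths, and layer sizes are illustrative assumptions, not the article's exact architecture:

```python
# A sketch of a two-branch clickbait classifier: the title and the page
# content are embedded and encoded separately, then concatenated before
# the final prediction. All hyperparameters below are assumptions.
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

VOCAB_SIZE = 20000   # assumed vocabulary size
TITLE_LEN = 20       # assumed max tokens per title
CONTENT_LEN = 300    # assumed max tokens per page content

title_in = Input(shape=(TITLE_LEN,), name="title")
content_in = Input(shape=(CONTENT_LEN,), name="content")

embed = Embedding(VOCAB_SIZE, 128)           # shared word embeddings
title_vec = LSTM(64)(embed(title_in))        # encode the headline
content_vec = LSTM(64)(embed(content_in))    # encode the page body

merged = concatenate([title_vec, content_vec])
output = Dense(1, activation="sigmoid")(merged)  # P(clickbait)

model = Model(inputs=[title_in, content_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```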
Let’s look at some real posts on Facebook.

Friday, March 10, 2017

Welcome to the Self-Driving Car Challenge 2017

https://challenge.udacity.com/


The Challenge

One of the most important aspects of operating an autonomous vehicle is understanding the surrounding environment in order to make safe decisions. Udacity and Didi Chuxing are partnering to give students an incentive to come up with the best way to detect obstacles using camera and LIDAR data. This challenge will enable pedestrian, vehicle, and general obstacle detection that is useful to both human drivers and self-driving car systems.
Competitors will need to process LIDAR and camera frames to output a set of obstacles, removing noise and environmental returns. Participants will be able to build on the large body of work that has gone into the Kitti datasets and challenges, using existing techniques and their own novel approaches to improve on the current state of the art.
Specifically, students will be competing against each other on the Kitti Object Detection Evaluation Benchmark. While a leaderboard already exists for academic publications, Udacity and Didi will be hosting our own leaderboard specifically for this challenge, and we will be using the standard object detection development kit, which lets us evaluate approaches the same way they are evaluated in academia and industry.
New datasets for both testing and training will be released in a format that adheres to the Kitti standard, and participants will be able to use all of the associated tools to process and evaluate their own approaches. While Udacity is currently producing datasets for the challenge, all participants can get started today using the existing Kitti data.

Round 1 - Vehicles

The first round will provide data collected from sensors on a moving car, and competitors must identify the distance and estimated orientation of multiple stationary and moving obstacles.

Round 2 - Vehicles, Pedestrians

The second round will also challenge participants with identifying pedestrians.

Data/Inputs

Training data will mirror the Kitti Datasets, enabling all Kitti data to be used by competitors in order to train and refine their models. Below is a detailed description of the data available, retrieved from the Kitti Raw Data site.
The dataset comprises the following information, captured and synchronized at 10 Hz:
  • Raw (unsynced+unrectified) and processed (synced+rectified) grayscale stereo sequences (0.5 Megapixels, stored in png format)
  • Raw (unsynced+unrectified) and processed (synced+rectified) color stereo sequences (0.5 Megapixels, stored in png format)
  • 3D Velodyne point clouds (100k points per frame, stored as binary float matrix)
  • 3D GPS/IMU data (location, speed, acceleration, meta information, stored as text file)
  • Calibration (Camera, Camera-to-GPS/IMU, Camera-to-Velodyne, stored as text file)
  • 3D object tracklet labels (cars, trucks, trams, pedestrians, cyclists, stored as xml file)
Here, "unsynced+unrectified" refers to the raw input frames where images are distorted and the frame indices do not correspond, while "synced+rectified" refers to the processed data where images have been rectified and undistorted and where the data frame numbers correspond across all sensor streams. For both settings, files with timestamps are provided. Most people require only the "synced+rectified" version of the files.

Requirements

Using the given data, competitors must:
  • Automatically detect and locate obstacles in 3D space to inform the driver/SDC system (e.g. using deep learning and classification approaches)
  • Fuse detection output from camera and LIDAR sensors (see the projection sketch after this list)
  • Remove noise and environmental false detections
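A common first step toward the fusion requirement is projecting Velodyne points into the camera image using the Kitti calibration matrices. A minimal sketch, assuming the calib-file matrices have already been read into NumPy arrays (P2 as 3x4; R0_rect and Tr_velo_to_cam padded to 4x4 homogeneous form):

```python
# Project LIDAR points into the left color camera image so that camera
# detections and LIDAR returns can be associated with each other.
import numpy as np

def project_velo_to_image(points_xyz, P2, R0_rect, Tr_velo_to_cam):
    """points_xyz: (N, 3) Velodyne points -> (N, 2) pixel coordinates."""
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))]).T  # (4, N) homogeneous
    cam = P2 @ R0_rect @ Tr_velo_to_cam @ pts_h         # (3, N) on image plane
    return (cam[:2] / cam[2]).T                         # divide out depth

# In practice, filter to points with positive depth (in front of the camera)
# before projecting, and drop pixels that fall outside the image bounds.
```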

Evaluation and Judging Criteria

Student submissions will be automatically evaluated using the method put forth by Kitti in their CVPR 2012 publication, which uses the PASCAL criteria for object detection and orientation estimation performance. For ranking purposes, we will be using the “Object Detection and Orientation Estimation Evaluation” at the “Moderate” difficulty, whose evaluation parameters are specified on the Object Detection Evaluation Benchmark page. Currently, Deep MANTA holds the “Moderate” record with 89.73% performance. Extensive documentation on getting started with the dataset format and the evaluation procedure is available on the Kitti object detection evaluation page, and specifically within the Development Kit, which is available here.
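For intuition, the PASCAL criterion counts a detection as a true positive when its 2D box overlaps a ground-truth box by at least a per-class intersection-over-union threshold (0.7 for cars in the Kitti evaluation). A minimal sketch of that overlap check:

```python
# PASCAL-style overlap check used by the Kitti evaluation: a detection
# matches a ground-truth box when their intersection-over-union (IoU)
# meets the class threshold.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2) in image coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

CAR_MIN_OVERLAP = 0.7  # Kitti's IoU threshold for the car class
print(iou((0, 0, 10, 10), (2, 2, 12, 12)))  # 0.47 -> not a match for cars
```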

Prizes

First Place – US$100,000 Cash Prize
Second Place – US$3,000 Cash Prize
Third Place – US$1,500 Cash Prize
Top 5 Teams – Airfare and hotel accommodation for two representatives from each team to travel from their place of residence to the award ceremony in Silicon Valley, plus a chance to run code on the Udacity self-driving car.

Timeline

The submission deadlines are noted below. All start dates begin at 12:00 AM PST, and all end dates/deadlines close at 11:59 PM PST on the noted dates.

March 8 – March 21 — Competitor and Team Registration

Competition platform opens for account and team registration
Competitors can register with Udacity accounts or create a new account
Team leaders can recruit team members through the Forum

March 22 – April 21 — Round 1

Data set for Round 1 is released
New user and team registration closes on April 21

April 22 – April 30 — Round 1 Evaluation

Top 75 teams will be asked to submit runnable code
Code will be spot-checked to prevent fraudulent submissions
Of that group, the top 50 qualified teams will progress to the next round

May 1 – May 31 — Round 2

Data set for Round 2 is released
Teams will no longer be able to add or remove members after May 21

Jun 1 – Jun 14 — Finalist Evaluation

Top teams required to submit identity verification documents and runnable code
Code will be evaluated and output compared against scores on final leaderboard
Top 5 teams will be invited to attend final award ceremony at Udacity headquarters in Mountain View, California

Jun 15 – July 12 — Travel arrangements

Five-week break for teams to arrange visas and travel; Udacity will help with these arrangements

Jul 12 — Final Award Ceremony

Top 5 teams present their solutions to a panel of Udacity and DiDi executives and have the chance to run their code on Udacity’s self-driving car

Rules

In addition to the Contest Rules, the following administrative rules shall apply to the Contest:

Eligibility

See the Contest Rules for all terms of eligibility and any restrictions.

Teams

  • There is no limit on the number of members per team, but all team members must meet the eligibility requirements to participate and win.
  • Team names shall not include any terms that violate intellectual property rights, rights of privacy or publicity, or other applicable laws and regulations.
  • Each team will have an appointed team captain authorized to receive communications.
  • Participant registration will close on April 21 at 11:59 PM PST.
  • Team leaders can invite new members until May 21 at 11:59 PM PST.
  • Participants are not allowed to join more than one team at the same time.

Submissions

  • Solutions must run in the provided ROS framework (Ubuntu 14.04, ROS Indigo).
  • Submissions will be considered ineligible if they were developed using code containing or depending on software that is not approved by the Open Source Initiative, or that is under a license prohibiting commercial use.
  • Code must demonstrably run in real time, meaning at least 10 Hz on our evaluation platform with a Titan X and an i7 (see the timing sketch after this list).
  • Each submission will take 1–2 minutes to submit and process; this throttling prevents teams from using automated submissions to guess the solution and artificially inflate their scores.
  • All code submitted will be open-sourced, and there should be no expectation of maintaining exclusive IP over submitted code.
  • No hand-labelling of the test dataset is allowed.
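As a rough self-check against the real-time requirement, teams can time their pipeline over a batch of frames before submitting. A minimal sketch, where process_frame is a hypothetical placeholder for a team's detector:

```python
# Time the per-frame pipeline and compare the achieved rate against the
# 10 Hz real-time bar. process_frame is a hypothetical placeholder.
import time

def process_frame(frame):
    pass  # a team's detection pipeline would go here

def measure_rate(frames):
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    hz = len(frames) / (time.perf_counter() - start)
    print("%.1f Hz (%s)" % (hz, "OK" if hz >= 10 else "below the 10 Hz bar"))
```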

Prizes

  • To qualify as finalists and to receive any prizes, teams must:
    • Submit runnable code after the close of Round 1, which will be spot-checked to ensure that teams progressing to the final round have not violated any Contest rules.
    • Submit runnable code (with documentation and a description of the resources/dependencies required to run the solution) with reproducible results at the close of Round 2. If running the code produces output and a score that differ significantly from the last submission and leaderboard score, that team will be disqualified.
    • Submit completed, signed, and returned documentation required by Udacity, including but not limited to an Affidavit of Eligibility, a release of liability (except where prohibited), a publicity release form, and a completed IRS W-9 or W-8BEN form, from each team member at the close of Round 2.
  • Prizes shall be awarded equally and pro rata to all registered team members who meet eligibility requirements and complete Participant Release forms.
  • Udacity cannot provide any guarantees that visas will be available from all jurisdictions and to all individuals.
  • No cash value for the travel expense subsidy will be awarded to teams that do not travel to attend the final award ceremony.
  • “Top 5 teams” includes the first place team.
  • At the end of Round 1, teams with the top 50 scores will move on to Round 2.
  • At the end of Round 2, in the event that teams have tied scores, the run time of the code submitted to Udacity will be used to determine the winning team.

Terms & Conditions

For the complete set of Terms & Conditions, see here.