Developer

Machine Learning is going viral...

❝Using #DeepLearning when all you needed was a few if statements. #MachineLearning #DataScience❞ —https://twitter.com/randal_olson/status/927157485240311808

Agenda Machine Learning Domain Dataset

Machine Learning

Artificial Intelligence Machine Learning Deep Learning

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

General AI Human characteristics

AI Winter

Narrow AI Specific tasks

Facebook alt="Image may contain: ocean, sky, bridge, cloud, outdoor, water and nature"

https://www.facebook.com/MaxIsDrawing/photos/a.182284588581885/912506332226370/?type=3&permPage=1

❝I made an AI watch 105 Tour de France races along with viewer numbers to make it simulate the most interesting Tour de France race possible. This is what it made. #ai #ml #machinelearning #artificialintelligence #wat❞ —https://twitter.com/thetafferboy/status/1039519923150630912

PS: A lot of Chatbots are not AI

❝Having been introduced to the wonders of Visual Chatbot by @JanelleCShane's AI experiments, I decided to see what it would think of two-headed snakes. As it turns out: mostly bananas. shutdowntheAI❞ —https://twitter.com/AmeliaRMellor/status/1031006460598149121

❝Alice: I love stateless protocols! Bob: There has to be something bad about them. Alice: Bad about what?❞ —https://twitter.com/znjp/status/933405548678021120

Machine Learning Algorithms parse data → learn from it → make a determination or prediction "Trained" machine

❝A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.❞ (Tom M. Mitchell, Machine Learning, 1997)

❝"Machine Learning is an emerging tech!" Logistic regression 1958 Hidden Markov Model 1960 Support Vector Machine 1963 k-nearest neighbors 1967 Artificial Neural Networks 1975 Expectation Maximization 1977 Decision tree 1986 Q-learning 1989 Random forest 1995❞ —https://twitter.com/farbodsaraf/status/977916871000412160

https://twitter.com/algorithmia/status/1009486664933052416

❝But saying "powered by AI" is like saying you’re "powered by the internet" or "powered by computer code". By itself, it means nothing.❞ —https://twitter.com/jensenharris/status/999119292086960128

Learning Regression Ranking Clustering

https://twitter.com/ShawnWildermuth/status/932724124237123584

For children and machines Watch your language

Statistics 101: Linear Regression
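
To make the "Statistics 101" point concrete, here is a minimal sketch of ordinary least squares for a single feature, using only the Python standard library. The function name `fit_line` and the sample data are made up for illustration; the data lies exactly on y = 2x + 1 so the result is easy to check.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance and variance sums (the 1/n factors cancel out).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

No framework is needed for this: the whole "model" is two numbers.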

❝We are leveraging machine learning.❞

https://twitter.com/LesGuessing/status/997146590442799105

Supervised Learning Input features and output labels are defined
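
As one possible illustration of "input features and output labels are defined", the sketch below uses a 1-nearest-neighbor classifier. The features (requests per second, error rate), the labels, and the function name `predict_1nn` are hypothetical and chosen only for this example; a real setup would also scale the features first.

```python
import math

def predict_1nn(train, point):
    """Return the label of the training example closest to `point`."""
    nearest = min(train, key=lambda example: math.dist(example[0], point))
    return nearest[1]

# Hypothetical labeled data: (requests per second, error rate) -> label.
train = [
    ((200, 0.01), "ok"),
    ((210, 0.02), "ok"),
    ((900, 0.40), "overloaded"),
    ((950, 0.35), "overloaded"),
]

print(predict_1nn(train, (220, 0.03)))
print(predict_1nn(train, (880, 0.30)))
```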

Unsupervised Learning Unlabeled dataset Discover hidden relationships
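
A small sketch of discovering structure without labels: k-means with two clusters on one-dimensional data. The values and the function name `kmeans_1d` are assumptions for illustration, and the centroids are seeded with the minimum and maximum to keep the run deterministic.

```python
def kmeans_1d(values, iterations=10):
    """Two-cluster k-means on plain numbers; returns (centroids, clusters)."""
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        # Assignment step: each value joins its nearest centroid.
        for v in values:
            idx = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[idx].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled measurements with two obvious groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans_1d(values)
print(centroids, clusters)
```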

https://xkcd.com/882/

Reinforcement Learning Feedback loop to optimize some parameter
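
The feedback loop can be sketched with an epsilon-greedy multi-armed bandit: try an action, observe a reward, update the estimate, repeat. The reward probabilities, the seed, and the name `run_bandit` are hypothetical; this is a toy, not a claim about how any production system works.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit; returns (estimated values, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_probs))  # explore
        else:
            arm = max(range(len(values)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

# Hypothetical arms: the second one pays out far more often.
values, counts = run_bandit([0.2, 0.8])
print(values, counts)
```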

Deep Learning Neural network producing a probability vector Lots of training and parallelization
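
The "probability vector" is typically produced by a softmax over the network's last layer. A minimal sketch, with made-up input scores:

```python
import math

def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The largest score gets the largest probability, but every class keeps a non-zero share.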

https://www.youtube.com/watch?v=bxe2T-V8XRs

Access to a unique data set is inherently valuable

❝"What's the difference between AI and ML?" "It's AI when you're raising money, it's ML when you're trying to hire people."❞ —https://twitter.com/WAWilsonIV/status/925599712849174528

Domain

Patterns Trend (stationary) Cyclical Seasonal Irregular
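
One way to separate these patterns is a classical additive decomposition: a centered moving average estimates the trend, and averaging the detrended values per phase estimates the seasonal part. The sketch below builds a synthetic series (linear trend plus a weekly pattern that sums to zero, no irregular component) so the recovered pieces can be checked; all numbers and names are assumptions.

```python
PERIOD = 7
pattern = [3, 1, 0, -1, -2, -2, 1]  # hypothetical weekly pattern, sums to 0
series = [10 + 0.5 * t + pattern[t % PERIOD] for t in range(28)]

def moving_average(values, window):
    """Centered moving average; the result starts at index window // 2."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

half = PERIOD // 2
trend = moving_average(series, PERIOD)  # trend[i] belongs to t = i + half
detrended = [series[i + half] - trend[i] for i in range(len(trend))]

# Average the detrended values per position in the period.
seasonal = []
for phase in range(PERIOD):
    vals = [d for i, d in enumerate(detrended) if (i + half) % PERIOD == phase]
    seasonal.append(sum(vals) / len(vals))

print(trend[:3], seasonal)
```

Whatever is left after subtracting trend and seasonal parts is the irregular component, which is where anomalies hide.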

Anomaly Point Anomalies Contextual Anomalies Collective Anomalies
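
For point anomalies, a simple sketch is the modified z-score based on the median and the median absolute deviation (MAD), which a single extreme value cannot distort the way it distorts mean and standard deviation. The data and the function name are hypothetical; 3.5 is a commonly used cutoff, not a universal rule. Contextual and collective anomalies need more than this.

```python
from statistics import median

def point_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against in this simple sketch
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]

# Hypothetical response times with one obvious spike.
values = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
print(point_anomalies(values))
```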

Breakouts Mean Shift Ramp Up
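
A mean shift can be located with a brute-force sketch: try every split point and keep the one where the means before and after differ most. The series and the name `find_mean_shift` are made up for illustration; ramp-ups and noisy data call for more careful methods.

```python
from statistics import mean

def find_mean_shift(series, min_size=2):
    """Return (split index, gap) where the before/after means differ most."""
    best_split, best_gap = None, 0.0
    for split in range(min_size, len(series) - min_size + 1):
        gap = abs(mean(series[:split]) - mean(series[split:]))
        if gap > best_gap:
            best_split, best_gap = split, gap
    return best_split, best_gap

# Hypothetical metric that jumps from about 10 to about 20.
series = [10, 11, 9, 10, 11, 10, 20, 21, 19, 20, 21, 20]
split, gap = find_mean_shift(series)
print(split, gap)
```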

Anomaly Detection with Machine Learning Supervised Learning Unsupervised Learning

Examples IT operations: Spiking 500s Security analytics: Unusual DNS activity Business analytics: Rare log message

Visual Inspection Complex, fast moving data Humans not made to stare at graphs Easy to miss

Where is the Anomaly?

Static Rules Definition False positives & negatives Tuning and adjustment

Which threshold?
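
The threshold question can be made tangible by counting false positives and false negatives for a few candidate values. The samples below (value, whether it really was an anomaly) are hypothetical.

```python
def evaluate_threshold(samples, threshold):
    """Return (false positives, false negatives) for a static rule."""
    fp = sum(1 for value, is_anomaly in samples
             if value > threshold and not is_anomaly)
    fn = sum(1 for value, is_anomaly in samples
             if value <= threshold and is_anomaly)
    return fp, fn

# Hypothetical labeled history: (metric value, was it really an anomaly?).
samples = [
    (120, False), (150, False), (300, True), (180, False),
    (90, False), (260, True), (210, False),
]

for threshold in (100, 200, 250, 280):
    print(threshold, evaluate_threshold(samples, threshold))
```

Too low and the rule pages you for normal traffic; too high and it misses real incidents. And the right value moves as the system changes.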

Machine learning

❝OH: "Do you run any CPU intensive application on your laptop? Like, machine learning, or Slack?"❞ —https://twitter.com/jpetazzo/status/932464823530430464

Frameworks TensorFlow Keras scikit-learn ...

How to build ML pipelines? ETL Data storage Optimization algorithms

❝I see you expected clean data. That's cute.❞

Model Baseline: What is normal?

Unsupervised

Evolves "Online" model learns continuously and ages out data
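
One way to get a model that "learns continuously and ages out data" is an exponentially weighted mean and variance: every new value updates the baseline, and old values fade with each step. The class name, the `alpha` value, and the data are assumptions for this sketch, not a description of any specific product.

```python
import math

class OnlineBaseline:
    """Exponentially weighted mean/variance; older data fades out."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # higher alpha forgets faster
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            self.mean = float(x)
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)

    def score(self, x):
        """Distance from the baseline in standard deviations."""
        std = math.sqrt(self.var)
        return abs(x - self.mean) / std if std > 0 else 0.0

baseline = OnlineBaseline()
for i in range(50):
    baseline.update(99 if i % 2 == 0 else 101)  # hypothetical normal traffic
spike_score = baseline.score(150)
normal_score = baseline.score(100)

for _ in range(100):
    baseline.update(200)  # traffic settles at a new level
print(spike_score, normal_score, baseline.mean)
```

After the level change the baseline has moved to the new normal, which a static rule would never do on its own.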

Single Time Series Example: Unusual traffic?

Multiple Time Series Multiple metrics or single metric split up Each series modeled independently Example: Unusual activity by country?
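
"Each series modeled independently" can be sketched as one baseline per key. Below, a value of 40 is unremarkable for one country and highly unusual for another. The country codes, numbers, and the 3-sigma limit are hypothetical.

```python
from statistics import mean, stdev

def unusual_series(history, latest, z_limit=3.0):
    """Flag keys whose latest value is far from that key's own history."""
    flagged = []
    for key, past in history.items():
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(latest[key] - mu) / sigma > z_limit:
            flagged.append(key)
    return flagged

# Hypothetical requests per minute, split by country.
history = {
    "DE": [100, 105, 95, 102, 98],
    "IN": [10, 12, 9, 11, 10],
}
latest = {"DE": 110, "IN": 40}
print(unusual_series(history, latest))
```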

Dataset

nginx access log

{
  "source": "/home/ec2-user/data/production-4/prod4elasticlog/_logs/access-logs541.log",
  "beat": {
    "hostname": "ip-172-31-5-206",
    "name": "ip-172-31-5-206",
    "version": "5.4.0"
  },
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "read_timestamp": "2017-06-20T08:49:58.538Z",
  "fileset": { "name": "access", "module": "nginx" },
  "nginx": {
    "access": {
      "body_sent": { "bytes": "3262" },
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": {
        "continent_name": "Asia",
        "country_iso_code": "IN",
        "location": { "lat": 20, "lon": 77 }
      },
      "response_code": "200",
      "user_agent": {
        "device": "Other",
        "os_name": "Other",
        "os": "Other",
        "name": "Other"
      },
      "http_version": "1.1",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  },
  "prospector": { "type": "log" }
}

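Before any model sees this event, the interesting fields have to be pulled out and typed. A minimal sketch with the standard `json` module, using a trimmed copy of the event above; the function name `extract_features` and the choice of fields are assumptions. Note that the status code and byte count arrive as strings.

```python
import json

# Trimmed copy of the access log event shown on the slide.
raw_event = """
{
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "nginx": {
    "access": {
      "body_sent": { "bytes": "3262" },
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": { "country_iso_code": "IN" },
      "response_code": "200",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  }
}
"""

def extract_features(event):
    """Pick the fields a model might use and convert numeric strings."""
    access = event["nginx"]["access"]
    return {
        "timestamp": event["@timestamp"],
        "country": access["geoip"]["country_iso_code"],
        "status": int(access["response_code"]),
        "bytes": int(access["body_sent"]["bytes"]),
        "remote_ip": access["remote_ip"],
    }

features = extract_features(json.loads(raw_event))
print(features)
```
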
Most of the internet went down

PS: When everything is on fire, nobody cares about your downloads

Counterfactual Reasoning Which host / IP / ... is involved in the anomaly
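
A toy version of the counterfactual question: "would this time bucket still look anomalous if this IP had not been there?" Any IP whose removal brings the count back under the limit is a likely contributor. The IPs, the limit of 20 requests, and the function name are hypothetical; this is a sketch of the idea, not of a specific implementation.

```python
from collections import Counter

def influencers(events, threshold):
    """IPs whose removal alone would make the bucket look normal again."""
    total = len(events)
    counts = Counter(events)
    return [
        ip for ip, count in counts.most_common()
        if total > threshold >= total - count
    ]

# Hypothetical bucket: one IP is responsible for most requests.
events = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 4 + ["10.0.0.9"] * 90
print(influencers(events, threshold=20))
```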

Combine Multiple Models

Correlation ≠ causation

https://xkcd.com/552/

https://xkcd.com/925/

Common problems Correlated features will mess up any model
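
A quick check before modeling is to compute the pairwise Pearson correlation and look at pairs close to +1 or -1. In the sketch below, `bytes` is exactly ten times `requests`, so it adds no new information. The feature names, values, and the 0.9 limit are made up for illustration.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_pairs(features, limit=0.9):
    """Return feature name pairs whose absolute correlation exceeds `limit`."""
    return [
        (a, b) for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) > limit
    ]

# Hypothetical features per time bucket.
features = {
    "requests": [100, 120, 90, 150, 130],
    "bytes": [1000, 1200, 900, 1500, 1300],
    "latency": [30, 25, 40, 28, 35],
}
print(correlated_pairs(features))
```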

Common problems Throw out most features if they are just noise

More features Future predictions

Conclusion

Agenda Machine Learning Domain Dataset

Rules of Machine Learning: Best Practices for ML Engineering http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

43 rules Rule #1: Don’t be afraid to launch a product without machine learning Rule #14: Starting with an interpretable model makes debugging easier Rule #16: Plan to launch and iterate

Machine Learning ohne Hype (Machine Learning without Hype) Philipp Krenn @xeraa