Machine Learning without Hype, Philipp Krenn (@xeraa)

Developer

Machine Learning is going viral...

❝ Using #DeepLearning when all you needed was a few if statements. #MachineLearning #DataScience ❞ —https://twitter.com/randal_olson/status/927157485240311808

Agenda: Machine Learning, Domain, Dataset

Machine Learning

Artificial Intelligence, Machine Learning, Deep Learning

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

General AI: human characteristics

AI Winter

Narrow AI: specific tasks

Facebook alt text: "Image may contain: ocean, sky, bridge, cloud, outdoor, water and nature"

PS: A lot of chatbots are not AI

❝ Alice: I love stateless protocols! Bob: There has to be something bad about them. Alice: Bad about what? ❞ — https://twitter.com/znjp/status/933405548678021120

Machine Learning: algorithms parse data → learn from it → make a determination or prediction. The result is a "trained" machine.

❝ A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. ❞ (Tom Mitchell, 1997)

❝ "Machine Learning is an emerging tech!" Logistic regression 1958, Hidden Markov Model 1960, Support Vector Machine 1963, k-nearest neighbors 1967, Artificial Neural Networks 1975, Expectation Maximization 1977, Decision tree 1986, Q-learning 1989, Random forest 1995 ❞ — https://twitter.com/farbodsaraf/status/977916871000412160

https://twitter.com/algorithmia/status/1009486664933052416

❝ But saying "powered by AI" is like saying you’re "powered by the internet" or "powered by computer code". By itself, it means nothing. ❞ — https://twitter.com/jensenharris/status/999119292086960128

Learning: regression, ranking, clustering

https://twitter.com/ShawnWildermuth/status/932724124237123584

For children and machines: watch your language

Statistics 101: Linear Regression

❝ We are leveraging machine learning. ❞

https://twitter.com/LesGuessing/status/997146590442799105
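Linear regression, the Statistics 101 method behind many "we are leveraging machine learning" claims, fits a line y = intercept + slope · x by ordinary least squares. A minimal sketch in plain Python, assuming paired numeric observations; the data points are made up for illustration:

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x values must not all be equal")
    slope = cov_xy / var_x
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```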

Supervised Learning: input features and output labels are defined

Unsupervised Learning: unlabeled dataset; discover hidden relationships

https://xkcd.com/882/
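Unsupervised learning in a nutshell: there are no labels, and the algorithm groups the data on its own. A minimal sketch of k-means clustering (Lloyd's algorithm) on one-dimensional data; the response times and the initial centers are hypothetical:

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], centers: List[float], iterations: int = 10) -> List[float]:
    """Lloyd's k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in centers]
        # Assignment step: each value joins its nearest center
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members (keep empty ones in place)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical response times in ms: nobody labeled them "fast" or "slow"
print(kmeans_1d([10, 12, 11, 95, 100, 105], [0.0, 50.0]))  # [11.0, 100.0]
```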

Reinforcement Learning: a feedback loop to optimize some parameter
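A feedback loop in its simplest form is the multi-armed bandit: try an action, observe a reward, update the estimate, repeat. The epsilon-greedy sketch below is a stand-in for reinforcement learning in general, not a full RL algorithm; the reward values, noise level, and parameters are assumptions:

```python
import random
from typing import List


def epsilon_greedy(rewards: List[float], steps: int = 1000, epsilon: float = 0.1,
                   seed: int = 42) -> List[int]:
    """Learn which action pays best purely from feedback; returns pull counts per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current value estimate per action
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore: random action
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        # Noisy feedback from the (hypothetical) environment
        reward = rewards[action] + rng.gauss(0, 0.05)
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward
        estimates[action] += (reward - estimates[action]) / counts[action]
    return counts


print(epsilon_greedy([0.2, 0.5, 0.8]))
```

Most pulls should end up on the last action, the one with the highest hidden reward.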

Deep Learning: a neural network producing a probability vector; lots of training and parallelization

https://www.youtube.com/watch?v=bxe2T-V8XRs
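The "probability vector" at the end of a classification network usually comes from a softmax layer, which maps raw scores (logits) to non-negative values that sum to 1. A minimal sketch with made-up logits:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw network outputs (logits) into a probability vector."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([2.0, 1.0, 0.1]))  # roughly [0.659, 0.242, 0.099]
```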

Access to a unique data set is inherently valuable

❝ "What's the difference between AI and ML?" "It's AI when you're raising money, it's ML when you're trying to hire people." ❞ —https://twitter.com/WAWilsonIV/status/925599712849174528

Domain

Patterns: trend (stationary), cyclical, seasonal, irregular
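One classical way to separate these patterns is an additive decomposition, series = trend + seasonal + irregular. The sketch below uses a centered moving average for the trend and folds any longer cyclical movement into the trend. It assumes an odd period and is only an illustration, not what any particular anomaly detection product does:

```python
from typing import List, Optional, Sequence, Tuple


def decompose(series: Sequence[float], period: int
              ) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical additive decomposition: series = trend + seasonal + irregular."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n, half = len(series), period // 2
    if n < 2 * period - 1:
        raise ValueError("need enough data to see every seasonal position")
    # Trend: centered moving average over one full period (undefined at the edges)
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period
    # Seasonal: average detrended value at each position within the period
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    pattern = [s - mean_raw for s in raw]  # center so the pattern sums to zero
    seasonal = [pattern[t % period] for t in range(n)]
    # Irregular: whatever trend and seasonality do not explain
    irregular = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, irregular


# Hypothetical series: linear trend plus a repeating [1, -2, 1] pattern
series = [0.5 * t + [1.0, -2.0, 1.0][t % 3] for t in range(12)]
trend, seasonal, irregular = decompose(series, 3)
```

On this noise-free input the irregular part comes out as (numerically) zero; on real traffic, the irregular part is where anomalies show up.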

Anomaly types: point anomalies, contextual anomalies, collective anomalies
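Point anomalies are the easiest case: a single value far from the rest. A minimal z-score sketch, assuming roughly normal data; the threshold of 3 standard deviations is a common rule of thumb, not a recommendation. Contextual anomalies (a normal value at the wrong time) and collective anomalies (a suspicious sequence) need more context than this:

```python
import statistics
from typing import List, Sequence


def point_anomalies(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical metric: steady at 10, one spike to 100
print(point_anomalies([10] * 20 + [100]))  # [20]
```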

Breakouts: mean shift, ramp up
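A mean shift can be located by comparing the average of the points before and after each candidate position and picking the largest jump. A sketch, with the window size and minimum jump as hypothetical parameters; a ramp up would need a slope-based test instead:

```python
from typing import Optional, Sequence


def find_mean_shift(values: Sequence[float], window: int = 5,
                    min_jump: float = 10.0) -> Optional[int]:
    """Index where the mean of the next `window` points differs most from the mean of
    the previous `window` points, or None if the largest jump is below `min_jump`."""
    best_t, best_jump = None, 0.0
    for t in range(window, len(values) - window + 1):
        before = sum(values[t - window:t]) / window
        after = sum(values[t:t + window]) / window
        jump = abs(after - before)
        if jump > best_jump:
            best_t, best_jump = t, jump
    return best_t if best_jump >= min_jump else None


# Hypothetical metric that moves from a level of 20 to a level of 50
print(find_mean_shift([20.0] * 10 + [50.0] * 10))  # 10
```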

Anomaly Detection with Machine Learning: supervised learning, unsupervised learning

Examples
IT operations: spiking 500s
Security analytics: unusual DNS activity
Business analytics: rare log messages

Visual inspection: complex, fast-moving data; humans are not made to stare at graphs; anomalies are easy to miss

Where is the Anomaly?

Static rules: definition, false positives & negatives, tuning and adjustment

Which threshold?
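Why a single threshold is hard to pick: on traffic with a daily rhythm, a static rule fires on every normal daytime peak (false positives) while missing a spike that is huge for 03:00 but small in absolute terms (false negatives). A sketch with made-up hourly request counts:

```python
from typing import List, Sequence


def static_rule(values: Sequence[float], threshold: float) -> List[int]:
    """Alert on every point above a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical requests per hour: quiet nights, busy days, repeated for two days
day = [5, 5, 5, 5, 5, 5, 20, 60, 90, 100, 100, 95,
       90, 95, 100, 100, 90, 70, 50, 30, 20, 10, 5, 5]
traffic = day + day
traffic[27] = 60  # 03:00 on day two: 12x the usual night load, a real anomaly

alerts = static_rule(traffic, threshold=80)
print(len(alerts), 27 in alerts)  # 18 False
```

All 18 alerts are ordinary daytime hours, and the one real anomaly is missed. Lowering the threshold far enough to catch it would also alert on every ordinary morning ramp, which is exactly the tuning and adjustment problem.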

Machine learning

❝ OH: "Do you run any CPU intensive application on your laptop? Like, machine learning, or Slack?" ❞ — https://twitter.com/jpetazzo/status/932464823530430464

Frameworks: TensorFlow, Keras, scikit-learn, ...

How to build ML pipelines? ETL, data storage, optimization algorithms

❝ I see you expected clean data. That's cute. ❞

Model Baseline: What is normal?

Unsupervised

Evolves: an "online" model learns continuously and ages out old data
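A baseline of "what is normal" that learns continuously and ages out old data can be sketched with an exponentially weighted mean and variance: every update shrinks the weight of all previous observations by a decay factor. The class name and the decay value are assumptions for illustration:

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineBaseline:
    """Exponentially weighted mean/variance; older observations gradually age out."""
    decay: float = 0.9  # hypothetical value; closer to 1.0 means a longer memory
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0

    def score(self, x: float) -> float:
        """How many standard deviations x is from the current baseline."""
        if self.seen < 2 or self.var == 0:
            return 0.0  # not enough history to judge yet
        return abs(x - self.mean) / math.sqrt(self.var)

    def update(self, x: float) -> None:
        """Fold one new observation into the baseline."""
        if self.seen == 0:
            self.mean = x
        else:
            alpha = 1.0 - self.decay
            diff = x - self.mean
            incr = alpha * diff
            self.mean += incr
            self.var = self.decay * (self.var + diff * incr)
        self.seen += 1


baseline = OnlineBaseline()
for i in range(100):
    baseline.update(10.0 if i % 2 == 0 else 12.0)  # normal: oscillating around 11
print(baseline.score(50.0) > 10, baseline.score(11.0) < 1)  # True True
```

If the metric permanently moves to a new level, the decay lets the baseline follow it within a few dozen updates instead of flagging the new level forever.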

Single time series. Example: unusual traffic?

Multiple time series: multiple metrics, or a single metric split up; each series is modeled independently. Example: unusual activity by country?
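Splitting one metric by country and modeling each series independently means a value is judged against its own history: 200 requests per hour may be noise for one country and a huge spike for another. A sketch using a per-series z-score; the function name, the partitions, and the counts are hypothetical:

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unusual_by_partition(events: Iterable[Tuple[str, str, int]],
                         threshold: float = 3.0) -> Dict[str, List[str]]:
    """events: (partition, time_bucket, count). Each partition gets its own baseline;
    returns the buckets whose count is unusual *for that partition*."""
    series: Dict[str, Dict[str, int]] = defaultdict(dict)
    for partition, bucket, count in events:
        series[partition][bucket] = count
    result: Dict[str, List[str]] = {}
    for partition, counts in series.items():
        mean = statistics.mean(counts.values())
        stdev = statistics.pstdev(counts.values())
        result[partition] = [b for b, c in counts.items()
                             if stdev > 0 and abs(c - mean) / stdev > threshold]
    return result


events = []
for h in range(24):
    events.append(("US", f"h{h:02d}", 990 if h % 2 == 0 else 1010))
    events.append(("IN", f"h{h:02d}", 200 if h == 23 else (9 if h % 2 == 0 else 11)))
print(unusual_by_partition(events))  # {'US': [], 'IN': ['h23']}
```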

Dataset

nginx access log:

{
  "source": "/home/ec2-user/data/production-4/prod4elasticlog/_logs/access-logs541.log",
  "beat": {
    "hostname": "ip-172-31-5-206",
    "name": "ip-172-31-5-206",
    "version": "5.4.0"
  },
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "read_timestamp": "2017-06-20T08:49:58.538Z",
  "fileset": {
    "name": "access",
    "module": "nginx"
  },
  "nginx": {
    "access": {
      "body_sent": {
        "bytes": "3262"
      },
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": {
        "continent_name": "Asia",
        "country_iso_code": "IN",
        "location": {
          "lat": 20,
          "lon": 77
        }
      },
      "response_code": "200",
      "user_agent": {
        "device": "Other",
        "os_name": "Other",
        "os": "Other",
        "name": "Other"
      },
      "http_version": "1.1",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  },
  "prospector": {
    "type": "log"
  }
}
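To turn such events into time series, each document can be mapped to a key, for example a one-minute bucket plus the country and response code, and then counted. A sketch using field names from the event above (trimmed to the fields it reads); the one-minute bucketing is an assumption:

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def series_key(event: dict) -> Tuple[str, str, str]:
    """Map one nginx access event to (minute bucket, country, response code)."""
    ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    access = event["nginx"]["access"]
    country = access.get("geoip", {}).get("country_iso_code", "unknown")
    return ts.strftime("%Y-%m-%dT%H:%M"), country, access["response_code"]


def count_requests(events: Iterable[dict]) -> Counter:
    """Number of requests per (minute, country, response code) series."""
    return Counter(series_key(e) for e in events)


raw = '''{"@timestamp": "2017-03-08T11:44:51.562Z",
          "nginx": {"access": {"response_code": "200",
                               "geoip": {"country_iso_code": "IN"}}}}'''
counts = count_requests([json.loads(raw)])
print(counts)  # Counter({('2017-03-08T11:44', 'IN', '200'): 1})
```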

Most of the internet went down

PS: When everything is on ! , nobody cares about your downloads

Counterfactual reasoning: which host / IP / ... is involved in the anomaly?
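Counterfactual reasoning asks "what if this host or IP had not been there?". A simplified sketch for one anomalous time bucket: remove each IP's requests in turn and rank the IPs by how close the remaining total gets to the expected value. The IPs (from documentation address ranges), the counts, and the expected value are hypothetical, and real systems may score influencers differently:

```python
from typing import Dict, List, Tuple


def counterfactual_influencers(counts_by_ip: Dict[str, int],
                               expected: float) -> List[Tuple[str, float]]:
    """Rank IPs by how normal the bucket would look without them (smallest gap first)."""
    total = sum(counts_by_ip.values())
    # For each IP: the gap to the expected value if that IP's requests had not happened
    gaps = [(ip, abs((total - count) - expected)) for ip, count in counts_by_ip.items()]
    return sorted(gaps, key=lambda item: item[1])


bucket = {"192.0.2.10": 480, "192.0.2.20": 520, "203.0.113.7": 2000}
print(counterfactual_influencers(bucket, expected=1000))
```

Removing 203.0.113.7 brings the bucket back to exactly the expected 1,000 requests, so it ranks first as the most likely influencer.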

Combine Multiple Models

Correlation ≠ causation

https://xkcd.com/552/

https://xkcd.com/925/

Common problems: correlated features will mess up any model
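A cheap first check for correlated features is the pairwise Pearson correlation coefficient: pairs close to ±1 carry nearly the same information, so one of each pair can usually go. A sketch with hypothetical per-minute features, where bytes sent simply scales with the request count; the 0.95 cut-off is an assumption:

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / (sa * sb)


def correlated_pairs(features: Dict[str, Sequence[float]],
                     limit: float = 0.95) -> List[Tuple[str, str, float]]:
    """Feature pairs whose |r| exceeds `limit`; candidates for dropping one of the two."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson(a, b)
        if abs(r) > limit:
            pairs.append((name_a, name_b, r))
    return pairs


features = {
    "requests": [100, 200, 300, 400],
    "bytes_sent": [300_000, 600_000, 900_000, 1_200_000],  # just requests * 3000
    "errors": [5, 1, 4, 2],
}
print(correlated_pairs(features))  # only ('requests', 'bytes_sent', ~1.0)
```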

Common problems: throw out most features if they are just noise

More features: future predictions

More features: clustering

Conclusion

Agenda: Machine Learning, Domain, Dataset

Rules of Machine Learning: Best Practices for ML Engineering http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

43 rules, including:
Rule #1: Don’t be afraid to launch a product without machine learning
Rule #14: Starting with an interpretable model makes debugging easier
Rule #16: Plan to launch and iterate
