Machine Learning ohne Hype

A presentation at IT Tage in December 2018 in Frankfurt, Germany by Philipp Krenn

Slide 1

Slide 2

Developer

Slide 3

Machine Learning is going viral...

Slide 4

Slide 5

Slide 6

❝Using #DeepLearning when all you needed was a few if statements. #MachineLearning #DataScience❞ —https://twitter.com/randal_olson/status/927157485240311808

Slide 7

Slide 8

Slide 9

Slide 10

Agenda
Machine Learning
Domain
Dataset

Slide 11

Machine Learning

Slide 12

Artificial Intelligence
Machine Learning
Deep Learning

Slide 13

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

Slide 14

General AI
Human characteristics

Slide 15

AI Winter

Slide 16

Narrow AI
Specific tasks

Slide 17

Slide 18

Facebook alt="Image may contain: ocean, sky, bridge, cloud, outdoor, water and nature"

Slide 19

https://www.facebook.com/MaxIsDrawing/photos/a.182284588581885/912506332226370/?type=3&permPage=1

Slide 20

Slide 21

Slide 22

❝I made an AI watch 105 Tour de France races along with viewer numbers to make it simulate the most interesting Tour de France race possible. This is what it made. #ai #ml #machinelearning #artificialintelligence #wat❞ —https://twitter.com/thetafferboy/status/1039519923150630912

Slide 23

Slide 24

PS: A lot of Chatbots are not AI

Slide 25

Slide 26

Slide 27

❝Having been introduced to the wonders of Visual Chatbot by @JanelleCShane's AI experiments, I decided to see what it would think of two-headed snakes. As it turns out: mostly bananas. #shutdowntheAI❞ —https://twitter.com/AmeliaRMellor/status/1031006460598149121

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

❝Alice: I love stateless protocols! Bob: There has to be something bad about them. Alice: Bad about what?❞ —https://twitter.com/znjp/status/933405548678021120

Slide 33

Machine Learning
Algorithms parse data → learn from it → make a determination or prediction
"Trained" machine

Slide 34

❝A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.❞ (Tom M. Mitchell, 1997)

Slide 35

❝"Machine Learning is an emerging tech!" Logistic regression 1958 Hidden Markov Model 1960 Support Vector Machine 1963 k-nearest neighbors 1967 Artificial Neural Networks 1975 Expectation Maximization 1977 Decision tree 1986 Q-learning 1989 Random forest 1995❞ —https://twitter.com/farbodsaraf/status/977916871000412160

Slide 36

https://twitter.com/algorithmia/status/1009486664933052416

Slide 37

❝But saying "powered by AI" is like saying you’re "powered by the internet" or "powered by computer code". By itself, it means nothing.❞ —https://twitter.com/jensenharris/status/999119292086960128

Slide 38

Learning
Regression
Ranking
Clustering

Slide 39

https://twitter.com/ShawnWildermuth/status/932724124237123584

Slide 40

Slide 41

For children and machines
Watch your language

Slide 42

Statistics 101: Linear Regression
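
As a refresher on what linear regression computes, here is a minimal sketch of an ordinary least squares fit for a single feature. The function name `fit_line` and the sample data (requests per second versus CPU load) are made up for illustration and do not come from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: requests per second vs. CPU load in percent
requests = [10, 20, 30, 40, 50]
cpu_load = [15, 25, 35, 45, 55]

slope, intercept = fit_line(requests, cpu_load)
predicted = slope * 60 + intercept  # extrapolate to 60 requests per second
```

Two sums and a division make up the whole "model" here, which is a fair reason to file it under Statistics 101.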

Slide 43

❝We are leveraging machine learning.❞

Slide 44

https://twitter.com/LesGuessing/status/997146590442799105

Slide 45

Supervised Learning
Input features and output labels are defined
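
To make "input features and output labels" concrete, here is a small sketch of a 1-nearest-neighbor classifier, one of the simplest supervised methods. The feature choice (response time, error rate), the labels, and the function name are assumptions for illustration only.

```python
def nearest_neighbor(train, query):
    """Return the label of the training example closest to `query`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda example: squared_distance(example[0], query))
    return label


# Hypothetical labeled data: (response time in ms, error rate in %) -> label
training_data = [
    ((120, 0.1), "healthy"),
    ((150, 0.3), "healthy"),
    ((900, 7.5), "degraded"),
    ((1100, 9.0), "degraded"),
]

slow_host = nearest_neighbor(training_data, (1000, 8.0))
fast_host = nearest_neighbor(training_data, (130, 0.2))
```

The labels have to come from somewhere, usually from people, and that is the expensive part of supervised learning.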

Slide 46

Unsupervised Learning
Unlabeled dataset
Discover hidden relationships
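
A sketch of what discovering structure in unlabeled data can look like: a tiny one-dimensional k-means with two clusters. The function name, the deterministic starting centroids, and the sample values are illustrative assumptions; real libraries use more careful initialization.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a minimal k-means (k=2)."""
    low, high = min(values), max(values)  # deterministic starting centroids
    if low == high:
        raise ValueError("need at least two distinct values")
    for _ in range(iterations):
        # Assignment step: each value joins the nearer centroid
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Update step: move each centroid to the mean of its cluster
        low = sum(cluster_low) / len(cluster_low)
        high = sum(cluster_high) / len(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)


# Hypothetical response sizes in KB; nobody told the algorithm there are two groups
small, large = two_means([1, 100, 2, 101, 3, 102])
```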

Slide 47

https://xkcd.com/882/

Slide 48

Slide 49

Slide 50

Reinforcement Learning
Feedback loop to optimize some parameter
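
The feedback loop can be sketched with a deliberately simplified multi-armed bandit: act, observe a reward, update the estimate, repeat. The payoffs are deterministic and the optimistic starting estimates are an assumption chosen to keep the example reproducible; real problems have noisy rewards and need explicit exploration.

```python
def run_bandit(payoffs, steps=20):
    """Greedy agent with optimistic start: act, observe reward, update estimate."""
    estimates = [1.0] * len(payoffs)  # optimistic, so every action gets tried once
    counts = [0] * len(payoffs)
    for _ in range(steps):
        action = estimates.index(max(estimates))  # pick the best-looking action
        reward = payoffs[action]                  # feedback from the environment
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical payoffs per action, unknown to the agent at the start
estimates, counts = run_bandit([0.2, 0.8, 0.5])
best_action = counts.index(max(counts))
```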

Slide 51

Deep Learning
Neural network producing a probability vector
Lots of training and parallelization
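
The "probability vector" is commonly the output of a softmax over the network's final scores. The sketch below shows only that last step, not a neural network; the class names and scores are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw output scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["ocean", "sky", "bridge"]          # hypothetical classes
probabilities = softmax([2.0, 1.0, 0.1])     # hypothetical final-layer scores
best = labels[probabilities.index(max(probabilities))]
```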

Slide 52

https://www.youtube.com/watch?v=bxe2T-V8XRs

Slide 53

Access to a unique data set is inherently valuable

Slide 54

Slide 55

❝"What's the difference between AI and ML?" "It's AI when you're raising money, it's ML when you're trying to hire people."❞ —https://twitter.com/WAWilsonIV/status/925599712849174528

Slide 56

Domain

Slide 57

Patterns
Trend (stationary)
Cyclical
Seasonal
Irregular

Slide 58

Anomaly
Point Anomalies
Contextual Anomalies
Collective Anomalies

Slide 59

Breakouts
Mean Shift
Ramp Up
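
One simple way to look for a mean shift, sketched under the assumption of a fixed window size: slide over the series and compare the average before and after each point. The function name and the latency values are hypothetical, and this is not a claim about how any particular product detects breakouts.

```python
def find_mean_shift(series, window=3):
    """Return the index where the mean of the next `window` points differs
    most from the mean of the previous `window` points, plus that gap."""
    best_index, best_gap = None, 0.0
    for i in range(window, len(series) - window + 1):
        before = sum(series[i - window:i]) / window
        after = sum(series[i:i + window]) / window
        gap = abs(after - before)
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index, best_gap


# Hypothetical latency in ms: the level jumps from about 10 to about 30
latency_ms = [10, 11, 10, 11, 10, 30, 31, 30, 31, 30]
shift_index, shift_size = find_mean_shift(latency_ms)
```

A ramp up would show as a sequence of smaller gaps spread over several indices instead of one sharp maximum.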

Slide 60

Anomaly Detection with Machine Learning
Supervised Learning
Unsupervised Learning

Slide 61

Examples
IT operations: Spiking 500s
Security analytics: Unusual DNS activity
Business analytics: Rare log message

Slide 62

Visual Inspection
Complex, fast-moving data
Humans not made to stare at graphs
Easy to miss

Slide 63

Where is the Anomaly?

Slide 64

Static Rules
Definition
False positives & negatives
Tuning and adjustment

Slide 65

Which threshold?
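
The threshold dilemma fits in a few lines. In this hypothetical traffic series, no single static value separates the real problem (a spike during the quiet night) from normal daytime load. The data and the function name are made up for illustration.

```python
def alerts(series, threshold):
    """Static rule: flag every index whose value exceeds a fixed threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


# Hypothetical requests per minute: indices 0-3 are night, 4-7 are day.
# Index 2 is the real problem (a spike at night); daytime values are normal.
traffic = [20, 22, 90, 21, 150, 160, 180, 155]

low_threshold = alerts(traffic, 80)    # finds the spike, but flags all daytime traffic too
high_threshold = alerts(traffic, 200)  # no false positives, but the spike is missed
```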

Slide 66

Machine learning

Slide 67

❝OH: "Do you run any CPU intensive application on your laptop? Like, machine learning, or Slack?"❞ —https://twitter.com/jpetazzo/status/932464823530430464

Slide 68

Frameworks
TensorFlow
Keras
scikit-learn
...

Slide 69

How to build ML pipelines?
ETL
Data storage
Optimization algorithms

Slide 70

❝I see you expected clean data. That's cute.❞

Slide 71

Slide 72

Model
Baseline: What is normal?
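
A minimal sketch of a baseline, assuming "normal" can be summarized by the mean and standard deviation of past values. That assumption is a strong simplification (it ignores trend and seasonality), and the names and numbers are hypothetical.

```python
import statistics


def anomaly_score(history, value):
    """How many standard deviations `value` lies from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return abs(value - mean) / stdev if stdev else 0.0


# Hypothetical requests per minute observed so far
normal_traffic = [100, 102, 98, 101, 99]

ordinary = anomaly_score(normal_traffic, 101)  # well within the usual spread
unusual = anomaly_score(normal_traffic, 150)   # far outside it
```

Unlike a static threshold, the score is relative to what this particular series usually does.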

Slide 73

Slide 74

Slide 75

Unsupervised

Slide 76

Evolves
"Online" model learns continuously and ages out data
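
One common way to learn continuously while aging out old data is exponential weighting: each new point nudges the baseline, and the influence of older points decays. The class below is an illustrative sketch; the class name and the `alpha` value are assumptions, not the model from the talk.

```python
class OnlineBaseline:
    """Exponentially weighted mean and variance: recent points count more,
    older points fade out gradually."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # higher alpha forgets old data faster
        self.mean = None
        self.variance = 0.0

    def update(self, value):
        if self.mean is None:  # the first observation initializes the baseline
            self.mean = float(value)
            return
        delta = value - self.mean
        self.mean += self.alpha * delta
        self.variance = (1 - self.alpha) * (self.variance + self.alpha * delta ** 2)


baseline = OnlineBaseline()
for value in [10] * 20:   # hypothetical old regime
    baseline.update(value)
mean_before = baseline.mean

for value in [50] * 20:   # the level changes and the model follows
    baseline.update(value)
mean_after = baseline.mean
```

No history is stored, so the memory footprint stays constant no matter how long the series runs.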

Slide 77

Single Time Series
Example: Unusual traffic?

Slide 78

Multiple Time Series
Multiple metrics or single metric split up
Each series modeled independently
Example: Unusual activity by country?
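
Splitting one metric by country and modeling each series independently could be sketched as follows. The per-series model is a plain mean and standard deviation, and the function name, the three-sigma cutoff, and the data are all assumptions for illustration.

```python
import statistics
from collections import defaultdict


def unusual_by_country(events, latest, min_sigmas=3.0):
    """Model each country's request counts independently and flag outliers."""
    history = defaultdict(list)
    for country, count in events:
        history[country].append(count)

    flagged = []
    for country, value in latest.items():
        counts = history[country]
        if len(counts) < 2:
            continue  # not enough data to know what is normal here
        mean = statistics.mean(counts)
        stdev = statistics.pstdev(counts) or 1.0  # avoid division by zero
        if abs(value - mean) / stdev >= min_sigmas:
            flagged.append(country)
    return sorted(flagged)


# Hypothetical requests per interval, keyed by country code
events = [("IN", 100), ("IN", 110), ("IN", 90), ("DE", 10), ("DE", 12), ("DE", 8)]
latest = {"IN": 105, "DE": 60}

flagged = unusual_by_country(events, latest)
```

A value of 60 would be unremarkable for the busier series but stands out for the quiet one, which is why the series are not pooled.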

Slide 79

Slide 80

Dataset

Slide 81

nginx access log

{
  "source": "/home/ec2-user/data/production-4/prod4elasticlog/_logs/access-logs541.log",
  "beat": {
    "hostname": "ip-172-31-5-206",
    "name": "ip-172-31-5-206",
    "version": "5.4.0"
  },
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "read_timestamp": "2017-06-20T08:49:58.538Z",
  "fileset": {
    "name": "access",
    "module": "nginx"
  },

Slide 82

  "nginx": {
    "access": {
      "body_sent": { "bytes": "3262" },
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": {
        "continent_name": "Asia",
        "country_iso_code": "IN",
        "location": { "lat": 20, "lon": 77 }
      },

Slide 83

      "response_code": "200",
      "user_agent": {
        "device": "Other",
        "os_name": "Other",
        "os": "Other",
        "name": "Other"
      },
      "http_version": "1.1",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  },
  "prospector": { "type": "log" }
}

Slide 84
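
For readers who want to work with the access log event from the preceding slides, here is a sketch that pulls out the fields an anomaly detection job would typically care about. It uses a shortened copy of the event; the function name `summarize` and the choice of fields are assumptions for illustration.

```python
import json

# Shortened copy of the event shown on the preceding slides
raw_event = """
{
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "nginx": {
    "access": {
      "body_sent": {"bytes": "3262"},
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": {"country_iso_code": "IN"},
      "response_code": "200",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  }
}
"""


def summarize(event):
    """Extract a flat record from the nested access log event."""
    access = event["nginx"]["access"]
    return {
        "timestamp": event["@timestamp"],
        "country": access["geoip"]["country_iso_code"],
        "status": int(access["response_code"]),      # stored as a string in the log
        "bytes": int(access["body_sent"]["bytes"]),  # stored as a string in the log
        "is_server_error": access["response_code"].startswith("5"),
    }


summary = summarize(json.loads(raw_event))
```

Note that the numeric fields arrive as strings, a small instance of the data cleaning that any pipeline has to do first.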

Slide 85

Slide 86

Slide 87

Slide 88

Slide 89

Slide 90

Most of the internet went down

Slide 91

PS: When everything is on fire, nobody cares about your downloads

Slide 92

Slide 93

Counterfactual Reasoning
Which host / IP / ... is involved in the anomaly
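
The counterfactual question can be phrased as: how anomalous would the total look if one entity had behaved as expected? The sketch below is a heavily simplified interpretation of that idea, not the method of any particular product; the function name, the equal "fair share" assumption, and the numbers are hypothetical.

```python
def most_influential(counts_by_host, expected_total):
    """For each host, ask how far the total would be from the expectation
    if that host had contributed only an equal share of the expected total."""
    total = sum(counts_by_host.values())
    fair_share = expected_total / len(counts_by_host)
    remaining_excess = {}
    for host, count in counts_by_host.items():
        counterfactual_total = total - count + fair_share
        remaining_excess[host] = abs(counterfactual_total - expected_total)
    # The host whose "normalization" leaves the smallest excess explains most of the anomaly
    culprit = min(remaining_excess, key=remaining_excess.get)
    return culprit, remaining_excess


# Hypothetical request counts per host during an anomalous interval
observed = {"10.0.0.1": 100, "10.0.0.2": 110, "10.0.0.3": 900}
culprit, excess = most_influential(observed, expected_total=300)
```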

Slide 94

Slide 95

Slide 96

Slide 97

Combine Multiple Models

Slide 98

Slide 99

Correlation ≠ causation

Slide 100

https://xkcd.com/552/

Slide 101

https://xkcd.com/925/

Slide 102

Common problems
Correlated features will mess up any model
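
A quick check for correlated features is the Pearson correlation coefficient between each pair. The sketch below computes it by hand; the feature names and values are hypothetical, and what counts as "too correlated" is a judgment call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical features
bytes_sent = [100, 200, 300, 400]
kilobytes_sent = [0.1, 0.2, 0.3, 0.4]  # the same information in another unit
response_ms = [30, 10, 40, 20]

redundant = pearson(bytes_sent, kilobytes_sent)  # close to 1: keep only one of the two
unrelated = pearson(bytes_sent, response_ms)     # close to 0: carries separate information
```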

Slide 103

Common problems
Throw out most features if they are just noise

Slide 104

More features
Future predictions

Slide 105

Slide 106

Conclusion

Slide 107

Agenda
Machine Learning
Domain
Dataset

Slide 108

Rules of Machine Learning: Best Practices for ML Engineering
http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

Slide 109

43 rules
Rule #1: Don’t be afraid to launch a product without machine learning
Rule #14: Starting with an interpretable model makes debugging easier
Rule #16: Plan to launch and iterate

Slide 110

Machine Learning ohne Hype
Philipp Krenn
@xeraa