Machine Learning ohne Hype (Machine Learning without Hype)

A presentation at Developer Week Nürnberg in June 2018 in Nuremberg, Germany by Philipp Krenn

Slide 1

Machine Learning ohne Hype
Philipp Krenn
@xeraa

Slide 2

Developer

Slide 3

Slide 4

Machine Learning is going viral...

Slide 5

Slide 6

Slide 7

❝ Using #DeepLearning when all you needed was a few if statements. #MachineLearning #DataScience ❞ —https://twitter.com/randal_olson/status/927157485240311808

Slide 8

Slide 9

Slide 10

Slide 11

Agenda
Machine Learning
Domain
Dataset

Slide 12

Machine Learning

Slide 13

Artificial Intelligence
Machine Learning
Deep Learning

Slide 14

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

Slide 15

General AI
Human characteristics

Slide 16

AI Winter

Slide 17

Narrow AI
Specific tasks

Slide 18

Slide 19

Facebook alt text: "Image may contain: ocean, sky, bridge, cloud, outdoor, water and nature"

Slide 20

Slide 21

Slide 22

PS: A lot of chatbots are not AI

Slide 23

Slide 24

Slide 25

❝ Alice: I love stateless protocols!
Bob: There has to be something bad about them.
Alice: Bad about what? ❞ — https://twitter.com/znjp/status/933405548678021120

Slide 26

Machine Learning
Algorithms parse data → learn from it → make a determination or prediction
"Trained" machine

Slide 27

❝ Learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. ❞

Slide 28

❝ "Machine Learning is an emerging tech!"
Logistic regression 1958
Hidden Markov Model 1960
Support Vector Machine 1963
k-nearest neighbors 1967
Artificial Neural Networks 1975
Expectation Maximization 1977
Decision tree 1986
Q-learning 1989
Random forest 1995 ❞ — https://twitter.com/farbodsaraf/status/977916871000412160

Slide 29

https://twitter.com/algorithmia/status/1009486664933052416

Slide 30

❝ But saying "powered by AI" is like saying you’re "powered by the internet" or "powered by computer code". By itself, it means nothing. ❞ — https://twitter.com/jensenharris/status/999119292086960128

Slide 31

Learning
Regression
Ranking
Clustering

Slide 32

https://twitter.com/ShawnWildermuth/status/932724124237123584

Slide 33

Slide 34

For children and machines
Watch your language

Slide 35

Statistics 101: Linear Regression
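To make the regression slide concrete, here is a minimal sketch of simple linear regression in plain Python: it fits one slope and one intercept with the closed-form ordinary least squares formula. The function names `fit_line` and `predict` and the toy data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    return slope * x + intercept


# Toy data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)               # 2.0 1.0
print(predict(slope, intercept, 10))  # 21.0
```

Here "training" amounts to estimating two numbers from the data; prediction is then a single multiplication and addition.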

Slide 36

❝ We are leveraging machine learning. ❞

Slide 37

https://twitter.com/LesGuessing/status/997146590442799105

Slide 38

Supervised Learning
Input features and output labels are defined
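A small illustration of supervised learning: every training example pairs input features with a known output label, and k-nearest neighbors (one of the algorithms in the history list above) predicts the label of a new input by majority vote among the closest labeled examples. This is a sketch; `knn_predict`, the feature choice, and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled examples.

    `train` is a list of (features, label) pairs; features are tuples of numbers.
    Features on very different scales should usually be normalized first.
    """
    # Sort the labeled examples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote; on ties, Counter keeps the label encountered first.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled data: (requests per second, error rate) -> label.
train = [
    ((10.0, 0.01), "normal"),
    ((12.0, 0.02), "normal"),
    ((11.0, 0.01), "normal"),
    ((50.0, 0.40), "incident"),
    ((55.0, 0.35), "incident"),
    ((60.0, 0.50), "incident"),
]

print(knn_predict(train, (11.5, 0.015)))  # normal
print(knn_predict(train, (52.0, 0.45)))   # incident
```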

Slide 39

Unsupervised Learning
Unlabeled dataset
Discover hidden relationships
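For contrast, an unsupervised sketch: k-means clustering receives only unlabeled points and groups them by proximity. This is one common technique for discovering structure, not necessarily the one used later in the talk; `kmeans`, the seed, and the sample points are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(max_iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change: converged
        centroids = new_centroids
    return centroids, clusters


# Unlabeled 2-D points; nothing tells the algorithm which group they belong to.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```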

Slide 40

https://xkcd.com/882/

Slide 41

Slide 42

Slide 43

Reinforcement Learning
Feedback loop to optimize some parameter
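The multi-armed bandit is the simplest setting usually filed under reinforcement learning: an agent repeatedly picks an action, observes a reward, and adjusts its estimates. A minimal epsilon-greedy sketch, assuming rewards in [0, 1] and a hypothetical `pull` function standing in for the environment:

```python
import random


def epsilon_greedy(pull, n_arms, steps=1000, epsilon=0.1, seed=0):
    """Learn which action ("arm") pays off best purely from reward feedback.

    pull(arm) returns a reward, assumed here to lie in [0, 1]. Estimates start
    optimistic at 1.0 so that every arm gets tried at least once early on.
    """
    rng = random.Random(seed)
    estimates = [1.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        # Feedback loop: move the estimate toward the observed reward
        # (incremental form of the sample mean).
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical environment: three arms with fixed payoffs, unknown to the learner.
payoffs = [0.2, 0.5, 0.8]
estimates, counts = epsilon_greedy(lambda arm: payoffs[arm], n_arms=3)
print([round(e, 2) for e in estimates], counts)
```

The payoffs are deterministic to keep the example reproducible; with noisy rewards the same loop still works, but the estimates only approach the true means over many pulls.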

Slide 44

Deep Learning
Neural network producing a probability vector
Lots of training and parallelization
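The "probability vector" at the output of a classification network is commonly produced by the softmax function, which maps arbitrary real scores (logits) to non-negative values that sum to 1. A plain-Python sketch; deep learning frameworks ship their own, numerically tuned versions.

```python
import math


def softmax(logits):
    """Turn raw network outputs (logits) into a probability vector."""
    # Subtracting the maximum does not change the result but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```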

Slide 45

https://www.youtube.com/watch?v=bxe2T-V8XRs

Slide 46

Access to a unique data set is inherently valuable

Slide 47

Slide 48

❝ "What's the difference between AI and ML?" "It's AI when you're raising money, it's ML when you're trying to hire people." ❞ —https://twitter.com/WAWilsonIV/status/925599712849174528

Slide 49

Domain

Slide 50

Patterns
Trend (stationary)
Cyclical
Seasonal
Irregular
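One classical way to pull these components apart is additive decomposition: a centered moving average estimates the trend, per-position averages of the detrended values estimate the seasonal pattern, and the remainder is the irregular part. A minimal sketch, assuming an additive model and an odd seasonal period; it does not separate cyclical from trend behavior, and the function names are hypothetical.

```python
def centered_moving_average(series, window):
    """Trend estimate: mean of `window` points centered on each index (odd window only)."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    trend = [None] * len(series)  # the edges have no full window
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend


def decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + irregular."""
    trend = centered_moving_average(series, period)
    # Average the detrended values that share the same position in the period.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it sums to zero over one period.
    offset = sum(means) / period
    pattern = [m - offset for m in means]
    seasonal = [pattern[i % period] for i in range(len(series))]
    # Whatever trend and season do not explain is the irregular component.
    irregular = [None if t is None else y - t - s
                 for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, irregular


# Synthetic series: linear trend 0.5 * t plus a repeating 3-step pattern.
season = [3.0, -1.0, -2.0]
series = [0.5 * t + season[t % 3] for t in range(12)]
trend, seasonal, irregular = decompose(series, period=3)
```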

Slide 51

Anomaly
Point anomalies
Contextual anomalies
Collective anomalies
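Point anomalies, single values far from the rest, are the easiest of the three to sketch. This sketch uses the modified z-score based on the median and the median absolute deviation (MAD); contextual and collective anomalies need more context (time of day, neighboring values) than this function sees. The 3.5 cut-off is a commonly cited rule of thumb, assumed here rather than taken from the talk, and `point_anomalies` is a hypothetical name.

```python
import statistics


def point_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Median and MAD are used instead of mean and standard deviation because a
    single extreme value barely moves them.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical: flag everything that differs.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # for normally distributed data.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


values = [10, 12, 11, 13, 12, 11, 95, 12, 10, 11]
print(point_anomalies(values))  # [6]
```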

Slide 52

Breakouts
Mean shift
Ramp up
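A mean shift can be located offline by trying every split point and keeping the one where two constant-mean segments fit the data best (lowest total squared error). A brute-force sketch for a single change point; a ramp-up would need a sloped segment model instead, and `best_split` is a hypothetical name.

```python
def best_split(values, min_size=2):
    """Index k that best splits `values` into two constant-mean segments,
    i.e. values[:k] and values[k:], by minimal total squared error."""

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    # Each segment keeps at least `min_size` points.
    for k in range(min_size, len(values) - min_size + 1):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


print(best_split([1.0] * 10 + [5.0] * 10))        # 10
print(best_split([1, 2, 1, 2, 1, 6, 7, 6, 7, 6]))  # 5
```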

Slide 53

Anomaly Detection with Machine Learning
Supervised learning
Unsupervised learning

Slide 54

Examples
IT operations: spiking 500s
Security analytics: unusual DNS activity
Business analytics: rare log message

Slide 55

Visual Inspection
Complex, fast-moving data
Humans are not made to stare at graphs
Easy to miss

Slide 56

Where is the Anomaly?

Slide 57

Static Rules
Definition
False positives & negatives
Tuning and adjustment

Slide 58

Which threshold?
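A small experiment with hypothetical hourly request counts shows why picking a static threshold is hard: any threshold low enough to catch a night-time incident also fires on busy daytime hours, and any threshold that keeps daytime quiet misses the incident. The data, the incident hour, and the thresholds are all made up for illustration.

```python
# Hypothetical hourly request counts for one day: quiet at night, busy by day.
HOURLY = [20, 18, 15, 14, 90, 25, 60, 120, 200, 240, 260, 250,
          245, 255, 250, 240, 230, 210, 180, 140, 100, 70, 40, 25]
# Assume the 90 requests at 04:00 are the one real incident.
INCIDENTS = {4}


def static_rule(values, threshold):
    """Alert on every hour whose value exceeds a fixed threshold."""
    return {hour for hour, v in enumerate(values) if v > threshold}


def evaluate(threshold):
    alerts = static_rule(HOURLY, threshold)
    false_positives = len(alerts - INCIDENTS)  # alerts on normal busy hours
    false_negatives = len(INCIDENTS - alerts)  # incidents nobody was alerted about
    return false_positives, false_negatives


for threshold in (80, 100, 245, 300):
    print(threshold, evaluate(threshold))
# 80 (14, 0)
# 100 (13, 1)
# 245 (4, 1)
# 300 (0, 1)
```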

Slide 59

Machine learning

Slide 60

❝ OH: "Do you run any CPU intensive application on your laptop? Like, machine learning, or Slack?" ❞ — https://twitter.com/jpetazzo/status/932464823530430464

Slide 61

Frameworks
TensorFlow
Keras
scikit-learn
...

Slide 62

How to build ML pipelines?
ETL
Data storage
Optimization algorithms

Slide 63

❝ I see you expected clean data. That's cute. ❞

Slide 64

Slide 65

Model Baseline: What is normal?

Slide 66

Slide 67

Slide 68

Unsupervised

Slide 69

Evolves
"Online" model learns continuously and ages out data
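An "online" model updates its state with each new value instead of re-reading history. A minimal sketch of that idea, assuming an exponentially weighted mean and variance as the baseline; the class name and `alpha` are hypothetical, and this is not a description of any particular product's model.

```python
import math


class OnlineBaseline:
    """Exponentially weighted mean and variance, updated one value at a time.

    Each update multiplies the weight of all older observations by (1 - alpha),
    so old data fades out instead of being stored. alpha is an assumed tuning knob.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def score(self, x):
        """Distance of x from the current baseline, in standard deviations."""
        if self.mean is None or self.var == 0.0:
            return 0.0  # no baseline yet
        return abs(x - self.mean) / math.sqrt(self.var)

    def update(self, x):
        if self.mean is None:
            self.mean = float(x)
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        # Exponentially weighted variance update.
        self.var = (1 - self.alpha) * (self.var + diff * incr)


baseline = OnlineBaseline(alpha=0.1)
for value in [100, 102, 98, 101, 99, 100, 102, 98] * 5:
    baseline.update(value)
spike_score = baseline.score(200)  # far outside the learned baseline
print(round(spike_score, 1))

# A sustained level shift becomes the new normal as old data ages out.
for _ in range(100):
    baseline.update(200)
adapted_score = baseline.score(200)
print(round(adapted_score, 3))
```

After the sustained shift, the same value of 200 that first scored as an extreme outlier sits close to the baseline, because the older values have been weighted down.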

Slide 70

Single Time Series
Example: Unusual traffic?

Slide 71

Multiple Time Series
Multiple metrics or single metric split up
Each series modeled independently
Example: Unusual activity by country?
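Modeling each series independently means every key (here, a country code) gets its own baseline, so a value that is huge for one country can be normal for another. A sketch using a plain mean and standard deviation per key; the function name, the 3-sigma cut-off, and the counts are hypothetical.

```python
import statistics


def unusual_by_key(history, current, threshold=3.0):
    """Flag keys whose latest value is far from that key's own history.

    history maps key -> list of past values; current maps key -> latest value.
    Returns {key: z-score} for the flagged keys.
    """
    flagged = {}
    for key, value in current.items():
        past = history.get(key, [])
        if len(past) < 2:
            continue  # not enough data to model this series yet
        mean = statistics.fmean(past)
        stdev = statistics.stdev(past)
        if stdev == 0:
            continue  # flat history: a z-score is undefined
        z = (value - mean) / stdev
        if abs(z) > threshold:
            flagged[key] = round(z, 1)
    return flagged


# Hypothetical requests per minute, one independent series per country code.
history = {
    "IN": [100, 110, 90, 105, 95],
    "US": [1000, 1100, 900, 1050, 950],
    "DE": [50, 55, 45, 52, 48],
}
current = {"IN": 400, "US": 1080, "DE": 51}
flagged = unusual_by_key(history, current)
print(flagged)  # {'IN': 37.9}
```

The 400 requests from IN are far fewer than the US figure, yet only IN is flagged, which a single global threshold could not express.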

Slide 72

Slide 73

Dataset

Slide 74

nginx access log
{
  "source": "/home/ec2-user/data/production-4/prod4elasticlog/_logs/access-logs541.log",
  "beat": {
    "hostname": "ip-172-31-5-206",
    "name": "ip-172-31-5-206",
    "version": "5.4.0"
  },
  "@timestamp": "2017-03-08T11:44:51.562Z",
  "read_timestamp": "2017-06-20T08:49:58.538Z",
  "fileset": {
    "name": "access",
    "module": "nginx"
  },

Slide 75

  "nginx": {
    "access": {
      "body_sent": {
        "bytes": "3262"
      },
      "url": "/assets/blt1afcb054f02e257c/logo-activision.svg",
      "geoip": {
        "continent_name": "Asia",
        "country_iso_code": "IN",
        "location": {
          "lat": 20,
          "lon": 77
        }
      },

Slide 76

      "response_code": "200",
      "user_agent": {
        "device": "Other",
        "os_name": "Other",
        "os": "Other",
        "name": "Other"
      },
      "http_version": "1.1",
      "method": "GET",
      "remote_ip": "192.19.197.26"
    }
  },
  "prospector": {
    "type": "log"
  }
}
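Turning documents like this one into a time series is the first step for the "spiking 500s" example. A sketch that counts 5xx responses per minute from newline-delimited JSON, reading the `@timestamp` and `nginx.access.response_code` fields as they appear in the document; the input format, the function names, and the sample documents are assumptions.

```python
import json
from collections import Counter
from datetime import datetime


def count_5xx_per_minute(lines):
    """Count server errors (HTTP 5xx) per minute in newline-delimited JSON logs."""
    counts = Counter()
    for line in lines:
        doc = json.loads(line)
        code = str(doc.get("nginx", {}).get("access", {}).get("response_code", ""))
        if not code.startswith("5"):
            continue
        # Timestamps look like "2017-03-08T11:44:51.562Z" (UTC, millisecond precision).
        ts = datetime.strptime(doc["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
        counts[ts.replace(second=0, microsecond=0)] += 1  # truncate to the minute
    return counts


def make_doc(timestamp, code):
    """Build a stripped-down, hypothetical log line with only the fields used."""
    return json.dumps({"@timestamp": timestamp,
                       "nginx": {"access": {"response_code": code}}})


sample = [
    make_doc("2017-03-08T11:44:51.562Z", "200"),
    make_doc("2017-03-08T11:44:52.100Z", "503"),
    make_doc("2017-03-08T11:44:58.000Z", "500"),
    make_doc("2017-03-08T11:45:03.250Z", "502"),
]
counts = count_5xx_per_minute(sample)
for minute, n in sorted(counts.items()):
    print(minute.isoformat(), n)
# 2017-03-08T11:44:00 2
# 2017-03-08T11:45:00 1
```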

Slide 77

Slide 78

Slide 79

Slide 80

Slide 81

Slide 82

Slide 83

Most of the internet went down

Slide 84

PS: When everything is on ! , nobody cares about your downloads

Slide 85

Slide 86

Counterfactual Reasoning
Which host / IP / ... is involved in the anomaly
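A simplified sketch of the counterfactual idea: for each entity in an anomalous time bucket, ask how much of the excess over the expected count would remain if that entity's requests were removed. The function, the IP addresses (from the 192.0.2.0/24 documentation range), and the baseline value are hypothetical, and this is not how any specific tool computes its scores.

```python
from collections import Counter


def influencers(requests, baseline):
    """Rank entities by the share of an anomaly's excess they explain.

    requests: one entity id (e.g. a client IP) per request in the anomalous
    time bucket. baseline: the request count the model expected for that bucket.
    """
    total = len(requests)
    excess = total - baseline
    if excess <= 0:
        return []  # nothing anomalous to explain
    ranking = []
    for entity, count in Counter(requests).items():
        # Counterfactual: how much excess would remain without this entity?
        remaining = max(total - count - baseline, 0)
        ranking.append((entity, round((excess - remaining) / excess, 2)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


requests = ["192.0.2.1"] * 80 + ["192.0.2.2"] * 15 + ["192.0.2.3"] * 5
print(influencers(requests, baseline=30))
# [('192.0.2.1', 1.0), ('192.0.2.2', 0.21), ('192.0.2.3', 0.07)]
```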

Slide 87

Slide 88

Slide 89

Slide 90

Combine Multiple Models

Slide 91

Slide 92

Correlation ≠ causation

Slide 93

https://xkcd.com/552/

Slide 94

https://xkcd.com/925/

Slide 95

Common problems
Correlated features will mess up any model
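A quick check for redundant inputs is the pairwise Pearson correlation between features; pairs above a high cut-off (0.95 is assumed here) are candidates for dropping one side. Hypothetical feature names and values, in plain Python:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant feature; treated as 0 in this sketch
    return cov / (sx * sy)


def correlated_pairs(features, limit=0.95):
    """Return (a, b, r) for feature pairs with |r| above `limit`."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > limit:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical features: the first two encode the same information.
features = {
    "bytes_sent": [100, 200, 300, 400, 500],
    "kilobytes_sent": [0.1, 0.2, 0.3, 0.4, 0.5],
    "response_ms": [30, 12, 45, 20, 33],
}
print(correlated_pairs(features))  # [('bytes_sent', 'kilobytes_sent', 1.0)]
```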

Slide 96

Common problems
Throw out most features if they are just noise

Slide 97

More features
Future predictions

Slide 98

Slide 99

More features
Clustering

Slide 100

Conclusion

Slide 101

Agenda
Machine Learning
Domain
Dataset

Slide 102

Rules of Machine Learning: Best Practices for ML Engineering
http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

Slide 103

43 rules
Rule #1: Don’t be afraid to launch a product without machine learning
Rule #14: Starting with an interpretable model makes debugging easier
Rule #16: Plan to launch and iterate

Slide 104

Machine Learning ohne Hype
Philipp Krenn
@xeraa