Cluster analysis - World data


Emanuele Taufer
file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/3%20sqg/labs/l-10_cluster_analysis_food-df.html#(1) 1/47

World data

A dataset on countries containing demographic, economic and health information, as well as data on dietary habits. Originally available at https://docs.google.com/spreadsheets/d/1w8bnx48xz4yba9pf0xhinrh single=true&gid=0

```r
library(ggplot2)
library(data.table)
# since R 4.0.0, read.csv() reads strings as character by default;
# add stringsAsFactors = TRUE to get the factor output shown later for Countries
world=read.csv("http://www.cs.unitn.it/~taufer/data/world.csv")
str(world)
## 'data.frame': 86 obs. of 102 variables:
##  $ Countries
##  $ Average.latitude..º.
##  $ Annual.insolation..W.h.m2.day.
##  $ Energy..kcal.day.
##  $ Protein..g.day.
##  $ Fats..g.day.
##  $ Carbohydrates..g.day.
##  $ Animal.Products...kcal.day.
##  $ Animal.Fats..kcal.day.
##  $ Bovine.Meat..kcal.day.
##  $ Butter..Ghee..kcal.day.
##  $ Cheese..kcal.day.
##  $ Eggs..kcal.day.
##  $ Fats..Animals..Raw..kcal.day.
##  $ Fish..Seafood..kcal.day.
##  $ Freshwater.Fish..kcal.day.
##  $ Honey..kcal.day.
##  $ Meat..kcal.day.
##  $ Milk...Excluding.Butter..kcal.day.
##  $ Milk..Whole..kcal.day.
##  $ Mutton...Goat.Meat..kcal.day.
##  $ Offals..Edible..kcal.day.
##  $ Pelagic.Fish..kcal.day.
##  $ Pigmeat..kcal.day.
##  $ Poultry.Meat..kcal.day.
##  $ Vegetal.Products...kcal.day.
##  $ Alcoholic.Beverages..kcal.day.
##  $ Apples..kcal.day.
##  $ Bananas..kcal.day.
##  $ Beans..kcal.day.
##  $ Cereals...Excluding.Beer..kcal.day.
##  $ Coconut.Oil..kcal.day.
##  $ Coffee..kcal.day.
##  $ Fruits...Excluding.Wine..kcal.day.
##  $ Nuts..kcal.day.
##  $ Olive.Oil..kcal.day.
##  $ Palm.Oil..kcal.day.
##  $ Potatoes..kcal.day.
##  $ Pulses..kcal.day.
##  $ Rice..Milled.Equivalent...kcal.day.
##  $ Rice..Paddy.Equivalent...kcal.day.
##  $ Roots...Tuber.Dry.Equiv..kcal.day.
##  $ Soyabean.Oil..kcal.day.
##  $ Starchy.Roots..kcal.day.
##  $ Sugar...Sweeteners..kcal.day.
##  $ Sugar..Raw.Equivalent...kcal.day.
##  $ Sugar..Raw.Equivalent..kcal.day.
##  $ Sugar..Refined.Equiv..kcal.day.
##  $ Vegetable.Oils..kcal.day.
##  $ Vegetables..kcal.day.
##  $ Wheat..kcal.day.
##  $ Wine..kcal.day.
##  $ Gross.national.income.per.capita..PPP.international...
##  $ Population.annual.growth.rate...
##  $ Population.in.urban.areas...
##  $ Population.median.age..years...2006
##  $ Population.proportion.over.60...2006
##  $ Population.proportion.under.15...2006
##  $ Total.fertility.rate..per.female.
##  $ Per.capita.recorded.alcohol.consumption..litres.of.pure.alcohol..among.adults...1
##  $ Population.with.sustainable.access.to.improved.drinking.water.sources...rural
##  $ Population.with.sustainable.access.to.improved.drinking.water.sources...total
##  $ Population.with.sustainable.access.to.improved.drinking.water.sources...urban
##  $ Population.with.sustainable.access.to.improved.sanitation...rural
##  $ Population.with.sustainable.access.to.improved.sanitation...total
##  $ Population.with.sustainable.access.to.improved.sanitation...urban
##  $ Prevalence.of.current.tobacco.use.among.adults...15.years...both.sexes..2005
##  $ Prevalence.of.current.tobacco.use.among.adults...15.years...female..2005
##  $ Prevalence.of.current.tobacco.use.among.adults...15.years...male..2005
##  $ Mean.total.cholesterol..men..mg.dl...2005
##  $ Mean.total.cholesterol..female..mg.dl...2005
##  $ Diabetes.crude.prevalence..adults.aged.20.to.79...
##  $ Systolic.blood.pressure..adults.aged.15.and.above..men..mmHg.
##  $ Systolic.blood.pressure..adults.aged.15.and.above..female..mmHg.
##  $ Obesity.prevalence..men...
##  $ Obesity.prevalence..female...
##  $ Adult.mortality.rate..probability.of.dying.between.15.to.60.years.per.1000.populat
##  $ Adult.mortality.rate..probability.of.dying.between.15.to.60.years.per.1000.populat
##  $ Adult.mortality.rate..probability.of.dying.between.15.to.60.years.per.1000.populat
##  $ Age.standardized.mortality.rate.for.cancer..per.100.000.population...2002
##  $ Age.standardized.mortality.rate.for.cardiovascular.diseases..per.100.000.populatio
##  $ Age.standardized.mortality.rate.for.injuries..per.100.000.population...2002
##  $ Age.standardized.mortality.rate.for.non.communicable.diseases..per.100.000.populat
##  $ Healthy.life.expectancy..HALE..at.birth..years..both.sexes
##  $ Healthy.life.expectancy..HALE..at.birth..years..female
##  $ Healthy.life.expectancy..HALE..at.birth..years..male
##  $ Incidence.of.tuberculosis..per.100.000.population.per.year.
##  $ Infant.mortality.rate..per.1.000.live.births..both.sexes
##  $ Infant.mortality.rate..per.1.000.live.births..female
##  $ Infant.mortality.rate..per.1.000.live.births..male
##  $ Life.expectancy.at.birth..years..both.sexes
##  $ Life.expectancy.at.birth..years..female
##  $ Life.expectancy.at.birth..years..male
##  $ Maternal.mortality.ratio..per.100.000.live.births...2005
##  $ Neonatal.mortality.rate..per.1.000.live.births...2004
##  $ Prevalence.of.tuberculosis..per.100.000.population.
##  $ Under.5.mortality.rate..probability.of.dying.by.age.5.per.1000.live.births..both.s
##  $ Under.5.mortality.rate..probability.of.dying.by.age.5.per.1000.live.births..female
##  $ Under.5.mortality.rate..probability.of.dying.by.age.5.per.1000.live.births..male
##   [list output truncated]
```

Dietary habits data

Variables #4 to #52 in the data describe the daily consumption of a variety of foods: most are measured in kcal/day, while Protein, Fats and Carbohydrates are in g/day. We store these variables in the object (data.frame) food and use this subset of the data to analyze dietary habits with a cluster analysis.

```r
# columns 4 to 52 of world: daily consumption by food
food=world[,4:52]
str(food)
## 'data.frame': 86 obs. of 49 variables:
##  $ Energy..kcal.day. : int 2860 2980 3120 3740 2200 2960 3640 2220 ...
##  $ Protein..g.day. : int 96 94 107 111 48 87 92 57 72 83 ...
##  $ Fats..g.day. : int 86 100 134 162 25 99 162 58 58 93 ...
##  $ Carbohydrates..g.day. : num 426 426 372 460 446 ...
##  $ Animal.Products...kcal.day. : int 813 823 1033 1219 65 766 1120 397 344 67 ...
##  $ Animal.Fats..kcal.day. : int 49 72 124 320 5 140 404 73 24 55 ...
##  $ Bovine.Meat..kcal.day. : int 62 342 142 59 5 103 54 113 32 130 ...
##  $ Butter..Ghee..kcal.day. : int 11 28 62 102 3 65 119 1 10 9 ...
##  $ Cheese..kcal.day. : int 50 90 107 193 0 18 165 6 36 2 ...
##  $ Eggs..kcal.day. : int 22 24 23 49 3 47 45 11 13 24 ...
##  $ Fats..Animals..Raw..kcal.day. : int 38 42 61 200 2 71 246 72 12 46 ...
##  $ Fish..Seafood..kcal.day. : int 8 10 30 21 19 24 0 4 7 10 ...
##  $ Freshwater.Fish..kcal.day. : int 1 1 3 6 17 2 0 1 1 4 ...
##  $ Honey..kcal.day. : int 2 1 8 12 0 2 4 0 2 0 ...
##  $ Meat..kcal.day. : int 196 475 492 488 12 295 286 247 84 378 ...
##  $ Milk...Excluding.Butter..kcal.day. : int 524 222 331 336 22 242 376 47 206 198 ...
##  $ Milk..Whole..kcal.day. : int 465 127 172 131 21 168 117 36 162 192 ...
##  $ Mutton...Goat.Meat..kcal.day. : int 33 8 97 6 3 1 13 14 5 3 ...
##  $ Offals..Edible..kcal.day. : int 13 17 30 2 1 15 7 12 7 7 ...
##  $ Pelagic.Fish..kcal.day. : int 5 1 7 9 0 10 0 2 4 1 ...
##  $ Pigmeat..kcal.day. : int 52 34 107 358 0 154 137 49 29 102 ...
##  $ Poultry.Meat..kcal.day. : int 48 83 143 60 3 34 74 66 17 141 ...
##  $ Vegetal.Products...kcal.day. : int 2059 2135 2100 2512 2127 2118 2513 1822 ...
##  $ Alcoholic.Beverages..kcal.day. : int 35 102 152 278 0 108 204 39 165 73 ...
##  $ Apples..kcal.day. : int 17 26 18 67 0 31 21 3 15 4 ...
##  $ Bananas..kcal.day. : int 11 19 20 15 6 3 1 88 16 53 ...
##  $ Beans..kcal.day. : int 28 1 1 3 2 0 6 2 31 162 ...
##  $ Cereals...Excluding.Beer..kcal.day.: int 1299 1050 719 891 1802 1017 783 830 1278 ...
##  $ Coconut.Oil..kcal.day. : int 0 1 16 11 5 7 78 0 0 0 ...
##  $ Coffee..kcal.day. : int 1 1 3 7 0 0 8 3 3 2 ...
##  $ Fruits...Excluding.Wine..kcal.day. : int 129 86 121 168 13 54 72 166 75 112 ...
##  $ Nuts..kcal.day. : int 8 2 31 39 2 12 47 28 12 2 ...
##  $ Olive.Oil..kcal.day. : int 17 2 34 13 0 0 29 0 1 2 ...
##  $ Palm.Oil..kcal.day. : int 0 0 112 15 52 0 61 0 0 20 ...
##  $ Potatoes..kcal.day. : int 57 80 85 113 36 317 151 102 134 28 ...
##  $ Pulses..kcal.day. : int 30 9 9 6 40 0 19 24 46 165 ...
##  $ Rice..Milled.Equivalent...kcal.day.: int 68 41 91 51 1598 52 43 207 12 371 ...
##  $ Rice..Paddy.Equivalent...kcal.day. : int 68 41 91 51 1598 52 43 207 12 371 ...
##  $ Roots...Tuber.Dry.Equiv..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
##  $ Soyabean.Oil..kcal.day. : int 2 43 17 89 48 20 112 11 9 251 ...
##  $ Starchy.Roots..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
##  $ Sugar...Sweeteners..kcal.day. : int 193 406 423 437 59 288 522 282 251 550 ...
##  $ Sugar..Raw.Equivalent...kcal.day. : int 187 337 407 404 29 279 488 280 237 529 ...
##  $ Sugar..Raw.Equivalent..kcal.day. : int 191 405 415 424 59 285 517 282 249 549 ...
##  $ Sugar..Refined.Equiv..kcal.day. : int 187 337 407 404 29 279 488 280 237 529 ...
##  $ Vegetable.Oils..kcal.day. : int 174 311 435 442 131 237 545 207 153 321 ...
##  $ Vegetables..kcal.day. : int 94 51 67 61 10 62 124 46 107 29 ...
##  $ Wheat..kcal.day. : int 1166 914 559 617 180 608 718 387 529 388 ...
##  $ Wine..kcal.day. : int 6 59 39 55 0 6 54 0 3 2 ...
```
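Selecting the columns by position (4:52) depends on the column order of world.csv. Below is a minimal sketch of a name-based alternative. It assumes, as the str() output above suggests, that exactly the food variables have names ending in ..kcal.day. or ..g.day. To avoid downloading the data, it runs on a few column names copied from str(world).

```r
# A few column names copied from the str(world) output above
nms <- c("Countries",
         "Annual.insolation..W.h.m2.day.",
         "Energy..kcal.day.",
         "Protein..g.day.",
         "Animal.Products...kcal.day.",
         "Wine..kcal.day.",
         "Gross.national.income.per.capita..PPP.international...")

# TRUE for names ending in "..kcal.day." or "..g.day."
is_food <- grepl("\\.\\.(kcal|g)\\.day\\.$", nms)
food_names <- nms[is_food]

# unit of each selected variable, extracted from its name
units <- sub("^.*\\.\\.(kcal|g)\\.day\\.$", "\\1/day", food_names)
print(data.frame(variable = food_names, unit = units))

# On the lab data (assumption: no other column of world follows the pattern):
# food <- world[, grepl("\\.\\.(kcal|g)\\.day\\.$", names(world))]
```

The unit column also makes explicit that Protein, Fats and Carbohydrates are in g/day, unlike the other food variables.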

1 Checking the variables

The data seem to contain duplicated information:
1. Meat, Bovine.Meat, Pigmeat: is there overlapping information?
2. Energy, Protein, Fats: are they summaries of other variables? (again, overlap)
3. Rice..Paddy.Equivalent and Rice..Milled.Equivalent: judging from their values, they appear to be the same variable.
4.
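Here is a quick numerical look at points 1 and 2. It uses only the first three countries as printed by str(food) above, so it illustrates the pattern on three rows and does not check the full data. Meat is within a few kcal/day of the sum Bovine + Mutton/Goat + Pig + Poultry meat, and Energy is within about 1% of Animal.Products + Vegetal.Products. If this holds for all countries, keeping both an aggregate and its components would count the same calories twice in the distance computations.

```r
# Values for the first three countries, copied from the str(food) output above
bovine  <- c(62, 342, 142)
mutton  <- c(33, 8, 97)
pigmeat <- c(52, 34, 107)
poultry <- c(48, 83, 143)
meat    <- c(196, 475, 492)
animal  <- c(813, 823, 1033)     # Animal.Products...kcal.day.
vegetal <- c(2059, 2135, 2100)   # Vegetal.Products...kcal.day.
energy  <- c(2860, 2980, 3120)   # Energy..kcal.day.

# Point 1: Meat minus the sum of the four main meat types (kcal/day)
meat_gap <- meat - (bovine + mutton + pigmeat + poultry)
print(meat_gap)    # 1 8 3

# Point 2: relative gap between Energy and animal + vegetal products
energy_gap <- (energy - (animal + vegetal)) / energy
print(round(energy_gap, 3))
```

The small residuals presumably come from minor items not listed as separate columns.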

The cleaned food data

We remove the redundant variables from food:

```r
# drops Energy, Protein, Fats, Carbohydrates, Animal.Products, Animal.Fats (1:6),
# Meat (15), Rice..Milled.Equivalent (37) and the three raw/refined sugar equivalents (43:45)
choose=c(7:14,16:36,38:42,46:49)
food=food[,choose]
str(food)
## 'data.frame': 86 obs. of 38 variables:
##  $ Bovine.Meat..kcal.day. : int 62 342 142 59 5 103 54 113 32 130 ...
##  $ Butter..Ghee..kcal.day. : int 11 28 62 102 3 65 119 1 10 9 ...
##  $ Cheese..kcal.day. : int 50 90 107 193 0 18 165 6 36 2 ...
##  $ Eggs..kcal.day. : int 22 24 23 49 3 47 45 11 13 24 ...
##  $ Fats..Animals..Raw..kcal.day. : int 38 42 61 200 2 71 246 72 12 46 ...
##  $ Fish..Seafood..kcal.day. : int 8 10 30 21 19 24 0 4 7 10 ...
##  $ Freshwater.Fish..kcal.day. : int 1 1 3 6 17 2 0 1 1 4 ...
##  $ Honey..kcal.day. : int 2 1 8 12 0 2 4 0 2 0 ...
##  $ Milk...Excluding.Butter..kcal.day. : int 524 222 331 336 22 242 376 47 206 198 ...
##  $ Milk..Whole..kcal.day. : int 465 127 172 131 21 168 117 36 162 192 ...
##  $ Mutton...Goat.Meat..kcal.day. : int 33 8 97 6 3 1 13 14 5 3 ...
##  $ Offals..Edible..kcal.day. : int 13 17 30 2 1 15 7 12 7 7 ...
##  $ Pelagic.Fish..kcal.day. : int 5 1 7 9 0 10 0 2 4 1 ...
##  $ Pigmeat..kcal.day. : int 52 34 107 358 0 154 137 49 29 102 ...
##  $ Poultry.Meat..kcal.day. : int 48 83 143 60 3 34 74 66 17 141 ...
##  $ Vegetal.Products...kcal.day. : int 2059 2135 2100 2512 2127 2118 2513 1822 ...
##  $ Alcoholic.Beverages..kcal.day. : int 35 102 152 278 0 108 204 39 165 73 ...
##  $ Apples..kcal.day. : int 17 26 18 67 0 31 21 3 15 4 ...
##  $ Bananas..kcal.day. : int 11 19 20 15 6 3 1 88 16 53 ...
##  $ Beans..kcal.day. : int 28 1 1 3 2 0 6 2 31 162 ...
##  $ Cereals...Excluding.Beer..kcal.day.: int 1299 1050 719 891 1802 1017 783 830 1278 ...
##  $ Coconut.Oil..kcal.day. : int 0 1 16 11 5 7 78 0 0 0 ...
##  $ Coffee..kcal.day. : int 1 1 3 7 0 0 8 3 3 2 ...
##  $ Fruits...Excluding.Wine..kcal.day. : int 129 86 121 168 13 54 72 166 75 112 ...
##  $ Nuts..kcal.day. : int 8 2 31 39 2 12 47 28 12 2 ...
##  $ Olive.Oil..kcal.day. : int 17 2 34 13 0 0 29 0 1 2 ...
##  $ Palm.Oil..kcal.day. : int 0 0 112 15 52 0 61 0 0 20 ...
##  $ Potatoes..kcal.day. : int 57 80 85 113 36 317 151 102 134 28 ...
##  $ Pulses..kcal.day. : int 30 9 9 6 40 0 19 24 46 165 ...
##  $ Rice..Paddy.Equivalent...kcal.day. : int 68 41 91 51 1598 52 43 207 12 371 ...
##  $ Roots...Tuber.Dry.Equiv..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
##  $ Soyabean.Oil..kcal.day. : int 2 43 17 89 48 20 112 11 9 251 ...
##  $ Starchy.Roots..kcal.day. : int 57 100 87 113 42 317 151 168 134 129 ...
##  $ Sugar...Sweeteners..kcal.day. : int 193 406 423 437 59 288 522 282 251 550 ...
##  $ Vegetable.Oils..kcal.day. : int 174 311 435 442 131 237 545 207 153 321 ...
##  $ Vegetables..kcal.day. : int 94 51 67 61 10 62 124 46 107 29 ...
##  $ Wheat..kcal.day. : int 1166 914 559 617 180 608 718 387 529 388 ...
##  $ Wine..kcal.day. : int 6 59 39 55 0 6 54 0 3 2 ...
```
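Because choose is written as column positions, a redundant column can easily be kept by mistake. Below is a minimal sketch of a hypothetical helper, identical_pairs(), that lists pairs of columns with identical values; on the lab data one would call identical_pairs(food). Here it runs on the first ten values of three columns copied from the output above. The result suggests that Roots...Tuber.Dry.Equiv..kcal.day. and Starchy.Roots..kcal.day. may still coincide in the cleaned data, although ten values are only a hint, not proof.

```r
# Hypothetical helper: names of column pairs whose values are identical
identical_pairs <- function(df) {
  nms <- names(df)
  pairs <- character(0)
  if (length(nms) < 2) return(pairs)
  for (i in 1:(length(nms) - 1)) {
    for (j in (i + 1):length(nms)) {
      # compare as numeric so that integer and double columns can match
      if (identical(as.numeric(df[[i]]), as.numeric(df[[j]]))) {
        pairs <- c(pairs, paste(nms[i], "==", nms[j]))
      }
    }
  }
  pairs
}

# First ten values of three columns, copied from the str(food) output above
toy <- data.frame(
  Potatoes..kcal.day.                = c(57, 80, 85, 113, 36, 317, 151, 102, 134, 28),
  Roots...Tuber.Dry.Equiv..kcal.day. = c(57, 100, 87, 113, 42, 317, 151, 168, 134, 129),
  Starchy.Roots..kcal.day.           = c(57, 100, 87, 113, 42, 317, 151, 168, 134, 129)
)
print(identical_pairs(toy))
```

Similarly, Vegetal.Products...kcal.day. appears to be an aggregate of the individual plant foods and is still among the selected columns. Whether to drop it is a modelling choice.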

2 Clustering with k-means

Let us quickly assess several values of k by plotting the ratio SSW/SST (within-cluster sum of squares over total sum of squares) against k.

```r
# standardize each variable (mean 0, standard deviation 1)
st.food=scale(food)
# k-means clustering loop: ratio SSW/SST for k = 2, ..., 30
ratiowss=vector()
for (i in 2:30){
  km=kmeans(st.food,i,nstart=50)
  ratiowss[i]=km$tot.withinss/km$totss
}
dt=data.frame("k"=2:30,"ratiowss"=ratiowss[2:30])
# with ggplot2 >= 3.4.0 the line width is set with linewidth = 3 (size is deprecated for lines)
ggplot(dt,aes(x=k,y=ratiowss))+geom_line(size=3)
```

There is no clear indication (no evident elbow in the curve); there may be many groups.
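The curve plots SSW/SST = km$tot.withinss/km$totss. The kmeans() documentation defines betweenss as totss - tot.withinss, so the ratio is 1 minus the between_ss/total_ss share that kmeans prints. For standardized data, SST does not depend on k and equals (n - 1) * p, i.e. 85 * 38 = 3230 for st.food. For instance, the within-cluster sums of squares in the k = 7 output add up to about 1747.4, so SSW/SST is about 0.541, matching between_ss/total_ss = 45.9%. Below is a minimal sketch that checks both facts on the built-in iris measurements, used here only as a stand-in for the food data:

```r
# Stand-in data: the four measurements of the built-in iris dataset, standardized
x <- scale(iris[, 1:4])

set.seed(1)  # kmeans() starts from random centroids
ks <- 2:8
ratio <- numeric(length(ks))
between_share <- numeric(length(ks))
for (idx in seq_along(ks)) {
  km <- kmeans(x, centers = ks[idx], nstart = 20)
  ratio[idx] <- km$tot.withinss / km$totss       # SSW / SST, as in the lab loop
  between_share[idx] <- km$betweenss / km$totss  # printed by kmeans as between_SS / total_SS
}
print(data.frame(k = ks, ratio = round(ratio, 3)))

# For standardized data SST is the same for every k: (n - 1) * p = 149 * 4 for iris
print(km$totss)
```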

k = 7 clusters?

Let us examine the case k = 7 in detail to get more information.

```r
set.seed(1)
km=kmeans(st.food,7,nstart=50)
km
## K-means clustering with 7 clusters of sizes 25, 20, 8, 10, 12, 9, 2
## 
## Cluster means:
##   Bovine.Meat..kcal.day. Butter..Ghee..kcal.day. Cheese..kcal.day.
## 1  0.32989097  0.97331797  1.3370428
## 2 -0.04616923 -0.37423100 -0.6170505
## 3 -0.34668805  0.05005711 -0.3388631
## 4 -0.80878720 -0.71539835 -0.7111165
## 5  0.94149481 -0.24355357 -0.3200256
## 6 -0.77951292 -0.77908292 -0.7490346
## 7 -0.37241740 -0.08020678 -0.3406861
##   Eggs..kcal.day. Fats..Animals..Raw..kcal.day. Fish..Seafood..kcal.day.
## 1  0.86876834  0.9721236  0.4105214
## 2 -0.51299141 -0.2773722 -0.5053352
## 3 -0.04621414 -0.6696613 -0.3617156
## 4  0.07140877 -0.5979779  0.9067984
## 5  0.05902741 -0.1011835 -0.6136138
## 6 -1.21350123 -0.7479965 -0.4256301
## 7 -0.79528641  0.2637976  2.4317222
##   Freshwater.Fish..kcal.day. Honey..kcal.day.
## 1  0.1471113  0.78747214
## 2 -0.3871934 -0.51401317
## 3 -0.2142599  0.02472802
## 4  1.3779202 -0.66996457
## 5 -0.6813789 -0.23519098
## 6  0.2329817 -0.37696498
## 7 -0.9596626  1.65512902
##   Milk...Excluding.Butter..kcal.day. Milk..Whole..kcal.day.
## 1  1.0535298  0.32838020
## 2 -0.4229140 -0.08508669
## 3 -0.3021864 -0.14812903
## 4 -1.0287812 -0.92110764
## 5  0.5812926  1.22441443
## 6 -1.2292098 -1.10512312
## 7 -0.5436417 -0.42926379
##   Mutton...Goat.Meat..kcal.day. Offals..Edible..kcal.day.
## 1  0.1696441  0.1493913
## 2 -0.2607088 -0.1720812
## 3  0.2005357 -0.1580227
## 4 -0.3949278 -0.4204492
## 5  0.3691971  0.3730788
## 6 -0.3989520 -0.6891241
## 7  1.2391349  3.4503428
##   Pelagic.Fish..kcal.day. Pigmeat..kcal.day. Poultry.Meat..kcal.day.
## 1  0.1889683  1.1434802  0.47555592
## 2 -0.5080530 -0.5858025 -0.07232933
## 3 -0.3086798 -0.8932905  0.63235391
## 4  0.9676226 -0.1354736 -0.49943904
## 5 -0.4787488 -0.1880691 -0.36535016
## 6 -0.2798989 -0.8142066 -1.06658298
## 7  3.2470704  0.6073976  1.73834829
##   Vegetal.Products...kcal.day. Alcoholic.Beverages..kcal.day.
## 1  0.3033270  1.1694372
## 2 -0.2773750 -0.4859975
## 3  1.5493774 -1.0598159
## 4 -0.1677874 -0.6173812
## 5 -0.4022675  0.0728212
## 6 -0.7528943 -0.6614677
## 7 -0.5747812  0.1078569
##   Apples..kcal.day. Bananas..kcal.day. Beans..kcal.day.
## 1  0.9366070 -0.11602552 -0.3689216
## 2 -0.7080348  0.29999252  0.9464497
## 3  0.3971556 -0.38524806 -0.3513856
## 4 -0.7191422 -0.04896357 -0.3765610
## 5  0.3508746 -0.45510426 -0.3354701
## 6 -0.9264811 -0.24824327  0.1709316
## 7 -0.5562331  4.08392411 -0.3210015
##   Cereals...Excluding.Beer..kcal.day. Coconut.Oil..kcal.day.
## 1 -0.71526346 -0.006741647
## 2 -0.03225038 -0.063084068
## 3  1.47838912 -0.385689864
## 4  0.76943553  0.441271471
## 5  0.29245338 -0.468991830
## 6 -0.24036759 -0.312485106
## 7 -1.17050324  4.271647325
##   Coffee..kcal.day. Fruits...Excluding.Wine..kcal.day. Nuts..kcal.day.
## 1  1.0535729  0.257912843  0.8367175
## 2 -0.3863959  0.003010899 -0.6365760
## 3 -0.5321418  0.944180840  0.9379361
## 4 -0.6195893 -0.788933794 -0.6365760
## 5 -0.4835598 -0.683285121 -0.3377103
## 6 -0.6293057 -0.285818187 -0.4748206
## 7  1.6540458  2.299818650 -0.4991186
##   Olive.Oil..kcal.day. Palm.Oil..kcal.day. Potatoes..kcal.day.
## 1  0.6164991 -0.3428790  0.7463559
## 2 -0.3174581  0.2483901 -0.4593402
## 3  0.1664928  0.1818258 -0.2292374
## 4 -0.3278656  0.3309446 -0.8565629
## 5 -0.2971767 -0.6505372  0.8151124
## 6 -0.3287552  0.8299079 -0.7908761
## 7 -0.2958424 -0.4113025 -0.8680144
##   Pulses..kcal.day. Rice..Paddy.Equivalent...kcal.day.
## 1 -0.5909183 -0.6156692
## 2  0.9182791  0.1290478
## 3  0.5015319 -0.1319391
## 4 -0.4611223  1.9996180
## 5 -0.7911267 -0.6622619
## 6  0.7783029  0.2516673
## 7 -0.2524306 -0.2238785
##   Roots...Tuber.Dry.Equiv..kcal.day. Soyabean.Oil..kcal.day.
## 1 -0.1431194  0.1467725
## 2 -0.2107852  0.4359844
## 3 -0.5908237  0.5243360
## 4 -0.4999269 -0.0941256
## 5 -0.1047665 -0.5604882
## 6  1.9954806 -0.7791927
## 7  0.4087093 -0.9519196
##   Starchy.Roots..kcal.day. Sugar...Sweeteners..kcal.day.
## 1 -0.1431194  0.8414865
## 2 -0.2107852  0.2490932
## 3 -0.5908237 -0.1025423
## 4 -0.4999269 -0.6526871
## 5 -0.1047665 -0.2690550
## 6  1.9954806 -1.6746178
## 7  0.4087093 -0.1857987
##   Vegetable.Oils..kcal.day. Vegetables..kcal.day. Wheat..kcal.day.
## 1  0.7143137  0.6079541  0.3275307
## 2 -0.2000841 -0.7375136 -0.5415179
## 3  0.3354790  1.2793650  1.5343514
## 4 -0.2970543 -0.3366515 -0.9221915
## 5 -0.4046467  0.2992595  0.9555595
## 6 -0.7252155 -0.9854983 -1.2817166
## 7 -1.0933747 -1.0193077 -0.1710360
##   Wine..kcal.day.
## 1  1.10091380
## 2 -0.54867585
## 3 -0.59679222
## 4 -0.59679222
## 5  0.02471093
## 6 -0.61238549
## 7 -0.29606489
## 
## Clustering vector:
##  [1] 5 5 1 1 4 5 1 2 5 2 5 6 1 5 4 2 2 1 1 1 2 2 3 1 6 2 1 1 4 5 1 6 1 2 2
## [36] 1 1 2 4 3 3 1 2 4 2 2 6 6 6 4 1 2 2 5 3 1 1 6 2 2 2 4 1 1 5 5 7 7 3 4
## [71] 6 2 1 4 1 1 6 4 2 3 3 5 3 1 1 5
## 
## Within cluster sum of squares by cluster:
## [1] 605.59446 367.94875 135.06515 174.00272 232.60861 163.25349  68.95847
##  (between_ss / total_ss =  45.9 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
```

Comments

Under Cluster means we find the coordinates of the centroid of each group. The centroids lie in a 38-dimensional space (one dimension per food variable). Since the variables were standardized, the centroid coordinates are expressed in standard deviations (SD) from the overall mean of each variable. For example, group 1 has above-average consumption of Bovine meat (0.33 SD), Butter (0.97 SD) and Cheese (1.34 SD).

There are too many values to inspect one by one, so an exploratory analysis of the results is worthwhile.
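Because the centroids are in SD units, converting them back to kcal/day can help. scale() stores the column means and standard deviations as the attributes "scaled:center" and "scaled:scale". Since k-means centroids are cluster means and standardization is linear, multiplying each coordinate by the SD and adding back the mean gives each group's mean consumption in the original units. Below is a minimal sketch on a small synthetic matrix with hypothetical values; the commented lines show the same operation on the lab objects, untested here.

```r
set.seed(1)
# Synthetic stand-in: 30 "countries" x 3 "foods" in kcal/day (hypothetical values)
raw <- cbind(food_a = rnorm(30, 100, 20),
             food_b = rnorm(30, 500, 80),
             food_c = rnorm(30, 50, 5))
st <- scale(raw)                       # same standardization as st.food
km <- kmeans(st, centers = 3, nstart = 10)

mu  <- attr(st, "scaled:center")       # column means removed by scale()
sdv <- attr(st, "scaled:scale")        # column SDs divided out by scale()

# centroid in SD units * SD + mean, column by column
centers_orig <- sweep(sweep(km$centers, 2, sdv, "*"), 2, mu, "+")
print(round(centers_orig, 1))

# Same idea on the lab objects (not run here):
# sweep(sweep(km$centers, 2, attr(st.food, "scaled:scale"), "*"),
#       2, attr(st.food, "scaled:center"), "+")
```

As a check, the result should coincide with the per-cluster means computed directly on the unscaled data.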

3 EDA of the results (k = 7)

To carry out an EDA, we put the centroid values in a data.frame, which we call dt.

```r
dt=data.frame(km$centers)
dt
```

The printed dt coincides with the Cluster means table in the km output above.

EDA (continued)

For a better view, transpose the data.frame just created (t() returns a matrix, with one row per food and one column per group).

```r
tdt=t(dt)
tdt
##                                  1            2           3
## Bovine.Meat..kcal.day.               0.329890974 -0.046169227 -0.34668805
## Butter..Ghee..kcal.day.              0.973317975 -0.374231004  0.05005711
## Cheese..kcal.day.                    1.337042802 -0.617050500 -0.33886310
## Eggs..kcal.day.                      0.868768338 -0.512991406 -0.04621414
## Fats..Animals..Raw..kcal.day.        0.972123552 -0.277372209 -0.66966126
## Fish..Seafood..kcal.day.             0.410521440 -0.505335175 -0.36171562
## Freshwater.Fish..kcal.day.           0.147111285 -0.387193352 -0.21425993
## Honey..kcal.day.                     0.787472140 -0.514013175  0.02472802
## Milk...Excluding.Butter..kcal.day.   1.053529776 -0.422914031 -0.30218635
## Milk..Whole..kcal.day.               0.328380197 -0.085086688 -0.14812903
## Mutton...Goat.Meat..kcal.day.        0.169644062 -0.260708836  0.20053573
## Offals..Edible..kcal.day.            0.149391344 -0.172081217 -0.15802265
## Pelagic.Fish..kcal.day.              0.188968255 -0.508052983 -0.30867979
## Pigmeat..kcal.day.                   1.143480184 -0.585802544 -0.89329048
## Poultry.Meat..kcal.day.              0.475555919 -0.072329333  0.63235391
## Vegetal.Products...kcal.day.         0.303327021 -0.277374993  1.54937743
## Alcoholic.Beverages..kcal.day.       1.169437229 -0.485997506 -1.05981586
## Apples..kcal.day.                    0.936606994 -0.708034798  0.39715560
## Bananas..kcal.day.                  -0.116025517  0.299992516 -0.38524806
## Beans..kcal.day.                    -0.368921587  0.946449740 -0.35138562
## Cereals...Excluding.Beer..kcal.day. -0.715263459 -0.032250379  1.47838912
## Coconut.Oil..kcal.day.              -0.006741647 -0.063084068 -0.38568986
## Coffee..kcal.day.                    1.053572931 -0.386395940 -0.53214178
## Fruits...Excluding.Wine..kcal.day.   0.257912843  0.003010899  0.94418084
## Nuts..kcal.day.                      0.836717456 -0.636575989  0.93793609
## Olive.Oil..kcal.day.                 0.616499079 -0.317458076  0.16649281
## Palm.Oil..kcal.day.                 -0.342878963  0.248390072  0.18182582
## Potatoes..kcal.day.                  0.746355907 -0.459340243 -0.22923736
## Pulses..kcal.day.                   -0.590918349  0.918279065  0.50153189
## Rice..Paddy.Equivalent...kcal.day.  -0.615669171  0.129047804 -0.13193911
## Roots...Tuber.Dry.Equiv..kcal.day.  -0.143119350 -0.210785158 -0.59082372
## Soyabean.Oil..kcal.day.              0.146772472  0.435984362  0.52433602
## Starchy.Roots..kcal.day.            -0.143119350 -0.210785158 -0.59082372
## Sugar...Sweeteners..kcal.day.        0.841486503  0.249093223 -0.10254234
## Vegetable.Oils..kcal.day.            0.714313713 -0.200084147  0.33547902
## Vegetables..kcal.day.                0.607954119 -0.737513573  1.27936499
## Wheat..kcal.day.                     0.327530728 -0.541517897  1.53435142
## Wine..kcal.day.                      1.100913805 -0.548675851 -0.59679222
##                                            4           5          6
## Bovine.Meat..kcal.day.              -0.80878720  0.94149481 -0.7795129
## Butter..Ghee..kcal.day.             -0.71539835 -0.24355357 -0.7790829
## Cheese..kcal.day.                   -0.71111649 -0.32002560 -0.7490346
## Eggs..kcal.day.                      0.07140877  0.05902741 -1.2135012
## Fats..Animals..Raw..kcal.day.       -0.59797792 -0.10118349 -0.7479965
## Fish..Seafood..kcal.day.             0.90679842 -0.61361379 -0.4256301
## Freshwater.Fish..kcal.day.           1.37792018 -0.68137894  0.2329817
## Honey..kcal.day.                    -0.66996457 -0.23519098 -0.3769650
## Milk...Excluding.Butter..kcal.day.  -1.02878125  0.58129262 -1.2292098
## Milk..Whole..kcal.day.              -0.92110764  1.22441443 -1.1051231
## Mutton...Goat.Meat..kcal.day.       -0.39492781  0.36919714 -0.3989520
## Offals..Edible..kcal.day.           -0.42044923  0.37307876 -0.6891241
## Pelagic.Fish..kcal.day.              0.96762261 -0.47874879 -0.2798989
## Pigmeat..kcal.day.                  -0.13547360 -0.18806915 -0.8142066
## Poultry.Meat..kcal.day.             -0.49943904 -0.36535016 -1.0665830
## Vegetal.Products...kcal.day.        -0.16778745 -0.40226747 -0.7528943
## Alcoholic.Beverages..kcal.day.      -0.61738122  0.07282120 -0.6614677
## Apples..kcal.day.                   -0.71914224  0.35087460 -0.9264811
## Bananas..kcal.day.                  -0.04896357 -0.45510426 -0.2482433
## Beans..kcal.day.                    -0.37656102 -0.33547013  0.1709316
## Cereals...Excluding.Beer..kcal.day.  0.76943553  0.29245338 -0.2403676
## Coconut.Oil..kcal.day.               0.44127147 -0.46899183 -0.3124851
## Coffee..kcal.day.                   -0.61958928 -0.48355983 -0.6293057
## Fruits...Excluding.Wine..kcal.day.  -0.78893379 -0.68328512 -0.2858182
## Nuts..kcal.day.                     -0.63657599 -0.33771027 -0.4748206
## Olive.Oil..kcal.day.                -0.32786562 -0.29717670 -0.3287552
## Palm.Oil..kcal.day.                  0.33094462 -0.65053719  0.8299079
## Potatoes..kcal.day.                 -0.85656294  0.81511241 -0.7908761
## Pulses..kcal.day.                   -0.46112228 -0.79112666  0.7783029
## Rice..Paddy.Equivalent...kcal.day.   1.99961800 -0.66226191  0.2516673
## Roots...Tuber.Dry.Equiv..kcal.day.  -0.49992687 -0.10476653  1.9954806
## Soyabean.Oil..kcal.day.             -0.09412560 -0.56048816 -0.7791927
## Starchy.Roots..kcal.day.            -0.49992687 -0.10476653  1.9954806
## Sugar...Sweeteners..kcal.day.       -0.65268711 -0.26905500 -1.6746178
## Vegetable.Oils..kcal.day.           -0.29705432 -0.40464666 -0.7252155
## Vegetables..kcal.day.               -0.33665152  0.29925947 -0.9854983
## Wheat..kcal.day.                    -0.92219149  0.95555953 -1.2817166
## Wine..kcal.day.                     -0.59679222  0.02471093 -0.6123855
##                                            7
## Bovine.Meat..kcal.day.              -0.37241740
## Butter..Ghee..kcal.day.             -0.08020678
## Cheese..kcal.day.                   -0.34068608
## Eggs..kcal.day.                     -0.79528641
## Fats..Animals..Raw..kcal.day.        0.26379762
## Fish..Seafood..kcal.day.             2.43172224
## Freshwater.Fish..kcal.day.          -0.95966261
## Honey..kcal.day.                     1.65512902
## Milk...Excluding.Butter..kcal.day.  -0.54364171
## Milk..Whole..kcal.day.              -0.42926379
## Mutton...Goat.Meat..kcal.day.        1.23913493
## Offals..Edible..kcal.day.            3.45034283
## Pelagic.Fish..kcal.day.              3.24707045
## Pigmeat..kcal.day.                   0.60739764
## Poultry.Meat..kcal.day.              1.73834829
## Vegetal.Products...kcal.day.        -0.57478120
## Alcoholic.Beverages..kcal.day.       0.10785686
## Apples..kcal.day.                   -0.55623310
## Bananas..kcal.day.                   4.08392411
## Beans..kcal.day.                    -0.32100151
## Cereals...Excluding.Beer..kcal.day. -1.17050324
## Coconut.Oil..kcal.day.               4.27164732
## Coffee..kcal.day.                    1.65404578
## Fruits...Excluding.Wine..kcal.day.   2.29981865
## Nuts..kcal.day.                     -0.49911859
## Olive.Oil..kcal.day.                -0.29584240
## Palm.Oil..kcal.day.                 -0.41130255
## Potatoes..kcal.day.                 -0.86801441
## Pulses..kcal.day.                   -0.25243056
## Rice..Paddy.Equivalent...kcal.day.  -0.22387855
## Roots...Tuber.Dry.Equiv..kcal.day.   0.40870927
## Soyabean.Oil..kcal.day.             -0.95191956
## Starchy.Roots..kcal.day.             0.40870927
## Sugar...Sweeteners..kcal.day.       -0.18579867
## Vegetable.Oils..kcal.day.           -1.09337472
## Vegetables..kcal.day.               -1.01930769
## Wheat..kcal.day.                    -0.17103600
## Wine..kcal.day.                     -0.29606489
```

EDA (continued)

We note that Group 1 has consumption well above the average for many foods. Reading the values in the data frame tdt directly is not easy, so it is worth using graphical tools. It may also be interesting to identify, for each food, which group consumes the most of it. To do so, use the code below:

```r
maxims <- vector()
for (i in 1:nrow(tdt)) {
  maxims[i] <- which.max(tdt[i, ])
}
# for each row of tdt (one food), the loop above stores the number
# of the column (i.e. the group) with the highest value
```
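The same per-row maximum can be computed without an explicit loop. The sketch below assumes that tdt is a numeric matrix with one row per food and one column per group, as the transposed centroids printed above appear to be; it uses a small hypothetical matrix, tdt_toy, with made-up values, and checks that apply(..., 1, which.max) agrees with the loop.

```r
# Hypothetical stand-in for tdt: rows = foods, columns = groups (made-up values)
tdt_toy <- matrix(c( 0.5, -0.2,  1.1,
                    -0.3,  0.8,  0.1,
                     2.0,  0.4, -1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Wheat", "Rice", "Wine"), c("1", "2", "3")))

# Loop version, as in the lab
maxims_loop <- vector()
for (i in 1:nrow(tdt_toy)) {
  maxims_loop[i] <- which.max(tdt_toy[i, ])
}

# Vectorised version: apply which.max() to every row (MARGIN = 1);
# the result is named by the row names, i.e. by food
maxims_apply <- apply(tdt_toy, 1, which.max)
print(maxims_apply)
```

Because apply() keeps the row names, each group number can be read next to its food without going back to rownames(tdt).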

We put all the variables created so far into data frames so that we can analyse them with graphical tools.

```r
# as.data.frame() keeps the row names of the matrix (the food names)
tdt <- as.data.frame(tdt)
tdt$aliments <- row.names(tdt)
colnames(tdt) <- c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "aliments")

world <- as.data.frame(world)
world$cluster <- km$cluster
```

Note that the column containing the food names in the data frame tdt is column 8.

Note that the column containing the country names in the data frame world is column 1.
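As a compact complement to these data frames, one could also build a summary table with one row per food, the group with the highest centroid and that centroid value. A minimal sketch on a hypothetical matrix; tdt_toy and top_group are illustrative names, not objects from the lab.

```r
# Hypothetical stand-in for tdt: rows = foods, columns = groups (made-up values)
tdt_toy <- matrix(c( 0.5, -0.2,  1.1,
                    -0.3,  0.8,  0.1,
                     2.0,  0.4, -1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Wheat", "Rice", "Wine"),
                                  c("g1", "g2", "g3")))

# One row per food: the group with the highest centroid and that value
top_group <- data.frame(aliments  = rownames(tdt_toy),
                        group     = apply(tdt_toy, 1, which.max),
                        value     = apply(tdt_toy, 1, max),
                        row.names = NULL)
print(top_group)
```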

Analysis of group 1

Foods whose consumption is highest in this group:

```r
tdt[maxims == 1, 8]
##  [1] "Butter..Ghee..kcal.day."
##  [2] "Cheese..kcal.day."
##  [3] "Eggs..kcal.day."
##  [4] "Fats..Animals..Raw..kcal.day."
##  [5] "Milk...Excluding.Butter..kcal.day."
##  [6] "Pigmeat..kcal.day."
##  [7] "Alcoholic.Beverages..kcal.day."
##  [8] "Apples..kcal.day."
##  [9] "Olive.Oil..kcal.day."
## [10] "Sugar...Sweeteners..kcal.day."
## [11] "Vegetable.Oils..kcal.day."
## [12] "Wine..kcal.day."
```

Countries in the group:

```r
world[km$cluster == 1, 1]
##  [1] Australia                Austria
##  [3] Belgium                  Canada
##  [5] Cyprus                   Czech Republic
##  [7] Denmark                  Estonia
##  [9] Finland                  France
## [11] Germany                  Greece
## [13] Hungary                  Iceland
## [15] Italy                    Malta
## [17] Netherlands              New Zealand
## [19] Poland                   Portugal
## [21] Spain                    Sweden
## [23] Switzerland              United Kingdom
## [25] United States of America
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```
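Instead of repeating world[km$cluster == g, 1] for each g, base R's split() can list the members of every group in a single call. The sketch uses hypothetical country labels ("A" to "E") and made-up cluster assignments.

```r
# Hypothetical countries and k-means cluster labels
countries <- c("A", "B", "C", "D", "E")
cluster   <- c(1L, 2L, 3L, 1L, 2L)

# One list element per cluster, named after the cluster label
by_group <- split(countries, cluster)
print(by_group)
```

With the lab objects this would be, for instance, split(as.character(world[, 1]), km$cluster).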

Plot of the foods with the highest consumption

```r
library(ggplot2)

# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g1)]))
# keep foods with a positive (above-average) centroid
ggplot(tdt[tdt$g1 > 0, ], aes(aliments, g1, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 1: Alimenti …")
```
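The reordering step works because ggplot2 draws a discrete axis in the order of the factor levels. The sketch below, on a hypothetical data frame shaped like tdt with made-up values, shows which levels survive the g1 > 0 filter and in which order. It assumes, as the values printed above suggest, that the centroids are on standardised variables, so a positive value means above the overall mean.

```r
# Hypothetical data frame shaped like tdt: food names plus one group's
# standardised centroid values (made-up numbers)
toy <- data.frame(aliments = c("Wheat", "Rice", "Wine", "Eggs"),
                  g1 = c(0.4, -0.3, 1.2, 0.9),
                  stringsAsFactors = FALSE)

# Levels in decreasing order of g1: the order of the bars on the x axis
toy$aliments <- factor(toy$aliments,
                       levels = toy$aliments[order(-toy$g1)])

# Keep the above-average foods and drop the levels no longer used
above <- toy[toy$g1 > 0, ]
print(levels(droplevels(above$aliments)))
```

ggplot2 drops unused levels from a discrete axis by default, so droplevels() here only makes the surviving order visible.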

Analysis of group 2

Foods whose consumption is highest in this group:

```r
tdt[maxims == 2, 8]
## [1] Beans..kcal.day.  Pulses..kcal.day.
## 38 Levels: Cheese..kcal.day. ... Cereals...Excluding.Beer..kcal.day.
```

Countries in the group:

```r
world[km$cluster == 2, 1]
##  [1] Bolivia             Brazil              Colombia
##  [4] Costa Rica          Dominican Republic  Ecuador
##  [7] Fiji                Guatemala           Haiti
## [10] India               Jamaica             Kenya
## [13] Lesotho             Mauritius           Mexico
## [16] Pakistan            Paraguay            Peru
## [19] South Africa        Trinidad and Tobago
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```

Plot of the foods with the highest consumption

```r
# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g2)]))
# keep foods with a positive (above-average) centroid in group 2
ggplot(tdt[tdt$g2 > 0, ], aes(aliments, g2, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 2: Alimenti …")
```

Analysis of group 3

Foods whose consumption is highest in this group:

```r
tdt[maxims == 3, 8]
## [1] Vegetal.Products...kcal.day.  Cereals...Excluding.Beer..kcal.day.
## [3] Nuts..kcal.day.               Soyabean.Oil..kcal.day.
## [5] Vegetables..kcal.day.         Wheat..kcal.day.
## 38 Levels: Beans..kcal.day. Pulses..kcal.day. ... Vegetables..kcal.day.
```

Countries in the group:

```r
world[km$cluster == 3, 1]
## [1] Egypt                Iran, Islamic Republic of
## [3] Israel               Morocco
## [5] Saudi Arabia         Tunisia
## [7] Turkey               United Arab Emirates
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```

Plot of the foods with the highest consumption

```r
# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g3)]))
# keep foods with a positive (above-average) centroid in group 3
ggplot(tdt[tdt$g3 > 0, ], aes(aliments, g3, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 3: Alimenti …")
```

Analysis of group 4

Foods whose consumption is highest in this group:

```r
tdt[maxims == 4, 8]
## [1] Freshwater.Fish..kcal.day.  Rice..Paddy.Equivalent...kcal.day.
## 38 Levels: Vegetal.Products...kcal.day. ... Alcoholic.Beverages..kcal.day.
```

Countries in the group:

```r
world[km$cluster == 4, 1]
## [1] Bangladesh  China        Gambia   Indonesia  Japan
## [6] Malaysia    Philippines  Senegal  Sri Lanka  Thailand
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```

Plot of the foods with the highest consumption

```r
# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g4)]))
ggplot(tdt[tdt$g4 > 0, ], aes(aliments, g4, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 4: Alimenti …")
```

Analysis of group 5

Foods whose consumption is highest in this group:

```r
tdt[maxims == 5, 8]
## [1] Bovine.Meat..kcal.day.  Milk..Whole..kcal.day.  Potatoes..kcal.day.
## 38 Levels: Rice..Paddy.Equivalent...kcal.day. ...
```

Countries in the group:

```r
world[km$cluster == 5, 1]
##  [1] Albania                 Argentina  Belarus
##  [4] Bosnia and Herzegovina  Bulgaria   Chile
##  [7] Georgia                 Mongolia   Romania
## [10] Russian Federation      Ukraine    Uzbekistan
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```

Plot of the foods with the highest consumption

```r
# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g5)]))
ggplot(tdt[tdt$g5 > 0, ], aes(aliments, g5, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 5: Alimenti …")
```

Analysis of group 6

Foods whose consumption is highest in this group:

```r
tdt[maxims == 6, 8]
## [1] Palm.Oil..kcal.day.       Roots...Tuber.Dry.Equiv..kcal.day.
## [3] Starchy.Roots..kcal.day.
## 38 Levels: Milk..Whole..kcal.day. ... Pulses..kcal.day.
```

Countries in the group:

```r
world[km$cluster == 6, 1]
## [1] Cameroon  Ethiopia  Ghana         Liberia   Madagascar
## [6] Malawi    Nigeria   Sierra Leone  Tanzania
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```

Plot of the foods with the highest consumption

```r
# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g6)]))
ggplot(tdt[tdt$g6 > 0, ], aes(aliments, g6, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 6: Alimenti …")
```

Analysis of group 7

Foods whose consumption is highest in this group:

```r
tdt[maxims == 7, 8]
##  [1] Fish..Seafood..kcal.day.       Honey..kcal.day.
##  [3] Mutton...Goat.Meat..kcal.day.  Offals..Edible..kcal.day.
##  [5] Pelagic.Fish..kcal.day.        Poultry.Meat..kcal.day.
##  [7] Bananas..kcal.day.             Coconut.Oil..kcal.day.
##  [9] Coffee..kcal.day.              Fruits...Excluding.Wine..kcal.day.
## 38 Levels: Roots...Tuber.Dry.Equiv..kcal.day. ... Sugar...Sweeteners..kcal.day.
```

Countries in the group:

```r
world[km$cluster == 7, 1]
## [1] Saint Lucia Samoa
## 86 Levels: Albania Argentina Australia Austria Bangladesh ... Uzbekistan
```

Plot of the foods with the highest consumption

```r
# sort the foods by their centroid value in the group, then plot
tdt$aliments <- factor(tdt$aliments,
                       levels = as.character(tdt$aliments[order(-tdt$g7)]))
ggplot(tdt[tdt$g7 > 0, ], aes(aliments, g7, fill = aliments)) +
  geom_bar(stat = "identity") +
  ggtitle("cluster 7: Alimenti …")
```

World map

```r
library(rworldmap)

# join the data to the world map by country name
jd <- joinCountryData2Map(world, joinCode = "NAME", nameJoinColumn = "Countries")
## 86 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 157 codes from the map weren't represented in your data

# colour each country by its k-means cluster
mapCountryData(jd, nameColumnToPlot = "cluster", catMethod = "categorical",
               colourPalette = "rainbow", addLegend = …)
```

Comments

The map appears to identify correctly several world regions with similar diets. One could repeat the analysis with a larger number of groups to see whether the different diets of the world are represented more precisely (for example, this solution does not reveal a Mediterranean diet). If this solution is considered satisfactory, the groups could be analysed further through the principal components (PCs), trying to identify general characteristics of the diets (relating, for example, to caloric intake, fats, etc.).
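One way to explore the suggestion of using more groups is to rerun k-means for several values of k and compare the total within-group sum of squares, looking for the point where extra groups stop reducing it much (the "elbow"). A minimal sketch on simulated data: x stands in for st.food, and the seed, the range of k, nstart and iter.max are arbitrary choices.

```r
set.seed(1)
# Hypothetical data: three shifted clouds of 20 points in 4 variables,
# standardised like st.food
x <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
           matrix(rnorm(80, mean = 3), ncol = 4),
           matrix(rnorm(80, mean = -3), ncol = 4))
x <- scale(x)

ks <- 1:8
# nstart = 20 reruns k-means from 20 random starts and keeps the best fit
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 20, iter.max = 50)$tot.withinss
})
print(round(wss, 1))
```

Plotting wss against ks, e.g. with plot(ks, wss, type = "b"), shows the elbow. With k = 1 the within-group sum of squares equals the total sum of squares, which for standardised data is (n - 1) times the number of variables.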

4 Hierarchical clustering: single linkage (min)

Let us try hierarchical clustering, comparing it with the k-means algorithm and comparing different linkage methods with one another.

```r
rownames(st.food) <- world$Countries   # label observations with country names
distance <- dist(st.food)              # Euclidean distances between countries
hc <- hclust(distance, method = "single")
```
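Single linkage (the "min" in the title) merges two groups as soon as any pair of their members is close, so groups can grow along chains of near neighbours. The one-dimensional toy example below, with hypothetical values, shows the consequence: cutting into two groups only peels off the isolated point, and the last merge happens at the widest gap between neighbouring points.

```r
# Hypothetical 1-D data: a chain of close points and one isolated point
x <- c(0, 0.2, 0.4, 0.6, 1.0, 1.4, 1.8, 2.2, 2.6, 2.8, 3.0, 3.2, 8)
hc_single <- hclust(dist(x), method = "single")

# Two groups: the whole chain stays together, the isolated point is split off
g_single <- cutree(hc_single, k = 2)
print(table(g_single))

# The last merge height is the widest gap between neighbours (8 - 3.2)
print(max(hc_single$height))
```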

Dendrogram

```r
plot(hc)
rect.hclust(hc, k = 8)   # draw boxes around the 8 groups
```

Obtaining the groups

```r
library(data.table)

# get cluster IDs
groups <- cutree(hc, k = 8)
Groups <- as.data.table(groups, keep.rownames = TRUE)
setnames(Groups, "rn", "Countries")
Groups[groups == 1, ]
##                    Countries groups
##  1:                  Albania      1
##  2:                Australia      1
##  3:                  Austria      1
##  4:               Bangladesh      1
##  5:                  Belarus      1
##  6:                  Belgium      1
##  7:                  Bolivia      1
##  8:   Bosnia and Herzegovina      1
##  9:                   Brazil      1
## 10:                 Bulgaria      1
## 11:                 Cameroon      1
## 12:                   Canada      1
## 13:                    Chile      1
## 14:                 Colombia      1
## 15:               Costa Rica      1
## 16:                   Cyprus      1
## 17:           Czech Republic      1
## 18:                  Denmark      1
## 19:       Dominican Republic      1
## 20:                  Ecuador      1
## 21:                    Egypt      1
## 22:                  Estonia      1
## 23:                 Ethiopia      1
## 24:                     Fiji      1
## 25:                  Finland      1
## 26:                   France      1
## 27:                   Gambia      1
## 28:                  Georgia      1
## 29:                  Germany      1
## 30:                   Greece      1
## 31:                Guatemala      1
## 32:                    Haiti      1
## 33:                  Hungary      1
## 34:                  Iceland      1
## 35:                    India      1
## 36:                Indonesia      1
## 37: Iran, Islamic Republic of      1
## 38:                   Israel      1
## 39:                    Italy      1
## 40:                  Jamaica      1
## 41:                    Japan      1
## 42:                    Kenya      1
## 43:                  Lesotho      1
## 44:                  Liberia      1
## 45:               Madagascar      1
## 46:                   Malawi      1
## 47:                 Malaysia      1
## 48:                    Malta      1
## 49:                Mauritius      1
## 50:                   Mexico      1
## 51:                  Morocco      1
## 52:              Netherlands      1
## 53:                  Nigeria      1
## 54:                 Pakistan      1
## 55:                 Paraguay      1
## 56:                     Peru      1
## 57:              Philippines      1
## 58:                   Poland      1
## 59:                 Portugal      1
## 60:                  Romania      1
## 61:       Russian Federation      1
## 62:             Saudi Arabia      1
## 63:                  Senegal      1
## 64:             Sierra Leone      1
## 65:             South Africa      1
## 66:                    Spain      1
## 67:                Sri Lanka      1
## 68:                   Sweden      1
## 69:              Switzerland      1
## 70:                 Tanzania      1
## 71:                 Thailand      1
## 72:      Trinidad and Tobago      1
## 73:                  Tunisia      1
## 74:                   Turkey      1
## 75:                  Ukraine      1
## 76:     United Arab Emirates      1
## 77:           United Kingdom      1
## 78: United States of America      1
## 79:               Uzbekistan      1
##                    Countries groups
```
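Because every partition comes from the same tree, cutree() can return several of them at once (one column per value of k), and the partitions are nested: each of the 8 groups lies entirely inside one of the 4 groups. cutree() also accepts h = to cut at a given height instead of a given number of groups. A sketch on simulated data; the seed, the sample size and the choice of "ward.D2" are arbitrary.

```r
set.seed(2)
# Hypothetical data: 30 observations on 3 standardised variables
x <- scale(matrix(rnorm(90), ncol = 3))
hc <- hclust(dist(x), method = "ward.D2")

# One column per requested number of groups (columns named "2", "4", "8")
memb <- cutree(hc, k = c(2, 4, 8))
print(head(memb))

# Nesting check: every k = 8 group falls inside a single k = 4 group
parents <- tapply(memb[, "4"], memb[, "8"], function(v) length(unique(v)))
print(all(parents == 1))
```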

Hierarchical clustering: Ward's method

```r
rownames(st.food) <- world$Countries
distance <- dist(st.food)
hc <- hclust(distance, method = "ward.D2")
```
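The method name is case-sensitive: hclust() accepts "ward.D" and "ward.D2". According to the R documentation for hclust(), "ward.D2" implements Ward's criterion by squaring the dissimilarities before updating the clusters, so it should produce the same merge sequence as "ward.D" applied to squared distances, with merge heights on the square-root scale. A quick check on simulated data (seed and sizes are arbitrary):

```r
set.seed(3)
# Hypothetical data: 25 observations on 4 variables
x <- matrix(rnorm(100), ncol = 4)
d <- dist(x)

# Ward's criterion as implemented by "ward.D2" on Euclidean distances
hc_d2 <- hclust(d, method = "ward.D2")

# The same criterion through "ward.D" applied to squared distances
hc_sq <- hclust(d^2, method = "ward.D")

# Same merge sequence; heights differ only by a square root
print(identical(hc_d2$merge, hc_sq$merge))
print(all.equal(hc_d2$height, sqrt(hc_sq$height)))
```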

Dendrogram

```r
plot(hc)
rect.hclust(hc, k = 8)   # draw boxes around the 8 groups
```

Obtaining the groups

```r
# get cluster IDs
groups <- cutree(hc, k = 8)
Groups <- as.data.table(groups, keep.rownames = TRUE)
setnames(Groups, "rn", "Country")
Groups[groups == 1, ]
##                     Country groups
##  1:                 Albania      1
##  2:               Argentina      1
##  3:                 Bolivia      1
##  4:  Bosnia and Herzegovina      1
##  5:                  Brazil      1
##  6:                Bulgaria      1
##  7:                   Chile      1
##  8:                Colombia      1
##  9:              Costa Rica      1
## 10:      Dominican Republic      1
## 11:                 Ecuador      1
## 12:                    Fiji      1
## 13:                 Georgia      1
## 14:                   India      1
## 15:                 Jamaica      1
## 16:               Mauritius      1
## 17:                  Mexico      1
## 18:                Mongolia      1
## 19:                Pakistan      1
## 20:                Paraguay      1
## 21:            Saudi Arabia      1
## 22:            South Africa      1
## 23:     Trinidad and Tobago      1
## 24:              Uzbekistan      1
##                     Country groups
```

Group 2

```r
Groups[groups == 2, ]
##                      Country groups
##  1:                Australia      2
##  2:                  Belarus      2
##  3:                   Canada      2
##  4:                   Cyprus      2
##  5:           Czech Republic      2
##  6:                  Estonia      2
##  7:                  Finland      2
##  8:                  Iceland      2
##  9:                   Israel      2
## 10:                    Malta      2
## 11:              New Zealand      2
## 12:                   Poland      2
## 13:                  Romania      2
## 14:       Russian Federation      2
## 15:                   Sweden      2
## 16:                  Ukraine      2
## 17:     United Arab Emirates      2
## 18:           United Kingdom      2
## 19: United States of America      2
```

World map

```r
# add the Ward groups as a new column, then join to the map by country name
world <- data.frame(world, cluster.ward = groups)
jd <- joinCountryData2Map(world, joinCode = "NAME", nameJoinColumn = "Countries")
## 86 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 157 codes from the map weren't represented in your data

mapCountryData(jd, nameColumnToPlot = "cluster.ward", catMethod = "categorical",
               colourPalette = "rainbow", addLegend = …)
```
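Adding the Ward groups with data.frame(world, cluster.ward = groups) relies on groups being in the same order as the rows of world, which holds here because the rows of st.food were named from world$Countries in that order. If the order could differ, looking the groups up by name is safer. A sketch with hypothetical names; world_toy and groups_toy are illustrative stand-ins.

```r
# Hypothetical stand-ins for world and for the named vector returned by
# cutree() when the rows of the data carry country names
world_toy  <- data.frame(Countries = c("A", "B", "C"), stringsAsFactors = FALSE)
groups_toy <- c(B = 2L, A = 1L, C = 3L)   # deliberately in a different order

# Look up each country's group by name instead of relying on row order
world_toy$cluster.ward <- unname(groups_toy[as.character(world_toy$Countries)])
print(world_toy)
```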

Comments

Single linkage does not seem appropriate for this problem: it places 79 of the 86 countries in a single group, a typical effect of chaining. Ward's method gives a solution very similar to k-means and, unlike k-means, seems to identify a Mediterranean diet. If we want to use k-means, it is worth using more than 7 groups. Hierarchical clustering lets us vary the number of groups by cutting the dendrogram at a suitable height (with horizontal lines). The k-means algorithm, since it returns the group centroids, allows a more in-depth analysis of the groups' characteristics.
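The similarity between the Ward and k-means solutions can also be quantified rather than judged from the maps, for example with a cross-tabulation such as table(km$cluster, groups) or with the adjusted Rand index (ARI), which is 1 for identical partitions up to relabelling and close to 0 for chance-level agreement. The sketch below computes the ARI directly from the pair-counting formula; ari() is a hypothetical helper, not part of the lab.

```r
# Adjusted Rand index between two partitions of the same observations
ari <- function(a, b) {
  tab <- table(a, b)
  pairs <- function(n) n * (n - 1) / 2      # number of unordered pairs
  n <- sum(tab)
  index    <- sum(pairs(tab))               # pairs grouped together in both
  sum_rows <- sum(pairs(rowSums(tab)))      # pairs grouped together in a
  sum_cols <- sum(pairs(colSums(tab)))      # pairs grouped together in b
  expected <- sum_rows * sum_cols / pairs(n)
  max_idx  <- (sum_rows + sum_cols) / 2
  (index - expected) / (max_idx - expected)
}

# Same partition with permuted labels
print(ari(c(1, 1, 2, 2, 3, 3), c(2, 2, 3, 3, 1, 1)))   # 1
```

In the lab one could call ari(km$cluster, groups) after the Ward cut; the two partitions do not need the same number of groups (7 and 8 here).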