Capstone Project - The Battle of the Neighborhoods: Open a Brazilian Restaurant in Miami

by Anderson Braz

Step 01 - Install and Import the Necessary Libraries

In [1]:
## My Installs
!pip install BeautifulSoup4
!pip install lxml
!pip install geocoder
!pip install seaborn
!pip install wordcloud
!pip install sklearn --upgrade
!pip install folium --upgrade

## My Imports
import pandas as pd
from bs4 import BeautifulSoup
import requests ## Used to make HTTP requests
import geocoder ## Used to capture latitude and longitude for a location
import numpy as np
import seaborn as sns
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import folium

import warnings
warnings.filterwarnings('ignore')
(pip install output trimmed: beautifulsoup4-4.9.3, soupsieve-2.1, lxml-4.6.2, geocoder-1.38.1, ratelim-0.1.6, wordcloud-1.8.1, sklearn-0.0, and folium-0.12.1 installed successfully; seaborn and the remaining dependencies were already satisfied.)

Step 02 - Preparation of Parameters and Useful Functions

In [2]:
# PARAMETERS - API FOURSQUARE

CLIENT_ID = 'HIDE-MY-CODE'
CLIENT_SECRET = 'HIDE-MY-CODE'
VERSION = '20180605'

# PARAMETERS - INIT FOLIUM

LOCATION_SPEC = '{}, FLORIDA, Miami City, EUA'
LATITUDE_MIAMI = 25.7825453
LONGITUDE_MIAMI = -80.2994988
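
For reference, the LOCATION_SPEC template above is combined with a neighborhood name before it is handed to the geocoder; a tiny illustration:

LOCATION_SPEC.format('Brickell') ## -> 'Brickell, FLORIDA, Miami City, EUA'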
In [3]:
## Function to capture latitude and longitude for a neighborhood

def get_latlng(neighborhood):

  lat_lng_coords = None
  while (lat_lng_coords is None):
      g = geocoder.arcgis(LOCATION_SPEC.format(neighborhood))
      lat_lng_coords = g.latlng
  return lat_lng_coords

## Function to capture venue details with the Foursquare API

def get_venues(lat,lng):
    
  #set variables
  radius = 3000
  LIMIT = 100
  
  #url to fetch data from foursquare api
  url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
          CLIENT_ID, 
          CLIENT_SECRET, 
          VERSION, 
          lat, 
          lng, 
          radius, 
          LIMIT)
  
  # get all the data
  results = requests.get(url).json()
  venue_data = results["response"]['groups'][0]['items']
  venue_details = []
  for row in venue_data:
      try:
          venue_id = row['venue']['id']
          venue_name = row['venue']['name']
          venue_category = row['venue']['categories'][0]['name']
          venue_origin = row['venue']['categories'][0]['shortName']
          venue_latitude = row['venue']['location']['lat']
          venue_longitude = row['venue']['location']['lng']
          venue_details.append([venue_id, venue_name, venue_category, venue_origin, venue_latitude, venue_longitude])
      except KeyError:
          pass
      
  column_names=['ID', 'Name', 'Category', 'Origin', 'Latitude', 'Longitude']
  df = pd.DataFrame(venue_details, columns = column_names)
  print("done!")
  return df

def get_info(id):

  #url to fetch data from foursquare api
  url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&v={}&client_secret={}'.format(
          id,
          CLIENT_ID, 
          VERSION, 
          CLIENT_SECRET)

  # get all the data
  result = requests.get(url).json()
  venue_info = result['response']

  return venue_info
  
def return_most_common_venues(row, num_top_venues):

  row_categories = row.iloc[1:]
  row_categories_sorted = row_categories.sort_values(ascending = False)

  return row_categories_sorted.index.values[0:num_top_venues]
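
For orientation, here is a minimal sketch of how these helpers chain together (it assumes the CLIENT_ID and CLIENT_SECRET placeholders above are replaced with valid Foursquare credentials):

lat, lng = get_latlng('Brickell')      ## ArcGIS geocoding -> [latitude, longitude]
venues = get_venues(lat, lng)          ## DataFrame with ID, Name, Category, Origin, Latitude, Longitude
info = get_info(venues.loc[0, 'ID'])   ## raw details payload for the first venue returned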

Step 03 - Scraping Source Data

In [4]:
response = requests.get('https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami')
soup = BeautifulSoup(response.content, 'lxml')
table = soup.find_all('table')[0]
data = pd.read_html(str(table))

source_table = pd.DataFrame(data[0])
source_table
Out[4]:
Neighborhood Demonym Population2010 Population/Km² Sub-neighborhoods Coordinates
0 Allapattah NaN 54289 4401 NaN 25.815-80.224
1 Arts & Entertainment District NaN 11033 7948 NaN 25.799-80.190
2 Brickell Brickellite 31759 14541 West Brickell 25.758-80.193
3 Buena Vista NaN 9058 3540 Buena Vista East Historic District and Design ... 25.813-80.192
4 Coconut Grove Grovite 20076 3091 Center Grove, Northeast Coconut Grove, Southwe... 25.712-80.257
5 Coral Way NaN 35062 4496 Coral Gate, Golden Pines, Shenandoah, Historic... 25.750-80.283
6 Design District NaN 3573 3623 NaN 25.813-80.193
7 Downtown Downtowner 71,000 (13,635 CBD only) 10613 Brickell, Central Business District (CBD), Dow... 25.774-80.193
8 Edgewater NaN 15005 6675 NaN 25.802-80.190
9 Flagami NaN 50834 5665 Alameda, Grapeland Heights, and Fairlawn 25.762-80.316
10 Grapeland Heights NaN 14004 4130 NaN 25.792-80.258
11 Health District NaN 2705 2148 NaN NaN
12 Liberty City NaN 19725 3733 NaN 25.832-80.225
13 Little Haiti NaN 29760 3840 Lemon City (aka Little River) 25.824-80.191
14 Little Havana NaN 76163 8423 Riverside and South River Drive Historic District 25.773-80.215
15 Lummus Park NaN 3027 3680 NaN 25.777-80.201
16 Midtown Midtowner - - Edgewater and Wynwood 25.807-80.193
17 Overtown Towner 6736 3405 Spring Garden 25.787-80.201
18 Park West NaN 4655 3635 NaN 25.785-80.193
19 The Roads NaN 7327 4899 NaN 25.756-80.207
20 Upper Eastside Upper Eastsider 12525 2513 Bay Point Estates, Bayside District, Belle Mea... 25.830-80.183
21 Venetian Islands NaN NaN NaN Biscayne Island and San Marco Island 25.791-80.161
22 Virginia Key NaN 14 - NaN 25.736-80.155
23 West Flagler NaN 31407 4428 NaN 25.775-80.243
24 Wynwood Wynwoodian 7277 2983 Wynwood Art District and Wynwood Fashion District 25.804-80.199
25 Miami Miamian 399457 4687 NaN NaN
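
As a side note, pandas can fetch and parse the page in a single call, which makes the explicit BeautifulSoup step optional here; a minimal equivalent sketch, assuming the neighborhoods table is still the first table on the page:

data = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami')
source_table = pd.DataFrame(data[0])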

Step 04 - Cleaning and Preparing the Data

IMPORTANT: This is the stage where most of the work and time will be spent.

04.1 - Isolating the Neighborhood Names in a List

In [5]:
unwanted = {'Downtown', 'Edgewater', 'Health District', 'Liberty City', 'Miami'} ## Entries to exclude from the analysis
list_neighborhood = source_table['Neighborhood']
list_neighborhood = [e for e in list_neighborhood if e not in unwanted]

list_neighborhood
Out[5]:
['Allapattah',
 'Arts & Entertainment District',
 'Brickell',
 'Buena Vista',
 'Coconut Grove',
 'Coral Way',
 'Design District',
 'Flagami',
 'Grapeland Heights',
 'Little Haiti',
 'Little Havana',
 'Lummus Park',
 'Midtown',
 'Overtown',
 'Park West',
 'The Roads',
 'Upper Eastside',
 'Venetian Islands',
 'Virginia Key',
 'West Flagler',
 'Wynwood']
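
The same filtering can also be done directly in pandas, reusing source_table and the unwanted set defined above; a minimal equivalent sketch:

list_neighborhood = source_table.loc[~source_table['Neighborhood'].isin(unwanted), 'Neighborhood'].tolist()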

04.2 - Preparing, Cleaning and Sorting the DataFrame (Neighborhood, Latitude and Longitude)

In [6]:
columns_names = ['Neighborhood', 'Latitude', 'Longitude']
neighborhood_miami = pd.DataFrame(columns = columns_names)

for row in list_neighborhood:
  Neighborhood = row
  coords = get_latlng(Neighborhood)
  neighborhood_miami = neighborhood_miami.append({'Neighborhood':Neighborhood, 'Latitude':coords[0], 'Longitude':coords[1]}, ignore_index = True)

neighborhood_miami
Out[6]:
Neighborhood Latitude Longitude
0 Allapattah 25.814800 -80.224130
1 Arts & Entertainment District 25.774810 -80.197730
2 Brickell 25.774810 -80.197730
3 Buena Vista 25.817840 -80.192570
4 Coconut Grove 25.732330 -80.254140
5 Coral Way 25.742701 -80.440205
6 Design District 25.812780 -80.192110
7 Flagami 25.762710 -80.315870
8 Grapeland Heights 25.788883 -80.253423
9 Little Haiti 25.828640 -80.198330
10 Little Havana 25.768060 -80.233060
11 Lummus Park 25.776780 -80.201240
12 Midtown 25.808990 -80.191506
13 Overtown 25.788050 -80.200910
14 Park West 25.773335 -80.330710
15 The Roads 25.755560 -80.205000
16 Upper Eastside 25.840880 -80.176360
17 Venetian Islands 25.790730 -80.162640
18 Virginia Key 25.744807 -80.145014
19 West Flagler 25.768130 -80.385446
20 Wynwood 25.774810 -80.197730
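
Note that DataFrame.append, used in the loop above, is deprecated as of pandas 1.4 and removed in 2.0. On newer versions the same table can be built by collecting rows in a list first; a sketch reusing get_latlng and list_neighborhood from above:

rows = []
for neighborhood in list_neighborhood:
  lat, lng = get_latlng(neighborhood)
  rows.append({'Neighborhood': neighborhood, 'Latitude': lat, 'Longitude': lng})

neighborhood_miami = pd.DataFrame(rows, columns = columns_names)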

04.3 - Searching for Restaurants in the Areas

In [7]:
target = 'Restaurant'
columns_names = ['Neighborhood', 'ID', 'Name', 'Latitude', 'Longitude']
restaurants_miami = pd.DataFrame(columns = columns_names)
count=1

for row in neighborhood_miami.values.tolist():
  Neighborhood, Latitude, Longitude = row
  venues = get_venues(Latitude, Longitude)
  ##restaurants = venues[venues['Category'] == target]
  restaurants = venues[venues['Category'].str.contains(target)]
  print('(',count,'/',len(neighborhood_miami),')', 
        target + ' in ' + 
        Neighborhood + ': ' + 
        str(len(restaurants)))
  
  for detail in restaurants.values.tolist():
    id, name, category, food, lat, lng = detail
    restaurants_miami = restaurants_miami.append({
        'Neighborhood':Neighborhood, 'ID':id, 'Name':name, 
        'Category':category, 'Food':food, 
        'Latitude':lat, 'Longitude':lng}, ignore_index = True)
  count += 1
done!
( 1 / 21 ) Restaurant in Allapattah: 25
done!
( 2 / 21 ) Restaurant in Arts & Entertainment District: 39
done!
( 3 / 21 ) Restaurant in Brickell: 39
done!
( 4 / 21 ) Restaurant in Buena Vista: 30
done!
( 5 / 21 ) Restaurant in Coconut Grove: 34
done!
( 6 / 21 ) Restaurant in Coral Way: 22
done!
( 7 / 21 ) Restaurant in Design District: 28
done!
( 8 / 21 ) Restaurant in Flagami: 33
done!
( 9 / 21 ) Restaurant in Grapeland Heights: 29
done!
( 10 / 21 ) Restaurant in Little Haiti: 33
done!
( 11 / 21 ) Restaurant in Little Havana: 51
done!
( 12 / 21 ) Restaurant in Lummus Park: 38
done!
( 13 / 21 ) Restaurant in Midtown: 27
done!
( 14 / 21 ) Restaurant in Overtown: 30
done!
( 15 / 21 ) Restaurant in Park West: 39
done!
( 16 / 21 ) Restaurant in The Roads: 42
done!
( 17 / 21 ) Restaurant in Upper Eastside: 38
done!
( 18 / 21 ) Restaurant in Venetian Islands: 23
done!
( 19 / 21 ) Restaurant in Virginia Key: 13
done!
( 20 / 21 ) Restaurant in West Flagler: 19
done!
( 21 / 21 ) Restaurant in Wynwood: 39

04.4 - Checking the Data - All Restaurants

In [8]:
restaurants_miami = restaurants_miami.drop_duplicates(subset=['ID']).reset_index(drop = True) ## Remove duplicate IDs and reset the index
restaurants_miami
Out[8]:
Neighborhood ID Name Latitude Longitude Category Food
0 Allapattah 4b80657df964a5207b6e30e3 Plaza Seafood Market 25.805638 -80.223992 Seafood Restaurant Seafood
1 Allapattah 4e4e169ebd4101d0d7a1e826 Snappers Fish & Chicken 25.824110 -80.224870 Seafood Restaurant Seafood
2 Allapattah 4b59e8fff964a5202ba028e3 Papo Llega y Pon 25.803466 -80.223886 Cuban Restaurant Cuban
3 Allapattah 5b0dbd3b237dee002c0ad551 Love Life Cafe 25.801454 -80.203389 Vegetarian / Vegan Restaurant Vegetarian / Vegan
4 Allapattah 56a93d4d498e2efdf0de09e5 KYU 25.800933 -80.200203 Asian Restaurant Asian
... ... ... ... ... ... ... ...
376 West Flagler 5647b92b498ebb7b54eaeee5 Dr. Limon Ceviche Bar - FIU 25.760445 -80.367000 Peruvian Restaurant Peruvian
377 West Flagler 4cfa691bfeec6dcbcaae3f36 El Centroamericano #2 25.766977 -80.368508 Latin American Restaurant Latin American
378 West Flagler 4b5f3558f964a52066ad29e3 La Carreta 25.785594 -80.368177 Cuban Restaurant Cuban
379 West Flagler 4b9d39f9f964a5203d9b36e3 Jardines De Confucio 25.775368 -80.368472 Chinese Restaurant Chinese
380 West Flagler 4bd71a88cfa7b713a2a928da Polo Norte - Flagler St. 25.768401 -80.364417 Cuban Restaurant Cuban

381 rows × 7 columns

04.5 - Show Map - All Restaurants

In [9]:
MIAMI_COORDINATES = (LATITUDE_MIAMI, LONGITUDE_MIAMI)
map_miami = folium.Map(location = MIAMI_COORDINATES, zoom_start = 11)

locations = restaurants_miami[['Latitude', 'Longitude']]
location_list = locations.values.tolist()

print("Number of point(s): " + str(len(location_list)))

for i in range(0, len(location_list)):

  point_name = restaurants_miami['Name'][i]
  point_yard = restaurants_miami['Neighborhood'][i]
  point_category = restaurants_miami['Category'][i]
  point_detail = '<b>' + point_name + '</b><br /><i>' + point_category + ' in '  + point_yard + '</i>'
  
  label = '{}'.format(point_detail)

  if (point_category == 'Brazilian Restaurant'):
    folium.Marker(
        location_list[i], 
        popup = point_detail, 
        tooltip = point_detail, 
        icon = folium.Icon(color='red')).add_to(map_miami)
  else:
    folium.Marker(
        location_list[i], 
        popup = point_detail, 
        tooltip = point_detail).add_to(map_miami)

folium.Circle([LATITUDE_MIAMI, LONGITUDE_MIAMI], radius = 20000).add_to(map_miami)
display(map_miami)
Number of point(s): 381
(Interactive Folium map rendered here: all 381 restaurants plotted, with Brazilian restaurants highlighted in red.)
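
With 381 markers the map becomes crowded at low zoom levels. As an optional readability tweak, folium's MarkerCluster plugin can group nearby points; a minimal sketch reusing location_list and restaurants_miami from above:

from folium.plugins import MarkerCluster

map_cluster = folium.Map(location = MIAMI_COORDINATES, zoom_start = 11)
cluster = MarkerCluster().add_to(map_cluster)

for point, name in zip(location_list, restaurants_miami['Name']):
  folium.Marker(point, popup = name).add_to(cluster)

display(map_cluster)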

04.6 - Checking the Data - Brazilian Restaurants Only

In [10]:
br_restaurants_miami = restaurants_miami[restaurants_miami['Category'].str.contains('Brazilian Restaurant')]
br_restaurants_miami = br_restaurants_miami.drop_duplicates(subset=['ID']).reset_index(drop = True) ## Remove duplicate IDs and reset the index
br_restaurants_miami
Out[10]:
Neighborhood ID Name Latitude Longitude Category Food
0 Arts & Entertainment District 514b7bb8e4b0476ac82afd55 Steak Brasil 25.772831 -80.192034 Brazilian Restaurant Brazilian
1 Upper Eastside 4ae2706ef964a5204d8e21e3 Boteco Miami 25.847920 -80.177723 Brazilian Restaurant Brazilian

04.7 - Show Map - Brazilian Restaurants

In [11]:
map_miami_br = folium.Map(location = MIAMI_COORDINATES, zoom_start = 11)

locations = br_restaurants_miami[['Latitude', 'Longitude']]
location_list = locations.values.tolist()

print("Number of point(s): " + str(len(location_list)))

for j in range(0, len(location_list)):

  point_name = br_restaurants_miami['Name'][j]
  point_yard = br_restaurants_miami['Neighborhood'][j]
  point_category = br_restaurants_miami['Category'][j]
  point_detail = '<b>' + point_name + '</b><br /><i>' + point_category + ' in '  + point_yard + '</i>'

  folium.Marker(
      location_list[j], 
      popup = point_detail, 
      tooltip = point_detail, 
      icon = folium.Icon(color='green')).add_to(map_miami_br)
  
folium.Circle([LATITUDE_MIAMI, LONGITUDE_MIAMI], radius = 20000).add_to(map_miami_br)
display(map_miami_br)
Number of point(s): 2
(Interactive Folium map rendered here: the 2 Brazilian restaurants plotted with green markers.)

04.8 - Checking the Data - Category Counts

In [12]:
restaurants_category = restaurants_miami.groupby('Food').size().reset_index(name='Count')
restaurants_category = restaurants_category.sort_values(by='Count', ascending = False).reset_index(drop = True) ## Sort by count, descending, and reset the index
restaurants_category
Out[12]:
Food Count
0 Italian 42
1 Latin American 38
2 Cuban 38
3 Restaurant 28
4 Seafood 28
5 Mexican 24
6 American 23
7 Spanish 14
8 Sushi 13
9 Fast Food 12
10 Japanese 10
11 Chinese 9
12 Asian 9
13 New American 9
14 Argentinian 8
15 Tapas 7
16 Peruvian 7
17 South American 7
18 Mediterranean 6
19 Caribbean 6
20 French 5
21 Vegetarian / Vegan 5
22 Thai 4
23 Tex-Mex 4
24 Arepas 3
25 Vietnamese 3
26 Portuguese 3
27 Korean 3
28 Middle Eastern 2
29 Greek 2
30 Brazilian 2
31 Indian 2
32 Venezuelan 2
33 Empanada 1
34 Comfort Food 1
35 Southern / Soul 1
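
The same count table can be produced in a single step with value_counts; an equivalent sketch:

restaurants_category = (restaurants_miami['Food']
                        .value_counts()
                        .rename_axis('Food')
                        .reset_index(name = 'Count'))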

04.9 - Checking the Data - Category Counts by Neighborhood

In [13]:
restaurants_neighborhood = restaurants_miami.groupby(['Neighborhood', 'Food']).size().reset_index(name='Count')
restaurants_neighborhood = restaurants_neighborhood.sort_values(by='Food', ascending = False).reset_index(drop = True)
restaurants_neighborhood
Out[13]:
Neighborhood Food Count
0 West Flagler Vietnamese 1
1 Coconut Grove Vietnamese 1
2 Little Haiti Vietnamese 1
3 Little Havana Venezuelan 1
4 Little Haiti Venezuelan 1
... ... ... ...
203 Overtown American 1
204 Little Haiti American 1
205 Coconut Grove American 5
206 Buena Vista American 1
207 Allapattah American 1

208 rows × 3 columns

04.10 - Summary

In [14]:
print('There are {} unique categories.'.format(len(restaurants_miami['Category'].unique())))
print('There are {} restaurants.'.format(len(restaurants_miami['ID'].unique())))
print('There are {} Brazilian restaurants.'.format(len(br_restaurants_miami['ID'].unique())))
There are 36 unique categories.
There are 381 restaurants.
There are 2 Brazilian restaurants.

Step 05 - Analysis

In [15]:
# one hot encoding
neighborhood_onehot = pd.get_dummies(restaurants_miami[['Food']], prefix="", prefix_sep="")
neighborhood_onehot['Neighborhood'] = restaurants_miami['Neighborhood']
fixed_columns = [neighborhood_onehot.columns[-1]] + list(neighborhood_onehot.columns[:-1])
neighborhood_onehot = neighborhood_onehot[fixed_columns]


neighborhood_grouped = neighborhood_onehot.groupby('Neighborhood').mean().reset_index()
neighborhood_grouped
Out[15]:
Neighborhood American Arepas Argentinian Asian Brazilian Caribbean Chinese Comfort Food Cuban ... South American Southern / Soul Spanish Sushi Tapas Tex-Mex Thai Vegetarian / Vegan Venezuelan Vietnamese
0 Allapattah 0.040000 0.000000 0.000000 0.080000 0.000000 0.080000 0.040000 0.04 0.120000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.040000 0.000000 0.000000
1 Arts & Entertainment District 0.102564 0.000000 0.051282 0.051282 0.025641 0.000000 0.025641 0.00 0.000000 ... 0.000000 0.000000 0.025641 0.000000 0.025641 0.000000 0.000000 0.000000 0.000000 0.000000
2 Buena Vista 0.038462 0.038462 0.038462 0.038462 0.000000 0.000000 0.038462 0.00 0.038462 ... 0.038462 0.000000 0.000000 0.038462 0.038462 0.000000 0.000000 0.000000 0.000000 0.000000
3 Coconut Grove 0.147059 0.000000 0.029412 0.029412 0.000000 0.000000 0.000000 0.00 0.029412 ... 0.000000 0.000000 0.029412 0.000000 0.000000 0.000000 0.029412 0.000000 0.000000 0.029412
4 Coral Way 0.000000 0.045455 0.045455 0.000000 0.000000 0.000000 0.045455 0.00 0.181818 ... 0.045455 0.000000 0.045455 0.045455 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
5 Design District 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
6 Flagami 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.090909 0.00 0.363636 ... 0.030303 0.000000 0.090909 0.030303 0.000000 0.030303 0.000000 0.000000 0.000000 0.000000
7 Grapeland Heights 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.269231 ... 0.038462 0.000000 0.000000 0.038462 0.076923 0.038462 0.000000 0.000000 0.000000 0.000000
8 Little Haiti 0.071429 0.000000 0.071429 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 ... 0.000000 0.000000 0.000000 0.071429 0.071429 0.000000 0.000000 0.000000 0.071429 0.071429
9 Little Havana 0.000000 0.000000 0.000000 0.000000 0.000000 0.051282 0.000000 0.00 0.153846 ... 0.051282 0.000000 0.076923 0.025641 0.025641 0.000000 0.025641 0.000000 0.025641 0.000000
10 Lummus Park 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
11 Overtown 0.200000 0.000000 0.000000 0.200000 0.000000 0.000000 0.000000 0.00 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
12 Park West 0.111111 0.037037 0.000000 0.037037 0.000000 0.000000 0.000000 0.00 0.037037 ... 0.000000 0.000000 0.037037 0.074074 0.000000 0.037037 0.037037 0.000000 0.000000 0.000000
13 The Roads 0.047619 0.000000 0.095238 0.000000 0.000000 0.000000 0.000000 0.00 0.047619 ... 0.000000 0.000000 0.095238 0.047619 0.000000 0.047619 0.000000 0.000000 0.000000 0.000000
14 Upper Eastside 0.066667 0.000000 0.000000 0.000000 0.066667 0.066667 0.000000 0.00 0.000000 ... 0.000000 0.000000 0.066667 0.000000 0.066667 0.000000 0.000000 0.066667 0.000000 0.000000
15 Venetian Islands 0.181818 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 ... 0.000000 0.045455 0.045455 0.136364 0.000000 0.000000 0.045455 0.090909 0.000000 0.000000
16 Virginia Key 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.00 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
17 West Flagler 0.058824 0.000000 0.000000 0.058824 0.000000 0.000000 0.117647 0.00 0.117647 ... 0.058824 0.000000 0.000000 0.058824 0.000000 0.000000 0.000000 0.058824 0.000000 0.058824

18 rows × 37 columns
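
Because every restaurant carries exactly one Food label, the row means above are the cuisine shares of each neighborhood, so each row should sum to 1.0; a quick sanity check:

## Each value should be (approximately) 1.0
neighborhood_grouped.drop('Neighborhood', axis = 1).sum(axis = 1)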

In [16]:
num_top_venues = 5

for hood in neighborhood_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neighborhood_grouped[neighborhood_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')
----Allapattah----
        venue  freq
0  Restaurant  0.12
1   Fast Food  0.12
2       Cuban  0.12
3     Seafood  0.08
4       Asian  0.08


----Arts & Entertainment District----
         venue  freq
0      Seafood  0.18
1      Italian  0.15
2     American  0.10
3   Restaurant  0.10
4  Argentinian  0.05


----Buena Vista----
            venue  freq
0         Italian  0.19
1      Restaurant  0.12
2    New American  0.08
3  Latin American  0.08
4        American  0.04


----Coconut Grove----
            venue  freq
0         Italian  0.21
1        American  0.15
2    New American  0.12
3      Restaurant  0.09
4  Latin American  0.06


----Coral Way----
            venue  freq
0           Cuban  0.18
1  Latin American  0.14
2       Fast Food  0.14
3         Mexican  0.09
4           Sushi  0.05


----Design District----
            venue  freq
0        Peruvian   0.5
1      Restaurant   0.5
2        American   0.0
3          Arepas   0.0
4  Middle Eastern   0.0


----Flagami----
            venue  freq
0           Cuban  0.36
1  Latin American  0.12
2       Fast Food  0.09
3         Chinese  0.09
4         Spanish  0.09


----Grapeland Heights----
            venue  freq
0           Cuban  0.27
1  Latin American  0.23
2         Mexican  0.12
3      Restaurant  0.08
4           Tapas  0.08


----Little Haiti----
        venue  freq
0     Italian  0.36
1    American  0.07
2  Venezuelan  0.07
3       Tapas  0.07
4       Sushi  0.07


----Little Havana----
            venue  freq
0  Latin American  0.18
1           Cuban  0.15
2         Mexican  0.10
3         Spanish  0.08
4      Portuguese  0.08


----Lummus Park----
            venue  freq
0         Mexican   1.0
1          Arepas   0.0
2  Middle Eastern   0.0
3    New American   0.0
4        Peruvian   0.0


----Overtown----
             venue  freq
0       Restaurant   0.4
1         American   0.2
2            Asian   0.2
3          Seafood   0.2
4  Southern / Soul   0.0


----Park West----
            venue  freq
0         Mexican  0.15
1  Latin American  0.15
2        American  0.11
3      Restaurant  0.07
4         Seafood  0.07


----The Roads----
            venue  freq
0         Italian  0.14
1  Latin American  0.14
2     Argentinian  0.10
3         Spanish  0.10
4          French  0.10


----Upper Eastside----
                venue  freq
0             Italian  0.20
1            Japanese  0.13
2            American  0.07
3             Mexican  0.07
4  Vegetarian / Vegan  0.07


----Venetian Islands----
                venue  freq
0            American  0.18
1               Sushi  0.14
2             Italian  0.09
3  Vegetarian / Vegan  0.09
4            Peruvian  0.09


----Virginia Key----
        venue  freq
0     Seafood  0.38
1     Italian  0.31
2  Restaurant  0.23
3   Caribbean  0.08
4    American  0.00


----West Flagler----
            venue  freq
0  Latin American  0.24
1         Chinese  0.12
2           Cuban  0.12
3        American  0.06
4         Italian  0.06


In [17]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind + 1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neighborhood_grouped['Neighborhood']

for ind in np.arange(neighborhood_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighborhood_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted
Out[17]:
Neighborhood 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue
0 Allapattah Cuban Fast Food Restaurant Latin American Mexican
1 Arts & Entertainment District Seafood Italian American Restaurant Mexican
2 Buena Vista Italian Restaurant Latin American New American Arepas
3 Coconut Grove Italian American New American Restaurant Latin American
4 Coral Way Cuban Latin American Fast Food Mexican South American
5 Design District Peruvian Restaurant Vietnamese Empanada Italian
6 Flagami Cuban Latin American Fast Food Chinese Spanish
7 Grapeland Heights Cuban Latin American Mexican Tapas Restaurant
8 Little Haiti Italian Vietnamese Mexican Argentinian Japanese
9 Little Havana Latin American Cuban Mexican Spanish Seafood
10 Lummus Park Mexican Vietnamese Korean Italian Indian
11 Overtown Restaurant American Seafood Asian Italian
12 Park West Latin American Mexican American Sushi Seafood
13 The Roads Latin American Italian Argentinian French Spanish
14 Upper Eastside Italian Japanese Brazilian Mexican Empanada
15 Venetian Islands American Sushi Mexican Vegetarian / Vegan Italian
16 Virginia Key Seafood Italian Restaurant Caribbean Vietnamese
17 West Flagler Latin American Chinese Cuban South American Asian

Step 06 - Analysis (Classification)

Question: Which category / cuisine appears most often?

In [18]:
sns.set(rc = {"font.size":20, "axes.titlesize":20, "axes.labelsize":20, 'figure.figsize':(10,10)})
sns.countplot(y = "Food", data = restaurants_miami, order = restaurants_miami['Food'].value_counts().index)
Out[18]:
<AxesSubplot:xlabel='count', ylabel='Food'>
In [19]:
text = restaurants_miami['Food'].values 

wordcloud = WordCloud().generate(str(text))

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
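
One caveat: generate(str(text)) tokenizes the flattened array, so multi-word categories such as 'Latin American' are split into separate words. A sketch of an alternative that weights each full category name by its count instead (same WordCloud library, using generate_from_frequencies):

freqs = restaurants_miami['Food'].value_counts().to_dict()
wordcloud = WordCloud(background_color = 'white').generate_from_frequencies(freqs)

plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis("off")
plt.show()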