Basics of building a Proximity Service to Discover Nearby Gems

Laying the Groundwork: Functional and Non-Functional Requirements, Back-of-the-Envelope Estimations, and High-Level Design

In this series, "Building a Proximity Service to Discover Nearby Gems," we will embark on an adventure to design and implement a robust proximity service. This service will help users discover nearby gems, such as restaurants, cafes, and other points of interest. In Part 1, we'll lay the groundwork by defining the functional and non-functional requirements, performing back-of-the-envelope estimations, and proposing a high-level design.

Functional Requirements

  • The service should accurately detect and utilize the user's current location (longitude and latitude pair) to find nearby gems within a specified radius.

  • Business owners should be able to add, delete, or update their business information. These changes do not need to be reflected in real time.

  • Users should be able to view detailed information about each business, including address, hours of operation, contact information, and user reviews.

Non-Functional Requirements

  • Users should be able to see nearby businesses quickly with minimal delay, ensuring a smooth and responsive experience.

  • The service must encrypt user location data to protect privacy and comply with data protection laws such as GDPR.

  • The system should be highly available, so that users can access the service at any time without significant downtime.

  • The service should be able to handle spikes in traffic during peak hours, especially in densely populated areas, without degradation in performance.

Back-of-the-envelope estimation

Seconds in a day: 24 × 60 × 60 = 86,400. Let's round this up to 10^5 for easier calculation.

Let's assume we have 50 million daily active users (DAU) and 100 million total users.

Each user makes 5 search queries per day.

Search QPS (queries per second): 50 million × 5 / 10^5 = 2,500
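The arithmetic above can be checked with a few lines of Python. This is only a calculator for the stated assumptions (50 million DAU, 5 queries per user per day). It also shows how much the rounded denominator shifts the result compared with the exact 86,400 seconds.

```python
# Back-of-the-envelope search QPS, using the assumptions stated above.
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
ROUNDED_SECONDS_PER_DAY = 10 ** 5   # rounded up for easier arithmetic

DAILY_ACTIVE_USERS = 50_000_000
QUERIES_PER_USER_PER_DAY = 5


def search_qps(seconds_per_day: int) -> float:
    """Average search queries per second, spread evenly over one day."""
    total_queries = DAILY_ACTIVE_USERS * QUERIES_PER_USER_PER_DAY
    return total_queries / seconds_per_day


if __name__ == "__main__":
    print(f"Rounded estimate: {search_qps(ROUNDED_SECONDS_PER_DAY):,.0f} QPS")  # 2,500
    print(f"Exact average:    {search_qps(SECONDS_PER_DAY):,.0f} QPS")          # 2,894
```

Rounding the denominator up makes the estimate about 14% lower than the exact average. That is fine for an order-of-magnitude estimate. Keep in mind that this is a daily average, and peak-hour traffic will be higher.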

High-Level Design

In this section, we cover the following:

  • API Design

  • Data Model

  • High-Level Design

API Design

GET /v1/search/nearby

This endpoint returns businesses that match the given search criteria.

Request parameters:

Field | Description | Type
----- | ----------- | ----
latitude | Latitude of a given location | decimal
longitude | Longitude of a given location | decimal
radius | Optional. Search radius; defaults to 20 miles | int

A sample response:

{
    "total": 30,
    "businesses": [ { business object } ]
}
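Below is a minimal sketch of how the search service might parse and validate the query parameters of GET /v1/search/nearby. The function and class names are hypothetical, and the "decimal" type from the table is mapped to a Python float here. Only the three fields and the 20-mile default come from the table. The range checks are the standard bounds for latitude and longitude.

```python
from dataclasses import dataclass
from typing import Mapping

DEFAULT_RADIUS_MILES = 20  # default radius from the parameter table


@dataclass(frozen=True)
class NearbySearchParams:
    latitude: float
    longitude: float
    radius_miles: int


def parse_nearby_params(query: Mapping[str, str]) -> NearbySearchParams:
    """Parse query-string values for GET /v1/search/nearby.

    Raises ValueError if a required field is missing, malformed, or out of range.
    """
    try:
        latitude = float(query["latitude"])
        longitude = float(query["longitude"])
    except KeyError as missing:
        raise ValueError(f"missing required parameter: {missing.args[0]}") from None
    radius = int(query.get("radius", DEFAULT_RADIUS_MILES))  # optional field

    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    if radius <= 0:
        raise ValueError("radius must be positive")
    return NearbySearchParams(latitude, longitude, radius)
```

For example, `parse_nearby_params({"latitude": "37.77", "longitude": "-122.42"})` returns parameters with `radius_miles == 20`, because the optional radius was omitted.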

The business object contains everything needed to render the search results page. However, we may still need additional attributes, such as images, descriptions, and ratings, to render a business's detail page. Fetching these usually requires a separate endpoint call when the user opens that page.

APIs for businesses

API | Details
--- | -------
GET /v1/businesses/:id | Return detailed information about a business
POST /v1/businesses | Add a business
PUT /v1/businesses/:id | Update the details of a business
DELETE /v1/businesses/:id | Delete a business
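To make the mapping between these endpoints and storage operations concrete, here is a minimal in-memory sketch. The `Business` fields and the `BusinessStore` class are hypothetical, and a real business service would sit on the database cluster instead of a dict. PUT is modeled as replacing the whole record, which is the usual REST meaning of PUT.

```python
import dataclasses
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Business:
    name: str
    address: str
    latitude: float
    longitude: float
    business_id: Optional[int] = None


class BusinessStore:
    """In-memory stand-in for the business service's storage layer."""

    def __init__(self) -> None:
        self._rows: Dict[int, Business] = {}
        self._ids = itertools.count(1)  # simple auto-increment ID generator

    def create(self, business: Business) -> int:
        """POST /v1/businesses: store a new business and return its ID."""
        business_id = next(self._ids)
        self._rows[business_id] = dataclasses.replace(business, business_id=business_id)
        return business_id

    def get(self, business_id: int) -> Optional[Business]:
        """GET /v1/businesses/:id: return the business, or None if absent."""
        return self._rows.get(business_id)

    def update(self, business_id: int, business: Business) -> bool:
        """PUT /v1/businesses/:id: replace the stored record; False if absent."""
        if business_id not in self._rows:
            return False
        self._rows[business_id] = dataclasses.replace(business, business_id=business_id)
        return True

    def delete(self, business_id: int) -> bool:
        """DELETE /v1/businesses/:id: remove the business; False if absent."""
        return self._rows.pop(business_id, None) is not None
```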

Data Model

In this section, we discuss the read/write ratio and the schema design.

Read/Write Ratio

Our system has a high read volume because the following two features are used frequently:

  • Search for nearby businesses

  • View detailed information about a business

The write volume, on the other hand, is low because adding, editing, and removing business information are infrequent operations.

Data Schema

The key tables are the business table and the geospatial index table.

Business table
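The columns of the business table are not listed here. The SQLite sketch below is one hypothetical shape, with illustrative column names, and should not be read as the series' actual schema. It keeps the business's latitude and longitude alongside its descriptive fields, because those coordinates are what the geospatial index will later be built from.

```python
import sqlite3

# Hypothetical columns, chosen for illustration only.
BUSINESS_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS business (
    business_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL,
    city        TEXT,
    state       TEXT,
    country     TEXT,
    latitude    REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude   REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180)
)
"""


def create_business_table(conn: sqlite3.Connection) -> None:
    """Create the business table if it does not already exist."""
    conn.execute(BUSINESS_TABLE_DDL)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_business_table(conn)
    conn.execute(
        "INSERT INTO business (name, address, latitude, longitude) VALUES (?, ?, ?, ?)",
        ("Example Cafe", "1 Main St", 37.7749, -122.4194),
    )
    conn.commit()
```

The CHECK constraints reject impossible coordinates at write time. This is cheap insurance given how rarely this table is written.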

Geospatial Index table

We will discuss this table in the next part of the series, as its design requires an understanding of geohash.

High Level Design

At a high level, the system comprises two parts:

  • Location based service

  • Business Service

First, a load balancer automatically distributes incoming traffic across the services. Usually we expose a single DNS entry point and route API requests internally to the right service based on the URL path.
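Path-based routing can be sketched in a few lines. The upstream names below are hypothetical, and in practice this rule would be configured in the load balancer or API gateway rather than written by hand. The match is segment-aware, so "/v1/businesses/42" goes to the business service while a look-alike such as "/v1/businessesX" does not.

```python
from typing import Dict

# Hypothetical upstream names; only the URL prefixes come from the API design.
ROUTES: Dict[str, str] = {
    "/v1/search": "location-based-service",
    "/v1/businesses": "business-service",
}


def route(path: str) -> str:
    """Pick the upstream service for a request path by whole-segment prefix match."""
    for prefix, service in ROUTES.items():
        # Matches "/v1/businesses" and "/v1/businesses/42", not "/v1/businessesX".
        if path == prefix or path.startswith(prefix + "/"):
            return service
    raise LookupError(f"no route for {path!r}")
```

For example, `route("/v1/search/nearby")` returns `"location-based-service"`. Because the two prefixes never overlap, the first match is the only match.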

Location Based Service

The location-based service (LBS) is the core component. It finds nearby businesses for a given location and radius, and it has the following characteristics:

  • It is a read-heavy service with no write requests.

  • Its QPS (queries per second) is very high during peak hours.

  • It is stateless, so it is easy to scale horizontally.
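As a baseline for what the LBS computes, here is a naive sketch that scans every business and keeps those within the radius, using the haversine great-circle distance. This brute-force O(n) scan is not the proposed design. It is exactly what the geospatial index discussed in Part 2 is meant to avoid. The `Place` type and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Place:
    business_id: int
    latitude: float
    longitude: float


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def nearby(places: Iterable[Place], latitude: float, longitude: float,
           radius_miles: float = 20) -> List[Place]:
    """Naive full scan: return places within radius_miles, closest first."""
    hits = []
    for place in places:
        distance = haversine_miles(latitude, longitude, place.latitude, place.longitude)
        if distance <= radius_miles:
            hits.append((distance, place))
    hits.sort(key=lambda pair: pair[0])
    return [place for _, place in hits]
```

Every query touches every row, which is why this approach does not hold up at thousands of QPS over a large business table.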

Business Service

The business service mainly handles two types of requests:

  • Requests from business owners to create, update, or delete businesses. These have a significantly lower QPS.

  • Requests from customers to view detailed information about a business. Their QPS is high during peak hours.

Database Cluster

The database cluster can use a primary-secondary setup. In this configuration, the primary database handles all write operations, while multiple replicas handle read operations. Data is first saved to the primary database and then replicated to all the replicas. Because of replication delay, data read by the location-based service (LBS) may briefly differ from data just written to the primary. This is acceptable because changes to business information do not need to be reflected in real time.
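Below is a toy model of the primary-secondary behavior just described. The class and method names are hypothetical, and real replication happens inside the database, not in application code. Writes land on the primary and queue up, and reads rotate across replicas. Until `replicate()` runs, those reads can return stale data, which is the lag this design tolerates.

```python
import itertools
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class PrimaryReplicaStore:
    """Toy primary-secondary cluster with asynchronous replication."""

    def __init__(self, num_replicas: int = 2) -> None:
        self.primary: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(num_replicas)]
        self._pending: Deque[Tuple[str, Optional[str]]] = deque()  # not yet replicated
        self._next_replica = itertools.cycle(range(num_replicas))

    def write(self, key: str, value: Optional[str]) -> None:
        """Apply a write (value=None means delete) on the primary only."""
        if value is None:
            self.primary.pop(key, None)
        else:
            self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key: str) -> Optional[str]:
        """Serve a read from the next replica in round-robin order."""
        return self.replicas[next(self._next_replica)].get(key)

    def replicate(self) -> None:
        """Ship all pending changes to every replica (the lag catching up)."""
        while self._pending:
            key, value = self._pending.popleft()
            for replica in self.replicas:
                if value is None:
                    replica.pop(key, None)
                else:
                    replica[key] = value
```

A read issued right after `write("biz:1", "Example Cafe")` returns `None` from the replicas. After `replicate()`, every replica returns the new value.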

Scalability of business service and LBS

Both the business service and the LBS are stateless, so we can automatically add servers to absorb peak traffic and remove them during off-peak hours (for example, overnight). Deploying across multiple regions and availability zones further improves availability.
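Here is a minimal sketch of the sizing rule an autoscaler might apply to these stateless services. The per-server capacity, the 30% headroom, and the minimum of two servers (for example, one per availability zone) are all made-up assumptions, not figures from this design. Only the 2,500 QPS input comes from the estimate earlier, and it is an average, so peak load would call for more servers.

```python
import math


def servers_needed(qps: float, qps_per_server: float,
                   min_servers: int = 2, headroom: float = 0.3) -> int:
    """Number of stateless servers needed to serve `qps` with spare capacity.

    headroom=0.3 keeps each server at roughly 70% of its assumed capacity;
    min_servers keeps a floor so one failure does not take the service down.
    """
    if qps_per_server <= 0 or not 0 <= headroom < 1:
        raise ValueError("qps_per_server must be positive and 0 <= headroom < 1")
    usable_qps = qps_per_server * (1 - headroom)
    return max(min_servers, math.ceil(qps / usable_qps))
```

With a hypothetical 500 QPS per server, `servers_needed(2500, 500)` returns 8. When traffic drops overnight, the same rule shrinks the fleet back toward `min_servers`.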

Conclusion

In this first part of our "Building a Proximity Service to Discover Nearby Gems" series, we've laid a strong foundation by defining the functional and non-functional requirements for our proximity service. We've also conducted back-of-the-envelope estimations to understand the scale and resources needed, and we've proposed a high-level design to guide our implementation.

Stay tuned for Part 2, where we will dive deeper into various algorithms like Geohash, Quadtree, and Google S2 that can be used to fetch nearby businesses efficiently.