Building a simple Redis-like data store "crowRedis" in Python


15 min read

💡
This is a very simplistic example of how Redis (or any datastore) works. It is more of a simulation of its core features that I built to learn how it works internally. It covers most of the core things, but the implementations are very simple, and I am still learning.

Why I made this, and what compelled me:

Two reasons. First:

I was tired of learning all that database theory, and there were lots of people who would yap about their great sage-like knowledge of databases. They would use fancy-sounding words and make it look like some hidden big-boy club knowledge.
And being close to the egomaniac I am, I decided to build most of the core functionality myself.
I don't like it when people act like they know more than me and lecture me. You know the theory? Well, I know how to build it.


Challenges:

I did not just want to store some key-value pairs in an in-memory hashmap and then say "yay, I built a database". No one would take me seriously lol, and I would look like a noob (I totally am).


Nah, I took a sheet of paper and wrote down the things I wanted to implement that are core to a Redis-like database:

Basic Operations like Get, Set, Delete

Persistence: Snapshot and AOF

Transaction Support: I wanted to support multiple concurrent transactions as a challenge.

ACID: I know I alone can't make it fully ACID compliant in 5 days, but the current version is quite atomic.

Concurrency: The current code supports concurrent operations, multiple concurrent transactions, and separate threads for most tasks.

TTL (Time to Live): I have implemented that too, but for some reason when I try to get data that was set without the TTL flag, I get errors, so I am working on fixing that.

PUB/SUB: Yeah, I implemented that too, so you can subscribe to a channel, and when someone pushes data to it, everyone subscribed to that channel gets the messages instantly. But currently, if I push a message from one instance of my client, it does not appear in another running instance of the same client subscribed to the same channel. Either it's an architectural issue in my sockets, or I need to implement a RabbitMQ solution for global message sharing.

How I started:

So the first thing I searched on Google was, Signs you are stupid?


OK, I searched for how databases and datastores work and how Redis works, and then I read some 5-10 blogs and PDFs about databases and their functions.

Build your own database

Write your own miniature Redis with Python

I also read the actual paper by E. F. Codd, that famous IBM paper. I wanted to know the mindset of the people who were tackling this problem for the first time and how they thought.

I also searched for how a request works. I know it sounds like basic stupid stuff, but me is very stupid. It takes me a lot of time to understand even basic things and how they work, and I don't like complex language with big-sounding jargon. Even for small things I need like 3 to 4 real examples with different circumstances and parameters before I understand.

Slowly and steadily I understand them, but I like learning how things work, I have my own pace and way of dealing with things.

Until I know the workings of something from the core, basics, my mind does not let me do anything.

lol, see the questions I asked ChatGPT:

Github:

https://github.com/biohacker0/crowRedis

Architecture

| Component | Description |
| --- | --- |
| Redis Client App(s) | Application(s) that interact with the Redis server over the network. |
| Redis Server (6381) | Redis server instances running on port 6381, written in Python. |
| Data Store (Key-Value) | In-memory storage for key-value pairs. |
| Snapshot (Persistence) | Periodically saves data to a snapshot file for durability. |
| Append-Only File (AOF) | Logs each command for recovery and replication. |
| Snapshot File (txt) | Holds a snapshot of the data for data recovery. |
| Transactions | Support for multi-command transactions using MULTI, EXEC, DISCARD, etc. |
| Transaction Handling | Functionality for handling transaction commands and operations. |
| List Operations (LPUSH, RPUSH) | Commands for adding elements to lists (left and right). |
| Data Retrieval (GET) | Command to retrieve values associated with keys. |
| Data Storage (SET, DEL) | Commands for storing and deleting key-value pairs. |
| Persistence (Snapshot, AOF) | Functions for data persistence, including snapshot and AOF. |
| Network | Communication via TCP/IP between client applications and the server. |

Socket Server

At the core of this crowRedis server is a socket server that listens for incoming connections. It accepts client connections and spawns a new thread to handle each client's requests concurrently. The server listens on a specified host and port.
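The thread-per-client idea described above can be sketched roughly like this. It is only an illustration under assumptions: `handle_client` and `serve` are hypothetical names, the echo reply is a placeholder for real command dispatch, and the only detail taken from this post is port 6381.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 6381  # port taken from the architecture table


def handle_client(conn: socket.socket) -> None:
    """Serve one client until it disconnects (placeholder reply logic)."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # empty bytes means the client closed the socket
                break
            # A real server would parse and dispatch the command here.
            conn.sendall(b"ECHO " + data)


def serve(host: str = HOST, port: int = PORT) -> None:
    """Accept connections forever, starting one daemon thread per client."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        while True:
            conn, _addr = server.accept()
            threading.Thread(
                target=handle_client, args=(conn,), daemon=True
            ).start()
```

Calling `serve()` blocks, so it would normally be the last thing the server script does. Each client gets its own thread, which is why shared state such as the data store needs a lock.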

Data Store

The server maintains an in-memory data store, a Python dictionary, to store key-value pairs. This data store is the heart of Redis, and it supports operations like SET, GET, and DEL.

Persistence

The server supports two forms of data persistence: snapshots and AOF.

Snapshots

Periodically, the server creates snapshots of the data store by writing all key-value pairs to a snapshot file. This allows the server to recover its state in case of a crash. Snapshots are created at specified time intervals.

Append-Only File (AOF)

The Append-Only File records all write operations as commands. It ensures durability by replaying these commands in case of server crashes. This feature can be enabled or disabled as needed.

Transactions

Redis supports transactions, a sequence of commands executed as a single atomic operation. Our simplified Redis server implements basic transactional commands: MULTI, EXEC, and DISCARD.

Features

Key-Value Operations

The server supports the following key-value operations:

  • SET: Set a key to hold a string value.

  • GET: Get the value of a key.

  • DEL: Delete a key.

List Operations

Redis can handle lists, and our server supports the following list operations:

  • LPUSH: Insert values at the beginning of a list.

  • RPUSH: Insert values at the end of a list.

  • LPOP: Remove and return the first element of a list.

  • RPOP: Remove and return the last element of a list.

  • LRANGE: Get a range of elements from a list.
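The five list commands above map quite directly onto a Python list stored under a key. The following is a minimal sketch, not the actual crowRedis code: the `handle_*` names for list commands and the `None` return for a missing key are assumptions, and the inclusive, negative-aware `stop` index follows how LRANGE behaves in real Redis.

```python
data_store = {}  # key -> Python list (assumed layout for list values)


def handle_lpush(key, *values):
    """Insert each value at the head of the list; return the new length."""
    lst = data_store.setdefault(key, [])
    for value in values:
        lst.insert(0, value)
    return len(lst)


def handle_rpush(key, *values):
    """Append values to the tail of the list; return the new length."""
    lst = data_store.setdefault(key, [])
    lst.extend(values)
    return len(lst)


def handle_lpop(key):
    """Remove and return the first element, or None if there is none."""
    lst = data_store.get(key)
    return lst.pop(0) if lst else None


def handle_rpop(key):
    """Remove and return the last element, or None if there is none."""
    lst = data_store.get(key)
    return lst.pop() if lst else None


def handle_lrange(key, start, stop):
    """Return elements from start to stop, both inclusive."""
    lst = data_store.get(key, [])
    n = len(lst)
    if start < 0:
        start = max(n + start, 0)  # negative indexes count from the end
    if stop < 0:
        stop = n + stop
    if stop < 0:  # stop still lies before the first element
        return []
    return lst[start:stop + 1]
```

Note that `lst.insert(0, ...)` and `lst.pop(0)` are O(n) on a Python list. `collections.deque` would make both ends O(1), at the cost of slower slicing for LRANGE.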

Data Persistence

The server can save its data to a snapshot file and recover from it on startup. It also supports an Append-Only File (AOF) for command logging and recovery.

Transactions

The server implements a basic form of transactions. It allows clients to initiate a transaction, add multiple commands to it, and then either execute or discard the transaction as a whole.

Exploring the Code

Let's dive into the code and understand the functions responsible for these features.

Key-Value Operations

The handle_set, handle_get, and handle_del functions handle SET, GET, and DEL operations, respectively. They interact with the data store and log changes to the AOF.

Data Persistence

The save_snapshot function creates snapshots of the data store, while load_snapshot and load_aof load data from the snapshot and AOF files, respectively.

Transactions

Transaction-related functions include handle_transaction, execute_transaction, and the functions for list operations. These functions ensure that commands within a transaction are executed atomically.



crowRedis Functions and their workings:

I guess we can divide this into basic operations, persistence, and transactions.

1. Key-Value Operations (SET, GET, DEL) 🪙

handle_set(key, value) 🪙📝

  • Input: Accepts a key and a value.

  • Output: Responds with "OK" upon success.

  • Purpose 🌟: To store a key-value pair in the data store and log the action.

  • How it Works 🔍: This function takes a key and a value as input and stores them in the data store. It also records this action in the Append-Only File (AOF).

handle_get(key) 🪙🔍

  • Input: Requires a key.

  • Output: Returns the associated value or "nil" if the key is not found.

  • Purpose 🌟: To retrieve the value associated with a key from the data store.

  • How it Works 🔍: This function takes a key as input and looks up the associated value in the data store, returning the value if found, or "nil" if the key is not present.

handle_del(key) 🪙🗑️

  • Input: Expects a key.

  • Output: Indicates success with "1" or "0" if the key is not found.

  • Purpose 🌟: To delete a key-value pair from the data store and log the action.

  • How it Works 🔍: This function takes a key as input, removes the corresponding key-value pair from the data store if it exists, and records the deletion in the AOF.
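Put together, the three handlers described above could look roughly like this. This is a sketch under assumptions rather than the repository's code: the in-memory `aof_log` list stands in for the real AOF file, and the single `lock` is one simple way to keep concurrent client threads from stepping on each other.

```python
import threading

data_store = {}          # the in-memory key-value store
aof_log = []             # stand-in for the AOF file (assumption)
lock = threading.Lock()  # guards data_store and aof_log together


def append_to_aof(command):
    """Record a write command; a real server would write it to disk."""
    aof_log.append(command)


def handle_set(key, value):
    """Store the pair and log the write."""
    with lock:
        data_store[key] = value
        append_to_aof(f"SET {key} {value}")
    return "OK"


def handle_get(key):
    """Return the stored value, or "nil" when the key is missing."""
    with lock:
        return data_store.get(key, "nil")


def handle_del(key):
    """Delete the key and log it; "1" if it existed, "0" otherwise."""
    with lock:
        if key in data_store:
            del data_store[key]
            append_to_aof(f"DEL {key}")
            return "1"
        return "0"
```

Only writes reach the log here: GET changes nothing, and a DEL on a missing key has nothing to replay.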

2. Data Persistence (SAVE, Snapshot, AOF) 💾

handle_save() 💾

  • Input: No specific input required.

  • Output: Confirms with "Data saved to snapshot file."

  • Purpose 🌟: To create a snapshot of the current data and save it for later recovery.

  • How it Works 🔍: This function generates a snapshot of the data and stores it in a snapshot file, ensuring that data is preserved.

save_snapshot() 📸

  • Input: None, it simply captures the current data state.

  • Output: Quietly creates a snapshot for future reference.

  • Purpose 🌟: To create a snapshot of the current data state, suitable for recovery.

  • How it Works 🔍: This function iterates through the data and writes key-value pairs to a snapshot file, effectively creating a snapshot of the data.
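Here is a minimal sketch of snapshotting as described above: iterate over the data, write each pair to a text file, and read it back on startup. The file layout (one JSON-encoded `[key, value]` pair per line), the explicit `data` and `path` parameters, and the `snapshot_loop` helper are all assumptions made for illustration.

```python
import json
import os


def save_snapshot(data, path):
    """Write every key-value pair to `path`, one JSON pair per line."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for key, value in data.items():
            f.write(json.dumps([key, value]) + "\n")
    # Swap the finished file in so a crash mid-write keeps the old snapshot.
    os.replace(tmp_path, path)


def load_snapshot(path):
    """Rebuild the data dict from a snapshot file (empty if none exists)."""
    data = {}
    if not os.path.exists(path):
        return data
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                key, value = json.loads(line)
                data[key] = value
    return data


def snapshot_loop(data, path, interval, stop_event):
    """Save a snapshot every `interval` seconds until stop_event is set."""
    while not stop_event.wait(interval):
        # Copy first so the file is written from a stable view; a real
        # server would hold its data lock while taking this copy.
        save_snapshot(dict(data), path)
```

`snapshot_loop` is meant to run in its own thread with a `threading.Event` as `stop_event`, which gives the "at specified time intervals" behaviour without blocking client threads.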

append_to_aof(command) 📝➡️📄

  • Input: Accepts a command to append to the AOF.

  • Output: Appends the command to the AOF file for future recovery.

  • Purpose 🌟: To log commands in the Append-Only File (AOF) for recovery purposes.

  • How it Works 🔍: This function takes a command as input and appends it to the AOF file, ensuring that all commands are recorded for future recovery.

recover_from_aof() 📚🔍

  • Input: Scans the AOF for stored commands.

  • Output: Restores data by executing the commands from the AOF.

  • Purpose 🌟: To recover data by reading and executing the commands stored in the AOF.

  • How it Works 🔍: This function reads the AOF file, parses the stored commands, and executes them to reconstruct the data state.
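The append-and-replay idea can be sketched as follows. The extra `path` parameter, the space-separated line format, and the restriction to SET and DEL are assumptions to keep the example small. The real functions are described above as taking only a command (or nothing), so treat this as an illustration of the technique rather than the actual implementation.

```python
import os


def append_to_aof(command, path):
    """Append one command line to the AOF and push it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(command + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to persist the write now


def recover_from_aof(path):
    """Replay logged commands in order to rebuild the data dict."""
    data = {}
    if not os.path.exists(path):
        return data
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split into at most 3 parts so values may contain spaces.
            parts = line.rstrip("\n").split(" ", 2)
            if parts[0] == "SET" and len(parts) == 3:
                data[parts[1]] = parts[2]
            elif parts[0] == "DEL" and len(parts) == 2:
                data.pop(parts[1], None)
            # Unknown or malformed lines are skipped in this sketch.
    return data
```

Calling `os.fsync` on every write is the safest and slowest option. Batching the sync (for example once per second) trades a little durability for a lot of speed.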

3. Transactions (MULTI, EXEC, DISCARD) 🔄

handle_transaction() 🔁

  • Input: Initiates a transaction with an empty command list.

  • Output: Prepares the transaction context for future commands.

  • Purpose 🌟: To start a transaction and prepare a context to collect commands.

  • How it Works 🔍: This function initializes a transaction context and begins collecting commands for execution.

Transaction Commands (e.g., LPUSH, RPUSH) ➕

  • Input: Listens for various transactional commands.

  • Output: Adds received commands to the transaction context.

  • Purpose 🌟: To collect and store transactional commands in the context for later execution.

  • How it Works 🔍: This function listens for transactional commands and adds them to the list of commands to be executed within the transaction.

handle_transaction_execute() ✨

  • Input: Executes the collected transaction commands.

  • Output: Applies the changes made by the transaction commands.

  • Purpose 🌟: To execute and apply changes made by the collected transaction commands.

  • How it Works 🔍: This function takes the collected transaction commands, executes them sequentially, and applies the changes to the data store.

handle_transaction_discard() 🗑️❌

  • Input: Discards the collected transaction commands.

  • Output: Clears the transaction context, discarding all collected commands.

  • Purpose 🌟: To discard all collected transaction commands and return to a clean state.

  • How it Works 🔍: This function clears the transaction context, effectively discarding all previously collected commands.
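The queue-then-run flow of MULTI, EXEC, and DISCARD can be sketched with one small per-client context. The `Transaction` class, its method names, and the `"QUEUED"` and `"ERR ..."` reply strings are assumptions for illustration. The idea shown is that queued commands touch nothing until EXEC, and that EXEC runs the whole batch while holding one lock so no other client's command can slip in between.

```python
import threading

store_lock = threading.Lock()  # shared by every client thread


class Transaction:
    """Per-client transaction context (hypothetical helper)."""

    def __init__(self, store):
        self.store = store
        self.commands = None  # None means no transaction is open

    def multi(self):
        """MULTI: start collecting commands."""
        self.commands = []
        return "OK"

    def send(self, name, *args):
        """Queue the command inside a transaction, else run it now."""
        if self.commands is None:
            with store_lock:
                return self._apply(name, args)
        self.commands.append((name, args))
        return "QUEUED"

    def execute(self):
        """EXEC: run all queued commands as one uninterrupted batch."""
        if self.commands is None:
            return "ERR EXEC without MULTI"
        queued, self.commands = self.commands, None
        with store_lock:
            return [self._apply(name, args) for name, args in queued]

    def discard(self):
        """DISCARD: drop the queue without touching the store."""
        if self.commands is None:
            return "ERR DISCARD without MULTI"
        self.commands = None
        return "OK"

    def _apply(self, name, args):
        if name == "SET":
            key, value = args
            self.store[key] = value
            return "OK"
        if name == "GET":
            return self.store.get(args[0], "nil")
        if name == "DEL":
            if args[0] in self.store:
                del self.store[args[0]]
                return "1"
            return "0"
        return f"ERR unknown command {name}"
```

This sketch has no rollback: if one queued command returns an error, the commands before it stay applied. It gives isolation of the batch, not undo.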

This was the explanation of the crowRedis.py file, but that is just the server. We need a client to interact with the server, so where is that tho?

Great, now let's discuss our client code (client.py).




Client Code explanation:

Why crowRedis and its Client? 🤔

crowRedis, like actual Redis, is a versatile in-memory data store that serves as a key-value database and a high-speed cache. To interact effectively with crowRedis, we need a client, a connector that enables communication between Python and the crowRedis server. With a client, we can send commands to crowRedis and receive its responses, making our interaction with the data store smooth and efficient.

Getting Set Up ⚙️

Before we start coding, we need to make sure our environment is ready:

  1. Python Environment 🐍: Ensure Python is installed on your system.

  2. Socket Module 🔌: We'll be using the socket module to establish connections.

Understanding the Client Class 🏗️

The core of our crowRedis client is the Client class. Here's an overview of how it works:

  • Initializing the Connection Parameters 🚀: We set the host and port to connect to the crowRedis server.

  • Handling Connections 🌐: Our client establishes connections when needed and closes them when we're done.

Sending and Receiving Commands 📤📥

The Client is responsible for sending commands and receiving responses. Here's a high-level view of how this process works:

  • Sending Commands ✉️: We send crowRedis commands to the server. The client ensures they are correctly formatted before transmitting them.

  • Receiving Responses 📬: After sending a command, we listen for and decode responses from the crowRedis server, converting them to a readable format.

Transaction Support 💼

Transactions allow us to bundle multiple Redis commands into a single unit of work. Here's how our crowRedis client supports this:

  • Introduction to Transactions 💼: Transactions play a crucial role in ensuring data consistency and reliability.

  • Using "MULTI" 🔄: We initiate transactions with the "MULTI" command.

  • Running Transactions 🚀: Transactions are executed with "EXEC," and we can cancel them with "DISCARD."
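A client along the lines described in this section can be sketched as follows. The method names, the newline-terminated plain-text framing, and the single `recv` per reply are assumptions made to keep the example short. The real client.py may format and read messages differently.

```python
import socket


class Client:
    """Tiny text-protocol client (hypothetical sketch)."""

    def __init__(self, host="127.0.0.1", port=6381):
        self.host = host
        self.port = port
        self.sock = None  # connected lazily on first use

    def connect(self):
        """Open the TCP connection if it is not open yet."""
        if self.sock is None:
            self.sock = socket.create_connection((self.host, self.port))

    def close(self):
        """Close the connection if one is open."""
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def send_command(self, command):
        """Send one command line and return the decoded reply."""
        self.connect()
        self.sock.sendall(command.strip().encode("utf-8") + b"\n")
        # Assumes the whole reply fits in one recv call.
        return self.sock.recv(4096).decode("utf-8").strip()
```

With this shape, a transaction from the client side is just a sequence of calls such as `send_command("MULTI")`, a few `send_command("SET ...")` calls, then `send_command("EXEC")` or `send_command("DISCARD")`, all over the same connection.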


🦅 Think of the crowRedis.py server as the main boss. It has all the functions to do things, but hey, the boss doesn't talk to normies like us. 🤵👩‍💼🧑‍💼 He's got big stuff to do. So he hired a client (client.py) to handle all the requests and talk with the user. We, the users, go to the client and say, "Ayo client, *looks around*, I need to set my stuff in your memory database, okay? Tell the big boss to do it."

The client goes to the boss (server) and hands him our request on a chit that says what we want to do (a query).

All this talk happens over sockets. 🧦📦🌐


Benchmark Test (basic):

In this benchmark test, I compared crowRedis, PostgreSQL, and real Redis against each other on the same hardware.
Plz don't take this seriously, cause I am comparing a relational database with an in-memory database, but I wanted to see how much faster a RAM-based database is than a disk-based one.

Also, real Redis uses a much more complex mechanism for SET, GET, and DEL, and mine is way too simplistic, which is why its operations are so fast.

I am also learning things and might make some stupid comparisons, so plz forgive me. I will learn what I don't know and improve ✌️

PostgreSQL

| Metric | Value | Database |
| --- | --- | --- |
| INSERT | 0.1802 seconds | PostgreSQL |
| UPDATE | 1.6753 seconds | PostgreSQL |
| DELETE | 0.2250 seconds | PostgreSQL |
| TRANSACTIONS | 0.0680 seconds | PostgreSQL |
| Throughput | 1470.95 transactions per second | PostgreSQL |
| Average response time | 0.0007 seconds | PostgreSQL |

crowRedis

| Metric | Value | Database |
| --- | --- | --- |
| Total time taken | 0.021941661834716797 seconds | crowRedis |
| Throughput | 4557.54 transactions per second | crowRedis |
| Average response time | 0.0002 seconds | crowRedis |
| Benchmark SET | 1000 requests in 0.4349 seconds | crowRedis |
| Benchmark GET | 1000 requests in 0.0271 seconds | crowRedis |
| Benchmark DEL | 1000 requests in 0.0322 seconds | crowRedis |

Redis

| Metric | Value | Database |
| --- | --- | --- |
| Total time taken | 0.016948461532592773 seconds | Redis |
| Throughput | 5900.24 transactions per second | Redis |
| Average response time | 0.0002 seconds | Redis |
| Benchmark SET | 1000 requests in 0.0280 seconds | Redis |
| Benchmark GET | 1000 requests in 0.0320 seconds | Redis |
| Benchmark DEL | 1000 requests in 0.0315 seconds | Redis |
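The benchmark script itself is not shown in this post, so the following is only a guess at its general shape: time a fixed number of requests, then derive throughput and average response time from the total. The `benchmark` function and its result keys are hypothetical names.

```python
import time


def benchmark(operation, requests=1000):
    """Call operation(i) `requests` times and report simple timings."""
    start = time.perf_counter()
    for i in range(requests):
        operation(i)
    elapsed = time.perf_counter() - start
    return {
        "total_seconds": elapsed,
        # Throughput is the request count divided by total time.
        "throughput_per_second": (
            requests / elapsed if elapsed > 0 else float("inf")
        ),
        # Average response time is total time divided by request count.
        "average_response_seconds": elapsed / requests,
    }
```

As a sanity check on the tables, throughput multiplied by total time gives the number of transactions: 4557.54 × 0.02194 ≈ 100 for crowRedis and 5900.24 × 0.01695 ≈ 100 for Redis, so those rows appear to come from a run of roughly 100 transactions, separate from the 1000-request SET/GET/DEL rows.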

My Troubles:

Building a working database/datastore, even a simple one like mine, is not as easy as it looks in theory. I promise you, any change you make affects everything: if you change GET and SET, you need to make sure that transactions do not falter.

Currently, I am facing two issues:

1: TTL functionality: When I added TTL support and tested setting some data with the TTL flag, it worked fine, and the subsequent get also worked fine.

But when I set normal data right after a TTL operation, the get fails for some reason.

I traced the code by hand on actual paper and I don't see any issues, but I will fix it.
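This is not a diagnosis of the bug above, since the TTL code is not shown here. It is just a sketch of one layout in which keys stored without a TTL can never trip the expiry check: expiry times live in a separate dict, and a missing entry simply means "never expires". The `expires_at` dict and the `ttl` and `now` parameters are assumptions for illustration.

```python
import time

data_store = {}   # key -> value
expires_at = {}   # key -> absolute expiry time; absent means no TTL


def handle_set(key, value, ttl=None, now=None):
    """Store a value, optionally expiring `ttl` seconds from now."""
    now = time.time() if now is None else now
    data_store[key] = value
    if ttl is None:
        expires_at.pop(key, None)  # a plain SET clears any earlier TTL
    else:
        expires_at[key] = now + ttl
    return "OK"


def handle_get(key, now=None):
    """Return the value, lazily deleting the key if its TTL has passed."""
    now = time.time() if now is None else now
    deadline = expires_at.get(key)  # None for keys stored without a TTL
    if deadline is not None and now >= deadline:
        data_store.pop(key, None)
        expires_at.pop(key, None)
        return "nil"
    return data_store.get(key, "nil")
```

The `now` parameter exists only so the behaviour can be tested without sleeping. A server would call both functions without it.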

2: PUB/SUB: I have my custom client. Say you subscribe to a channel, e.g. Channel-1, then open another instance of my client in the terminal and subscribe to the same Channel-1.

Then if you publish a message to Channel-1, it appears in the terminal of instance-1 but not in instance-2, and vice versa.


I can set up my sockets to handle inter-instance global message sharing, but ChatGPT says it's better to implement RabbitMQ for this. I am gonna go with RabbitMQ, cause those sockets also mess with my normal set and get: if I subscribe and publish some data, and right after that set some data normally, the current code treats it as a published message, so there is a lot of this mix-and-match.
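For comparison, here is a minimal sketch of the socket-only route, where the server itself keeps the subscriber list. It assumes one registry shared by all client threads and a `MESSAGE <channel> <text>` line prefix so a client can tell pushed messages apart from normal command replies. The `PubSub` class and that prefix are hypothetical, and this is not a claim about where the current bug lives.

```python
import threading
from collections import defaultdict


class PubSub:
    """Server-side channel registry (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = defaultdict(set)  # channel -> connections

    def subscribe(self, channel, conn):
        """Register a client connection on a channel."""
        with self._lock:
            self._subscribers[channel].add(conn)

    def unsubscribe_all(self, conn):
        """Remove a connection from every channel (e.g. on disconnect)."""
        with self._lock:
            for subscribers in self._subscribers.values():
                subscribers.discard(conn)

    def publish(self, channel, message):
        """Send the message to every subscriber; return how many got it."""
        with self._lock:
            targets = list(self._subscribers.get(channel, ()))
        payload = f"MESSAGE {channel} {message}\n".encode("utf-8")
        delivered = 0
        for conn in targets:
            try:
                conn.sendall(payload)
                delivered += 1
            except OSError:
                self.unsubscribe_all(conn)  # drop dead connections
        return delivered
```

Because the registry lives in the server process, every client instance connected to that server is reachable from a single `publish` call, whichever instance sent it.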

I don't know if it's due to my client architecture or the main server architecture. Every time I implement a new feature, an old thing gets messed up, and I have to rethink how everything should work with the current change in mind.

But I like it. With this, I learned more about how computers work: the RAM, the threads, the OS, and communication protocols.

I also learned why Redis is used, when to use it, when to use a relational database, how these things work internally, and much more.

**I am not saying I have become some Jedi coder, but now I don't fear or get anxious when I hear words like: concurrency, B-tree, rollback, ACID, Sharding, Partitions and whatnot.**

Bye Bye :3
