4  Manipulate data with standard libraries and co-locate code with classes and functions

In this chapter, we will learn about commonly used data structures; loops; and classes, objects, and functions.

4.1 Use the appropriate data structure based on how the data will be used

Let’s go over some basics of the Python language:

  1. Variables: A storage location identified by its name, containing some value.

  2. Operations: We can perform operations on variables, such as arithmetic on numbers and string transformations on text.

  3. Data Structures: They are ways of representing data. Each has its own pros and cons, as well as specific situations where it is the right fit.

    3.1. List: A collection of elements that can be accessed by knowing the element’s location (aka index). Lists retain the order of their elements.

    3.2. Dictionary: A collection of key-value pairs where each key is mapped to a value using a hash function. The dictionary provides fast data retrieval based on keys.

    3.3. Set: An unordered collection of unique elements. A set does not allow duplicates, so adding a value that is already present has no effect.

    3.4. Tuple: An immutable (unchangeable) collection of elements. Tuples retain the order of their elements.

# Variables
a = 10
b = 20

# operations
c = a + b
print(c) # prints the value

s = '  Some string '
# operations
print(s.strip())
# Data structures

# List
l = [1, 2, 3, 4]

print(l[0])  # Will print 1
print(l[3])  # Will print 4

# Dictionary
d = {'a': 1, 'b': 2}

print(d.get('a')) # get value of a
print(d.get('b')) # get value of b

# Set
my_set = set() # set only stores unique values
my_set.add(10)
my_set.add(10) # we already have a 10
my_set.add(10) # we already have a 10
my_set.add(30)
print(my_set)
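
The tuple described above has no example yet, so here is a minimal sketch. The names `point` and `distances` are hypothetical and chosen only for illustration. It shows that a tuple keeps its order, rejects modification, and (because it is immutable) can be used as a dictionary key.

```python
# A tuple keeps its order and cannot be modified after creation
point = (3, 4)
x, y = point  # unpack the two elements into separate variables
print(x, y)

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    print(f'Cannot modify a tuple: {err}')

# Tuples are hashable, so they can serve as dictionary keys (a list cannot)
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[point])
```

A reasonable rule of thumb: reach for a list when the data will change, and a tuple when a fixed group of values belongs together.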

4.2 Manipulate data with control-flow loops

  1. Loops: Looping allows a specific chunk of code to be repeated several times. The most common type is the for loop.

  2. Comprehension: A comprehension is a shorthand way of writing a loop that builds a new list, dictionary, or set. It keeps code concise and works best for simple logic.

# range(n) produces the integers from 0 to n-1 (inclusive); it is a lazy sequence, not a list

# the for loop pulls out one element at a time and assigns it to a variable (i in our case)
for i in range(11):
    print(i)
# Index-based looping over a list
print('############## index based looping ')
for idx in range(len(l)):
    print(l[idx])

# shorthand to loop through elements in a list
print('############## shorthand to loop through elements in a list')
for elt in l: 
    print(elt)

# List based looping, while getting the index number
print('############## List based looping, while getting the index number ')
for idx, elt in enumerate(l):
    print(idx, elt)
# Looping element in dictionary

# only keys
for i in d:
    print(i)

# keys and values
for k, v in d.items():
    print(f'Key: {k}, Value: {v}')
# list comprehension
# instead of writing a loop, we can use the loop inside a [] to create another list
# Here we multiply each element in l by 2 and create a new list
[elt*2 for elt in l]
# dictionary comprehension
# we can create a dictionary using comprehension as well
{f'key_{elt}': elt*2 for elt in l}
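
To make the "shorthand for a loop" idea concrete, the sketch below writes the same logic twice: once as an explicit loop and once as a comprehension with a filter condition. The variable names (`numbers`, `squares_loop`, `squares_comp`, `unique_remainders`) are hypothetical.

```python
numbers = [1, 2, 3, 4]

# Loop version: keep the even numbers and square them
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n ** 2)

# Comprehension version of the same logic; the `if` at the end filters elements
squares_comp = [n ** 2 for n in numbers if n % 2 == 0]

print(squares_loop)
print(squares_comp)

# Curly braces without key: value pairs create a set comprehension
unique_remainders = {n % 2 for n in numbers}
print(unique_remainders)
```

Both versions produce the same list. Once the logic needs several conditions or steps, the explicit loop is usually easier to read.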

4.3 Co-locate logic with classes and functions

  1. Functions: A block of code that can be reused as needed. This allows us to have logic defined in one place, making it easy to maintain and use. Using it in a location is referred to as calling the function.
  2. Classes and Objects: Think of a class as a blueprint and objects as things created based on that blueprint.
  3. Library: Libraries are collections of reusable code. Python comes with a standard library for common operations, such as the datetime module for working with dates and times (although there are better third-party libraries for this). See the Python Standard Library documentation for the full list of modules.
  4. Exception handling: When an error occurs, we need our code to gracefully handle it without stopping.
# let's create a function to create an age_bucket for our customer data
customer_data = [
    {'name': 'customer_1', 'id': 1, 'age': 100},
    {'name': 'customer_2', 'id': 2, 'age': 42},
    {'name': 'customer_3', 'id': 3, 'age': 25},
    {'name': 'customer_4', 'id': 4, 'age': 19},
]

def get_age_bucket(customer):
    customer_age = customer['age']
    if customer_age > 85:
        return '85+'
    elif customer_age > 50:
        return '50_85'
    elif customer_age > 30:
        return '30_50'
    else:
        return '0_30'

for customer in customer_data:
    print(customer['age'], get_age_bucket(customer))
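
Because the bucketing logic lives in one function, it can be reused wherever it is needed. The sketch below is one possible illustration: it repeats the function so the block runs on its own, then uses it to add an `age_bucket` key to each record and count customers per bucket. The names `customers`, `enriched`, and `bucket_counts` are hypothetical.

```python
def get_age_bucket(customer):
    """Same bucketing logic as above, repeated so this block runs on its own."""
    age = customer['age']
    if age > 85:
        return '85+'
    elif age > 50:
        return '50_85'
    elif age > 30:
        return '30_50'
    return '0_30'

customers = [
    {'name': 'customer_1', 'id': 1, 'age': 100},
    {'name': 'customer_2', 'id': 2, 'age': 42},
    {'name': 'customer_3', 'id': 3, 'age': 25},
    {'name': 'customer_4', 'id': 4, 'age': 19},
]

# Build new dictionaries that copy each customer and add an age_bucket key
enriched = [{**c, 'age_bucket': get_age_bucket(c)} for c in customers]

# Count how many customers fall into each bucket
bucket_counts = {}
for c in enriched:
    bucket = c['age_bucket']
    bucket_counts[bucket] = bucket_counts.get(bucket, 0) + 1

print(bucket_counts)
```

If the bucket boundaries ever change, only `get_age_bucket` needs to be edited.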
class DataExtractor:

    def __init__(self, some_value):
        self.some_value = some_value

    def get_connection(self):
        pass

    def close_connection(self):
        pass

de_object = DataExtractor(10)
print(de_object.some_value)
class Pipeline:

    def __init__(self, pipeline_type):
        self.pipeline_type = pipeline_type # called an instance variable (attribute); each object gets its own copy

    def extract(self):
        print(f'Data is being extracted for {self.pipeline_type}')

    def transform(self):
        print(f'Data is being transformed for {self.pipeline_type}')

    def load(self):
        print(f'Data is being loaded for {self.pipeline_type}')

    def run(self):
        self.extract()
        self.transform()
        self.load()

p1 = Pipeline('customer_pipeline') # we create an object of type Pipeline class
p2 = Pipeline('orders_pipeline') # we create another object of type Pipeline class

p1.run() # note how the extract, transform and load methods will print customer pipeline
print('###########################')
p2.run()
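
Each object carries its own values for the variables set in `__init__`. As a small sketch of that idea, the hypothetical `RowCounter` class below (not part of the chapter's code) contrasts a per-object variable with a variable defined on the class itself, which is shared by all objects.

```python
class RowCounter:
    instances_created = 0  # class variable: one value shared by every object

    def __init__(self, name):
        self.name = name  # instance variable: specific to this object
        self.count = 0    # instance variable: specific to this object
        RowCounter.instances_created += 1

    def increment(self):
        # Only this object's count changes
        self.count += 1

customers_counter = RowCounter('customers')
orders_counter = RowCounter('orders')

customers_counter.increment()
customers_counter.increment()
orders_counter.increment()

print(customers_counter.name, customers_counter.count)
print(orders_counter.name, orders_counter.count)
print(RowCounter.instances_created)
```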
# let's use a standard library to get the current date in YYYY-mm-dd format
from datetime import datetime
print(datetime.now().strftime('%Y-%m-%d'))
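
Beyond formatting the current date, the same standard library module can parse date strings and do date arithmetic. The following is a minimal sketch with made-up dates; `order_date`, `next_day`, and `days_between` are hypothetical names.

```python
from datetime import datetime, timedelta

# Parse a YYYY-mm-dd string into a datetime object
order_date = datetime.strptime('2024-01-31', '%Y-%m-%d')

# Adding a timedelta moves the date forward; month boundaries are handled for us
next_day = order_date + timedelta(days=1)
print(next_day.strftime('%Y-%m-%d'))

# Subtracting two datetimes gives a timedelta, which has a .days attribute
days_between = (datetime(2024, 3, 1) - order_date).days
print(days_between)
```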
# When we try to access an index that does not exist in a list, Python raises an IndexError.
# If the code inside the try block raises that error,
# it is caught by the except block instead of stopping the program.
# The finally block is executed whether or not an error occurred.
l = [1, 2, 3, 4, 5]
index = 10
try:
    element = l[index]
    print(f"Element at index {index} is {element}")
except IndexError:
    print(f"Error: Index {index} is out of range for the list.")
finally:
    print("Execution completed.")
index = 2
try:
    element = l[index]
    print(f"Element at index {index} is {element}")
except IndexError:
    print(f"Error: Index {index} is out of range for the list.")
finally:
    print("Execution completed.")
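
The same pattern works for other error types and inside functions. As a sketch, the hypothetical `safe_divide` function below catches a `ZeroDivisionError` and also uses the optional `else` block, which runs only when the try block raised no error.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        print('Error: cannot divide by zero.')
        return None
    else:
        # Runs only when the try block finished without an exception
        return result
    finally:
        # Runs in both cases, even though the branches above return
        print('Division attempted.')

print(safe_divide(10, 2))
print(safe_divide(1, 0))
```

Catching a specific exception type, rather than a bare `except:`, keeps unrelated bugs from being silently hidden.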

4.4 Exercises

  1. Customer Order Analysis: Write Python code that processes a list of customer orders to calculate the total revenue and find the top 3 most frequent customers.

We have a list of orders, where each order is a dictionary with keys: customer_id, product, quantity, and price.

revenue = quantity * price

The frequency of a customer is defined as the number of orders placed by that customer.

orders = [
    {"customer_id": "C001", "product": "laptop", "quantity": 2, "price": 1200.00},
    {"customer_id": "C002", "product": "mouse", "quantity": 1, "price": 25.99},
    {"customer_id": "C001", "product": "keyboard", "quantity": 1, "price": 89.50},
    {"customer_id": "C003", "product": "monitor", "quantity": 1, "price": 299.99},
    {"customer_id": "C002", "product": "laptop", "quantity": 1, "price": 1200.00},
    {"customer_id": "C004", "product": "headphones", "quantity": 3, "price": 79.99},
    {"customer_id": "C001", "product": "webcam", "quantity": 1, "price": 45.00},
    {"customer_id": "C003", "product": "mouse", "quantity": 2, "price": 25.99},
    {"customer_id": "C002", "product": "speaker", "quantity": 1, "price": 150.00},
    {"customer_id": "C005", "product": "tablet", "quantity": 1, "price": 399.99}
]
  2. Data Quality Checker: Write a Python function that takes a list of email addresses and returns a dictionary with two keys: valid_emails (list) and invalid_emails (list).

Use these basic validation rules: (1) the address must contain an @, (2) a . must appear after the @, and (3) there must be text before the @.

email_list = [
    "john.doe@company.com",
    "jane.smith@email.co.uk",
    "invalid-email",
    "bob@gmail.com",
    "alice.brown@company.com",
    "john.doe@company.com",  # duplicate
    "missing@domain",
    "test@example.org",
    "@nodomain.com",
    "jane.smith@email.co.uk",  # duplicate
    "valid.user@site.net",
    "no-at-symbol.com",
    "another@test.io"
]
  3. Sales Performance Tracker: Create a class called SalesRep that stores a representative’s name and a list of their sales amounts.

Include methods to add sales amounts, calculate average sales, and determine whether the representative hit a target (passed as a parameter).

# Sample data for creating SalesRep objects
sales_data = {
    "Alice Johnson": [15000, 18000, 22000, 16000, 19000, 21000],
    "Bob Smith": [12000, 14000, 11000, 13000, 15000, 16000],
    "Carol Davis": [25000, 28000, 30000, 27000, 32000, 29000]
}