# Variables
a = 10
b = 20
# operations
c = a + b
print(c) # prints the value
s = ' Some string '
# operations
print(s.strip())
4 Manipulate data with standard libraries and co-locate code with classes and functions
In this chapter, we will learn:
- Commonly used data structures
- Loops
- Classes, objects, and functions
4.1 Use the appropriate data structure based on how the data will be used
Let’s go over some basics of the Python language:
1. Variables: A storage location identified by its name, containing some value.
2. Operations: We can perform operations on variables, such as arithmetic on numbers and string transformations on text.
3. Data structures: Ways of representing data. Each has its own pros and cons, as well as specific situations where it is the right fit.
   3.1. List: A collection of elements that can be accessed by knowing the element’s location (aka index). Lists retain the order of their elements.
   3.2. Dictionary: A collection of key-value pairs where each key is mapped to a value using a hash function. The dictionary provides fast data retrieval based on keys.
   3.3. Set: A collection of unique elements; duplicates are not allowed.
   3.4. Tuple: A collection of immutable (non-changeable) elements. Tuples retain their order once created.
# Data structures
# List
l = [1, 2, 3, 4]
print(l[0]) # Will print 1
print(l[3]) # Will print 4
# dictionary
d = {'a': 1, 'b': 2}
print(d.get('a')) # get value of a
print(d.get('b')) # get value of b
# Set
my_set = set() # set only stores unique values
my_set.add(10)
my_set.add(10) # we already have a 10
my_set.add(10) # we already have a 10
my_set.add(30)
print(my_set) # contains only 10 and 30
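The tuple, the fourth structure described above, can be sketched the same way. This is a minimal illustration, assuming a hypothetical `point` tuple that is not part of the original examples: elements are read by index like a list, but they cannot be reassigned.

```python
# Tuple: an ordered, immutable collection (`point` is an illustrative name)
point = (3, 4)
print(point[0])  # Will print 3

# Tuples can be unpacked into separate variables
x, y = point
print(x + y)  # Will print 7

# Trying to change an element raises a TypeError
try:
    point[0] = 10
except TypeError as err:
    print(f'Tuples cannot be modified: {err}')
```

Because a tuple cannot change after creation, it is a reasonable fit for fixed groupings such as a coordinate pair.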
4.2 Manipulate data with control-flow loops
1. Loops: Looping allows a specific chunk of code to be repeated several times. The most common type is the for loop.
2. Comprehension: Comprehension is a shorthand way of writing a loop. This allows for concise code, great for representing simpler logic.
# range(n) produces the values from 0 to n-1 (inclusive)
# on each pass we pull out one value and assign it to a variable (i in our case)
for i in range(11):
    print(i)
# List based looping
print('############## index based looping ')
for idx in range(len(l)):
    print(l[idx])
# shorthand to loop through elements in a list
print('############## shorthand to loop through elements in a list')
for elt in l:
    print(elt)
# List based looping, while getting the index number
print('############## List based looping, while getting the index number ')
for idx, elt in enumerate(l):
    print(idx, elt)
# Looping element in dictionary
# only keys
for i in d:
    print(i)
# keys and values
for k, v in d.items():
    print(f'Key: {k}, Value: {v}')
# list comprehension
# instead of writing a loop, we can use the loop inside a [] to create another list
# Here we multiply each element in l by 2 and create a new list
print([elt*2 for elt in l])
# dictionary comprehension
# we can create a dictionary using comprehension as well
print({f'key_{elt}': elt*2 for elt in l})
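To make the link between a loop and a comprehension concrete, here is a small sketch that writes the same logic both ways and adds a filter condition. The names `numbers`, `doubled_evens`, and `doubled_evens_short` are illustrative and do not appear in the original examples.

```python
numbers = [1, 2, 3, 4]  # illustrative list with the same values as l above

# Loop version: keep only the even numbers and double them
doubled_evens = []
for n in numbers:
    if n % 2 == 0:
        doubled_evens.append(n * 2)

# Comprehension version of the same logic, with the condition at the end
doubled_evens_short = [n * 2 for n in numbers if n % 2 == 0]

print(doubled_evens)        # [4, 8]
print(doubled_evens_short)  # [4, 8]
```

The comprehension reads well for a single transformation and one condition. Once the logic needs several branches, a regular loop is usually easier to follow.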
4.3 Co-locate logic with classes and functions
1. Functions: A block of code that can be reused as needed. This allows us to have logic defined in one place, making it easy to maintain and use. Using a function somewhere in the code is referred to as calling the function.
2. Classes and objects: Think of a class as a blueprint and objects as things created based on that blueprint.
3. Library: Libraries are code that can be reused. Python comes with a standard library for common operations, such as the datetime module to work with time (although there are better libraries).
4. Exception handling: When an error occurs, we need our code to handle it gracefully instead of stopping abruptly.
# let's create a function to create an age_bucket for our customer data
customer_data = [
    {'name': 'customer_1', 'id': 1, 'age': 100},
    {'name': 'customer_2', 'id': 2, 'age': 42},
    {'name': 'customer_3', 'id': 3, 'age': 25},
    {'name': 'customer_4', 'id': 4, 'age': 19},
]
def get_age_bucket(customer):
    customer_age = customer['age']
    if customer_age > 85:
        return '85+'
    elif customer_age > 50:
        return '50_85'
    elif customer_age > 30:
        return '30_50'
    else:
        return '0_30'
for customer in customer_data:
    print(customer['age'], get_age_bucket(customer))
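Since the bucketing logic lives in one function, it can be reused wherever it is needed. The sketch below repeats `get_age_bucket` so that it runs on its own, then calls it inside a list comprehension to attach the bucket to each record. The names `sample_customers` and `enriched` and the `age_bucket` key are assumptions made for this illustration.

```python
def get_age_bucket(customer):
    """Return an age bucket label for a customer dictionary."""
    customer_age = customer['age']
    if customer_age > 85:
        return '85+'
    elif customer_age > 50:
        return '50_85'
    elif customer_age > 30:
        return '30_50'
    return '0_30'

# A shortened, illustrative copy of the customer data
sample_customers = [
    {'name': 'customer_1', 'id': 1, 'age': 100},
    {'name': 'customer_4', 'id': 4, 'age': 19},
]

# Build new dictionaries so the original records stay unchanged
enriched = [{**c, 'age_bucket': get_age_bucket(c)} for c in sample_customers]
for c in enriched:
    print(c['name'], c['age_bucket'])
```

If the bucket boundaries ever change, only `get_age_bucket` needs to be edited.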
class DataExtractor:
    def __init__(self, some_value):
        self.some_value = some_value
    def get_connection(self):
        pass
    def close_connection(self):
        pass
de_object = DataExtractor(10)
print(de_object.some_value)
class Pipeline:
    def __init__(self, pipeline_type):
        self.pipeline_type = pipeline_type # called an object variable, will be specific for individual objects
    def extract(self):
        print(f'Data is being extracted for {self.pipeline_type}')
    def transform(self):
        print(f'Data is being transformed for {self.pipeline_type}')
    def load(self):
        print(f'Data is being loaded for {self.pipeline_type}')
    def run(self):
        self.extract()
        self.transform()
        self.load()
p1 = Pipeline('customer_pipeline') # we create an object of type Pipeline class
p2 = Pipeline('orders_pipeline') # we create another object of type Pipeline class
# note how the extract, transform and load methods print each object's own pipeline_type
p1.run()
print('###########################')
p2.run()
# let's use a standard library to get the current date in YYYY-mm-dd format
from datetime import datetime
print(datetime.now().strftime('%Y-%m-%d'))
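The same module can also parse a date string and do date arithmetic. The following is a minimal sketch with a hypothetical `order_date_str` value and an assumed 30-day offset, neither of which comes from the original text.

```python
from datetime import datetime, timedelta

order_date_str = '2024-01-15'  # hypothetical input value

# strptime parses a string into a datetime using the given format
order_date = datetime.strptime(order_date_str, '%Y-%m-%d')

# timedelta represents a duration that can be added to a datetime
due_date = order_date + timedelta(days=30)

print(due_date.strftime('%Y-%m-%d'))  # 2024-02-14
```

`strptime` and `strftime` use the same format codes, so one format string can serve for both parsing and printing.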
# When we try to access an index that is not part of a list, we get an IndexError (index out of range).
# With the try block, the error will be
# caught by the except block
# finally will be executed irrespective of whether there was an error or not
l = [1, 2, 3, 4, 5]
index = 10
try:
    element = l[index]
    print(f"Element at index {index} is {element}")
except IndexError:
    print(f"Error: Index {index} is out of range for the list.")
finally:
    print("Execution completed.")
index = 2
try:
    element = l[index]
    print(f"Element at index {index} is {element}")
except IndexError:
    print(f"Error: Index {index} is out of range for the list.")
finally:
    print("Execution completed.")
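Exception handling pairs naturally with functions: the try/except logic can be written once and reused. Here is one possible sketch, assuming a hypothetical helper named `safe_get` that returns a default value instead of raising an error.

```python
def safe_get(items, index, default=None):
    """Return items[index], or `default` when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

values = [1, 2, 3, 4, 5]  # illustrative list
print(safe_get(values, 2))       # 3
print(safe_get(values, 10))      # None
print(safe_get(values, 10, -1))  # -1
```

Catching only `IndexError`, rather than every exception, keeps unrelated bugs visible instead of silently hiding them.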
4.4 Exercises
- Customer Order Analysis: Write Python code that processes a list of customer orders to calculate the total revenue and find the top 3 most frequent customers.
We have a list of orders, where each order is a dictionary with keys: customer_id, product, quantity, and price.
revenue = quantity * price
frequency of customer is defined as the number of orders
orders = [
    {"customer_id": "C001", "product": "laptop", "quantity": 2, "price": 1200.00},
    {"customer_id": "C002", "product": "mouse", "quantity": 1, "price": 25.99},
    {"customer_id": "C001", "product": "keyboard", "quantity": 1, "price": 89.50},
    {"customer_id": "C003", "product": "monitor", "quantity": 1, "price": 299.99},
    {"customer_id": "C002", "product": "laptop", "quantity": 1, "price": 1200.00},
    {"customer_id": "C004", "product": "headphones", "quantity": 3, "price": 79.99},
    {"customer_id": "C001", "product": "webcam", "quantity": 1, "price": 45.00},
    {"customer_id": "C003", "product": "mouse", "quantity": 2, "price": 25.99},
    {"customer_id": "C002", "product": "speaker", "quantity": 1, "price": 150.00},
    {"customer_id": "C005", "product": "tablet", "quantity": 1, "price": 399.99}
]
- Data Quality Checker: Write a Python function that takes a list of email addresses and returns a dictionary with two keys: valid_emails (list) and invalid_emails (list).
Use basic validation rules:
1. must contain @
2. . must be after @
3. must contain text before the @
email_list = [
    "john.doe@company.com",
    "jane.smith@email.co.uk",
    "invalid-email",
    "bob@gmail.com",
    "alice.brown@company.com",
    "john.doe@company.com", # duplicate
    "missing@domain",
    "test@example.org",
    "@nodomain.com",
    "jane.smith@email.co.uk", # duplicate
    "valid.user@site.net",
    "no-at-symbol.com",
    "another@test.io"
]
- Sales Performance Tracker: Create a class called SalesRep that stores a representative’s name and a list of their sales amounts.
Include methods to add sales amounts, calculate average sales, and determine if they hit a target (parameter).
# Sample data for creating SalesRep objects
sales_data = {
    "Alice Johnson": [15000, 18000, 22000, 16000, 19000, 21000],
    "Bob Smith": [12000, 14000, 11000, 13000, 15000, 16000],
    "Carol Davis": [25000, 28000, 30000, 27000, 32000, 29000]
}