Study Notes
Everything covered from Python Foundations through Machine Learning — detailed notes, code examples, and projects all in one place.
Variables & Datatypes
A variable is a named container that stores a value. Python automatically detects the type — you don't need to declare it.
str (String)
Text. Wrap in quotes: "hello", 'world'
int (Integer)
Whole numbers, no decimal: 1, 42, -7
float
Numbers with a decimal point: 3.14, 98.5
bool (Boolean)
Only two values: True or False
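A small sketch of the four basic types. The variable names are illustrative only; `type()` shows which type Python inferred from each value.

```python
# Python infers each variable's type from the value assigned to it.
name = "hello"     # str
count = 42         # int
price = 98.5       # float
is_paid = True     # bool

for value in (name, count, price, is_paid):
    print(repr(value), "has type", type(value).__name__)
```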
Conditional Statements
Conditionals let the program make decisions — run different code depending on whether a condition is True or False.
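A minimal if / elif / else sketch. The marks value and the thresholds are made up for illustration; only the first branch whose condition is True runs.

```python
marks = 72

if marks >= 90:
    result = "distinction"
elif marks >= 40:
    result = "pass"
else:
    result = "fail"

print(result)
```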
Looping Constructs
Loops let you repeat a block of code multiple times without writing it over and over.
for loop — iterate over a sequence
while loop — repeat while a condition is True
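Both loop types side by side, as a rough sketch with made-up values: the `for` loop visits each item of a list, and the `while` loop keeps going until its condition becomes False.

```python
# for loop: visit every item in a sequence.
scores = [40, 50, 60]
total = 0
for score in scores:
    total += score
print("total:", total)

# while loop: repeat as long as the condition stays True.
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1  # without this line the loop would never end
print("counted down through:", steps)
```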
Functions
A function is a reusable block of code. Define it once, call it anywhere. Keeps code clean and avoids repetition.
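The original example is not reproduced in these notes. The sketch below is a hypothetical stand-in that reuses the names mentioned here (`calculate_grade`, `avg`); the grade boundaries are assumptions.

```python
def calculate_grade(avg):
    """Map an average mark (0-100) to a letter grade."""
    if avg >= 90:
        return "A"
    elif avg >= 75:
        return "B"
    elif avg >= 60:
        return "C"
    else:
        return "F"


# Calling the function: the argument 82 is received as the parameter avg.
grade = calculate_grade(82)
print(grade)
```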
def
Keyword to define a function.
Parameters
Inputs the function receives — avg in the example above.
return
Sends a value back to wherever the function was called.
Call
Execute the function by writing its name with arguments: calculate_grade(82)
Data Structures
Ways to store and organise collections of data in Python.
List
- Ordered
- Mutable (can change)
- Allows duplicates
Tuple
- Ordered
- Immutable (cannot change)
- Allows duplicates
Dictionary
- Key : Value pairs
- Mutable
- Keys must be unique
In the Expense Tracker, each expense is a dictionary such as {"category": "food", "amount": 50}, and all expenses are collected in a list.
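The three structures in action, as a short sketch with made-up values. It shows a list growing, a tuple refusing to change, and a dictionary being updated and then collected into a list.

```python
# List: ordered, mutable, duplicates allowed.
marks = [78, 85, 78]
marks.append(90)

# Tuple: ordered but immutable, so item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
except TypeError as err:
    print("tuples cannot be changed:", err)

# Dictionary: key/value pairs with unique keys.
expense = {"category": "food", "amount": 50}
expense["amount"] = 65     # overwrite the value of an existing key
expense["note"] = "lunch"  # add a new key

# A list of dictionaries, the shape used for the expense records.
expenses = [expense, {"category": "travel", "amount": 120}]
print(marks, point, expenses)
```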
Student Report Card Generator
Student Report Card
CLI app that collects student data, calculates averages and grades, and prints formatted report cards.
What it does
Input
Student ID, name, and marks for Maths, Science, English
Processing
Calculates total, average, and assigns a grade using conditionals
Storage
Each student stored as a dictionary inside a list
Output
Prints a formatted report card for every student
Key concepts applied
Personal Expense Tracker (CLI)
Expense Tracker
Interactive CLI menu app to track daily expenses — add, view, filter, and summarise spending.
Features built
| Feature | How it works |
|---|---|
| Add Expense | Input category + amount → append dict to list |
| Show All | Loop through list, print each expense |
| Total Spent | sum() with a generator expression |
| Highest Expense | max() with a lambda key |
| Filter by Category | List comprehension to filter matching items |
| Group by Category | Dictionary to accumulate totals per category |
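The techniques in the table can be sketched as follows. The sample expenses are made up, and the variable names are illustrative rather than taken from the project.

```python
expenses = [
    {"category": "food", "amount": 50},
    {"category": "travel", "amount": 120},
    {"category": "food", "amount": 30},
]

# Total Spent: sum() over a generator expression.
total_spent = sum(e["amount"] for e in expenses)

# Highest Expense: max() with a lambda that picks the comparison key.
highest = max(expenses, key=lambda e: e["amount"])

# Filter by Category: list comprehension keeping only matching items.
food_only = [e for e in expenses if e["category"] == "food"]

# Group by Category: accumulate a running total per category.
totals_by_category = {}
for e in expenses:
    category = e["category"]
    totals_by_category[category] = totals_by_category.get(category, 0) + e["amount"]

print(total_spent, highest, food_only, totals_by_category)
```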
Request & Response / JSON
An API (Application Programming Interface) is a way for two systems to communicate. You send a Request, and the server sends back a Response.
HTTP Methods
| Method | Purpose | Example |
|---|---|---|
| GET | Retrieve data | Get all expenses |
| POST | Create new data | Add an expense |
| PUT | Update existing data | Edit an expense |
| DELETE | Remove data | Delete an expense |
JSON Structure
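JSON (JavaScript Object Notation) is the standard format for sending data between client and server. It looks much like a Python dictionary, though strings must use double quotes and it writes true, false, and null where Python writes True, False, and None.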
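A quick round trip with the standard-library `json` module shows how a dictionary maps to JSON text and back. The `paid` and `note` fields are made up to show how booleans and None are written.

```python
import json

expense = {"category": "food", "amount": 50, "paid": True, "note": None}

# Python dict -> JSON text (what travels over the network).
body = json.dumps(expense)
print(body)  # {"category": "food", "amount": 50, "paid": true, "note": null}

# JSON text -> Python dict (what the receiving side works with).
parsed = json.loads(body)
print(parsed == expense)
```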
Request & Response Flow
FastAPI
FastAPI is a modern Python framework for building APIs quickly. It uses type hints to validate data automatically and generates interactive docs at /docs.
Key Concepts
Pydantic / BaseModel
Defines the structure of request body data. FastAPI validates incoming data against it automatically.
Path Parameters
Part of the URL — /expenses/5
Defined with {expense_id} in the route.
Query Parameters
After the ? in the URL — /search?q=food
Passed as function arguments.
Request Body
JSON data sent with POST/PUT. Mapped to a Pydantic model in the function parameter.
Route order matters: define static routes like /expenses/highest before dynamic ones like /expenses/{id}; otherwise "highest" gets treated as an ID.
SQL Basics (PostgreSQL)
SQL (Structured Query Language) is used to create, read, update, and delete data in relational databases like PostgreSQL.
Core Commands
| Command | Purpose |
|---|---|
| SELECT | Read / retrieve data |
| INSERT | Add new rows |
| UPDATE | Modify existing rows |
| DELETE | Remove rows |
| CREATE TABLE | Define a new table structure |
Python + PostgreSQL (psycopg2)
psycopg2 is the Python library used to connect to and interact with a PostgreSQL database from code.
Connection Pattern
Execute a Query
Use %s placeholders instead of string formatting to pass values into queries. This prevents SQL injection attacks.
try / finally Pattern
Always close the connection in a finally block so it gets cleaned up even if an error occurs.
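The placeholder and try / finally patterns can be sketched together as below. The table and column names (`expenses`, `id`, `category`, `amount`) are assumptions, and the function takes a `connect` callable so that it does not hard-code credentials; with psycopg2 this would typically be a small wrapper around `psycopg2.connect(...)`.

```python
SQL_BY_CATEGORY = "SELECT id, category, amount FROM expenses WHERE category = %s"


def fetch_expenses_by_category(connect, category):
    """Return all rows for one category, always closing the connection.

    `connect` is any zero-argument callable that returns a DB-API
    connection, for example a wrapper around psycopg2.connect(...).
    """
    conn = connect()
    try:
        with conn.cursor() as cur:
            # The value is passed separately from the SQL text, so the
            # driver escapes it. Never build the query with f-strings.
            cur.execute(SQL_BY_CATEGORY, (category,))
            return cur.fetchall()
    finally:
        # Runs whether the query succeeded or raised an error.
        conn.close()
```

Note the trailing comma in `(category,)`: the parameters must be a sequence, even when there is only one value.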
Expense Tracker — API + Database
Expense Tracker API
FastAPI + psycopg2 + Pydantic. Data stored permanently in PostgreSQL instead of in-memory.
Combined everything from Phase 1 — FastAPI for the HTTP layer, psycopg2 for the database layer, and Pydantic for request validation.
APIs built
| Method | Endpoint | Description |
|---|---|---|
| POST | /expenses | Add a new expense to DB |
| GET | /expenses | Get all expenses from DB |
| GET | /expenses/highest | Get the highest expense |
| GET | /expenses/summary | Total per category |
| GET | /expenses/category/{cat} | Filter by category |
| GET | /expenses/category/{cat}/total | Total for one category |
| PUT | /expenses/{id} | Update an expense |
| DELETE | /expenses/{id} | Delete an expense |
Architecture
Student API — Refactored
Student API
Full CRUD REST API for student records with reusable DB connection helpers and env-based config.
APIs built
| Method | Endpoint | Description |
|---|---|---|
| POST | /students | Add a new student |
| GET | /students | Get all students |
| GET | /students/{id} | Get student by ID |
| PUT | /students/{id} | Update student record |
| DELETE | /students/{id} | Delete a student |
Reusable DB helper
Moving get_connection() to its own module means every route file imports it from one place, so changes to DB config only need to happen once.
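A rough sketch of what such a helper module might look like. The environment variable names (`DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`) and their defaults are assumptions, not the project's actual settings.

```python
import os


def load_db_config(env=None):
    """Build connection keyword arguments from environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "postgres"),
        "user": env.get("DB_USER", "postgres"),
        "password": env.get("DB_PASSWORD", ""),
    }


def get_connection():
    """Open a new PostgreSQL connection using the settings above."""
    # Third-party driver, imported here so that reading the configuration
    # does not require the driver to be installed.
    import psycopg2

    return psycopg2.connect(**load_db_config())
```

Route files would then write something like `from db import get_connection`, assuming the module is saved as `db.py`.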
What is Machine Learning?
Machine Learning is a way of teaching computers to learn from data — instead of writing explicit rules, you show the model examples and let it figure out the patterns.
ML is broadly split into four categories based on how the model learns:
Supervised Learning
Learn from labeled data. You know the correct answers during training.
Unsupervised Learning
No labels. The model discovers hidden patterns and structure on its own.
Reinforcement Learning
Learn through trial and error by receiving rewards or penalties. (Not in Phase 2)
Semi-Supervised
Mix of labeled and unlabeled data. (Not in Phase 2)
Supervised Learning
You teach the model using labeled data — data where you already know the correct answer. The model learns the relationship between inputs and outputs, then predicts outputs for new unseen inputs.
How it works
Two Types
Regression
Predicts a continuous number
e.g. "What score will this student get?" → 78.5
Classification
Predicts a category/label
e.g. "Is this email spam?" → Spam / Not Spam
Unsupervised Learning
No labels. No answer key. You give the model raw data and it finds hidden structure, patterns, or groupings by itself.
Key Type — Clustering
Groups similar data points together into clusters. Points in the same cluster are more similar to each other than to those in other clusters.
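To make the idea concrete, here is a hand-rolled sketch of K-Means on 2-D points. It is a teaching version under simplifying assumptions (starting centroids are passed in, at least one iteration runs), not a replacement for a library implementation, and the sample points are made up.

```python
import math


def k_means(points, centroids, iterations=10):
    """Cluster 2-D points with plain K-Means from the given start centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid

        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


# Made-up (annual income in k$, spending score) pairs.
points = [(15, 80), (16, 77), (18, 82), (85, 15), (88, 20), (90, 12)]
centroids, clusters = k_means(points, centroids=[(15, 80), (90, 12)])
print(centroids)
print(clusters)
```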
Your Project Example
Input: customer age, income, spending score
No labels given
Output: Group A (high spenders), Group B (budget), Group C (casual)
Topics You'll Cover
K-Means, Elbow Method, Matplotlib, Seaborn
Supervised vs Unsupervised
| | Supervised | Unsupervised |
|---|---|---|
| Labeled data? | ✓ Yes | ✗ No |
| Goal | Predict a known output | Discover hidden patterns |
| Output type | Number or Category | Groups / Clusters |
| Your projects | Student Predictor, Spam Classifier | Customer Segmentation |
| Evaluation | MSE, Accuracy, Precision, Recall | Visual inspection, Elbow Method |
Regression
Predicts a continuous numerical value. The model learns from input-output pairs and finds the best-fitting line through the data.
Example Dataset
| Hours Studied | Exam Score |
|---|---|
| 1 | 40 |
| 2 | 50 |
| 3 | 60 |
| 5 | 75 |
| 8 | 90 |
The Equation — Linear Regression
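The original equation is not reproduced in these notes. One common way to write the line for a single input feature, using notation chosen for this sketch, is:

```latex
\hat{y} = w x + b
```

Here $x$ is the input (hours studied), $\hat{y}$ is the predicted exam score, $w$ is the slope, and $b$ is the intercept.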
How the Model Learns — Cost Function (MSE)
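Mean Squared Error (MSE) is the average of the squared differences between predicted and actual values; a lower MSE means a better fit. The sketch below computes it for the example dataset above. As a simplification it finds the best line with the closed-form least-squares formula rather than an iterative optimiser, and the comparison line (slope 10, intercept 30) is an arbitrary guess.

```python
hours = [1, 2, 3, 5, 8]
scores = [40, 50, 60, 75, 90]


def mse(w, b):
    """Mean squared error of the line score = w * hours + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(hours, scores)) / len(hours)


# Closed-form least-squares fit for a single feature.
mean_x = sum(hours) / len(hours)
mean_y = sum(scores) / len(scores)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
variance = sum((x - mean_x) ** 2 for x in hours)
w = covariance / variance
b = mean_y - w * mean_x

print(f"best fit: score = {w:.2f} * hours + {b:.2f}")
print(f"MSE of best fit: {mse(w, b):.2f}")
print(f"MSE of the guess w=10, b=30: {mse(10, 30):.2f}")
```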
Train / Test Split
Training Data — 80%
The model learns patterns from this data.
Testing Data — 20%
Model is evaluated on this. Never seen during training.
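A hand-written sketch of an 80/20 split, to show what happens under the hood. The helper name `split_train_test` and the seed are made up; in practice scikit-learn's `train_test_split` is the usual tool for this job.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train, test = split_train_test(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before splitting matters: if the rows are sorted, taking the last 20% as-is would give a test set that does not resemble the training data.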
End-to-End Flow
Classification
Predicts a category/label — the output is one of a fixed set of classes. Instead of a number, the model decides which group something belongs to.
Examples
Binary Classification (2 classes)
Email → Spam or Not Spam
Multi-Class Classification
Review → Positive, Neutral, or Negative
Text Preprocessing (for Spam Classifier)
Evaluation Metrics
Accuracy
How many total predictions were correct?
Precision
Of all predicted spam, how many were actually spam?
Recall
Of all actual spam, how many did we catch?
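These three questions translate directly into counts of true/false positives and negatives, which are also the four cells of a confusion matrix. A small sketch with made-up labels (the `evaluate` helper is illustrative):

```python
def evaluate(actual, predicted, positive="spam"):
    """Return confusion-matrix counts plus accuracy, precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # spam caught
    fp = sum(a != positive and p == positive for a, p in pairs)  # ham flagged as spam
    fn = sum(a == positive and p != positive for a, p in pairs)  # spam missed
    tn = sum(a != positive and p != positive for a, p in pairs)  # ham left alone
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]
report = evaluate(actual, predicted)
print(report)
```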
Confusion Matrix
Student Score Predictor
Trains a Linear Regression model on the Kaggle Students Performance dataset to predict exam scores from parental education, lunch type, test prep, and gender.
Dataset — StudentsPerformance.csv
| Feature (Input) | Description |
|---|---|
| gender | male / female |
| race/ethnicity | group A–E |
| parental level of education | high school → master's degree |
| lunch | standard / free-reduced |
| test preparation course | completed / none |
| math score (target) | Score to predict (0–100) |
Pipeline
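The pipeline diagram is not reproduced here. As a rough sketch of the steps named in these notes (encode categoricals with LabelEncoder, split, train Linear Regression, report RMSE), assuming scikit-learn and NumPy are installed. The rows below are made up and use only two of the features; the real project reads StudentsPerformance.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up rows: (lunch, test preparation course, math score).
rows = [
    ("standard", "completed", 82), ("standard", "none", 70),
    ("free-reduced", "none", 52), ("free-reduced", "completed", 63),
    ("standard", "completed", 88), ("standard", "none", 66),
    ("free-reduced", "none", 48), ("free-reduced", "completed", 60),
    ("standard", "none", 72), ("free-reduced", "none", 55),
]
lunch, prep, math_score = zip(*rows)

# LabelEncoder turns each text category into an integer code.
X = np.column_stack([
    LabelEncoder().fit_transform(list(lunch)),
    LabelEncoder().fit_transform(list(prep)),
])
y = np.array(math_score)

# Hold back 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# RMSE is the square root of the mean squared error, in score points.
rmse = mean_squared_error(y_test, predictions) ** 0.5
print(f"RMSE on held-out rows: {rmse:.2f}")
```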
Spam Email Classifier
Trains a Multinomial Naive Bayes model on 5,572 real SMS messages to classify them as spam or ham. Uses TF-IDF vectorization to convert text into numbers the model can learn from.
Dataset — spam.csv
| Column | Description |
|---|---|
| v1 (label) | ham (not spam) or spam |
| v2 (message) | Raw SMS text content |
| Total | 5,572 messages: 4,825 ham, 747 spam |
Pipeline
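The pipeline diagram is not reproduced here. A minimal sketch of the TF-IDF plus Multinomial Naive Bayes combination described above, assuming scikit-learn is installed. The six training messages are made up; the real project trains on spam.csv.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training messages standing in for the v2 / v1 columns.
messages = [
    "WIN a FREE prize now, claim your cash",
    "Free entry: text WIN to claim your prize",
    "Urgent! You have won cash, call now",
    "Are we still meeting for lunch today?",
    "Can you send me the notes from class?",
    "I'll call you when I get home",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TfidfVectorizer lowercases and tokenises the text and turns each message
# into a vector of word weights; MultinomialNB learns from those vectors.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(messages, labels)

new_messages = ["Claim your free cash prize", "See you at lunch"]
predictions = model.predict(new_messages)
for text, label in zip(new_messages, predictions):
    print(f"{label}: {text}")
```

Wrapping both steps in one pipeline means new messages are vectorised with the same vocabulary that was learned during training.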
Performance Metrics
Accuracy: 96%
Fraction of all messages labelled correctly
Precision: 100%
Zero legitimate messages wrongly flagged as spam
Recall: ~70%
About 70% of spam caught; the model errs on the side of caution
Customer Segmentation
Unsupervised K-Means clustering that automatically groups 200 mall customers into 5 segments based on annual income and spending score — no labels needed.
Dataset — Mall_Customers.csv
| Feature | Description |
|---|---|
| CustomerID | Unique identifier (dropped before training) |
| Gender | Male / Female → encoded 0 / 1 |
| Age | Customer age in years |
| Annual Income (k$) | Annual income in thousands of dollars |
| Spending Score (1-100) (cluster feature) | Mall-assigned score based on spending behaviour |
Pipeline
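The pipeline diagram is not reproduced here. A rough sketch of the steps named in these notes (feature scaling, Elbow Method, K-Means), assuming scikit-learn and NumPy are installed. The nine customers are made up and form three obvious groups; the real project uses Mall_Customers.csv and settles on 5 clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up (annual income in k$, spending score) pairs.
customers = np.array([
    [15, 80], [16, 77], [18, 82],
    [85, 15], [88, 20], [90, 12],
    [50, 50], [52, 48], [55, 52],
])

# Scale both features so neither dominates the distance calculation.
scaled = StandardScaler().fit_transform(customers)

# Elbow Method: record the inertia (within-cluster sum of squares) for
# each k and look for the point where adding clusters stops helping much.
inertias = {}
for k in range(1, 6):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled).inertia_
print(inertias)

# Fit the final model with the k suggested by the elbow (3 for this toy data).
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled)
labels = model.labels_
print(labels)
```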
Clusters Discovered
| Cluster | Count | Avg Income | Avg Spend | Segment |
|---|---|---|---|---|
| 0 | 81 | $55k | 49.5 | Average / Mixed |
| 1 | 39 | $87k | 82.1 | High Income, High Spenders ⭐ |
| 2 | 22 | $26k | 79.4 | Low Income, High Spenders |
| 3 | 35 | $88k | 17.1 | High Income, Low Spenders |
| 4 | 23 | $26k | 20.9 | Low Income, Low Spenders |
Build Timeline
Every commit, listed newest first, from the latest ML project back to the first FastAPI setup.
Customer Segmentation
Unsupervised K-Means clustering on Mall_Customers.csv — Elbow Method, feature scaling, 5 cluster segments, Matplotlib + Seaborn visualizations.
Spam Email Classifier
Multinomial Naive Bayes on 5,572 SMS messages — TF-IDF vectorization, text preprocessing, 96% accuracy, 100% precision.
Student Score Predictor — Updated
Added results.txt output, LabelEncoder for categoricals, predicted vs actual table, RMSE evaluation report.
ML Score Predictor
Trained Linear Regression model on StudentsPerformance.csv — Phase 2 first project.
Code Cleanup
Refactored and tidied existing API and app files.
Expense Tracker API + DB
Full CRUD, category filter, summary & totals with PostgreSQL — Phase 1 capstone.
Student API Refactor
Extracted reusable DB connection helper + complete CRUD with PostgreSQL.
FastAPI + PostgreSQL Setup
First FastAPI student API with env-based configuration — first real API with a database.