Course Description: This course introduces Data Science with Python, covering the core concepts, techniques, and tools for data analysis. Students will gain practical skills in data manipulation, statistical analysis, machine learning, and data visualization.
Introduction to Data Science and Python
- Overview of Data Science and its applications
- Setting up the Python environment for data analysis
- Introduction to Python libraries for data science (NumPy, Pandas)
Data Wrangling with Pandas
- Data manipulation and cleaning with Pandas
- Data loading, handling missing values, and reshaping dataframes
- Basic data exploration and descriptive statistics
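
As a rough illustration of the kind of exercise this unit covers, the sketch below builds a small DataFrame, fills a missing value, reshapes the table, and computes summary statistics. The column names and values are invented for the example and are not course data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and numbers are made up for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2020, 2021, 2020, 2021],
    "temp_c": [5.0, np.nan, 19.0, 21.0],
})

# Handle the missing value by filling it with the mean for the same city.
city_means = df.groupby("city")["temp_c"].transform("mean")
df["temp_c"] = df["temp_c"].fillna(city_means)

# Reshape from long to wide: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="temp_c")

# Basic descriptive statistics for the numeric column.
summary = df["temp_c"].describe()
print(wide)
print(summary)
```

Filling with a group mean is only one of several reasonable strategies; dropping the row or interpolating may suit other datasets better.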
Data Visualization with Matplotlib and Seaborn
- Creating static and interactive visualizations
- Plotting various types of charts (scatter plots, bar plots, histograms)
- Customizing and styling visualizations
Statistical Analysis and Probability
- Descriptive statistics (mean, median, variance, etc.)
- Probability distributions (normal, binomial, etc.)
- Hypothesis testing and confidence intervals
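
The topics in this unit can be sketched with Python's standard `statistics` module alone. The sample values below are invented, and the interval uses a normal approximation, which is an assumption made for brevity; for a sample this small a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev, variance

# Hypothetical sample of measurements, made up for illustration.
sample = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9, 5.1, 5.4]

n = len(sample)
sample_mean = mean(sample)
sample_median = median(sample)
sample_var = variance(sample)  # sample variance (n - 1 denominator)

# Approximate 95% confidence interval for the mean, using the normal
# critical value (about 1.96) rather than a t critical value.
z = NormalDist().inv_cdf(0.975)
half_width = z * stdev(sample) / sqrt(n)
ci = (sample_mean - half_width, sample_mean + half_width)

print(f"mean={sample_mean:.3f} median={sample_median:.3f} var={sample_var:.3f}")
print(f"approx. 95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```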
Introduction to Machine Learning with Scikit-Learn
- Overview of machine learning concepts and applications
- Supervised learning vs. unsupervised learning
- Building and evaluating a simple regression model
Supervised Learning - Classification
- Logistic regression and support vector machines
- Decision trees and ensemble methods (Random Forest, Gradient Boosting)
- Model evaluation metrics (accuracy, precision, recall)
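
To make the evaluation metrics concrete, here is a small from-scratch sketch for binary labels. The function name `classification_metrics` and the label lists are hypothetical, and in practice a library such as Scikit-Learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for non-empty binary label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted
    # (precision) or labeled (recall) as positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Accuracy counts all correct predictions, while precision and recall focus on the positive class, which is why they can diverge sharply on imbalanced data.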
Unsupervised Learning - Clustering
- K-Means clustering and hierarchical clustering
- DBSCAN and other density-based clustering methods
- Evaluating clustering performance
Feature Engineering and Selection
- Handling categorical variables and encoding schemes
- Feature scaling and normalization
- Selecting relevant features for modeling
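
The encoding and scaling topics above can be illustrated in plain Python. The helper names (`one_hot`, `min_max_scale`, `standardize`) and the sample values are hypothetical, and the scaling helpers assume the input is not constant.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each category to a 0/1 indicator row (columns in sorted order)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def min_max_scale(xs):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


# Hypothetical categorical and numeric features.
colors = ["red", "green", "red", "blue"]
heights = [150.0, 160.0, 170.0, 180.0]

categories, encoded = one_hot(colors)
scaled = min_max_scale(heights)
z_scores = standardize(heights)
print(categories, encoded)
print(scaled)
print(z_scores)
```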
Dimensionality Reduction and PCA
- Principal Component Analysis (PCA) for dimensionality reduction
- Understanding eigenvalues, eigenvectors, and variance explained
- Visualizing high-dimensional data in reduced space
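
The link between eigenvalues, eigenvectors, and explained variance can be sketched directly with NumPy. The data here is synthetic and generated only for illustration; a library implementation would usually be preferred for real work.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 points in 3-D, stretched mostly along one axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# Center the data, then eigendecompose its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; reorder to descending.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Each eigenvalue is the variance captured along its eigenvector.
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Project onto the top two principal components for plotting.
X_2d = X_centered @ eigenvectors[:, :2]
print(explained_variance_ratio)
```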
Introduction to Natural Language Processing (NLP)
- Processing text data and tokenization
- Building a basic sentiment analysis model
- NLP applications and techniques
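
As a very small sketch of tokenization and sentiment scoring, the example below uses a hand-written word list rather than a trained model. The lexicon and the function names are invented for illustration; a real sentiment model would typically be learned from labeled text.

```python
import re
from collections import Counter

# Hypothetical, tiny sentiment lexicon invented for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive minus negative tokens; a score above 0 suggests positive sentiment."""
    counts = Counter(tokenize(text))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos - neg


print(tokenize("I love this course, it's great!"))
print(sentiment_score("I love this course, it's great!"))
print(sentiment_score("Terrible pacing and poor examples."))
```

A lexicon approach like this ignores negation and context ("not good" scores as positive), which is one motivation for the learned models this unit introduces.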
Time Series Analysis and Forecasting
- Working with time series data
- Seasonal decomposition, trend analysis, and stationarity
- Building and evaluating time series forecasting models
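
Two of the basic operations behind trend analysis and stationarity, smoothing and differencing, can be sketched in plain Python. The series below is synthetic (a linear trend plus a repeating period-4 pattern), and the helper names are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def difference(series, lag=1):
    """Lagged differences, a common step toward making a series stationary."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series: linear trend plus a seasonal pattern that sums to zero.
seasonal = [2, -1, 0, -1]
series = [10 + 3 * t + seasonal[t % 4] for t in range(12)]

# Averaging over one full season cancels the seasonal pattern,
# leaving an estimate of the trend.
trend = moving_average(series, window=4)

# Differencing at the seasonal lag removes both the seasonal pattern
# and the linear trend, leaving a constant series.
seasonal_diff = difference(series, lag=4)

print(trend)
print(seasonal_diff)
```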
Final Projects and Capstone
- Applying Data Science concepts to real-world datasets
- Conducting a complete Data Science project
- Final project presentation and evaluation
Assessment:
- Class participation and engagement
- Homework assignments and coding exercises
- Mid-term and final projects
- Final project presentation and evaluation