Undergraduate Course Descriptions
Data Analytics and Computational Statistics 1
Provides an overview of quantitative methods essential for analyzing data, with an emphasis on business applications. Topics include identification of appropriate metrics and measurement methods, descriptive and inferential statistics, experimental design, parametric and non-parametric tests, simulation, linear and logistic regression, categorical data analysis, and selected unsupervised learning techniques. Standard and open-source statistical packages are used to apply techniques to real-world problems.
- Learning Objectives
- Understand and apply experimental design and sampling methodologies.
- Understand and apply appropriate parametric and non-parametric tests.
- Develop and articulate results from linear regression models.
- Apply categorical data analysis methods.
- Apply statistical software tools to perform data analysis projects.
- Apply concepts learned in the course to real-world case studies.
- Topics
- Review of fundamentals of data analysis
- Review of probability
- Parameter estimates
- Testing hypotheses and goodness of fit
- ANOVA
- Analysis of categorical data
- Linear and multiple regression
- Logistic regression
Data Analytics and Computational Statistics 2
Contemporary techniques of multivariate analysis, including association rules, classification methods, time series, text analysis, and machine learning methods, with an emphasis on applications in business and industry. Introduces state-of-practice computational statistics and data analysis methods and tools.
- Learning Objectives
- Understand and be able to apply a variety of multivariate data analysis techniques.
- Apply advanced time series methods in context.
- Understand and be able to apply techniques for analysis of text and unstructured data.
- Apply various clustering techniques to data as appropriate and explain the results.
- Apply concepts learned in the course to real-world case studies.
- Topics
- Survey of multivariate analysis methods – supervised and unsupervised learning techniques
- Overview of classification methods
- Discriminant analysis
- Clustering/segmentation methods
- Decision trees
- Time series analysis
- Text analysis
- Team projects and presentations
Principles of Data Structures, Harvesting and Wrangling
Introduction to collecting, wrangling, storing, managing, retrieving, and processing datasets. Topics include fundamental concepts and techniques of data engineering, large-scale data harvesting, data wrangling methodologies, and storage and processing architectures. Emphasizes applications and includes many hands-on projects.
- Learning Objectives
- Develop an understanding of the fundamental concepts of modern data management, including the data science life cycle, data scaling, structuring data, and data lakes
- Develop knowledge and skills in storing, retrieving, and processing data with the Apache Hadoop framework using cloud technology
- Develop knowledge and skills in working with the Apache Hadoop framework including Hadoop Distributed File System (HDFS), MapReduce, and Hive
- Develop knowledge and skills in working with HDFS and Spark/PySpark
- Develop knowledge and skills in cleansing/wrangling data with OpenRefine (formerly Google Refine)
- Develop knowledge and skills in collecting data using streaming technologies
- Introduce students to real-time big data using Spark Streaming
- Topics
- Apache Hadoop framework and cloud technology
- Storing and retrieving data with Apache Hadoop HDFS, MapReduce, and Hive
- Storing and retrieving data with Apache Hadoop HDFS and Spark
- Data lakes: a storage option of choice for modern data management
- Data cleansing and wrangling with OpenRefine (formerly Google Refine)
- Data collection with streaming technologies
- Introduction to real-time big data with Spark Streaming
- Introduction to the Python API for Spark: PySpark
Methods for Discovery and Learning from Data
Introduction to contemporary methods for discovery and learning from data sets. Emphasizes applications of predictive and pattern recognition techniques in deriving insights and making decisions in business contexts. Topics are complemented by hands-on projects using data discovery and statistical learning software.
- Learning Objectives
- Develop an understanding of the fundamental concepts of big data and machine learning
- Develop knowledge and skills in data analytics with the Apache Hadoop ecosystem and using cloud technology
- Develop knowledge and skills in the data analytics life cycle
- Develop knowledge and skills in exploratory data analysis (EDA) and data preprocessing
- Develop knowledge and skills in supervised learning, both linear and non-linear
- Develop knowledge and skills in unsupervised learning
- Develop knowledge and skills in evaluating machine learning algorithms
- Develop knowledge and skills in programming for data science with Python
- Develop knowledge and skills in using various software tools in machine learning
- Topics
- Data analytics life cycle
- Data preprocessing
- Exploratory Data Analysis (EDA)
- Overview of big data and machine learning
- Supervised Linear Algorithms
- Supervised Non-Linear Algorithms
- Unsupervised Algorithms
- Evaluating Algorithms
- Big data analytics and machine learning with NumPy, pandas, and scikit-learn in Python
- Apache Hadoop ecosystem and cloud technology
Principles of Data Visualization for Large Data
Principles and methods for effective visualization and communication of large data sets. Standard and open-source data visualization packages are used to develop presentations that convey findings, answer science and industry questions, drive decisions, and provide persuasive evidence supported by data.
- Learning Objectives
- Provide an overview and brief history of the practice of data visualization
- Introduce key design principles and techniques for visualizing data
- Develop an understanding of the fundamentals of communication and concepts required for effective data presentation
- Develop competency in the use of several data visualization software tools, such as Tableau and Python libraries
- Apply principles to business-related projects to identify, understand, analyze, prepare, and present effective visualizations
- Topics
- Introduction to Data Visualization
- Data and Image Models & Properties of Images
- Data Visualization & Exploratory Data Analysis (EDA)
- Multivariate Data Visualization
- Narrative Data Visualization
- Data Visualization: Graphs
- Data Visualization: Colors
- Data Visualization: Design & Best Practices
- Data Visualization Software Applications: Tableau, Python Pandas, Matplotlib