Courses
Data Analytics 1
Provides an overview of quantitative methods essential for analyzing data, with an emphasis on business and industry applications. Topics include identification of appropriate metrics and measurement methods, descriptive and inferential statistics, experimental design, parametric and non-parametric tests, simulation, linear and logistic regression, categorical data analysis, and selected unsupervised learning techniques. Standard and open-source statistical packages are used to apply these techniques to real-world problems.
- Learning Objectives
  - Understand and apply experimental design and sampling methodologies.
  - Understand and apply appropriate parametric and non-parametric tests.
  - Develop linear regression models and articulate their results.
  - Apply categorical data analysis methods.
  - Apply statistical software tools to data analysis projects.
  - Apply concepts learned in the course to real-world case studies.
- Topics
  - Review of data analysis fundamentals
  - Review of probability
  - Parameter estimation
  - Hypothesis testing and goodness of fit
  - ANOVA
  - Analysis of categorical data
  - Linear and multiple regression
  - Logistic regression
Data Analytics 2
The course extends the concepts developed in ADTA 5130 Data Analytics 1 to multivariate and unstructured data analysis. Modern multivariate analysis techniques, including association rules, classification methods, time series, text analysis, and machine learning methods, are explored and implemented with real-world business and industry data. The course provides a hands-on introduction to state-of-the-practice technology and tools and focuses on the application and interpretation of the methods discussed.
- Learning Objectives
  - Understand and be able to apply a variety of multivariate data analysis techniques.
  - Apply advanced time series methods in context.
  - Understand and be able to apply techniques for analyzing text and unstructured data.
  - Apply unsupervised methods (e.g., clustering techniques) to data as appropriate and explain the results.
  - Apply concepts learned in the course to real-world case studies.
- Topics
  - Survey of multivariate analysis methods: supervised and unsupervised learning techniques
  - Overview of classification methods
  - Discriminant analysis
  - Clustering/segmentation methods
  - Decision trees
  - Naïve Bayes
  - Time series analysis
  - Text analysis
  - Team projects and presentations
Harvesting, Storing, and Retrieving Big Data
Introduction to the fundamentals of data engineering, including collecting, wrangling, storing, retrieving, and processing data. Data wrangling methodologies are introduced for cleaning and merging datasets, storing data for later analysis, and constructing derived datasets. Various storage and processing architectures are introduced, with a focus on how the choice of approach depends on the application, data velocity, and end users. Emphasizes applications and includes many hands-on projects.
- Learning Objectives
  - Develop an understanding of the fundamental concepts of modern data management, including the data science life cycle, data scaling, data structuring, and data lakes
  - Develop knowledge and skills in storing, retrieving, and processing data with the Apache Hadoop framework using cloud technology
  - Develop knowledge and skills in working with the Apache Hadoop framework, including the Hadoop Distributed File System (HDFS), MapReduce, and Hive
  - Develop knowledge and skills in working with HDFS and Spark/PySpark
  - Develop knowledge and skills in cleansing and wrangling data with Google Refine/OpenRefine
  - Develop knowledge and skills in collecting data using streaming technologies
  - Introduce students to real-time big data processing with Spark Streaming
- Topics
  - Apache Hadoop framework and cloud technology
  - Storing and retrieving data with Apache Hadoop HDFS, MapReduce, and Hive
  - Storing and retrieving data with Apache Hadoop HDFS and Spark
  - Data lakes: a storage option of choice for modern data management
  - Data cleansing and wrangling with Google Refine/OpenRefine
  - Data collection with streaming technologies
  - Introduction to real-time big data with Spark Streaming
  - Introduction to PySpark, the Python API for Spark
Discovery and Learning with Big Data
Introduction to the fundamentals of data analytics and machine learning with big data. Provides theoretical knowledge and practical experience leading to mastery of big data analytics and machine learning, using both small and large datasets. Representative technologies are used to illustrate how machine learning can be applied to obtain real-world solutions. Exercises and examples address both simple and complex data structures, with data ranging from clean and structured to dirty and unstructured.
- Learning Objectives
  - Develop an understanding of the fundamental concepts of big data and machine learning
  - Develop knowledge and skills in data analytics with the Apache Hadoop ecosystem and cloud technology
  - Develop knowledge and skills in the data analytics life cycle
  - Develop knowledge and skills in exploratory data analysis (EDA) and data preprocessing
  - Develop knowledge and skills in supervised learning, both linear and non-linear
  - Develop knowledge and skills in unsupervised learning
  - Develop knowledge and skills in evaluating machine learning algorithms
  - Develop knowledge and skills in programming for data science with Python
  - Develop knowledge and skills in using various software tools for machine learning
- Topics
  - Data analytics life cycle
  - Data preprocessing
  - Exploratory data analysis (EDA)
  - Overview of big data and machine learning
  - Supervised linear algorithms
  - Supervised non-linear algorithms
  - Unsupervised algorithms
  - Evaluating algorithms
  - Big data analytics and machine learning in Python with NumPy, pandas, and scikit-learn
  - Apache Hadoop ecosystem and cloud technology
Large Data Visualization
Strategies and methods for effective visualization and communication of data analyses, especially of large data sets. Standard and open-source data visualization software packages are used to develop presentations that convey findings, answer business questions, drive decisions, and provide persuasive, data-supported evidence. The course is aimed at students who want to use visualization to understand data better and improve their analytics work.
- Learning Objectives
  - Provide an overview and brief history of the practice of data visualization
  - Introduce key and advanced design principles and techniques for visualizing data
  - Develop an understanding of the communication fundamentals and concepts required for effective data presentation
  - Develop competency with data visualization software tools, such as Tableau and Python libraries
  - Apply these principles in business-related projects to identify, understand, analyze, and prepare data and to present effective visualizations
- Topics
  - Introduction to Data Visualization
  - Data and Image Models & Properties of Images
  - Data Visualization & Exploratory Data Analysis (EDA)
  - Multivariate Data Visualization
  - Narrative Data Visualization
  - Data Visualization: Graphs
  - Data Visualization: Colors
  - Data Visualization: Design & Best Practices
  - Data Visualization Software Applications: Tableau, Python (pandas, Matplotlib)