CSCI-UA 475 Predictive Analytics

Term: Summer 2023
Instructor: Dr. Anasse Bari
Level: Undergraduate

Topics

Predictive Analytics Lifecycle: Defining Predictive Analytics and Related Disciplines, Phases of Data Analytics Projects, Managing a Predictive Analytics Project;

Defining Analytics Problems: Finding Similar Data Items, Data Classification, Link Analysis and Mining Associations, Recommender Systems;

Data Pre-processing Algorithms: Common problems with raw data, Statistical Summaries, Data Cleaning and Preprocessing, Record Linkage and Entity Resolution;

Data Dimensionality Reduction Algorithms: Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Missing Values Ratio (MVR), Feature Correlation Threshold (FCT), Feature Variance Threshold (FVT), Data Fusion and Data Discretization;

Feature Selection: Feature Selection and Feature Extraction, Feature Extraction in Text Mining; Feature Selection Algorithms: Feature Subset Selection with Sequential Forward Selection (SFS), Feature Ranking with Entropy, Information Gain, Fisher Score (F-score) and Chi-square;

Data Similarity Measures: Similarity in Data, Properties of Distance Measures; Similarity and Distance Measures: Euclidean Distance, Cosine Similarity, Jaccard Similarity, Pearson Correlation;

Data Clustering Algorithms: Partitioning Data Clustering Algorithms: K-means Algorithm, K-modes Algorithm; Hierarchical Algorithms; Large-scale Clustering Algorithms: Bradley-Fayyad-Reina (BFR) Algorithm, CURE Algorithm; Density-based Algorithms: DBSCAN Algorithm; Biologically Inspired Algorithms: Flock by Leader Machine Learning Algorithms, Bird Flocking Algorithms for Data Clustering;

Data Classification Algorithms: Decision Tree Algorithm, Support Vector Machines, Naïve Bayes Classification Algorithm, Neural Networks, Linear Regression;

Mining Association Rules: Apriori Algorithm, Generating Frequent Itemsets; Opinion Mining;

Recommender Systems: Collaborative Filtering, Content-based Filtering, Trust-based Recommendations;

Large-scale Data Analytics Frameworks: Hadoop and MapReduce, Mahout, Apache Spark, BlinkDB, KIJI Project;

Description

Predictive analytics is the art and science of extracting useful information from historical and current data in order to predict future trends. In this course, students are introduced to the phases of the analytics lifecycle and gain a basic understanding of a variety of tools and machine learning algorithms for analyzing data and discovering forward-looking insights. Techniques covered include data pre-processing, data reduction algorithms, data clustering algorithms, data classification algorithms, association rule mining algorithms, recommender systems, and more.

Applications from financial markets, bioinformatics, social network analytics, and text mining will be covered, along with highlights from industrial use cases that show how predictive analytics can improve business performance and support better decisions. This introductory course gives students the basic skills of a new generation of data scientists: the ability to structure and analyze data and to derive useful insights from it that can inform better decisions.