Data Mining and Information Retrieval
Synopsis: This course builds on prior database concepts and focuses on interrogating vast structured and unstructured data stores. Students explore various software ecosystems, including Big Data ecosystems, for data mining and information extraction from heterogeneous data sources. Learning activities focus on scripting interactions with data stores for data acquisition, manipulation, and refinement, as well as for retrospective data modeling. Additional learning activities address the modeling, management, and exploitation of streaming data.
3 Credit Hours
Cloud Computing for Data Analytics
Synopsis: This course introduces students to cluster and cloud computing Big Data ecosystems. Topics include a survey of cloud computing platforms, architectures, and use cases. Students will examine how to scale data science techniques and algorithms using a variety of cluster and cloud paradigms, such as frameworks built on Hadoop (MapReduce) concepts, cloud services, and others.
3 Credit Hours
Parallel Computing for Data Science
Synopsis: This course provides an in-depth treatment of the evolution of high-performance parallel computing architectures and of how these architectures and computational ecosystems support data science. Topics include parallel algorithms for numerical processing, parallel data search, and other parallel computing algorithms that facilitate advanced analytics. To reinforce lecture topics, learning activities will be completed using parallel computing techniques for modern multicore and multi-node systems. Parallel algorithms will be investigated, selected, and then developed for various scientific data analytics problems. Programming projects will be completed in Python and R, leveraging parallel and distributed computing infrastructure such as AWS Elastic MapReduce, Google BigQuery, and other parallel computing architectures. Students will also research emerging parallel and scalable architectures for data analytics.
3 Credit Hours
Graduates of the Master of Science in Data Science and Analytics who pursue the High-Performance Computing (HPC) Emphasis Area will achieve the following educational objectives in addition to the core program objectives, while becoming immersed in HPC concepts:
- Students will have an in-depth understanding of the state-of-the-art technologies that enable big data analytics and high-performance computing, so that they can successfully investigate data and analytical needs and then guide decision-making on deployments to HPC infrastructure.
- Students will acquire knowledge to exploit cloud-based computing infrastructure, including virtualization, distributed architectures, on-demand resource scaling, container technology, and other cloud-based computing concepts in support of Big Data management, processing, and analytics.
- Students will have a thorough understanding of advanced technologies and techniques in Big Data analytics that facilitate the extraction of new data intelligence using state-of-the-art, leading analytical platforms.
- Students will gain a solid understanding of techniques for exploiting advanced co-processing hardware, including graphics processing units (GPUs) and many-core units (e.g., Intel Xeon Phi), to achieve cost-effective, massively parallel data analytics.
High-Performance Computing Faculty
234 Express Scripts Hall, St. Louis campus
Assistant Research Professor; Co-Director for Industry Outreach, Course Coordinator
W3038 Lafferre Hall
High-Performance Computing Course Coordinator
207 Naka Hall
Sample Course Path
Students move through 8-week modules, completing core courses and then progressing through emphasis-area concentration courses directly applicable to their area of study.