Course work is hands-on, presenting students with increasingly complex data curation tasks as they learn the concepts relevant to each course.
Additionally, students are continually performing exploratory data analysis and preliminary statistical modeling. Statistical modeling and machine learning are thematic throughout the program.
Finally, the program continually emphasizes the goal of the data science lifecycle, namely achieving business intelligence for the stakeholders (the end consumers of the analytics). Storytelling about the data and the analytical processes is also a recurring theme, as part of our effort to continually develop and refine the students’ soft skills.
An introductory course in data science and analytics. The objective of the course is to give students a broad overview of the various aspects of data analytics, such as accessing, cleansing, modeling, visualizing, and interpreting data. Students will engage in hands-on learning of data analytics topics, using technologies such as Python, R, and open-source analytic tools. This is a 3-credit hour course.
An intermediate statistics class designed to build the mathematical foundation for students dealing with Big Data phenomena. Topics include discussions of probability, data sampling, data summarization, sampling distributions, statistical inference, statistical pattern analysis, hypothesis testing, regression, and nonparametric inference over multidimensional data collections. Students will engage in Big Data projects using various publicly available data sets and leveraging modern Data Science tools, techniques, and cyberinfrastructure. This is a 3-credit hour course.
Synopsis: This course leverages the foundations in statistics and modeling to teach applied concepts in machine learning. Participants will learn various classes of machine learning and modeling techniques, and gain an in-depth understanding of how to select appropriate techniques for various data science tasks. Topics cover a spectrum from simple Bayesian modeling to more advanced algorithms such as support vector machines, decision trees/forests, and neural networks. Students learn to incorporate machine learning workflows into data-intensive analytical processes.
This course covers core concepts for heterogeneous data management, including relational databases, NoSQL databases, and other data storage systems. Advanced database topics are covered, including relational modeling, normalization, and optimization. Participants investigate the data management lifecycle for various data types and storage systems, learning to balance data characteristics against analytical needs when constructing and exploiting database solutions. Additionally, predictive modeling and machine learning topics are linked into this course to provide thematic linkages to data science. This is a 3-credit hour course.
This course provides an overview of state-of-the-art topics in Big Data Security, looking at data collection (smartphones, sensors, the Web), data storage and processing (scalable relational databases, Hadoop, Spark, etc.), extracting structured data from unstructured data, and systems issues (exploiting multicore, security). Securing sensitive, personal, and behavioral data while respecting privacy will be a focal point of the course. This is a 3-credit hour class.
Covers the fundamental concepts and technologies of current data visualization. Unlike many data visualization courses, this one focuses on principles of visualization design and the grammar of graphics. These principles are then implemented in popular contemporary visualization technologies. Students will develop an advanced knowledge of the appropriate selection, modeling, and evaluation of data visualizations. This is a 3-credit hour course.
Introduces the ethics related to Big Data in industry, business, academia, and research settings. Students will learn the social, ethical, legal and policy issues that underpin the big data phenomenon. Discussions and case studies will help guard against the repetition of known mistakes and inadequate preparation. The course content will follow the guidelines to be developed by the Council for Big Data, Ethics, and Society. This is a 1-credit hour course.
The case study and capstone courses allow students to specialize in one or a couple of particular domains. Interdisciplinary faculty from other MU colleges and schools help lead domain-specific learning through case study and capstone mentoring.
Using a case-study approach, students will engage in discussions on a variety of Big Data topics relevant to their emphasis area. This course will help students generate ideas and prepare them for the Big Data Capstone. Course work will be performed in small teams, mentored by faculty and/or industry advisors. Teams will research, cultivate, curate, and leverage large data sets. Students will gain hands-on experience applying relevant data science and analytical technology and techniques to gain insight and information from these real-world data sets. This is a 3-credit hour course.
This course provides an opportunity for participants to tackle a real-world data science project, delivered as a problem-based exercise. Participants will apply the full data science lifecycle methodology to a relevant challenge problem as a final learning activity that draws upon all the foundational data science concepts and technologies, as well as specialized technologies and concepts relevant to a particular concentration area. This is a 3-credit hour course.
Emphasis courses represent the final stage in the refinement of learning with domain-specific data and challenges. Interdisciplinary faculty from other MU colleges and schools help lead domain-specific learning through emphasis area courses.
The course provides a complete overview of the domain knowledge, technical skills, and applications of genomics commonly used in industry. Students will review biological and technological foundations for genomics, and explore BLAST, HMM, and other biological database search methods. Biological programming tools will be introduced to facilitate assembly and annotation in genomics. This prepares students to use the GATK library for variant analysis and to explore comparative genomics, expression analysis, and methyl-seq analysis in the latter part of the course.
This course provides hands-on experience using several digital platforms such as Facebook Insights, Google AdWords, Google Analytics, Adobe Analytics, Clarabridge, and Topsy. In this course you’ll learn digital advertising terminology and jargon, the importance of digital analytics, the role of analysts, qualities of effective analysts, the digital optimization process, web metrics and key performance indicators, as well as the essentials of collaboration and of generating support and buy-in while gaining your executive’s attention.
This course is intended to review theoretical, conceptual, and analytic issues associated with network perspectives on communicating and organizing. The course applies the science of social network analysis and computational linguistics as computing strategies for making sense of electronic communications for, by and between people. The curriculum builds across a wide array of disciplines in order to take an in-depth look at theories, methods, and tools to examine the structure and dynamics of networks.
In Process of Development
An intermediate data wrangling and analysis class designed to provide students with an in-depth overview of collecting and analyzing Twitter data. Computational topics include composing, sending, and receiving Hypertext Transfer Protocol (HTTP) messages. Data wrangling topics include parsing JSON files, navigating recursively nested structures, and processing textual data. Analysis methods include machine learning, network analysis, topic modeling, and time series analysis, among others.
This course will provide a practical overview of key issues encountered when working with and analyzing spatial data as well as an overview of major spatial analysis approaches. Discussions and laboratory work will focus on implementation, analysis, and interpretive issues given constraining factors that commonly arise in practice.
This course provides an overview of theoretical and practical issues encountered when working with geospatial data for both the vector and raster data models with a focus on incorporating geospatial data into the data science lifecycle. Data access, indexing, retrieval, and other technical concepts are investigated. Important data storage paradigms such as enterprise geospatial databases and desktop GIS systems are explored along with scalable computational tools beyond desktop computing for Geospatial Big Data. Core issues in geospatial data storage, management, exploitation, and multi-data set entity resolution / correlation are examined.
This course provides an introduction to the principles of remote sensing of the environment leading to information extraction from remote sensing geospatial raster data sets. Examines theoretical and practical issues associated with digital imagery from spacecraft, conventional and high-altitude aerial photography, thermal imaging, and microwave remote sensing. Covers standard processing techniques, including preprocessing and normalization, pixel-level feature extraction, information extraction, and structural/object extraction.
This course builds upon previous database concepts to focus on interrogation of vast structured and unstructured data stores. Various Big Data software ecosystems are explored for data mining and information extraction from heterogeneous data sources. Learning activities focus on scripting interactions with data stores for data acquisition, manipulation, and refinement, and on retrospective data modeling. Additional learning activities focus on the modeling, management, and exploitation of streaming data. This is a 3-credit hour course.
This course introduces students to cluster and cloud computing ecosystems for big data. Topics include a survey of cloud computing platforms, architectures, and use cases. Students will examine scaling data science techniques and algorithms using a variety of cluster and cloud paradigms, such as those built atop Spark (MapReduce) concepts, AWS, GCP, and others.
This course will provide in-depth treatment of the evolution of high-performance, parallel computing architectures and how these architectures and computational ecosystems support data science. We will cover topics such as parallel algorithms for numerical processing, parallel data search, and other parallel computing algorithms that facilitate advanced analytics. To reinforce lecture topics, learning activities will be completed using parallel computing techniques for modern multicore and multi-node systems. Parallel algorithms will be investigated, selected, and then developed for various scientific data analytics problems. Programming projects will be completed using Python and R, leveraging parallel and distributed computing infrastructure such as AWS Elastic MapReduce and Google BigQuery, among other parallel computing architectures. Students will research emerging parallel and scalable architectures for data analytics.
Covers the fundamental concepts and technologies of current data visualization, adding in Infographic and Interactive Visualization Design. Unlike many data visualization courses, this one focuses on principles of visualization design and the grammar of graphics as they can be applied to combining art and technology to tell data stories. These principles are then implemented in popular contemporary visualization technologies. Students will develop an advanced knowledge of the appropriate selection, modeling, and evaluation of data visualizations.
Focuses on animated visualization design that builds on Infographic and Interactive Visualization Design techniques. Unlike many data visualization courses, this one focuses on building animations and highly interactive representations of data. These principles are then implemented in popular contemporary visualization technologies. Students will develop an advanced knowledge of the appropriate selection, modeling, and evaluation of data visualizations.
Students move through 8-week modules, completing core courses and then progressing through emphasis area concentration courses directly applicable to their area of study.