Data Science & Analytics (DSA)

MU wins $12M contract to provide data science training to the National Geospatial-Intelligence Agency (NGA) and other DoD/IC partners.

For details, see: MU News Release

Defense and Intelligence (D&I) Program

Historical data archives combined with an ever-increasing volume of government (both classified and unclassified), commercial, and open-source information have created an unprecedented deluge of data, ripe for information mining, exploitation, and analysis. The D&I DSA program will help train participants to apply modern, data-driven analytical methods to problem solving in the daily execution of their mission. The D&I DSA training can also greatly assist government agencies in supporting a cultural shift towards a future workforce that is highly robust and flexible, with the data science skills needed to turn present-day data challenges into information, insights, and ultimately intelligence.

The D&I data science program is a comprehensive and linked program of study comprising eighteen for-credit, graduate-level courses. Participants who complete a 16 credit hour course sequence can earn a Graduate Certificate in Data Science from the University of Missouri, with specialization in one of four concentration areas: 1) programming, 2) statistics, 3) visualization, and 4) geospatial. In addition, participants who complete the full 34 credit hour program can earn a Master of Science degree in Data Science & Analytics from the University of Missouri. The specialized DSA program was designed specifically for working defense and intelligence professionals and, unlike the regular online DSA program, provides an extensive amount of on-site instruction at government and/or defense contractor facilities.

D&I Course Offerings

Introduction to Data Science

Synopsis: This course is an introduction to the D&I Data Science program, the concentration areas, and the role of each concentration area in data science. Participants will receive an introduction to software, tools, and resources to be utilized throughout the program. Participants will learn of systematic methodologies for data science projects and the data science pipeline through review of case studies.

Course Design: This is a 2 credit hour / 3 day course. All three course days are on-site and instructor-led to facilitate discussions and build familiarity between instructors and participants.

 

Data Science for Managers

Synopsis: This course is an introduction to the D&I Data Science program, the concentration areas, and the role of each concentration in data science for managers and supervisors. Participants will learn of systematic methodologies for data science projects and the data science pipeline through review of case studies.

These managers and supervisors will experience and learn the value of data science for informed decision making.

Course Design: This is a 2 credit hour / 3 day course. All three course days are on-site and instructor-led to facilitate discussions between instructors and participants.

 

Python Programming Boot Camp

Synopsis: This course teaches participants how to program in Python, including use of auxiliary libraries in various Python ecosystems. Participants are introduced to the iPython notebooks from the SciPy ecosystem, as well as Python’s use across the spectrum of data science courses and topics. Many activities are focused on data ingestion, cleaning, manipulation, and restructuring (e.g., ETL).

Course Design: This course is a 1 credit hour / 5 day course. This course is delivered in an asynchronous online mode. The instructor virtually kicks off the course on day one, then four additional days over a two-week period are used for self-paced, online activities using the JupyterHub learning environment.

 

Database Basics and SQL Boot Camp

Synopsis: This course introduces participants to the basics of database management systems and structured query language (SQL). It covers the design and development of databases, data loading, access, manipulation, and exportation. In particular, SQL is covered in detail, with brief introductions to using SQL in Python and R.

Course Design: This course is a 1 credit hour / 5 day course. This course is delivered in an asynchronous online mode. The instructor virtually kicks off the course on day one, then four additional days over a two-week period are used for self-paced, online activities leveraging SQLite and PostgreSQL within the JupyterHub learning environment.

 

Introductory Statistics for Data Analytics

Synopsis: This course is an introductory probability and statistics course, providing baseline statistical vocabulary and an understanding of how to incorporate probability and statistics into decision making. Participants are immersed in key statistical concepts including experimental design and data collection, basic statistics, distributions, the central limit theorem (CLT), etc. Participants also develop an understanding of foundational probability theory, including conditional probability, Bayesian techniques, predictive modeling, stochastic processes, etc. Learning activities continually integrate basic data and statistical visualization techniques, and participants are introduced to basic statistical modeling.

Course Design: This is a 2 credit hour / 4 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one, followed by three additional days over a two-week period that are used for self-paced, online activities. Course learning activities are self-paced and online, leveraging IRKernel within JupyterHub (optionally, RStudio on site).

 

R Statistical Programming Boot Camp

Synopsis: This course teaches participants how to program, perform basic statistical modeling, and create basic visualizations using R and RStudio. Various key libraries and ecosystems are introduced to teach participants how R is integrated across the entire data science curriculum and lifecycle.

Course Design: This course is a 1 credit hour / 5 day course. This course is delivered in an asynchronous online mode. The off-site instructor virtually kicks off the course on day one, then four additional days over a two-week period are used for self-paced, online activities leveraging IRKernel within JupyterHub (optionally, RStudio on site).

 

Statistical and Mathematical Foundations of Data Analytics

Synopsis: This course explores the use of inferential and predictive statistics for data modeling and analytics. Univariate and multivariate statistical concepts are discussed, along with intermediate-level statistical modeling. Participants learn to evaluate model effectiveness and conduct results-driven model selection. Statistical and modeling techniques focus on high-dimensional data analytics. Topics related to dimensionality reduction are also covered, such as principal component analysis and factor analysis.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final on-site instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are self-paced and online, leveraging IRKernel within JupyterHub (optionally, RStudio on site).

 

Spatial and Geostatistical Analysis

Synopsis: This course will provide a practical overview of key issues encountered when working with and analyzing spatial data as well as an overview of major spatial analysis approaches. Discussions and laboratory work will focus on implementation, analysis, and interpretive issues given constraining factors that commonly arise in practice.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final on-site instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are self-paced and online, leveraging IRKernel within JupyterHub (optionally, RStudio on site). Additional learning activities will be conducted using ArcGIS Desktop software.

 

Applied Machine Learning

Synopsis: This course leverages the foundations in statistics and modeling to teach applied concepts in machine learning. Participants will learn various classes of machine learning and modeling techniques, and gain an in-depth understanding of how to select appropriate techniques for various data science tasks.

Topics cover a spectrum from simple Bayesian modeling to more advanced algorithms such as support vector machines, decision trees/forests, and neural networks.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final on-site instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are instructor-guided during the two-week period, leveraging iPython/SciPy and IRKernel within JupyterHub (optionally, RStudio on site).

 

Database and Analytics

Synopsis: This course covers core concepts for heterogeneous data management, including relational databases, NoSQL databases, and other data storage systems. Advanced database topics are covered related to relational modeling, normalization, and optimization. The data management lifecycle for various data types and storage systems is investigated by participants, allowing them to learn to balance the data characteristics and the analytical needs when constructing and exploiting database solutions. Additionally, predictive modeling and machine learning topics are linked into this course to provide thematic linkages to data science.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final on-site instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are self-paced and online, using iPython within JupyterHub to interact with SQLite, PostgreSQL/GIS, and other online data storage solutions.

 

Information Retrieval and Data Mining

Synopsis: This course builds upon previous database concepts to focus on interrogation of vast structured and unstructured data stores. Various software ecosystems are explored for data mining and information extraction from heterogeneous data sources. Learning activities focus on scripting interactions with data stores for data acquisition, manipulation, and refinement, as well as retrospective data modeling. Additional learning activities focus on the modeling, management, and exploitation of streaming data. Various Big Data software ecosystems are explored.

Course Design: This is a 3 credit hour / 5 day course. All five course days are on-site and instructor-led. Course learning activities are instructor-guided during a one-week period and use iPython within JupyterHub to interact with online data storage solutions such as current Big Data software ecosystems.

 

Geospatial Data Management

Synopsis: This course provides an overview of the various geospatial data formats for both vector and raster data. Data storage paradigms, including enterprise geospatial databases and desktop GIS systems, are investigated. Storage, management, exploitation, and multi-data set entity resolution / correlation are covered.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final on-site instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are self-paced and online, using iPython within JupyterHub to interact with SQLite, PostgreSQL/GIS, and other online data storage solutions.

 

Remote Sensing Data Analytics

Synopsis: This course introduces the principles of remote sensing of the environment, leading to information extraction from remote sensing geospatial raster data sets. Topics include digital imagery from spacecraft, conventional and high-altitude aerial photography, thermal imaging, and microwave remote sensing. The course covers standard processing techniques, including preprocessing and normalization, pixel-level feature extraction, information extraction, and structural/object extraction.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are self-paced and online, using iPython within JupyterHub to interact with SQLite, PostgreSQL/GIS, and other online data storage solutions.

 

Cloud Computing for Data Analytics

Synopsis: This course introduces participants to cluster and cloud computing big data ecosystems. Topics include a survey of cloud computing platforms, architectures, and use cases. Participants will examine scaling data science techniques and algorithms using a variety of cluster and cloud paradigms, such as those built atop Hadoop (Map-Reduce) concepts, and others.

Course Design: This is a 3 credit hour / 5 day course. All five course days are on-site and instructor-led. Course learning activities are instructor-guided during a one-week period. Participants conduct learning activities using various cloud and cluster technologies and Big Data software ecosystems.

 

Data Visualization

Synopsis: This course provides a thorough introduction to graphical design and data visualization.

Participants will understand the cognitive ergonomics of human visual information processing, and the underlying design concepts to maximize interpretability; as well as develop exploratory data visualization skills. Participants also develop further understanding of data visualization concepts introduced in previous courses in R and Python. Learning activities focus on a breadth of visualization modalities, interfaces, and technologies.

Course Design: This is a 3 credit hour / 5 day course. This course is delivered in a blended on-site/instructor-led and asynchronous online mode. The on-site instructor kicks off the course on day one; followed by three additional days over a two-week period which are used for self-paced, online activities; and then a final on-site instructor day to recap material, lead contextual discussions, and finalize the course. Course learning activities are self-paced and online, utilizing iPython and IRKernel within JupyterHub. Participants will complete additional activities using alternative data visualization APIs.

 

Advanced Visualization & Communication I

Synopsis: This course continues the development of participants' data visualization and digital storytelling skills. Discussion topics address alignment of visualization, level-of-detail, and technology to the intended audience. Alternative visualization media and delivery methods are explored, as well as integration of visualizations into complex analytical reports. Infographics, images, and map creation are incorporated into the learning activities.

Course Design: This is a 3 credit hour / 5 day course. All five course days are on-site and instructor-led. Course learning activities are instructor-guided, utilizing iPython and IRKernel within JupyterHub. Participants will complete additional activities using alternative data visualization APIs, media, and delivery methods.

 

Advanced Visualization & Communication II

Synopsis: This course completes the visualization concentration sequence, building on the topics of the previous two courses. The data visualizations and communication concepts in this course tackle the complex domains of animations, interactive visualizations and models, 3D modeling, and the challenges of 3D rendering within 2D display technologies.

Course Design: This is a 3 credit hour / 5 day course. All five course days are on-site and instructor-led. Course learning activities are instructor-guided, using advanced data visualization tools and techniques.

 

Data Science Capstone

Synopsis: This course provides an opportunity for participants to tackle a real-world/mission data science project, delivered as a problem-based exercise. Participants will apply the full data science lifecycle methodology to a relevant challenge problem as a final learning activity that draws upon all the foundational data science concepts and technologies, as well as specialized technologies and concepts relevant to a particular concentration (visualization, statistics, or programming).

Course Design: This is a 3 credit hour / 5 day course. All five course days are on-site and instructor-guided.

 

For more information contact:

  • Dr. Grant Scott, Director, Data Science and Analytics Masters Program
  • GrantScott@missouri.edu