Today, professional engineers need to master a specific discipline. Whether that discipline is mechanical, civil, or even industrial engineering, an understanding of computational tools and data analysis is necessary. The Berkeley MEng program provides a unique opportunity to combine mastery of a specific field with computational analytics through an **optional and unofficial Area of Emphasis in Computational Analytics**. Earning the professional master's degree and fulfilling the requirements for the emphasis makes for a challenging year. This guide contains advice on appropriate elective courses and extracurricular activities.

If you would like to indicate an *“Emphasis in Computational Analytics”* on your resume, take at least TWO of the recommended (or other advisor-approved) courses from the list below.

One of these courses may be counted towards other MEng course requirements, but only if it is offered in your degree-granting department and approved as accepted towards your concentration.

We also recommend being involved in at least one activity or workshop (hackathon, speaker series, etc.); see the following list for details.

## Recommended Electives

Refer to the UC Berkeley Course Schedule (course schedules) and Berkeley Bulletin (course descriptions) for further enrollment information. The courses listed here are not guaranteed to be offered or have space available, and the course schedule may change without notice.

A limited number of seats are reserved for MEng students in each of the following courses. Please contact the Fung Institute if you are having any problems enrolling.

**STAT 133: Concepts in Computing with Data**

[recommended foundation course; offered fall/spring/summer]

An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results.

**STAT 201A: Introduction to Probability at an Advanced Level**

Distributions in probability and statistics, central limit theorem, Poisson processes, modes of convergence, transformations involving random variables.

**STAT 201B: Introduction to Statistics at an Advanced Level**

Estimation, confidence intervals, hypothesis testing, linear models, large sample theory, categorical models, decision theory.

**STAT 230A: Linear Models**

Theory of least squares estimation, interval estimation, and tests under the general linear fixed effects model with normally distributed errors. Large sample theory for non-normal linear models. Two and higher way layouts, residual analysis. Effects of departures from the underlying assumptions. Robust alternatives to least squares.

**STAT 232: Experimental Design**

Randomization, blocking, factorial design, confounding, fractional replication, response surface methodology, optimal design. Applications.

**CE 262: Analysis of Transportation Data (Hansen)** Probabilistic models in transportation. The use of field data. Data gathering techniques, sources of errors, considerations of sample size. Experiment design for demand forecasting and transportation operations analysis. Analysis techniques.

**CE 290P (soon to be regularized as CE 263): Scalable Spatial Analytics (Pozdnukhov)** Introduction to modern methods of data analysis, spatial data handling and visualization technologies for engineers and data scientists. Theoretical coverage includes a selection of methods from spatial statistics, exploratory data analysis, spatial data mining, discriminative and generative approaches of machine learning. Projects and assignment tasks are targeted at real-world scalable implementation of systems and services based on data analytics in environmental remote sensing, transportation, energy, location-based services and the domain of “smart cities” in general.

**CE 264: Quantitative Behavioral Modeling (Walker)** Many aspects of engineering, planning, and policy involve a human element, be it consumers, businesses, governments, or other organizations. Effective design and management require understanding this human response. This course focuses on behavioral theories and the use of quantitative methods to analyze human response. A mix of theory and practical tools is covered, with applications drawn from infrastructure investment and use, urban growth and design, health, and sustainability.

EE and CS courses are highly impacted; we can’t guarantee enrollment for students outside the EECS department. Enroll or add yourself to course waiting lists through Cal Central at the earliest enrollment opportunity.

**EE 226A, Random Processes in Systems** Probability, random variables and their convergence, random processes. Filtering of wide sense stationary processes, spectral density, Wiener and Kalman filters. Markov processes and Markov chains. Gaussian, birth and death, Poisson and shot noise processes. Elementary queueing analysis. Detection of signals in Gaussian and shot noise, elementary parameter estimation.

**EE 227AT, Optimization Models in Engineering** This course offers an introduction to optimization models and their applications, ranging from machine learning and statistics to decision-making and control, with emphasis on numerically tractable problems, such as linear or constrained least-squares optimization.

**EE 244, Fundamental Algorithms for Systems Modeling, Analysis, and Optimization** The modeling, analysis, and optimization of complex systems require a range of algorithms and design software. This course reviews the fundamental techniques underlying the design methodology for complex systems, using integrated circuit design as an example. Topics include design flows, discrete and continuous models and algorithms, and strategies for implementing algorithms efficiently and correctly in software. Laboratory assignments and a class project will expose students to the state of the art.

**CS C267, Parallel Computing** Models for parallel programming. Fundamental algorithms for linear algebra, sorting, FFT, etc. Survey of parallel machines and machine structures. Existing parallel programming languages, vectorizing compilers, environments, libraries and toolboxes. Data partitioning techniques. Techniques for synchronization and load balancing. Detailed study and algorithm/program development of medium sized applications.

**CS 286A, Introduction to Database Systems** Access methods and file systems to facilitate data access. Hierarchical, network, relational, and object-oriented data models. Query languages for models. Embedding query languages in programming languages. Database services including protection, integrity control, and alternative views of data. High-level interfaces including application generators, browsers, and report writers. Introduction to transaction processing. Database system implementation to be done as term project.

## Other Possible Electives

The courses listed here are not guaranteed to be offered or have space available, and the course schedule may change without notice.

Refer to the UC Berkeley Course Schedule (course schedules) and Berkeley Bulletin (course descriptions) for further enrollment information.

**STAT 157: Seminar on Topics in Probability and Statistics**

“Reproducible and Collaborative Data Science” – The course will cover philosophy, software tools, processes and best practices for reproducible computational research. The software tools will include git, IPython, SQL and LaTeX. There will be a collaborative term project. Course format: three hours of seminar per week. Prerequisites: Mathematics 53-54, Statistics 134, 135. Knowledge of a scientific computing environment (R or Matlab) is often required. Prerequisites might vary with instructor and topics. Substantial student participation is required. The topics to be covered each semester that the course is offered will be announced by the middle of the preceding semester; see departmental bulletins. Recent topics include: Bayesian statistics, statistics and finance, random matrix theory, high-dimensional statistics.

**STAT 204: Probability for Applications**

A treatment of ideas and techniques most commonly found in the applications of probability: Gaussian and Poisson processes, limit theorems, large deviation principles, information, Markov chains and Markov chain Monte Carlo, martingales, Brownian motion and diffusion.

**STAT 210A: Theoretical Statistics**

An introduction to mathematical statistics, covering both frequentist and Bayesian aspects of modeling, inference, and decision-making. Topics include statistical decision theory; point estimation; minimax and admissibility; Bayesian methods; exponential families; hypothesis testing; confidence intervals; small and large sample theory; and M-estimation.

**STAT 210B: Theoretical Statistics**

Introduction to modern theory of statistics; empirical processes, influence functions, M-estimation, U and V statistics and associated stochastic decompositions; non-parametric function estimation and associated minimax theory; semiparametric models; Monte Carlo methods and bootstrap methods; distribution-free and equivariant procedures; topics in machine learning. Topics covered may vary with instructor.

**STAT 215A: Statistical Models: Theory and Application**

Applied statistics with a focus on critical thinking, reasoning skills, and techniques. Hands-on experience with solving real data problems with high-level programming languages such as R. Emphasis on examining the assumptions behind standard statistical models and methods. Exploratory data analysis (e.g., graphical data summaries, PCAs, clustering analysis). Model formulation, fitting, and validation and testing. Linear regression and generalizations (e.g., GLMs, ridge regression, lasso).

**STAT 215B: Statistical Models: Theory and Application**

Course builds on 215A in developing critical thinking skills and the techniques of advanced applied statistics. Particular topics vary with instructor. Examples of possible topics include planning and design of experiments, ANOVA and random effects models, splines, classification, spatial statistics, categorical data analysis, survival analysis, and multivariate analysis.

**STAT C239A: The Statistics of Causal Inference in the Social Science**

Approaches to causal inference using the potential outcomes framework. Covers observational studies with and without ignorable treatment assignment, randomized experiments with and without noncompliance, instrumental variables, regression discontinuity, sensitivity analysis and randomization inference. Applications are drawn from a variety of fields including political science, economics, sociology, public health and medicine. Also listed as Political Science C236A.

**STAT 240: Nonparametric and Robust Methods**

Standard nonparametric tests and confidence intervals for continuous and categorical data; nonparametric estimation of quantiles; robust estimation of location and scale parameters. Efficiency comparison with the classical procedures.

**STAT C241A (CS281A): Statistical Learning Theory**

Classification, regression, clustering, dimensionality reduction, and density estimation. Mixture models, hierarchical models, factorial models, hidden Markov and state space models, Markov properties, and recursive algorithms for general probabilistic inference. Nonparametric methods including decision trees, kernel methods, neural networks, and wavelets. Ensemble methods. Also listed as Computer Science C281A.

**STAT C241B (CS281B): Advanced Topics in Learning and Decision Making**

Recent topics include: Graphical models and approximate inference algorithms. Markov chain Monte Carlo, mean field and probability propagation methods. Model selection and stochastic realization. Bayesian, information theoretic, and structural risk minimization approaches. Markov decision processes and partially observable Markov decision processes. Reinforcement learning. Also listed as Computer Science C281B.

**STAT 243: Introduction to Statistical Computing**

The structure and use of statistical languages and packages. Use of graphical displays in data analysis. Statistical database management.

**STAT 244: Statistical Computing**

Algorithms in statistical computing: random number generation, generating other distributions, random sampling and permutations. Matrix computations in linear models. Non-linear optimization with applications to statistical procedures. Other topics of current interest, such as issues of efficiency, and use of graphics.

**STAT 248: Analysis of Time Series**

Frequency-based techniques of time series analysis, spectral theory, linear filters, estimation of spectra, estimation of transfer functions, design, system identification, vector-valued stationary processes, model building.

**MATH 221: Advanced Matrix Computations**

Direct solution of linear systems, including large sparse systems: error bounds, iteration methods, least squares approximation, eigenvalues and eigenvectors of matrices, nonlinear equations, and minimization of functions.

**MATH 228A: Numerical Solution of Differential Equations**

Ordinary differential equations: Runge-Kutta and predictor-corrector methods; stability theory, Richardson extrapolation, stiff equations, boundary value problems. Partial differential equations: stability, accuracy and convergence, Von Neumann and CFL conditions, finite difference solutions of hyperbolic and parabolic equations. Finite differences and finite element solution of elliptic equations.

**MATH 228B: Numerical Solution of Differential Equations**

Ordinary differential equations: Runge-Kutta and predictor-corrector methods; stability theory, Richardson extrapolation, stiff equations, boundary value problems. Partial differential equations: stability, accuracy and convergence, Von Neumann and CFL conditions, finite difference solutions of hyperbolic and parabolic equations. Finite differences and finite element solution of elliptic equations.

**IEOR 242: Applications in Data Analytics**

This course applies foundational concepts in programming, databases, machine learning, and statistical modeling to answer questions from business and social science. The goal is for students to develop the experience and intuition to gather and build new datasets and answer substantive questions.

**IEOR 290R: Learning and Optimization (Topics in Risk Theory)**

Seminar on selected topics from financial and technological risk theory, such as risk modeling, attitudes towards risk and utility theory, portfolio management, gambling and speculation, insurance and other risk-sharing arrangements, stochastic models of risk generation and run off, risk reserves, Bayesian forecasting and credibility approximations, influence diagrams, decision trees. Topics will vary from year to year.

**IEOR 166: Decision Analytics**

Introductory course on the theory and applications of decision analysis. Elective course that provides a systematic evaluation of decision-making problems under uncertainty. Emphasis on the formulation, analysis, and use of decision-making techniques in engineering, operations research and systems analysis. Includes formulation of risk problems and probabilistic risk assessments. Graphical methods and computer software using event trees, decision trees, and influence diagrams that focus on model design. (Oren) 3 units

**IEOR 231: Introduction to Data Modeling, Statistics, and Systems**

This course uses industrial engineering and operations research models for analyzing and optimizing real systems where the underlying processes and/or parameters are not fully known, but data may be available, sampled, or artificially generated. Monte Carlo simulations are used to model systems that may be too complex to approximate accurately with deterministic, stationary, or static models, and to measure the robustness of predictions, and manage the risks, in decisions based on data-driven industrial engineering and operations research models. (Schruben) 3 units

**Info 247: Information Visualization and Presentation**

Information visualization is widely used in media, business, and engineering disciplines to help people analyze and understand the information at hand. The industry has grown exponentially over the last few years. As a result, more visualization tools are available, which has in turn lowered the barrier to entry for creating visualizations.

This course provides an overview of the field of Information Visualization. It follows a hands-on approach. Readings and lectures will cover basic visualization principles and tools. Labs will focus on practical introductions to tools and frameworks. We will discuss existing visualizations and critique their effectiveness in conveying information. Finally, guest speakers from the industry will give an insight into how information visualization is used in practice.

All students are expected to participate in class discussion, complete lab assignments, and create an advanced interactive data visualization as a semester project. Priority for attending this class is given to I School students. The semester project involves programming; therefore students are expected to have some coding experience. Interested students from other departments are invited to join the class if they can demonstrate the required skills.

Units: 3, Prerequisites: Info 206, Computer Science 160, or knowledge of programming and data structures with consent of instructor

Units: 2, Prerequisites: Info 213, or equivalent coursework with instructor approval

**Info 290: Data Mining and Analytics in Intelligent Business Services**

The purpose of this course is to train students in:

– Data mining and big data analytics (and a subset of topics in information retrieval, extraction, and machine learning);

– Providing an intelligent business services context (in areas and topics such as digital marketing and computational advertising, financial analytics, service analytics, energy analytics, social media and social networks).

Specifically:

– We hope to provide an overview of issues and trends which will shape the need for and structures of data mining, information extraction, and analytics in business information systems within areas and industries such as online marketing and ads, financial services, energy services, social media and networks, and service centers.

– We will identify and explore key topics, then develop analytic methods for data mining, analytics, and information extraction in these contexts.

– We will have industry speakers and industry projects as well, to provide real-world perspective and real-world engagement.

Units: 3, Prerequisites: Introductory programming

**Info 290T: Working with Open Data**

Open data — data that is free for use, reuse, and redistribution — is an intellectual treasure-trove that has given rise to many unexpected and often fruitful applications. In this course, students will 1) learn how to access, visualize, clean, interpret, and share data, especially open data, using Python, Python-based libraries, and supplementary computational frameworks and 2) understand the theoretical underpinnings of open data and their connections to implementations in the physical and life sciences, government, social sciences, and journalism.

Units: 3, Prerequisites: Info 90 or equivalent background with Python.