# Reading Seminars

## Bartolomeo Stellato

### Princeton University

### Seminar title: Learning for Real-Time Decision Making

Abstract: In this fast-paced world, intelligent systems need to make reliable, real-time decisions in response to changing conditions and unexpected disruptions. Mathematical optimization is instrumental in this process, but it must offer reliable computations and robust solutions, especially in the presence of uncertainty. In this talk, we will explore how we can harness data to develop optimization tools that can handle uncertainty and operate in real-time. We will investigate learning techniques such as clustering and differentiable optimization to design tractable, robust problem formulations. We will also discuss how to use machine learning to design efficient optimization algorithms for continuous and mixed-integer optimization, with a focus on guaranteeing convergence in real-time operations. By customizing problem formulations and solution algorithms to specific tasks, we will show how we can significantly enhance the computational performance and reliability of decision-making pipelines.

## PAST SEMINARS

## Federico Bassetti

### Politecnico di Milano

### Seminar title: Feature learning in Bayesian deep linear networks with multiple outputs and convolutional layers

Abstract: Deep linear networks have been extensively studied in recent literature. However, there is still limited understanding regarding finite-width architectures with multiple outputs and convolutional layers, which are fundamental aspects in deep learning. In this seminar, we start by providing a concise overview of notable findings and common challenges in this area, within a Bayesian framework. We then present few new insights, including:

(i) An exact, straightforward, and non-asymptotic integral representation for the joint prior distribution across outputs.

(ii) An analytical formula for the posterior distribution, specifically addressing the mean square loss function.

(iii) A quantitative analysis of feature learning within the infinite-width regime, using large deviation language.

From a physical perspective, architectures featuring multiple outputs or convolutional layers can be seen as different manifestations of shape kernel renormalization. Our work establishes a reliable connection, translating physical insights and terminology into rigorous mathematical statements.

The talk is based on a joint working paper with P. Rotondo (Dip. di Scienze Matematiche, Fisiche e Informatiche, Univ. Parma), M. Pastore (École normale supérieure, CNRS), and M. Gherardi (Dip. Fisica, Univ. Milano).

## Bernardo Forni

### University of Pavia

### Seminar title: Deep-Learning with Transformers: a Mathematical Perspective

Abstract: Deep Learning has seen remarkable advancements in recent years, with Transformers emerging as a cornerstone technology for a wide range of applications, yet understanding their inner workings remains challenging. In this presentation, we provide a comprehensive examination of the Transformer architecture and its components, such as the attention mechanism, position encoding and layer normalization. Additionally, we present a geometric viewpoint on its semantic representation of words, allowing for an intuitive understanding of Transformers as processes that model the trajectory of words traveling around the surface of a hyper-sphere.

## SARA CAMBIAGHI

### University of Pavia

### Seminar title: Surgery Scheduling: A Two-Stage Robust Optimization Approach

Abstract: In the realm of healthcare management, the operating theater stands out as a notably high-cost facility, often accounting for a substantial portion, up to 40% in some cases, of the overall expenditure within healthcare institutions. Strategic planning regarding the utilization of operating rooms necessitates comprehensive consideration of several critical factors. These factors include determining the requisite number of ORs to be operational at any given time, assigning specific surgeries to particular ORs, and coordinating the scheduling of surgical procedures. Furthermore, the intricacies of surgeon collaboration represent a pivotal aspect of OR operations. Given the multifaceted nature of surgical interventions, the involvement of multiple surgeons is commonplace. Within this framework, it becomes imperative to establish hierarchical roles among surgical team members, comprising a primary surgeon and a secondary surgeon. Therefore, it is important to optimize the scheduling of operating rooms, secondary surgeons, and start time of surgeries, to minimize human resource expenses, OR setup costs, waiting times, and overtime expenses, all while considering the inherent variability in surgical procedure durations and surgeon coordination. In order to solve the problem we introduce a deterministic model to compute the optimal surgery schedule, followed by a robust two-stage mixed-integer model designed to account for the inherent variability in surgery durations.

## GAbor riccardi

### University of Pavia

### Seminar title: Projective Metrics for Coding Theory

Abstract: Coding Theory is the branch of mathematics and computer science that aims to find efficient ways to transmit, store, compress, and encrypt information. This involves sending messages so that, even if a message is received with errors, the original message can be accurately recovered. When a damaged message is received, we search for the "closest" valid message, although the definition of "closeness" varies depending on the transmission system. If errors can change each bit of the message independently and with equal probability, then the minimum number of errors needed to convert one string into another is defined by the Hamming distance between the two strings. This distance is the number of bits that need to be changed to transform one string into the other. However, different transmission methods may introduce other kinds of errors, necessitating different metrics to find the "closest" admissible message. These metrics include, among many others, rank metrics, cover metrics, and phase rotation metrics. In the literature, the theory behind each metric is studied individually, often seeking to demonstrate results analogous to those in Classical Coding Theory. Projective metrics, however, take a different approach by attempting to develop a theory that directly encompasses a class of metrics, including many of the most studied metrics and many others. In this presentation, we provide a brief introduction to Classical Coding Theory and then discuss our work on Projective Metrics Theory, where we have proved a general version of the well-known Singleton Bound and developed a basic theory for projective metrics. This framework also reveals interesting connections to other fields of mathematics such as Graph Theory, Matroid Theory, Projective Geometry, and Subspace Arrangements.

## Enis ChenchenE

### Graz University

### Seminar title: Splitting Methods in Convex Degenerate Optimization with Applications

Abstract: We explore proximal point and forward-backward schemes with positive semidefinite preconditioners, establishing weak convergence results and reduced storage requirements. The proposed framework leads us to new extensions of the Douglas-Rachford Splitting method, and to a new hybrid method that combines the advantages of proximal- and conditional-gradient-type schemes. Applications include optimal transport and regularization parameter learning for total variation denoising.

## Juan Nicolás Ibáñez

### European Commission’s Joint Research Centre (EC-JRC)

### Seminar title: Public and private transport accessibility analyses with a socioeconomic perspective

## DOMENICO SALVAGNIN

### Università degli Studi di Padova

### Seminar title: Computational research 101

Abstract: Computational research is a key ingredient in many fields. However, researchers often lack formal training in experimental design and, in particular, in software engineering. In this talk we will review basic principles and best practices, covering experimental setup, data analysis and writing maintainable code.

## AXEL PARMENTIER

### École des Ponts ParisTech

### Seminar title: Learning with combinatorial optimization layers and applications to dynamic vehicle routing

Abstract: Combinatorial optimization (CO) layers in machine learning (ML) pipelines are a powerful tool to tackle data-driven decision tasks, but they come with two main challenges. First, the solution of a CO problem often behaves as a piecewise constant function of its objective parameters. Given that ML pipelines are typically trained using stochastic gradient descent, the absence of slope information is very detrimental. Second, standard ML losses do not work well in combinatorial settings. A growing body of research addresses these challenges through diverse methods. Unfortunately, the lack of well-maintained implementations slows down the adoption of CO layers. Building upon previous works, we introduce a probabilistic perspective on CO layers, which lends itself naturally to approximate differentiation and the construction of structured losses. We recover many approaches from the literature as special cases, and we also derive new ones. Based on this unifying perspective, we present InferOpt.jl, an open-source Julia package that 1) allows turning any CO oracle with a linear objective into a differentiable layer, and 2) defines adequate losses to train pipelines containing such layers. Our library works with arbitrary optimization algorithms, and it is fully compatible with Julia's ML ecosystem. In the second part of the talk, we focus on the dynamic vehicle routing problem of the 2022 EURO-NeurIPS challenge (1). Using a CO layer in a deep learning pipeline enabled to win the challenge. We focus on the structure of the pipeline used as a policy, and on the algorithm used to train it, which are natural applications of the probabilistic perspective introduced during the first part of the talk.

(1) https://euro-neurips-vrp-2022.challenges.ortec.com/

[Slides]

## LORENZO BONASERA

### Università degli Studi di Pavia

### Seminar title: A genetic approach to Event-Interval Sequence Interpretable Classification

Abstract: Sequences of event-intervals appear in several application domains including sign language, sensor networks, medicine, human motion databases, and linguistics. Such sequences comprise events that occur at time intervals and are time stamped at their start and end time. In this talk, we propose and study a genetic approach to perform supervised classification of event-interval sequences. Our method combines machine learning techniques with evolutionary dynamics to provide interpretable outcomes.

## Martina Fischetti and Juan Nicolás Ibáñez

### European Commission’s Joint Research Centre (EC-JRC)

### Seminar title: Routing over large (public and private) transport networks across Europe: challenges and opportunities

Abstract: A well-functioning transport system is not only critical to European businesses and global supply chains, and as a rather important economic activity, contributing in the European Union (EU) to around 5% of GDP and more than 10 million jobs, but also critical to guarantee that European citizens have access to essential services such as education, healthcare, or employment. This compounds with the fact that the transport system is relatively hard to decarbonise, and a sector that brings many external costs to the society (GHG and pollutant emissions, noise, road crashes and congestion). Against this context, the transport sector plays a key role in one of the main EU policy priorities under the European Green Deal, the achievement of climate neutrality by 2050. The EU mandate includes the monitoring across all Member States of the deployment and the evaluation of the performance of both public and private transport infrastructure. This is used to inform relevant policy initiatives on issues such as guaranteeing a level playing field for competition across transport modes or identifying where funding is most needed to ensure access to an affordable and sustainable transport system. In this talk we will present details of different lines of work in the European Commission’s Joint Research Centre (EC-JRC) to support the abovementioned mandate, providing an overview of some of the transport-related modelling challenges faced and some of possible solution methods proposed. More elaboration will be provided on methodologies developed by the EC-JRC that focus on employing comprehensive (EU-wide) data to compute performance and access-to-opportunities metrics associated with different modes transport, underlining the network (graph building and routing) challenges addressed to compute these metrics with a sufficiently high level of spatial detail for the scale of interest (EU-wide). The discussion will refer to methodologies that exploit routing over large transport networks, for both private (road) and public (railways, buses, etc.) means of transport. On the latter we will present results from an ongoing collaboration with the Department of Mathematics of the University of Pavia on an implementation of an innovative method for fast computation of all-to-all routing over large schedule-based network graphs.

[Slides]

## Yuri Faenza

### Columbia University

### Seminar title: Incremental knapsack problems

### Date: November 29th, 2022 - 15:00, Aula Beltrami

Abstract: In this talk, we propose and study discrete, multi-period extensions of classical packing problems, a fundamental class of models in combinatorial optimization. Those extensions fall under the general name of incremental packing problems. We will mostly focus on incremental versions of the classical knapsack. In such models, we are given an added time component and weakly increasing capacities for each time, as to mimic the increase in available resources. Items can be inserted at any time, but once an item is inserted, it cannot be removed in future times. The goal is to maximize some item-dependent, and possibly also time-dependent, objective function under such constraints. We will present algorithms that perform well in theory and/or in practice for incremental knapsack problems and their extensions.

Based on joint work with Danny Segev (Tel Aviv) and Lingyi Zhang (Columbia-> Uber)

Short Bio: Yuri Faenza is an associate professor in the IEOR department at Columbia University. He works in discrete optimization, operations research, matching theory, and their applications. His research has been funded by the NSF (including an NSF Career award), the ONR, the Swiss NSF, and by a Meta Research Award.

## federico battista

### Università degli Studi di Verona

### Seminar title: Semidefinite Lift-and-Project relaxation for the Stable Set Problem

### Date: November 16th, 2022 -14:30, Aula Beltrami

Abstract: The stable set problem is one among the fundamental problems in combinatorial optimization and it is well-known to be NP-hard, hence no polynomial time algorithm to solve it exactly is expected to exist unless P=NP. Thus, the study of strong relaxations for this problem is a well-researched topic. In particular, the Lovász theta function θ(G) provides a good upper bound on the stability number of any graph G. It can be computed in polynomial time by solving a semidefinite program (SDP), which in addition turns often out to be fairly tractable in practice. As a consequence, θ(G) achieves a hard-to-beat trade-off between computational effort and strength of the bound. Hierarchies of relaxations to strengthen θ(G) towards the stability number have been documented, but in general such improvements come at a heavy computational burden with off-the-shelf SDP algorithms and require highly specialized methods to be addressed.

In the last decades, Lift-and-Project methods have gained a lot of attention, being able to generate strong relaxations for combinatorial optimization problems. In particular, starting from any linear relaxation Lovász and Schrijver’s Lift-and-Project framework generates a semidefinite relaxation. Its application to the fractional stable set polytope showed its potential, producing bounds stronger than θ(G) but in general they come at a nontrivial computational cost.

In this talk we introduce a new semidefinite relaxation for the stable set problem obtained by the lifting of a more compact linear formulation than the classical one. Then, we characterize some classes of valid inequalities for the stable set polytope which are implied by our proposal. We then discuss how to face the computational burden arising from these semidefinite programs by the employment of a general purpose solver for SDPs.

Alternating Direction Methods of Multipliers (ADMMs) represent the most popular alternative to interior-point methods for solving SDPs. As first-order methods, ADMMs are suited to scale to much largers SDPs, at some cost in accuracy that should be correctly addressed when bounding the optimal solution of some combinatorial optimization problem. Here we discuss different methodologies to compute a valid bound on the optimal value of the SDP starting from a medium accuracy solution and we discuss the employments of these methodologies within ADMMs.

## ETTORE LANZARONE

### Università degli Studi di Bergamo

### Seminar title: Scheduling appointments for blood donation: from donations at the collection center to home blood donation

### Date: November 9th, 2022 -11:30, Aula Beltrami

Abstract: Blood plays a crucial role in all health care systems, it is a limited resource and its shelf life is short. In Western countries, blood is collected from donors, and the so-called Blood Donation Supply Chain (BDSC) aims to provide an adequate supply of blood units to transfusion centers and hospitals.

In this talk, I focus on the collection of blood units from donors, which is the first BDSC echelon, and I consider two opposing donation settings: the classical blood donation at the collection center, and a new organizational model in which blood is collected at donor's homes to respond to the emerging need for delocalization of health services. For each of them, I tackle the Blood Donation Appointment Scheduling (BDAS) problem, providing a decision support tool that balances the production of the different blood types between days in order to provide a fairly constant supply of blood units to the rest of the BDSC.

In the case of donations at the collection center, the tool consists of an offline Mixed Integer Linear Programming (MILP) model that preallocates time slots to blood types, and an online prioritization policy that assigns a preallocated slot when the donor calls to make the reservation. In the case of home blood donation, the tool consists of a matheuristic framework with three decision stages, in which the first two stages extend those of the other setting and the third one consists of a Multi-Trip Vehicle Routing Problem with Time Windows to route the bloodmobiles that collect blood at donors' homes.

Both tools have been tested on data coming from a real Italian provider, i.e., the Milan department of the Associazione Volontari Italiani Sangue (AVIS). The numerical experiments have verified the well-performing behavior of both frameworks. In particular, the tests conducted on realistic instances generated based on the AVIS Milan case have confirmed the capability of the tools to balance the production of each blood type over the days and, for home blood donation, to create cost-effective vehicle schedules.

Short Bio: Ettore Lanzarone is a tenure-track assistant professor at the Department of Management, Information and Production Engineering of the University of Bergamo. His research focuses on optimization, and stochastic model applied to bioengineering problems, operations research applications in health care, and stochastic differential equations.

[Slides]

## austin buchanan

### Oklahoma State University

### Seminar title: Imposing contiguity constraints in political districting models

### Date: June 21st, 2022

Abstract: Beginning in the 1960s, techniques from operations research began to be used to generate political districting plans. A classic example is the integer programming model of Hess et al. (Operations Research 13, 998-1006, 1965). Due to the model’s compactness-seeking objective, it tends to generate contiguous or nearly contiguous districts, although none of the model’s constraints explicitly impose contiguity. Consequently, Hess et al. had to manually adjust their solutions to make them contiguous. Since then, there have been several attempts to adjust the Hess model and other models so that contiguity is explicitly ensured. In this talk, we review two existing models for imposing contiguity, propose two new ones, and analytically compare them in terms of their strength and size. We conduct an extensive set of numerical experiments to evaluate their performance. While many believe that contiguity constraints are particularly difficult to deal with, we find that the districting problem considered by Hess et al. does not become harder when contiguity is imposed. In fact, a branch-and-cut implementation of a cut-based model generates, for the first time, optimally compact districting plans for 21 different US states at the census tract level. To encourage future research in this area, and for purposes of transparency, we make our test instances and source code publicly available. This is joint work with Hamidreza Validi and Eugene Lykhovyd.

## Giorgio Corani

### Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)

### Seminar title: A Bayesian approach to the problem of forecast reconciliation

### Date: May 24th, 2022

Abstract: Often time series are organized into a hierarchy. For example, the total visitors of a country can be divided into regions and the visitors of each region can be further divided into sub-regions. Forecasts of hierarchical time series should be coherent; for instance, the sum of the forecasts of the different regions should equal the forecast for the total. The forecasts are incoherent if they do not satisfy such constraints.

Reconciliation methods proceed in two steps. First, base forecasts are computed by fitting an independent model to each time series and ignoring the sum constraints. Then, the base forecasts are adjusted to become coherent; this step is called reconciliation. Besides being coherent, reconciled forecasts are generally more accurate than the base forecasts, as they benefit from information coming from multiple time series.

In this talk, we show how forecast reconciliation can be addressed by taking a Bayesian viewpoint.

## Ambrogio bernardelli

### Università degli Studi di Pavia

### Seminar title: Data-Driven Distributionally Robust Optimization Using the Wasserstein Metric

### Date: April 6th, 2022

Abstract: The seminar aims at presenting a method to solve stochastic problems where the distribution of the uncertain parameters is only observable through a small training dataset. In particular, an ambiguity set will be defined using the Wasserstein metric, and we will seek decisions that perform best in view of the worst-case distribution within the ambiguity set. We will see that this type of problem has nice (convex or even linear) reformulations in many cases of practical interests. The talk will be concluded with the presentation of a numerical test.

## Thiago Serra

### Bucknell University

### Seminar title: What Makes Neural Networks So Expressive, and What Could Make Them Smaller? Some Answers Based on Polyhedral Theory and Mixed-Integer Linear Programming

### Date: March 31st, 2022

Abstract: Neural networks have been successfully applied to complex predictive modeling tasks in areas such as computer vision and natural language processing. On the one hand, they have been shown to be a very powerful mathematical modeling tool: a neural network can model a piecewise-linear function with an exponential number of pieces with respect to its number of artificial neurons. On the other hand, we may still need an unreasonably large neural network in order to obtain a predictive model with good accuracy in many cases. How can we reconcile those two facts?

In this talk, we apply traditional tools from operations research to analyze neural networks that use the most common type of artificial neuron: the Rectified Linear Unit (ReLU). First, we investigate both theoretically and empirically the number of linear regions that networks with such neurons can attain, which reflect the number of pieces of the piecewise linear functions modeled by those networks. With respect to that metric, we unexpectedly find that sometimes a shallow network is more expressive than a deep network having the same number of neurons. Second, we show that we can use optimization models to remove units and layers of a neural network while not changing the output that is produced, which thus implies a lossless compression of the network. We find that such a form of compression can be facilitated by training neural networks with certain types of regularization that induce a stable behavior on its neurons.

This talk is based on papers coauthored with Christian Tjandraatmadja (Google Research), Srikumar Ramalingam (Google Research), Abhinav Kumar (Michigan State University) , and Xin Yu (The University of Utah); and published at ICML 2018 (https://arxiv.org/abs/1711.02114), AAAI 2020 (https://arxiv.org/abs/1810.03370), CPAIOR 2020 (https://arxiv.org/abs/2001.00218), and NeurIPS 2021 (https://arxiv.org/abs/2102.07804).

## Stefano Coniglio

### University of Southampton

### Seminar title: Norm minimization problems in data science: an integer programming perspective

### Date: March 9th, 2022

Abstract: The talk addresses data-science problems involving the minimization of vector norms, which arise in, among others, sparse regression, certain clustering problems involving hyperplanes, and linear classification. Focusing on how those problems can be cast as (mixed integer) nonlinear optimization problems, the talk illustrates different mathematical-programming ways to handle vector norms, as well as geometrical reformulations leading to easier to solve problem whose optimal solutions are within a constant approximation factor of the optimal solution value of the original problem.

## Mathieu Besançon

### Zuse Institute Berlin

### Seminar title: Frank-Wolfe methods for large-scale constrained optimization

### Date: February 2nd, 2021

Abstract: Conditional gradient algorithms allow the integration of convex constraints in a first-order optimization method. We present a new Julia toolbox integrating several variants of the Conditional Gradient method and detail the design choices that enable large-scale and atypical applications. In particular, we will present the linear minimization oracle interface making the library extensible and allowing users to leverage closed-form solutions of linear minimization subproblems when they are known.

## Simone milanesi

### Università degli Studi di Pavia

### Seminar title: Early detection of incipient VOC’s penetration via funnel plots of regional reproduction numbers

### Date: January 26th, 2021

Abstract: Tools to early detect the emergence of a new variant of concern are essential to develop strategies that contain epidemic outbreaks and their health-economic-social consequences. In this talk, we ‘funnel plots’ as a statistical process control method that can quickly identify regions of a country where the reproduction number is anomalous with respect to the national one, while keeping false alarms under control. The method is validated on public COVID-19 data demonstrating its efficacy in the early detection of SARS-CoV-2 variants in India, South Africa, England, and Italy.

## lorenzo bonasera

### Università degli Studi di Pavia

### Seminar title: Decision Trees under a modern optimization lens

### Date: December 6th, 2021

Abstract: Decision tree is a commonly used supervised method in Machine Learning and Data Mining for regression and classification tasks. Decision tree classifier involves segmenting the predictor space into a number of simple regions, each one corresponding to a qualitative response, taking the form of a binary tree. State-of-the-art decision tree learning methods apply heuristics recursively to create each split in isolation, which may not capture well the underlying characteristics of the dataset. The Optimal Decision Tree problem attempts to resolve this by creating the entire decision tree at once to achieve global optimality. Through mixed-integer optimization and huge improvements in both hardware and software resources, we present a novel formulation of the optimal decision tree learning problem.

## Davide duma

### Università degli Studi di Pavia

### Seminar title: On the computation of the exact chi-square index by Integer Programming

### Date: November 24th, 2021

Abstract: Consider a set of observations consisting of measures on two variables. A statistical test of independence of the two variables is the maximum Pearson’s Chi-square index, defined as a Quadratic Transportation Problem (QTP). The QTP is derived from the Linear Transportation Problem: they are both defined on the transportation polytope, but the QTP has an objective function quadratic and convex in the flow variables. Since the solution is one of the many extreme points of the transportation polytope, it is hard to certify the optimality. We introduce a combinatorial relaxation of QTP, and we propose a decomposition method to compute upper bounds, in which the QTP is reduced to 0-1 knapsack problems.