CG Week 2020 is online because of

COVID-19
Robust Boolean operations on polygons

.
We detail an implementation of regularized Boolean operations
(union, intersection, difference) on two polygonal regions. It is
based on an optimal red/blue segment intersection algorithm that
requires only double the input precision. We carefully define
predicates so that we do not need to define shared endpoints or
overlapping segments as degeneracies requiring special handling but
can fold them into the general case. We do not address the geometric
rounding of intersection points (which may require more than double
precision) back to input precision since such rounding must be
application-specific.

Voronoi-based similarity distances between arbitrary crystal lattices

.
This document develops new distance metrics based on Voronoi cell of
a crystal lattice to compare crystal structures. Crystal structure
prediction methods generate many simulated crystals used for
materials discovery or drug design of which many of them may be
similar to each other. Therefore, we may need to filter datasets by
removing similar structures that can be time consuming for different
computations. The proposed distances are invariant under rigid motions
and can be used for clustering and visualizing datasets of crystals.

Hardness of Approximation for Some Covering Problems

In Geometric Set Cover the goal is to find a minimum sized cover for
a given set of points in the plane by a family of given objects. This
is a fundamental problem which has been studied for over 30 years. It
has long been known to be NP-hard and was shown to be APX-hard by Chan
and Grant in 2014. We study this and related problems here.

First, we consider the problem of Boxes Class Cover: points are given in two colors, red and blue, and the goal is to find a minimum number of axis-aligned rectangles that cover all the blue points but no reds. This problem is introduced for the first time in a paper in 2012 by Bereg et al., who showed the problem is NP-hard. No hardness of approximation result has been shown for this problem. Here for the first time we show that the problem is APX-hard.

We also study Red-Blue Geometric Set Cover, where the points are given in two colors, red and blue, and the goal is to find a family of given axis-aligned rectangles that cover all the blue points with the minimum number of reds. Chan in 2014 showed that the version of this problem where objects are axis-aligned rectangles is also NP-hard. Here for the first time we show that the problem is APX-hard. Moreover, we show that Red-Blue Geometric Set Cover is NP-hard to approximate within o(log n) factor of optimum, where the covering objects are convex objects and n is the number of blue points.

First, we consider the problem of Boxes Class Cover: points are given in two colors, red and blue, and the goal is to find a minimum number of axis-aligned rectangles that cover all the blue points but no reds. This problem is introduced for the first time in a paper in 2012 by Bereg et al., who showed the problem is NP-hard. No hardness of approximation result has been shown for this problem. Here for the first time we show that the problem is APX-hard.

We also study Red-Blue Geometric Set Cover, where the points are given in two colors, red and blue, and the goal is to find a family of given axis-aligned rectangles that cover all the blue points with the minimum number of reds. Chan in 2014 showed that the version of this problem where objects are axis-aligned rectangles is also NP-hard. Here for the first time we show that the problem is APX-hard. Moreover, we show that Red-Blue Geometric Set Cover is NP-hard to approximate within o(log n) factor of optimum, where the covering objects are convex objects and n is the number of blue points.

Simplicial Oversampling: Accounting for the Higher-order Relations in Data Improves the Solution of the Imbalanced Learning Problem

SMOTE is the established geometric approach to random oversampling to
balance classes in the imbalanced classes learning problem, followed
by many extensions. Its main idea is to introduce synthetic data
points in the minor class with each new point being the convex
combination of an existing data point and one of its k-nearest
neighbors. This could be viewed as a sampling from the edges of a
geometric neighborhood graph.

We propose a generalization of the sampling approach, thus sampling from the maximal simplices (with respect to an ambient space) of the clique complex of a neighborhood graph. That is, a position of a new point is defined by the barycentric coordinates with respect to a simplex spanned by an arbitrary number of data points being sufficiently close. We evaluate the generalized technique and conclude that it outperforms the original SMOTE.

We propose a generalization of the sampling approach, thus sampling from the maximal simplices (with respect to an ambient space) of the clique complex of a neighborhood graph. That is, a position of a new point is defined by the barycentric coordinates with respect to a simplex spanned by an arbitrary number of data points being sufficiently close. We evaluate the generalized technique and conclude that it outperforms the original SMOTE.

Duality in Persistent Homology of Images

We derive the relationship between the persistent homology barcodes
of two dual filtered CW complexes. Applied to greyscale digital
images, we obtain an algorithm to convert barcodes between the two
different (dual) topological models of pixel connectivity.

An Enumeration and Isotopy Classification of Oriented Textile Structures

We have developed a first method for encoding isotopy classes of
links embedded in a thickened torus by 1-dimensional strings of
symbols.
We have designed a linear time algorithm to determine which codes
give rise to realizable textiles in the plane. The new encoding and
realizability algorithm have allowed us to enumerate for the first
time all oriented single component textiles up to three crossings.

Decomposition and Partition Algorithms for Tissue Dissection

We consider the decomposition of simple polygons and the partition
of graphs under specific constraints. Our research is motivated by an
application in the field of protein diagnostics, that is processing
tissue samples for laser capture mircodissection. For the dissection
to be successful the region of interest has to satisfy certain
constraints in size and morphology. We consider different algorithmic
approaches for the segmentation of these regions.

Minimal Delaunay triangulations of hyperbolic surfaces

Motivated by recent work on Delaunay triangulations of hyperbolic
surfaces, we consider the minimal number of vertices of such
triangulations. In particular, we will show that every hyperbolic
surface has a Delaunay triangulation where edges are given by
distance paths and where the number of vertices grows linearly as
function of the genus g. We will show that the order of this bound
is attained for some families of surfaces. Finally, we will give an
example showing that the
Ω(√g)
lower bound in the more general case of topological surfaces is
tight for hyperbolic surfaces as well.

Quantifying barley morphology using Euler characteristic curves

Shape is foundational to biology. Observing and documenting shape
has fueled biological understanding, and from this perspective, it
is also a type of data.

The vision of topological data analysis, that data is shape and shape is data, will be relevant as biology transitions into a data-driven era where meaningful interpretation of large data sets is a limiting factor. We focus on quantifying the morphology of barley spikes and seeds using topological descriptors based on the Euler characteristic and relate the output back to genetic information.

The vision of topological data analysis, that data is shape and shape is data, will be relevant as biology transitions into a data-driven era where meaningful interpretation of large data sets is a limiting factor. We focus on quantifying the morphology of barley spikes and seeds using topological descriptors based on the Euler characteristic and relate the output back to genetic information.

On the Power of Concatenation Arguments

We present several concatenation arguments, and show their
applications to setting bounds on the growth constants of
polyominoes and polycubes, and to families (e.g., trees) of
polyominoes and polycubes whose enumerating sequences are pseudo
sub- or super-multiplicative.

On The Number of Compositions of Two Polycubes

We provide tight bounds on the minimum and maximum possible numbers
of compositions of two polycubes, either when each is of size n, or
when their total size is 2n, in two and higher dimensions.

Practical volume estimation of polytopes by billiard trajectories and a new annealing schedule

We study the problem of estimating the volume of convex
polytopes. Our algorithm is based on Multiphase Monte Carlo
(MMC) methods and our main contributions include:

- a new uniform sampler employing Billiard Walk (BW) for the first time in volume computation,
- a new simulated annealing schedule, generalizing existing MMC by making use of adaptive convex bodies which fit to the input, thus drastically reducing the required number of iterations.

Samples Using Geometric Counting Lower Bounds

Using Szemerédi–Trotter theorem for bounding the number of
incidences between n points and m lines, we compute the sample size
required for achieving that bound. We prove a similar result for the
size of the dominating set of a graph and use it to find a sample
for k-center.

Recovering the homology of immersed manifolds

Given a sample of an abstract manifold immersed in some Euclidean
space, we describe a way to recover the singular homology of the
original manifold. It consists in estimating its tangent
bundle—seen as subset of another Euclidean space—in a measure
theoretic point of view, and in applying measure-based filtrations
for persistent homology.

The construction we propose is consistent and stable, and does not involve the knowledge of the dimension of the manifold. In order to obtain quantitative results, we introduce the normal reach, which is a notion of reach suitable for an immersed manifold.

The construction we propose is consistent and stable, and does not involve the knowledge of the dimension of the manifold. In order to obtain quantitative results, we introduce the normal reach, which is a notion of reach suitable for an immersed manifold.

Understanding the Topology and the Geometry of the Persistence Diagram Space via Optimal Partial Transport

Despite the obvious similarities between the metrics used in
topological data analysis and those of optimal transport, an
explicit optimal transport based formalism to study persistence
diagrams and similar topological descriptors has yet to come. By
considering the space of persistence diagrams as a measure space,
and by observing that its metrics can be expressed as optimal
partial transport problems, we introduce a generalization of
persistence diagrams, namely Radon measures supported on the upper
half plane. Such measures naturally appear in topological data
analysis when considering continuous representations of persistence
diagrams (e.g. persistence surfaces) but also as limits for laws of
large numbers on persistence diagrams or as expectations of
probability distributions on the persistence diagrams space. This
formalism allows us to prove topological properties of this new
space, which will also hold for the closed subspace of persistence
diagrams.

This bridge between optimal transport and persistence diagram metrics allows us to address various problems that naturally arise when dealing with persistence diagram, both in theory and in applications.

First, we provide a characterization of convergence in the space of persistence diagram (with respect to the diagram metrics) in terms of vague convergence of measures. This result provides a powerful tool to study convergence and continuity properties in the space of persistence diagrams; in particular it allows us to give an exhaustive characterization of continuous linear representations of persistence diagrams, a common tool used when incorporating persistence diagrams in machine learning pipeline.

Second, this formalism allows us to prove new results regarding persistence diagrams in a random setting, as it enables to manipulate some continuous such as expected persistence diagrams (that are not standard persistence diagrams) and to prove convergence rates and stability results in this context.

Finally, an optimal transport-based formalism has huge strengths in applications. Indeed, modern tools routinely used in computational optimal transport can be straightforwardly adapted to persistence diagrams metrics, allowing to estimate efficiently distances, barycenters, or quantizations of persistence diagrams to name a few.

This presentation is based on the preprint [Divol & Lacombe, 2019].

This bridge between optimal transport and persistence diagram metrics allows us to address various problems that naturally arise when dealing with persistence diagram, both in theory and in applications.

First, we provide a characterization of convergence in the space of persistence diagram (with respect to the diagram metrics) in terms of vague convergence of measures. This result provides a powerful tool to study convergence and continuity properties in the space of persistence diagrams; in particular it allows us to give an exhaustive characterization of continuous linear representations of persistence diagrams, a common tool used when incorporating persistence diagrams in machine learning pipeline.

Second, this formalism allows us to prove new results regarding persistence diagrams in a random setting, as it enables to manipulate some continuous such as expected persistence diagrams (that are not standard persistence diagrams) and to prove convergence rates and stability results in this context.

Finally, an optimal transport-based formalism has huge strengths in applications. Indeed, modern tools routinely used in computational optimal transport can be straightforwardly adapted to persistence diagrams metrics, allowing to estimate efficiently distances, barycenters, or quantizations of persistence diagrams to name a few.

This presentation is based on the preprint [Divol & Lacombe, 2019].

Optimizing Embeddings using Persistence

This work looks to optimize Takens-type embeddings of time series
using persistent (co)homology. The motivation is to assume that an
input time series exhibits periodic, quasi-periodic, or recurrent
behaviour, then use continuous optimization to find good
embeddings. Here we provide a practical approach to finding good
embeddings with indications as to possible theoretical questions and
directions which arise.

Finding Surfaces in 2-Dimnesional Simplicial Complexes with Bounded Treewidth 1-Skeletons

We present an algorithm for finding a subcomplex K' of a simplicial
complex K such that K' is a surface with specified boundary, genus,
and orientability. This algorithm is parameterized by the treewidth
of the 1-skeleton of the simplicial complex K and the genus of the
surface K'. Our algorithm is linear in the number of vertices of K
if the treewidth of the 1-skeleton of K and the genus of K' are
constant. Otherwise, our algorithm is polynomial if the treewidth of
the 1-skeleton of K is constant.

Topology Aware Morphological Operations

This work in progress aims to give a solution of the homological
simplification problem in ℝ^{2} which is
also geometrically nice.

Which Integrable Projection is Which?

Given a piecewise constant vector field v on a manifold m, we begin
with the question of when it can be expressed as the gradient of
some scalar field s, in which case we say that this field is
*discretely integrable*. When v is not integrable, one can then
begin considering projections from v to the space of integrable
fields. In this document we note that there are a number of related
notions of integrable projection defined in the literature. We
present a framework in which we generalize three of these notions to
simplicial complexes in arbitrary dimension, and which shows promise
as a computational tool for solving otherwise combinatorially hard
problems in computational topology.

Computing relevant subtrajectory bundles faster

The ubiquitous use of smartphones and other GPS-enabled devices
leads to vast amounts of tracking data being collected on a regular
basis. But generally, processing large GPS trajectory data requires
bundling portions of the trajectories together; this is referred to
as sub-trajectory clustering. Since this is a multi-criteria
optimization problem, there are many potential bundles of interest
and it is unclear which ones to choose. Recently an approach was
proposed for identifying so-called relevant bundles of
sub-trajectories. We provide a faster algorithm for computing such
relevant bundles, by employing a combination of exponential and
binary search. This yields a runtime improvement of a linear factor
over the previous exhaustive search algorithm.

Imprint | Disclaimer | Powered by w3.css | Valid HTML5 | Last modified: Wed May 27 11:28:49 CEST 2020