New approach can help organizations scale their data science efforts with artificial data and crowdsourcing.
Although data scientists can gain great insights from large data sets — and can ultimately use these insights to tackle major challenges — accomplishing this is much easier said than done. Many such efforts are stymied from the outset, as privacy concerns make it difficult for scientists to access the data they would like to work with.
IN 1907 John D. Hertz, the owner of a taxi firm in Chicago, asked some academics at the University of Chicago to do a piece of research for him. He wanted to know what colour he should paint his cabs in order to make them stand out among the sea of black vehicles that then inhabited American city streets. The researchers’ conclusion was: yellow. Now, more than a century later, a group of researchers at a different university have concluded that yellow was a wise choice for other reasons, too. In a study just published in the Proceedings of the National Academy of Sciences, Ho Teck Hua of the National University of Singapore and his colleagues show that yellow taxis are less likely to be involved in accidents.
How to construct good approximation for the distribution of the solution value to linear optimization problem when the objective function is random? More generally, we consider any mixed zero-one linear optimization problem, and develop an approach to approximate the distribution of its optimal value when the random objective coefficients follow a multivariate normal distribution. Linking our model to the classical Stein’s Identity, we show that the least squares normal approximation of the random optimal value can be computed by solving the persistency problem, first introduced by Bertsimas et al. (2006).
Consider a large number of detectors each generating a data stream. The task is to detect online, distribution changes in a small fraction of the data streams. Previous approaches to this problem include the use of mixture likelihood ratios and sum of CUSUMs. We provide here extensions and modifications of these approaches that are optimal in detecting normal mean shifts. We show how the (optimal) detection delay depends on the fraction of data streams undergoing distribution changes as the number of detectors goes to infinity. There are three detection domains. In the first domain for moderately large fractions, immediate detection is possible. In the second domain for smaller fractions, the detection delay grows logarithmically with the number of detectors, with an asymptotic constant extending those in sparse normal mixture detection. In the third domain for even smaller fractions, the detection delay lies in the framework of the classical detection delay formula of Lorden. We show that the optimal detection delay is achieved by the sum of detectability score transformations of either the partial scores or CUSUM scores of the data streams.
The design of this method combines an inexact 2-block majorized semi-proximal ADMM and the recent advances in the inexact symmetric Gauss-Seidel (sGS) technique for solving a multi-block convex composite quadratic programming whose objective contains a nonsmooth term involving only the first block-variable. One distinctive feature of our proposed method (the sGS-imsPADMM) is that it only needs one cycle of an inexact sGS method, instead of an unknown number of cycles, to solve each of the subproblems involved. With some simple and implementable error tolerance criteria, the cost for solving the subproblems can be greatly reduced, and many steps in the forward sweep of each sGS cycle can often be skipped, which further contributes to the efficiency of the proposed method
Parallel surrogate-based global optimization method forcomputationally expensive objective functions that is more effective for larger numbers ofprocessors. To reach this goal, we integrated concepts from multi-objective optimizationand tabu search into, single objective, surrogate optimization. Our proposed derivative-freealgorithm, called SOP, uses non-dominated sorting of points for which the expensive functionhas been previously evaluated SOP: parallel surrogate global optimization with Pareto center selection for computationally expensive single objective problems.
The objective is to obtain optimal routing solutions that would, as much as possible, adhere to a set of specified requirements after the uncertainty is realized. These problems include finding an optimal routing solution to meet the soft time window requirements at a subset of nodes when the travel time is uncertain, and sending multiple capacitated vehicles to different nodes to meet the customers’ uncertain demands. We introduce a precise mathematical framework for defining and solving such routing problems
Data-science algorithms and models changed the way we search for information on products and services, make payments, procure and trade, and along the way, changed how ﬁrms use individual-level consumer data,and how business transactionsare created,documented,regulated,and analyzed.