Making Math Delicious: The Research Cortex

 

Last time I posted, I was the grouchy mathematician “telling data scientists to get off my lawn” as I attempted to persuade you that eating the Brussels sprouts of Math is just as cool as eating that thick porterhouse steak named Data Science. (Disclaimer: I recognize all diets as equally valid, and WLOG operate in the space where Brussels sprouts are uncool and steaks are cool.) Data Science gets to be that porterhouse because its practitioners have not only demonstrated its nutritional value to a business, but also found a way to make it delectable, satisfying, and visually appealing to a wide audience. We all know that Brussels sprouts are nutritious, but in that way that tastes nutritious. With all this in mind, I’d like to provide a recipe for preparing those Brussels sprouts in a way that doesn’t feel like you are forcing them down while your mother glares at you.

 

Meet the Research Cortex at www.theresearchcortex.com.

 

The Research Cortex has the lofty goal of doing for mathematics what so many others have done for data science—make it rigorous, yet accessible to a wide audience that spans disciplines and industries. This new sibling of The Data Cortex serves as the unofficial hub for academic research of Dell EMC’s Data Protection Division CTO Team. Initially, our focus will be primarily mathematics.

 

The work we’ll publish is original, rigorous content… with a twist. Shortly after publishing a new paper, we add video overviews about the work and the key results. We also feature video microcontent (Math Snacks) that spans various topics and metatopics in mathematics. Our first series of Math Snacks looks at types of mathematical proofs, beginning with direct proofs, in order to give some insight into how mathematicians approach problems.

 

The scope of the research is broad; no echo chambers here. We want full exploration of all branches of mathematics, pure and applied. Our first published work, by yours truly, examines sequences of dependent random variables and constructs a new probability distribution that analytically handles correlated categorical random variables. The next paper is the first part of a Master’s thesis by Jonathan Johnson, currently a PhD student at the University of Texas at Austin, discussing summation chains of sequences. Future work will touch on queueing theory, reliability theory, algebraic statistics, and anything else that needs a home and an audience.

 

Mathematics is that underground river that nurtures every other branch of science and engineering. My hope is that, by making these theoretical and foundational works accessible and enjoyable to consume, we can spark innovative ideas and applications by our readers in any area they can think of.

 

I also want to take the time to acknowledge those who helped the Research Cortex go from a mathematician’s lofty ideal to a tangible (sort of) object. Mariah Arevalo, a software engineer in the ELDP program at Dell EMC, is the site administrator, designer, social media manager, and holder of other titles I’m sure I’ve missed. I’ll also throw a quick shout-out to Jason Hathcock for his assistance with video design and production, and music composition.

 

We are very proud of the Data Cortex’s new brother, and hope you will bookmark www.theresearchcortex.com and visit regularly to check out all our new content.

 

~Rachel Traylor @ Mathopocalypse

Mathematics, Big Data, and Joss Whedon

Definition 1: The symmetric difference of two sets A and B, denoted A \Delta B, is the set of elements that belong to either A or B, but not to their intersection.
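For readers who prefer to poke at a definition, here is a minimal sketch of Definition 1 using Python's built-in sets. The element names are made up purely for illustration and are not a serious inventory of either field.

```python
# Hypothetical toy sets; the element names are illustrative assumptions only.
A = {"proof", "rigor", "Bayesian", "algorithm"}
B = {"dashboard", "pipeline", "Bayesian", "algorithm"}

# Python's ^ operator on sets is the symmetric difference.
sym_diff = A ^ B

# The same set built directly from Definition 1: union minus intersection.
by_definition = (A | B) - (A & B)

print(sorted(sym_diff))
print(sym_diff == by_definition)
```

The two constructions agree on any pair of sets, since `^` is just shorthand for "in one or the other, but not both".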

Let A be “Mathematics”, and let B be “Data Science”. This is certainly not the first article vying for attention with the latter buzzword, so I’ll go ahead and insert a few more here to help boost traffic and readership:

Analytics, Machine Learning, Algorithm,

Neural Networks, Bayesian, Big Data

These formerly technical words (except that last one) used to live solidly in the dingy faculty lounge of set A. They have since been distorted into vague corporate buzzwords, shunning their well-defined mathematical roots for the sexier company of “synergy”, “leverage”, and “swim lanes” at refined business luncheons. All of the above words have allowed themselves to become elements of the nebulous set B: “Data Science”. As the entire corporate and academic world scrambles to rebrand itself as a member of Big Data™, allow me to pause the chaos in order to reclaim set A.

This isn’t to say that set B is without its merits. Data Science is Joss Whedon, making the uncool comic books so hip that Target sells T-shirts now. The advent of powerful computational resources and a worldwide saturation of data have sparked a mathematical revival of sorts. (It is actually possible for university mathematics departments to receive funding now.) Data Science has inspired the development of methods for quantifying every aspect of life and business, many of which were forged in mathematical crucibles. Data science has built bridges between research disciplines, and sparked some taste for a subject that was previously about as appetizing to most as dry Thanksgiving turkey without gravy. Data science has driven billions of dollars in sales across every industry, customized our lives to our particular tastes, and advanced medical technology, to name a few.

Moreover, the techniques employed by data scientists have mathematical roots. Good data scientists have some mathematical background, and my buzzwords above are certainly in both sets. Clearly, A \cap B is nonempty, so the two sets are not disjoint. However, the symmetric difference between the two sets is large. Symbolically, |A \Delta B| \gg |A \cap B|. To avoid repetition of the plethora of articles about Data Science, our focus will be on the elements of mathematics that data science lacks. In mathematical symbols, we investigate the set A \setminus B.

Mathematics is simplification. Mathematicians seek to strip a problem bare. Just as every building has a foundation and a frame, every “applied” problem has a premise and a structure. Abstracting the problem into a mathematical realm exposes which parts of it are merely facade, however necessary they previously seemed. An architect can design an entire subdivision with one floor plan, and introduce variation in cosmetic features to produce a hundred seemingly different homes. Mathematicians reverse this process, ignoring the unnecessary variation in building materials to find the underlying structure of the houses. A mathematician can solve several business problems with one good model by studying the anatomy of the problems.

Mathematics is rigor. My real analysis professor in graduate school told us that a mathematician’s job is two-fold: to break things and to build unbreakable things. We work in proofs, not judgment. Many of the data science algorithms and statistical tests that get name-dropped at parties today are actually quite rigorous, if the assumptions are met. It is disingenuous to scorn statistics as merely a tool to lie; one doesn’t blame the screwdriver that is being misused as a hammer. Mathematicians focus on these assumptions. A longer list of assumptions prior to a statement indicates a weaker statement; our goal is to strip assumptions one by one to see when the statement (or algorithm) breaks. Once we break it, we recraft it into a stronger statement with fewer assumptions, giving it more power.

Mathematics is elegance. Ultimately, this statement is a linear combination of the previous two, but still provides an insightful contrast. Data science has become a tool crib of “black box” algorithms that one employs in the language of one’s choice. Many of these models have become uninterpretable blobs that churn out an answer (even a good one by many measures of performance; pick your favorite measure: p-values, Euclidean distance, prediction error). They solve the specific problem given wonderfully, molding themselves to the given data like a good pair of spandex leggings. However, they provide no structure, no insight beyond that particular type of data. Understanding the problem takes a back seat to predictions, because predictions make money, especially before the end of the quarter. Vision is long-term and expensive. This type of thinking is short-sighted; with some investment, that singular dataset may reveal a structure that is isomorphic to another problem in an unrelated department, and even one that may be exceedingly simple in nature. In this case, mathematics can provide an interpretable, elegant solution that solves multiple problems, provides insight into behavior, and still retains predictive power.

As an example, let us examine the saturated research field of disk failures. There is certainly no shortage of papers that develop complex algorithms for disk failure prediction; typically the best performing ones are an ensemble method of some kind. Certain errors are good predictors of disk failure, for instance, medium errors and reallocated sectors. These error counts evolve randomly, but only ever increase. A Markov chain fits this behavior perfectly, and we have developed the method to model these errors. Parameter estimation is a challenge, but the idea is simple, elegant, and interpretable. Because the mathematics is so versatile, with just one transition matrix a user can answer almost any question he likes without needing to rerun the model. This approach allows for both predictive analytics and behavior monitoring, is quick to implement, and is analytically (in the mathematical sense) sound. The only estimation needed is in the parameters, not in the model structure itself. Good parameter estimation all but guarantees good performance.
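To make the "one transition matrix, many questions" point concrete, here is a minimal sketch of the general idea. It is not the model described above: the four states, the discrete time step, and every probability in the matrix are assumptions invented for illustration. The matrix is upper triangular because the error count never decreases, and the last state ("failed") is absorbing.

```python
# States 0, 1, 2 are hypothetical error-count buckets; state 3 is "failed".
# Every probability here is made up for illustration, not estimated from data.
P = [
    [0.90, 0.07, 0.02, 0.01],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],
]

def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(X, steps):
    """Return X raised to a non-negative integer power."""
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, X)
    return result

def prob_failed_within(P, start, steps):
    """Probability of sitting in the failed state after `steps` transitions."""
    return mat_pow(P, steps)[start][-1]

def expected_steps_to_failure(P):
    """Expected number of transitions until failure, per non-failed state.

    Solves t[i] = 1 + sum_j P[i][j] * t[j] by back-substitution, which works
    because P is upper triangular and t is 0 in the failed state.
    """
    n = len(P) - 1                  # number of non-failed states
    t = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        later = sum(P[i][j] * t[j] for j in range(i + 1, n))
        t[i] = (1.0 + later) / (1.0 - P[i][i])
    return t[:n]

# Two different questions, one matrix, no refitting.
print(prob_failed_within(P, start=0, steps=30))
print(expected_steps_to_failure(P))
```

Both functions read from the same `P`, so a question about failure risk over a horizon and a question about expected time to failure need no new model, only different arithmetic on the same matrix.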

There is room for both data scientists and mathematicians; the relationship between a data scientist and a mathematician is a symbiotic one. Practicality forces a quick or canned solution at times, and sometimes the time investment needed to “reinvent the wheel” when we have (almost) infinite storage and processing power at hand is not good business. Both data science and mathematics require extensive study to be effective; one course on Coursera does not make one a data scientist, just as calculus knowledge does not make one a mathematician. But ultimately, mathematics is the foundation of all science; we shouldn’t forget to build that foundation in the quest to be industry Big Data™ leaders.

 

~Rachel Traylor @mathpocalypse