Text Analytics spans two very different fields: linguistics and mathematics. Since many of us only really understand one of these at most, I'd like to demystify some of the fundamental mathematical concepts in the field. This is intended to be light and approachable and not, I promise, a math lecture. Do you know what a matrix is, and what it's used for? If not, read on!
The matrix is a fundamental object in many different fields of analysis, and natural language processing is no exception. Whether you want to study the flow of air around a speeding race car or the interaction of terms in a document, a matrix can represent the state, the changes, and the connections in a complicated problem.
But what exactly is a matrix? The most common place to run into one in the modern world is in spreadsheet software like Excel. A matrix is really just a grid of objects, usually numbers or equations. We can identify any single cell in the matrix by counting down the side to find its row, and then across the top to find its column. In Excel we'd write C12 (column C, row 12); in mathematics we'd write $M_{12,3}$ (row 12, column 3, since mathematicians list the row first). Either way, we're giving the coordinates of a particular piece of data in our matrix.
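To make the coordinate idea concrete, here is a small Python sketch that translates an Excel-style reference into a (row, column) pair and then looks up that cell in a matrix stored as a list of rows. The function name `excel_to_row_col` and the sample matrix are made up for illustration; the only real convention assumed is Excel's lettering of columns (A, B, ..., Z, AA, ...).

```python
import re


def excel_to_row_col(ref: str) -> tuple[int, int]:
    """Convert an Excel-style reference like "C12" into 1-based (row, column).

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters work like base 26 with A=1 ... Z=26, then AA=27, etc.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col


# A small 3 x 4 matrix stored as a list of rows.
M = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

row, col = excel_to_row_col("C2")  # column C, row 2
print(row, col)                    # 2 3
# Python lists count from 0, so the math-style M_{2,3} is M[1][2] here.
print(M[row - 1][col - 1])         # 7
```

Note that the spreadsheet puts the column letter first while the math notation (and the Python lookup) puts the row first; the cell is the same either way.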
A matrix, then, can be used to store information in an organized way. If we have a network, we can use each row and column to represent a different element in the network, and the cells to represent connections. This is the idea behind Google's PageRank, which analyzes the flow of authority between web pages to help determine the most relevant results for your searches. At Lexalytics, we use a matrix to represent the interactions between words and Wikipedia articles within our Concept Matrix.
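As a rough illustration of storing a network in a matrix, the sketch below builds an adjacency matrix for a tiny, made-up set of four web pages. Row i, column j holds 1 if page i links to page j. The page names and links are invented for the example, and simply counting incoming links is a crude stand-in for "authority", not how Google actually ranks pages.

```python
# A tiny, hypothetical link network: (from_page, to_page).
pages = ["A", "B", "C", "D"]
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]

# Map each page name to its row/column number.
index = {name: i for i, name in enumerate(pages)}
n = len(pages)

# Start with an n x n grid of zeros, then mark each link with a 1.
adj = [[0] * n for _ in range(n)]
for src, dst in links:
    adj[index[src]][index[dst]] = 1

for name, row in zip(pages, adj):
    print(name, row)

# Summing down a column counts how many pages link *to* that page.
in_links = {name: sum(adj[r][index[name]] for r in range(n)) for name in pages}
print(in_links)  # {'A': 1, 'B': 1, 'C': 3, 'D': 0}
```

Reading across a row tells you where a page points; reading down a column tells you who points at it. The same layout works for any network, including words connected to documents.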
There are also many powerful algorithms that can be run on matrices. Factorization takes a complicated mess of data and finds simpler, more compact explanations. Eigenvector analysis can tell you which items in your matrix are the most important. We use these sorts of algorithms to figure out the most likely way for the words of a sentence to go together.
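To give a feel for eigenvector analysis, here is a minimal power-iteration sketch in the style of PageRank, run on the same kind of made-up four-page network. It repeatedly pushes each page's score along its outgoing links until the scores settle; the settled scores form the dominant eigenvector of the damped link matrix. The damping factor of 0.85 is the value commonly quoted for PageRank, and everything here (network, function name, iteration limits) is an illustrative assumption, not Lexalytics' or Google's production method.

```python
def pagerank(adj, damping=0.85, iterations=200, tol=1e-12):
    """Power iteration on a damped link matrix (illustrative sketch).

    adj[i][j] == 1 means page i links to page j.
    Returns one score per page; the scores sum to 1.
    """
    n = len(adj)
    rank = [1.0 / n] * n  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new = [(1.0 - damping) / n] * n
        for i, row in enumerate(adj):
            out = sum(row)
            if out == 0:
                # A page with no outgoing links spreads its score evenly.
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                # Otherwise split the score evenly across its links.
                for j, linked in enumerate(row):
                    if linked:
                        new[j] += damping * rank[i] / out
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


pages = ["A", "B", "C", "D"]
adj = [
    [0, 1, 1, 0],  # A links to B and C
    [0, 0, 1, 0],  # B links to C
    [1, 0, 0, 0],  # C links to A
    [0, 0, 1, 0],  # D links to C
]
scores = pagerank(adj)
for name, score in sorted(zip(pages, scores), key=lambda p: -p[1]):
    print(name, round(score, 3))
```

Page C, which collects the most links, ends up on top, and A comes second because it is linked from C. That is the sense in which an eigenvector "tells you what the most important items are."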
Math with matrices is very tedious, with lots of multiplication and addition repeated over and over again, hundreds of thousands of times (or more!). Fortunately, that sort of rote arithmetic is where computers excel. From that simple arithmetic, all sorts of amazing calculations can be done: modeling an electron, displaying a 3D shape in a game, or automatically figuring out whether "Google" in a document refers to the company. Interestingly, because so many fields rely on matrix mathematics, techniques developed in one field have a way of spreading to others. Cutting-edge natural language processing work has even started borrowing techniques from quantum physics, the mathematics used to describe the laws of reality.
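Here is what that "multiplication and addition repeated over and over" looks like in practice: a plain-loop matrix multiplication in Python, used to rotate a 3D point the way a game might. The rotation is a standard 90-degree turn about the z-axis; the function name `matmul` and the example point are just for illustration, and real software would use an optimized library instead of loops like these.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p) using plain loops."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]  # one multiply, one add
            result[i][j] = total
    return result


# Rotation by 90 degrees about the z-axis.
theta = math.pi / 2
rot_z = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]
point = [[1.0], [0.0], [0.0]]  # the point (x, y, z) as a column

rotated = matmul(rot_z, point)
print([round(v[0], 6) for v in rotated])  # [0.0, 1.0, 0.0]
```

The innermost line runs n × m × p times, so multiplying two 1,000 × 1,000 matrices this way takes a billion multiply-add steps. That is exactly the kind of rote work computers are built for.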
If you want to learn even more about matrices, search for them in your search engine of choice, and a matrix will point the way for you.