by Doug Alexander
Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. It discovers information within the data that queries and reports can’t effectively reveal. This paper explores many aspects of data mining in the sections that follow.
Data Rich, Information Poor
The amount of raw data stored in corporate databases is exploding. From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in gigabytes and terabytes. (One terabyte = one trillion bytes. A terabyte is equivalent to about 2 million books!) For example, every day, Wal-Mart uploads 20 million point-of-sale transactions to an AT&T massively parallel system with 483 processors running a centralized database. Raw data by itself, however, does not provide much information. In today’s fiercely competitive business environment, companies need to rapidly turn these terabytes of raw data into significant insights into their customers and markets to guide their marketing, investment, and management strategies.
The drop in the price of data storage has given companies willing to make the investment a tremendous resource: data about their customers and potential customers stored in “data warehouses.” Data warehouses are becoming part of the technology. They are used to consolidate data located in disparate databases. A data warehouse stores large quantities of data by specific categories so it can be more easily retrieved, interpreted, and sorted by users. Warehouses enable executives and managers to work with vast stores of transactional or other data to respond faster to markets and make more informed business decisions. It has been predicted that every business will have a data warehouse within ten years. But merely storing data in a data warehouse does a company little good. Companies will want to learn more about that data to improve their knowledge of customers and markets. The company benefits when meaningful trends and patterns are extracted from the data.
What is Data Mining?
Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find where the value resides.
What Can Data Mining Do?
Although data mining is still in its infancy, companies in a wide range of industries, including retail, finance, health care, manufacturing, transportation, and aerospace, are already using data mining tools and techniques to take advantage of historical data. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions, and anomalies that might otherwise go unnoticed.
For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Specific uses of data mining include:
- Market segmentation – Identify the common characteristics of customers who buy the same products from your company.
- Customer churn – Predict which customers are likely to leave your company and go to a competitor.
- Fraud detection – Identify which transactions are most likely to be fraudulent.
- Direct marketing – Identify which prospects should be included in a mailing list to obtain the highest response rate.
- Interactive marketing – Predict what each individual accessing a Web site is most likely interested in seeing.
- Market basket analysis – Understand what products or services are commonly purchased together, e.g., beer and diapers.
- Trend analysis – Reveal the difference between a typical customer this month and last.
Data mining technology can generate new business opportunities by:
Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in a large database. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
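The targeted-marketing problem described above can be sketched in a few lines. This is only an illustration, not a method taken from any particular product: the segment names and mailing records are invented, and a real system would work from far richer customer attributes than a single segment label.

```python
from collections import defaultdict

# Hypothetical records from a past promotional mailing: (segment, responded).
past_mailings = [
    ("urban_renters", True), ("urban_renters", False), ("urban_renters", False),
    ("suburban_owners", True), ("suburban_owners", True), ("suburban_owners", False),
    ("rural_owners", False), ("rural_owners", False),
]

def response_rates(records):
    """Return {segment: fraction of mailed customers who responded}."""
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in records:
        mailed[segment] += 1
        if did_respond:
            responded[segment] += 1
    return {segment: responded[segment] / mailed[segment] for segment in mailed}

def rank_segments(records):
    """Return segments ordered from highest to lowest historical response rate."""
    rates = response_rates(records)
    return sorted(rates, key=rates.get, reverse=True)

ranking = rank_segments(past_mailings)
print(ranking)
```

The segments at the front of the ranking are the ones a future mailing would target first, on the assumption that past response rates carry over.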
Using massively parallel computers, companies dig through volumes of data to discover patterns about their customers and products. For example, grocery chains have found that when men go to a supermarket to buy diapers, they sometimes walk out with a six-pack of beer as well. Using that information, it’s possible to lay out a store so that these items are closer together.
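One simple way to surface pairs like diapers and beer is to count how often items appear in the same basket. The sketch below uses a handful of made-up baskets and two standard market basket measures, support and confidence; it is meant to show the idea, not to represent how any grocery chain actually did the analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets (one set of items per checkout).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`.

    Assumes `antecedent` appears in at least one basket.
    """
    return support(baskets, {antecedent, consequent}) / support(baskets, {antecedent})

print(pair_counts(baskets).most_common(3))
print(confidence(baskets, "diapers", "beer"))
```

A pair with high support and high confidence is a candidate for shelf placement or joint promotion; whether the relationship is useful is still a business judgment.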
AT&T, A.C. Nielsen, and American Express are among the growing ranks of companies implementing data mining techniques for sales and marketing. These systems are crunching through terabytes of point-of-sale data to aid analysts in understanding consumer behavior and promotional strategies. Why? To gain a competitive advantage and increase profitability!
Similarly, financial analysts are plowing through vast sets of financial records, data feeds, and other information sources in order to make investment decisions. Health-care organizations are examining medical records to understand trends of the past so they can reduce costs in the future.
The Evolution of Data Mining
Data mining is a natural development of the increased use of computerized databases to store data and provide answers to business analysts.
Data Collection (1960s)
Business question: “What was my total revenue in the last five years?”
Enabling technologies: computers, tapes, disks

Data Access (1980s)
Business question: “What were unit sales in New England last March?”
Enabling technologies: faster and cheaper computers with more storage, relational databases

Data Warehousing and Decision Support
Business question: “What were unit sales in New England last March? Drill down to Boston.”
Enabling technologies: faster and cheaper computers with more storage, on-line analytical processing (OLAP), multidimensional databases, data warehouses

Data Mining
Business question: “What’s likely to happen to Boston unit sales next month? Why?”
Enabling technologies: faster and cheaper computers with more storage, advanced computer algorithms
Traditional query and report tools have been used to describe and summarize what is in a database. The user forms a hypothesis about a relationship and verifies it or discounts it with a series of queries against the data. For example, an analyst might hypothesize that people with low income and high debt are bad credit risks and query the database to verify or disprove this assumption. Data mining, by contrast, can be used to generate a hypothesis. For example, an analyst might use a neural network to discover a pattern that analysts did not think to try, such as that people over 30 years old with low incomes and high debt, but who own their own homes and have children, are good credit risks.
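The query-driven approach can be illustrated with a small sketch. The customer records, the income and debt cutoffs, and the field names below are all assumptions made for the example; the point is only that the analyst supplies the hypothesis and the query confirms or refutes it.

```python
# Hypothetical customer records; "defaulted" is the known outcome.
customers = [
    {"income": 18000, "debt": 15000, "defaulted": True},
    {"income": 20000, "debt": 12000, "defaulted": True},
    {"income": 22000, "debt": 14000, "defaulted": False},
    {"income": 60000, "debt": 5000, "defaulted": False},
    {"income": 75000, "debt": 20000, "defaulted": False},
    {"income": 52000, "debt": 3000, "defaulted": True},
]

LOW_INCOME = 25000  # assumed cutoff for "low income"
HIGH_DEBT = 10000   # assumed cutoff for "high debt"

def default_rate(records):
    """Fraction of records that defaulted, or None for an empty group."""
    if not records:
        return None
    return sum(r["defaulted"] for r in records) / len(records)

def check_hypothesis(records):
    """Compare default rates inside and outside the low-income, high-debt group."""
    def in_group(r):
        return r["income"] < LOW_INCOME and r["debt"] > HIGH_DEBT
    inside = [r for r in records if in_group(r)]
    outside = [r for r in records if not in_group(r)]
    return default_rate(inside), default_rate(outside)

inside_rate, outside_rate = check_hypothesis(customers)
print(inside_rate, outside_rate)
```

A higher default rate inside the group supports the analyst's hypothesis. Data mining reverses the roles: the tool proposes which groups to look at.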
How Data Mining Works
How is data mining able to tell you important things that you didn’t know, or what is going to happen next? The technique used to perform these feats is called modeling. Modeling is simply the act of building a model (a set of examples or a mathematical relationship) based on data from situations where the answer is known and then applying the model to other situations where the answers aren’t known. Modeling techniques have been around for centuries, of course, but only recently have the data storage and communication capabilities required to collect and store huge amounts of data, and the computational power to automate modeling techniques to work directly on the data, become available.
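As a minimal sketch of this build-then-apply cycle, the example below fits a straight line to cases where the answer is known and then uses it on a case where the answer is not. The data points are invented, and an ordinary least-squares line stands in for whatever mathematical relationship a real tool would learn.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical situations where the answer is already known.
known_inputs = [1, 2, 3, 4]
known_answers = [3, 5, 7, 9]

model = fit_line(known_inputs, known_answers)  # build the model
estimate = predict(model, 5)                   # apply it to a new situation
print(model, estimate)
```

The same two steps, fitting on known outcomes and predicting unknown ones, apply whether the model is a line, a decision tree, or a neural network.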
As a simple example of building a model, consider the director of marketing for a telecommunications company. He would like to focus his marketing and sales efforts on segments of the population most likely to become big users of long distance services. He knows a lot about his customers, but it is impossible to discern the common characteristics of his best customers because there are so many variables. From his existing database of customers, which contains information such as age, sex, credit history, income, zip code, occupation, etc., he can use data mining tools, such as neural networks, to identify the characteristics of those customers who make lots of long distance calls. For example, he might learn that his best customers are unmarried females between the ages of 34 and 42 who make in excess of $60,000 per year. This, then, is his model for high value customers, and he would budget his marketing efforts accordingly.
Data Mining Technologies
The analytical techniques used in data mining are often well-known mathematical algorithms and techniques. What is new is the application of those techniques to general business problems, made possible by the increased availability of data and inexpensive storage and processing power. Also, the use of graphical interfaces has led to tools becoming available that business experts can easily use.
Some of the tools used for data mining are:
Artificial neural networks – Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees – Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
Rule induction – The extraction of useful if-then rules from data based on statistical significance.
Genetic algorithms – Optimization techniques based on the concepts of genetic combination, mutation, and natural selection.
Nearest neighbor – A classification technique that classifies each record based on the records most similar to it in an historical database.
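Of the tools listed above, nearest neighbor is the easiest to show in a few lines. The sketch below is a plain k-nearest-neighbor classifier over a tiny, invented historical database of (age, income in thousands) records; the labels and the choice of k = 3 are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical historical database: ((age, income in $1000s), known label).
history = [
    ((25, 30), "low_usage"),
    ((30, 35), "low_usage"),
    ((28, 32), "low_usage"),
    ((38, 65), "high_usage"),
    ((40, 70), "high_usage"),
    ((36, 62), "high_usage"),
]

def classify(record, history, k=3):
    """Label `record` by majority vote among its k closest historical records."""
    by_distance = sorted(history, key=lambda item: math.dist(record, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((37, 64), history))
print(classify((27, 31), history))
```

The features here are used unscaled. In practice, attributes measured in very different units would usually be normalized first so that no single attribute dominates the distance.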
Details about who calls whom, how long they are on the phone, and whether a line is used for fax as well as voice can be invaluable in targeting sales of services and equipment to specific customers. But these tidbits are buried in masses of numbers in the database. By delving into the extensive customer-call database it kept to manage its communications network, a regional telephone company identified new types of unmet customer needs. Using its data mining system, it discovered how to pinpoint prospects for additional services by measuring daily household usage for selected periods. For example, households that make many lengthy calls between 3 p.m. and 6 p.m. are likely to include teenagers who are prime candidates for their own phones and lines. When the company used target marketing that emphasized convenience and value for adults (“Is the phone always tied up?”), hidden demand surfaced. Extensive telephone use between 9 a.m. and 5 p.m., characterized by patterns related to voice, fax, and modem usage, suggests a customer has business activity. Target marketing offering those customers “business communications capabilities for small budgets” resulted in sales of additional lines, functions, and equipment.
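The two usage patterns in this example can be written down as simple rules once they have been discovered. The sketch below is hypothetical throughout: the call records, the 120-minute and 60% thresholds, and the offer names are invented, and a mining system would derive such cutoffs from the data rather than have them typed in.

```python
# Hypothetical call record: (hour of day 0-23, minutes, line use).
AFTERNOON_MINUTES = 120  # assumed threshold for "many lengthy calls"
BUSINESS_SHARE = 0.6     # assumed share of usage falling in business hours

def suggest_offer(calls):
    """Map a household's call pattern to a marketing offer, or None."""
    total = sum(minutes for _, minutes, _ in calls)
    if total == 0:
        return None
    # Usage between 9 a.m. and 5 p.m., and whether it includes fax or modem.
    daytime = [(m, kind) for hour, m, kind in calls if 9 <= hour < 17]
    daytime_minutes = sum(m for m, _ in daytime)
    uses_fax_or_modem = any(kind in ("fax", "modem") for _, kind in daytime)
    # Usage between 3 p.m. and 6 p.m.
    afternoon_minutes = sum(m for hour, m, _ in calls if 15 <= hour < 18)

    if daytime_minutes / total >= BUSINESS_SHARE and uses_fax_or_modem:
        return "small-business package"
    if afternoon_minutes >= AFTERNOON_MINUTES:
        return "additional line"
    return None

teen_household = [(15, 50, "voice"), (16, 60, "voice"), (17, 45, "voice"), (20, 10, "voice")]
home_office = [(9, 30, "voice"), (11, 20, "fax"), (14, 40, "modem"), (19, 10, "voice")]

print(suggest_offer(teen_household))
print(suggest_offer(home_office))
```

The business rule is checked first because daytime fax or modem traffic is the more specific signal; a household matching neither rule gets no targeted offer.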
The ability to accurately gauge customer response to changes in business rules is a powerful competitive advantage. A bank searching for new ways to increase revenues from its credit card operations tested a nonintuitive possibility: Would credit card usage and interest earned increase significantly if the bank halved its minimum required payment? With hundreds of gigabytes of data representing two years of average credit card balances, payment amounts, payment timeliness, credit limit usage, and other key parameters, the bank used a powerful data mining system to model the impact of the proposed policy change on specific customer categories, such as customers consistently near or at their credit limits who make timely minimum or small payments. The bank discovered that cutting minimum payment requirements for small, targeted customer categories could increase average balances and extend indebtedness periods, generating more than $25 million in additional interest earned.
Merck-Medco Managed Care is a mail-order business which sells drugs to the country’s largest health care providers: Blue Cross and Blue Shield state organizations, large HMOs, U.S. corporations, state governments, etc. Merck-Medco is mining its one terabyte data warehouse to uncover hidden links between illnesses and known drug treatments, and to spot trends that help pinpoint which drugs are the most effective for what types of patients. The results are more effective treatments that are also less costly. Merck-Medco’s data mining project has helped customers save an average of 10-15% on prescription costs.
The Future of Data Mining
In the short term, the results of data mining will be in profitable, if mundane, business-related areas. Micro-marketing campaigns will explore new niches. Advertising will target potential customers with new precision.
In the medium term, data mining may be as common and easy to use as e-mail. We may use these tools to find the best airfare to New York, root out the phone number of a long-lost classmate, or find the best prices on lawn mowers.
The long-term prospects are truly exciting. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for diseases or new insights into the nature of the universe. There are potential dangers, however, as discussed below.
What if every telephone call you make, every credit card purchase you make, every flight you take, every visit to the doctor you make, every warranty card you send in, every employment application you fill out, every school record you have, your credit record, every web page you visit ... was all collected together? A lot would be known about you! This is an all-too-real possibility. Much of this kind of information is already stored in a database. Remember that phone interview you gave to a marketing company last week? Your replies went into a database. Remember that loan application you filled out? In a database. Too much information about too many people for anybody to make sense of? Not with data mining tools running on massively parallel processing computers! Would you feel comfortable about someone (or lots of someones) having access to all this data about you? And remember, all this data does not have to reside in one physical location; as the net grows, information of this type becomes more available to more people.