5 Modeling
Because the problem is to forecast customer feedback on a campaign, the following three modeling tools were selected to create predictive models: Neural Network, C5.0 and CART. A number of predictive models were created and evaluated during the different stages of the CRISP-DM process. They are discussed in detail in the following paragraphs.
Stage 1:
During this stage, the correlations between all fields were analyzed exhaustively. Pairs of fields with an absolute correlation greater than 0.9 were identified, and one member of each such pair was eliminated. In addition, heuristics were used to eliminate fields, e.g. “For Future Tax Filer Use”.
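The correlation-based elimination in Stage 1 can be illustrated with a minimal sketch. It assumes the fields are numeric and uses the Pearson correlation; the field names, the toy values, and the rule of dropping the later field of each pair are assumptions made for illustration, not details taken from the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant field: correlation undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Return the field names kept after removing one member of each
    pair whose absolute correlation exceeds the threshold.

    `columns` maps field name -> list of values. Fields are visited in
    insertion order and the later field of a correlated pair is dropped
    (an arbitrary tie-break chosen for this sketch).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: "income_k" is just "income" on a different scale.
fields = {
    "income": [30.0, 45.0, 60.0, 80.0, 120.0],
    "income_k": [0.030, 0.045, 0.060, 0.080, 0.120],
    "age": [52.0, 23.0, 41.0, 35.0, 29.0],
}
kept_fields = drop_correlated(fields)
print(kept_fields)
```

In this toy example "income_k" is removed because it is almost perfectly correlated with "income", while "age" is kept.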
Stage 2:
After field elimination was finished, an anomaly model was used to identify potential outliers. These outliers were filtered out before model creation.
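The text does not say which anomaly model was used, so the following sketch substitutes a simple, well-known rule: a record is treated as an outlier when its modified z-score (based on the median and the median absolute deviation) exceeds 3.5. The field name, the data, and the threshold are assumptions for illustration only.

```python
from statistics import median


def filter_outliers(records, field, threshold=3.5):
    """Keep records whose modified z-score on `field` is within the threshold.

    This is a stand-in for the anomaly model mentioned in the text,
    not a reproduction of it.
    """
    values = [r[field] for r in records]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(records)  # no spread: nothing can be flagged
    return [
        r for r in records
        if 0.6745 * abs(r[field] - med) / mad <= threshold
    ]


# Hypothetical records; the last one has an implausibly large value.
customers = [
    {"id": 1, "income": 30.0},
    {"id": 2, "income": 45.0},
    {"id": 3, "income": 60.0},
    {"id": 4, "income": 80.0},
    {"id": 5, "income": 120.0},
    {"id": 6, "income": 5000.0},
]
cleaned = filter_outliers(customers, "income")
print([r["id"] for r in cleaned])
```

The filtered list, rather than the full table, would then be passed on to model creation.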
Stage 3:
To further improve model accuracy, we attempted to incorporate clustering methods for the Census and Tax Filer data. To achieve that, the census data were filtered and clustered, and the cluster number of each record was then appended back to the original table. The same was done for the Tax Filer data.
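The idea of clustering a subset of fields and appending the cluster number to each record can be sketched as follows. The clustering algorithm used in the project is not named in the text; plain k-means is assumed here, and the field names, values, and the naive initialization are all hypothetical.

```python
def kmeans_labels(points, k, iterations=20):
    """Assign each point to one of k clusters with plain k-means.

    Initialization simply takes the first k points as centroids,
    which keeps the sketch deterministic but is not a robust choice.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical census fields, already filtered and scaled.
census = [
    {"id": 1, "median_income": 1.0, "household_size": 1.0},
    {"id": 2, "median_income": 9.0, "household_size": 9.0},
    {"id": 3, "median_income": 1.2, "household_size": 0.8},
    {"id": 4, "median_income": 8.8, "household_size": 9.1},
]
points = [(r["median_income"], r["household_size"]) for r in census]

# Append the cluster number back to each original record.
for record, label in zip(census, kmeans_labels(points, k=2)):
    record["census_cluster"] = label

print([(r["id"], r["census_cluster"]) for r in census])
```

The appended "census_cluster" field can then be used as an additional input to the predictive models; the same pattern would apply to the Tax Filer data.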
Model Evaluation
As the objective is to increase the response rate, model accuracy is not a good indicator of model fitness. Assume that the company has a very large customer database. After the predictive model is created, the campaign will be held only among the predicted responders, so the achieved response rate depends on how many of those predicted responders actually respond. Under this assumption, the fitness of the model can be measured by the formula below:
Let A = the set of predicted responders and B = the set of actual responders. Then:
Model Fitness = Count(A ∩ B) / Count(A)
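As a worked example, the fitness measure can be computed directly from two sets of customer identifiers. The identifiers below are made up, and returning 0.0 when no responders are predicted is a convention chosen for this sketch. The measure is the same quantity that is usually called precision.

```python
def model_fitness(predicted, actual):
    """Count(A ∩ B) / Count(A): the share of predicted responders (A)
    who are also actual responders (B)."""
    predicted, actual = set(predicted), set(actual)
    if not predicted:
        return 0.0  # convention for this sketch: no predictions, no fitness
    return len(predicted & actual) / len(predicted)


# Hypothetical customer ids.
predicted_responders = {101, 102, 103, 104}
actual_responders = {102, 104, 105}

fitness = model_fitness(predicted_responders, actual_responders)
print(fitness)  # 2 of the 4 predicted responders actually responded
```

Here two of the four predicted responders actually responded, so a campaign limited to the predicted responders would see a response rate of 0.5.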
Applying this formula to the models created in the different stages of the CRISP-DM process yields the model fitness table.