In this version I have:

* Fixed some bugs in `time_transfer`.

In this version I have:

* Fixed some bugs in `split_bins`, `min_max_norm`, `digits_num`.
* Added a new function, `sql_hive_text_parse`, for automatic production of Hive SQL.
* Added a new function, `sum_table`, which includes both univariate and bivariate analysis and ranges from univariate statistics and frequency distributions to correlations, cross-tabulation and characteristic analysis.

In this version I have:

* Fixed some bugs in `split_bins`, `time_transfer`, `cohort_analysis`, `xgb_filter`, `feature_selector`.
* Rewrote the functions `plot_bar`, `plot_density`, `plot_line`, `plot_box`, `plot_relative_freq_histogram`, `love_color`.
* Added a new function, `plot_colors`.

In this version I have:

* Fixed some bugs in `cohort_analysis`, `time_transfer`, `get_ctree_rules`.

In this version I have:

* Fixed some bugs in `plot_bar`, `missing_proc`, `char_to_num`.
* Rewrote the logic of `time_variable`.
* New function `plot_line` is for generating line plots.

In this version I have:

* Fixed some bugs in `data_cleansing`, `plot_table`, `check_rules`.

In this version I have:

* Fixed some bugs in `check_rules`, `time_transfer`.

# creditmodel-1.2.2

In this version I have:

* Fixed some bugs in `get_ctree_rules`, `ks_plot`, `cross_table`.

# creditmodel-1.2.1

In this version I have:

* Enhanced strategy analysis capabilities.
* New function `rule_value_replace` is for generating new variables by rules.
* Fixed some potential bugs in `ks_plot`, `perf_table`, `training_model`, `process_nas`.

In this version I have:

* Enhanced strategy analysis capabilities.
* New function `replace_value` is for replacing values of some variables.
* Fixed some potential bugs in `check_rules`, `get_ctree_rules`, `rules_filter`, `%alike%`.

In this version I have:

* New functions `plot_distribution`, `plot_relative_freq_histogram`, `plot_box`, `plot_density`, `plot_bar` are for data visualization.
* New function `swap_analysis` is for swap-out/swap-in analysis.
* New function `rules_filter` is used to filter or select samples by rules.
* Fixed some potential bugs in `char_to_num`, `merge_category`, `check_rules`, `get_ctree_rules`.

In this version I have:

* New function `cross_table` is for cross table analysis.
* Fixed some potential bugs in `data_cleansing`, `low_variance_filter`, `time_variable`, `plot_vars`.

In this version I have:

* New function `entropy_weight` is for calculating Entropy Weight.
* New function `term_tfidf` is for computing the tf-idf of documents.
* New function `plot_oot_perf` is for plotting the performance of over time samples in the future.
* Fixed some potential bugs in `get_breaks`, `lift_plot`, `perf_table`, `model_result_plot`.
* Added a parameter `cut_bin` to `get_breaks` for cutting breaks by equal depth or equal width.
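
To illustrate the distinction behind the `cut_bin` option, here is a minimal sketch of equal-width versus equal-depth break points. It is written in plain Python as a language-neutral illustration and is not the package's R implementation; the helper names `equal_width_breaks` and `equal_depth_breaks` are hypothetical.

```python
from statistics import quantiles

def equal_width_breaks(values, bins):
    """Interior break points splitting [min, max] into `bins` intervals of equal length."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]

def equal_depth_breaks(values, bins):
    """Interior break points placed at quantiles, so each bin holds a similar share of the data."""
    return quantiles(values, n=bins, method="inclusive")

# Toy data with one large outlier (illustrative values only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_breaks = equal_width_breaks(data, 4)
depth_breaks = equal_depth_breaks(data, 4)
```

With an outlier such as 100, the equal-width breaks leave nine of the ten values in the first bin, while the equal-depth breaks follow the quantiles of the data.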

In this version I have:

* `split_bins`, `woe_transfer`
* `time_series_proc` for time series data processing.
* `ranking_percent_proc`, `ranking_percent_dict` are for processing ranking percent variables and generating ranking percent dictionary.
* Renamed `read_dt` to `read_data` and added a parameter `pattern` for matching files.
* `traing_xgb`, ‘xgb_params’
* Renamed `save_dt` to `save_data`; `save_data` also supports multiple data frames.

In this version I have:

* `pred_xgb` for using an xgboost model to predict new data.
* `get_psi_plots`, `psi_plot` to plot PSI of your data.
* `p_to_score` for transforming probability to score.
* `multi_left_jion` for left joining a list of datasets fast.
* `read_data` for loading csv or txt data fast.

In this version I have:

* `xgb_filter`, `feature_selector`, `split_bins`, `ks_table_plot`, `ks_psi_plot`, `ks_value`.
* `pred_score` for predicting new data using scorecard.
* `lr_params_search`, `xgb_params_search` for searching the optimal parameters. “random_search”, “grid_search”, “local_search” are available.
* `partial_dependence_plot`, `get_partial_dependence_plots` for generating partial dependence plots.
* `cohort_analysis`, `cohort_table`, `cohort_plot` for cohort (vintage) analysis and visualization.
* `perf_table`, `roc_plot`, `ks_plot`, `lift_plot`, `psi_plot` for model validation drawings.

In this version I have:

* Fixed some potential bugs in `get_names`, `digits_num`

In this version I have:

* `data_exploration` for data exploration.
* `missing_proc`, `outliers_proc`, `get_names`
* `lasso_filter`: `AUC` & `K-S` is added to select the best lambda. In this way, not only can the set of variables that maximizes the AUC or K-S be selected, but the multicollinearity (which is difficult to eliminate by AIC in stepwise regression) can also be minimized. That means that, instead of stepwise regression, lasso can select the optimal combination of variables to solve the regression problem.
* `K-S` or `AUC` values corresponding to different lambda.
* `auc_value`, `ks_value`, which can calculate Kolmogorov-Smirnov (K-S) & AUC of multiple model results quickly.
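
For readers less familiar with the two metrics named above, the sketch below computes K-S and AUC from binary labels and model scores. It is a plain-Python illustration under the usual textbook definitions, not the code behind `ks_value` or `auc_value`; the function names `auc_stat` and `ks_stat` are hypothetical.

```python
def auc_stat(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_stat(labels, scores):
    """K-S as the largest gap between the cumulative score distributions of the two classes."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cum_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= t) / n_pos
        cum_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cum_pos - cum_neg))
    return best

# Toy example (illustrative values only): 1 marks the positive class.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
auc = auc_stat(labels, scores)
ks = ks_stat(labels, scores)
```

Both functions favor readability over speed; for large samples a sort-based single pass would be the more usual choice.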