# Financial stock market forecasting data mining techniques

The stock market is widely considered too uncertain to be predictable. Many individuals have developed methodologies or models to increase the probability of making a profit on their stock investments, but the overall hit rates of these methodologies and models are generally too low to be practical in real-world applications. One major reason is the large fluctuation of the market. Current research in stock forecasting therefore focuses on improving the accuracy of stock trading forecasts.

This paper introduces a system that addresses this need. The system integrates various data mining techniques and supports decision-making for stock trades.

The proposed system embeds the top-down trading theory, artificial neural network theory, technical analysis, dynamic time series theory, and Bayesian probability theory. To examine the trading return of the presented system experimentally, two examples are studied. The first uses the Taiwan Semiconductor Manufacturing Company (TSMC) data set, which covers an investment horizon of trading days from 16 February to 23 January. The second examines the stock data of Evergreen Marine Corporation, an international marine shipping company.

Given the remarkable investment returns from trading the example TSMC and Evergreen stocks, the proposed system shows promising potential as a viable tool for stock market forecasting.

This work is based on the assumption, largely supported by real case studies, that with appropriate training over uptrend, downtrend, and flat horizons one can obtain enough indicators to forecast the trend with significant accuracy.

Future trends may be predicted to some extent from key indicators and past behavior. Because of system uncertainties and other unknown random factors, every stock market model is approximate. One of the best ways to model the market value is to use expert systems with artificial neural networks (ANN), which do not rely on standard formulas and can easily adapt to changes in the market.

In the literature, many ANN models are evaluated against statistical models for forecasting the market value. The system proposed in this research is a hybrid intelligent forecasting system that incorporates ANN.

It predicts stock price trends with significant accuracy using historical stock market prices from the Taiwan Stock Exchange (TSE) and gives very encouraging results. The trend of the Taiwan Semiconductor Manufacturing Company (TSMC) stock and that of the Evergreen Marine Corporation stock were predicted with an … This percentage of accuracy corresponds to a ratio of 4: …

In the sections that follow, we propose a system that integrates various data mining techniques to support stock trading decisions.

The system also incorporates the theory of top-down trading and tandem trading pioneered by Livermore (Livermore, J., How to trade in stocks; the Livermore formula for combining time element and price). The theory was found useful in stock forecasting. Top-down analysis is vital in stock prediction for two important reasons.

One is the top-down analysis of the market direction. The investor must know the overall trend of the market before making a trade. This applies to the stock market, the industry group, and individual stocks. Then, the individual stock is investigated by the system, which integrates data mining techniques including technical analysis, Bayesian probability theory, dynamic time series theory, and ANN.

In this research, we start by checking the main market. The first step is to know which way the overall market is headed. Secondly, we examine the specific industry group to make sure that the group is moving in the same direction, in order to increase the chance of making a profit on the trade.

Thirdly, we review the sister stocks to see if the stock is moving in the same direction. In the fourth step, all three factors are examined at the same time; that is, considering the overall market, the industry group and the sister stocks simultaneously.

It can be clearly seen how the system works when all factors are in unison. The remaining sections of this paper are organized as follows.

Section 2 gives the background of the related studies. Section 3 introduces the system of data mining techniques used in this study and Section 4 provides results of the approach using the daily TSE stock price. The final section gives the conclusion and recommendations for future research.

This paper contributes to the study of intelligent forecasting. It would also help to realize profitable stock transactions if properly implemented.

Many financial analysts and stock market investors seem convinced that they can make profits by employing one technical analysis approach or another to predict the stock market. Some use time series models derived from financial theories to forecast a series of stock price data.

ANN is often chosen as a stock prediction tool alongside other methods. Yet these approaches cannot be employed alone, because they are not directly applicable to predicting the market value, which is always subject to external impact. The stock market is affected by system uncertainties and other unknown random factors.

This points to the hybrid use of technical analysis, time series forecasting, and possibly ANN.

The following reviews recent developments in hybrid approaches to stock market prediction. Mandziuk and Jaruszewicz (Neuro-genetic system for stock index prediction) used technical analysis and ANN. They presented an experimental evaluation of a neuro-genetic system for short-term stock index prediction.

Their results showed that prediction based on the neuro-genetic model worked well during both uptrends and downtrends. Tan, Quek, and Yow (Maximizing winning trades using a novel RSPOP fuzzy neural network intelligent stock trading system, Applied Intelligence, 29) developed an intelligent stock trading system that combines the superior predictive capability of a fuzzy neural network with the widely accepted MA and RSI trading rules. Another study presented the Wave Analysis Stock Prediction system, which is based on a neuro-fuzzy architecture that uses the Elliott Wave Theory. In the approach by Abraham, Nath, and Mahanti (Hybrid intelligent systems for stock market analysis), a hybridized soft computing technique for automated stock market forecasting and trend analysis is used, with principal component analysis preprocessing the input data before they are fed to an ANN for stock forecasting.

Other related work includes Zuo and Kita (Engineering Management Research, 1, 46) and Chen, Su, Cheng, and Chiang (A novel price-pattern detection method based on time series to forecast stock markets, African Journal of Business Management, 5).

This paper presents a system that incorporates the top-down trading theory first introduced by Livermore. Livermore believed that stock trends follow a trend line that can be used for both long-term and short-term forecasts.

Using stock data, he concluded that the behavior of stock groups, whether big or small, was an important indication of overall market direction, an indication embraced by Wall Street but ignored by most traders.

He believed stock groups often provided the key to changes in trends. As the favored groups of the moment became weaker and collapsed, a correction in the overall market was usually on the way. The same thing happened in year dot. The leaders flipped and fell first, and the others followed.

Figure 1 depicts the block diagram of the system. Detailed descriptions of the system follow.

Figure 1. Block diagram showing the operation procedure of the system.

Examining Current Market Direction. The first step is to survey and establish the current market direction and to investigate whether the current line of least resistance is positive, negative, or neutral (Livermore).

Figure 2 shows that the TSI began its recovery in November, when a pivot point was formed and the basic direction changed.

Tracking the Industry Group. The second step is to check the specific industry group.

Since the trades of TSMC are of interest, the semiconductor industry group is checked out to make sure that the group is moving along the line of least resistance, in order to increase the chance of making a profit on the selected trade.

Stocks do not move alone. When they move, they move in a group. The semiconductor industry group began its recovery in November, at the same time as the TSI recovery shown in Figure 2. The signals confirmed that the trend was now heading to the upside.

Figure 3. The semiconductor group gave a clear signal in November that the trend was upward.

Tandem trading compares the stock of interest with its sister stocks in the same group.

To trade in TSMC, Taiwan's MediaTek is examined as a sister stock. Both stocks bottomed out in December and gave a signal, by a pivotal point, that the line of least resistance was positive.

Scoring the Three Factors. In the fourth step, the previous three factors, namely the market, the industry group, and the tandem stocks, are examined together. Figures 2 to 5 show clearly that all factors are in unison.

All the signals in the figures show a bottoming out in November and a reversal in trend, clearly indicating that the line of least resistance was now upward. The rules for scoring the three factors are as follows. Sum up the scores of Rules 1 to 3.
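As a rough illustration of summing one score per factor, the sketch below assigns +1, -1, or 0 to each of the three factors and adds them up. The scoring values and the names `SCORE` and `top_down_score` are assumptions made for illustration only; the paper's actual Rules 1 to 3 are not reproduced in this copy.

```python
# Hypothetical scoring table: the paper's own Rules 1 to 3 are not available here.
SCORE = {"up": 1, "down": -1, "flat": 0}


def top_down_score(market: str, industry: str, sister: str) -> int:
    """Sum one score per factor: overall market, industry group, sister stock."""
    return SCORE[market] + SCORE[industry] + SCORE[sister]


# All three factors in unison to the upside give the highest score.
print(top_down_score("up", "up", "up"))
```

Under these assumed values, a score near the maximum would indicate that the market, the industry group, and the sister stock all point the same way.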

The summed score is one of the key inputs to the ANN.

Integrated Data Mining Techniques for Stock Forecasting. Lastly, after all the trend lines are confirmed and the score is computed, the next step is to predict future stock values. This information is vital for the investor to buy at the start of an uptrend and to sell just before the trend reverses.

Since the raw stock data set does not directly reveal stock behavior patterns, techniques including technical analysis, Bayesian probability, dynamic time series, and ANN are integrated to extract patterns, not necessarily correlations only, from the massive and otherwise uninformative data.

More details are elaborated in the following subsections. Technical analysis and fundamental analysis are two major stock market analyzing methods used to predict short-term and long-term stock trends, respectively.

Fundamental analysis considers commercial factors, such as financial statements, management ability, business competition, and market conditions, in order to determine the intrinsic value of a given stock.

Technical analysis helps recognize price patterns by extrapolating from historical price patterns. In technical analysis, chart patterns and technical indicators are the two major analysis tools. Chart patterns such as head-and-shoulders and flag use stock charts to study the movement of stock prices.

Technical indicators such as RSI and the moving average are produced by specific equations to examine market signals and help investors make trading decisions. Popular technical indicators are usually classified into two major functions.
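To make "produced by specific equations" concrete, here is a minimal sketch of two commonly used indicators, a simple moving average and an RSI. It assumes the plain-average form of RSI rather than Wilder's smoothed version, and it is not necessarily the exact definition used in Table 1 of the paper; the function names are illustrative.

```python
def simple_moving_average(prices, n):
    """Return the n-period simple moving averages, one per full window."""
    return [sum(prices[i - n + 1:i + 1]) / n for i in range(n - 1, len(prices))]


def rsi(prices, n):
    """RSI over the last n price changes, using plain sums of gains and losses."""
    if len(prices) < n + 1:
        raise ValueError("need at least n + 1 prices")
    recent = prices[-(n + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    rs = gains / losses  # ratio of average gain to average loss
    return 100.0 - 100.0 / (1.0 + rs)


closes = [10, 11, 10, 11, 10]
print(simple_moving_average(closes, 3))
print(rsi(closes, 4))
```

An RSI near 100 means almost every recent move was up, while a value near 0 means almost every move was down.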

All the technical indicators utilized in this study are summarized in Table 1.

Bayesian probability (BP) is a method used to update the probability estimate for a hypothesis once additional evidence is learned. One of its features is probabilistic learning: BP can calculate explicit probabilities for hypotheses, which makes it among the most practical approaches to certain types of learning problems. Each training example can incrementally increase or decrease the estimated probability that a hypothesis is correct.

Prior knowledge can be combined with observed data. The formula of BP, in its standard form, is

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

where $P(h)$ is the prior probability of hypothesis $h$, $P(D \mid h)$ is the likelihood of the observed data $D$ given $h$, and $P(h \mid D)$ is the posterior probability.

Table 2 tabulates several technical indicators calculated by BP. It also gives the resulting prior and posterior probabilities. The value of each technical indicator represents its accuracy for the individual stock over the recent trading days.

The result provides a basis for selecting significant technical indicators. We ignore the technical indicators with low values and select the significant ones. The values of the selected indicators become the inputs of the neural network in the next step. BP thus screens out unnecessary technical indicators to prevent possible losing trades. From Table 2, we select MA, ADX, and William as the significant technical indicators for the ANN in this research.
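The screening step can be sketched as follows: estimate, via Bayes' rule, the posterior probability that the price rises given that an indicator fired, and keep only indicators whose posterior clears a threshold. The data layout (paired daily 0/1 observations), the threshold of 0.6, and the function names are assumptions for illustration, not the paper's procedure or values.

```python
def posterior_up_given_signal(signals, outcomes):
    """P(up | signal) from paired daily 0/1 observations, via Bayes' rule."""
    n = len(signals)
    n_up = sum(outcomes)
    n_signal = sum(signals)
    if n == 0 or n_up == 0 or n_signal == 0:
        raise ValueError("need at least one up day and one signal day")
    p_up = n_up / n                      # prior P(up)
    p_signal = n_signal / n              # evidence P(signal)
    both = sum(1 for s, o in zip(signals, outcomes) if s and o)
    p_signal_given_up = both / n_up      # likelihood P(signal | up)
    return p_signal_given_up * p_up / p_signal


def select_indicators(posteriors, threshold=0.6):
    """Keep indicator names whose posterior meets the (assumed) threshold."""
    return [name for name, p in posteriors.items() if p >= threshold]


signals = [1, 1, 0, 1, 0, 1, 0, 0]   # 1 = indicator gave a buy signal that day
outcomes = [1, 1, 0, 1, 1, 0, 0, 0]  # 1 = price actually rose
print(posterior_up_given_signal(signals, outcomes))
```

In this toy data the indicator fired on four days and the price rose on three of them, so the posterior agrees with the direct count of 3 out of 4.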

Exponential smoothing is a technique that can be applied to time series data, which are sequences of observations, either to produce smoothed data or to make forecasts. The exponential smoothing model for forecasting does not discard any past information but adjusts the weights given to past data so that older data get increasingly less weight. Each new forecast is based on an average that is adjusted each time there is a new forecast error.

The raw data sequence is often represented by $x_t$, and the output of the exponential smoothing algorithm is commonly written as in Equation 2, which may be regarded as the best estimate of what the next value of $x$ will be. Adaptive exponential smoothing methods allow the smoothing parameter to change over time, in order to adapt to changes in the characteristics of the time series.
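For reference, the textbook form of simple exponential smoothing is $s_0 = x_0$ and $s_t = \alpha x_t + (1 - \alpha) s_{t-1}$ with $0 < \alpha < 1$. The sketch below implements that standard recurrence; it is assumed, not confirmed, that the paper's Equation 2 has this form, and the function name is illustrative.

```python
def exponential_smoothing(x, alpha):
    """Standard simple exponential smoothing: s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    smoothed = [x[0]]  # initialise with the first observation
    for value in x[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10, 20, 20, 20], 0.5))
```

A larger `alpha` makes the smoothed series react faster to new observations, while a smaller one gives older data more influence.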

We present a new adaptive method, which enables the smoothing parameter to be modeled as a linear combination of the trading volume, trend, and momentum. $D_i$ is the actual stock value on the $i$th day, $F_i$ is the forecast stock value at time $i$, and $Z_i$ is the deviation of the forecast value at time $i$.
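A minimal sketch of such an adaptive scheme is given below. It assumes the error-correction update $F_{i+1} = F_i + \alpha_i Z_i$ with $Z_i = D_i - F_i$, and a smoothing parameter $\alpha_i$ formed as a clipped linear combination of volume, trend, and momentum indicators. The weights, the bias, the clipping to $[0, 1]$, and all names are hypothetical; the paper's own equations are not available in this copy.

```python
def adaptive_forecast(actual, volume, trend, momentum,
                      weights=(0.2, 0.2, 0.2), bias=0.3):
    """Return forecasts F_0..F_n using a per-day smoothing parameter alpha_i.

    weights and bias are illustrative assumptions, not values from the paper.
    """
    w_v, w_t, w_m = weights
    forecast = [actual[0]]  # F_0 initialised to the first actual value
    for i in range(len(actual)):
        deviation = actual[i] - forecast[i]                 # Z_i = D_i - F_i
        alpha = bias + w_v * volume[i] + w_t * trend[i] + w_m * momentum[i]
        alpha = min(1.0, max(0.0, alpha))                   # keep alpha in [0, 1]
        forecast.append(forecast[i] + alpha * deviation)    # F_{i+1}
    return forecast


prices = [10, 20, 20]
zeros = [0.0, 0.0, 0.0]
print(adaptive_forecast(prices, zeros, zeros, zeros, bias=0.5))
```

With all indicators at zero the scheme reduces to ordinary exponential smoothing with a constant parameter equal to `bias`.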

Figure: The closed-loop structure of adaptive exponential smoothing methods.

$V_i$ is the volume indicator of the $i$th day.

Many ANN models have been evaluated against statistical models for market forecasting. The most commonly used neural network technique in pattern recognition is the Multilayer Perceptron (MLP), which is applied to classification problems.

The MLP architecture trained with the back-propagation algorithm has been applied to stock price prediction. Two important characteristics of the MLP are its non-linear processing elements (PEs), which apply the sigmoid function in this research, and their massive interconnectivity. Sigmoid functions all share a similar S shape that is essentially linear in the center and non-linear toward the bounds, which are approached asymptotically.
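The role of the sigmoid PEs can be sketched with a forward pass through a network with one hidden layer. This assumes the logistic sigmoid and a single sigmoid output; the layer sizes, weights, and function names are illustrative and do not reflect the configuration used in the paper.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: S-shaped, roughly linear near 0, saturating toward 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: every input feeds every hidden PE, and every hidden PE feeds the output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Two inputs, two hidden PEs, one output; all weights here are made up.
print(mlp_forward([0.5, -1.0], [[0.1, 0.2], [-0.3, 0.4]], [0.0, 0.0], [0.7, -0.5], 0.1))
```

Back-propagation would then adjust these weights from the output error; that training step is omitted here.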

The back-propagation rule propagates the errors through the network and allows adaptation of the hidden PEs. The MLP is trained with error-correction learning, which means that the desired response for the system must be known.

Learning typically occurs by example through training, where the training algorithm iteratively adjusts the connection weights.

When the network is adequately trained, it is able to generalize relevant output for a set of input data. Training automatically stops when generalization stops improving, as indicated by an increase in the MSE of the validation samples.
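The stopping criterion described above can be sketched independently of any particular network: track the validation mean squared error (MSE) per epoch and stop once it has failed to improve for a set number of epochs. The `patience` value and the function names are assumptions for illustration, not settings reported in the paper.

```python
def mse(outputs, targets):
    """Mean squared error between network outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def early_stop_epoch(validation_mse, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation MSEs.

    Training stops once validation MSE has not improved for `patience` epochs.
    """
    best_epoch, best = 0, float("inf")
    for epoch, err in enumerate(validation_mse):
        if err < best:
            best_epoch, best = epoch, err
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(validation_mse) - 1


history = [0.9, 0.5, 0.4, 0.45, 0.5, 0.6, 0.7]
print(early_stop_epoch(history))
```

The weights from `best_epoch` are the ones usually kept, since later epochs fit the training data at the expense of generalization.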

MSE is the average squared difference between outputs and targets. In this study, the ANN input data include the top-down scores, the selected key technical indicators, and the forecast value.

The number of hidden neurons is … We set aside some samples for validation and testing. Figure 8 depicts the MSE decreasing after 57 epochs for TSMC.

Figure 8. MSE decreases after a period of training in TSMC.

Figure 9 shows the classification results for the whole testing period.
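Classification results of this kind are usually tabulated as a confusion matrix. The sketch below builds one with rows as actual classes and columns as predicted classes, and derives a true positive rate from it. The class labels and sample data are made up for illustration and are not the paper's results.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def true_positive_rate(matrix, positive_index):
    """Share of actual positives that were predicted as positive."""
    row = matrix[positive_index]
    return row[positive_index] / sum(row)


actual = ["up", "up", "down", "down", "up"]
predicted = ["up", "down", "down", "down", "up"]
cm = confusion_matrix(actual, predicted, ["up", "down"])
print(cm)
print(true_positive_rate(cm, 0))
```

Correct predictions fall on the diagonal, so any non-zero off-diagonal entry marks a misclassification.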

Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. All correct predictions lie on the diagonal of the table, so errors are easy to spot visually: they appear as non-zero values off the diagonal. Overall, the true positive rate is …

For training and evaluating the performance of the presented approach, trading-day stock data were considered.

The system was retrained daily. Stocks were bought whenever the forecast was positive, and the position was closed when the forecast became negative. Transaction costs were taken into consideration and amounted to … TSMC stock was tested first. Experiments were carried out on a personal computer. The system was coded in Microsoft VBA, and the neural network analysis was run in MATLAB.
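The trading rule above (buy on a positive forecast, close the position on a negative one, pay a cost on each transaction) can be sketched as a small backtest. The proportional `cost_rate`, the all-in position sizing, and the function name are assumptions for illustration; the paper's actual cost figure is not available in this copy.

```python
def backtest(prices, forecasts, cost_rate=0.001):
    """Return the total return of a long-only rule on a unit initial investment.

    forecasts[i] > 0 means a rise is expected after day i; < 0 means a fall.
    cost_rate is a hypothetical proportional transaction cost per trade.
    """
    cash, shares = 1.0, 0.0
    for price, signal in zip(prices, forecasts):
        if signal > 0 and shares == 0.0:
            shares = cash * (1.0 - cost_rate) / price   # open a position
            cash = 0.0
        elif signal < 0 and shares > 0.0:
            cash = shares * price * (1.0 - cost_rate)   # close the position
            shares = 0.0
    final_value = cash + shares * prices[-1]            # mark any open position
    return final_value - 1.0


print(backtest([10, 12, 11], [1, -1, -1], cost_rate=0.0))
```

Setting `cost_rate` above zero shows how quickly frequent trading erodes the return of a rule with a modest hit rate.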

It is noted that this period also includes the Great Recession, the European debt crisis, and the fiscal cliff of the United States. The moving hit rate is illustrated in Figure 10, which shows the hit rate since the first day of this period. A hit rate describes the success rate of an effort: it compares the number of times an initiative succeeded against the number of times it was attempted.
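The moving hit rate defined above is a running ratio of correct forecasts to forecasts made since the first day. A minimal sketch follows, with made-up boolean data and an illustrative function name.

```python
def moving_hit_rate(predicted_up, actual_up):
    """Cumulative hit rate after each day: correct forecasts so far / days so far."""
    hits = 0
    rates = []
    for day, (p, a) in enumerate(zip(predicted_up, actual_up), start=1):
        hits += int(p == a)
        rates.append(hits / day)
    return rates


predicted = [True, True, False, True]
actual = [True, False, False, True]
print(moving_hit_rate(predicted, actual))
```

Because each new day contributes less to the running ratio, the curve flattens over time, which is why the plotted rate appears to converge.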

The moving hit rate of TSMC converges toward …

Figure 10. The moving hit rate in the period of trading days.

Figure 11. The returns of investment and the variation of stock price in a year.

To compare the performance over different time spans, this period is broken into three sub-periods of one month, one quarter, and six months. The moving hit rate diagram shows the hit rate since the first day of each period. The return of investment is 3. It can be seen in Figure 12 that the moving hit rate converges toward …

Figure 12. The moving hit rate in the first period of 20 trading days in TSMC.

Figure 13. The returns of investment and the variation of stock price in the first period.

During the second period of 60 trading days, the results are even better. The return of investment achieves 8.

Figure 14. The moving hit rate in the second period of 60 trading days.

Figure 15. The returns of investment and the variation of stock price in the second period.

The portfolio return as compared to the initial investment is again considered. The return of investment achieves … The rate of accuracy of this period is … It is seen in Figure 16 that the moving hit rate converges toward …

Figure 16. The moving hit rate in the third period of trading days.

Figure 17. The returns of investment and the variation of stock price in the third period.

The performances of different periods of TSMC are summarized in Table 3. The proposed system made 82 transactions in the stock market during this period of trading days. The total trading period was also divided into three sub-periods that cover one month, one quarter, and six months, respectively.

The result of each period is summarized as follows: the accuracy rates achieved were 70, 75, and …

The same approach is also applied to the Evergreen stock. Two hundred forty trading days of Evergreen stock data were considered for training and evaluating the performance of the system, which was retrained daily.

The proposed system made 64 Evergreen stock transactions in the market during this period of trading days. Although the stock value dropped by 7. … To study the performance over different periods, we divide the data into periods of one month, one quarter, six months, and one year. The result of each period is summarized in Table 4. The rates of the stock price were 6. …

The proposed approach, which integrates various data mining techniques, has achieved remarkable results.

The investment returns of the TSMC and Evergreen stocks were … As all sub-periods of the TSMC and Evergreen trading generated profits over various numbers of trading days, the results indicate that the proposed system is highly effective for stock forecasting.

Rather than offering a ready-made tool, this research proposes a methodological system for stock forecasting. Each stock may have a different structure in the top-down theory, the dynamic time series, and the ANN, and may call for different choices in the technical analysis and the Bayesian probability. Hence, applications of the methodological system are not limited to the TSMC and Evergreen stocks.

In future work, we will apply the proposed system to the Nasdaq index as well as to some of the companies listed in it.

The authors would like to acknowledge the support from the research projects NSC E and NSC E of the National Science Council of Taiwan.

Chin-Yin Huang, Department of Industrial Engineering and Enterprise Information, Tunghai University, Taichung, Taiwan. Received 19 Nov.


### Application of integrated data mining techniques in stock market forecasting: Cogent Economics & Finance: Vol 2, No 1


Represent the position of the market on a percentage basis versus its range over the previous n -period sessions. Detect whether a stock is trading near the high or the low or in between of its recent trading range.


Measure the velocity and magnitude of directional price movements. Show the ratio of rising periods to the total, which indicates whether buyers or sellers are in control. Detect the offset between the daily stock price and a period moving average line. Determine directional movement by comparing the difference between two consecutive lows with the difference between the highs. Smooth the price movement so that the longer-term trend becomes less volatile and therefore more obvious. Work as a filtered measure of the derivative of the stock price with respect to time.