PREDICTION OF A RELATIONAL DATABASE ’ S OPERATION IN THE INFORMATION SYSTEM

Introduction. There exist various methods and systems of testing databases (DB) used in information systems (IS) [1...4]. They allow to simulate certain situations by changing the number of records in the table, increasing users’ number, a certain type querying intensity changes, etc. [5]. However, in most cases, the testing plan is based onto experience and intuition of the engineer/user conducting the test series and the entire database represents such test object. Analysis of recent research and publications. Most aspects of DB operation and its tables’ content are determined by the sequence of requests arriving to the input [5...7]. The essence of predicting the IS behavior consists in extrapolating trends of some system’s characteristics [8]. At this research the trend will be determined by approximating the time series’ actual values at querying the database. Further, it makes sense to limit such analysis with univariate forecasting functions where the argument factor is the time, considered as integral index of all factorial traits’ combined effect. In practice the most often used are about 15 univariate forecasting functions. In the case under consideration, only functions describing some IS characteristics’ increase or decrease that affect the systems ‘performance represent the interest subject. So further reasonable is to limit with two forecasting functions (regression functions): — linear 1 y a b t = + × , (1) — parabolic

Introduction.There exist various methods and systems of testing databases (DB) used in information systems (IS) [1...4].They allow to simulate certain situations by changing the number of records in the table, increasing users' number, a certain type querying intensity changes, etc. [5].However, in most cases, the testing plan is based onto experience and intuition of the engineer/user conducting the test series and the entire database represents such test object.
Analysis of recent research and publications.Most aspects of DB operation and its tables' content are determined by the sequence of requests arriving to the input [5...7].The essence of predicting the IS behavior consists in extrapolating trends of some system's characteristics [8].At this research the trend will be determined by approximating the time series' actual values at querying the database.Further, it makes sense to limit such analysis with univariate forecasting functions where the argument factor is the time, considered as integral index of all factorial traits' combined effect.
In practice the most often used are about 15 univariate forecasting functions.In the case under consideration, only functions describing some IS characteristics' increase or decrease that affect the systems 'performance represent the interest subject.So further reasonable is to limit with two forecasting functions (regression functions): -linear 1 y a b t = + × , parabolic 2 2 0 1 2 y a a t a t = + × + × , where y1, y2 -some of system's characteristics, e.g., average delay of queries execution across 24 hours period; t -time unit (e.g., 24-hours period); a, a0, a1, a2, b -coefficients subject to determining.The Aim of Research.To enable the purposed application of software targeting the IS improved operation, this research aim is assigned as developing a method for predicting the system behavior with identification of database components critical in terms of their impact on the whole IS performance.
Main Body.Let we consider a time series of requests received by the IS during observed time delay τ from the start t 0 to the end t e of the observation cycle.Each query q can be written as , , where txt -query text; s t , f t -query execution start and end time, respectively.
Use of the s t and f t at (3) allows the assumption of every query's unique character and consecutively makes possible to represent the queries' time-sequence as a set { }, 1,  where n -number of queries arrived to IS during τ delay.
To determine the queries' temporal characteristics the notion of requests' intensity distribution is used [9,10].For this the For fd correct representation it makes sense to adjust the requests time series by deleting the IS downtime and maintenance works time from the observation period i.e. excluding time when the system does not perform its basic functions.
The linear approximation of intensity function fd.To find the coefficients a and b at (1) we shall use the principle of least squares, which determine the extent of two functions values dispersion S .Then, according to [11] ( ) The intensity function fd parabolic approximation.According to the principle of least squares, for the forecasting function (2) [11] 1 ( ) 0 where P 1 , P 2 -Chebyshev's polynomials: Selecting the approximation function.To choose one of the prognostic functions the criterion can be based on a comparison of average prediction errors calculated for each of the two (linear and parabolic) functions: Preferred shall be the function, whose prediction error value is the least.If ( ) y fdr t = -the found regression functions, then the extrapolation operation shall be ( ) where L -prediction period.System loading trend evaluating.Here the system loading to be meant as the ratio of an observation period length to the total execution time of queries arrived during that period.Preparing data to regression function evaluation represents the calculation the each query interval's i t Δ execution time.Here we introduce the concept of system loading function ( ) fl t taking for fixed time moments i t values i l .

( )
where k Δ -number of queries arrived to IS during the interval i t Δ sj t -start point of j-query execution; fj t -end point of j-query execution; After performing the above operations, we obtain a regression function ( ) y flr t = according to (5) allowing to predict the system loads.
Evaluating the queries' average execution time trend.This characteristic allows estimating the system load degree influence onto queries execution time.The speed of query processing depends on several factors caused by database condition, e.g., the number of data contained at DB tables.In addition, the LAN can be loaded with other tasks execution: sending electronic documents, requesting to Internet resources, email, etc.
Preparing data for regression function estimation refers to calculating the average time for each query interval i t Δ .Entering a function of average query execution time ( ) fta t taking a value i Ta for fixed time i t moments, on its basis we can obtain a respective regression function ( ) y ftar t = that, according to (5) can predict the average execution time.
where sj t -start point of j-query execution; fj t -end point of j-query execution.
Analysis and prediction of peak loads.We assume the load for some time, to be peak one if its value is Kp times greater than the average value for the period of observation.
Average IS load during the time of observation τ Let we enter the peak load changes function depending on time ( ) fp t , which value for every Identification of the most frequently used tables and fields.When DB testing we must identify tables that are used more frequently while requests' processing.If the queries execution time is critical, we can use special tools to improve the IS performance, allowing to eliminate "bottlenecks" (replication, restructuring, indexing, etc.).We did specify the query model [9] including of information on used tables and query type , , , , where { }, 1, On the MT basis we can form a list Lmt, arranging the elements by the degree of decrease in pa- Identifying the tables and fields most frequently used at particular type queries.The proposed method of creating a list of most frequently used tables is easy for applying to a certain type of queries.For example, to find the most frequently used table type in queries SELECT, we must create a subsetbased on the queries set Qsel q q Q Ty ts = ∈ ∧ = .
For queries q Qsel ∈ we can establish both the tables set MTsel and hierarchically arranged list . For information about tables, whose entries amount grows the faster, we can analyze the requests type INSERT and DELETE, getting a set of numbers that define entries adding (removing) during the period of observation for each database table and the correspondingly ordered list.Results.Presented are the results of the proposed method of information system "Accounting at medical institutions" analysis.
The system uses a relational database consisting of 98 tables.
The study covered three months of the system operation.Fig. 1 shows a diagram of average execution time for queries of SELECT type Calculations according to (7) show that during next 6 months the average execution time may increase by 3,4 times.
Recommended is to improve the IS performance using software tools.For that, the ( 8) and ( 9) were used to obtain lists of most frequently required tables and their fields (Tables 1 and 2).Proposed is to introduce the indexation for: field idperson at table planersearchkind, field idlaboratory at table planersearchkindmore, fields datefactservicein, datesend at table factservice, and to revise the structure of tables planersearchkindmore and factservice, expanding them with such fields as: pllaboratory, plisown, plcode, duplicating the data of laboratory name, its institutional affiliation and laboratory analysis code respectively.
Performed is a study of IS peak loads (6) for Кр=1,5 value.The respective diagram (without byworking-hours details) is shown at Fig. 2.
Identified are the queries, creating increased system load during of peak load periods (Monday, end of the month).Recommended is to verify the possible use of materialized representations for these queries [9].Conclusions.The proposed forecasting method, based on the IS DB requests time series' analysis, allows to predict whole system behavior, identifying its elements which may affect its performance in the future and at peak loads, also allowing to determine approximate term of necessary modernization, with specific-purpose tests performing and simulation of system behavior.
A.B. Kungurtsev, S.L. Zinovatnaya, Munzer Al Abdo.Prediction of a relational database's operation in the information system.A necessary condition of any information system's efficient operation is to ensure that response time to user requests satisfies the subject area requirements.Actual is the problem of automated methods for choosing the tools reducing the execution time of queries to the database.Formal description of the system's state and development gives the possibility to apply the most effective ways of speeding up queries in time when there is a need to improve system performance.While researching some dependences are obtained, allowing to predict the intensity of requests entering the information system, to determine the trend of system's loading, to identify peak periods and set of the most commonly used tables and fields.The results on system operative features researching provide the possibility to predict system's behavior and to take timely measures to maintain the required level of performance.Keywords: information system, relational database, query, test, modeling, prediction.
Received December 24, 2014 Structure of considered and other lists may serve as a starting point for predicting changes in IS "local" performances.

ПраціFig. 1 .
Fig. 1.Changes in queries' execution average time Fig. 2. Peak loads identification time interval is defined as a constant for the entire period.The value t Δ can be set on the basis of the following considerations.The vast majority of observed IS functions have periodic character.Often the very short duration period is equal to the length of one working day.Selecting the t equal to one day.To search for peak loads we must significantly reduce this interval duration, for example, up to p Δ =0,3 h.Otherwise the averaging can "hide" the surging load.Further p Δ reduction is not convenient because a transient increase in load is undetectable for IS user.Average IS load for some interval of t Δ j p Δ period.
Especially for queries updating the data (INSERT, UPDATE, DELETE), we introduce the operation of identification of table containing the updated data of tables used at query q ; y T -query type.the operation result is "true" value, if the field ij f belongs to table i NT and "false" value if otherwise.

Table 1
Number of requests to DB tables