Correlability of Solar Wind with Seismic Events in the Balkan Peninsula Zone

. The Solar Heliospheric Observatory (SOHO) satellite was launched on the 2 of December 1995 at L1 Lagrange point (1.5×10 km from Earth) with the purpose of gathering data for helioseismology, remote sensing of the solar atmosphere, and solar wind in situ . The satellite was positioned into orbit in early 1996, with data acquisition expected to commence on January 20 The correlation between increased values of solar wind parameters and earthquakes in the Balkan peninsula zone between 1996 and 2018 was made possible by data obtained through continuous proton density and proton velocity monitoring. The assessment of the anomalous threshold was based on statistically determined parameters due to the huge fluctuation of solar wind over time and distinct value increases of proton density and speed. Visual repre-sentations of proton density and proton speed were created for the time window preceding each earthquake after defining the boundary between normal and anomalous values. According to the chart analysis, increased proton density occurred in 40 of the 50 cases observed, whereas increased proton velocity appeared in 28 of the 50 cases. Using hypergeometrical probability and an unbiased test with randomly generated parameters, the discovered correlation was statistically verified. A retrospective selection bias analysis is also provided in the research paper.


Introduction
The term "precursor" refers to a wide variety of physical phenomena that are used for earthquake prediction (CiCErOnE et al., 2009). Earthquake prediction is a branch of seismology that seeks to predict forthcoming earthquakes in the short, medium, and long term. Deterministic approach in earthquake prediction requires the prior knowledge of the earthquake epicenter's geographical latitude and longitude, magnitude, and time, which can be seen as unrealistic compared to a probabilistic approach (similar to weather forecasting). The probabilistic approach yields a likelihood that an earthquake will occur in a given region at some time span. Every earthquake prediction method should be based solely on statistics i.e., probability (kAmEr et al., 2021).
in order for earthquakes to be predicted with high certainty, the search for new precursors is ongoing still to this day. Today, one group of precursors are astronomical precursors. modern research showed that there is a strong correlation between solar wind parameters (density, velocity, dynamic proton pressure, etc.) and global earthquakes with a minimum magnitude of m5. 6 (mArCHiTELLi et al., 2020). A positive correlation was also displayed with earthquakes that occur on a global scale and an increase in proton density (STrASEr & CATALDi, 2014). Aside from proton density, vertical Z Earth's magnetic field component was positively correlated with earthquakes with a magnitude m6.0 (STrASEr & CATALDi, 2014).
it is also worth noting that increased solar wind parameter values in the days leading up to the earthquake can't be considered as a precursor. Earthquake precursors are physical phenomena that occur as a result of an unstable subsurface state in a particular region. As a result, precursors are the result of accumulated stress. increases in solar wind parameters can only be considered possible triggers. increased solar wind parameter values can act as a "straw that broke the camel's back" by accelerating up an earthquake that would have happened anyhow and was tectonically controlled because of the tectonically accumulated stress (mULArgiA, 1997;mULArgiA, 2001) Even if the increased solar wind parameters do have an effect on tectonically controlled earthquakes acceleration, it is crucial to understand the modulation they cause. proposed mechanisms of the Sun's influence on earthquakes should be discussed to get a better knowledge of those modulations. SimpSOn (1967) presented the first mechanism, which states that the magnetohydrodynamical interaction of the solar and terrestrial magnetic fields can affect the Earth's angular velocity. Within that mechanism SimpSOn (1967) states there are two submechanisms that can alter the Earth's angular velocity, causing earthquakes. The first sub-mechanism states that the continents and oceans are dynamically unstable. Variations in the rotating velocity of the Earth can produce changes in the subsurface stress level (SimpSOn, 1967). The second sub-mechanism states that the Earth's viscosity prevents it from adapting to the angular velocity change. inability to adapt to a new state might produce tension in the upper part of the Earth's crust, causing an earthquake.
During the construction of a precursory hypothesis, the main goal is to make sufficient advancement compared to the already existing precursory hypothesis. Because of that, a balance should be made between generalities and particularities (rHOADES & EViSOn, 1989;mULArgiA, 1997). prior mentioned research (particularly mArCHiTELLi et al., 2020 andCATALDi, 2016) gave a foundation for a statistical analysis of the correlation between solar wind parameters and earthquakes that occur in a region of complex (seismo)tectonic architecture, such as the Balkan peninsula.

Methodology
The CELiAS proton monitor (solar wind proton density and velocity data) and the United States geological Survey (USgS) Earthquake catalog were used in this study. These two databases were used for a period of time ranging from 1996 to 2018.
The proton density and velocity datasets contain values measured every 30 seconds, i.e., a dataset of 1051200 datapoints is regarded full for a year with 365 days, while a dataset of 1054080 datapoints is considered full for a leap year (366) days. The true datapoint number in each dataset was computed, and the results ranged in the order of 90 percent capacity, except for 1998, where the datapoint capacity in the dataset is 59 percent. The explanation for the discrepancy is that the satellite lost control, lost power, and was no longer directed at the Sun from June 24 until October 29. The satellite was put back into operation towards the end of October 1998.
From 1996 to 2018, the USgS website provided an earthquake catalog at a worldwide scale with a minimum magnitude of m5.0. Datasets containing information about earthquakes in the Balkan peninsula zone (mainly Dinarides, Carpatho-Balkanides, and the pannonian Basin) were selected from the worldwide datasets. in the Balkan peninsula region, 52 earthquakes with a minimum magnitude of m5.0 occurred over a 23 year period. Only 50 earthquakes were considered in the final analysis since two earthquakes that occurred on the territory of the republic of Serbia occurred during a period when the satellite did not acquire any data. Three earthquakes that are thought to have occurred on the border between Albania and greece were also taken into account. it should also be mentioned that the republic of Hungary's territory was assumed to span a larger region of the pannonian basin, despite the fact that no earthquakes with a minimum magnitude of m5.0 occurred on the territory from 1996 to 2018. Aside from the earthquakes already described, three more earthquakes in the Adriatic Sea region have been added.
it is worth noting that the earthquakes were simply filtered from the USgS database by the name of the country in which they occurred. The main goal was to cover as much of the Balkan peninsula as possible, including the Dinarides, Carpatho-Balkanides, and pannonian Basin, which are three separate geological units in the Balkan peninsula. The digital elevation model (DEm: see references nATiOnAL CEnTErS FOr EnVirOnmEnTAL inFOrmATiOn, for the internet link) base can be used to map the earthquakes outlined previously (Fig. 1). The DEm utilized was gLOBE, which has a 1 km resolution. Figure 1a shows that the majority of earthquakes occur in Albania (Dinarides) and romania (Carpatho Balkanides), accounting for 68 percent of all earthquakes (34/50). The area of the researched surface in relation to the Earth's surface can be observed in figure 1b. The examined area is only 0.41 percent of the overall Earth's size, with a total area of 2 092 846 km 2 .
The analysis was carried out with a total of six steps which will be discussed below: Step 1: Basic statistical calculation for obtaining the information about the mean value, standard deviation, minimum, and maximum values for the yearly dataset. This type of data provides a foundational understanding of the dataset.
Step 2: Determination of the anomaly threshold, which is the line between what is considered normal and what is considered anomalous. This is done by calculating skewness and kurtosis (and two standard errors of skewness and kurtosis), as well as basing the anomalous threshold on n standard deviations. According to the "33/67/99.7" rule (empirical rule used to estimate the percentage of data that falls within one, two, or three standard deviations), three standard deviations for normal distributions can be considered anomalous since they correspond with only 0.3 percent of data. The n is greater for distributions with larger skewness. This stage calculates the percentage of data that is equal to or greater than the anomaly threshold; this information is used to check the quality of the determined anomaly threshold. That value should not exceed 1% on average throughout the whole 23 year sample.
Step 3: Construction of graphic representation (charts) for the two weeks leading up to each earthquake.
Step 4: identifying which earthquakes have a proton density (velocity) anomaly in the two-week, one-week, and four-day periods prior to the earthquake.
Step 5: Hypergeometrical probability based statistical significance test. in this stage, the number of unique anomalous days for each year is calculated, and the hypergeometrical probability is calculated to see if the anomaly occurred more frequently than might randomly. in addition, an independent test is run to corroborate the hypergeometrical probability as a statistical significance test.
Step 6: Analysis and interpretation of the results from the previous five steps. The correlation is thought not to be significant, i.e., can be thought to be random, if the probability obtained by the hypergeometrical probability coincides in the range of 10% with the number of times analyzed earthquakes had an anomalous proton density (velocity) value in the time window chosen.

Results and Discussion
in a yearly format, Table 1 shows general statistical information of the proton density parameter. The mean value in the database (from 1996 to 2018) shows a decreasing trend until 2008, after which there is a modest increase until 2015, after which the mean yearly proton density decreases again (Fig. 2, middle). The average number of protons per cubic centimeter over the entire dataset is 5.3.
With the exception of 2018, which has a standard deviation of 2.86 cm -3 , standard deviation values are consistent, ranging from 3 to 5 protons per cubic centimeter. The overall dataset's mean standard deviation is 4.08 cm -3 .
For the time span 1996 to 2010, the minimum values are -1. This is not a representation of the measured proton density, but rather a lack of data (mainly of technical nature). minimum values after 2010 are either 0 or a number greater than 0. Values less than 0 were filtered for the period up to 2010.
The maximum values in each dataset range from 53 (1996) to 127 (2013) protons per cubic centimeter. With two maximums (2003 and 2012) and three minimums (1996, 2007, and 2018), there is some type of pattern in the maximum levels (Fig. 2, bottom).
Because of the displayed periodicity of the mean and maximal proton density values, a correlation with the number of Sunspots in the same period was performed ( Fig. 2, top). To visually represent the Sun spot number, a database of Sunspot numbers was taken (see references SiLSO data, royal Observatory of Belgium, Brussels for the internet link) and a local regression (LOESS) filter was applied.
For the period 1996 to 2018, the sunspot chart (Fig.  2, top) shows three minimums and two maximums, indicating two cycles (Solar cycles 23 and 24). minimum values can be recorded in 1996, 2008, and 2018 (on average every 11 years), where as maximum values are also located at an 11 year time period.
Two correlation coefficients, pearson's and Spearman's, were calculated to assess the correlation between these three factors. pearson's correlation coefficient is a traditional correlation coefficient that can occasionally reveal flaws, such as when one data point appears to be outside the general trend of other data points. pear son's correlation coefficient will yield a lower value than the true correlation coefficient (SCHOBEr et al., 2018). To acquire more detailed information, both correlation coefficients were calculated, and a description and a color based on the value range were assigned, as reported by SCHOBEr et al. (2018). Table 2 shows that maximum proton density values have a weak correlation with the Sunspot number calculated using both pearson's and Spearman's correlation coefficients. On the other hand, the pearson correlation coefficient shows a weak correlation with annual mean proton density values, whereas the Spearman correlation coefficient shows a moderate value.
To determine the anomalous threshold, the skewness, kurtosis, and two standard errors of skewness and kurtosis must be calculated first (table 3). From 1.76 in 1996 to 5.34 in 2007, the skewness parameter shows a wide range of fluctuation. Skewness takes an average 3.37 value over the course of the observed period. Because the skewness parameter informs us about the "length of the tail" of the distribution, the skewness is minimal for the lowest maximum value of yearly proton density. The contrary is not true, skewness is not maximal for the highest maximum values in the dataset (2013) Average values of skewness and kurtosis are higher than the approximated two standard error values of skewness and kurtosis for the investigated period. This information leads us to the conclusion that the proton density distribution is highly horizontally and vertically distorted, as was verified graphically in figure 3. Because of that, higher multiplicities of standard deviations can be used as the anomaly threshold for each year. For this research, five standard deviations will be used as the anomaly threshold. The anomaly threshold for each year will thus be calculated as:   pd anomaly (y) = pD m (y) + 5SD(y)

Fig. 2. Sunspot number (top) correlation with yearly mean proton density values (middle); Yearly maximal proton density values (bottom)
where: pD anomaly (y) -anomaly threshold for each year, pD m (y) -mean proton density value for each year, SD(y) -standard deviation calculated for each year.
it is possible to calculate the anomaly threshold and the percentage of data that is equal to or greater than the anomaly threshold using the previously displayed data. The maximal anomaly threshold can be obtained for the year with a comparatively high yearly mean proton density and high standard deviation, since the anomaly threshold is solely determined by the mean yearly proton density value and the standard deviation value. The anomaly threshold values range from 17.97 protons per cubic centimeter (2018) to 33.46 protons per cubic centimeter (2002), as shown in table 4. The overall dataset's mean anomaly threshold value is 25.69 protons per cubic centimeter. The percentage of data that is equal to or higher than the anomaly threshold varies between 0.24 percent (1996) to 0.67 percent (2015). The average percentage of data over the anomaly threshold is 0.49 percent, which is lower than the expected value of 1%.

Correlability of Solar Wind with Seismic Events in the Balkan Peninsula Zone
Geol. an. Balk. poluos., 2021, 82 (2)   After calculating the anomaly threshold, determining whether earthquakes had anomalous proton density values in the two weeks, one-week, and four-day periods leading up to it was done. Table 5 shows the individual earthquakes examined in this study, as well as the occurrence of the anomalous proton density value in the time interval before it denoted by the symbol "•". Table 5 shows that in the two-week period leading up to an earthquake, 80 percent of earthquakes (40/50) exhibit anomalous proton density values. This number lowers to 62 percent, or 31 out of 50 earthquakes, in the week leading up to an earthquake. For the four days leading up to each analyzed earthquake, 23/50 (46 percent) showed a proton density anomaly.
The first step in determining the statistical significance of these findings is to establish the number of individual anomalous days in a calendar year. To begin, a definition of an anomalous day should be established: "Every day with at least one proton density measurement higher than the anomaly threshold value for a particular year is an anomalous day." This notion of an anomalous day has some disadvantages. The first is that it ignores the duration of the anomaly, i.e., one anomaly with a duration of 30 seconds and another with a duration of several hours are not the same. in keeping with the survey's generalities, it is considered that every anomaly has the same relevance, regardless of its duration.
After determining the number of individual anomalous days for each year in the dataset, the hypergeometric probability can be calculated, which is the probability that in a year with 365 days and, for example, 22 individual anomalous days, at least one anomalous day can be found in the chosen time window (in this case 14, 7 and 4 days prior to each earthquake). This probability shows the likelihood of finding a proton density anomaly in the time window chosen before each earthquake at random.
To validate the hypergeometric probability accuracy, or whether it is a good probability model for probability computation, an independent test was created with the same input parameters as the hypergeometric probability, with the exception that all of the parameters are picked at random. Figure 4 depicts the flowchart for such a test.
There are eight steps in the flow of the test probability. The first step is to choose a number at random from 1 to m-n (i.e., 365-14/7/4). The remaining n-1 numbers are calculated by adding to the first number +1, +2, +3, etc. This group of numbers is referred to as "population A," and it repre-

Correlability of Solar Wind with Seismic Events in the Balkan Peninsula Zone
Geol. an. Balk. poluos., 2021,82 (2), 69-83 sents the time period preceding the earthquake. The number of anomalous days in a year is represented by population B, which is picked at random from 1 to 365 integers. After both population A and population B have been determined, the two populations can be compared to see if they have the same numbers. After iterating the method 100 000 times, the percentage of times the two populations had at least one matching number was calculated.
This percentage is an approximation of the hypergeometric probability and is used to check the quality of the probabilities obtained. Table 6 shows the hypergeometric probability, the test probability based on the previously shown algorithm, and the difference between these two probabilities for the time windows of 14, 7, and 4 days before the earthquake. Table 6 shows that the average hypergeometric probability values for the time windows of 14, 7 and 4 days are 82.52 percent, 59.11 percent, and 40.32 percent, respectively. These probabilities are in line with the algorithm's probability, which is in the 2 percent range (maximal discrepancy being 2.26 percent for the time window of 7 days). Even if the difference between the two probabilities is small, it can be explained in two ways.
The first reason for the disparity is that as the number of algorithm iterations approaches infinity, the algorithm probability tends to hypergeometric probability. By increasing the number of iterations, this effect can be reduced.
The hypergeometric probability is a cumulative probability of having all anomalous days in a time window sample, which is the second cause of the tiny disparity between the hypergeometric probability and the algorithm probability. Even though the chance of having more than two anomalous days in the time sample is small, it is accounted for in the hypergeometric probability.
With the two explanations in mind, it is easy to see why there's such a modest difference between the hypergeometric and algorithm probabilities. As a result, the hypergeometric probability model is considered to be a satisfactory probability model for this study.
The obtained hypergeometric probability was compared to the calculated percentage of proton density anomaly occurrence for the time window chosen before the analyzed earthquakes on the Balkan peninsula zone (table 7). if the two values are within 10% of each other, any occurrence of proton density anomaly in the time window preceding an earthquake is regarded to be random. As a result, there is no statistically significant correlation between the two events.
The same method was applied for the proton velocity parameter, with anomalous proton velocity values found in 56 percent (28/50) of earthquakes in the two-week time frame prior to it, 32 percent (16/50) in the one-week time window, and 18 percent (9/50) in the four-day time window. Similar to  Table 6. Hypergeometric probability, algorithm probability and difference for the proton density parameter. the proton density case, no statistically significant correlation was discovered. The last possibility for correlation remaining is to check if the proton velocity parameter was a small subset of days with both the density and velocity anomalies. Only 66 days (0.7 percent of the dataset) showed both density and velocity anomalies across the 23 year period. Only 12% (6/50) had an anomaly over the two-week period, 4% (2/50) during the one week period, and 2% (1/50) during the four-day period preceding the earthquake. There was no statistically significant correlation found.

Increased Anomaly Threshold Test
Because of the skewness of the proton density distribution, bigger n standard deviation multiplici-ties could be used. The increase from five to nine standard deviations resulted in an average increase to 42 protons per cubic centimeter throughout the entire sample. This increase is 40% greater than the prior anomaly threshold, however, it has resulted in a 75% reduction in the number of anomalous days. An average of 11 days each year were marked as anomalous during the investigation period.
Only 28% (14/50) had an anomaly in the two-week period, 22% (11/50) in the oneweek period, and 12% (6/50) in the four-day period before the earthquake when the entire sequence was repeated as previously shown for the proton density parameter.
The predicted values were derived by recalculating the hypergeometric probability with the updated values (33 percent for the two-week period, 18.67 percent for the one-week period, and 11 percent for the four-day period). As can be observed, the expected values derived using hypergeometric probability and the calculated values are within 10% of one another, which is enough to rule out any statistically significant correlation even with higher proton density values.
The proton velocity parameter was treated in the same way, with the n standard deviations increased from three to three and a half. This was a 6 percent increase, but it resulted in a 47 percent reduction in the number of anomalous days. in the two-week period, 28 percent (14/50) had an anomalous value, 14 percent (7/50) had an anomalous value in the oneweek period, and 6 percent (3/50) had an anomalous value in the four-day period before an earthquake. There was no statistically significant correlation found. it is worth noting that even when the dynamical anomaly threshold is disregarded (the threshold is based on the mean yearly proton density/velocity value and the standard deviation for the year) and an exceptionally high and constant anomaly threshold is applied (100 cm -3 for proton density and 1000 km/s for proton velocity), only six individual anomalous days for the proton density are observed and 14 for the proton velocity parameter. Those two figures, respectively, represent 0.07 percent and 0.1 percent of the overall data set. Only one earthquake (Albania, April 9 th , 2001) displayed a proton density anomaly prior to the earthquake, while none had a proton velocity anomaly. There was no statistically significant association, as with the other examples previously shown.

Conclusion
A statistical correlation between increased solar wind parameters (proton density and velocity) and earthquakes on the Balkan peninsula zone between 1996 and 2018 was presented in this study paper.
in the two weeks leading up to an earthquake, the presented increased proton density parameters show an 80% (40/50) correlation. This is the highest correlation found in this study. There is no substantial statistical correlation between these two occurrences on the Balkan peninsula zone, according to statistical verification. proton velocity, like proton density, had no statistically significant correlation with earthquakes on the Balkan peninsula, even when they occurred 56 percent of the time prior to earthquakes. A small group of days with both proton density and velocity anomalies produced no statistically significant correlation when studied.
increasing the anomaly threshold for the proton density parameter from five to nine standard deviations and for the proton velocity parameter from three to three and a half standard deviations reduced the number of anomalous days by 75 and 47 percent, respectively. This reduction in the number of anomalous days did not result in a statistically meaningful improvement in the association. Furthermore, ignoring a dynamical anomaly threshold and concentrat-ing on rare occurrences (100 cm -3 and 1000 km/s) did not produce any significant results.
in summarizing the subjectivities in this study, the primary focus must be on the selection of the research area. mArCHiTELLi et al. (2020) found a positive association on a worldwide scale, however, this is of no practical utility because knowing when an earthquake will occur without knowing where it will occur is useless. The focus of this research article was on a smaller scale with a complex geological setting, however, the subjectivity involved in selecting such a location could not be adequately quantified.
The second subjectivity involves expanding the time span of 6.9 days stated by STrASEr & CATALDi (2014). As the time window is extended, days that are more anomalous will surely occur before an earthquake. This subjectivity, like the preceding one, cannot be measured, but it is thought that statistical verification will eliminate it.
The last subjectivity is that when kurtosis increases, the value of variance decreases (TABACHnik et al. 2007), and so the value of standard deviation decreases. if the standard deviation is less than the true value, the anomalous threshold will be less than the true value as well. Although this subjectivity is not quantifiable, it is eliminated by the use of the test of increasing anomalous thresholds.
Even when viewing this research topic with a critical mindset, it is believed that subjectivities in this study did not have a substantial, if any, influence. gEL - LEr et al. (1997) also express skepticism regarding the discovery of a new, major precursor, stating that the probability of discovering the next one decreases with each new attempt, and that the possibility of discovering a novel precursor is now exceedingly minimal.
To locate a novel precursor candidate, you must first have a well-understood and well-established mechanism for such an occurrence. There is no documented and verified mechanism that could explain the solar wind generating earthquakes phenomenon, for example. The earthquakes are assumed to be tectonically controlled, but the increase in solar wind parameters acts as a trigger.
Even though the presented research found no statistically significant association for the Balkan peninsula, it does not rule out the positive correlation found by mArCHiTELLi et al. (2020) on a worldwide Filip ArnAUt, DEjAn VUčkoVić, iVAnA VASiljEVić & VESnA CVEtkoV scale. To establish it as a viable method, first, the mechanism should be established, then the usage of a statistical method to demonstrate the statistical significance of such a method with high confidences and low false alarm values should be undertaken. This approach has yet to be established for any of the known precursors. сутна у 80 % случајева (40/50). Оваква вредност