

Beneath the Surface
Regional Disparities in Aquaculture Production and Volume
Our project, "Aquaculture in the Philippines", utilizes data science techniques to analyze economic trends, assess regional variations, and evaluate the investment potential of the country's aquaculture industry. By examining the growth and distribution of various marine species across different regions, we aim to provide data-driven insights into the viability of further investment in Philippine aquaculture.
Rationale
There are three fisheries sectors in the Philippines: municipal fisheries, commercial fisheries, and aquaculture.
As of 2024, approximately 64.1% of the total volume of fisheries production comes from aquaculture, which contributed 683,934.30 metric tons. Despite a 7.3% decrease in production from 2023 to 2024, aquaculture remains a constant and reliable source of fish, especially for a country whose diet relies heavily on fish for protein.
Aquaculture's growing share of production can be attributed in part to overfishing and illegal fishing practices that destroy the marine environment, which lead to smaller hauls for the other fisheries sectors. Rising sovereignty tensions between the Philippines and China over control of the West Philippine Sea also contribute, driving local fishers away from their designated areas. This, in turn, creates a need to adapt to aquaculture practices.
The United Nations Food and Agriculture Organization (FAO) report titled "The State of World Fisheries and Aquaculture 2024", released in June 2024, also highlights the Philippines as one of the few nations that dominate the global aquaculture industry. For the Philippines to capitalize on future positive market developments, the shift towards aquaculture requires proper investment in infrastructure, training so that local fishers can transition smoothly and sustainably in light of recent events, and identifying specific trends that can boost existing aquaculture production.
Problem
The Philippines already has a dominant aquaculture sector, but recent statistics show that production decreased from 2023 to 2024.
Solution
By analyzing previous quarterly data about aquaculture, the group aims to find specific parameters and analyze trends that could have affected the value and volume of production. This also includes recent events or policies that directly or indirectly affect the sector.
Research Questions
Research Question 1
Does an increase in aquaculture production volume lead to a corresponding increase in its economic value?
Null Hypothesis
There is no significant correlation between aquaculture volume and value.
Alternative Hypothesis
There is a significant correlation between aquaculture volume and value.
Research Question 2
Is there a significant difference in aquaculture volume across different regions in the Philippines?
Null Hypothesis
There is no significant difference in aquaculture volume across different regions in the Philippines.
Alternative Hypothesis
There is a significant difference in aquaculture volume across different regions in the Philippines.
Research Question 3
Is there a significant trend in the volume of aquaculture production in the Philippines over the years?
Null Hypothesis
There is no significant trend in the volume of aquaculture in the Philippines over the years.
Alternative Hypothesis
There is a significant trend in the volume of aquaculture in the Philippines over the years.
Data & Methods
As of March 2025, our group has preprocessed the dataset necessary for the project.
Data Collection
To address the problem and research questions, our group opted to collect data from PSA's OpenStat website. The Philippine Statistics Authority's OpenStat website is an open data platform that shares national and regional statistical data in the Philippines.
About the Dataset
For this project, two datasets were used: quarterly aquaculture value and quarterly aquaculture volume for each region, spanning the first quarter of 2020 to the fourth quarter of 2024. The first dataset contains the value of each aquaculture species in each region, while the second contains the volume of each aquaculture species in each region. The finalized dataset used for analysis contains 6,800 rows and six columns: Species, Geolocation, Year, Quarter, Value, and Volume.
Data Preprocessing
For data preprocessing, four major steps were applied to both datasets to arrive at the combined final dataset. Before these steps, a preliminary check was done on both datasets to gain a brief understanding of their structure and format.
Step 1
Check Dataset Structure
The first step was to check the structure of both datasets. In our Colab notebook, we found that both datasets share the same structure: each contains species, geolocation, and the quarterly values or volumes for each species-geolocation grouping, laid out in wide format with one column per quarter.
Step 2
Apply Imputation
The second step was to identify missing values and apply imputation if necessary. Both datasets contain missing values in the form of "..." strings. We converted those strings into proper missing values so that KNN imputation could fill the gaps. We chose KNN imputation because simpler methods, such as mean, mode, median, backward fill, and forward fill, may reduce variance and risk data leakage. KNN imputation uses the other features to decide what value to impute instead of relying on a single column.
Step 3
Reformat Dataset
The third step was to reformat the columns and the datasets themselves. We skipped outlier removal since extreme values may have been caused by outside factors. We also took the source of these datasets into account: sampling and collection were done not by our group but by professionals, so we treat the values as reliable and consistent. The values in the Species column carry leading periods, which we removed; the same goes for the Geolocation column. After reformatting both columns, we transformed each wide-format dataset into long format using `melt()` in order to structure the data in a form suitable for time-series analysis.
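Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the quarter column headers (for example `2020 Q1`), the helper names `impute_wide` and `to_long`, and the `n_neighbors` value are all assumptions, and the missing-value marker is taken as quoted in this write-up.

```python
import pandas as pd
from sklearn.impute import KNNImputer

ID_COLS = ["Species", "Geolocation"]
# Assumption: the placeholder string exactly as quoted in this write-up.
MISSING_MARKER = "..."


def impute_wide(wide, n_neighbors=5):
    """Turn placeholder strings into NaN, then fill gaps with KNN imputation."""
    out = wide.copy()
    quarter_cols = [c for c in out.columns if c not in ID_COLS]
    cells = out[quarter_cols]
    # Cells equal to the marker become NaN; everything else must parse as a number.
    numeric = cells.where(cells != MISSING_MARKER).apply(pd.to_numeric)
    # Caveat: a quarter column with no observed values at all is dropped by
    # KNNImputer's defaults, which would break this assignment.
    out[quarter_cols] = KNNImputer(n_neighbors=n_neighbors).fit_transform(numeric)
    return out


def to_long(wide, value_name):
    """Strip leading periods from labels and reshape wide quarters into long rows."""
    out = wide.copy()
    for col in ID_COLS:
        out[col] = out[col].str.lstrip(".").str.strip()
    long = out.melt(id_vars=ID_COLS, var_name="Period", value_name=value_name)
    # Assumption: headers contain a 4-digit year followed by a quarter digit.
    parts = long["Period"].str.extract(r"(?P<Year>\d{4})\D+(?P<Quarter>[1-4])")
    long["Year"] = parts["Year"].astype(int)
    long["Quarter"] = parts["Quarter"].astype(int)
    return long[ID_COLS + ["Year", "Quarter", value_name]]
```

Imputing while the data is still wide lets each row borrow from rows with similar quarterly profiles, which matches the order of steps described here; imputing after melting would leave KNN with far fewer numeric features to compare.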
Step 4
Combine & Finalize Dataset
The fourth step was to combine both datasets into a finalized dataset that will be used for the analysis step. The finalized dataset can be found here.
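As a rough sketch of this step, the two long-format tables can be joined on their shared keys. The function name `combine` and the use of an inner join are assumptions; the column names follow the six columns listed under About the Dataset.

```python
import pandas as pd

KEYS = ["Species", "Geolocation", "Year", "Quarter"]


def combine(value_long, volume_long):
    """Join long-format value and volume tables into one six-column dataset."""
    merged = value_long.merge(
        volume_long,
        on=KEYS,
        how="inner",              # assumption: keep only rows present in both tables
        validate="one_to_one",    # fail loudly if a key is duplicated on either side
    )
    return merged[KEYS + ["Value", "Volume"]]
```

The `validate` argument is a cheap safeguard: if either table had repeated species-region-quarter rows, the merge would silently multiply rows without it.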
For reference, see Parts 1 and 2 of the Google Colab notebook.
Exploratory Data Analysis
This subsection summarizes a lengthy process. To learn more, follow this link.
Now that we've successfully preprocessed our data, we proceed to visualize and analyze our dataset through a series of plots from univariate to multivariate analyses.
Univariate Analysis
Univariate analysis focuses on the smaller picture of identifying characteristics of a feature from our dataset. On our Colab file, we were able to plot several of them for each possible feature.
Based on our plots, the data contain a sizeable number of values more than 3 standard deviations from the mean. The distribution is prominently right-skewed with a very long tail. We do not classify these values as outliers or remove them: they were gathered by professionals, and we trust that the data are not faulty, so they are kept for the remaining sections of this portfolio. These extreme values actually highlight the presence of a few large producers in the country, while the majority of producers operate at smaller volumes.
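The 3-standard-deviation check and the skewness claim can be quantified with a few lines of pandas. This is an illustrative sketch; the helper name `tail_summary` is hypothetical and not taken from the notebook.

```python
import pandas as pd


def tail_summary(series):
    """Count values beyond 3 standard deviations and report sample skewness."""
    z = (series - series.mean()) / series.std()
    beyond = z.abs() > 3
    return {
        "n_beyond_3sd": int(beyond.sum()),
        "share_beyond_3sd": float(beyond.mean()),
        # Positive skewness indicates a long right tail.
        "skewness": float(series.skew()),
    }
```

Calling `tail_summary(df["Volume"])` and `tail_summary(df["Value"])` on the finalized dataset would give the counts behind the statement above.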
Bivariate Analysis
Bivariate analysis on the other hand pairs up two features and identifies relations between them.
From our plots, certain species and regions stand out in terms of economic value and production volume. Milkfish leads with the highest mean economic value while seaweed dominates total production volume, a contrast between what is valued more and what is produced more in the market. Among regions, Central Luzon leads as the economic powerhouse in the Philippines.
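Rankings like these come down to grouped aggregations. A minimal sketch, assuming the finalized column names and a hypothetical helper `rank_groups`:

```python
import pandas as pd


def rank_groups(df, by, column, how):
    """Aggregate `column` per group and sort from largest to smallest."""
    return df.groupby(by)[column].agg(how).sort_values(ascending=False)


# Toy data for illustration only; these are not the PSA figures.
toy = pd.DataFrame({
    "Species": ["Milkfish", "Milkfish", "Seaweed", "Seaweed"],
    "Value": [900.0, 700.0, 200.0, 100.0],
    "Volume": [50.0, 40.0, 400.0, 300.0],
})
mean_value_by_species = rank_groups(toy, "Species", "Value", "mean")
total_volume_by_species = rank_groups(toy, "Species", "Volume", "sum")
```

The same call with `by="Geolocation"` would produce the regional rankings.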
Multivariate Analysis
Finally, we analyzed groups of features in order to identify trends from each species and region.
Based on our plots, the BARMM region leads in production volume while Central Luzon dominates in economic value. Among species, milkfish generates the highest economic value while P. vannamei (whiteleg shrimp) shows consistently high production volumes. Seaweed production remains significant across multiple quarters, with substantial regional variations in both production and economic returns. The box plots reveal that grouper has the highest economic value distribution per unit despite not having the largest production volume, indicating that it is a high-value species compared to others. These results reveal distinct regional variation, reflecting diverse ecological conditions and market specializations throughout the Philippine archipelago.
Hypothesis Testing
Research Question 1
Does an increase in aquaculture production volume lead to a corresponding increase in its economic value?
Ho: There is no significant correlation between aquaculture volume and value.
Ha: There is a significant correlation between aquaculture volume and value.
Selection of Statistical Test
The first research question deals with two quantitative variables: volume and value. Since we are testing specifically for a positive correlation between the two and not just any relationship, a one-tailed Pearson correlation test is the appropriate statistical method for the following reasons:
- Both variables are continuous and measured on an interval/ratio scale
- We are investigating a linear relationship between the variables
- The hypothesis is directional (specifically testing for a positive correlation)
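A one-tailed Pearson test of this kind can be run with SciPy as sketched below. The wrapper name `positive_correlation_test` is hypothetical, and the `alternative` argument requires SciPy 1.9 or newer.

```python
from scipy.stats import pearsonr


def positive_correlation_test(volume, value, alpha=0.05):
    """One-tailed Pearson test: is the correlation greater than zero?"""
    # alternative="greater" makes the p-value one-sided (Ha: r > 0).
    r, p_value = pearsonr(volume, value, alternative="greater")
    return {"r": float(r), "p_value": float(p_value), "reject_h0": bool(p_value < alpha)}
```

For example, `positive_correlation_test(df["Volume"], df["Value"])` on the finalized dataset would return the coefficient and one-tailed p-value discussed in the results.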

Statistical Analysis Results
The Pearson correlation coefficient (r) calculated between production volume and economic value is 0.4275, indicating a moderate positive linear relationship between the two variables. According to established statistical guidelines, correlation coefficients between 0.3 and 0.5 typically represent moderate positive correlations.
The one-tailed p-value is less than 0.0001, well below our alpha threshold of 0.05. This extremely small p-value means that, assuming the null hypothesis is true, the probability of observing a correlation this strong or stronger by random chance alone is less than 0.01%. Therefore, we have sufficient statistical evidence to reject the null hypothesis.
Visual Analysis and Implications
To visually support this claim, we reuse our plot from 3.2.7, now with a regression line added.

The scatter plot reveals an interesting pattern that might not be fully captured by a single correlation coefficient. Upon visual inspection, there appear to be two distinct clusters within the data, suggesting a potentially more complex relationship. We therefore use K-means clustering with two clusters to separate these visually identifiable groups.
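A two-cluster K-means split of the volume-value points might look like the sketch below. Standardizing both axes before clustering is an assumption made here because value and volume are on very different scales; the write-up does not say whether the notebook scaled the data, and the helper name `two_clusters` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def two_clusters(volume, value, seed=0):
    """Label each (volume, value) point with one of two K-means clusters."""
    points = np.column_stack([volume, value]).astype(float)
    # Assumption: scale both features so neither dominates the distance.
    scaled = StandardScaler().fit_transform(points)
    model = KMeans(n_clusters=2, n_init=10, random_state=seed)
    return model.fit_predict(scaled)
```

Cluster numbering (0 or 1) is arbitrary in K-means, so which group ends up labeled 0 can change with the seed.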

The resulting clustering is quite different from what we might have imagined (most would simply split the original scatter plot in two with a single dividing line). But do these two clusters share the same regression line as the original plot?

The first cluster has a steep positive slope while the second cluster has a shallower negative slope. Given these two plots, it is hard to say what characterizes approximately 66% of the production volume values, but past a certain point, economic value tends to taper off.
This statistical result has economic implications. The positive correlation between production volume and economic value suggests that, in general, higher aquaculture production is associated with higher economic returns. However, the cluster analysis reveals a variation where:
- Cluster 0: A strong positive relationship exists (y = 7.905x + 56,991.36), suggesting efficient market conditions where increasing production significantly boosts economic value.
- Cluster 1: The relationship becomes negative (y = -2.059x + 3,893,432.51), indicating diminishing returns, likely due to market saturation or price suppression at high volumes.
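Per-cluster lines of the form y = slope * x + intercept, like the two listed above, can be obtained by fitting an ordinary least-squares line within each cluster. The helper below is a hypothetical sketch using `scipy.stats.linregress`; it is not necessarily how the notebook produced its equations.

```python
import numpy as np
from scipy.stats import linregress


def cluster_lines(volume, value, labels):
    """Fit value = slope * volume + intercept separately for each cluster label."""
    volume = np.asarray(volume, dtype=float)
    value = np.asarray(value, dtype=float)
    labels = np.asarray(labels)
    lines = {}
    for k in np.unique(labels):
        mask = labels == k
        fit = linregress(volume[mask], value[mask])
        lines[int(k)] = (float(fit.slope), float(fit.intercept))
    return lines
```

A sign change in the slope between clusters, as reported here, is what signals that a single regression line over all points hides two different regimes.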
These findings make sense because they reflect the law of supply and demand. Past a certain point, each additional unit of the same good is valued less than the previous one because of diminishing marginal utility. There are also income effects and demand shifts to consider, especially since the Philippines is a developing country currently facing high taxes. In conclusion, where producers face market and policy constraints, maximizing production may not always maximize profit.
Research Question 2
Is there a significant difference in aquaculture volume across different regions in the Philippines?
Ho: There is no significant difference in aquaculture volume across different regions in the Philippines.
Ha: There is a significant difference in aquaculture volume across different regions in the Philippines.
Selection of Statistical Test
For this research question, we are dealing with a categorical variable as a predictor (region) and a quantitative outcome variable (production volume). Based on this, the appropriate statistical test for comparing more than two independent groups is the Kruskal-Wallis H test, a non-parametric alternative to one-way ANOVA when normality cannot be assumed.

Statistical Analysis Results
Using SciPy's one-way ANOVA (`f_oneway`), we obtained an F-statistic of 45 and a p-value of $3.25 \times 10^{-35}$. Given the extremely small p-value (far below any conventional alpha level), we reject the null hypothesis that all regions have the same mean aquaculture volume.
This result indicates a statistically significant difference in aquaculture production volume across different regions in the Philippines. The variation is substantial enough to merit closer examination of regional contributions and disparities.
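The test selection names Kruskal-Wallis while the reported statistic comes from `f_oneway`, so a sketch that runs both on the same regional groups is a convenient way to check that they agree. The function name `compare_regions` and its default column names are assumptions.

```python
import pandas as pd
from scipy.stats import f_oneway, kruskal


def compare_regions(df, group_col="Geolocation", value_col="Volume"):
    """Run Kruskal-Wallis and one-way ANOVA on volume grouped by region."""
    groups = [g[value_col].to_numpy() for _, g in df.groupby(group_col)]
    h_stat, p_kruskal = kruskal(*groups)   # rank-based, no normality assumption
    f_stat, p_anova = f_oneway(*groups)    # compares group means
    return {
        "kruskal_h": float(h_stat),
        "kruskal_p": float(p_kruskal),
        "anova_f": float(f_stat),
        "anova_p": float(p_anova),
    }
```

With data as right-skewed as described in the exploratory analysis, the Kruskal-Wallis p-value is the safer one to report; if both tests reject, the conclusion does not depend on the normality assumption.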
Visual Analysis and Implications
This tells us that each region contributes differently, and no single figure can summarize the volume of aquaculture that each region provides. The discrepancy helps us identify weak points to prepare for or plan around; a lack of support or some hidden cause may be behind it.
This is further shown by the boxplots below, which exclude outliers for each region. BARMM is an interesting case, with a healthy production volume. BARMM is located in the southwestern part of the Philippines and comprises several islands and an area of mainland Mindanao. Given this geography, we can already infer why BARMM reaches such high production volumes.

Several other regions, especially in the Visayas, are also surrounded by large bodies of water, but on closer inspection Western Visayas has a larger share of production than Central and Eastern Visayas.
Western Visayas has a well-established and well-supported aquaculture industry, backed by strong infrastructure and government investment. Key highlights include Capiz as the seafood capital, the Iloilo Fish Port Complex as a major trading and processing hub, a rehabilitated hatchery in Aklan producing up to 10 million bangus fry annually (operational since 2021), and plans for a ₱30 million aquaculture feed mill to lower feed costs.
Central Visayas engages in aquaculture, particularly in Cebu and Siquijor, but faces economic and social challenges. A 2015 study found that while the sector offers employment, jobs are often non-permanent and poverty remains high in surrounding communities. Most farms use basic infrastructure like mud-bottom ponds and rely on commercial feed and free-flowing water systems, raising concerns about long-term sustainability.
Eastern Visayas shows aquaculture potential, with feasibility studies supporting hatchery development. However, the region is highly vulnerable to typhoons, which threaten aquaculture operations and infrastructure.

Aside from regions surrounded by bodies of water, another interesting case is Central Luzon, which is known for its vast farmland and agricultural products yet ranks second in production volume. The study by Manlosa et al. (2021) explores how institutional dynamics and environmental changes have driven the expansion of aquaculture in Central Luzon, particularly highlighting the conversion of rice paddies into fish farms. A key driver of this shift is saltwater intrusion in low-lying areas, which made rice farming less viable and prompted farmers, especially in Pampanga, Bulacan, and Bataan, to transition to brackishwater aquaculture. These areas, along with inland provinces like Nueva Ecija and Tarlac, now contribute significantly to aquaculture production, cultivating species such as bangus, sugpo, and tilapia in fishponds, cages, and reservoirs.
Research Question 3
Is there a significant trend in the volume of aquaculture production in the Philippines over the years?
Ho: There is no significant trend in the volume of aquaculture in the Philippines over the years.
Ha: There is a significant trend in the volume of aquaculture in the Philippines over the years.
Selection of Statistical Test
For analyzing the production volume trend over time, the Mann-Kendall Trend Test was selected as the appropriate statistical method. This test is ideal for this analysis because:
- It effectively detects monotonic trends in time series data
- It doesn't require the data to be normally distributed
- Its seasonal variant can account for the seasonal variations common in production data
- It is resistant to outliers, which appear to be present in the quarterly production volumes
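For readers who want to see what the test computes, here is a self-contained sketch of the original (non-seasonal) Mann-Kendall test with the usual tie correction and normal approximation. The function name `mann_kendall` and its return labels are choices made for this illustration; the write-up does not say which implementation the notebook used.

```python
import numpy as np
from scipy.stats import norm


def mann_kendall(values, alpha=0.05):
    """Two-sided Mann-Kendall trend test using the normal approximation."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    # S sums the signs of all later-minus-earlier differences.
    s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
    # Variance of S, reduced for groups of tied values.
    _, counts = np.unique(x, return_counts=True)
    ties = (counts * (counts - 1) * (2 * counts + 5)).sum()
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    # Continuity-corrected z-score.
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - norm.cdf(abs(z)))
    if p_value >= alpha:
        trend = "no trend"
    else:
        trend = "increasing" if z > 0 else "decreasing"
    return {"trend": trend, "s": float(s), "z": float(z), "p_value": float(p_value)}
```

Because only the signs of pairwise differences enter the statistic, a single extreme quarter shifts S by at most n - 1, which is why the test is robust to the large values noted in the exploratory analysis.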

Statistical Analysis Results
The test reports the trend as `no trend`, which means that no significant trend was detected, so we fail to reject `Ho`. This leaves the volume of aquaculture production in the coming years uncertain. This uncertainty has several implications, such as:
- Difficulty in long-term planning for supply chain management, including storage, distribution, and market pricing.
- Challenges in policy-making for the aquaculture sector, as the absence of a clear trend makes it harder to design effective interventions or subsidies.
- Risk for investors and stakeholders in the aquaculture industry who rely on production forecasts for business decisions.
- Potential instability in food security, especially if future volumes fluctuate unexpectedly, affecting local consumption and export goals.
Visual Analysis and Implications
This can be further illustrated by the time series graph below which shows that there is indeed no apparent trend for the past five years.


From these plots, we learn that the quarterly fluctuations show no apparent upward or downward pattern. There are some notable peaks in Q4 2021, Q3 2022, and Q3 2023, with the highest volume reaching nearly 6,000 metric tons. Q4 2021 and Q3 2022 are interesting cases because the country was still under the COVID-19 pandemic at the time.
What's our take? In a nutshell...

With these things in mind, we get to see a bigger picture. Having resources is not enough for an economy to work without proper support. Even though a large portion of the Visayas is surrounded by water, much of it still lags in production volume. But is that the end of the story? No, there are prospering regions that defy this pattern. Central Luzon is one of the leading regions in production volume even though it is not as geographically endowed with bodies of water. This might mean one thing: other regions are not trying.
Or it could be that the disparity lies in access to aquaculture infrastructure, policy support, and investment in technology. BARMM, for instance, dominates production, suggesting that when regions are empowered, productivity can skyrocket regardless of long-standing challenges. Meanwhile, the map also reveals that economic value doesn’t always correlate directly with production volume, hinting at deeper systemic issues in pricing, value chains, and market access. This further emphasizes the need for a more equitable and strategic approach to aquaculture development across regions.
Time Series Forecasting
For the final part of our project, we were tasked with applying a machine learning model to our dataset. Since our dataset is a time series in nature, our group opted to use an autoregressive integrated moving average (ARIMA) model to forecast the next quarter.
Since our dataset has few data points for every region-species combination, we opted to fit and tune existing models through the APIs of statsmodels and Prophet instead.
Model
Future volume can be forecast using Facebook's Prophet library, which is designed for time series data that exhibit trends and seasonality. The procedure is robust to missing data and shifts in trend, especially when taking into account the strong seasonal effects common in the archipelago.
Prophet needs a date for each observation, so each quarter is mapped to its quarter-end date:
- Quarter 1: March 31
- Quarter 2: June 30
- Quarter 3: September 30
- Quarter 4: December 31
By grouping specific quarters with their respective volumes for each region-species to a time-series format, we can fit the model and start predicting the volume for upcoming years.
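The quarter-to-date mapping and the Prophet fit can be sketched as follows. This is an illustration under stated assumptions: the `Quarter` column is assumed to hold integers 1 to 4, the helper names are hypothetical, Prophet is left at its default settings, and passing a pandas offset object as `freq` is assumed to behave like a frequency alias.

```python
import pandas as pd

# Quarter number -> (month, day) of the quarter-end dates listed above.
QUARTER_END = {1: (3, 31), 2: (6, 30), 3: (9, 30), 4: (12, 31)}


def to_prophet_frame(pair_df):
    """Build the `ds`/`y` frame Prophet expects for one region-species pair."""
    parts = pd.DataFrame({
        "year": pair_df["Year"].to_numpy(),
        "month": [QUARTER_END[q][0] for q in pair_df["Quarter"]],
        "day": [QUARTER_END[q][1] for q in pair_df["Quarter"]],
    })
    frame = pd.DataFrame({
        "ds": pd.to_datetime(parts),
        "y": pair_df["Volume"].to_numpy(),
    })
    return frame.sort_values("ds").reset_index(drop=True)


def forecast_volume(frame, quarters_ahead=4):
    """Fit Prophet on a `ds`/`y` frame and predict the next few quarter ends."""
    from prophet import Prophet  # imported here so the helper above works without it

    model = Prophet()
    model.fit(frame)
    future = model.make_future_dataframe(
        periods=quarters_ahead, freq=pd.offsets.QuarterEnd()
    )
    forecast = model.predict(future)
    # yhat is the point forecast; the other two columns bound the uncertainty band.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

A typical call would filter the finalized dataset to one region and one species, pass the result through `to_prophet_frame`, and hand that frame to `forecast_volume`.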
Limitations
There are a few limitations regarding the model, specifically in its dataset.
- Limited data points per region-species pair:
Currently, the dataset holds only 20 data points for each region-species pair. This is a problem for forecasting because the model has too little history to learn from, which can lead to overfitting and forecasts that are not meaningful.
- Limited external factors for analysis and modelling:
Aside from the lack of data points, the features in the dataset do not cover all factors that may affect the studied variables. This may lead to spurious correlations caused by exogenous variables unaccounted for in the analyses.
Forecasting Model
To generate a forecast, pick a specific region and species. The applet outputs a forecast plot showing a trend line with uncertainty bands, along with the observed points from the dataset.

Click here for the Hugging Face Gradio app
Conclusion
In this study, we learned about aquaculture systems in the Philippines. By applying statistical analyses to regional disparities, we identified a moderate positive correlation between production volume and economic value, which suggests that an increase in production can contribute to economic returns. However, there is no clear trend in production over time, which may reflect the sector's complexities and challenges.
Call to Action
These challenges provide opportunities for policymakers, researchers, and industry stakeholders to collaborate and develop strategies that address the root causes of these problems. Investment in underperforming regions, enhanced support systems, and active monitoring of production data are just some of the efforts that may be undertaken based on our report. Integrating additional factors aside from the ones considered in this study may help ensure both economic viability and food security.
Recommendation
From here, the team recommends identifying additional factors to better understand the correlation between the variables. Also, when dealing with time series data, we highly recommend using finer-grained data, since trends are easier to identify and analyze at finer levels of detail.