Geographic information system-based spatial analysis of population distribution in Banten Province, Indonesia

Population distribution is one of the disaster vulnerability parameters needed in a disaster risk assessment. Several methodological alternatives exist for determining the spatial distribution of population. The general approach used in Indonesia is based on the results of a survey or census, where the population is distributed evenly within the administrative borders. Another approach, Random Regression Tree model-based Forest Mapping, is used by WorldPop. Both methodologies have their respective advantages and disadvantages. This study combines the two methods, adding some data and parameters as driving factors, at a spatial resolution (grid size) of 0.000833333 decimal degrees (approximately 100 m in the equatorial region) for a case study in Banten Province. Data processing is performed through raster analysis in a GIS. The results are more affordable, meet cost requirements, and can be utilized to calculate the level of disaster risk in the area.


Introduction
In order to conduct a disaster assessment, spatial data that illustrate the distribution of the human population are needed. In the disaster management context, population distribution is significant for estimating the elements at risk and reducing fatalities [1]. Accurate population information also has to account for the estimated growth of Indonesia's population over the 10 years following 2010, more than 50% of which will be absorbed into urban areas. Easily updatable population data are therefore in demand for research and other purposes. This demand can be met by using remote sensing and other geospatial data sets to refine population estimates. However, a new approach is needed to achieve accurate spatial population distributions.
In general, population data are provided in the form of statistical tables and presented spatially on a reference map. This spatial presentation is commonly referred to as choropleth mapping. Its disadvantage is that it cannot describe the occupied area in detail. In fact, the population distribution shown by choropleth mapping does not represent the actual population distribution [1]. For disaster risk analysis, this method is less suitable, especially for describing population exposure.
To obtain more detailed population data, some scholars, such as [2] and [3], have employed a dasymetric mapping method. Dasymetric mapping is an area-based thematic mapping method that produces more detailed spatial information [4]. It has the advantage of generating a more realistic spatial population distribution than choropleth mapping.
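The difference between the two mapping ideas can be illustrated with a minimal sketch. It assumes a single hypothetical village of eight grid cells and a simple binary dasymetric rule (population allowed only in settlement cells); the function names and numbers are illustrative and are not taken from the study.

```python
def choropleth_allocation(total_population, n_cells):
    """Spread the total evenly over every cell of the administrative unit."""
    return [total_population / n_cells] * n_cells


def binary_dasymetric_allocation(total_population, is_settlement):
    """Spread the total evenly over settlement cells only; other cells get 0."""
    n_settled = sum(is_settlement)
    if n_settled == 0:
        raise ValueError("no settlement cells to receive population")
    share = total_population / n_settled
    return [share if flag else 0.0 for flag in is_settlement]


# Hypothetical village: 1,000 people, 8 cells, 2 of them built up.
settlement_flags = [0, 0, 1, 1, 0, 0, 0, 0]
even = choropleth_allocation(1000, len(settlement_flags))
dasy = binary_dasymetric_allocation(1000, settlement_flags)
print(even)
print(dasy)
```

Both allocations preserve the village total; only the dasymetric one keeps unoccupied cells empty.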
Dasymetric mapping has developed along with spatial technology. One freely available spatial data set is the WorldPop data, which covers Indonesia. The WorldPop data are generated using random regression tree model-based forest mapping along with various variables assumed to affect the population density pattern. The predicted values are then used as a weighting surface to redistribute population density dasymetrically per grid cell, down to the village level. The spatial resolution of the WorldPop data is 0.000833333 decimal degrees (approximately 100 m in the equatorial region).
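As a rough check on the "approximately 100 m" figure, the cell size in degrees can be converted to metres. The sketch below assumes a spherical approximation using the WGS 84 equatorial radius, and takes 6.5 degrees south as a representative latitude for Banten (an assumption, not a value from the paper). Under these assumptions a 0.000833333 degree cell is about 93 m wide at the equator and slightly narrower at Banten's latitude.

```python
import math

EQUATORIAL_RADIUS_M = 6378137.0  # WGS 84 semi-major axis
CELL_SIZE_DEG = 0.000833333      # grid size quoted in the text (3 arc-seconds)


def east_west_cell_width_m(cell_deg, latitude_deg):
    """Approximate east-west width of a cell on a sphere of equatorial radius."""
    metres_per_degree = math.pi * EQUATORIAL_RADIUS_M / 180.0
    return cell_deg * metres_per_degree * math.cos(math.radians(latitude_deg))


width_equator = east_west_cell_width_m(CELL_SIZE_DEG, 0.0)
width_banten = east_west_cell_width_m(CELL_SIZE_DEG, -6.5)  # assumed latitude
print(round(width_equator, 1), round(width_banten, 1))
```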
For Indonesia in particular, the WorldPop data have some disadvantages. First, the 2015 IndoPop data (the Indonesia population data produced by WorldPop) overestimate the population compared with the figures from the Indonesia Statistical Agency (BPS) at the provincial level. Second, population density values are distributed not only within residential areas but also outside them. The quality of the IndoPop data therefore needs to be improved.
This research introduces a new method that improves population distribution data by combining the two methods mentioned above, yielding more reliable and more accurate population data.

Data
The data used, in the form of spatial data, consist of:

Amount of population (village level), tabular
The official source data are available on the internet. The village-level boundary data and the population data are obtained from BPS. The IndoPop data are available online, and the residential distribution is extracted from the 1:25,000 topographic map, which is also freely available (tanahair.indonesia.go.id) and can be opened in the ArcGIS ArcMap program.

Methodology
The methods used in this study are data pre-processing, improving IndoPop data, and spatial distribution of the population.

Data pre-processing
This step aims to prepare the initial data for further analysis. Since the data are processed in a raster environment, they must share a common format, cell size, and coordinate system. All data must use a Mercator projection such as WGS_1984_World_Mercator or Universal Transverse Mercator (UTM). The coordinate system can be transformed using ArcToolbox > Projections and Transformations > Project (for vector data) or ArcToolbox > Projections and Transformations > Raster > Project Raster (for raster data).
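When UTM is chosen, the zone has to match the study area. The sketch below uses the standard 6-degree zone rule and the EPSG numbering for WGS 84 UTM zones (326xx north, 327xx south). The sample coordinate is an assumed point in northern Banten, used only for illustration; the paper does not state which zone was used.

```python
def utm_zone(longitude_deg):
    """Standard 6-degree UTM zone number (1 to 60) for a longitude."""
    zone = int((longitude_deg + 180.0) // 6) + 1
    return min(zone, 60)  # longitude 180 belongs to zone 60


def wgs84_utm_epsg(longitude_deg, latitude_deg):
    """EPSG code of the WGS 84 UTM zone containing the point."""
    base = 32600 if latitude_deg >= 0 else 32700
    return base + utm_zone(longitude_deg)


# Assumed sample point in northern Banten (illustrative only).
lon, lat = 106.15, -6.12
print(utm_zone(lon), wgs84_utm_epsg(lon, lat))
```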

Improving IndoPop data
Since the IndoPop data have inaccurate population estimates and distribution, they need to be improved. The basic concept of this step is to distribute the population only within the settlement area. This step can be run through the following procedure:
1. Add a field of type Integer to the settlement polygon overlay. Assign 1 to the residential attribute and 0 to the non-residential attribute.
2. Convert the polygon overlay to raster using the Polygon to Raster tool and set the cell size to 100 (for uniformity with the IndoPop data).
3. Clip the IndoPop data to the study area boundary using the Extract by Mask or Clip Raster tool.
4. Use the Raster Calculator tool to restrict the IndoPop data to settlement cells, based on the existing polygon data, with the syntax: Con("raster_A" == 1, "raster_B", "raster_A"), where "raster_A" is the settlement raster and "raster_B" is the IndoPop raster.
The result of these steps is an improved IndoPop data set.
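The effect of the Con expression can be traced on a tiny grid. The sketch below is plain Python standing in for the Raster Calculator, not the ArcGIS tool itself; the `con` helper and the two 3 x 3 rasters are hypothetical.

```python
def con(condition, true_raster, false_raster):
    """Cell-wise conditional, mirroring the Con(condition, true, false) pattern."""
    return [
        [t if c else f for c, t, f in zip(c_row, t_row, f_row)]
        for c_row, t_row, f_row in zip(condition, true_raster, false_raster)
    ]


# Hypothetical rasters: settlement mask (raster_A) and IndoPop counts (raster_B).
settlement = [[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]]
indopop = [[2.0, 5.0, 7.0],
           [1.0, 3.0, 6.0],
           [0.5, 0.5, 1.0]]

# Equivalent of Con("raster_A" == 1, "raster_B", "raster_A").
is_settlement = [[cell == 1 for cell in row] for row in settlement]
improved = con(is_settlement, indopop, settlement)
print(improved)
```

In this toy grid the masked raster keeps only 18.0 of the original 26.0 people, because values outside settlement cells are replaced by 0. The masked totals therefore no longer match the input totals and need rescaling against the census figures.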

Spatial distribution of population
This step deals with the population size. The population data from IndoPop are adjusted to match the BPS data; in this case, the spatial distribution is updated based on the BPS figures for 2015. The procedure is as follows:
1. Use the Zonal Statistics tool to count the number of inhabitants per village in the IndoPop data. Select the statistics type SUM to obtain the number of residents in each village based on the administrative boundary data.
2. Convert the village administrative boundary polygons into raster data, valued by the number of people, using the Polygon to Raster tool. Set the cell size to 100 (for uniformity with the IndoPop data).
3. Use the Raster Calculator tool to redistribute the total population based on the improved IndoPop data and the existing 2015 population, using a formula in which P is the number of people per grid cell, p_i is the number of people from WorldPop data i, p_j is the number of people from BPS for village j, and j is the village.
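The formula itself is not legible in the text. One reading that is consistent with the symbol definitions and with the zonal SUM step is a proportional rescaling, P = p_i / (sum of p_i over village j) x p_j, so that the cells of each village add up to the BPS total. The sketch below implements that assumed reading in plain Python; the function names, the village codes "A" and "B", and all numbers are hypothetical.

```python
from collections import defaultdict


def zonal_sum(values, zones):
    """Sum cell values per zone, like Zonal Statistics with the SUM type."""
    totals = defaultdict(float)
    for v_row, z_row in zip(values, zones):
        for value, zone in zip(v_row, z_row):
            totals[zone] += value
    return dict(totals)


def redistribute(improved, zones, census_by_village):
    """Rescale each cell so that village totals match the census figures."""
    modelled = zonal_sum(improved, zones)
    adjusted = []
    for v_row, z_row in zip(improved, zones):
        row = []
        for value, zone in zip(v_row, z_row):
            total = modelled[zone]
            # Assumed handling: a village with no modelled population stays empty.
            row.append(value / total * census_by_village[zone] if total > 0 else 0.0)
        adjusted.append(row)
    return adjusted


# Hypothetical inputs: masked IndoPop grid, village raster, census totals.
improved = [[0.0, 5.0, 7.0],
            [0.0, 0.0, 6.0],
            [0.0, 0.0, 0.0]]
zones = [["A", "A", "B"],
         ["A", "A", "B"],
         ["A", "B", "B"]]
census = {"A": 100.0, "B": 260.0}

adjusted = redistribute(improved, zones, census)
print(zonal_sum(adjusted, zones))
```

A village whose settlement cells all carry zero modelled population would lose its census total under this rule, so such cases would need separate handling.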

Results and discussion

Administrative-based population distribution
The map (Fig. 1) shows the population distribution of Banten Province on a village-level basis. It represents the total population of the villages. In general, the northern parts of the area are densely populated.

Figure 1. Population distribution on an administrative basis.

Random regression tree model-based forest mapping
The map (Fig. 2) illustrates the population distribution in the study area generated through the Random Regression Tree model-based Forest Mapping method. This map displays the same population distribution pattern as the previous map, in which the population is concentrated in the northern part of the study area.

Final results
The final map shows the population distribution of the Banten region modeled through the combination of the administrative-based mapping method and the Random Regression Tree model-based Forest Mapping method. It shows a population distribution pattern similar to that of the administrative and WorldPop maps.

Even though the three maps show a similar population distribution pattern, each map actually has a different accuracy. For instance, in the first map (Fig. 1) the population is distributed evenly on an administrative basis. The map is also able to distinguish the number of people living in the area. However, this choropleth map is insufficient to show the population distribution precisely, since the people are not distributed evenly but are concentrated in settlement areas. Figure 2 displays the population proportionally based on the land use classification. It addresses the issue of population unevenness. However, when it comes to the population calculation, it may overestimate or underestimate the amount of population (see Table 1). The combination of the administrative-based method and the Random Regression Tree model-based Forest Mapping not only deals with the distribution issues but also provides a more reliable amount of population compared to the statistics data.

Table 1 shows an underestimate of the total population using WorldPop data. One of the possible reasons is that the Random Regression Tree model-based Forest Mapping utilizes medium-resolution satellite imagery to identify the human presence. As a result, some settlement areas may not be captured, especially those smaller than the image spatial resolution. In contrast, the combined method uses a settlement dataset derived from the topographic map (1:25,000), which is more detailed. Therefore, it can distribute the population data precisely within the settlement area.

In order to illustrate how the population distribution data affect the disaster risk assessment, the analysis results of the three different methods are compared with each other. The figure shows the study area overlaid by the flood hazard map. From the figure, the flood hazard exposure can be estimated, as displayed in Table 2. Population distribution produced by the combination [...]. Compared to BantenPop, which uses the combination method, the administrative-based mapping method gives an overestimate, while the WorldPop data (Random Regression Tree model-based Forest Mapping method) give an underestimate.

Disaster risk analysis requires accurate data and information, as it will affect the disaster management plan. An overestimate will require more resources, such as logistics, housing, and sanitation. On the other hand, an underestimate may result in a lack of resources in case of a disaster.

The method used in this study shows the possibility of adjusting the WorldPop data before they are used. It also provides an updating and distribution mechanism for the census data, which usually exist in administrative format. Although the developed method is able to provide more reliable distribution and population information, it has some limitations. Firstly, since this method relies on WorldPop data, all data need to be adjusted to the WorldPop data structure, especially the cell size (approximately 100 x 100 m). There may be some distortions if one intends to use this method at a finer pixel resolution. Therefore, it requires further [...]. The method also depends on the quality of the input data.
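The flood exposure comparison amounts to summing the population grid over the cells flagged as hazardous. The sketch below shows that overlay in plain Python with hypothetical 3 x 3 grids: one where 360 people are spread evenly and one where the same total is concentrated in settlement cells. The numbers are illustrative only and are not the values of Table 2.

```python
def exposed_population(population, hazard_mask):
    """Sum the population of every cell flagged as hazardous."""
    return sum(
        people
        for p_row, h_row in zip(population, hazard_mask)
        for people, flagged in zip(p_row, h_row)
        if flagged
    )


# Hypothetical flood hazard mask (1 = inside the hazard zone).
hazard = [[1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]

# Hypothetical grids holding the same total of 360 people.
uniform = [[40.0] * 3 for _ in range(3)]
combined = [[0.0, 100.0, 140.0],
            [0.0, 0.0, 120.0],
            [0.0, 0.0, 0.0]]

print(exposed_population(uniform, hazard))
print(exposed_population(combined, hazard))
```

In this toy case the even spread counts people in flooded cells that contain no settlement, which is one way an administrative-based map can overstate exposure.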

Table 3. Amount of population in the comparison between WorldPop, BantenPop (combined method), and statistics data.

Conclusions
Disaster risk analysis requires accurate population distribution data to calculate the population vulnerability. The combined methods of administrative-based mapping and Random Regression Tree model-based Forest Mapping can be an excellent alternative to overcome the limitations of the population data. GIS is an effective tool to deal with spatial data. Therefore, it can play a greater role in disaster management practices.

The study is a part of the National Disaster [...] conducted by the National Disaster Management Authority of the Republic of Indonesia (BNPB) in 2015.

References
Lung, T., Lübker, T., Ngochoch, J. K., & Schaab, G. Human population distribution modeling [...] regional level using very high [...]. Applied Geography.
Yue, T., Zhu, L., & Clinton, N. [...] population density using land cover [...].