Evaluating Tree Species Diversity in Forest Ecosystems Using LiDAR Data: An Exploration in NRW

Authors

Affiliation

Jakob Danel

Universität Münster

Federick Bruch

Universität Münster

Published

February 19, 2024

1 Introduction

Forests, often referred to as the “world’s air-conditioning system” or the “lungs of the planet,” play a critical role in maintaining the global environment, as noted by King Charles III. Studying forests and their diverse tree species is essential for understanding both local ecosystems and the global environment. The combination of tree species contributes significantly to the functionality and biodiversity of a forest, as emphasized by Mori et al. in their work on forest tree biodiversity (Mori, Lertzman, and Gustafsson 2017). Different tree species face distinct threats, posing challenges to both the environment and human populations. For instance, the oak processionary moth, studied by Sobczyk (Sobczyk 2014), and the bark beetle, discussed by Müller and Imhof in their work on bark beetle infestations (Müller and Imhof 2019), exemplify the varying risks associated with specific tree types. Forests also play a pivotal role in climate regulation by absorbing carbon dioxide (CO\(_2\)), and they provide habitats for a multitude of species: approximately 80% of all known terrestrial animal and plant species live in forests (n.d.). A comprehensive understanding of forests is therefore imperative, particularly with regard to the (spatial) distribution patterns of different tree species. Such insights are crucial for informed conservation efforts, sustainable management, and addressing the diverse ecological challenges that forests face.

Forest monitoring often relies on Sentinel data, which offer valuable insights through applications such as time series analysis of the deforestation process (Cremer et al. 2020), forest classification (Dostálová et al. 2021), and detection of forest succession (Szostak, Hawryło, and Piela 2018). However, employing Sentinel data is not without challenges. Dependency on cloud coverage and limited resolution, such as the \(5\times5\) m resolution of Sentinel-1 (Agency 2024), can hinder the accurate identification of individual trees. To overcome these limitations, LiDAR data emerge as a promising alternative. LiDAR provides higher resolutions and utilizes the intensity of the returned signal as an indicator of the forest structure (Gonzalez). The aim of this study is to investigate whether LiDAR is an appropriate tool for distinguishing between tree species, with a focus on oak, beech, pine, and spruce, the most common species in North Rhine-Westphalia (NRW). Restricting the study to NRW ensures comparable environmental conditions, including climate and altitude.

The methodology involves employing different statistical tools to analyze the distribution of LiDAR returns. Specifically, the study utilizes the random forest algorithm to predict the species of detected trees based on LiDAR data. This approach aims to enhance the precision and capabilities of forest monitoring, especially in regions like NRW, by leveraging LiDAR technology for species-specific insights. Our motivation and methodical approach lead to the following research question:

Can LiDAR technology be effectively employed to distinguish between tree species in monocultural forests in North Rhine-Westphalia (NRW), and how do the distinctive characteristics of individual tree species contribute to the accuracy of LiDAR-based classification?

We formulate the following hypotheses:

LiDAR data can effectively differentiate between tree species in monocultural forests in NRW, and there are statistically significant differences in the LiDAR-derived metrics among the various tree species.
Random Forest classification can be used to predict the species of trees in monocultural forests.
Random Forest classification performance varies significantly depending on the set of LiDAR parameters used, indicating that certain combinations of parameters contribute more effectively to accurate tree species classification.

2 Methods

We first describe the processes of data acquisition and data preprocessing. We then describe how we analyse the distributions and how we use random forest predictions to distinguish species.

2.1 Data acquisition

Our primary objective is to identify patches where one tree species exhibits a high level of dominance, striving to capture monocultural stands within the diverse forests of North Rhine-Westphalia (NRW). Recognizing the practical challenges of finding true monocultures, we aim to identify patches where one species is highly dominant, enabling meaningful comparisons across different species.

The study is framed within the NRW region due to the availability of an easily accessible dataset. Our focus includes four prominent tree species in NRW: oak, beech, spruce, and pine, the most prevalent species in the region. To ensure the validity of our findings, we select three patches for each species, so that observed characteristics can be attributed to a particular species rather than to a specific patch. Each patch is selected to cover an area of approximately 10-50 hectares and to contain between 500 and 5,000 trees. To balance relevance and manageability, the patches are kept from becoming too large: larger patches are more likely to contain mixed species and are harder to process on local hardware.

Specific Goals:

Retrieve patches with highly dominant tree species.
Minimize or eliminate the presence of human-made structures within the selected patches.

To achieve our goals, we utilized the waldmonitor dataset (Welle et al. 2022) and the map provided by Blickensdörfer et al. (Blickensdoerfer 2022), both indicating dominant tree species in NRW. We identified patches of feasible size where both sources predicted the presence of a specific species. Further validation involved examining Sentinel images of these forest regions to assess the evenness of structures, leaf color distribution, and the absence of significant human-made structures such as roads or buildings. The subsequent preprocessing steps, detailed in the following subsection, involved refining our selected patches and deriving relevant variables, such as tree distribution and density, to ensure that the chosen areas align with the desired research domains.

2.2 Preprocessing

In this research study, the management and processing of a large dataset are crucial considerations. The dataset’s substantial size necessitates careful maintenance to ensure efficient handling. Furthermore, the data should be easily processable and editable to facilitate necessary corrections and precalculations within the context of our research objectives. To achieve our goals, we have implemented a framework that automatically derives data based on a shapefile, delineating areas of interest. The processed data and results of precalculations are stored in a straightforward manner to enhance accessibility. Additionally, we have designed functions that establish a user-friendly interface, enabling the execution of algorithms on subsets of the data, such as distinct species. These interfaces are not only directly callable by users but can also be integrated into other functions to automate processes. The overarching aim is to streamline the entire preprocessing workflow using a single script, leveraging only the shapefile as a basis. This subsection details the accomplishments of our R-package in realizing these goals, outlining the preprocessing steps undertaken and justifying their necessity in the context of our research.

The data are stored in a data subdirectory of the root directory in the format species/location-name/tile-name. To automate the matching of areas of interest with the catalog from the Land NRW¹, we utilize the intersecting tool developed by Heisig². This tool allows for the automatic retrieval and placement of data downloaded from the Land NRW catalog. To enhance data accessibility, we have devised an object that incorporates species, location name, and tile name (the NRW internal identifier) for each area. This object facilitates the specification of the area to be processed. Additionally, we have defined an initialization function that downloads all tiles, returning a list of tile location objects for subsequent processing. A pivotal component of the package’s preprocessing functionality is the map function, which iterates over a list of tile locations (effectively the entire dataset) and accepts a processing function as an argument. The subsequent paragraphs outline the specific preprocessing steps employed, all of which are implemented within the mapping function.

To keep memory usage manageable, each of the tiles (one area can span multiple tiles) has been split into smaller chunks. We employed a \(50\times50\) m size for each chunk, resulting in the division of the original \(1\times1\) km files into 400 tiles. These tiles are stored in our directory structure, with each tile housed in a directory named after its tile name and assigned an id as the filename. Implementation-wise, the lidR::catalog_retile function was instrumental in achieving this segmentation. The resulting smaller chunks allow for efficient iteration during subsequent preprocessing steps.
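The retiling step can be sketched as follows. This is a minimal illustration and not the package's actual implementation: it assumes the lidR package is installed, uses the small example file shipped with lidR instead of an NRW tile, and writes to a hypothetical temporary output directory.

```r
library(lidR)

# Example point cloud shipped with lidR (stand-in for a 1 x 1 km NRW tile)
las_file <- system.file("extdata", "MixedConifer.laz", package = "lidR")
ctg <- readLAScatalog(las_file)

# Hypothetical output directory; the real package uses species/location/tile
out_dir <- file.path(tempdir(), "tiles")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

opt_chunk_size(ctg) <- 50      # 50 x 50 m chunks, as described above
opt_chunk_buffer(ctg) <- 0     # no overlap between the written chunks
opt_output_files(ctg) <- file.path(out_dir, "{ID}")  # one file per chunk id

# Writes the chunks to disk and returns a catalog pointing at the new files
tiles <- catalog_retile(ctg)
```

For a full \(1\times1\) km tile the same settings yield \((1000 / 50)^2 = 400\) chunks.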

The next phase involves reducing our data to the actual size by intersecting the tiles with the defined area of interest. Using the lidR::merge_spatial function, we intersect the area derived from the shapefile, removing all point cloud items outside this region. Due to our tile-wise approach, empty tiles may arise, and in such cases, those tiles are simply deleted.

Following the size reduction of our dataset, the next step involves correcting the z values. The z values in the data are originally relative to the ellipsoid used for referencing, but we require them to be relative to the ground. To achieve this, we utilize the lidR::tin function, which interpolates a triangulated irregular network (TIN) between all ground points (classified by the data provider). Each z value is then expressed as the height above this ground surface.

Subsequently, we aim to perform segmentation for each distinct tree, marking each item of the point cloud with a tree ID. We employ the algorithm described by Li et al. (2012), using parameters li2012(dt1 = 2, dt2 = 3, R = 2, Zu = 10, hmin = 5, speed_up = 12). The meanings of these parameters are elucidated in Li et al.’s work (Li et al. 2012).

Finally, the last preprocessing step involves individual tree detection, seeking a single POINT object for each tree. The lidR::lmf function, an implementation of tree detection using a local maximum filter, is utilized for this purpose (Popescu and Wynne 2004). The results are stored in GeoPackage files within our data structure.
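The last three preprocessing steps (height normalization, tree segmentation, tree detection) can be sketched with lidR as shown below. This is an illustrative sketch under assumptions, not the package code: it runs on the example file shipped with lidR, the window size `ws = 5` for `lmf` is a hypothetical value that the text does not specify, and the output path is a temporary file.

```r
library(lidR)

las_file <- system.file("extdata", "MixedConifer.laz", package = "lidR")
las <- readLAS(las_file)

# 1. Make z relative to the ground using a TIN of the classified ground points
las <- normalize_height(las, tin())

# 2. Assign a tree id to every point with the Li et al. (2012) algorithm,
#    using the parameters stated in the text
las <- segment_trees(
  las,
  li2012(dt1 = 2, dt2 = 3, R = 2, Zu = 10, hmin = 5, speed_up = 12)
)

# 3. Detect one POINT per tree with a local maximum filter
#    (ws = 5 is an assumed window size, not taken from the study)
detections <- locate_trees(las, lmf(ws = 5))

# Store the detections in a GeoPackage file (hypothetical location)
gpkg_path <- file.path(tempdir(), "detections.gpkg")
sf::st_write(detections, gpkg_path, delete_dsn = TRUE, quiet = TRUE)
```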

See Section 6.2 for the implementation of the preprocessing.

2.3 Analysis of different distributions

Analysis of data distributions is a critical aspect of our research, with a focus on comparing two or more distributions. Our objective extends beyond evaluating the disparities between species; we also aim to assess differences within a species. To gain a comprehensive understanding of the data, we employ various visualization techniques, including histograms, density functions, and box plots.

In tandem with visualizations, descriptive statistics, such as means, standard errors, and quantiles, are leveraged to provide key insights into the central tendency and variability of the data.

For a more quantitative analysis of distribution dissimilarity, divergence measures and statistical tests are employed. The Kullback-Leibler (KL) divergence serves as a measure of how dissimilar two distributions are. This involves converting distributions into their density functions, with the standard error serving as the bandwidth. Because the KL divergence is asymmetric, it is calculated for each ordered pair of distributions. For two distributions \(P\) and \(Q\), the KL divergence is defined as follows (Kullback 1951):

\[ D_{KL}(P \, \| \, Q) = \sum_i P(i) \log\left(\frac{P(i)}{Q(i)}\right) \]

To obtain a symmetric score, the Jensen-Shannon Divergence (JSD) is utilized (Grosse et al. 2002), expressed by the formula:

\[ D_{JS}(P \, \| \, Q) = \frac{1}{2} D_{KL}(P \, \| \, M) + \frac{1}{2} D_{KL}(Q \, \| \, M) \] Here, \(M = \frac{1}{2}(P + Q)\). The JSD provides a balanced measure of dissimilarity between distributions (Brownlee 2019). To compare the different scores with each other, we use their averages.
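As a minimal sketch of how both divergences can be computed from two samples, the following base R code estimates the densities on a shared grid and applies the two formulas. The data are synthetic, and the bandwidth and grid size are hypothetical choices made only for this illustration.

```r
# KL divergence between two discrete probability vectors on the same grid
kl_divergence <- function(p, q) {
  keep <- p > 0  # terms with P(i) = 0 contribute nothing to the sum
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Jensen-Shannon divergence: symmetric and finite, because M > 0 where P > 0
js_divergence <- function(p, q) {
  m <- 0.5 * (p + q)
  0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
}

# Estimate a density on a fixed grid and normalize it to sum to one
to_probabilities <- function(x, grid_range, bw, n = 512) {
  d <- density(x, bw = bw, from = grid_range[1], to = grid_range[2], n = n)
  d$y / sum(d$y)
}

set.seed(1)
heights_a <- rnorm(500, mean = 24, sd = 3)  # synthetic tree heights, patch A
heights_b <- rnorm(500, mean = 28, sd = 4)  # synthetic tree heights, patch B

grid_range <- range(c(heights_a, heights_b))
bw <- 1  # assumed bandwidth for this example

p <- to_probabilities(heights_a, grid_range, bw)
q <- to_probabilities(heights_b, grid_range, bw)

kl_pq <- kl_divergence(p, q)  # generally differs from kl_qp (asymmetric)
kl_qp <- kl_divergence(q, p)
js <- js_divergence(p, q)     # identical to js_divergence(q, p)
```

With the natural logarithm used here, the JSD is bounded above by \(\log 2\), whereas the KL divergence is unbounded.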

Additionally, the Kolmogorov-Smirnov Test is implemented to assess whether two distributions significantly differ from each other. This statistical test offers a formal evaluation of the dissimilarity between empirical distribution functions.
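A two-sample Kolmogorov-Smirnov test is available in base R as `stats::ks.test`. The snippet below is a small illustration on synthetic height samples, not on the study data.

```r
set.seed(2)
heights_a <- rnorm(200, mean = 20, sd = 3)  # synthetic sample, patch A
heights_b <- rnorm(200, mean = 25, sd = 3)  # synthetic sample, patch B

# Two-sample test comparing the empirical distribution functions
ks_result <- ks.test(heights_a, heights_b)

ks_statistic <- unname(ks_result$statistic)  # maximum distance between ECDFs
differs <- ks_result$p.value < 0.05          # reject equality at the 5% level
```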

2.4 Random Forest for predicting species

The aim of our research is to investigate the feasibility of developing a predictive model for tree species classification based on derived parameters. To accomplish this objective, we employed the Random Forest algorithm, as introduced by Breiman (Breiman 2001), which combines multiple decision trees to predict the species of a given tree based on its characteristics.

The dataset utilized in our study consists of identified trees, and our primary goal is to predict the species of each tree. To establish ground truth for training and validation of the model, we assigned the dominant species to each distinct patch.

For validation purposes, we adopted a spatial cross-validation methodology, specifically implementing the leave-one-out principle. This technique systematically excludes a particular portion of the dataset in each iteration (Brovelli et al. 2008). In our spatial cross-validation, the group left out is determined based on their spatial characteristics (Valavi et al. 2018).

Our combined approach entails designating the patches with tree detections as the left-out group. The model is trained on the training split and validated using the left-out (validation) split of the data. Ground truth and predictions from each iteration are stored, and the results from all iterations are aggregated into a confusion matrix for a comprehensive analysis.
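The leave-one-patch-out procedure can be sketched as follows. This is an illustration under assumptions: it requires the randomForest package, uses synthetic data with two species, and the predictors `Z` and `number_of_returns` as well as `ntree = 100` are chosen for the example rather than taken from the study's model configuration.

```r
library(randomForest)

set.seed(42)
# Synthetic stand-in for the tree table: six patches, two species
patch_info <- data.frame(
  patch = paste0("patch_", 1:6),
  species = rep(c("oak", "spruce"), each = 3)
)
trees <- do.call(rbind, lapply(seq_len(nrow(patch_info)), function(i) {
  is_oak <- patch_info$species[i] == "oak"
  data.frame(
    patch = patch_info$patch[i],
    species = patch_info$species[i],
    Z = rnorm(50, mean = if (is_oak) 25 else 30, sd = 3),
    number_of_returns = rpois(50, lambda = if (is_oak) 900 else 600)
  )
}))
trees$species <- factor(trees$species)

# Each patch is left out once; the model is trained on all other patches
results <- do.call(rbind, lapply(unique(trees$patch), function(p) {
  train <- trees[trees$patch != p, ]
  test <- trees[trees$patch == p, ]
  model <- randomForest(species ~ Z + number_of_returns,
                        data = train, ntree = 100)
  data.frame(patch = p,
             truth = test$species,
             predicted = predict(model, newdata = test))
}))

# Aggregate ground truth and predictions of all iterations
confusion <- table(truth = results$truth, predicted = results$predicted)
```

Because every tree of a held-out patch is excluded from training, the model cannot exploit patch-specific similarities between neighbouring trees.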

In our evaluation of results, we utilized various statistical measurements. The accuracy, defined as the ratio of correct predictions to all predictions (Alvarez 2002), is calculated using the formula \[\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{All Predictions}}\] This metric provides insight into the percentage of correct predictions.

To normalize accuracy in the context of a small number of classes (\(n = 4\)), Cohen’s Kappa was employed (Fleiss, Cohen, and Everitt 1969). The formula is given by \[\kappa = \frac{P_o - P_e}{1 - P_e}\] where \(P_o\) represents the observed agreement (here, between predictions and ground truth), and \(P_e\) is the expected agreement, accounting for the probability of chance agreement.

Another crucial metric, precision, commonly used in binary classification, measures the accuracy of positive predictions (Cleverdon 1967). It is calculated as \[\text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}\]

Similarly, recall, also known as sensitivity or true positive rate, is essential in binary classification to assess the model’s ability to identify all relevant instances (Sparck Jones 1972). The formula for recall is \[\text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}\]

These statistical measures collectively provide a comprehensive evaluation of the model’s performance, taking into account both overall accuracy and class-specific performance metrics.
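The four measures can be derived from a confusion matrix in a few lines of base R. The counts below are made-up illustration values, not results of this study. Rows are the ground truth and columns the predictions.

```r
species <- c("beech", "oak", "pine", "spruce")

# Toy confusion matrix (illustrative counts only)
confusion <- matrix(
  c(40,  5,  3,  2,
     6, 38,  4,  2,
     2,  3, 35, 10,
     1,  2,  9, 38),
  nrow = 4, byrow = TRUE,
  dimnames = list(truth = species, predicted = species)
)

n <- sum(confusion)

# Accuracy: correct predictions divided by all predictions
accuracy <- sum(diag(confusion)) / n

# Cohen's Kappa: observed agreement corrected for chance agreement
p_o <- accuracy
p_e <- sum(rowSums(confusion) * colSums(confusion)) / n^2
cohen_kappa <- (p_o - p_e) / (1 - p_e)

# Per-class precision (column-wise) and recall (row-wise), one-vs-rest
precision <- diag(confusion) / colSums(confusion)
recall <- diag(confusion) / rowSums(confusion)
```

Precision and recall are defined for binary problems, so in the multi-class case they are computed once per species by treating that species as the positive class.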

2.5 Implementation

In the implementation of our project aimed at utilizing LiDAR data for distinguishing tree species in monocultural forests, we chose R as our programming language. This decision was driven by the robust and well-maintained package infrastructure offered by R, providing a solid foundation for our work. Additionally, R’s inherent capability to handle large data structures efficiently proved advantageous for our project, where processing LiDAR point clouds required a scalable and effective solution.

Our implementation relied on four key R packages, each serving a crucial role in the project’s success:

lidR: This package played a fundamental role in handling and processing LiDAR point clouds. Its capabilities were essential for the core management of LiDAR data throughout our analysis (Roussel et al. 2020).
sf: Used for managing spatial vector data and conducting spatial operations, the sf package enriched our analysis with essential spatial components, enhancing the geographical dimension of our research (Pebesma 2018).
dplyr: As a package specialized in data manipulation, dplyr provided indispensable tools for shaping and organizing our data to meet the specific requirements of our analysis (Wickham et al. 2023).
ggplot2: Renowned for its proficiency in creating high-quality visualizations, ggplot2 was employed to present our results in a manner that is not only informative but also reproducible (Wickham 2016).

To streamline the implementation process and make our work accessible to others, we encapsulated these functionalities into a comprehensive R package. This package handles various aspects of the project seamlessly:

Automated Data Management: The R package takes care of organizing and saving data in a predefined directory structure, ensuring a systematic and orderly approach.
Meta Information Generation: It automatically generates meta-information based on the structure of the data, providing vital context and documentation for our analyses.
Analytic Function Application: Analytic functions are applied automatically to specified data subsets, streamlining the analytical process and enhancing the reproducibility of our results.

For those interested in exploring or contributing to our work, the GitHub repository containing the implementations can be accessed here. This centralized repository serves as a collaborative hub, inviting further development and utilization of our LiDAR forest analysis package (Danel and Bruch 2024). The package documentation can be found in Section 6.5.

3 Results

First, we present the researched areas. We then look into the distribution characteristics. Lastly, we present a set of generated random forest models and their accuracy.

3.1 Researched areas

Code

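library(ggplot2)
# Disable s2 spherical geometry so that sf uses planar operations
sf::sf_use_s2(FALSE)
# Represent each research patch by its centroid
patches <- sf::read_sf("research_areas.shp") |> sf::st_centroid()

# Borders of the German federal states
# Source: https://hub.arcgis.com/datasets/esri-de-content::bundesl%C3%A4nder-2017-mit-einwohnerzahl/explore?location=51.099647%2C10.454033%2C7.43
de <- sf::read_sf("results/results/states_de/Bundesländer_2017_mit_Einwohnerzahl.shp")
# Row 5 is used as the NRW geometry in this dataset
nrw <- de[5,] |> sf::st_geometry()

# Plot the NRW border with the patch centroids colored by dominant species
ggplot() + geom_sf(data = nrw) +
    geom_sf(data = patches, mapping = aes(col = species))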

Figure 1: Locations of the different patches with the dominant species for each patch. The patch centroids are displayed on a basemap showing the borders of NRW.

We drew three patches for each species from different regions (see Table 1). We downloaded the LiDAR data for those patches and ran all preprocessing steps as described. We then checked, using derived parameters (e.g. tree heights, tree distributions, or tree density), that all patches contain valid forest data. In that step we discovered that forest clearance had recently taken place in one patch. This patch was removed from the dataset and replaced with a new one.

In our research, drawing patches evenly distributed across North Rhine-Westphalia is inherently constrained by natural factors. Consequently, the patches for oak and pine predominantly originate from the Münsterland region, as illustrated in Figure 1. For spruce, the patches were derived from Sauerland, reflecting the prevalence of spruce forests in this specific region within NRW, as corroborated by Welle et al. (Welle et al. 2022) and Blickensdörfer et al. (Blickensdoerfer 2022). Beech patches, on the other hand, were drawn from diverse locations within NRW. Across all patches, no human-made objects were identified, with the exception of small paths for pedestrians and forestry vehicles.

The total area and the number of detections are comparable for the four species. Beech covers 69.79 hectares with a total of 5,954 detections, oak spans 63.23 hectares with 5,354 detections, pine extends across 72.86 hectares with 8,912 detections, and spruce encompasses 57.94 hectares with 8,619 detections. Both the number of detections and the corresponding area exhibit a relatively uniform distribution across the diverse patches, as summarized in Table 1.

For the dataset described above, we intentionally chose three patches for each of the four species that have a practical and usable size for our research objectives. These patches meet the conditions essential for our study and provide representative data for analysing the characteristics of each tree species within the specified areas.

Code

shp <- sf::read_sf("research_areas.shp")
table <- lfa::lfa_get_all_areas()

sf::sf_use_s2(FALSE)
for (row in 1:nrow(table)) {
  area <-
    dplyr::filter(shp, shp$species == table[row, "specie"] &
                    shp$name == table[row, "area"])
  area_size <- area |> sf::st_area()
  point <- area |> sf::st_centroid() |> sf::st_coordinates()
  table[row,"point"] <- paste0("(",round(point[1], digits = 4),", ",round(point[2],digits = 4),")")
  
  table[row, "area_size"] = round(area_size,digits = 2) #paste0(round(area_size,digits = 2), " m²")
  
  amount_det <- nrow(lfa::lfa_get_detection_area(table[row, "specie"], table[row, "area"]))
  if(is.null(amount_det)){
    cat(nrow(lfa::lfa_get_detection_area(table[row, "specie"], table[row, "area"])),table[row, "specie"],table[row, "area"])
  }
  table[row, "amount_detections"] = amount_det
  
  # table[row, "specie"] <- lfa::lfa_capitalize_first_char(table[row,"specie"])
  table[row, "area"] <- lfa::lfa_capitalize_first_char(table[row,"area"])
  }
table$area <- gsub("_", " ", table$area)
table$area <- gsub("ue", "ü", table$area)
table = table[,!names(table) %in% c("specie")]

knitr::kable(table, "html", col.names = c("Patch Name","Location","Area size (m²)","Amount tree detections" ), caption = NULL, digits = 2, escape = TRUE) |>
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hold_position", "bordered","responsive"),
    stripe_index = c(1:3,7:9),
    full_width = FALSE
  ) |>
  kableExtra::pack_rows("Beech", 1, 3) |>
  kableExtra::pack_rows("Oak", 4, 6) |>
  kableExtra::pack_rows("Pine", 7, 9) |>
  kableExtra::pack_rows("Spruce", 10, 12) |>
  kableExtra::column_spec(1, bold = TRUE)

Table 1: Summary of researched patches grouped by species, with their location (centroid as longitude, latitude), area and the number of detected trees.
Patch Name	Location	Area size (m²)	Amount tree detections
Beech
Bielefeld brackwede	(8.5244, 51.9902)	161410.57	1443
Billerbeck	(7.3273, 51.9987)	185887.25	1732
Wülfenrath	(7.0769, 51.2917)	350621.21	2779
Oak
Hamm	(7.8618, 51.6639)	269397.22	2441
Münster	(7.6187, 51.9174)	164116.61	1270
Rinkerode	(7.6744, 51.8598)	198811.09	1643
Pine
Greffen	(8.1697, 51.9913)	49418.81	513
Mesum	(7.5403, 52.2573)	405072.85	5031
Telgte	(7.7816, 52.0024)	274132.34	3368
Spruce
Brilon	(8.5352, 51.4084)	211478.20	3342
Oberhundem	(8.1861, 51.0909)	151895.53	2471
Osterwald	(8.3721, 51.2151)	216026.43	2806
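The per-species totals can be recomputed directly from Table 1. The following base R sketch copies the table values into a data frame and converts the areas from m² to hectares (1 ha = 10,000 m²). The derived column `trees_per_ha` is an additional illustration and does not appear in the table.

```r
# Values copied from Table 1
patches <- data.frame(
  species = rep(c("Beech", "Oak", "Pine", "Spruce"), each = 3),
  patch = c("Bielefeld brackwede", "Billerbeck", "Wülfenrath",
            "Hamm", "Münster", "Rinkerode",
            "Greffen", "Mesum", "Telgte",
            "Brilon", "Oberhundem", "Osterwald"),
  area_m2 = c(161410.57, 185887.25, 350621.21,
              269397.22, 164116.61, 198811.09,
              49418.81, 405072.85, 274132.34,
              211478.20, 151895.53, 216026.43),
  detections = c(1443, 1732, 2779,
                 2441, 1270, 1643,
                 513, 5031, 3368,
                 3342, 2471, 2806)
)

# Sum area and detections per species (rows are sorted by species name)
totals <- aggregate(cbind(area_m2, detections) ~ species,
                    data = patches, FUN = sum)
totals$area_ha <- totals$area_m2 / 10000
totals$trees_per_ha <- totals$detections / totals$area_ha
```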

3.2 Distribution of tree characteristics

The following subsections describe the distributions of different tree characteristics across the different species and patches.

3.2.1 Tree Heights

Code

detections <- lfa::lfa_get_detections()

We first examine the distribution of tree heights, starting with the density distributions of the various tree species. Oak and Pine exhibit considerably steeper peaks in their density curves than Beech and Spruce. While every patch has its own density curve, they share one property: each curve has a single peak, with the exception of Telgte. Taking Beech as an illustrative example, the peak is shifted considerably toward greater heights. The variance in the density curves indicates that differentiating between species using tree height values alone could be difficult.

Code

lfa::lfa_create_density_plots(detections, value_column = "Z", category_column1 = "area", category_column2 = "specie", title = "Density of the height distributions", xlims = c(0,50))

Figure 2: Density of the height distributions of the detected trees, split by the different researched areas and grouped by the dominant species in each area.

To take a closer look at the distributions of the Z-values, we also examine the boxplots of the height distributions in the different areas. In all datasets, outliers are present beyond the range of the whiskers (\(1.5 \times \text{IQR}\)). Of particular interest is the Rinkerode dataset, which exhibits a higher prevalence of outliers in the upper domain. These anomalies may stem from inaccuracies in the data and call for a critical examination of data integrity. A pairwise comparison of Oak and Pine indicates higher mean heights for Oak than for Pine, which shows that species-specific attributes shape the overall height distributions. Contrary to expectations, the spread within a particular species does not differ significantly from the spread observed between different species. This finding suggests that while species-specific traits play a crucial role in shaping height distributions, certain overarching factors may contribute to shared patterns across diverse tree populations.
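The outlier rule behind the whiskers can be reproduced with `boxplot.stats` from base R. The sketch below uses synthetic heights with two artificially added extreme values. Note that `boxplot.stats` measures the box by its hinges, which is close to, but not always identical with, the quantile-based IQR.

```r
set.seed(7)
# Synthetic tree heights plus two artificially tall trees
heights <- c(rnorm(300, mean = 25, sd = 3), 45, 48)

# coef = 1.5 places the whisker limits at 1.5 box lengths beyond the hinges
stats <- boxplot.stats(heights, coef = 1.5)

whisker_range <- stats$stats[c(1, 5)]  # lower and upper whisker ends
outliers <- stats$out                  # values beyond the whiskers
```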

Code

lfa::lfa_create_boxplot(detections, value_column = "Z", category_column1 = "area", category_column2 = "specie", title = "Boxplots of the height distributions")

Figure 3: Boxplots of the height distributions of the detected trees, split by the different researched areas and grouped by the dominant species in each area.

Our examination of Kullback-Leibler Divergence (KLD) and Jensen-Shannon Divergence (JSD) metrics reveals low mean values (KLD: 5.252696, JSD: 2.246663) across different species, indicating overall similarity in tree species height distributions. However, within specific species, particularly Pine, higher divergence values (see Table 5 and Table 10) suggest significant intraspecific differences.

Notably, the Spruce species consistently demonstrates low divergence values across all tested areas, implying a high level of explainability. This finding highlights tree height as a reliable indicator for detecting Spruce trees, indicating its potential for accurate species identification in diverse forest ecosystems.

3.2.2 Number of returns

Code

data <- sf::st_read("data/tree_properties.gpkg")
neighbors <- lfa::lfa_get_neighbor_paths() |> lfa::lfa_combine_sf_obj(lfa::lfa_get_all_areas())
data = sf::st_join(data,neighbors, join = sf::st_within)

Examining the distribution of LiDAR returns per tree is the focus of our current investigation. Initial analysis involves the study of density graphs representing the distribution of LiDAR returns. The density curves for each species exhibit distinct peaks corresponding to their respective species, providing a clear differentiation in LiDAR return patterns. Notably, there is an exception observed in the Brilon patch (Spruce), where the curve deviates, possibly indicative of variations in forest age. A noteworthy trend is the divergent shape of density curves between coniferous and deciduous trees. Conifers exhibit steeper curves, indicating lower density for higher return values compared to deciduous trees. This disparity underscores the potential of LiDAR data to distinguish between tree types based on return density characteristics. In the case of Beech trees, the peaks’ heights vary among different curves, suggesting nuanced variations within the species. Despite these differences, all species consistently peak in similar regions, emphasizing the overarching similarities in LiDAR return patterns across diverse tree species.

Code

lfa::lfa_create_density_plots(data, value_column = "number_of_returns", category_column1 = "area", category_column2 = "specie", title = "Density of the distribution of LiDAR returns per individual tree", xlims = c(0,10000))

Figure 4: Density of the number of LiDAR returns per detected tree, split by the different researched areas and grouped by the dominant species in each area.

Next, we examine the boxplots for each patch. We observe considerable variation in box size among patches of the same species. Numerous outliers are present above the box in each patch. For Pine, the boxes are notably similar. However, the box for Brilon is entirely shifted relative to the boxes of the other Spruce patches.

Code

lfa::lfa_create_boxplot(data, value_column = "number_of_returns", category_column1 = "area", category_column2 = "specie", title = "Boxplots of the distribution of LiDAR returns per individual tree")

Figure 5: Boxplots of the number of LiDAR returns per detected tree, split by the different researched areas and grouped by the dominant species in each area.

Overall, our analysis reveals very low values for both the Kullback-Leibler Divergence (KLD) and the Jensen-Shannon Divergence (JSD) across different species. Within species, there is high explainability observed for the different LiDAR return curves between patches.

This suggests that the number of returns alone may not be a robust predictor for identifying the dominant species in a forest. However, the curves indicate a clear potential for distinguishing between conifers (Pine and Spruce) and deciduous trees (Beech and Oak) based on the number of returns. This observation is further supported by the JSD scores, as detailed in Table 47.

3.2.3 n-nearest Neighbours

Code

neighbors <- lfa::lfa_combine_sf_obj(lfa::lfa_get_neighbor_paths(),lfa::lfa_get_all_areas())

Overview

To initiate our analysis, we first establish a framework for selecting neighbors by examining how the distance develops with different \(n\), as illustrated in Figure 6. The curves share a similar shape, but the actual values vary. Notably, as \(n\) increases, the distance increases in all patches, indicating a broader spatial context.

Considering this trend, we extend our investigation beyond the nearest neighbor to include the 100th nearest neighbor. The \(\Delta\)distance decreases consistently with each increment in \(n\), which supports our decision not to explore values of \(n\) beyond 100. This limit is also driven by practical considerations: for larger \(n\), some of our samples are too small, so the true \(n\)-th nearest neighbor may lie outside the sample area, which would result in inaccurate distances.
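The distance to the \(n\)-th nearest neighbor and its increments can be sketched in base R as follows. The tree positions are synthetic (uniformly random in a hypothetical 300 m × 300 m patch). The column names `Neighbor_1` to `Neighbor_100` mirror those used in the analysis code, and the brute-force distance matrix is chosen for brevity, not efficiency.

```r
set.seed(1)
# Synthetic tree positions in a 300 m x 300 m patch
xy <- cbind(x = runif(500, 0, 300), y = runif(500, 0, 300))

# Full pairwise distance matrix (fine for a few thousand trees)
d <- as.matrix(dist(xy))

# Sort each row; the first entry is the distance 0 to the tree itself
sorted <- t(apply(d, 1, sort))

n_max <- 100
neighbor_dist <- sorted[, 2:(n_max + 1)]
colnames(neighbor_dist) <- paste0("Neighbor_", seq_len(n_max))

# Average distance to the n-th nearest neighbor and its change per step in n
mean_curve <- colMeans(neighbor_dist)
delta_distance <- diff(mean_curve)
```

Trees close to the patch border have fewer neighbors inside the sample area, so their distances for large \(n\) tend to be overestimated.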

Code

lfa::lfa_create_neighbor_mean_curves(neighbors) |> lfa::lfa_create_plot_per_area()

Figure 6: Average distance to the n-th nearest neighbor for each patch. For simplicity, the curves are colored by the dominant species of each patch.

The Nearest Neighbour

Our initial focus centers on examining the distance to the nearest neighbor for each tree. Notably, the curve representing Spruce exhibits distinct characteristics compared to the three other curves—displaying a steeper profile with less variance, as depicted in Figure 7.

Further analysis of all patches reveals similar distributions, as evident in the boxplot shown in Figure 8, where mean and variance are consistent across patches. These graphical statistics therefore make it difficult to distinguish between tree species based on the distance to the nearest neighbor.

Code

lfa::lfa_create_density_plots(neighbors,value_column = "Neighbor_1",category_column1 = "area",category_column2 = "specie", title = "Density plots for the nearest neighbor among species and areas", xlims = c(0,15))

Figure 7: Density plot of the distance to the nearest neighbor distribution across all patches grouped by the dominant species.

Code

lfa::lfa_create_boxplot(neighbors,value_column = "Neighbor_1",category_column1 = "area",category_column2 = "specie", title = "Box plots for the nearest neighbor among species and areas")

Figure 8: Boxplot of the distance to the nearest neighbor distribution across all patches grouped by the dominant species.

The 100th nearest Neighbor

Moving on to the analysis of the 100th nearest neighbor, intriguing patterns emerge. Peaks in the curves display varying heights and positions, with a notable example being the complete shift between Oak and Spruce, as illustrated in Figure 9.

However, it is essential to acknowledge the high variance observed between curves within a species, such as Pine or Beech. While this variance could serve as a potential indicator, it comes with the caveat that the sample size must be substantial for reliable conclusions.

Examining boxplots reveals numerous outliers above the boxes, hinting at potential edge effects on the sides of patches. This observation raises concerns about the adequacy of trees in these areas for a more in-depth analysis, posing challenges in deriving accurate insights.

Code

lfa::lfa_create_density_plots(neighbors,value_column = "Neighbor_100",category_column1 = "area",category_column2 = "specie", title = "Density plots for the 100th nearest neighbor among species and areas", xlims = c(35,100))

Figure 9: Density plot of the distance to the 100th nearest neighbor distribution across all patches grouped by the dominant species.

Code

lfa::lfa_create_boxplot(neighbors,value_column = "Neighbor_100",category_column1 = "area",category_column2 = "specie", title = "Box plots for the 100th nearest neighbor among species and areas")

Figure 10: Boxplot of the distance to the 100th nearest neighbor distribution across all patches grouped by the dominant species.

Average distance to 100 nearest neighbors

Code

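# Columns holding the distances to the 1st..100th nearest neighbor
neighbor_cols <- paste0("Neighbor_", 1:100)
# Row-wise mean over these columns; all_of() selects by an external character vector
neighbors$avg <- rowMeans(dplyr::select(as.data.frame(neighbors), dplyr::all_of(neighbor_cols)))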

Turning our attention to the averages of the first 100 neighbors, our analysis indicates strikingly similar results. There is considerable variance observed between different species, as well as within individual species, as depicted in Figure 11.

Despite the uniformity in average results, the issue of outliers persists, as evident in the boxplot representation shown in Figure 12. These outliers pose challenges and may be indicative of specific environmental conditions affecting tree distributions. Further exploration is required to better understand and mitigate the impact of outliers on our analysis.

Code

lfa::lfa_create_density_plots(neighbors,value_column = "avg",category_column1 = "area",category_column2 = "specie", title = "Density plots for the average of 100 nearest neighbors", xlims = c(25,60))

Figure 11: Density plot of the average distance to the nearest neighbor (n=100) distribution across all patches grouped by the dominant species.

The neighbor analysis proves potentially useful for distinguishing between tree species, yet the observed variances within each species suggest that relying solely on distance to neighbors may not suffice.

A critical consideration is the sample size problem, wherein more distinguishable patterns emerge with higher neighbor levels, but this necessitates a sufficiently large sample size. Unfortunately, deriving a clear relationship between sample size and the number of tree neighbors remains elusive in our current findings. This gap in understanding could be a pertinent subject for further research, delving into the intricate interplay between sample size and the effectiveness of neighbor analysis in species differentiation.

Code

lfa::lfa_create_boxplot(neighbors,value_column = "avg",category_column1 = "area",category_column2 = "specie", title = "Box plots for the average to the nearest neighbor across all species and areas")

Figure 12: Boxplot of the average distance to the nearest neighbor (n = 100) distribution across all patches grouped by the dominant species.

3.2.4 Density of forest patches

Examining densities provides valuable insights into identifying the dominant species within patches. Spruce stands out as the densest species, surpassing all other patches. Following closely in density is Pine, as depicted in Figure 13.

Beech and Oak exhibit similar density levels, with Beech consistently denser across all patches. When comparing the highest density patches for each species, Beech consistently outpaces Oak. While Oak is slightly less dense overall (\(8.354499 \times 10^{-3} \frac{1}{m^2}\)) than Beech (\(8.727781 \times 10^{-3} \frac{1}{m^2}\)), the distinction in density remains noticeable.
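To put the two overall densities into more familiar units, the following short sketch converts them to trees per hectare and computes their relative difference. Only the two density values are taken from the text above. The variable names are our own.

```r
# Overall densities reported above, in detected trees per square metre
density_m2 <- c(Oak = 8.354499e-3, Beech = 8.727781e-3)

# 1 ha = 10,000 m^2, so multiply to obtain trees per hectare
density_ha <- density_m2 * 1e4
round(density_ha, 1)             # Oak 83.5, Beech 87.3

# Relative difference of Beech with respect to Oak, in percent
rel_diff <- (density_ha[["Beech"]] / density_ha[["Oak"]] - 1) * 100
round(rel_diff, 1)               # about 4.5
```

The overall difference between the two deciduous species thus amounts to roughly four detected trees per hectare.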

Code

library(units)
lfa::lfa_calculate_patch_density() |>
  lfa::lfa_create_grouped_bar_plot(grouping_var = "species", value_col = "density", label_col = "name")

Figure 13: Barplot of the densities of all patches (number of detected trees / area of patch). Colored by the dominant tree species of each patch.

In summary, our findings indicate that the density of each patch is highly effective in distinguishing dominant species. Furthermore, the differentiation between conifers (Pine and Spruce) and deciduous trees (Beech and Oak) based on density aligns with patterns observed in the number of return points per detected tree. While distinguishing within conifers is straightforward, discerning between the deciduous tree species Beech and Oak is possible but moderately challenging.

3.2.5 Canopy Height Model

Code

chms <- lfa::lfa_visit_all_areas(lfa::lfa_chm)

In the upcoming section, we will delve into the examination of canopy heights within various patches. In order to acquire valuable insights, we extracted canopy height data from the point cloud, utilizing a resolution of \(0.5\times0.5\)m. Our subsequent analysis involved an exploration of the distribution patterns inherent in the canopy heights dataset.

Code

patches <- lfa::lfa_get_all_areas()
patches$chm_mean <- NA
patches$chm_var <- NA
patches$chm_median <- NA
# Summarise the canopy height values of each patch
for (area_key in names(chms)) {
  vec <- as.vector(chms[[area_key]])
  is_patch <- patches$area == area_key
  patches[is_patch, "chm_mean"] <- mean(vec, na.rm = TRUE)
  patches[is_patch, "chm_var"] <- var(vec, na.rm = TRUE)
  patches[is_patch, "chm_median"] <- median(vec, na.rm = TRUE)
}

To initiate our examination, we will focus on the means within these distributions. Notably, pine exhibits relatively low mean values, indicating a consistent trend within this species. Conversely, the mean values for spruce are evenly distributed across the entire range of means, suggesting a broader variability within this species. In contrast, oak demonstrates relatively high mean values, indicative of a distinct pattern within the oak canopy heights. Moving on to beech, the mean values are moderate on average, with the exception of one outlier in Billerbeck, which boasts the highest mean canopy height among all patches. Interestingly, despite the absence of complete clustering based on dominant species, discernible trends emerge from the analysis. Overall, our exploration of means provides valuable insights into the varied patterns and trends present in canopy heights across different tree species.

Code

lfa::lfa_create_grouped_bar_plot(patches,"specie","chm_mean","area",ylab = "Mean", title = "Mean Canopy Height")

Figure 14: Mean Canopy Height across all patches colored by the dominant specie of each patch.

Shifting our focus to variances, our analysis reveals distinct patterns among different tree species. Beech canopy height exhibits notably high variances across various patches, with the exception of Bielefeld Brackwede. Similarly, spruce displays relatively high variances consistently across all patches. On the other hand, oak demonstrates moderate variances overall, but with one extreme outlier characterized by low variance in Rinkerode. In contrast, pine showcases relatively low variances across its patches.

It is noteworthy that while some clustering effects are observed, the data for beech and oak appears to be somewhat noisy. Interestingly, a trend emerges, indicating that conifers, such as spruce and pine, exhibit more pronounced clustering effects compared to deciduous trees. This variance analysis contributes to a deeper understanding of the diverse patterns and noise levels associated with canopy height data for different tree species.

Code

lfa::lfa_create_grouped_bar_plot(patches,"specie","chm_var","area",ylab = "Variance", title = "Variance of Canopy Height")

Figure 15: Variance Canopy Height across all patches colored by the dominant specie of each patch.

In the current examination, our focus turns to medians, offering additional insights into the central tendencies of canopy heights for various tree species. Beech exhibits high median values, accentuated by the presence of one outlier, namely Bielefeld Brackwede. Oak, on the other hand, displays median values ranging from moderate to relatively high, showcasing a consistent trend within this species. Contrastingly, pine reveals low median values, indicating a clustered distribution pattern. In the case of spruce, median values are spread out and unclustered, providing a distinctive characteristic for this coniferous species. The medians serve as valuable metrics, particularly for understanding the central tendencies within deciduous trees, as indicated by the noteworthy patterns observed in beech and oak.

Code

lfa::lfa_create_grouped_bar_plot(patches,"specie","chm_median","area",ylab = "Median", title = "Median Canopy Height")

Figure 16: Median Canopy Height across all patches colored by the dominant specie of each patch.

Diverse clustering effects emerge when considering different statistical measures, providing a preliminary overview of the canopy height distributions for various tree species. These statistics, including means, variances, and medians, offer a valuable initial perspective on the distribution patterns. However, a more in-depth examination of the distributions is imperative to gain a comprehensive overview and delve deeper into the nuances of the canopy height data for each tree species. This multi-statistical approach serves as an effective starting point for understanding the overarching patterns while emphasizing the necessity of further exploration to uncover the intricacies within the distributions.

Code

patches <- sf::st_read("./research_areas.shp")
result_df = NULL
for(patch in chms |> names()){
  species <- patches[patches$name == patch,]$species
  df <- chms[[patch]] |> as.data.frame()
  df$specie <- species
  df$area <- patch
  if(is.null(result_df)){
    result_df <- df
  } else {
    result_df <- dplyr::bind_rows(result_df,df)
  }
}

The density plots offer a more detailed view of the canopy height distributions for the different tree species. For beech, one outlier is evident, which is also identified in the subsequent boxplot analysis. However, the overall shapes of the density curves for beech patches are notably similar. In the case of oak, the density curves exhibit similar shapes, with the peak of the Hamm patch appearing slightly lower compared to others, introducing a subtle variation within the species. Pine displays remarkably similar shapes across its density curves, characterized by higher peaks when compared to other species, suggesting a consistent and distinctive pattern within the pine patches. Turning to spruce, one outlier is identified, while the remaining density curves share similar shapes. Notably, the highest peaks in spruce patches are concentrated around zero values, indicative of a specific distribution characteristic within this coniferous species. The density plot analysis thus clarifies the distribution shapes and the outliers within each tree species.

Code

lfa::lfa_create_density_plots(result_df,"Z", xlims = c(0,40), title = "Density Plots for Canopy Height Distributions")

Density plots for the distribution of canopy heights (resolution \(0.5\times0.5\)m) across all patches grouped by dominant species

The examination of boxplots provides a more detailed perspective on the distribution characteristics of canopy heights for different tree species. In the case of beech, the boxes reveal limited shared domains, with one box significantly larger than the others, indicating a distinctive distribution pattern. For oak, the boxes overlap considerably, accompanied by a prevalence of outliers in the lower domain, contributing to the complexity of the distribution. Pine exhibits notably similar medians, with boxes of comparable sizes, showcasing a clustered distribution pattern. Conversely, in the case of spruce, two boxplots display remarkable similarity in terms of box position, median, and whiskers, while one box lies significantly lower, suggesting a unique distribution pattern.

The boxplot analysis proves useful in capturing distribution details, highlighting both clustering effects and outliers. Notably, oak and pine present relatively clustered patterns, while beech and spruce show more variability, the latter with the presence of one outlier. This nuanced examination enhances our understanding of the canopy height distributions within each tree species.

Code

lfa::lfa_create_boxplot(result_df,"Z","area","specie", title = "Boxplots for Canopy Height Distributions")

Boxplots for the distribution of canopy heights (resolution \(0.5\times0.5\)m) across all patches grouped by dominant species

In summary, the exploration of canopy height proves to be a valuable endeavor, unveiling discernible patterns and differences among various tree species. Noteworthy clustering effects are evident, providing insights into the distinct characteristics of each species. However, it is important to acknowledge the presence of high variance within species, indicating considerable variability even within the same tree type. This underscores the complexity of canopy height distributions and emphasizes the need for a comprehensive and detailed analysis to capture the nuances inherent in the data. Overall, while overarching trends and differences are observable, the intricacies within each species’ canopy height distribution call for a thorough investigation to fully comprehend the extent of variability.

3.3 Random Forest Predictions

This section presents the Random Forest results for three models, each trained on a different set of parameters. Each subsection explains why the respective model was selected.

3.3.1 Use neighbors and height

A Random Forest model was trained on tree heights (z-values) and the distance to the nearest 100 neighbors. Despite achieving only medium results, this straightforward model clearly demonstrates the feasibility of distinguishing between tree species.

Code

detections <- lfa::lfa_get_detections()
neighbors <- lfa::lfa_get_neighbor_paths() |> lfa::lfa_combine_sf_obj(lfa::lfa_get_all_areas())
neighbors <- sf::st_join(neighbors,detections, join = sf::st_within)
names(neighbors)[names(neighbors) == 'specie.x'] <- 'specie'
names(neighbors)[names(neighbors) == 'area.x'] <- 'area'
excluded_cols <- c("area.x","specie.x","treeID.y","Z.y","area.y","specie.y","geom","treeID.x","Z.x")

Code

data <- lfa::lfa_random_forest(tree_data = neighbors, excluded_input_columns = excluded_cols,response_variable = "specie")

Code

model.rf_neighbors <- data
save(model.rf_neighbors, file = "./models/neighbors.rData")

The classifier exhibits notable performance variations across different classes. Precision for the “Beech” class is high, indicating accurate positive predictions, but the lower recall suggests the possibility of missing some instances of “Beech.” In contrast, both precision and recall for the “Oak” and “Pine” classes are extremely low, highlighting challenges in accurately classifying instances. The “Spruce” class shows moderate precision and high recall, indicating comparatively better performance.
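For readers less familiar with these measures, the following minimal sketch derives class-wise precision and recall from a confusion matrix in base R. The helper `class_metrics` is hypothetical and not part of `lfa`, the counts are made up and are not the results of this study, and the orientation (predictions in rows, ground truth in columns) is an assumption.

```r
# Hypothetical helper (not part of lfa): per-class precision and recall
# from a confusion matrix with predictions in rows and truth in columns.
class_metrics <- function(cm) {
  tp <- diag(cm)                      # correctly classified trees per class
  data.frame(
    class     = rownames(cm),
    precision = tp / rowSums(cm),     # correct / all trees predicted as class
    recall    = tp / colSums(cm)      # correct / all trees truly of class
  )
}

# Made-up counts for illustration only
classes <- c("Beech", "Oak", "Pine", "Spruce")
cm_example <- matrix(
  c(40,  5,  5,  0,
    30, 10,  5,  5,
    20,  5, 15, 10,
    10,  0,  5, 35),
  nrow = 4, byrow = TRUE,
  dimnames = list(classes, classes)
)

class_metrics(cm_example)   # Beech: precision 0.8, recall 0.4
```

In this made-up example, a "Beech" prediction is usually correct (high precision), but most true beech trees are assigned to other classes (low recall), which mirrors the pattern described above.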

Code

cm <- data$confusion_matrix
lfa::lfa_plot_confusion_matrix(cm)

Figure 17: Confusion Matrix of randomForest on the distance to 100 nearest neighbors.

The model’s predictions for “Beech” are frequent but vary significantly, leading to substantial differences between Precision and Recall (see Figure 18). The model tends to make predictions that are either mostly correct or mostly incorrect, resulting in a homogeneous prediction pattern, as illustrated in the confusion matrix (see Figure 17).

Code

data$confusion_matrix |> lfa::lfa_calculate_rf_metrics() |> lfa::lfa_visualize_rf_metrics()

Figure 18: Class-wise precision and recall for randomForest-Classification with distance to the 100 nearest neighbors.

3.3.2 Enrich Neighbors with segmentation data

This model employs the same parameters as the preceding one, while incorporating additional features derived from the segmentation of each tree in the point cloud. The expanded feature set includes density of returns per tree, mean and variance of Z values of return points, mean and variance of intensity of LiDAR returns, number of LiDAR returns per tree, and the area in square meters of each tree. The aim is to compare the results obtained with this augmented feature set against those based solely on tree detection characteristics.

Code

data <- sf::st_read("./data/tree_properties.gpkg")
detections <- lfa::lfa_get_detections()
neighbors <- lfa::lfa_get_neighbor_paths() |> lfa::lfa_combine_sf_obj(lfa::lfa_get_all_areas())

Code

combined <- sf::st_join(data,detections,join = sf::st_within)

combined$Z.x = NULL
names(combined)[names(combined) == 'Z.y'] <- 'Z'

combined$treeID.segmentation <- NULL

# Replace missing segmentation features with the sentinel value -1
na_cols <- c("density", "Z.mean", "Z.var", "Intensity.mean", "Intensity.var", "number_of_returns", "tree_area")
for (col in na_cols) {
  combined[[col]][is.na(combined[[col]])] <- -1
}

neighbors$treeID = NULL
neighbors$Z = NULL
neighbors$area = NULL
neighbors$specie = NULL

combined = sf::st_join(combined, neighbors, sf::st_within)
excluded_cols <- c("Z.x", "treeID.detection","treeID.segmentation","name_las_file","treeID","area","specie","geom")

Code

data <- lfa::lfa_random_forest(tree_data = combined, excluded_input_columns = excluded_cols,response_variable = "specie")

Code

model.rf_segmentation_detection_params <- data
save(model.rf_segmentation_detection_params, file = "./models/segmentation_detection_params.rData")

Code

cm <- data$confusion_matrix
lfa::lfa_plot_confusion_matrix(cm)

Figure 19: Confusion Matrix of randomForest with all parameters derived from tree level.

The classifier exhibits suboptimal performance for the “Beech” and “Oak” classes, with low precision and recall values, indicative of challenges in accurately classifying instances for these categories. In contrast, the “Pine” class demonstrates higher precision and recall, suggesting improved performance in capturing true instances while minimizing false positives. The “Spruce” class displays moderate precision and high recall, indicating relatively better performance in capturing true instances with fewer false positives.

Notably, the overall accuracy (0.44) and Kappa (0.23) have significantly improved compared to the previous model (Accuracy = 0.35, Kappa = 0.14). The model demonstrates effectiveness for conifers but shows limitations for deciduous trees, as depicted in the confusion matrix at the tree-level (Figure 19). Specifically, “Oak” and “Beech” ground truth predictions are distributed among all classes, while “Pine” exhibits false negatives, especially for the “Spruce” class.
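Accuracy and Cohen's Kappa can both be derived from the confusion matrix. The sketch below shows the calculation in base R. The helper `overall_metrics` and the two-class counts are hypothetical and serve only to illustrate why Kappa is lower than accuracy: it subtracts the agreement that would be expected by chance.

```r
# Hypothetical helper (not part of lfa): accuracy and Cohen's Kappa
overall_metrics <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                        # observed agreement (accuracy)
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2     # agreement expected by chance
  c(accuracy = po, kappa = (po - pe) / (1 - pe))
}

# Made-up two-class counts for illustration only
cm_small <- matrix(c(40, 10,
                     20, 30), nrow = 2, byrow = TRUE)

overall_metrics(cm_small)   # accuracy 0.7, kappa 0.4
```

A Kappa of 0 corresponds to chance-level agreement and a Kappa of 1 to perfect agreement, so the values reported above indicate a classifier that is better than chance but far from reliable.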

Additionally, this model achieves a more balanced distribution between Precision and Recall differences across classes, as illustrated in the Precision-Recall curve at the tree-level (Figure 20).

Code

data$confusion_matrix |> lfa::lfa_calculate_rf_metrics() |> lfa::lfa_visualize_rf_metrics()

Figure 20: Precision and Recall of randomForest with all parameters derived from tree level.

3.3.3 Train with patch level information

In this phase of our study, we examined the impact of patch-level information on the model. The model training incorporated Z-Values, distance to the 100 nearest neighbors, density of trees within a patch, as well as statistics related to the canopy height, including mean, variance, and median.

Code

chms <- lfa::lfa_visit_all_areas(lfa::lfa_chm)
patches <- lfa::lfa_get_all_areas()
patches$chm_mean <- NA
patches$chm_var <- NA
patches$chm_median <- NA
# Summarise the canopy height values of each patch
for (area_key in names(chms)) {
  vec <- as.vector(chms[[area_key]])
  is_patch <- patches$area == area_key
  patches[is_patch, "chm_mean"] <- mean(vec, na.rm = TRUE)
  patches[is_patch, "chm_var"] <- var(vec, na.rm = TRUE)
  patches[is_patch, "chm_median"] <- median(vec, na.rm = TRUE)
}

Code

neighbors <- lfa::lfa_get_neighbor_paths() |> lfa::lfa_combine_sf_obj(lfa::lfa_get_all_areas())

Code

detections <- lfa::lfa_get_detections()
density <- lfa::lfa_calculate_patch_density(detections = detections)
colnames(density) <- c("id","specie","area","geometry","area_size","detections","density")
detections <- dplyr::left_join(detections,density |> as.data.frame(),by=c("area","specie"))
detections <- dplyr::left_join(detections,patches, by = c("area","specie"))

detections <- sf::st_join(detections, neighbors, join = sf::st_within)

detections$treeID.x = NULL
names(detections)[names(detections) == 'treeID.y'] <- 'treeID'

detections$Z.x = NULL
names(detections)[names(detections) == 'Z.y'] <- 'Z'

detections$area.x = NULL
names(detections)[names(detections) == 'area.y'] <- 'area'

detections$specie.x = NULL
names(detections)[names(detections) == 'specie.y'] <- 'specie'

excluded_cols = c("treeID","geom","area","specie","id","geometry","area_size","detections")

Code

data <- lfa::lfa_random_forest(tree_data = detections, excluded_input_columns = excluded_cols,response_variable = "specie")

Code

model.rf_patch <- data
save(model.rf_patch, file = "./models/patch.rData")

Code

cm <- data$confusion_matrix
lfa::lfa_plot_confusion_matrix(cm)

Figure 21: Confusion Matrix of randomForest with patch-level information.

The model yields favorable overall results, with an accuracy of 0.63 and a Kappa value of 0.49. Notably, predictions appear to be predominantly influenced by patch-level information, resulting in largely homogeneous predictions within most patches.

For the “Beech” and “Oak” classes, the classifier performs moderately well, exhibiting moderate precision and recall values. These values are within the same domain, with an equal balance between recall and precision for both classes.

In the case of the “Pine” class, high precision is accompanied by lower recall, indicating a conservative prediction approach. However, the classifier successfully captures a substantial portion of actual “Pine” instances. Noteworthy is the presence of outliers where “Pine” is falsely predicted as “Oak” (see Figure 21). These false predictions are concentrated in the Greffen patch, which contains a total of 513 detections. Notably, all significant Pine patches yield accurate predictions.

The “Spruce” class exhibits a good balance between precision and recall, with slightly better recall values. The deciduous vs. conifers comparison is notably successful, with conifers predicted mostly as true positives and false predictions for deciduous tree species predominantly leaning towards predicting deciduous trees.
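The deciduous versus conifer comparison can be made explicit by collapsing the four-class confusion matrix into two groups. The following sketch does this in base R. The helper `collapse_cm`, the group mapping, and the counts are hypothetical and do not reproduce the results of this study. Predictions are again assumed to be in rows.

```r
# Hypothetical helper (not part of lfa): merge classes of a confusion
# matrix into groups by summing the corresponding rows and columns.
collapse_cm <- function(cm, groups) {
  by_row <- rowsum(cm, group = groups[rownames(cm)])     # merge predicted classes
  t(rowsum(t(by_row), group = groups[colnames(cm)]))     # merge true classes
}

groups <- c(Beech = "deciduous", Oak = "deciduous",
            Pine = "conifer", Spruce = "conifer")

# Made-up counts for illustration only
classes <- c("Beech", "Oak", "Pine", "Spruce")
cm4 <- matrix(
  c(30, 10,  2,  1,
    12, 25,  3,  0,
     2,  3, 35,  8,
     1,  2,  5, 41),
  nrow = 4, byrow = TRUE,
  dimnames = list(classes, classes)
)

cm2 <- collapse_cm(cm4, groups)
cm2
sum(diag(cm2)) / sum(cm2)   # accuracy of the two-group comparison
```

Confusions between Beech and Oak, or between Pine and Spruce, end up on the diagonal of the collapsed matrix, so the two-group accuracy is always at least as high as the four-class accuracy.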

Code

data$confusion_matrix |> lfa::lfa_calculate_rf_metrics() |> lfa::lfa_visualize_rf_metrics()

Figure 22: Class-wise precision and recall for randomForest-Classification with patch-level information.

4 Discussion

4.1 Findings

We now discuss the findings derived from the results presented above in the context of the hypotheses outlined in the Introduction. For each hypothesis, we assess how well our findings align with it and what this implies for the effectiveness of the employed methodology.

LiDAR data can effectively differentiate between tree species in monocultural forests in NRW, and there are statistically significant differences in the LiDAR-derived metrics among the various tree species.

The statement provided offers a nuanced perspective on the effectiveness of tree species differentiation based on the analysis conducted. It is acknowledged that the effectiveness of differentiation is intricately tied to the chosen criteria and the specific species under consideration. Notably, certain patch-level statistics, such as tree density, emerge as robust indicators for distinguishing between tree species. However, when scrutinizing the distribution of characteristics at the individual tree level, the variance between the distributions of patches from the same species is often not significantly smaller than the variance between distributions of different species. This observation suggests that relying solely on characteristics at the tree level might not yield distinct species differentiation. An interesting finding is the effectiveness of distinguishing between conifers and deciduous trees, with notable differences, particularly concerning Spruce trees. The identified dissimilarities, especially when comparing Spruce trees to other species, underscore the potential for effective differentiation in certain instances. The conclusion drawn emphasizes the importance of generating a comprehensive set of statistics for a defined area of interest and comparing the results to sample patches. This approach provides a robust indicator of the species distribution within the specified area. Importantly, the recommendation is against solely relying on one selected parameter, given the relatively high variance observed within the same species. Instead, a multifaceted analysis involving various parameters proves more effective in characterizing and distinguishing tree species in a given region.

Random Forest classification can be used to predict the tree species of trees in monocultural forests.

The presented statement accurately captures the key findings and insights derived from the analysis of the tree models. All tree models can be partially utilized to distinguish between tree species. However, not all models demonstrated consistently good results across all classes, with oak achieving only moderate to low scores in the tree-level models. The observation that utilizing patch-level information for training the model enhances performance aligns with the understanding that aggregating information at a higher level of abstraction can provide more robust insights. Furthermore, the notable success in distinguishing between conifers and deciduous trees, especially with the last model, highlights the efficacy of the approach, showcasing high accuracy in this particular differentiation. We conclude that considering all tree parameters together is effective for distinguishing between the inspected tree species with the random forest method, and that a holistic consideration of multiple parameters enhances the model’s ability to classify tree species accurately.

Random Forest classification performance varies significantly depending on the set of LiDAR parameters used, indicating that certain combinations of parameters contribute more effectively to accurate tree species classification.

The provided statement accurately characterizes the findings regarding the quality of results obtained from the three models and highlights important nuances in the analysis. Indeed, the quality of results exhibits high variance across all three models, considering overall statistics such as accuracy and Cohen’s Kappa, as well as class-specific measures like precision and recall. The mention of presenting only some highlights of the researched parameter combinations emphasizes that the models’ performance is influenced by a multitude of factors. The observation that incorporating segmentation parameters into the last model leads to a decrease in performance underscores the sensitivity of the model to specific parameter combinations. The statement further notes the high dependency of model performance on the utilized tree characteristics, with significant variations across different classes. For instance, the predictability of beech is highlighted, showcasing that certain parameters, such as tree heights and distance to neighbors, play a more crucial role in accurate predictions compared to others. The insight that another machine learning approach that can recognize the importance of specific parameters for the prediction of a certain class might be beneficial suggests avenues for further exploration and model refinement. Additionally, the variation in the power of differentiation between conifers and deciduous trees among the three models, with the last model outperforming others, reinforces the significance of the approach that incorporates patch-level information. This observation hints at the importance of considering a holistic view of the forest rather than relying solely on individual tree characteristics for certain tasks.
In conclusion, the statement provides a comprehensive overview of the complexities and considerations involved in interpreting and optimizing the machine learning models for tree species prediction based on the dataset and analysis conducted.

4.2 Limitations

Our approach, while yielding valuable insights, is not without its set of assumptions and limitations, each deserving careful consideration to grasp the nuances of our findings.

Firstly, the assumption that all trees within the chosen areas of interest share the same species is integral to our methodology. While this assumption facilitates the development of predictive models, we acknowledge that 100% monocultural forests are rare in reality. Our validation process, primarily relying on Sentinel images, raises potential concerns regarding the inadvertent selection of mixed forests. Such a scenario could significantly impact the distribution of tree characteristics and introduce inaccuracies in the ground truth used for training the random forest algorithm.

The temporal aspect of LiDAR data collection is a crucial factor that can significantly impact various LiDAR characteristics. LiDAR data may be captured at different seasons of the year, and these seasonal variations can introduce substantial differences in the derived information. For instance, the number of returns and intensities in LiDAR data can be highly influenced by the season during which the data is collected. Seasonal changes in vegetation, such as leaf growth or shedding, can impact the LiDAR returns.

Another key assumption revolves around the representation of naturally grown forests in our areas of interest. Given the absence of true natural forests in Germany and North Rhine-Westphalia (NRW) due to forestry practices, our study areas may consist of newly planted or managed forests with differing characteristics. While this could potentially influence our analysis, we consider the impact relatively minor given the prevalent managed nature of forests in the region.

We also assumed that each forest in our study is independent of its spatial environment. However, forests are influenced by factors such as neighborhood type, altitude, climate, and the presence of fauna. Although we drew samples from diverse locations in NRW, these environmental variations persist and add complexity to our analysis.

The sample size is relatively small, with only three areas of interest per species. A larger sample would make the analysis more robust.

The correctness of our tree detection depends on the accuracy of the LiDAR point cloud. Although the algorithm employed is established and widely used, the precision of our results hinges on this detection step.

Finally, there are technical limitations, notably the tile-based segmentation used for processing efficiency. The \(50\times50\)m tiles suit practical hardware constraints but cause problems near tile borders: a tree located on a border may be counted as two separate trees during segmentation, which affects our results.
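The tile-border effect can be illustrated with a small sketch in base R. It assumes a regular 50 m tile grid aligned to the coordinate origin, which is an assumption for illustration and not necessarily the layout used in the lfa package. The helper dist_to_tile_border and the 2 m threshold are hypothetical; the sketch only flags detected tree tops that lie close to a tile edge and are therefore candidates for double counting.

```r
# Hypothetical helper: distance of each point to the nearest border of its tile.
# Assumes square tiles of size `tile_size` aligned to the origin (0, 0).
dist_to_tile_border <- function(x, y, tile_size = 50) {
  dx <- x %% tile_size  # offset from the western tile edge
  dy <- y %% tile_size  # offset from the southern tile edge
  pmin(dx, tile_size - dx, dy, tile_size - dy)
}

# Made-up tree-top coordinates in metres (illustrative only)
trees <- data.frame(
  x = c(12.0, 49.5, 50.4, 75.0),
  y = c(30.0, 20.0, 20.3, 99.0)
)

trees$border_dist <- dist_to_tile_border(trees$x, trees$y)
# Illustrative threshold: trees closer than 2 m to a border are flagged
trees$near_border <- trees$border_dist < 2
print(trees)
```

Flagged trees could then be inspected manually or compared with detections in the neighbouring tile before they are counted.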

In conclusion, understanding these assumptions and limitations is paramount for a nuanced interpretation of our results. Each aspect informs the intricacies of our methodology and underscores potential areas for refinement in future studies.

4.3 Further Work

Our work has laid the foundation for further exploration and refinement in the field of tree species prediction and analysis. As we reflect on our findings, several avenues for future research and optimization strategies emerge.

One promising direction involves delving into the predictability of other tree species. While our study focused on Beech, Oak, Pine, and Spruce, primarily due to their regional relevance, investigating the performance of our methods on a broader spectrum of species could provide valuable insights into the generalizability of our models.

Expanding the geographical scope of our approach is another intriguing prospect. Our study was confined to a localized area in North Rhine-Westphalia (NRW). Testing the project’s performance in diverse regions with similar species could unravel variations influenced by different environmental conditions. Additionally, exploring potential spatial trends in derived parameters and model performance could enhance our understanding of broader patterns.

If tree-level datasets are available, future research could test the derived parameters without the assumption that all trees within a patch share the same species. This would offer a more nuanced representation of individual tree characteristics and potentially enhance the accuracy of the random forest models.

Quantitative parameter selection methods present an opportunity for optimization. Moving beyond a trial-and-error approach, incorporating metrics like Kullback-Leibler Divergence (KLD) and Jensen-Shannon Divergence (JSD) into the selection process could provide a systematic and data-driven way to identify relevant parameters. Advanced techniques, such as reinforcement learning methods, may also hold promise in this context.
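As a minimal sketch of how such divergences can be computed for one parameter, the following base R code bins two samples on shared breaks and evaluates the textbook discrete KLD and JSD with the natural logarithm. The helpers to_prob, kld and jsd, the smoothing constant eps, and the simulated heights are assumptions for illustration. They are not the implementation in the lfa package, so the absolute values are not comparable to the tables in the appendix.

```r
# Convert a numeric sample into bin probabilities on shared breaks.
# A small eps keeps every bin strictly positive (assumption of this sketch).
to_prob <- function(x, breaks, eps = 1e-9) {
  bins <- findInterval(x, breaks, all.inside = TRUE)
  counts <- tabulate(bins, nbins = length(breaks) - 1)
  (counts + eps) / sum(counts + eps)
}

# Discrete Kullback-Leibler divergence KL(p || q), natural logarithm
kld <- function(p, q) sum(p * log(p / q))

# Jensen-Shannon divergence: symmetrised KLD against the mixture m
jsd <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kld(p, m) + 0.5 * kld(q, m)
}

# Simulated tree heights for two hypothetical species
set.seed(42)
heights_a <- rnorm(500, mean = 20, sd = 3)
heights_b <- rnorm(500, mean = 25, sd = 4)

breaks <- seq(min(heights_a, heights_b), max(heights_a, heights_b),
              length.out = 31)
p <- to_prob(heights_a, breaks)
q <- to_prob(heights_b, breaks)

kld_pq <- kld(p, q)  # asymmetric: generally differs from kld(q, p)
kld_qp <- kld(q, p)
jsd_pq <- jsd(p, q)  # symmetric and bounded by log(2)
```

A parameter whose distributions diverge strongly between species but only weakly between areas of the same species would be a plausible candidate for inclusion in the model.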

Dataset quality optimization, particularly the removal of outliers, is another avenue for exploration. Many of the characteristic distributions in our dataset exhibited outliers, and assessing the impact of outlier removal on random forest accuracy could be a valuable optimization strategy.
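One common way to remove outliers is the interquartile-range (IQR) rule, sketched below in base R. The text above does not prescribe a method, so the rule, the helper remove_outliers_iqr, the factor k = 1.5, and the example values are all assumptions for illustration.

```r
# Hypothetical helper: keep only values within k * IQR of the quartiles
remove_outliers_iqr <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  keep <- !is.na(x) & x >= q[1] - k * iqr & x <= q[2] + k * iqr
  x[keep]
}

# Made-up tree heights in metres with one implausible value
z <- c(18.2, 19.5, 20.1, 21.0, 22.3, 23.8, 95.0)
z_clean <- remove_outliers_iqr(z)
print(z_clean)
```

Training the random forest once on the raw and once on the filtered data would show whether such filtering actually improves accuracy.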

Finally, exploring additional tree statistics beyond those considered in our current study could enrich our understanding. This might involve investigating intensity distribution, the number of returns distribution on the z-axis per tree, and more sophisticated spatial point pattern analyses (Pebesma and Bivand 2023). Techniques like Ripley’s reduced second moment function (Ripley 1977) could offer deeper insights into spatial patterns.
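To indicate what such a second-moment statistic measures, the sketch below implements a naive estimator of Ripley's K function in base R: the area-scaled average number of further points within distance r of a point. It deliberately omits edge correction, and the helper ripley_k_naive and the four-point pattern are assumptions for illustration. For real analyses, an edge-corrected estimator from a dedicated package such as spatstat would be preferable.

```r
# Naive Ripley's K without edge correction (illustrative sketch only):
# K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
#        with distance d_ij <= r
ripley_k_naive <- function(x, y, r, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf  # exclude self-pairs
  sapply(r, function(ri) area * sum(d <= ri) / (n * (n - 1)))
}

# Four points on the corners of a unit square (made-up pattern)
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
k_vals <- ripley_k_naive(x, y, r = c(0.5, 1, 1.5), area = 1)
print(k_vals)
```

Under complete spatial randomness K(r) is approximately pi * r^2, so values above this reference suggest clustering and values below it suggest regular spacing.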

Incorporating these avenues into future research endeavors has the potential to advance the field, refining methodologies and expanding the applicability of tree species prediction and analysis. Our work, while a significant step forward, lays the groundwork for a more comprehensive exploration of the intricacies within this domain.

4.4 Conclusion

Our research aimed to investigate the usability of LiDAR data in distinguishing between tree species in monocultural forests, contributing to the field of forest monitoring. The overall success of our approach suggests promising outcomes. However, it’s important to acknowledge and further address certain limitations identified in our work. Future endeavors in this line of research could delve deeper into these limitations to enhance the robustness and applicability of the methodology. The potential utility of our approach in forest monitoring applications is a significant finding. The ability to distinguish between tree species in monocultural forests using LiDAR data offers valuable insights for forest management and ecological studies. The success of our methodology lays a foundation for continued research and application in real-world scenarios, potentially influencing forest monitoring practices.

In summary, our research has demonstrated the viability of using LiDAR data for distinguishing tree species in monocultural forests, contributing to the broader field of forest monitoring. By addressing and expanding upon the identified limitations, our approach has the potential to become a valuable tool in practical forest management and environmental research.

5 References

n.d. Naturefund. Naturefund e. V. https://www.naturefund.de/en/information/afforestation/forests_importance_and_function/the_importance_of_forests.

Agency, European Space. 2024. “Sentinel-1 - Missions - Sentinel Online - Sentinel Online — Sentinels.copernicus.eu.” https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-1.

Alvarez, Sergio A. 2002. “An Exact Analytical Relation Among Recall, Precision, and Classification Accuracy in Information Retrieval.” Boston College, Boston, Technical Report BCCS-02-01, 1–22.

Blickensdoerfer, Lukas. 2022. “Dominant Tree Species for Germany (2017/2018).” Waldatlas- Wald Und Waldnutzung. Thünen Atlas. https://atlas.thuenen.de/layers/geonode:Dominant_Species_Class.

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45: 5–32.

Brovelli, Maria Antonia, Mattia Crespi, Francesca Fratarcangeli, Francesca Giannone, and Eugenio Realini. 2008. “Accuracy Assessment of High Resolution Satellite Imagery Orientation by Leave-One-Out Method.” ISPRS Journal of Photogrammetry and Remote Sensing 63 (4): 427–40.

Brownlee, Jason. 2019. “How to Calculate the KL Divergence for Machine Learning.” MachineLearningMastery.com. https://machinelearningmastery.com/divergence-between-probability-distributions/.

Cleverdon, Cyril. 1967. “The Cranfield Tests on Index Language Devices.” In Aslib Proceedings, 19:173–94. 6. MCB UP Ltd.

Cremer, Felix, Mikhail Urbazaev, José Cortés, John Truckenbrodt, Christiane Schmullius, and Christian Thiel. 2020. “Potential of Recurrence Metrics from Sentinel-1 Time Series for Deforestation Mapping.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13: 5233–40.

Danel, Jakob, and Frederick Bruch. 2024. Lfa: LiDAR Forest Analysis: Diversity of Tree Species in an Oecosystem. https://github.com/jakobdanel/lidar-forest-analysis.

Dostálová, Alena, Mait Lang, Janis Ivanovs, Lars T Waser, and Wolfgang Wagner. 2021. “European Wide Forest Classification Based on Sentinel-1 Data.” Remote Sensing 13 (3): 337.

Fleiss, Joseph L, Jacob Cohen, and Brian S Everitt. 1969. “Large Sample Standard Errors of Kappa and Weighted Kappa.” Psychological Bulletin 72 (5): 323.

Gonzalez, Diana. “DLR - Raumfahrtagentur - Lidar — Dlr.de.” https://www.dlr.de/rd/desktopdefault.aspx/tabid-5626/9178_read-17527/.

Grosse, Ivo, Pedro Bernaola-Galván, Pedro Carpena, Ramón Román-Roldán, Jose Oliver, and H Eugene Stanley. 2002. “Analysis of Symbolic Sequences Using the Jensen-Shannon Divergence.” Physical Review E 65 (4): 041905.

Kullback, Solomon. 1951. “Kullback-Leibler Divergence.”

Li, Wenkai, Qinghua Guo, Marek Jakubowski, and Maggi Kelly. 2012. “A New Method for Segmenting Individual Trees from the Lidar Point Cloud.” Photogrammetric Engineering and Remote Sensing 78 (January): 75–84. https://doi.org/10.14358/PERS.78.1.75.

Mori, Akira S., Kenneth P. Lertzman, and Lena Gustafsson. 2017. “Biodiversity and Ecosystem Services in Forest Ecosystems: A Research Agenda for Applied Forest Ecology.” Journal of Applied Ecology 54 (1): 12–27. https://doi.org/10.1111/1365-2664.12669.

Müller, Martin, and Nadja Imhof. 2019. “Käferkämpfe: Borkenkäfer Und Landschaftskonflikte Im Nationalpark Bayerischer Wald.” Landschaftskonflikte, 313–29.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.

Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. CRC Press.

Popescu, Sorin, and Randolph Wynne. 2004. “Seeing the Trees in the Forest: Using Lidar and Multispectral Data Fusion with Local Filtering and Variable Window Size for Estimating Tree Height.” Photogrammetric Engineering and Remote Sensing 70 (May): 589–604. https://doi.org/10.14358/PERS.70.5.589.

Ripley, Brian D. 1977. “Modelling Spatial Patterns.” Journal of the Royal Statistical Society: Series B (Methodological) 39 (2): 172–92.

Roussel, Jean-Romain, David Auty, Nicholas C. Coops, Piotr Tompalski, Tristan R. H. Goodbody, Andrew Sánchez Meador, Jean-François Bourdon, Florian de Boissieu, and Alexis Achim. 2020. “lidR: An R Package for Analysis of Airborne Laser Scanning (ALS) Data.” Remote Sensing of Environment 251: 112061. https://doi.org/10.1016/j.rse.2020.112061.

Sobczyk, Thomas. 2014. Der Eichenprozessionsspinner in Deutschland: Historie, Biologie, Gefahren, Bekämpfung. Deutschland/Bundesamt für Naturschutz.

Sparck Jones, Karen. 1972. “A Statistical Interpretation of Term Specificity and Its Application in Retrieval.” Journal of Documentation 28 (1): 11–21.

Szostak, Marta, Paweł Hawryło, and Dobrosława Piela. 2018. “Using of Sentinel-2 Images for Automation of the Forest Succession Detection.” European Journal of Remote Sensing 51 (1): 142–49.

Valavi, Roozbeh, Jane Elith, José J Lahoz-Monfort, and Gurutzeta Guillera-Arroita. 2018. “blockCV: An R Package for Generating Spatially or Environmentally Separated Folds for k-Fold Cross-Validation of Species Distribution Models.” bioRxiv, 357798.

Welle, Torsten, Lukas Aschenbrenner, Kevin Kuonath, Stefan Kirmaier, and Jonas Franke. 2022. “Mapping Dominant Tree Species of German Forests.” Remote Sensing 14 (14). https://doi.org/10.3390/rs14143330.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

6 Appendix

6.1 Contributions

Most of the work was done in pair programming. In the following sections, we highlight our key contributions to the project.

6.1.1 Frederick

My contributions to the project encompassed several areas: data acquisition, preprocessing, visualization, random forest implementation, and canopy height modeling.

In data acquisition, I sourced relevant datasets, focusing on patches in which one tree species was dominant. To find them, I navigated existing datasets such as the Waldmonitor dataset and used maps indicating dominant tree species in North Rhine-Westphalia (NRW). Ensuring the validity of the selected patches, considering factors such as patch size and species diversity, was a challenge that taught me the importance of meticulous data selection and validation.

Preprocessing involved essential steps such as tiling and height normalization to standardize the data and facilitate further analysis. The challenge lay in optimizing these steps to retain relevant information while minimizing noise and artifacts, which deepened my understanding of how data standardization affects downstream analyses.

Visualizing the data through density plots and boxplots was instrumental in gaining insights into the distribution and characteristics of the variables under study. The challenge was to balance clarity and complexity, which improved my visualization skills and my ability to communicate key findings effectively.

Implementing the Random Forest algorithm for species classification involved model training, validation, and performance evaluation. Generating scores and confusion matrices allowed us to assess the model’s accuracy and identify areas for improvement, and deepened my understanding of machine learning techniques.

Developing a canopy height model provided valuable insights into remote sensing techniques and their applicability in forestry research.

Overall, my contributions to the project have enriched my skills and knowledge in research methods and data analysis techniques, contributing to the project’s success while broadening my expertise in the field of forestry and environmental science.

6.1.2 Jakob

In the broader context of a comprehensive lidar data analysis project, my contributions were instrumental in advancing key components. The preprocessing phase saw enhancements in Intersection, Detection, and Segmentation processes. I played a pivotal role in refining Intersection by overlaying lidar data onto shapefiles, ensuring a coherent file structure. Additionally, my efforts in Detection focused on optimizing tree identification within the intersected data. I also developed the tree Segmentation step.

In terms of visualization, my contributions elevated the understanding of lidar data. The presentation of the confusion matrix as a colored table provided a comprehensive view of classification performance. Through the implementation of bar plots for patch-level data, I contributed to insights into the distribution of features. Precision and recall metrics were effectively communicated via bar plots, offering a nuanced evaluation of the model’s performance. The integration of Canopy Height Plots further enriched the visual representation of canopy structures.

Within the tree-level distribution analysis, my role extended to Data Preparation and the application of KLD and JSD. I ensured that the dataset was well-structured and prepared for analysis. Moreover, my involvement in researching methods, implementing calculations, and dynamically building result tables for KLD and JSD significantly contributed to the depth of the analysis.

In the development of an interface for Lidar data, I played a crucial role in designing a structured file system for efficient data storage. The implementation of various functions, including lfa::lfa_map_tile_locations, lfa::lfa_visit_all_areas, and lfa::lfa_combine_sf_obj, showcased my commitment to enhancing data iteration and combination processes.

Throughout the project, my contributions led to valuable learnings for the team. I actively addressed challenges associated with handling big data in R and underscored the importance of structured data storage. The development of an R-package was a testament to my commitment to enhancing functionality and promoting reusability. Additionally, my insights deepened the team’s understanding of package workflows in R.

Despite encountering challenges with lidR mapping functions, my proactive approach to debugging, including the development of a custom mapping solution, showcased my resilience and commitment to overcoming obstacles. In summary, my contributions significantly shaped the success and advancement of the broader lidar data analysis project.

6.2 Script which can be used to do all preprocessing

Load the file with the research areas

sf <- sf::read_sf(here::here("research_areas.shp"), quiet = TRUE)

Init the project

library(lfa)
sf::sf_use_s2(FALSE)
locations <- lfa_init("research_areas.shp")

Do all of the preprocessing steps

#lfa::lfa_map_tile_locations(locations, lfa::retile,check_flag = "retile")

# Intersect the point clouds with the research areas
lfa::lfa_map_tile_locations(locations, lfa::lfa_intersect_areas, ctg = NULL, areas_sf = sf, check_flag = "intersect")
# Apply the ground (height) correction
lfa::lfa_map_tile_locations(locations, lfa::lfa_ground_correction, ctg = NULL, check_flag = "z_correction")
# Segment individual trees
lfa::lfa_map_tile_locations(locations, lfa::lfa_segmentation, ctg = NULL, check_flag = "segmentation")
# Detect trees and write the results to file
lfa::lfa_map_tile_locations(locations, lfa::lfa_detection, catalog = NULL, write_to_file = TRUE, check_flag = "detection")

6.3 Canopy Height Models

The following section contains all canopy height models of the researched patches, plotted.

library(raster)

Loading required package: sp

chms <- lfa::lfa_visit_all_areas(lfa::lfa_chm)

[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/beech/bielefeld_brackwede/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/beech/billerbeck/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/beech/wuelfenrath/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/oak/hamm/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/oak/muenster/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/oak/rinkerode/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/pine/greffen/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/pine/mesum/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/pine/telgte/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/spruce/brilon/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/spruce/oberhundem/chm.tif"
[1] "Load /home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/spruce/osterwald/chm.tif"

chms$rinkerode[chms$rinkerode > 50] <- NA

chms$bielefeld_brackwede |> plot(main = "Canopy Height Model: Bielefeld Brackwede (Beech)")

Canopy Height Model of the patch Bielefeld Brackwede (Beech) colorized by the canopy height.

chms$billerbeck |> plot(main = "Canopy Height Model: Billerbeck (Beech)")

Canopy Height Model of the patch Billerbeck (Beech) colorized by the canopy height.

chms$wuelfenrath |> plot(main = "Canopy Height Model: Wülfenrath (Beech)")

Canopy Height Model of the patch Wülfenrath (Beech) colorized by the canopy height.

chms$hamm |> plot(main = "Canopy Height Model: Hamm (Oak)")

Canopy Height Model of the patch Hamm (Oak) colorized by the canopy height.

chms$muenster |> plot(main = "Canopy Height Model: Münster (Oak)")

Canopy Height Model of the patch Münster (Oak) colorized by the canopy height.

chms$rinkerode |> plot(main = "Canopy Height Model: Rinkerode (Oak)")

Canopy Height Model of the patch Rinkerode (Oak) colorized by the canopy height.

chms$greffen |> plot(main = "Canopy Height Model: Greffen (Pine)")

Canopy Height Model of the patch Greffen (Pine) colorized by the canopy height.

chms$mesum |> plot(main = "Canopy Height Model: Mesum (Pine)")

Canopy Height Model of the patch Mesum (Pine) colorized by the canopy height.

chms$telgte |> plot(main = "Canopy Height Model: Telgte (Pine)")

Canopy Height Model of the patch Telgte (Pine) colorized by the canopy height.

chms$brilon |> plot(main = "Canopy Height Model: Brilon (Spruce)")

Canopy Height Model of the patch Brilon (Spruce) colorized by the canopy height.

chms$oberhundem |> plot(main = "Canopy Height Model: Oberhundem (Spruce)")

Canopy Height Model of the patch Oberhundem (Spruce) colorized by the canopy height.

chms$osterwald |> plot(main = "Canopy Height Model: Osterwald (Spruce)")

Canopy Height Model of the patch Osterwald (Spruce) colorized by the canopy height.

6.4 Quantitative Results

6.4.1 Distribution of Z-Values

data <- lfa::lfa_get_detections()
value_column <- "Z"

Kullback-Leibler-Divergence

kld_results_specie <- lfa::lfa_run_test_asymmetric(data,value_column,"specie",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_specie,"Kullback-Leibler-Divergence between species")

Table 2: Kullback-Leibler-Divergence between the researched species Beech, Oak, Pine and Spruce for the attribute z-values
	Beech	Oak	Pine	Spruce
Beech	0.0	13.2	12.5	0.76
Oak	4.2	0.0	3.4	5.02
Pine	2.3	5.6	0.0	3.95
Spruce	2.4	14.7	16.1	0.00

colMeans(kld_results_specie, na.rm = TRUE) |> mean()

[1] 5.252696

specie <- data[data$specie=="beech",]
kld_results_beech <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_beech,"Kullback-Leibler-Divergence between areas with beech")

Table 3: Kullback-Leibler-Divergence between the researched areas with the dominant species beech for the attribute z-values
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0.00	0.4	3.1
Billerbeck	0.27	0.0	6.0
Wuelfenrath	1.13	2.4	0.0

colMeans(kld_results_beech, na.rm = TRUE) |> mean()

[1] 1.473353

specie <- data[data$specie=="oak",]
kld_results_oak <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_oak,"Kullback-Leibler-Divergence between areas with oak")

Table 4: Kullback-Leibler-Divergence between the researched areas with the dominant species oak for the attribute z-values
	Hamm	Muenster	Rinkerode
Hamm	0.0	2.1	16
Muenster	0.4	0.0	17
Rinkerode	7.6	17.8	0

colMeans(kld_results_oak, na.rm = TRUE) |> mean()

[1] 6.779863

specie <- data[data$specie=="pine",]
kld_results_pine <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_pine,"Kullback-Leibler-Divergence between areas with pine")

Table 5: Kullback-Leibler-Divergence between the researched areas with the dominant species pine for the attribute z-values
	Greffen	Mesum	Telgte
Greffen	0.00	0.74	16
Mesum	0.43	0.00	18
Telgte	3.87	6.82	0

colMeans(kld_results_pine, na.rm = TRUE) |> mean()

[1] 5.129383

specie <- data[data$specie=="spruce",]
kld_results_spruce <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_spruce,"Kullback-Leibler-Divergence between areas with spruce")

Table 6: Kullback-Leibler-Divergence between the researched areas with the dominant species spruce for the attribute z-values
	Brilon	Oberhundem	Osterwald
Brilon	0.000	0.092	1.7
Oberhundem	0.081	0.000	2.1
Osterwald	1.521	2.178	0.0

colMeans(kld_results_spruce, na.rm = TRUE) |> mean()

[1] 0.8509258

Jensen-Shannon Divergence

jsd_results_specie <- lfa::lfa_run_test_symmetric(data,value_column,"specie",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_specie,"Jensen-Shannon Divergence between species")

Table 7: Jensen-Shannon Divergence between the researched species Beech, Oak, Pine and Spruce for the attribute z-values
	Beech	Oak	Pine	Spruce
Beech	0	4.5	4.6	2.4
Oak	NA	0.0	3.9	6.1
Pine	NA	NA	0.0	7.1
Spruce	NA	NA	NA	0.0

colMeans(jsd_results_specie, na.rm = TRUE) |> mean()

[1] 2.246663

specie <- data[data$specie=="beech",]
jsd_results_beech <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_beech,"Jensen-Shannon Divergence between areas with beech")

Table 8: Jensen-Shannon Divergence between the researched areas with the dominant species beech for the attribute z-values
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0	1.1	3.3
Billerbeck	NA	0.0	4.9
Wuelfenrath	NA	NA	0.0

colMeans(jsd_results_beech, na.rm = TRUE) |> mean()

[1] 1.10555

specie <- data[data$specie=="oak",]
jsd_results_oak <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_oak,"Jensen-Shannon Divergence between areas with oak")

Table 9: Jensen-Shannon Divergence between the researched areas with the dominant species oak for the attribute z-values
	Hamm	Muenster	Rinkerode
Hamm	0	1.6	6.5
Muenster	NA	0.0	6.4
Rinkerode	NA	NA	0.0

colMeans(jsd_results_oak, na.rm = TRUE) |> mean()

[1] 1.692942

specie <- data[data$specie=="pine",]
jsd_results_pine <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_pine,"Jensen-Shannon Divergence between areas with pine")

Table 10: Jensen-Shannon Divergence between the researched areas with the dominant species pine for the attribute z-values
	Greffen	Mesum	Telgte
Greffen	0	3.1	12
Mesum	NA	0.0	10
Telgte	NA	NA	0

colMeans(jsd_results_pine, na.rm = TRUE) |> mean()

[1] 2.956354

specie <- data[data$specie=="spruce",]
jsd_results_spruce <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_spruce,"Jensen-Shannon Divergence between areas with spruce")

Table 11: Jensen-Shannon Divergence between the researched areas with the dominant species spruce for the attribute z-values
	Brilon	Oberhundem	Osterwald
Brilon	0	0.31	4.0
Oberhundem	NA	0.00	5.5
Osterwald	NA	NA	0.0

colMeans(jsd_results_spruce, na.rm = TRUE) |> mean()

[1] 1.100383

6.4.2 Nearest Neighbours

Distribution of nearest neighbor distances

data <- lfa::lfa_combine_sf_obj(lfa::lfa_get_neighbor_paths(),lfa::lfa_get_all_areas())

Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/beech/bielefeld_brackwede/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 1443 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 466999.8 ymin: 5759839 xmax: 467617.1 ymax: 5760261
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/beech/billerbeck/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 1732 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 384890.8 ymin: 5761918 xmax: 385590.9 ymax: 5762478
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/beech/wuelfenrath/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 2779 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 365546.3 ymin: 5683711 xmax: 366356.1 ymax: 5684321
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/oak/hamm/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 2441 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 420953.3 ymin: 5723884 xmax: 421596 ymax: 5724609
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/oak/muenster/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 1270 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 404615.6 ymin: 5752535 xmax: 405396.8 ymax: 5752971
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/oak/rinkerode/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 1643 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 408428.2 ymin: 5746021 xmax: 409014.8 ymax: 5746511
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/pine/greffen/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 513 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 442816.1 ymin: 5760217 xmax: 443148.9 ymax: 5760567
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/pine/mesum/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 5031 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 399930.6 ymin: 5790412 xmax: 400969.7 ymax: 5790950
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/pine/telgte/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 3368 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 416135.1 ymin: 5761663 xmax: 416697.1 ymax: 5762477
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/spruce/brilon/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 3342 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 467305.7 ymin: 5695055 xmax: 467996.9 ymax: 5695593
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/spruce/oberhundem/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 2471 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 442631.7 ymin: 5660096 xmax: 443309.5 ymax: 5660502
Projected CRS: ETRS89 / UTM zone 32N
Reading layer `neighbours' from data source 
  `/home/jakob/gi-master/project-courses/lidar-forest-analysis/src/data/spruce/osterwald/neighbours.gpkg' 
  using driver `GPKG'
Simple feature collection with 2806 features and 102 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 455822 ymin: 5673761 xmax: 456483.2 ymax: 5674162
Projected CRS: ETRS89 / UTM zone 32N

value_column <- "Neighbor_1"
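For orientation, the sketch below shows one way a first-nearest-neighbour distance could be derived from tree-top coordinates in base R. It assumes that the column Neighbor_1 holds the distance from each detected tree to its closest neighbour, which is inferred from the column name only. The helper nearest_neighbor_dist and the coordinates are hypothetical and are not taken from the lfa package.

```r
# Hypothetical helper: distance from each point to its closest other point
nearest_neighbor_dist <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf  # a point is not its own neighbour
  unname(apply(d, 1, min))
}

# Made-up tree-top coordinates in metres
tops <- data.frame(x = c(0, 3, 3, 10), y = c(0, 0, 4, 0))
tops$nn1 <- nearest_neighbor_dist(tops$x, tops$y)
print(tops)
```

The full pairwise distance matrix grows quadratically with the number of trees, so a spatial index would be the more practical choice for patches with several thousand detections.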

Kullback-Leibler-Divergence

kld_results_specie <- lfa::lfa_run_test_asymmetric(data,value_column,"specie",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_specie,"Kullback-Leibler-Divergence between species")

Table 12: Kullback-Leibler-Divergence between the researched species Beech, Oak, Pine and Spruce for the attribute nearest-neighbor-1
	Beech	Oak	Pine	Spruce
Beech	0.000	0.029	0.40	3.3
Oak	0.031	0.000	0.25	3.9
Pine	0.213	0.128	0.00	4.9
Spruce	2.735	3.199	4.52	0.0

colMeans(kld_results_specie, na.rm = TRUE) |> mean()

[1] 1.477983

specie <- data[data$specie=="beech",]
kld_results_beech <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_beech,"Kullback-Leibler-Divergence between areas with beech")

Table 13: Kullback-Leibler-Divergence between the researched areas with the dominant species beech for the attribute nearest-neighbor-1
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0.000	0.35	0.051
Billerbeck	0.380	0.00	0.138
Wuelfenrath	0.059	0.15	0.000

colMeans(kld_results_beech, na.rm = TRUE) |> mean()

[1] 0.1249588

specie <- data[data$specie=="oak",]
kld_results_oak <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_oak,"Kullback-Leibler-Divergence between areas with oak")

Table 14: Kullback-Leibler-Divergence between the researched areas with the dominant species oak for the attribute nearest-neighbor-1
	Hamm	Muenster	Rinkerode
Hamm	0.000	0.079	0.078
Muenster	0.092	0.000	0.019
Rinkerode	0.086	0.020	0.000

colMeans(kld_results_oak, na.rm = TRUE) |> mean()

[1] 0.04167636
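The layout of these tables, one row and one column per group, can be reproduced with a small generic helper. The sketch below is an assumption about the general pattern, not the implementation of `lfa::lfa_run_test_asymmetric`; in particular, which of row and column plays the role of the reference distribution in the package is not stated here. The helper name `pairwise_matrix` and the toy comparison function are hypothetical.

```r
# Hypothetical helper: apply f(a, b) to the values of every ordered pair of groups
pairwise_matrix <- function(values, groups, f) {
  lv <- sort(unique(as.character(groups)))
  out <- matrix(0, nrow = length(lv), ncol = length(lv),
                dimnames = list(lv, lv))
  for (i in lv) {
    for (j in lv) {
      # Diagonal stays zero; off-diagonal entries depend on the order (i, j)
      if (i != j) out[i, j] <- f(values[groups == i], values[groups == j])
    }
  }
  out
}

# Toy data with two groups and a deliberately asymmetric comparison function
values <- c(1, 2, 3, 10, 11, 12)
groups <- c("a", "a", "a", "b", "b", "b")
res <- pairwise_matrix(values, groups, function(a, b) mean(a) - mean(b))
```

Any function of two numeric vectors can be passed as `f`, for example a divergence estimate.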

Code

specie <- data[data$specie=="pine",]
kld_results_pine <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_pine,"Kullback-Leibler-Divergence between areas with pine")

Table 15: Kullback-Leibler-Divergence between the studied areas with the dominant species pine for the attribute nearest-neighbor-1
	Greffen	Mesum	Telgte
Greffen	0.00	0.495	0.258
Mesum	0.48	0.000	0.098
Telgte	0.22	0.076	0.000

colMeans(kld_results_pine, na.rm = TRUE) |> mean()

[1] 0.1812239

Code

specie <- data[data$specie=="spruce",]
kld_results_spruce <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_spruce,"Kullback-Leibler-Divergence between areas with spruce")

Table 16: Kullback-Leibler-Divergence between the studied areas with the dominant species spruce for the attribute nearest-neighbor-1
	Brilon	Oberhundem	Osterwald
Brilon	0.00	0.67	5.1
Oberhundem	0.41	0.00	7.2
Osterwald	6.09	6.23	0.0

colMeans(kld_results_spruce, na.rm = TRUE) |> mean()

[1] 2.863587

Jensen-Shannon Divergence

Code

jsd_results_specie <- lfa::lfa_run_test_symmetric(data,value_column,"specie",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_specie,"Jensen-Shannon Divergence between species")

Table 17: Jensen-Shannon Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute nearest-neighbor-1
	Beech	Oak	Pine	Spruce
Beech	0	0.22	2.1	9.3
Oak	NA	0.00	1.3	10.6
Pine	NA	NA	0.0	14.7
Spruce	NA	NA	NA	0.0

colMeans(jsd_results_specie, na.rm = TRUE) |> mean()

[1] 2.470051
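For orientation, the textbook Jensen-Shannon divergence of two discrete probability vectors can be sketched as follows. The function name `jsd_from_probs` is hypothetical and the sketch assumes base-2 logarithms. In this form the value lies between 0 and 1; several entries in the Jensen-Shannon tables of this section exceed 1, so `lfa::lfa_jsd_from_vec` presumably uses a different scaling or estimator.

```r
# Hypothetical helper: textbook Jensen-Shannon divergence (base-2 logarithm)
jsd_from_probs <- function(p, q) {
  m <- (p + q) / 2
  # Terms with a == 0 contribute nothing to the Kullback-Leibler sum
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

p <- c(0.5, 0.5, 0.0)
q <- c(0.0, 0.5, 0.5)

jsd_pq <- jsd_from_probs(p, q)                      # partial overlap
jsd_qp <- jsd_from_probs(q, p)                      # same value: symmetric
jsd_disjoint <- jsd_from_probs(c(1, 0), c(0, 1))    # no overlap: upper bound
jsd_same <- jsd_from_probs(p, p)                    # identical: zero
```

The symmetry is the reason the tables only fill the upper triangle and leave the lower triangle as `NA`.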

Code

specie <- data[data$specie=="beech",]
jsd_results_beech <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_beech,"Jensen-Shannon Divergence between areas with beech")

Table 18: Jensen-Shannon Divergence between the studied areas with the dominant species beech for the attribute nearest-neighbor-1
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0	2.2	0.39
Billerbeck	NA	0.0	0.85
Wuelfenrath	NA	NA	0.00

colMeans(jsd_results_beech, na.rm = TRUE) |> mean()

[1] 0.5042359

Code

specie <- data[data$specie=="oak",]
jsd_results_oak <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_oak,"Jensen-Shannon Divergence between areas with oak")

Table 19: Jensen-Shannon Divergence between the studied areas with the dominant species oak for the attribute nearest-neighbor-1
	Hamm	Muenster	Rinkerode
Hamm	0	0.57	0.61
Muenster	NA	0.00	0.17
Rinkerode	NA	NA	0.00

colMeans(jsd_results_oak, na.rm = TRUE) |> mean()

[1] 0.1803836

Code

specie <- data[data$specie=="pine",]
jsd_results_pine <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_pine,"Jensen-Shannon Divergence between areas with pine")

Table 20: Jensen-Shannon Divergence between the studied areas with the dominant species pine for the attribute nearest-neighbor-1
	Greffen	Mesum	Telgte
Greffen	0	3.6	1.89
Mesum	NA	0.0	0.68
Telgte	NA	NA	0.00

colMeans(jsd_results_pine, na.rm = TRUE) |> mean()

[1] 0.891592

Code

specie <- data[data$specie=="spruce",]
jsd_results_spruce <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_spruce,"Jensen-Shannon Divergence between areas with spruce")

Table 21: Jensen-Shannon Divergence between the studied areas with the dominant species spruce for the attribute nearest-neighbor-1
	Brilon	Oberhundem	Osterwald
Brilon	0	4.1	16
Oberhundem	NA	0.0	18
Osterwald	NA	NA	0

colMeans(jsd_results_spruce, na.rm = TRUE) |> mean()

[1] 4.471632

Distribution of distances to 100th nearest neighbor

Code

data <- lfa::lfa_combine_sf_obj(lfa::lfa_get_neighbor_paths(),lfa::lfa_get_all_areas())

Code

value_column <- "Neighbor_100"

Kullback-Leibler-Divergence

Code

kld_results_specie <- lfa::lfa_run_test_asymmetric(data,value_column,"specie",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_specie,"Kullback-Leibler-Divergence between species")

Table 22: Kullback-Leibler-Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute nearest-neighbor-100
	Beech	Oak	Pine	Spruce
Beech	0.000	0.194	0.082	0.89
Oak	0.183	0.000	0.063	0.67
Pine	0.084	0.069	0.000	0.86
Spruce	1.083	0.809	1.200	0.00

colMeans(kld_results_specie, na.rm = TRUE) |> mean()

[1] 0.3862841

Code

specie <- data[data$specie=="beech",]
kld_results_beech <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_beech,"Kullback-Leibler-Divergence between areas with beech")

Table 23: Kullback-Leibler-Divergence between the studied areas with the dominant species beech for the attribute nearest-neighbor-100
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0.00	0.12	0.12
Billerbeck	0.14	0.00	0.40
Wuelfenrath	0.12	0.31	0.00

colMeans(kld_results_beech, na.rm = TRUE) |> mean()

[1] 0.1338066

Code

specie <- data[data$specie=="oak",]
kld_results_oak <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_oak,"Kullback-Leibler-Divergence between areas with oak")

Table 24: Kullback-Leibler-Divergence between the studied areas with the dominant species oak for the attribute nearest-neighbor-100
	Hamm	Muenster	Rinkerode
Hamm	0.00	0.19	0.11
Muenster	0.20	0.00	0.06
Rinkerode	0.11	0.07	0.00

colMeans(kld_results_oak, na.rm = TRUE) |> mean()

[1] 0.08182597

Code

specie <- data[data$specie=="pine",]
kld_results_pine <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_pine,"Kullback-Leibler-Divergence between areas with pine")

Table 25: Kullback-Leibler-Divergence between the studied areas with the dominant species pine for the attribute nearest-neighbor-100
	Greffen	Mesum	Telgte
Greffen	0.00	0.25	0.51
Mesum	0.20	0.00	0.25
Telgte	0.54	0.26	0.00

colMeans(kld_results_pine, na.rm = TRUE) |> mean()

[1] 0.22229

Code

specie <- data[data$specie=="spruce",]
kld_results_spruce <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_spruce,"Kullback-Leibler-Divergence between areas with spruce")

Table 26: Kullback-Leibler-Divergence between the studied areas with the dominant species spruce for the attribute nearest-neighbor-100
	Brilon	Oberhundem	Osterwald
Brilon	0.000	0.05	0.23
Oberhundem	0.046	0.00	0.37
Osterwald	0.276	0.46	0.00

colMeans(kld_results_spruce, na.rm = TRUE) |> mean()

[1] 0.1591879

Jensen-Shannon Divergence

Code

jsd_results_specie <- lfa::lfa_run_test_symmetric(data,value_column,"specie",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_specie,"Jensen-Shannon Divergence between species")

Table 27: Jensen-Shannon Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute nearest-neighbor-100
	Beech	Oak	Pine	Spruce
Beech	0	0.38	0.14	1.27
Oak	NA	0.00	0.30	0.78
Pine	NA	NA	0.00	1.39
Spruce	NA	NA	NA	0.00

colMeans(jsd_results_specie, na.rm = TRUE) |> mean()

[1] 0.2997233

Code

specie <- data[data$specie=="beech",]
jsd_results_beech <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_beech,"Jensen-Shannon Divergence between areas with beech")

Table 28: Jensen-Shannon Divergence between the studied areas with the dominant species beech for the attribute nearest-neighbor-100
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0	0.22	0.21
Billerbeck	NA	0.00	0.57
Wuelfenrath	NA	NA	0.00

colMeans(jsd_results_beech, na.rm = TRUE) |> mean()

[1] 0.124106

Code

specie <- data[data$specie=="oak",]
jsd_results_oak <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_oak,"Jensen-Shannon Divergence between areas with oak")

Table 29: Jensen-Shannon Divergence between the studied areas with the dominant species oak for the attribute nearest-neighbor-100
	Hamm	Muenster	Rinkerode
Hamm	0	0.34	0.17
Muenster	NA	0.00	0.23
Rinkerode	NA	NA	0.00

colMeans(jsd_results_oak, na.rm = TRUE) |> mean()

[1] 0.1007612

Code

specie <- data[data$specie=="pine",]
jsd_results_pine <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_pine,"Jensen-Shannon Divergence between areas with pine")

Table 30: Jensen-Shannon Divergence between the studied areas with the dominant species pine for the attribute nearest-neighbor-100
	Greffen	Mesum	Telgte
Greffen	0	0.45	0.86
Mesum	NA	0.00	0.50
Telgte	NA	NA	0.00

colMeans(jsd_results_pine, na.rm = TRUE) |> mean()

[1] 0.2265055

Code

specie <- data[data$specie=="spruce",]
jsd_results_spruce <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_spruce,"Jensen-Shannon Divergence between areas with spruce")

Table 31: Jensen-Shannon Divergence between the studied areas with the dominant species spruce for the attribute nearest-neighbor-100
	Brilon	Oberhundem	Osterwald
Brilon	0	0.1	0.57
Oberhundem	NA	0.0	0.73
Osterwald	NA	NA	0.00

colMeans(jsd_results_spruce, na.rm = TRUE) |> mean()

[1] 0.1613747

Distribution of average nearest neighbor distances

Code

data <- lfa::lfa_combine_sf_obj(lfa::lfa_get_neighbor_paths(),lfa::lfa_get_all_areas())

Code

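# Column names of the distances to the 1st through 100th nearest neighbor
neighbor_cols <- paste0("Neighbor_", 1:100)
# Row-wise mean over all 100 neighbor distances
data$avg <- rowMeans(dplyr::select(as.data.frame(data), dplyr::all_of(neighbor_cols)))
value_column <- "avg"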
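The averaging step can be checked on a small toy table. The sketch below uses base R only and three instead of one hundred neighbor columns; the data frame `toy` and its values are made up for illustration.

```r
# Toy table with three neighbor-distance columns and one unrelated column
toy <- data.frame(
  Neighbor_1 = c(1.0, 2.0),
  Neighbor_2 = c(3.0, 4.0),
  Neighbor_3 = c(5.0, 9.0),
  other = c("x", "y")
)

# Select only the distance columns by name, then average per row (per tree)
toy_cols <- paste0("Neighbor_", 1:3)
toy$avg <- rowMeans(toy[, toy_cols])
```

Each row then carries one average distance, which is the quantity compared between species and areas in the tables that follow.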

Kullback-Leibler-Divergence

Code

kld_results_specie <- lfa::lfa_run_test_asymmetric(data,value_column,"specie",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_specie,"Kullback-Leibler-Divergence between species")

Table 32: Kullback-Leibler-Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute nearest-neighbor-avg
	Beech	Oak	Pine	Spruce
Beech	0.000	0.31	0.065	1.28
Oak	0.302	0.00	0.178	0.83
Pine	0.067	0.17	0.000	1.23
Spruce	1.660	0.92	1.869	0.00

colMeans(kld_results_specie, na.rm = TRUE) |> mean()

[1] 0.5552882

Code

specie <- data[data$specie=="beech",]
kld_results_beech <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_beech,"Kullback-Leibler-Divergence between areas with beech")

Table 33: Kullback-Leibler-Divergence between the studied areas with the dominant species beech for the attribute nearest-neighbor-avg
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0.000	0.052	0.50
Billerbeck	0.052	0.000	0.91
Wuelfenrath	0.348	0.612	0.00

colMeans(kld_results_beech, na.rm = TRUE) |> mean()

[1] 0.27574

Code

specie <- data[data$specie=="oak",]
kld_results_oak <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_oak,"Kullback-Leibler-Divergence between areas with oak")

Table 34: Kullback-Leibler-Divergence between the studied areas with the dominant species oak for the attribute nearest-neighbor-avg
	Hamm	Muenster	Rinkerode
Hamm	0.00	0.166	0.217
Muenster	0.16	0.000	0.031
Rinkerode	0.21	0.037	0.000

colMeans(kld_results_oak, na.rm = TRUE) |> mean()

[1] 0.09154318

Code

specie <- data[data$specie=="pine",]
kld_results_pine <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_pine,"Kullback-Leibler-Divergence between areas with pine")

Table 35: Kullback-Leibler-Divergence between the studied areas with the dominant species pine for the attribute nearest-neighbor-avg
	Greffen	Mesum	Telgte
Greffen	0.00	0.17	0.29
Mesum	0.14	0.00	0.30
Telgte	0.26	0.32	0.00

colMeans(kld_results_pine, na.rm = TRUE) |> mean()

[1] 0.1637513

Code

specie <- data[data$specie=="spruce",]
kld_results_spruce <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_spruce,"Kullback-Leibler-Divergence between areas with spruce")

Table 36: Kullback-Leibler-Divergence between the studied areas with the dominant species spruce for the attribute nearest-neighbor-avg
	Brilon	Oberhundem	Osterwald
Brilon	0.000	0.11	0.29
Oberhundem	0.097	0.00	0.59
Osterwald	0.341	0.75	0.00

colMeans(kld_results_spruce, na.rm = TRUE) |> mean()

[1] 0.2404004

Jensen-Shannon Divergence

Code

jsd_results_specie <- lfa::lfa_run_test_symmetric(data,value_column,"specie",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_specie,"Jensen-Shannon Divergence between species")

Table 37: Jensen-Shannon Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute nearest-neighbor-avg
	Beech	Oak	Pine	Spruce
Beech	0	0.73	0.19	2.6
Oak	NA	0.00	0.64	1.4
Pine	NA	NA	0.00	3.0
Spruce	NA	NA	NA	0.0

colMeans(jsd_results_specie, na.rm = TRUE) |> mean()

[1] 0.5999417

Code

specie <- data[data$specie=="beech",]
jsd_results_beech <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_beech,"Jensen-Shannon Divergence between areas with beech")

Table 38: Jensen-Shannon Divergence between the studied areas with the dominant species beech for the attribute nearest-neighbor-avg
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0	0.14	1.0
Billerbeck	NA	0.00	1.7
Wuelfenrath	NA	NA	0.0

colMeans(jsd_results_beech, na.rm = TRUE) |> mean()

[1] 0.3215991

Code

specie <- data[data$specie=="oak",]
jsd_results_oak <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_oak,"Jensen-Shannon Divergence between areas with oak")

Table 39: Jensen-Shannon Divergence between the studied areas with the dominant species oak for the attribute nearest-neighbor-avg
	Hamm	Muenster	Rinkerode
Hamm	0	0.41	0.53
Muenster	NA	0.00	0.26
Rinkerode	NA	NA	0.00

colMeans(jsd_results_oak, na.rm = TRUE) |> mean()

[1] 0.1558436

Code

specie <- data[data$specie=="pine",]
jsd_results_pine <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_pine,"Jensen-Shannon Divergence between areas with pine")

Table 40: Jensen-Shannon Divergence between the studied areas with the dominant species pine for the attribute nearest-neighbor-avg
	Greffen	Mesum	Telgte
Greffen	0	0.44	0.76
Mesum	NA	0.00	0.89
Telgte	NA	NA	0.00

colMeans(jsd_results_pine, na.rm = TRUE) |> mean()

[1] 0.2560143

Code

specie <- data[data$specie=="spruce",]
jsd_results_spruce <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_spruce,"Jensen-Shannon Divergence between areas with spruce")

Table 41: Jensen-Shannon Divergence between the studied areas with the dominant species spruce for the attribute nearest-neighbor-avg
	Brilon	Oberhundem	Osterwald
Brilon	0	0.32	1.1
Oberhundem	NA	0.00	1.8
Osterwald	NA	NA	0.0

colMeans(jsd_results_spruce, na.rm = TRUE) |> mean()

[1] 0.3713411

6.4.3 Distribution of the number of returns

Code

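# Read the per-tree properties and the combined neighbour layer
data <- sf::st_read("data/tree_properties.gpkg")
neighbors <- lfa::lfa_get_neighbor_paths() |> lfa::lfa_combine_sf_obj(lfa::lfa_get_all_areas())
# Spatial join: attach attributes of `neighbors` to features of `data` that lie within them
data <- sf::st_join(data, neighbors, join = sf::st_within)
value_column <- "number_of_returns"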

Kullback-Leibler-Divergence

Code

kld_results_specie <- lfa::lfa_run_test_asymmetric(data,value_column,"specie",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_specie,"Kullback-Leibler-Divergence between species")

Table 42: Kullback-Leibler-Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute number-of-returns
	Beech	Oak	Pine	Spruce
Beech	0.000	0.083	0.57	0.049
Oak	0.051	0.000	0.84	0.059
Pine	0.432	0.833	0.00	0.526
Spruce	0.036	0.059	0.54	0.000

colMeans(kld_results_specie, na.rm = TRUE) |> mean()

[1] 0.2550987

Code

specie <- data[data$specie=="beech",]
kld_results_beech <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_beech,"Kullback-Leibler-Divergence between areas with beech")

Table 43: Kullback-Leibler-Divergence between the studied areas with the dominant species beech for the attribute number-of-returns
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0.00	0.15	0.082
Billerbeck	0.21	0.00	0.136
Wuelfenrath	0.13	0.19	0.000

colMeans(kld_results_beech, na.rm = TRUE) |> mean()

[1] 0.09985223

Code

specie <- data[data$specie=="oak",]
kld_results_oak <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_oak,"Kullback-Leibler-Divergence between areas with oak")

Table 44: Kullback-Leibler-Divergence between the studied areas with the dominant species oak for the attribute number-of-returns
	Hamm	Muenster	Rinkerode
Hamm	0.00	0.46	0.846
Muenster	0.41	0.00	0.077
Rinkerode	0.81	0.09	0.000

colMeans(kld_results_oak, na.rm = TRUE) |> mean()

[1] 0.2994815

Code

specie <- data[data$specie=="pine",]
kld_results_pine <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_pine,"Kullback-Leibler-Divergence between areas with pine")

Table 45: Kullback-Leibler-Divergence between the studied areas with the dominant species pine for the attribute number-of-returns
	Greffen	Mesum	Telgte
Greffen	0.00	0.1444	0.1773
Mesum	0.14	0.0000	0.0047
Telgte	0.16	0.0045	0.0000

colMeans(kld_results_pine, na.rm = TRUE) |> mean()

[1] 0.07005788

Code

specie <- data[data$specie=="spruce",]
kld_results_spruce <- lfa::lfa_run_test_asymmetric(specie,value_column,"area",lfa::lfa_kld_from_vec)
lfa::lfa_generate_result_table_tests(kld_results_spruce,"Kullback-Leibler-Divergence between areas with spruce")

Table 46: Kullback-Leibler-Divergence between the studied areas with the dominant species spruce for the attribute number-of-returns
	Brilon	Oberhundem	Osterwald
Brilon	0.000	0.04	0.034
Oberhundem	0.041	0.00	0.079
Osterwald	0.045	0.10	0.000

colMeans(kld_results_spruce, na.rm = TRUE) |> mean()

[1] 0.03779495

Jensen-Shannon Divergence

Code

jsd_results_specie <- lfa::lfa_run_test_symmetric(data,value_column,"specie",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_specie,"Jensen-Shannon Divergence between species")

Table 47: Jensen-Shannon Divergence between the studied species Beech, Oak, Pine and Spruce for the attribute number-of-returns
	Beech	Oak	Pine	Spruce
Beech	0	3e-04	0.019	0.0014
Oak	NA	0e+00	0.021	0.0016
Pine	NA	NA	0.000	0.0143
Spruce	NA	NA	NA	0.0000

colMeans(jsd_results_specie, na.rm = TRUE) |> mean()

[1] 0.004419638

Code

specie <- data[data$specie=="beech",]
jsd_results_beech <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_beech,"Jensen-Shannon Divergence between areas with beech")

Table 48: Jensen-Shannon Divergence between the studied areas with the dominant species beech for the attribute number-of-returns
	Bielefeld_brackwede	Billerbeck	Wuelfenrath
Bielefeld_brackwede	0	0.0035	0.00099
Billerbeck	NA	0.0000	0.00554
Wuelfenrath	NA	NA	0.00000

colMeans(jsd_results_beech, na.rm = TRUE) |> mean()

[1] 0.001314268

Code

specie <- data[data$specie=="oak",]
jsd_results_oak <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_oak,"Jensen-Shannon Divergence between areas with oak")

Table 49: Jensen-Shannon Divergence between the studied areas with the dominant species oak for the attribute number-of-returns
	Hamm	Muenster	Rinkerode
Hamm	0	0.0068	0.0128
Muenster	NA	0.0000	0.0017
Rinkerode	NA	NA	0.0000

colMeans(jsd_results_oak, na.rm = TRUE) |> mean()

[1] 0.002747351

Code

specie <- data[data$specie=="pine",]
jsd_results_pine <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_pine,"Jensen-Shannon Divergence between areas with pine")

Table 50: Jensen-Shannon Divergence between the studied areas with the dominant species pine for the attribute number-of-returns
	Greffen	Mesum	Telgte
Greffen	0	0.0035	0.00458
Mesum	NA	0.0000	0.00037
Telgte	NA	NA	0.00000

colMeans(jsd_results_pine, na.rm = TRUE) |> mean()

[1] 0.001130537

Code

specie <- data[data$specie=="spruce",]
jsd_results_spruce <- lfa::lfa_run_test_symmetric(specie,value_column,"area",lfa::lfa_jsd_from_vec)
lfa::lfa_generate_result_table_tests(jsd_results_spruce,"Jensen-Shannon Divergence between areas with spruce")

Table 51: Jensen-Shannon Divergence between the studied areas with the dominant species spruce for the attribute number-of-returns
	Brilon	Oberhundem	Osterwald
Brilon	0	0.0069	0.005
Oberhundem	NA	0.0000	0.002
Osterwald	NA	NA	0.000

colMeans(jsd_results_spruce, na.rm = TRUE) |> mean()

[1] 0.001939104

6.5 Documentation

6.5.1 `lfa_calculate_patch_density`

Calculate patch density for specified areas based on detection data

Arguments

Argument	Description
`areas_location`	The file path to a shapefile containing spatial polygons representing the areas for which patch density needs to be calculated. Default is “research_areas.shp”.
`detections`	A data frame containing detection information, where each row represents a detection and includes the ‘area’ column specifying the corresponding area. Default is obtained using lfa_get_detections().

Description

This function calculates patch density for specified areas using detection data. It reads the spatial polygons from a shapefile, computes the area size for each patch, counts the number of detections in each patch, and calculates the patch density.

Value

A data frame with patch density information for each specified area. Columns include ‘name’ (area name), ‘geometry’ (polygon geometry), ‘area_size’ (patch area size), ‘detections’ (number of detections in the patch), and ‘density’ (computed patch density).

Examples

# Assuming you have a shapefile 'your_research_areas.shp' and detection data
# from lfa_get_detections()
density_data <- lfa_calculate_patch_density(areas_location = "your_research_areas.shp")
print(density_data)

Usage

lfa_calculate_patch_density(
  areas_location = "research_areas.shp",
  detections = lfa::lfa_get_detections()
)
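
The density computation described above boils down to a count per area divided by the area size. A minimal base R sketch follows; it assumes the area sizes are already known (in the package they are derived from the polygons in the shapefile), and the data frames areas and detections are hypothetical.

```r
# Hypothetical inputs: one row per area with its size, one row per detection
areas <- data.frame(name = c("Area1", "Area2"), area_size = c(1000, 2500))
detections <- data.frame(area = c("Area1", "Area1", "Area2", "Area1", "Area2"))

# Count detections per area; using the area names as factor levels keeps the
# order aligned and gives areas without detections a count of zero
areas$detections <- as.integer(table(factor(detections$area, levels = areas$name)))

# Density = number of detections per unit of area
areas$density <- areas$detections / areas$area_size
print(areas)
```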

6.5.2 `lfa_calculate_rf_metrics`

Calculate Precision and Recall Metrics for Random Forest Classification

Arguments

Argument	Description
`conf_matrix`	Confusion matrix obtained from a Random Forest classification.

Description

This function calculates precision and recall metrics for each class based on the provided confusion matrix from a Random Forest classification.

Details

The function calculates precision and recall metrics for each class based on the confusion matrix obtained from a Random Forest classification.

Value

A data frame containing precision and recall metrics for each class.

Examples

# Example confusion matrix from a Random Forest classification
rf_cm <- table(predicted = c("A", "B", "A", "B"), actual = c("A", "A", "B", "B"))
# Calculate precision and recall metrics
rf_metrics_df <- lfa_calculate_rf_metrics(rf_cm)

Usage

lfa_calculate_rf_metrics(conf_matrix)
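
For reference, per-class precision and recall can be derived from a confusion matrix as sketched below. This is an illustration, not the package implementation; it assumes that rows hold the predicted classes and columns the actual classes, and the helper name rf_metrics_sketch is hypothetical.

```r
# Per-class precision and recall from a square confusion matrix.
# Assumption: rows = predicted class, columns = actual class.
rf_metrics_sketch <- function(conf_matrix) {
  cm <- unclass(conf_matrix)          # plain matrix, also if a table is passed
  true_positives <- diag(cm)
  data.frame(
    class = rownames(cm),
    precision = unname(true_positives / rowSums(cm)),  # TP / all predicted as class
    recall = unname(true_positives / colSums(cm)),     # TP / all actually in class
    row.names = NULL
  )
}

rf_cm <- table(predicted = c("A", "B", "A", "B"), actual = c("A", "A", "B", "B"))
metrics <- rf_metrics_sketch(rf_cm)
print(metrics)
```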

6.5.3 `lfa_capitalize_first_char`

Capitalize First Character of a String

Arguments

Argument	Description
`input_string`	A single character string (a character vector of length one) to be processed.

Concept

String Manipulation

Description

This function takes a string as input and returns the same string with the first character capitalized. If the first character is already capitalized, the function does nothing. If the first character is not from the alphabet, an error is thrown.

Details

This function performs the following steps:

Checks if the input is a single character string (a character vector of length one).
Verifies if the first character is from the alphabet (A-Z or a-z).
If the first character is not already capitalized, it capitalizes it.
Returns the modified string.

Keyword

alphabet

Note

This function is case-sensitive and assumes ASCII characters.

References

None

Value

A modified string with the first character capitalized if it is not already. If the first character is already capitalized, the original string is returned.

Examples

# Capitalize the first character of a string
lfa_capitalize_first_char("hello") # Returns "Hello"
lfa_capitalize_first_char("World") # Returns "World"

# Error example (non-alphabetic first character)
lfa_capitalize_first_char("123abc") # Throws an error

Usage

lfa_capitalize_first_char(input_string)
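
The steps listed under Details can be sketched in base R as follows. This is an illustrative reimplementation under the documented behaviour, not the package source; the name capitalize_first_char_sketch is hypothetical.

```r
capitalize_first_char_sketch <- function(input_string) {
  # Step 1: the input must be one character string
  if (!is.character(input_string) || length(input_string) != 1) {
    stop("input_string must be a single character string")
  }
  # Step 2: the first character must be an ASCII letter
  first <- substr(input_string, 1, 1)
  if (!grepl("^[A-Za-z]$", first)) {
    stop("the first character must be a letter (A-Z or a-z)")
  }
  # Steps 3 and 4: upper-case the first character and return the result
  paste0(toupper(first), substr(input_string, 2, nchar(input_string)))
}

capitalize_first_char_sketch("hello")
capitalize_first_char_sketch("World")
```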

6.5.4 `lfa_check_flag`

Check if a flag is set, indicating the completion of a specific process.

Arguments

Argument	Description
`flag_name`	A character string specifying the name of the flag file. It should be a descriptive and unique identifier for the process being checked.

Description

This function checks for the existence of a hidden flag file at a specified location within the working directory. If the flag file is found, a message is printed, and the function returns TRUE to indicate that the associated processing step has already been completed. If the flag file is not found, the function returns FALSE, indicating that further processing can proceed.

Value

A logical value indicating whether the flag is set (TRUE) or not (FALSE).

Examples

# Check if the flag for a process named "data_processing" is set
lfa_check_flag("data_processing")

Usage

lfa_check_flag(flag_name)
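
The flag mechanism can be illustrated with plain files. The sketch below is an assumption about how such a mechanism might look, not the package code: the location of the flag file, the leading dot in its name, and all three helper names are hypothetical.

```r
# Path of the hidden flag file for a given process (assumed naming scheme)
flag_path_sketch <- function(flag_name, dir = getwd()) {
  file.path(dir, paste0(".", flag_name))
}

# TRUE if the flag file exists, i.e. the step was already completed
check_flag_sketch <- function(flag_name, dir = getwd()) {
  found <- file.exists(flag_path_sketch(flag_name, dir))
  if (found) message("Flag '", flag_name, "' is set; step already completed.")
  found
}

# Mark a step as completed by creating the (empty) flag file
set_flag_sketch <- function(flag_name, dir = getwd()) {
  invisible(file.create(flag_path_sketch(flag_name, dir)))
}

# Demonstration in a temporary directory
work_dir <- tempfile("lfa_flags_")
dir.create(work_dir)
before <- check_flag_sketch("data_processing", work_dir)
set_flag_sketch("data_processing", work_dir)
after <- check_flag_sketch("data_processing", work_dir)
```

A long-running step would typically be wrapped as "if the flag is not set, do the work, then set the flag", which makes re-running a pipeline cheap.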

6.5.5 `lfa_chm`

Create Canopy Height Model (CHM) from Lidar Data

Arguments

Argument	Description
`specie`	Character string indicating the species name.
`area`	Character string indicating the specific area or location.
`res`	Numeric value indicating the spatial resolution of the CHM. Default is 0.5.
`save_to_file`	Logical. If TRUE, the generated CHM will be saved to a GeoTIFF file. Default is TRUE.
`overwrite`	Logical. If TRUE, existing CHM file will be overwritten. Default is FALSE.
`...`	Additional arguments to be passed to the underlying functions, such as lidR::catalog_map.

Description

This function generates a Canopy Height Model (CHM) from Lidar data using the lidR package.

Details

The behavior of the function with different input parameters is as follows:

When a CHM file already exists at the specified path and overwrite is FALSE, the function loads the existing CHM and returns it.
If the CHM file does not exist or overwrite is TRUE, the function processes Lidar data using lfa_rasterize_chunk and creates a CHM.
The spatial resolution of the CHM can be controlled with the res parameter.
If save_to_file is TRUE, the generated CHM will be saved to a GeoTIFF file.

Value

A raster layer representing the Canopy Height Model (CHM).

Examples

# Generate CHM for a specific species and area
chm <- lfa_chm(specie = "ExampleSpecies", area = "ExampleArea", res = 1.0)

# Generate CHM and save it to a file
chm <- lfa_chm(specie = "ExampleSpecies", area = "ExampleArea", res = 1.0, save_to_file = TRUE)

Usage

lfa_chm(specie, area, res = 0.5, save_to_file = TRUE, overwrite = FALSE, ...)
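
The interplay of save_to_file and overwrite described under Details follows a common "compute once, then reuse" pattern. The sketch below shows that pattern in isolation. It is an illustration only: it stores results as RDS files instead of GeoTIFFs so that it runs without spatial packages, and the helper name cached_result_sketch is hypothetical.

```r
# Return the cached result if present (and overwrite is FALSE);
# otherwise compute it and optionally write it to disk.
cached_result_sketch <- function(path, compute, save_to_file = TRUE, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    return(readRDS(path))
  }
  result <- compute()
  if (save_to_file) saveRDS(result, path)
  result
}

cache_file <- tempfile(fileext = ".rds")

# First call: nothing cached yet, so the result is computed and stored
first <- cached_result_sketch(cache_file, function() matrix(1:4, 2))

# Second call: the stored result is loaded, compute() is never invoked
second <- cached_result_sketch(cache_file, function() stop("not recomputed"))

# overwrite = TRUE forces a recomputation and replaces the stored file
third <- cached_result_sketch(cache_file, function() matrix(5:8, 2), overwrite = TRUE)
```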

6.5.6 `lfa_combine_sf_obj`

Combine Spatial Feature Objects from Multiple GeoPackage Files

Arguments

Argument	Description
`paths`	A character vector containing file paths to GeoPackage files with neighbor information.
`area_infos`	A data frame or list containing information about the corresponding detection areas, including “area” and “specie” columns.

Description

This function reads spatial feature objects (sf) from multiple GeoPackage files and combines them into a single sf object. Each GeoPackage file is assumed to contain neighbor information for a specific detection area, and the resulting sf object includes additional columns indicating the corresponding area and species information.

Value

A combined sf object with additional columns for area and specie information.

Examples

# Assuming paths and area_infos are defined
combined_sf <- lfa_combine_sf_obj(paths, area_infos)

# Print the combined sf object
print(combined_sf)

Usage

lfa_combine_sf_obj(paths, area_infos)

6.5.7 `lfa_count_returns_all_areas`

Count tree returns for all species and areas, returning a consolidated data frame.

Description

This function iterates through all species and areas obtained from the function lfa_get_all_areas. For each combination of species and area, it reads the corresponding area as a catalog, counts the returns per tree using lfa_count_returns_per_tree, and consolidates the results into a data frame. The resulting data frame includes columns for the species, area, and return counts per tree.

Keyword

counting

Value

A data frame with columns for species, area, and return counts per tree.

Examples

# Count tree returns for all species and areas
returns_counts <- lfa_count_returns_all_areas()

Usage

lfa_count_returns_all_areas()

6.5.8 `lfa_count_returns_per_tree`

Count returns per tree for a given lidR catalog.

Arguments

Argument	Description
`ctg`	A lidR catalog object containing LAS files to be processed.

Description

This function takes a lidR catalog as input and counts the returns per tree. It uses the lidR package to read LAS files from the catalog and performs the counting operation on each tree. The result is a data frame containing the counts of returns for each unique tree ID within the lidR catalog.

Keyword

counting

Value

A data frame with columns for tree ID and the corresponding count of returns.

Examples

# Count returns per tree for a lidR catalog
ctg <- lfa_read_area_as_catalog("SpeciesA", "Area1")
returns_counts_per_tree <- lfa_count_returns_per_tree(ctg)

Usage

lfa_count_returns_per_tree(ctg)

6.5.9 `lfa_create_boxplot`

Create a box plot from a data frame

Arguments

Argument	Description
`data`	A data frame containing the data.
`value_column`	The name of the column containing the values for the box plot.
`category_column1`	The name of the column containing the first categorical variable.
`category_column2`	The name of the column containing the second categorical variable.
`title`	An optional title for the plot. If not provided, a default title is generated based on the data frame name.

Description

This function generates a box plot using ggplot2 based on the specified data frame and columns.

Details

The function creates a box plot where the x-axis is based on the second categorical variable, the y-axis is based on the specified value column, and the box plots are colored based on the first categorical variable. The grouping of box plots is done based on the unique values in the second categorical variable.

Value

A ggplot object representing the box plot.

Examples

# Assuming you have a data frame 'your_data' with columns 'value', 'category1', and 'category2'
lfa_create_boxplot(your_data, "value", "category1", "category2")

Usage

lfa_create_boxplot(
  data,
  value_column,
  category_column1,
  category_column2,
  title = NULL
)

6.5.10 `lfa_create_density_plots`

Create density plots for groups in a data frame

Arguments

Argument	Description
`data`	A data frame containing the data.
`value_column`	The name of the column containing the values for the density plot.
`category_column1`	The name of the column containing the categorical variable for grouping.
`category_column2`	The name of the column containing the categorical variable for arranging plots.
`title`	An optional title for the plot. If not provided, a default title is generated based on the data frame name.
`xlims`	Optional limits for the x-axis. Should be a numeric vector with two elements (lower and upper bounds).
`ylims`	Optional limits for the y-axis. Should be a numeric vector with two elements (lower and upper bounds).

Description

This function generates density plots using ggplot2 based on the specified data frame and columns.

Details

The function creates density plots where the x-axis is based on the specified value column, and the density plots are colored based on the first categorical variable. The arrangement of plots is done based on the unique values in the second categorical variable. The plots are arranged in a 2x2 grid.

Value

A ggplot object representing the density plots arranged in a 2x2 grid.

Examples

# Assuming you have a data frame 'your_data' with columns 'value', 'category1', and 'category2'
lfa_create_density_plots(your_data, "value", "category1", "category2", title = "Density Plots", xlims = c(0, 10), ylims = c(0, 0.5))

Usage

lfa_create_density_plots(
  data,
  value_column,
  category_column1 = "area",
  category_column2 = "specie",
  title = NULL,
  xlims = NULL,
  ylims = NULL
)

6.5.11 `lfa_create_grouped_bar_plot`

Create a barplot using ggplot2

Arguments

Argument	Description
`data`	A data frame containing the relevant columns for the barplot.
`grouping_var`	The column used for grouping the bars.
`value_col`	The column containing the values to be plotted.
`label_col`	The column used for labeling the bars on the x-axis.
`xlab`	Label for the x-axis. Default is “Name of Patch”.
`ylab`	Label for the y-axis. Default is “Density”.
`title`	Title of the plot. Default is “Tree density across the different patches, grouped by specie”.

Description

This function generates a barplot using ggplot2 based on the specified data frame columns. The barplot displays the values from the specified column, grouped by another column. The grouping can be further differentiated by color if desired.

Value

A ggplot2 barplot.

Examples

# Assuming you have a data frame 'your_data_frame' with columns "name", "species", and "value"
lfa_create_grouped_bar_plot(your_data_frame, grouping_var = "species", value_col = "value", label_col = "name")

Usage

lfa_create_grouped_bar_plot(
  data,
  grouping_var,
  value_col,
  label_col,
  xlab = "Name of Patch",
  ylab = "Density",
  title = "Tree density across the different patches, grouped by specie"
)

6.5.12 `lfa_create_neighbor_mean_curves`

Create neighbor mean curves for specified areas

Arguments

Argument	Description
`neighbors`	A data frame containing information about neighbors, where each column represents a specific neighbor, and each row corresponds to an area.
`use_avg`	Logical. If TRUE, the function computes average curves across all neighbors. If FALSE, it computes curves for individual neighbors.

Description

This function generates mean curves for a specified set of areas based on neighbor data. The user can choose to compute mean curves for individual neighbors or averages across neighbors.

Value

A data frame with mean curves for each specified area. Columns represent areas, and rows represent index values.

Examples

# Assuming you have a data frame 'your_neighbors_data' with neighbor information
mean_curves <- lfa_create_neighbor_mean_curves(your_neighbors_data, use_avg = TRUE)
print(mean_curves)

Usage

lfa_create_neighbor_mean_curves(neighbors, use_avg = FALSE)
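
One plausible reading of a "mean curve" is the average distance to the k-th nearest neighbor, computed per area. The base R sketch below illustrates this reading only. The input layout (one row per tree, an area column, and columns named Neighbor_1, Neighbor_2, ...) is an assumption, and the example values are made up.

```r
# Hypothetical neighbor table: distances to the 1st and 2nd nearest tree
neighbors <- data.frame(
  area = c("Area1", "Area1", "Area2", "Area2"),
  Neighbor_1 = c(1.0, 2.0, 3.0, 5.0),
  Neighbor_2 = c(2.0, 4.0, 6.0, 8.0)
)

# Select the distance columns and average them within each area
neighbor_cols <- grep("^Neighbor_", names(neighbors), value = TRUE)
mean_curves <- as.data.frame(
  sapply(split(neighbors[neighbor_cols], neighbors$area), colMeans)
)

# Rows are neighbor indices, columns are areas
print(mean_curves)
```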

6.5.13 `lfa_create_plot_per_area`

Create a line plot per area with one color per specie

Arguments

Argument	Description
`data`	A data frame with numeric columns and a column named ‘specie’ for species information.

Description

This function takes a data frame containing numeric columns and creates a line plot using ggplot2. Each line in the plot represents a different area, with one color per specie.

Value

A ggplot2 line plot.

Examples

data <- data.frame(
specie = rep(c("Species1", "Species2", "Species3"), each = 10),
column1 = rnorm(30),
column2 = rnorm(30),
column3 = rnorm(30)
)
lfa_create_plot_per_area(data)

Usage

lfa_create_plot_per_area(data)

6.5.14 `lfa_create_ppp_from_area`

Create a point pattern from tree detections in a specified area for a given species.

Arguments

Argument	Description
`species_identifier`	A character string specifying the target species for which the point pattern is to be generated.
`area_identifier`	A character string specifying the target area for which the point pattern is to be generated.

Description

This function generates a point pattern from tree detections for a specific species within a defined area. It filters the detections using the provided species_identifier and area_identifier parameters. The area is defined by a shapefile named “research_areas.shp,” and the resulting point pattern is created within the specified area.

Keyword

data

Value

A point pattern representing tree detections for the specified species within the defined area.

Examples

# Create a point pattern for a specific species in a given area
pp <- lfa_create_ppp_from_area(species_identifier = "SpeciesA", area_identifier = "Area1")

Usage

lfa_create_ppp_from_area(species_identifier, area_identifier)

6.5.15 `lfa_create_stacked_distributions_plot`

Create a stacked distribution plot for tree detections, visualizing the distribution of a specified variable on the x-axis, differentiated by another variable.

Arguments

Argument	Description
`trees`	A data frame containing tree detection data.
`x_value`	A character string specifying the column name used for finding the values on the x-axis of the histogram.
`fill_value`	A character string specifying the column name by which the data are differentiated in the plot.
`bin`	An integer specifying the number of bins for the histogram. Default is 100.
`ylab`	A character string specifying the y-axis label. Default is “Amount trees.”
`xlim`	A numeric vector of length 2 specifying the x-axis limits. Default is c(0, 100).
`ylim`	A numeric vector of length 2 specifying the y-axis limits. Default is c(0, 1000).
`title`	The title of the plot.

Description

This function generates a stacked distribution plot using the ggplot2 package, providing a visual representation of the distribution of a specified variable (x_value) on the x-axis, with differentiation based on another variable (fill_value). The data for the plot are derived from the provided trees data frame.

Keyword

data

Value

A ggplot object representing the stacked distribution plot.

Examples

# Create a stacked distribution plot for variable "Z," differentiated by "area"
trees <- lfa_get_detections()
lfa_create_stacked_distributions_plot(trees, "Z", "area")

Usage

lfa_create_stacked_distributions_plot(
  trees,
  x_value,
  fill_value,
  bin = 100,
  ylab = "Amount trees",
  xlim = c(0, 100),
  ylim = c(0, 1000),
  title =
    "Histograms of height distributions between species 'beech', 'oak', 'pine' and 'spruce' divided by the different areas of Interest"
)

6.5.16 `lfa_create_stacked_histogram`

Create a stacked histogram for tree detections, summing up the values for each species.

Arguments

Argument	Description
`trees`	A data frame containing tree detection data.
`x_value`	A character string specifying the column name used for finding the values on the x-axis of the histogram.
`fill_value`	A character string specifying the column name by which the data are differentiated in the plot.
`bin`	An integer specifying the number of bins for the histogram. Default is 30.
`ylab`	A character string specifying the y-axis label. Default is “Frequency.”
`xlim`	A numeric vector of length 2 specifying the x-axis limits. Default is c(0, 100).
`ylim`	A numeric vector of length 2 specifying the y-axis limits. Default is NULL.

Description

This function generates a stacked histogram using the ggplot2 package, summing up the values for each species and visualizing the distribution of a specified variable (x_value) on the x-axis, differentiated by another variable (fill_value). The data for the plot are derived from the provided trees data frame.

Keyword

data

Value

A ggplot object representing the stacked histogram.

Examples

# Create a stacked histogram for variable "Z," differentiated by "area"
trees <- lfa_get_detections()
lfa_create_stacked_histogram(trees, "Z", "area")

Usage

lfa_create_stacked_histogram(
  trees,
  x_value,
  fill_value,
  bin = 30,
  ylab = "Frequency",
  xlim = c(0, 100),
  ylim = NULL
)

6.5.17 `lfa_create_tile_location_objects`

Create tile location objects

Author

Jakob Danel

Description

This function traverses a directory structure to find LAZ files and creates tile location objects for each file. The function looks into the data directory of the repository/working directory. It then creates tile_location objects based on the folder structure. The folder structure should not be edited by hand, but created by lfa_init_data_structure(), which builds the structure based on a shapefile.

Value

A vector containing tile location objects.

Examples

lfa_create_tile_location_objects()

Usage

lfa_create_tile_location_objects()

6.5.18 `lfa_detection`

Perform tree detection on a lidar catalog and optionally save the results to a file.

Arguments

Argument	Description
`catalog`	A lidar catalog containing point cloud data. If set to NULL, the function attempts to read the catalog from the specified tile location.
`tile_location`	An object specifying the location of the lidar tile. If catalog is NULL, the function attempts to read the catalog from this tile location.
`write_to_file`	A logical value indicating whether to save the detected tree information to a file. Default is TRUE.

Description

This function utilizes lidar data to detect trees within a specified catalog. The detected tree information can be optionally saved to a file in the GeoPackage format. The function uses parallel processing to enhance efficiency.

Value

An sf-style data frame containing information about the detected trees.

Examples

# Perform tree detection on a catalog and save the results to a file
lfa_detection(catalog = my_catalog, tile_location = my_tile_location, write_to_file = TRUE)

Usage

lfa_detection(catalog, tile_location, write_to_file = TRUE)

6.5.19 `lfa_download_areas`

Download areas based on spatial features

Arguments

Argument	Description
`sf_areas`	Spatial features representing areas to be downloaded. It must include the columns “species” and “name”. See details for more information.

Author

Jakob Danel

Description

This function initiates the data structure and downloads areas based on spatial features.

Details

The input data frame, sf_areas, must have the following columns:

“species”: The species associated with the area.
“name”: The name of the area.

The function uses the lfa_init_data_structure function to set up the data structure and then iterates through the rows of sf_areas to download each specified area.

Value

None

Examples

# Example spatial features data frame.
# A real input must also include the attributes specific to sf objects,
# such as geometry, which are needed to process the download.
sf_areas <- data.frame(
species = c("SpeciesA", "SpeciesB"),
name = c("Area1", "Area2")
)

lfa_download_areas(sf_areas)

Usage

lfa_download_areas(sf_areas)

6.5.20 `lfa_download`

Download a LAS file from the state of NRW for a specific location

Arguments

Argument	Description
`species`	The species of the tree which is observed at this location
`name`	The name of the area that is observed
`location`	An sf object, which holds the location information for the area where the tile should be downloaded from.

Description

The function downloads the file and saves it in the data directory, inside the folder for the given species and area name, under the name of the tile.

Value

The LAScatalog object of the downloaded file

Usage

lfa_download(species, name, location)

6.5.21 `lfa_find_n_nearest_trees`

Find n Nearest Trees

Arguments

Argument	Description
`trees`	An sf object containing tree coordinates.
`n`	The number of nearest trees to find for each tree (default is 100).

Description

This function calculates the distances to the n nearest trees for each tree in the input dataset.

Value

A data frame with additional columns representing the distances to the n nearest trees.

Examples

# Load tree data using lfa_get_detections() (not provided)
tree_data <- lfa_get_detections()

# Filter tree data for a specific species and area
tree_data = tree_data[tree_data$specie == "pine" & tree_data$area == "greffen", ]

# Find the 100 nearest trees for each tree in the filtered dataset
tree_data <- lfa_find_n_nearest_trees(tree_data)

Usage

lfa_find_n_nearest_trees(trees, n = 100)
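
The idea behind the nearest-tree search can be shown on plain coordinates. The sketch below is a brute-force illustration in base R and not the package implementation: it builds the full pairwise distance matrix, which is only practical for small samples, and the helper name n_nearest_distances_sketch and the output column names are hypothetical.

```r
# Distances from each point to its n nearest other points (brute force)
n_nearest_distances_sketch <- function(coords, n) {
  d <- as.matrix(dist(coords))  # full Euclidean distance matrix
  diag(d) <- NA                 # a tree is not its own neighbor
  # For every tree: sort the distances (NA is dropped) and keep the first n
  nearest <- vapply(
    seq_len(nrow(d)),
    function(i) sort(d[i, ])[seq_len(n)],
    numeric(n)
  )
  out <- matrix(nearest, nrow = nrow(d), ncol = n, byrow = TRUE)
  colnames(out) <- paste0("Neighbor_", seq_len(n))
  cbind(coords, out)
}

# Four hypothetical trees on a line
trees_xy <- data.frame(X = c(0, 1, 3, 7), Y = c(0, 0, 0, 0))
nearest_demo <- n_nearest_distances_sketch(trees_xy, n = 2)
print(nearest_demo)
```

For real detection data with many thousands of trees, a spatial index or a dedicated k-nearest-neighbor routine would be the more appropriate choice.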

6.5.22 `lfa_generate_result_table_tests`

Generate Result Table for Tests

Arguments

Argument	Description
`table`	A data frame representing the result table.
`caption`	The caption of the table. Default is “Table Caption”.

Description

This function generates a result table for tests using the knitr::kable function.

Details

This function uses the knitr::kable function to create a formatted table, making it suitable for HTML output. The input table is expected to be a data frame with test results, and the resulting table will have capitalized row and column names with lines between columns and rows.

Value

A formatted table suitable for HTML output with lines between columns and rows.

Examples

# Generate a result table for tests
result_table <- data.frame(
Test1 = c(0.05, 0.10, 0.03),
Test2 = c(0.02, 0.08, 0.01),
Test3 = c(0.08, 0.12, 0.05)
)
formatted_table <- lfa_generate_result_table_tests(result_table)
print(formatted_table)

Usage

lfa_generate_result_table_tests(table, caption = "Table Caption")

6.5.23 `lfa_get_all_areas`

Retrieve a data frame containing all species and corresponding areas.

Description

This function scans the “data” directory within the current working directory to obtain a list of species. It then iterates through each species to retrieve the list of areas associated with that species. The resulting data frame contains two columns: “specie” representing the species and “area” representing the corresponding area.

Keyword

data

Value

A data frame with columns “specie” and “area” containing information about all species and their associated areas.

Examples

# Retrieve a data frame with information about all species and areas
all_areas_df <- lfa_get_all_areas()

Usage

lfa_get_all_areas()
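
Since species and areas are encoded in the folder structure, the scan can be illustrated with list.dirs. The sketch below assumes a layout of one folder per species containing one subfolder per area; it is not the package source, and the helper name get_all_areas_sketch and the demo folders are hypothetical.

```r
# Build a small demo structure: <data_dir>/<species>/<area>
data_dir <- file.path(tempfile("lfa_demo_"), "data")
for (p in c("SpeciesA/Area1", "SpeciesA/Area2", "SpeciesB/Area3")) {
  dir.create(file.path(data_dir, p), recursive = TRUE)
}

# List every species folder and, inside it, every area folder
get_all_areas_sketch <- function(data_dir) {
  species <- list.dirs(data_dir, full.names = FALSE, recursive = FALSE)
  rows <- lapply(species, function(s) {
    areas <- list.dirs(file.path(data_dir, s), full.names = FALSE, recursive = FALSE)
    data.frame(specie = rep(s, length(areas)), area = areas,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

all_areas <- get_all_areas_sketch(data_dir)
print(all_areas)
```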

6.5.24 `lfa_get_detection_area`

Get Detection for an area

Arguments

Argument	Description
`species`	A character string specifying the target species.
`name`	A character string specifying the name of the tile.

Description

Retrieves the tree detection information for a specified species and tile.

Details

This function reads tree detection data from geopackage files within the specified tile location for a given species. It then combines the data into a single SF data frame and returns it. The function assumes that the tree detection files follow a naming convention with the pattern “_detection.gpkg”.

Keyword

spatial

References

This function is part of the LiDAR Forest Analysis (LFA) package.

Value

A Simple Features (SF) data frame containing tree detection information for the specified species and tile.

Examples

# Retrieve tree detection data for species "example_species" in tile "example_tile"
trees_data <- lfa_get_detection_area("example_species", "example_tile")

# No trees found scenario:
empty_data <- lfa_get_detection_area("nonexistent_species", "nonexistent_tile")
# The result will be an empty data frame if no trees are found for the specified species and tile.

# Error handling:
# In case of invalid inputs, the function may throw errors. Ensure correct species and tile names are provided.

Usage

lfa_get_detection_area(species, name)

6.5.25 `lfa_get_detections_species`

Retrieve detections for a specific species.

Arguments

Argument	Description
`species`	A character string specifying the target species.

Description

This function retrieves detection data for a given species from multiple areas.

Details

The function looks for detection data in the “data” directory for the specified species. It then iterates through each subdirectory (representing different areas) and consolidates the detection data into a single data frame.

Value

A data frame containing detection information for the specified species in different areas.

Examples

# Example usage:
detections_data <- lfa_get_detections_species("example_species")

Usage

lfa_get_detections_species(species)

6.5.26 `lfa_get_detections`

Retrieve aggregated detection data for multiple species.

Concept

data retrieval functions

Description

This function obtains aggregated detection data for multiple species by iterating through the list of species obtained from lfa_get_species. For each species, it calls lfa_get_detections_species to retrieve the corresponding detection data and aggregates the results into a single data frame. The resulting data frame includes columns for the species, tree detection data, and the area in which the detections occurred.

Keyword

aggregation

Value

A data frame containing aggregated detection data for multiple species.

Examples

# Retrieve aggregated detection data for multiple species
detections_data <- lfa_get_detections()

Usage

lfa_get_detections()

6.5.27 `lfa_get_flag_path`

Get the path to a flag file indicating the completion of a specific process.

Arguments

Argument	Description
`flag_name`	A character string specifying the name of the flag file. It should be a descriptive and unique identifier for the process being flagged.

Description

This function constructs and returns the path to a hidden flag file, which serves as an indicator that a particular processing step has been completed. The flag file is created in a designated location within the working directory.

Value

A character string representing the absolute path to the hidden flag file.

Examples

# Get the flag path for a process named "data_processing"
lfa_get_flag_path("data_processing")

Usage

lfa_get_flag_path(flag_name)

6.5.28 `lfa_get_neighbor_paths`

Get Paths to Neighbor GeoPackage Files

Description

This function retrieves the file paths to GeoPackage files containing neighbor information for each detection area. The GeoPackage files are assumed to be named “neighbours.gpkg” and organized in a directory structure under the “data” folder.

Value

A character vector containing file paths to GeoPackage files for each detection area’s neighbors.

Examples

# Get paths to neighbor GeoPackage files for all areas
paths <- lfa_get_neighbor_paths()

# Print the obtained file paths
print(paths)

Usage

lfa_get_neighbor_paths()

6.5.29 `lfa_get_species`

Get a list of species from the data directory.

Concept

data retrieval functions

Description

This function retrieves a list of species by scanning the “data” directory located in the current working directory.

Keyword

data

References

This function relies on the list.dirs function for directory listing.

Value

A character vector containing the names of species found in the “data” directory.

Examples

# Retrieve the list of species
species_list <- lfa_get_species()

Usage

lfa_get_species()

6.5.30 `lfa_ground_correction`

Correct the point cloud heights relative to the ground

Arguments

Argument	Description
`ctg`	A LAScatalog object. If not NULL, the function operates on this object; if NULL, the catalog is inferred from tile_location.
`tile_location`	A tile_location object holding the information about the location of the catalog. It is also used to save the catalog after processing.

Author

Jakob Danel

Description

This function corrects the Z values of the point cloud relative to the real ground height. After applying this function to your catalog, the Z values can be interpreted as the elevation above the ground. At the moment the function uses the tin() function from the lidR package. NOTE: The operation is performed in place and cannot be reverted; the old values of the point cloud will be deleted!

Value

A catalog with the corrected Z values. The catalog is always stored at tile_location and holds only the transformed values.

Usage

lfa_ground_correction(ctg, tile_location)

6.5.31 `lfa_init_data_structure`

Initialize data structure for species and areas

Arguments

Argument	Description
`sf_species`	A data frame with information about species and associated areas.

Description

This function initializes the data structure for storing species and associated areas.

Details

The input data frame, sf_species, should have at least the following columns:

“species”: The names of the species for which the data structure needs to be initialized.
“name”: The names of the associated areas.

The function creates directories based on the species and area information provided in the sf_species data frame. It checks whether the directories already exist and creates them if they don’t.

Value

None

Examples

# Example species data frame
sf_species <- data.frame(
species = c("SpeciesA", "SpeciesB"),
name = c("Area1", "Area2")
# Add any other necessary columns here
)

lfa_init_data_structure(sf_species)

Usage

lfa_init_data_structure(sf_species)

6.5.32 `lfa_init`

Initialize LFA (LiDAR forest analysis) data processing

Arguments

Argument	Description
`sf_file`	A character string specifying the path to the shapefile containing spatial features of research areas.

Description

This function initializes the LFA data processing by reading a shapefile containing spatial features of research areas, downloading the specified areas, and creating tile location objects for each area.

Details

This function reads a shapefile (sf_file) using the sf package, which should contain information about research areas. It then calls the lfa_download_areas function to download the specified areas and lfa_create_tile_location_objects to create tile location objects based on Lidar data files in those areas. The shapefile MUST meet the following requirements:

Each geometry must be a single object of type polygon
Each entry must have the following attributes:
species: A string describing the tree species of the area.
name: A string describing the location of the area.

Value

A vector containing tile location objects.

Examples

# Initialize LFA processing with the default shapefile
lfa_init()

# Initialize LFA processing with a custom shapefile
lfa_init("custom_areas.shp")

Usage

lfa_init(sf_file = "research_areas.shp")

6.5.33 `lfa_intersect_areas`

Intersect Lidar Catalog with Spatial Features

Arguments

Argument	Description
`ctg`	A LAScatalog object representing the Lidar data to be processed.
`tile_location`	A tile location object representing the specific area of interest.
`areas_sf`	Spatial features defining areas.

Description

This function intersects a Lidar catalog with a specific area defined by spatial features.

Details

The function intersects the Lidar catalog specified by ctg with a specific area defined by the tile_location object and areas_sf. It removes points outside the specified area and returns a modified LAScatalog object.

The specified area is identified based on the species and name attributes in the tile_location object. If a matching area is not found in areas_sf, the function stops with an error.

The function then transforms the spatial reference of the identified area to match that of the Lidar catalog using sf::st_transform.

The processing is applied to each chunk in the catalog using the identify_area function, which merges spatial information and filters out points that are not classified as inside the identified area. After processing, the function writes the modified LAS files back to the original file locations, removing points outside the specified area.

If an error occurs during the processing of a chunk, a warning is issued, and the function continues processing the next chunks. If no points are found after filtering, a warning is issued, and NULL is returned.

Value

A modified LAScatalog object with points outside the specified area removed.

Examples

# Example usage
lfa_intersect_areas(ctg, tile_location, areas_sf)

Usage

lfa_intersect_areas(ctg, tile_location, areas_sf)

6.5.34 `lfa_jsd_from_vec`

Compute Jensen-Shannon Divergence from Vectors

Arguments

Argument	Description
`x`	A numeric vector.
`y`	A numeric vector.

Description

This function calculates the Jensen-Shannon Divergence (JSD) between two vectors.

Value

Jensen-Shannon Divergence between the density distributions of x and y.
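One way to get from two raw vectors to a divergence between their density distributions is sketched below. It is an assumption-laden illustration: the common evaluation grid, the grid size `n_grid`, and the name `jsd_from_vec_sketch` are choices made for the example and are not taken from the package.

```r
# Hypothetical sketch: estimate both densities on a shared grid, normalise
# them to discrete probability distributions, then compute the JSD.
jsd_from_vec_sketch <- function(x, y, n_grid = 512, epsilon = 1e-10) {
  rng <- range(c(x, y))
  dx <- density(x, from = rng[1], to = rng[2], n = n_grid)$y
  dy <- density(y, from = rng[1], to = rng[2], n = n_grid)$y
  p <- dx / sum(dx)
  q <- dy / sum(dy)
  m <- (p + q) / 2
  sum((p * log((p + epsilon) / (m + epsilon)) +
       q * log((q + epsilon) / (m + epsilon))) / 2)
}

set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 2)
jsd_xy <- jsd_from_vec_sketch(x, y)  # shifted samples: positive divergence
jsd_xx <- jsd_from_vec_sketch(x, x)  # identical samples: zero divergence
```

Evaluating both densities on the same grid matters here, because the divergence compares the two distributions bin by bin.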

Examples

x <- rnorm(100)
y <- rnorm(100, mean = 2)
lfa_jsd_from_vec(x, y)

Usage

lfa_jsd_from_vec(x, y)

6.5.35 `lfa_jsd`

Jensen-Shannon Divergence Calculation

Arguments

Argument	Description
`p`	A numeric vector representing the probability distribution P.
`q`	A numeric vector representing the probability distribution Q.
`epsilon`	A small positive constant added to both P and Q to avoid logarithm of zero. Default is 1e-10.

Description

This function calculates the Jensen-Shannon Divergence (JSD) between two probability distributions P and Q.

Details

The JSD is computed using the Kullback-Leibler Divergence (KLD) as follows: sum((p * log((p + epsilon) / (m + epsilon)) + q * log((q + epsilon) / (m + epsilon))) / 2), where m = (p + q) / 2.
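The formula above is the average of two KLD terms, each measured against the mixture m. The following sketch makes that relation explicit; `kld_sketch` and `jsd_sketch` are illustrative names, not the package functions.

```r
# Illustrative re-statement of the formulas from the documentation.
kld_sketch <- function(p, q, epsilon = 1e-10) {
  sum(p * log((p + epsilon) / (q + epsilon)))
}

jsd_sketch <- function(p, q, epsilon = 1e-10) {
  m <- (p + q) / 2
  (kld_sketch(p, m, epsilon) + kld_sketch(q, m, epsilon)) / 2
}

p <- c(0.2, 0.3, 0.5)
q <- c(0.1, 0, 0.9)
epsilon <- 1e-10
m <- (p + q) / 2

jsd_pq <- jsd_sketch(p, q)
jsd_qp <- jsd_sketch(q, p)  # the JSD is symmetric in its arguments
# The single-sum form quoted in the documentation gives the same value.
jsd_direct <- sum((p * log((p + epsilon) / (m + epsilon)) +
                   q * log((q + epsilon) / (m + epsilon))) / 2)
```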

Value

A numeric value representing the Jensen-Shannon Divergence between P and Q.

Examples

# Calculate JSD between two probability distributions
p_distribution <- c(0.2, 0.3, 0.5)
q_distribution <- c(0.1, 0, 0.9)
jsd_result <- lfa_jsd(p_distribution, q_distribution)
print(jsd_result)

Usage

lfa_jsd(p, q, epsilon = 1e-10)

6.5.36 `lfa_kld_from_vec`

Compute Kullback-Leibler Divergence from Vectors

Arguments

Argument	Description
`x`	A numeric vector.
`y`	A numeric vector.

Description

This function calculates the Kullback-Leibler Divergence (KLD) between two vectors.

Value

Kullback-Leibler Divergence between the density distributions of x and y.

Examples

x <- rnorm(100)
y <- rnorm(100, mean = 2)
lfa_kld_from_vec(x, y)

Usage

lfa_kld_from_vec(x, y)

6.5.37 `lfa_kld`

Kullback-Leibler Divergence Calculation

Arguments

Argument	Description
`p`	A numeric vector representing the probability distribution P.
`q`	A numeric vector representing the probability distribution Q.
`epsilon`	A small positive constant added to both P and Q to avoid logarithm of zero. Default is 1e-10.

Description

This function calculates the Kullback-Leibler Divergence (KLD) between two probability distributions P and Q.

Details

The KLD is computed using the formula: sum(p * log((p + epsilon) / (q + epsilon))). Adding epsilon avoids division by zero when Q contains zero probabilities.
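The effect of epsilon can be seen on the example distributions used in this section. The sketch below is illustrative only; `kld_sketch` is a hypothetical name that restates the formula above.

```r
p <- c(0.2, 0.3, 0.5)
q <- c(0.1, 0, 0.9)

# Without smoothing, the zero in q makes the divergence infinite.
kld_naive <- sum(p * log(p / q))

# With epsilon in numerator and denominator the result stays finite.
kld_sketch <- function(p, q, epsilon = 1e-10) {
  sum(p * log((p + epsilon) / (q + epsilon)))
}
kld_pq <- kld_sketch(p, q)
kld_qp <- kld_sketch(q, p)  # the KLD is not symmetric, so this differs
```

Because the KLD is asymmetric, the order of the arguments matters, which is why a pairwise comparison based on it yields an asymmetric table.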

Value

A numeric value representing the Kullback-Leibler Divergence between P and Q.

Examples

# Calculate KLD between two probability distributions
p_distribution <- c(0.2, 0.3, 0.5)
q_distribution <- c(0.1, 0, 0.9)
kld_result <- lfa_kld(p_distribution, q_distribution)
print(kld_result)

Usage

lfa_kld(p, q, epsilon = 1e-10)

6.5.38 `lfa_ks_test`

Kolmogorov-Smirnov Test Wrapper Function

Arguments

Argument	Description
`x`	A numeric vector representing the first sample.
`y`	A numeric vector representing the second sample.
`output_variable`	A character string specifying the output variable to extract from the ks.test result. Default is “p.value”. Other possible values include “statistic” and “alternative”.
`...`	Additional arguments to be passed to the ks.test function.

Description

This function serves as a wrapper for the Kolmogorov-Smirnov (KS) test between two samples.

Details

The function uses the ks.test function to perform a two-sample KS test and returns the specified output variable. The default output variable is the p-value. Other possible output variables include “statistic” and “alternative”.
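A wrapper of this kind can be sketched in a few lines. The name `ks_test_sketch` is hypothetical; the only real API used is `stats::ks.test`, whose result is a list with elements such as `p.value`, `statistic`, and `alternative`.

```r
# Hypothetical sketch: run the two-sample KS test and pull out one element.
ks_test_sketch <- function(x, y, output_variable = "p.value", ...) {
  result <- stats::ks.test(x, y, ...)
  result[[output_variable]]
}

set.seed(42)
sample1 <- rnorm(200)
sample2 <- rnorm(200, mean = 1)

p_value <- ks_test_sketch(sample1, sample2)
statistic <- ks_test_sketch(sample1, sample2, output_variable = "statistic")
```

Returning a single number makes the wrapper usable as a test function in pairwise comparisons between categories.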

Value

A numeric value representing the specified output variable from the KS test result.

Examples

# Two example samples
sample1 <- rnorm(100)
sample2 <- rnorm(100, mean = 1)

# Perform KS test and extract the p-value
result <- lfa_ks_test(sample1, sample2)
print(result)

# Perform KS test and extract the test statistic
result_statistic <- lfa_ks_test(sample1, sample2, output_variable = "statistic")
print(result_statistic)

Usage

lfa_ks_test(x, y, output_variable = "p.value", ...)

6.5.39 `lfa_load_ctg_if_not_present`

Loading the catalog if it is not present

Arguments

Argument	Description
`ctg`	Catalog object. Can be NULL.
`tile_location`	The tile location from which to load the catalog tiles if ctg is not present.

Description

This function checks if the catalog is NULL. If it is, it loads the catalog from the tile_location.

Value

The provided ctg object if it is not NULL; otherwise the catalog loaded from the tiles of the tile_location.

Usage

lfa_load_ctg_if_not_present(ctg, tile_location)

6.5.40 `lfa_map_tile_locations`

Map Function Over Tile Locations

Arguments

Argument	Description
`tile_locations`	A list of tile location objects.
`map_function`	The mapping function to be applied to each tile location.
`...`	Additional arguments to be passed to the mapping function.

Description

This function applies a specified mapping function to each tile location in a list.

Details

This function iterates over each tile location in the provided list (tile_locations) and applies the specified mapping function (map_function) to each tile location. The mapping function should accept a tile location object as its first argument, and additional arguments can be passed using the ellipsis (...) syntax.

This function is useful for performing operations on multiple tile locations concurrently, such as loading Lidar data, processing areas, or other tasks that involve tile locations.
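The iteration pattern described above can be sketched as below. The representation of a tile location as a list with `species` and `name` fields is an assumption for this example, as are the names `map_tile_locations_sketch` and `record_visit`; the undocumented `check_flag` argument from the usage is left out.

```r
# Hypothetical sketch: call map_function once per tile location and
# forward any extra arguments unchanged.
map_tile_locations_sketch <- function(tile_locations, map_function, ...) {
  for (tile_location in tile_locations) {
    map_function(tile_location, ...)
  }
  invisible(NULL)
}

# Assumed shape of a tile location object, for illustration only.
tile_locations <- list(
  list(species = "beech", name = "location1"),
  list(species = "oak", name = "location2")
)

visited <- character(0)
record_visit <- function(tile_location, prefix) {
  visited <<- c(visited, paste0(prefix, tile_location$species, "/", tile_location$name))
}

map_tile_locations_sketch(tile_locations, record_visit, prefix = "visited: ")
```

Since the function returns nothing, the mapping function has to produce its effect itself, for example by writing files or, as here, by recording state.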

Value

None

Examples

# Example usage
lfa_map_tile_locations(tile_locations, my_mapping_function, param1 = "value")

Usage

lfa_map_tile_locations(tile_locations, map_function, check_flag = NULL, ...)

6.5.41 `lfa_merge_and_save`

Merge and Save Text Files in a Directory

Arguments

Argument	Description
`input_directory`	The path to the input directory containing text files.
`output_name`	The name for the output file where the merged content will be saved.

Description

This function takes an input directory and an output name as arguments. It merges the textual content of all files in the specified directory into a single string, with each file’s content separated by a newline character. The merged content is then saved into a file named after the output name in the same directory. After the merging is complete, all input files are deleted.

Details

This function reads the content of each text file in the specified input directory and concatenates them into a single string. Each file’s content is separated by a newline character. The merged content is then saved into a file named after the output name in the same directory. Finally, all input files are deleted from the directory.
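The read, merge, write, and delete sequence can be sketched with base R file functions. This is a sketch under assumptions: `merge_and_save_sketch` is a hypothetical name, and excluding an already existing output file from the inputs is a precaution added here, not documented behaviour.

```r
# Hypothetical sketch of the merge-and-save steps described above.
merge_and_save_sketch <- function(input_directory, output_name) {
  output_path <- file.path(input_directory, output_name)
  input_files <- list.files(input_directory, full.names = TRUE)
  # Precaution (assumption): never treat a previous output as an input.
  input_files <- setdiff(input_files, output_path)

  contents <- vapply(
    input_files,
    function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
    character(1)
  )
  writeLines(paste(contents, collapse = "\n"), output_path)

  # Delete the inputs only after the merged file has been written.
  file.remove(input_files)
  message("Merged ", length(input_files), " files into ", output_path)
  invisible(NULL)
}

input_directory <- tempfile("merge_sketch")
dir.create(input_directory)
writeLines("one", file.path(input_directory, "a.txt"))
writeLines("two", file.path(input_directory, "b.txt"))
merge_and_save_sketch(input_directory, "merged_output")
```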

Value

This function does not explicitly return any value. It prints a message indicating the successful completion of the merging and saving process.

Examples

# Merge text files in the "data_files" directory and save the result in "merged_output"
lfa_merge_and_save("data_files", "merged_output")

Usage

lfa_merge_and_save(input_directory, output_name)

6.5.42 `lfa_plot_confusion_matrix`

Plot Confusion Matrix

Arguments

Argument	Description
`conf_matrix`	Confusion matrix, typically obtained from classification evaluation.

Description

This function generates a heatmap plot of a confusion matrix using ggplot2.

Details

The function takes a confusion matrix as input and generates a heatmap plot using ggplot2. The plot represents the relationship between the predicted and actual classes, with cell colors indicating the frequency of each combination. Additionally, the plot includes labels for accuracy and kappa statistics based on the confusion matrix.
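The two statistics shown as plot labels can be derived from the confusion matrix alone. The sketch below covers only these numbers, not the heatmap, and assumes that "kappa" refers to Cohen's kappa; the variable names are illustrative.

```r
# Rows are predicted classes, columns are actual classes.
cm <- table(
  predicted = c("A", "A", "A", "B", "B"),
  actual    = c("A", "A", "B", "B", "B")
)

n <- sum(cm)
# Accuracy: share of cases on the diagonal (observed agreement).
accuracy <- sum(diag(cm)) / n
# Agreement expected by chance from the row and column totals.
expected_agreement <- sum(rowSums(cm) * colSums(cm)) / n^2
# Cohen's kappa (assumed to be the kappa statistic meant above).
cohen_kappa <- (accuracy - expected_agreement) / (1 - expected_agreement)
```

Kappa corrects the accuracy for agreement that would occur by chance, which is informative when the classes are unbalanced.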

Value

A ggplot object representing the confusion matrix heatmap plot.

Examples

# Example confusion matrix
cm <- table(predicted = c("A", "B", "A", "B"), actual = c("A", "A", "B", "B"))
# Plot confusion matrix
lfa_plot_confusion_matrix(cm)

Usage

lfa_plot_confusion_matrix(conf_matrix)

6.5.43 `lfa_precision_per_class`

Calculate Precision per Class from Confusion Matrix

Arguments

Argument	Description
`confusion_matrix`	Confusion matrix obtained from a classification evaluation.

Description

This function calculates precision for each class based on the provided confusion matrix.

Details

Precision is a measure of the accuracy of the positive predictions for a specific class. It is calculated as the ratio of true positives to the sum of true positives and false positives.
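As a worked example of this ratio, the sketch below assumes that the rows of the confusion matrix hold the predicted classes, as in the example in this section. The function name is hypothetical.

```r
# Hypothetical sketch: true positives divided by all predictions of a class.
precision_per_class_sketch <- function(confusion_matrix) {
  diag(confusion_matrix) / rowSums(confusion_matrix)
}

cm <- table(
  predicted = c("A", "A", "A", "B", "B"),
  actual    = c("A", "A", "B", "B", "B")
)
# Class A was predicted three times, two of them correctly.
# Class B was predicted twice, both times correctly.
precision_vector <- precision_per_class_sketch(cm)
```

A class that is never predicted has a row sum of zero, so its precision is undefined (NaN) in this sketch.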

Value

A numeric vector representing precision for each class.

Examples

# Example confusion matrix
cm <- table(predicted = c("A", "B", "A", "B"), actual = c("A", "A", "B", "B"))
# Calculate precision per class
precision_vector <- lfa_precision_per_class(cm)

Usage

lfa_precision_per_class(confusion_matrix)

6.5.44 `lfa_random_forest`

Random Forest Classifier with Leave-One-Out Cross-Validation

Arguments

Argument	Description
`tree_data`	A data frame containing the tree data, including the response variable (“specie”) and predictor variables.
`excluded_input_columns`	A character vector specifying columns to be excluded from predictor variables.
`response_variable`	The response variable to be predicted (default is “specie”).
`seed`	An integer to set the seed for reproducibility (default is 123).
`...`	Additional parameters to be passed to the randomForest function.

Description

This function performs a random forest classification using leave-one-out cross-validation for each area in the input tree data. It returns a list containing various results, including predicted species, confusion matrix, accuracy, and the formula used for modeling.

Value

A list containing the following elements:

predicted_species_absolute: A data frame with observed and predicted species for each area.
predicted_species_relative: A data frame with the relative predictions per species and area, normalized by the total predictions in each area.
confusion_matrix: A confusion matrix showing the counts of predicted vs. observed species.
accuracy: The accuracy of the model, calculated as the sum of diagonal elements in the confusion matrix divided by the total count.
formula: The formula used for modeling.
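The cross-validation loop and two of the returned elements can be sketched without the randomForest dependency. This sketch rests on several assumptions: it reads "leave-one-out cross-validation for each area" as holding out one area at a time, it assumes an `area` column, and it replaces the random forest with a majority-class placeholder so that the example runs with base R alone. All function names are hypothetical.

```r
# Placeholder model (NOT a random forest): predicts the most frequent
# species of the training data.
fit_majority <- function(train, response_variable) {
  names(which.max(table(train[[response_variable]])))
}

loao_sketch <- function(tree_data, response_variable = "specie", area_column = "area") {
  areas <- unique(tree_data[[area_column]])
  folds <- lapply(areas, function(a) {
    held_out <- tree_data[[area_column]] == a
    # Train on all other areas, predict the held-out area.
    model <- fit_majority(tree_data[!held_out, , drop = FALSE], response_variable)
    data.frame(
      area = a,
      observed = as.character(tree_data[[response_variable]][held_out]),
      predicted = model,
      stringsAsFactors = FALSE
    )
  })
  predictions <- do.call(rbind, folds)
  species <- sort(unique(c(predictions$observed, predictions$predicted)))
  confusion_matrix <- table(
    predicted = factor(predictions$predicted, levels = species),
    observed = factor(predictions$observed, levels = species)
  )
  list(
    confusion_matrix = confusion_matrix,
    accuracy = sum(diag(confusion_matrix)) / sum(confusion_matrix)
  )
}

tree_data <- data.frame(
  specie = c("beech", "beech", "beech", "oak", "oak", "beech"),
  area = c("a1", "a1", "a2", "a3", "a3", "a4"),
  height = c(20, 22, 21, 15, 16, 23),
  stringsAsFactors = FALSE
)
results <- loao_sketch(tree_data)
```

Holding out whole areas keeps trees from the same stand out of both the training and the test set, which would otherwise inflate the accuracy.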

Examples

# Assuming tree_data is defined
results <- lfa_random_forest(tree_data, excluded_input_columns = c("column1", "column2"))

# Print the list of results
print(results)

Usage

lfa_random_forest(
  tree_data,
  excluded_input_columns,
  response_variable = "specie",
  ntree = 100,
  seed = 123,
  ...
)

6.5.45 `lfa_rasterize_chunk`

Rasterize Lidar Chunk

Arguments

Argument	Description
`chunk`	Lidar chunk object to be rasterized.
`...`	Additional arguments to be passed to the underlying lidR::rasterize_canopy function.

Description

This function rasterizes a Lidar chunk to generate a raster representation of the canopy.

Details

The function takes a Lidar chunk as input and uses lidR::rasterize_canopy to generate a raster representation of the canopy. Additional arguments can be passed to customize the rasterization process.

Value

A raster layer representing the rasterized canopy.

Examples

# Example Lidar chunk
lidar_chunk <- readLAS("lidar_data.las", select = "xyz")
# Rasterize Lidar chunk
rasterized_canopy <- lfa_rasterize_chunk(lidar_chunk)

Usage

lfa_rasterize_chunk(chunk, ...)

6.5.46 `lfa_rd_to_qmd`

Convert Rd File to Markdown

Arguments

Argument	Description
`rdfile`	The path to the Rd file or a parsed Rd object.
`outfile`	The path to the output Markdown file (including the file extension).
`append`	Logical, indicating whether to append to an existing file (default is FALSE).

Description

IMPORTANT NOTE: This function is nearly identical to the Rd2md::Rd2markdown function from the Rd2md package. We needed to implement our own version of it for several reasons:

The original algorithm uses hardcoded header sizes (h1 and h2), which does not suit our use of the Markdown output.
We needed to add some Quarto Markdown specifics, e.g. to make sure that the examples are not run.
We want to exclude certain tags from our implementation.

Details

For these reasons, we copied the function, changed it as needed, and added this custom documentation.

This function converts an Rd (R documentation) file to Markdown format (.md) and saves the converted file at the specified location. The function allows appending to an existing file or creating a new one. The resulting Markdown file includes sections for the function’s name, title, and additional content such as examples, usage, arguments, and other sections present in the Rd file.

The function performs the following steps:

Parses the Rd file using the Rd2md package.
Creates a Markdown file with sections for the function’s name, title, and additional content.
Appends the content to an existing file if append is set to TRUE.
Saves the resulting Markdown file at the specified location.

Value

This function does not explicitly return any value. It saves the converted Markdown file at the specified location as described in the details section.

Examples

# Convert Rd file to Markdown and save it
lfa_rd_to_qmd("path/to/your/file.Rd", "path/to/your/output/file.md")

# Convert Rd file to Markdown and append to an existing file
lfa_rd_to_qmd("path/to/your/file.Rd", "path/to/existing/output/file.md", append = TRUE)

Usage

lfa_rd_to_qmd(rdfile, outfile, append = FALSE)

6.5.47 `lfa_rd_to_results`

Convert Rd Files to Markdown and Merge Results

Description

This function converts all Rd (R documentation) files in the “man” directory to Markdown format (.qmd) and saves the converted files in the “results/appendix/package-docs” directory. It then merges the converted Markdown files into a single string and saves the merged content into a file named “docs.qmd” in the “results/appendix/package-docs” directory.

Details

The function performs the following steps:

Removes any existing “docs.qmd” file in the “results/appendix/package-docs” directory.
Finds all Rd files in the “man” directory.
Converts each Rd file to Markdown format (.qmd) using the lfa_rd_to_qmd function.
Saves the converted Markdown files in the “results/appendix/package-docs” directory.
Merges the content of all converted Markdown files into a single string.
Saves the merged content into a file named “docs.qmd” in the “results/appendix/package-docs” directory.

Value

This function does not explicitly return any value. It performs the conversion, merging, and saving operations as described in the details section.

Examples

# Convert Rd files to Markdown and merge the results
lfa_rd_to_results()

Usage

lfa_rd_to_results()

6.5.48 `lfa_read_area_as_catalog`

Read LiDAR data from a specified species and location as a catalog.

Arguments

Argument	Description
`specie`	A character string specifying the species of interest.
`location_name`	A character string specifying the name of the location.

Description

This function constructs the file path based on the specified specie and location_name, lists the directories at that path, and reads the LiDAR data into a lidR::LAScatalog.

Value

A lidR::LAScatalog object containing the LiDAR data from the specified location and species.

Examples

lfa_read_area_as_catalog("beech", "location1")

Usage

lfa_read_area_as_catalog(specie, location_name)

6.5.49 `lfa_recall_per_class`

Calculate Recall per Class from Confusion Matrix

Arguments

Argument	Description
`confusion_matrix`	Confusion matrix obtained from a classification evaluation.

Description

This function calculates recall for each class based on the provided confusion matrix.

Details

Recall (Sensitivity or True Positive Rate) is a measure of the ability of a classification model to identify all relevant instances. It is calculated as the ratio of true positives to the sum of true positives and false negatives.
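As a worked example of this ratio, the sketch below assumes that the columns of the confusion matrix hold the actual classes, as in the example in this section. The function name is hypothetical.

```r
# Hypothetical sketch: true positives divided by all actual members of a class.
recall_per_class_sketch <- function(confusion_matrix) {
  diag(confusion_matrix) / colSums(confusion_matrix)
}

cm <- table(
  predicted = c("A", "A", "A", "B", "B"),
  actual    = c("A", "A", "B", "B", "B")
)
# Class A occurs twice and is found both times.
# Class B occurs three times and is found twice.
recall_vector <- recall_per_class_sketch(cm)
```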

Value

A numeric vector representing recall for each class.

Examples

# Example confusion matrix
cm <- table(predicted = c("A", "B", "A", "B"), actual = c("A", "A", "B", "B"))
# Calculate recall per class
recall_vector <- lfa_recall_per_class(cm)

Usage

lfa_recall_per_class(confusion_matrix)

6.5.50 `lfa_run_test_asymmetric`

Asymmetric Pairwise Test for Categories

Arguments

Argument	Description
`data`	A data frame containing the relevant columns.
`data_column`	A character string specifying the column containing the numerical data.
`category_column`	A character string specifying the column containing the categorical variable.
`test_function`	A function used to perform the pairwise test between two sets of data. It should accept two vectors of numeric data and additional parameters specified by `...`. The function should return a numeric value representing the test result.
`...`	Additional parameters to be passed to the `test_function`.

Description

This function performs an asymmetric pairwise test for categories using a user-defined test_function.

Details

The function calculates the test results for each unique combination of categories using the specified test_function. The resulting table is asymmetric, containing the test results for comparisons from the rows to the columns.
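How such a row-to-column table can be built is sketched below. The function name and the toy data are assumptions made for illustration; the sketch evaluates every ordered pair of categories, including each category against itself.

```r
# Hypothetical sketch: fill a full category-by-category table.
run_test_asymmetric_sketch <- function(data, data_column, category_column, test_function, ...) {
  categories <- unique(as.character(data[[category_column]]))
  result <- matrix(
    NA_real_,
    nrow = length(categories), ncol = length(categories),
    dimnames = list(categories, categories)
  )
  for (row_cat in categories) {
    for (col_cat in categories) {
      x <- data[[data_column]][data[[category_column]] == row_cat]
      y <- data[[data_column]][data[[category_column]] == col_cat]
      # Entry [row, column] is the test of the row category against the column category.
      result[row_cat, col_cat] <- test_function(x, y, ...)
    }
  }
  as.data.frame(result)
}

toy <- data.frame(
  height = c(10, 12, 20, 22, 30, 32),
  specie = rep(c("beech", "oak", "pine"), each = 2),
  stringsAsFactors = FALSE
)
mean_diff <- function(x, y) mean(x) - mean(y)
asym <- run_test_asymmetric_sketch(toy, "height", "specie", mean_diff)
```

With an order-dependent test function such as the mean difference or the KLD, the entries above and below the diagonal differ, so both triangles carry information.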

Value

A data frame representing the results of the asymmetric pairwise tests between categories.

Examples

# Define a custom test function
custom_test_function <- function(x, y) {
  # Your test logic here
  # Return a numeric result
  return(mean(x) - mean(y))
}

# Perform an asymmetric pairwise test
result <- lfa_run_test_asymmetric(your_data, "numeric_column", "category_column", custom_test_function)

Usage

lfa_run_test_asymmetric(data, data_column, category_column, test_function, ...)

6.5.51 `lfa_run_test_symmetric`

Symmetric Pairwise Test for Categories

Arguments

Argument	Description
`data`	A data frame containing the relevant columns.
`data_column`	A character string specifying the column containing the numerical data.
`category_column`	A character string specifying the column containing the categorical variable.
`test_function`	A function used to perform the pairwise test between two sets of data. It should accept two vectors of numeric data and additional parameters specified by `...`. The function should return a numeric value representing the test result.
`...`	Additional parameters to be passed to the `test_function`.

Description

This function performs a symmetric pairwise test for categories using a user-defined test_function.

Details

The function calculates the test results for each unique combination of categories using the specified test_function. The resulting table is symmetric, containing the test results for comparisons from the rows to the columns. The upper triangle of the matrix is filled with NA to avoid duplicate results.
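A lower-triangle table of this kind can be sketched as follows. The function name and the toy data are hypothetical, and whether the diagonal is computed or left as NA is an assumption here (the sketch computes it).

```r
# Hypothetical sketch: compute each unordered pair once, leave the rest NA.
run_test_symmetric_sketch <- function(data, data_column, category_column, test_function, ...) {
  categories <- unique(as.character(data[[category_column]]))
  k <- length(categories)
  result <- matrix(NA_real_, nrow = k, ncol = k, dimnames = list(categories, categories))
  for (i in seq_len(k)) {
    for (j in seq_len(i)) {  # j <= i: lower triangle and diagonal only
      x <- data[[data_column]][data[[category_column]] == categories[i]]
      y <- data[[data_column]][data[[category_column]] == categories[j]]
      result[i, j] <- test_function(x, y, ...)
    }
  }
  as.data.frame(result)
}

toy <- data.frame(
  height = c(10, 12, 20, 22, 30, 32),
  specie = rep(c("beech", "oak", "pine"), each = 2),
  stringsAsFactors = FALSE
)
abs_mean_diff <- function(x, y) abs(mean(x) - mean(y))
sym <- run_test_symmetric_sketch(toy, "height", "specie", abs_mean_diff)
```

This only makes sense for test functions whose result does not depend on argument order, such as the JSD or the two-sided KS test.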

Value

A data frame representing the results of the symmetric pairwise tests between categories.

Examples

# Define a custom test function
custom_test_function <- function(x, y) {
  # Your test logic here
  # Return a numeric result
  return(mean(x) - mean(y))
}

# Perform a symmetric pairwise test
result <- lfa_run_test_symmetric(your_data, "numeric_column", "category_column", custom_test_function)

Usage

lfa_run_test_symmetric(data, data_column, category_column, test_function, ...)

6.5.52 `lfa_save_all_neighbours`

Save Neighbors for All Areas

Arguments

Argument	Description
`n`	The number of nearest trees to find for each tree (default is 100).

Description

This function iterates through all detection areas, finds the n nearest trees for each tree, and saves the result to a GeoPackage file for each area.
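The core step, finding the distances to the n nearest trees for each tree, can be sketched with a full distance matrix. This brute-force approach and the name `nearest_distances_sketch` are assumptions for illustration; the sketch neither reads detection areas nor writes GeoPackage files.

```r
# Hypothetical sketch: distances from each point to its n nearest neighbours.
nearest_distances_sketch <- function(coords, n) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf  # a tree is not its own neighbour
  # One row per tree, one column per neighbour rank (1 = nearest).
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
    unname(sort(d[i, ])[seq_len(n)])
  }))
}

# Four trees on a line at x = 0, 1, 3, 6.
coords <- cbind(x = c(0, 1, 3, 6), y = c(0, 0, 0, 0))
nn <- nearest_distances_sketch(coords, n = 2)
# Average distance to the n nearest neighbours, per tree.
mean_nn <- rowMeans(nn)
```

The full distance matrix grows quadratically with the number of trees, so a spatial index would be preferable for large areas.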

Examples

# Save neighbors for all areas with default value (n=100)
lfa_save_all_neighbours()

# Save neighbors for all areas with a specific value of n (e.g., n=50)
lfa_save_all_neighbours(n = 50)

Usage

lfa_save_all_neighbours(n = 100)

6.5.53 `lfa_segmentation`

Segment the elements of a point cloud by trees

Arguments

Argument	Description
`ctg`	A LAScatalog object. If it is not NULL, the function operates on this object; if it is NULL, the catalog is inferred from the tile_location.
`tile_location`	A tile_location object holding the information about the location of the catalog. It is also used to save the catalog after processing.

Author

Jakob Danel

Description

This function tries to divide the whole point cloud into individual trees by assigning a treeID to each point in each chunk of the catalog. The algorithm uses the li2012 implementation with the following parameters: li2012(dt1 = 2, dt2 = 3, R = 2, Zu = 10, hmin = 5, speed_up = 12). NOTE: The operation is performed in place and cannot be reverted; the old values of the point cloud will be deleted!

Value

A catalog in which each point of each chunk has an additional treeID value indicating the tree it belongs to.

Usage

lfa_segmentation(ctg, tile_location)

6.5.54 `lfa_set_flag`

Set a flag to indicate the completion of a specific process.

Arguments

Argument	Description
`flag_name`	A character string specifying the name of the flag file. It should be a descriptive and unique identifier for the process being flagged.

Description

This function creates a hidden flag file at a specified location within the working directory to indicate that a particular processing step has been completed. If the flag file already exists, a warning is issued.
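A flag file of this kind can be sketched with base R. The naming scheme (a leading dot), the `flag_dir` argument, and the logical return value are assumptions made for the example.

```r
# Hypothetical sketch: mark a finished step with an empty hidden file.
set_flag_sketch <- function(flag_name, flag_dir = ".") {
  flag_path <- file.path(flag_dir, paste0(".", flag_name))
  if (file.exists(flag_path)) {
    warning("Flag '", flag_name, "' is already set.")
    return(invisible(FALSE))
  }
  file.create(flag_path)
  invisible(TRUE)
}

flag_dir <- tempfile("flags")
dir.create(flag_dir)
first_call <- set_flag_sketch("data_processing", flag_dir)
```

A later run can test for the file's existence to decide whether the step needs to be repeated.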

Value

This function does not have a formal return value.

Examples

# Set the flag for a process named "data_processing"
lfa_set_flag("data_processing")

Usage

lfa_set_flag(flag_name)

6.5.55 `lfa_visit_all_areas`

Visit All Areas and Apply Preprocessing Function

Arguments

Argument	Description
`preprocessing_function`	The preprocessing function to be applied to each area. It should take specie, area, and additional parameters as inputs.
`areas`	Data frame containing information about different areas, including columns “specie” and “area.”
`...`	Additional arguments to be passed to the preprocessing function.

Description

This function iterates over all specified areas and applies a preprocessing function to each one.

Details

The function iterates over all areas specified in the ‘areas’ parameter, and for each area, it applies the provided preprocessing function. The ‘areas’ parameter is expected to be a data frame with columns “specie” and “area,” containing information about different areas to visit. Additional arguments passed via ‘…’ are forwarded to the preprocessing function.
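The iteration described above can be sketched as a loop over the rows of the areas data frame. The function names and the example data are hypothetical, and the sketch requires `areas` to be passed explicitly instead of defaulting to lfa_get_all_areas().

```r
# Hypothetical sketch: apply the function to every (specie, area) row and
# collect the results in a list.
visit_all_areas_sketch <- function(preprocessing_function, areas, ...) {
  lapply(seq_len(nrow(areas)), function(i) {
    preprocessing_function(
      as.character(areas$specie[i]),
      as.character(areas$area[i]),
      ...
    )
  })
}

areas <- data.frame(
  specie = c("beech", "oak"),
  area = c("location1", "location2"),
  stringsAsFactors = FALSE
)
describe_area <- function(specie, area, suffix = "") {
  paste0(specie, "@", area, suffix)
}
results_list <- visit_all_areas_sketch(describe_area, areas, suffix = "#done")
```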

Value

A list containing the results of applying the preprocessing function to each area.

Examples

# Example preprocessing function
my_preprocessing_function <- function(specie, area, ...) {
  # Your preprocessing logic here (placeholder result)
  result <- paste(specie, area)
  # Return the result
  return(result)
}
# Visit all areas and apply the preprocessing function
results_list <- lfa_visit_all_areas(my_preprocessing_function)

Usage

lfa_visit_all_areas(preprocessing_function, areas = lfa_get_all_areas(), ...)

6.5.56 `lfa_visualize_rf_metrics`

Visualize Precision and Recall Metrics for Random Forest Classification

Arguments

Argument	Description
`metrics_df`	Data frame containing precision and recall metrics for each class.

Description

This function creates a bar plot to visualize precision and recall metrics for each class obtained from a Random Forest classification.

Details

The function creates a bar plot to visualize precision and recall metrics for each class obtained from a Random Forest classification.

Value

A ggplot object representing the bar plot of precision and recall metrics.

Examples

# Example data frame containing precision and recall metrics
example_metrics_df <- data.frame(
  Class = c("ClassA", "ClassB"),
  Precision = c(0.85, 0.92),
  Recall = c(0.78, 0.88)
)
# Visualize precision and recall metrics
lfa_visualize_rf_metrics(example_metrics_df)

Usage

lfa_visualize_rf_metrics(metrics_df)

Footnotes

https://www.opengeodata.nrw.de/produkte/geobasis/hm/3dm_l_las/3dm_l_las/, last visited 7th Dec 2023
https://github.com/joheisig/GEDIcalibratoR, last visited 7th Dec 2023
