Gun Crime Data: Initial Processing for New Query Returns

Introduction

This file provides documentation and scripting for pre-processing Newport News Police Department RMS data for use in ArcGIS Pro.

The NN GIS team should receive query outputs in a single Excel Workbook file containing three separate tabs

Tab 1: Incidents,

Tab 2: Offenses, and

Tab 3: Incident Victims.

The SQL queries that the Excel file derives from are available at the following path: “N:/Newport News/CityManager/GunViolence/Data for Analysis/Data Documentation/SQL Queries”

The script will generate three .csv files:

gunCrimeIncidents_Routput.csv,
offenderLocations_Routput.csv’, and
victimInformation_Routput.csv

Following the successful execution of this script, refer to the ‘Script2_GeocodeRematching’ document for next steps.

Install Necessary Packages

Run the code chunk below to ensure that the packages required by this script are installed on your machine.

requiredPackages <- c("tidyverse", "openxlsx")
for (package in requiredPackages) {
  if (!package %in% installed.packages()) {
    install.packages(package)
  }
}

Required Manual User Input

Replace folder_name in the code chunk below with your desired new folder name where output will be stored (e.g., ‘SecondHalf2023’ or ‘2022to2023’). Be sure to keep the name in quotation marks. This folder does not need to exist already. The script will create the folder within the R_output directory.

Replace path/filename.xlsx with the complete path to the Excel workbook containing the new data.

IMPORTANT: Slashes must be written in path names as either a single forward slash (“path/file”) or two backslashes (“path\\file”)

newFolder <- "folder_name"  

# Use FORWARD slashes in file name, or else TWO backslashes
xlsx_file <- "path/filename.xlsx"

Once the variables above are defined, use the ‘Run All’ command (Ctrl + Alt + R) to execute the script. Alternatively, you can click the “Run” button in the top right corner of this pane and select ‘Run All’ from the drop-down list.

Pre-Processing Script (w/ Documentation)

Create Output Directory

# Define output directory path
output_dir <- paste0("N:/Newport News/CityManager/GunViolence/Data/Processed Data [R Output]/", newFolder)

if (!dir.exists(output_dir)){
  dir.create(output_dir)
}

Read NNPD Query Output .xlsx into R

# Load the required libraries
library(openxlsx)
library(tidyverse)

# Get the names of all sheets in the Excel file
sheet_names <- getSheetNames(xlsx_file)

# Read each sheet into a list of dataframes
data_frames <- setNames(
  lapply(sheet_names, function(sheet_name) {
    read.xlsx(xlsx_file, sheet = sheet_name)
  }),
  sheet_names
)

# Assign dataframes to variable names that conform to the naming conventions of this script
Incidents <- data_frames[[which(
  grepl("incident", names(data_frames), ignore.case = TRUE) & 
    !grepl("offen|victim", names(data_frames), ignore.case = TRUE))]]

Offenses <- data_frames[[which(grepl("offen", names(data_frames), ignore.case = TRUE))]]
  
Victims <- data_frames[[which(grepl("victim", names(data_frames), ignore.case = TRUE))]]

Data Manipulation

The three data sets have now been read into the R environment. These data will be used as inputs to produce a ‘master’ Newport News gun crime data set for the queried time period along with a separate ‘victim information’ data set. The victim information can be joined to a copy of the gun crime incident ‘master’ layer, thereby creating a spatial data set for gun crime victims with associated demographic data (age, sex, race).

Identify and Remove non-Newport News Locations

Because of mutual aid agreements with other municipalities, these data sets may contain information about crime incidents that occurred in other jurisdictions. The following code aims to identify and remove as many of these incidents as possible. It also removes incidents for which location information is ‘NA’ (i.e., cannot be mapped).

# Remove as many non-Newport News incidents as possible
  ## Create vector with strings indicating non-NN addresses
  outsideNN_values <- 
            c("VIRGINIA BEACH", "WILLIAMSBURG", "SMITHFIELD", 
              "YORKTOWN, VA", "HAMPTON, VA", "HP", "NORFOLK", 
              "SUFFOLK", "CHESAPEAKE, VA", "PORTSMOUTH", "POQUOSON", 
              "JAMES CITY COUNTY") 

  ## Initialize an empty vector to store incident numbers
    outsideNN_incNos <- c()
    
  ## Loop through each string and identify gun incident locations that contain the string
    for (string in outsideNN_values) {
      
      matching_incidents <- 
        Incidents[grepl(string, Incidents$ADDRESS), "INCIDENTNO"]
      
      outsideNN_incNos <- 
        c(outsideNN_incNos, matching_incidents)
    }

  ## Filter out these incidents from the main data set
    Incidents_NN <- Incidents %>%
    filter(!INCIDENTNO %in% outsideNN_incNos)
  
  ## Filter out incidents for which location information is unavailable
    Incidents_NN <- Incidents_NN %>%
      filter(!is.na(ADDRESS))

Remove Incidents with No Locational Data

Some small fraction of incidents will likely contain only “, NEWPORT NEWS, VA” as the ADDRESS value. These incidents cannot be mapped and should be removed from the data set. The code below will remove those incidents but preserve them in a separate .csv file.

Incidents_NN <- Incidents_NN %>%
  filter(ADDRESS != ", NEWPORT NEWS, VA")

incidents_noLocation <- Incidents_NN %>%
  filter(ADDRESS == ", NEWPORT NEWS, VA")


write_csv(incidents_noLocation, paste0(output_dir, "/removedIncidents_noLocation.csv"))

    paste(nrow(incidents_noLocation), "incidents in this data set were removed due to absence of location information.")

Remove Duplicate Incidents

This is a precaution. There was once an instance of a duplicate row in the original data that led to issues when trying to carry out a one-to-many join at a later stage.

Incidents_NN <- Incidents_NN %>%
  distinct(INCIDENTNO, .keep_all = TRUE)

Create `YEAR` Field

#`DATETIME` field MUST be converted from Excel's internal datetime representation into a more legible datetime format.
  ## The options() function is necessary here to ensure that *all* the digits corresponding to datetime are printed when the vector is passed to the `as.numeric()` function in next line
  options(digits=17) 

  ## Convert `DATETIME` to numeric vector before converting to actual datetime object. Properly formatted datetime is then coerced back into a character vector to ensure smooth entry into ArcGIS (Pro will automatically recognize it as a datetime field).
  Incidents_NN$DATETIME <- 
    as.character(
      convertToDateTime(as.numeric(Incidents_NN$DATETIME))
    )
  
      ### Format `DATETIME` field to eliminate trailing ":00" (seconds)
      Incidents_NN$DATETIME <- 
        gsub(":00$", "", Incidents_NN$DATETIME)

      
# Extract the year from the `DATETIME` field and use it to populate `YEAR` field.
    ## a. Identify and define the indices of incidents with an associated datetime. Those incidents without a datetime will be addressed in step c.
    has_time <- 
      grepl(":", Incidents_NN$DATETIME)

    ## b. Extract year from `DATETIME` field
      ### This line of code will likely produce a warning indicated some number 'failed to parse'. This can be safely ignored.
    Incidents_NN$YEAR <- 
      ifelse(has_time, 
             year(ymd_hm(Incidents_NN$DATETIME)), 
             year(ymd(Incidents_NN$DATETIME))
      )
            

    ## c. Incidents with missing DATETIME information can still have YEAR populated with a high degree of confidence based on the first four digits of the INCIDENTNO
    NA_indices <- 
      which(is.na(Incidents_NN$YEAR)
            )
    
    Incidents_NN[NA_indices, ]$YEAR <- 
      substr(as.character(Incidents_NN[NA_indices,]$INCIDENTNO), 1, 4)

Create `ALLOFFENSES` Field

Each unique incident from the ‘Incidents’ data set has a primary offense associated with it. Many incidents have multiple offenses associated with them. Using the OFFENSE field from the Incident Offenses data set it is possible to create a single field containing a string with all offenses for each incident.

# Group incidents by 'INCIDENTNO' and create a field all offenses concatenated
offConcat <- 
  Offenses %>%
    group_by(INCIDENTNO) %>%
    arrange(desc(ISPRIMARY)
            ) %>%   # This puts the row with "Y" for ISPRIMRAY first and ensures primary offense will be first in the concatenated string
    mutate(ALLOFFENSES = paste0(OFFENSE, collapse = "; ")
           ) %>%
    ungroup()

# Select for only inc # and concatenated offense col
offConcat <- 
  offConcat[, c("INCIDENTNO", "ALLOFFENSES")]


# Eliminate repeat rows to produce data frame with one row per incident
uniqInc_offConcat <- 
  distinct(offConcat, INCIDENTNO, .keep_all = TRUE)

Remove suicide-only incidents

Using the ALLOFFENSES field, it is now possible to identify ostensibly self-directed gun-violence incidents that did not involve another victim and filter them out.

# Identify incidents in which suicide was NOT accompanied by homicide, assault, or kidnapping
suicide_only <- uniqInc_offConcat %>%
  filter(
    grepl("SUICIDE", ALLOFFENSES) &
    !grepl("HOMICIDE|KIDNAP|ASSAULT", ALLOFFENSES)
  )

# Filter those incidents from data set
Incidents_NN <- Incidents_NN[!Incidents_NN$INCIDENTNO %in% suicide_only$INCIDENTNO, ]

Create `FIREARM` Field

Most incidents have a single weapon type associated with them, while others have several (e.g., firearm + knife, or firearm + hands). Given that our focus is on gun violence it seems extraneous to include non-firearm weapons, and doing so would complicate attempts at summary statistics (e.g., “HANDGUN; PERSONAL WEAPONS (HANDS, ETC.)” would be recognized as a distinct category and not be grouped with “HANDGUN”).

# Define the list of possible values
firearm_values <- c("HANDGUN", "FIREARM (TYPE NOT STATED)", "OTHER FIREARM", "SHOTGUN", "RIFLE")

# Create the new FIREARM field
Offenses$FIREARM <- 
  ifelse(Offenses$WEAPON %in% firearm_values, Offenses$WEAPON, NA)

offenses_narm <- 
  na.omit(Offenses)


# Group by INCIDENTNO and concatenate FIREARM values for each group
uniqueInc_firearmOffenses <- 
  offenses_narm %>%
    group_by(INCIDENTNO) %>%
    arrange(WEAPON) %>%
    summarize(FIREARM = paste(unique(FIREARM), collapse = ", ")
              )

Victim Totals by Incident

This next code chunk applies different filters to the ‘Incident Victims’ data set and then groups rows by incident number to arrive at the number of unique victimized individuals , physically injured, or killed. There is a “VICDECEASED” field, but it does not provide an accurate record of victims killed. That must be done by linking individual victims to the specific offense of ‘HOMICIDE/MURDER’. An individual who is a victim of more than one separate incident is treated as multiple victims.

vic_totals <- 
  Victims %>%
    filter(NAMETYPE == "INDIVIDUAL") %>%
    group_by(INCIDENTNO) %>%
    summarize(VICTOTAL = n_distinct(VICNO)  # n_distinct counts the number of distinct, or unique, values per group
              ) 


injured <- 
  Victims %>%
    filter(VICINJURED == "Y") %>%
    group_by(INCIDENTNO) %>%
    summarize(TOTALINJURED = n_distinct(VICNO)
              )


gun_deaths <- 
  Victims %>%
    filter(NAMETYPE == "INDIVIDUAL" & OFFENSE == "HOMICIDE/MURDER" ) %>%
    group_by(INCIDENTNO) %>%
    summarize(DEATHS = n_distinct(VICNO)
              )

Modify `ADDRESS` Field ( “/” -> “&”)

The Newport News geocoder fails to accurately identify some cross streets indicated by a “/” (e.g. “Warwick Ave / Main St”). This code replaces all forward slashes in the address field with an ampersand.

Incidents_NN$ADDRESS <- 
  gsub("/", "&", Incidents_NN$ADDRESS)

Create an `OFFENSECAT` Field

There are potentially 25+ unique primary offense types, many of which comprise only a tiny fraction of total incidents. To make it easier for users to filter incident based on offense, the offenses have been assigned to groups.

# Function to assign offenses to broad category groups for easier filtering in Insights/Dashboards
offenseType <- function(offenses) {
  result <- c()
  for (i in seq_along(offenses)) {
    if (offenses[i] %in% c("RAPE", "FORCIBLE SODOMY", "SEXUAL ASSAULT W/AN OBJECT")) {
      result[i] <- "Sexual Violence"
    } 
    else if (offenses[i] %in% c("AGGRAVATED ASSAULT", "ROBBERY", 
                                  "SIMPLE ASSAULT", "ROBBERY-INDIVIDUAL", "ROBBERY-BUSINESS", 
                                  "KIDNAP/ABDUCTION", "INTIMIDATION")) {
      result[i] <- "Violence Against Persons"
    } 
    else if (offenses[i] == "HOMICIDE/MURDER") {
      result[i] <- "Homicide/Murder"
    }
    else if (offenses[i] %in% c("BURGLARY/B & E", "DAMAGE PROPERTY", 
                                  "STOLEN PROPERTY OFFENSES", "VEHICLE THEFT", 
                                  "THEFT FROM VEHICLE", "THEFT FROM BUILDING", 
                                  "SHOPLIFTING", "LARCENY", "IDENTITY THEFT")) {
      result[i] <- "Property Crimes"
    } 
    else if (offenses[i] %in% c("DRUG EQUIPMENT - POSSESS", "DRUG/NARCOTICS OFFENSES")) {
      result[i] <- "Drug Crimes"
    } 
    else if (offenses[i] == "WEAPON OFFENSE") {
      result[i] <- "Weapon Offense"
    } 
    else {
      result[i] <- "Other Offenses"
    }
  }
  return(result)
}


# Pass `PRIMOFFENSE` to the above function and use output to populate a new field (OFFENSECAT) in the Incidents_NN df, then move it so that it sits next to the PRIMOFFENSE category (on which its value is based)
Incidents_NN$OFFENSECAT <- 
  offenseType(Incidents_NN$PRIMOFFENSE)

relocate(Incidents_NN, OFFENSECAT, .after = PRIMOFFENSE)

Create ‘Master’ Data Set of Gun Crime Incidents

# Join derived data fields to incidents data frame
MASTER_LIST <- 
  Incidents_NN %>%
    left_join(vic_totals, by = "INCIDENTNO") %>%  
    left_join(injured, by = "INCIDENTNO") %>% 
    left_join(gun_deaths, by = "INCIDENTNO") %>%
    left_join(uniqInc_offConcat, by = "INCIDENTNO") %>%
    left_join(uniqueInc_firearmOffenses, by = "INCIDENTNO")

# Convert NA values to 0 (i.e., wherever there were no 'individual' victims)
MASTER_LIST$VICTOTAL[is.na(MASTER_LIST$VICTOTAL)] <- 0
MASTER_LIST$TOTALINJURED[is.na(MASTER_LIST$TOTALINJURED)] <- 0
MASTER_LIST$DEATHS[is.na(MASTER_LIST$DEATHS)] <- 0


# Write out data frame as .csv file
write_csv(MASTER_LIST, paste0(output_dir, "/gunCrimeIncidents_Routput.csv"))

Create ‘Victim Information’ Data Set

# Filter out SOCIETY and BUSINESS victim types (Commonwealth of VA) 
vics <- 
  Victims[Victims$NAMETYPE == "INDIVIDUAL", ]
    
# Use distinct() to ensure that only one row for each unique combination of INCIDENTNO and VICNO is retained
unique_vics <- vics %>%
  distinct(INCIDENTNO, VICNO, .keep_all = TRUE) # .keep_all = TRUE ensures all cols are kept
    
# Select only victim-relevant cols
unique_vics <- 
  unique_vics[, c("INCIDENTNO", "RECORDNO", "VICNO",
                               "VICAGE", "VICSEX", "VICRACE", "VICINJURED",
                               "OFFENSE")]

# Turn VICINJURED into a binary numeric field to allow for dashboard statistics/summaries
unique_vics$VICINJURED_BINARY <- 
  as.numeric(ifelse(unique_vics$VICINJURED == "Y", 1, 0)
             )

# Create a VICDECEASED_BINARY field based on whether the victim is tied to a 'homicide/murder' offense
unique_vics$VICDECEASED_BINARY <- 
  as.numeric(ifelse(grepl("HOMICIDE", unique_vics$OFFENSE), 1, 0)
             )

# Create a VICDECEASD character field for more intuitive presentation of information in, e.g., pop-ups
unique_vics$VICDECEASED <- 
  ifelse(unique_vics$VICDECEASED_BINARY == 1, "Y", "N")

# Write out data frame as .csv file  
write_csv(unique_vics, paste0(output_dir, "/victimInformation_Routput.csv")
          )

Create ‘Offender Location’ Data Set

# Identify total unique offenders across all incidents. As with victimss, if offender appears in multiple incidents, each appearance is counted separately
distinctOffenders <- 
  Victims %>%
  distinct(INCIDENTNO, OFFENDNO, .keep_all = TRUE)

# Filter out rows with NULL value for OFFENDLOC
offenderloc_no.null <- 
  distinctOffenders[!grepl("NULL", distinctOffenders$OFFENDLOC), ]  

# Calculate a 'METAOFFENSE' field 
## -----------
# custum function to take offense as input and group into a 'meta-offense' category
calculate_metaoffense <- function(offense) {
  result <- character(length(offense))  # Initialize a character vector for the results
  
  for (i in 1:length(offense)) {
    if (offense[i] == "HOMICIDE/MURDER") {
      result[i] <- "Homicide/Murder"
    } else if (offense[i] == "DAMAGE PROPERTY") {
      result[i] <- "Property Damage"
    } else if (offense[i] %in% c("ROBBERY", "ROBBERY-BUSINESS", "ROBBERY-INDIVIDUAL")) {
      result[i] <- "Robbery"
    } else if (offense[i] == "INTIMIDATION") {
      result[i] <- "Intimidation"
    } else if (offense[i] %in% c("SIMPLE ASSAULT", "AGGRAVATED ASSAULT")) {
      result[i] <- "Assault"
    } else {
      result[i] <- "Other"
    }
  }
  return(result)
}
## -------------

# Apply custum meta-offense function
offenderloc_no.null$METAOFFENSE <- 
  calculate_metaoffense(offenderloc_no.null$OFFENSE)

# Remove offenders for whom no locational data is available (i.e., "UNKNOWN" or "HOMELESS")
offenderloc_final <-
  offenderloc_no.null[!grepl("HOMELESS", offenderloc_no.null$OFFENDLOC) & 
                      !grepl("UNKNOWN", offenderloc_no.null$OFFENDLOC), ]


# Write out data frame as .csv file  
write_csv(offenderloc_final, paste0(output_dir, "/offenderLocations_Routput.csv")
          )

# Write out data .csv with offenders missing locational information, for reference purposes
write_csv(offenderloc_no.null[grepl("HOMELESS", offenderloc_no.null$OFFENDLOC) | 
                              grepl("UNKNOWN", offenderloc_no.null$OFFENDLOC), ],
          paste0(output_dir, "/removedOffenders_noLocation.csv")
          )