This file provides documentation and scripting for pre-processing Newport News Police Department RMS data for use in ArcGIS Pro.
The NN GIS team should receive query outputs in a single Excel Workbook file containing three separate tabs
Tab 1: Incidents,
Tab 2: Offenses, and
Tab 3: Incident Victims.
The SQL queries that the Excel file derives from are available at the following path: “N:/Newport News/CityManager/GunViolence/Data for Analysis/Data Documentation/SQL Queries”
The script will generate three .csv files:
gunCrimeIncidents_Routput.csv,
offenderLocations_Routput.csv’, and
victimInformation_Routput.csv
Following the successful execution of this script, refer to the ‘Script2_GeocodeRematching’ document for next steps.
Run the code chunk below to ensure that the packages required by this script are installed on your machine.
requiredPackages <- c("tidyverse", "openxlsx")
for (package in requiredPackages) {
if (!package %in% installed.packages()) {
install.packages(package)
}
}
Replace folder_name
in the code chunk below with your
desired new folder name where output will be stored (e.g.,
‘SecondHalf2023’ or ‘2022to2023’). Be sure to keep the
name in quotation marks. This folder does not need to exist already. The
script will create the folder within the R_output
directory.
Replace path/filename.xlsx
with the complete path to the
Excel workbook containing the new data.
IMPORTANT: Slashes must be written in path names as either a single forward slash (“path/file”) or two backslashes (“path\\file”)
newFolder <- "folder_name"
# Use FORWARD slashes in file name, or else TWO backslashes
xlsx_file <- "path/filename.xlsx"
Once the variables above are defined, use the ‘Run All’ command (Ctrl + Alt + R) to execute the script. Alternatively, you can click the “Run” button in the top right corner of this pane and select ‘Run All’ from the drop-down list.
# Define output directory path
output_dir <- paste0("N:/Newport News/CityManager/GunViolence/Data/Processed Data [R Output]/", newFolder)
if (!dir.exists(output_dir)){
dir.create(output_dir)
}
# Load the required libraries
library(openxlsx)
library(tidyverse)
# Get the names of all sheets in the Excel file
sheet_names <- getSheetNames(xlsx_file)
# Read each sheet into a list of dataframes
data_frames <- setNames(
lapply(sheet_names, function(sheet_name) {
read.xlsx(xlsx_file, sheet = sheet_name)
}),
sheet_names
)
# Assign dataframes to variable names that conform to the naming conventions of this script
Incidents <- data_frames[[which(
grepl("incident", names(data_frames), ignore.case = TRUE) &
!grepl("offen|victim", names(data_frames), ignore.case = TRUE))]]
Offenses <- data_frames[[which(grepl("offen", names(data_frames), ignore.case = TRUE))]]
Victims <- data_frames[[which(grepl("victim", names(data_frames), ignore.case = TRUE))]]
The three data sets have now been read into the R environment. These data will be used as inputs to produce a ‘master’ Newport News gun crime data set for the queried time period along with a separate ‘victim information’ data set. The victim information can be joined to a copy of the gun crime incident ‘master’ layer, thereby creating a spatial data set for gun crime victims with associated demographic data (age, sex, race).
Because of mutual aid agreements with other municipalities, these data sets may contain information about crime incidents that occurred in other jurisdictions. The following code aims to identify and remove as many of these incidents as possible. It also removes incidents for which location information is ‘NA’ (i.e., cannot be mapped).
# Remove as many non-Newport News incidents as possible
## Create vector with strings indicating non-NN addresses
outsideNN_values <-
c("VIRGINIA BEACH", "WILLIAMSBURG", "SMITHFIELD",
"YORKTOWN, VA", "HAMPTON, VA", "HP", "NORFOLK",
"SUFFOLK", "CHESAPEAKE, VA", "PORTSMOUTH", "POQUOSON",
"JAMES CITY COUNTY")
## Initialize an empty vector to store incident numbers
outsideNN_incNos <- c()
## Loop through each string and identify gun incident locations that contain the string
for (string in outsideNN_values) {
matching_incidents <-
Incidents[grepl(string, Incidents$ADDRESS), "INCIDENTNO"]
outsideNN_incNos <-
c(outsideNN_incNos, matching_incidents)
}
## Filter out these incidents from the main data set
Incidents_NN <- Incidents %>%
filter(!INCIDENTNO %in% outsideNN_incNos)
## Filter out incidents for which location information is unavailable
Incidents_NN <- Incidents_NN %>%
filter(!is.na(ADDRESS))
Some small fraction of incidents will likely contain only “, NEWPORT
NEWS, VA” as the ADDRESS
value. These incidents cannot be
mapped and should be removed from the data set. The code below will
remove those incidents but preserve them in a separate .csv
file.
Incidents_NN <- Incidents_NN %>%
filter(ADDRESS != ", NEWPORT NEWS, VA")
incidents_noLocation <- Incidents_NN %>%
filter(ADDRESS == ", NEWPORT NEWS, VA")
write_csv(incidents_noLocation, paste0(output_dir, "/removedIncidents_noLocation.csv"))
paste(nrow(incidents_noLocation), "incidents in this data set were removed due to absence of location information.")
This is a precaution. There was once an instance of a duplicate row in the original data that led to issues when trying to carry out a one-to-many join at a later stage.
Incidents_NN <- Incidents_NN %>%
distinct(INCIDENTNO, .keep_all = TRUE)
YEAR
Field#`DATETIME` field MUST be converted from Excel's internal datetime representation into a more legible datetime format.
## The options() function is necessary here to ensure that *all* the digits corresponding to datetime are printed when the vector is passed to the `as.numeric()` function in next line
options(digits=17)
## Convert `DATETIME` to numeric vector before converting to actual datetime object. Properly formatted datetime is then coerced back into a character vector to ensure smooth entry into ArcGIS (Pro will automatically recognize it as a datetime field).
Incidents_NN$DATETIME <-
as.character(
convertToDateTime(as.numeric(Incidents_NN$DATETIME))
)
### Format `DATETIME` field to eliminate trailing ":00" (seconds)
Incidents_NN$DATETIME <-
gsub(":00$", "", Incidents_NN$DATETIME)
# Extract the year from the `DATETIME` field and use it to populate `YEAR` field.
## a. Identify and define the indices of incidents with an associated datetime. Those incidents without a datetime will be addressed in step c.
has_time <-
grepl(":", Incidents_NN$DATETIME)
## b. Extract year from `DATETIME` field
### This line of code will likely produce a warning indicated some number 'failed to parse'. This can be safely ignored.
Incidents_NN$YEAR <-
ifelse(has_time,
year(ymd_hm(Incidents_NN$DATETIME)),
year(ymd(Incidents_NN$DATETIME))
)
## c. Incidents with missing DATETIME information can still have YEAR populated with a high degree of confidence based on the first four digits of the INCIDENTNO
NA_indices <-
which(is.na(Incidents_NN$YEAR)
)
Incidents_NN[NA_indices, ]$YEAR <-
substr(as.character(Incidents_NN[NA_indices,]$INCIDENTNO), 1, 4)
ALLOFFENSES
FieldEach unique incident from the ‘Incidents’ data set has a primary
offense associated with it. Many incidents have multiple offenses
associated with them. Using the OFFENSE
field from the
Incident Offenses data set it is possible to create a single field
containing a string with all offenses for each incident.
# Group incidents by 'INCIDENTNO' and create a field all offenses concatenated
offConcat <-
Offenses %>%
group_by(INCIDENTNO) %>%
arrange(desc(ISPRIMARY)
) %>% # This puts the row with "Y" for ISPRIMRAY first and ensures primary offense will be first in the concatenated string
mutate(ALLOFFENSES = paste0(OFFENSE, collapse = "; ")
) %>%
ungroup()
# Select for only inc # and concatenated offense col
offConcat <-
offConcat[, c("INCIDENTNO", "ALLOFFENSES")]
# Eliminate repeat rows to produce data frame with one row per incident
uniqInc_offConcat <-
distinct(offConcat, INCIDENTNO, .keep_all = TRUE)
Using the ALLOFFENSES
field, it is now possible to
identify ostensibly self-directed gun-violence incidents that did not
involve another victim and filter them out.
# Identify incidents in which suicide was NOT accompanied by homicide, assault, or kidnapping
suicide_only <- uniqInc_offConcat %>%
filter(
grepl("SUICIDE", ALLOFFENSES) &
!grepl("HOMICIDE|KIDNAP|ASSAULT", ALLOFFENSES)
)
# Filter those incidents from data set
Incidents_NN <- Incidents_NN[!Incidents_NN$INCIDENTNO %in% suicide_only$INCIDENTNO, ]
FIREARM
FieldMost incidents have a single weapon type associated with them, while others have several (e.g., firearm + knife, or firearm + hands). Given that our focus is on gun violence it seems extraneous to include non-firearm weapons, and doing so would complicate attempts at summary statistics (e.g., “HANDGUN; PERSONAL WEAPONS (HANDS, ETC.)” would be recognized as a distinct category and not be grouped with “HANDGUN”).
# Define the list of possible values
firearm_values <- c("HANDGUN", "FIREARM (TYPE NOT STATED)", "OTHER FIREARM", "SHOTGUN", "RIFLE")
# Create the new FIREARM field
Offenses$FIREARM <-
ifelse(Offenses$WEAPON %in% firearm_values, Offenses$WEAPON, NA)
offenses_narm <-
na.omit(Offenses)
# Group by INCIDENTNO and concatenate FIREARM values for each group
uniqueInc_firearmOffenses <-
offenses_narm %>%
group_by(INCIDENTNO) %>%
arrange(WEAPON) %>%
summarize(FIREARM = paste(unique(FIREARM), collapse = ", ")
)
This next code chunk applies different filters to the ‘Incident Victims’ data set and then groups rows by incident number to arrive at the number of unique victimized individuals , physically injured, or killed. There is a “VICDECEASED” field, but it does not provide an accurate record of victims killed. That must be done by linking individual victims to the specific offense of ‘HOMICIDE/MURDER’. An individual who is a victim of more than one separate incident is treated as multiple victims.
vic_totals <-
Victims %>%
filter(NAMETYPE == "INDIVIDUAL") %>%
group_by(INCIDENTNO) %>%
summarize(VICTOTAL = n_distinct(VICNO) # n_distinct counts the number of distinct, or unique, values per group
)
injured <-
Victims %>%
filter(VICINJURED == "Y") %>%
group_by(INCIDENTNO) %>%
summarize(TOTALINJURED = n_distinct(VICNO)
)
gun_deaths <-
Victims %>%
filter(NAMETYPE == "INDIVIDUAL" & OFFENSE == "HOMICIDE/MURDER" ) %>%
group_by(INCIDENTNO) %>%
summarize(DEATHS = n_distinct(VICNO)
)
ADDRESS
Field ( “/” -> “&”)The Newport News geocoder fails to accurately identify some cross streets indicated by a “/” (e.g. “Warwick Ave / Main St”). This code replaces all forward slashes in the address field with an ampersand.
Incidents_NN$ADDRESS <-
gsub("/", "&", Incidents_NN$ADDRESS)
OFFENSECAT
FieldThere are potentially 25+ unique primary offense types, many of which comprise only a tiny fraction of total incidents. To make it easier for users to filter incident based on offense, the offenses have been assigned to groups.
# Function to assign offenses to broad category groups for easier filtering in Insights/Dashboards
offenseType <- function(offenses) {
result <- c()
for (i in seq_along(offenses)) {
if (offenses[i] %in% c("RAPE", "FORCIBLE SODOMY", "SEXUAL ASSAULT W/AN OBJECT")) {
result[i] <- "Sexual Violence"
}
else if (offenses[i] %in% c("AGGRAVATED ASSAULT", "ROBBERY",
"SIMPLE ASSAULT", "ROBBERY-INDIVIDUAL", "ROBBERY-BUSINESS",
"KIDNAP/ABDUCTION", "INTIMIDATION")) {
result[i] <- "Violence Against Persons"
}
else if (offenses[i] == "HOMICIDE/MURDER") {
result[i] <- "Homicide/Murder"
}
else if (offenses[i] %in% c("BURGLARY/B & E", "DAMAGE PROPERTY",
"STOLEN PROPERTY OFFENSES", "VEHICLE THEFT",
"THEFT FROM VEHICLE", "THEFT FROM BUILDING",
"SHOPLIFTING", "LARCENY", "IDENTITY THEFT")) {
result[i] <- "Property Crimes"
}
else if (offenses[i] %in% c("DRUG EQUIPMENT - POSSESS", "DRUG/NARCOTICS OFFENSES")) {
result[i] <- "Drug Crimes"
}
else if (offenses[i] == "WEAPON OFFENSE") {
result[i] <- "Weapon Offense"
}
else {
result[i] <- "Other Offenses"
}
}
return(result)
}
# Pass `PRIMOFFENSE` to the above function and use output to populate a new field (OFFENSECAT) in the Incidents_NN df, then move it so that it sits next to the PRIMOFFENSE category (on which its value is based)
Incidents_NN$OFFENSECAT <-
offenseType(Incidents_NN$PRIMOFFENSE)
relocate(Incidents_NN, OFFENSECAT, .after = PRIMOFFENSE)
# Join derived data fields to incidents data frame
MASTER_LIST <-
Incidents_NN %>%
left_join(vic_totals, by = "INCIDENTNO") %>%
left_join(injured, by = "INCIDENTNO") %>%
left_join(gun_deaths, by = "INCIDENTNO") %>%
left_join(uniqInc_offConcat, by = "INCIDENTNO") %>%
left_join(uniqueInc_firearmOffenses, by = "INCIDENTNO")
# Convert NA values to 0 (i.e., wherever there were no 'individual' victims)
MASTER_LIST$VICTOTAL[is.na(MASTER_LIST$VICTOTAL)] <- 0
MASTER_LIST$TOTALINJURED[is.na(MASTER_LIST$TOTALINJURED)] <- 0
MASTER_LIST$DEATHS[is.na(MASTER_LIST$DEATHS)] <- 0
# Write out data frame as .csv file
write_csv(MASTER_LIST, paste0(output_dir, "/gunCrimeIncidents_Routput.csv"))
# Filter out SOCIETY and BUSINESS victim types (Commonwealth of VA)
vics <-
Victims[Victims$NAMETYPE == "INDIVIDUAL", ]
# Use distinct() to ensure that only one row for each unique combination of INCIDENTNO and VICNO is retained
unique_vics <- vics %>%
distinct(INCIDENTNO, VICNO, .keep_all = TRUE) # .keep_all = TRUE ensures all cols are kept
# Select only victim-relevant cols
unique_vics <-
unique_vics[, c("INCIDENTNO", "RECORDNO", "VICNO",
"VICAGE", "VICSEX", "VICRACE", "VICINJURED",
"OFFENSE")]
# Turn VICINJURED into a binary numeric field to allow for dashboard statistics/summaries
unique_vics$VICINJURED_BINARY <-
as.numeric(ifelse(unique_vics$VICINJURED == "Y", 1, 0)
)
# Create a VICDECEASED_BINARY field based on whether the victim is tied to a 'homicide/murder' offense
unique_vics$VICDECEASED_BINARY <-
as.numeric(ifelse(grepl("HOMICIDE", unique_vics$OFFENSE), 1, 0)
)
# Create a VICDECEASD character field for more intuitive presentation of information in, e.g., pop-ups
unique_vics$VICDECEASED <-
ifelse(unique_vics$VICDECEASED_BINARY == 1, "Y", "N")
# Write out data frame as .csv file
write_csv(unique_vics, paste0(output_dir, "/victimInformation_Routput.csv")
)
# Identify total unique offenders across all incidents. As with victimss, if offender appears in multiple incidents, each appearance is counted separately
distinctOffenders <-
Victims %>%
distinct(INCIDENTNO, OFFENDNO, .keep_all = TRUE)
# Filter out rows with NULL value for OFFENDLOC
offenderloc_no.null <-
distinctOffenders[!grepl("NULL", distinctOffenders$OFFENDLOC), ]
# Calculate a 'METAOFFENSE' field
## -----------
# custum function to take offense as input and group into a 'meta-offense' category
calculate_metaoffense <- function(offense) {
result <- character(length(offense)) # Initialize a character vector for the results
for (i in 1:length(offense)) {
if (offense[i] == "HOMICIDE/MURDER") {
result[i] <- "Homicide/Murder"
} else if (offense[i] == "DAMAGE PROPERTY") {
result[i] <- "Property Damage"
} else if (offense[i] %in% c("ROBBERY", "ROBBERY-BUSINESS", "ROBBERY-INDIVIDUAL")) {
result[i] <- "Robbery"
} else if (offense[i] == "INTIMIDATION") {
result[i] <- "Intimidation"
} else if (offense[i] %in% c("SIMPLE ASSAULT", "AGGRAVATED ASSAULT")) {
result[i] <- "Assault"
} else {
result[i] <- "Other"
}
}
return(result)
}
## -------------
# Apply custum meta-offense function
offenderloc_no.null$METAOFFENSE <-
calculate_metaoffense(offenderloc_no.null$OFFENSE)
# Remove offenders for whom no locational data is available (i.e., "UNKNOWN" or "HOMELESS")
offenderloc_final <-
offenderloc_no.null[!grepl("HOMELESS", offenderloc_no.null$OFFENDLOC) &
!grepl("UNKNOWN", offenderloc_no.null$OFFENDLOC), ]
# Write out data frame as .csv file
write_csv(offenderloc_final, paste0(output_dir, "/offenderLocations_Routput.csv")
)
# Write out data .csv with offenders missing locational information, for reference purposes
write_csv(offenderloc_no.null[grepl("HOMELESS", offenderloc_no.null$OFFENDLOC) |
grepl("UNKNOWN", offenderloc_no.null$OFFENDLOC), ],
paste0(output_dir, "/removedOffenders_noLocation.csv")
)