---
title: Exceptions and debugging
layout: default
---
# Debugging, condition handling and defensive programming
What happens when something goes wrong? This chapter will teach you how to fix unanticipated problems (debugging), show you how functions can communicate expected problems to their users and how you can take action based on that communication (condition handling), and teach you how to avoid some problems before they occur (defensive programming).
Debugging is the art and science of fixing unexpected problems in your code. In this section you'll learn tools and techniques that help you get to the root cause of an error when you encounter it. You'll learn both general strategies for debugging and R- and RStudio-specific tools like `traceback()` and `browser()`.
Not all problems are unexpected. When writing a function, you can often anticipate potential problems (like a file not existing, or the wrong type of input). Communicating these problems back to the user is the job of __conditions__, which include errors, warnings and messages:
* Fatal errors are raised by `stop()` and force all execution to stop.
Errors are used when there is no way for a function to continue.
* Warnings are generated by `warning()` and are used to display potential
problems, or when some elements of a vectorised input are invalid,
for example `log(-1:2)` and `sqrt(-1:2)`.
* Messages are generated by `message()` and are used to give informative output
in a way that can easily be suppressed by the user
(with `suppressMessages()`). I often use messages when filling in
important missing arguments that have a non-trivial impact on the function.
Conditions are usually printed in bold or coloured red (depending on your R interface). You can tell them apart because errors always start with "Error" and warnings with "Warning message". By default, warnings are held until the top-level call finishes and are then displayed together; you can see the most recent batch again with `warnings()`. Function authors can also communicate with their users with `print()` or `cat()`, but I don't recommend it because it's hard to capture and selectively ignore this sort of output (and it's not a condition, so you can't use any of the useful condition handling tools).
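As a minimal sketch, here is a hypothetical function, `safe_sqrt()` (not part of base R), that uses all three kinds of condition: an error for input it cannot handle at all, a warning for invalid elements of a vectorised input, and a message that callers can silence with `suppressMessages()`.

```{r}
# Hypothetical example function, not from base R
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric")                  # fatal: nothing sensible to return
  }
  if (any(x < 0, na.rm = TRUE)) {
    warning("negative values replaced with NA")  # recoverable problem
    x[x < 0] <- NA
  }
  message("Taking square roots of ", length(x), " values")  # informative only
  sqrt(x)
}

# The error is caught here only so that the chunk keeps running
tryCatch(safe_sqrt("a"), error = function(e) conditionMessage(e))
safe_sqrt(c(4, -1))
suppressMessages(safe_sqrt(c(9, 16)))
```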
Condition handling tools, like `try()`, `tryCatch()` and `withCallingHandlers()`, allow you to take specific actions when a condition occurs. For example, if you're fitting many models, you might want to continue fitting the rest even if one fails with an error because it doesn't converge. R offers an exceptionally powerful condition handling system based on ideas from Common Lisp, but it's currently not very well documented or often used. This chapter will introduce you to the most important basics, but if you want to learn more, I recommend the following two primary sources:
* [A prototype of a condition system for R](http://homepage.stat.uiowa.edu/~luke/R/exceptions/simpcond.html) by Robert Gentleman and Luke Tierney. This describes an early version of R's condition system. The implementation has changed somewhat since this was written, but it provides a good overview of how the pieces fit together, and some motivation for the design.
* [Beyond Exception Handling: Conditions and Restarts](http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html) by Peter Seibel. This describes exception handling in Lisp, but the ideas are very similar in R, and it provides useful motivation and examples. I have provided an R translation of the chapter at [beyond-exception-handling.html](beyond-exception-handling.html).
Finally, you can avoid many errors in the first place by programming "defensively". You'll spend more time upfront writing your code, but you'll save time in the long run by reducing errors and providing more informative error messages. The basic principle is to "fail fast", raising an error as soon as you know there's something wrong, rather than trying to silently struggle through. In R, this has three particular applications: checking that inputs are correct, avoiding non-standard evaluation, and avoiding functions that can return different types of output.
## Debugging techniques
> Finding your bug is a process of confirming the many things
> that you believe are true --- until you find one which is not
> true. \
> --- Norm Matloff
Debugging code is challenging. R provides some useful tools which we'll discuss in the next section, but if you have a good technique, you can still productively debug a problem with just `print()`. There are four key components to the debugging process:
1. __Realise that you have a bug__
If you're reading this chapter, you've probably already completed this step.
But this is a surprisingly important step: you can't fix a bug until you're
aware of it. This is one reason why automated test suites are so important
when producing high-quality code. Automated testing is unfortunately
outside the scope of this book, but you can read some notes about it at
http://adv-r.had.co.nz/Testing.html.
2. __Make it repeatable__
Once you've determined you have a bug, you need to be able to recreate it
on command. This can be the most frustrating part of debugging, but if you
can't consistently recreate the bug, then it's extremely difficult to
isolate why it's occurring, and it's impossible to confirm that you've
fixed it.
Generally, you will start with a big block of code that you know causes the
error and then slowly whittle it down to get to the smallest possible
snippet that still causes the error. If it takes a long time to generate
the bug, it's also worth figuring out how to make it faster. It may help
to use a caching strategy to save incremental results (but be
careful that you don't create new bugs by doing that).
As you work on creating a minimal example, you'll also discover similar
inputs that don't cause the bug. Make a note of those: they will be
helpful when diagnosing the cause of the bug.
If you're using automated testing, this is a good time to create an
automated test case. If your existing test coverage is low, take the
opportunity to add some nearby tests to reduce your chances of creating
a new bug.
3. __Figure out where it is.__
If you're lucky, one of the tools in the following section will allow you
to quickly navigate to the line of code that's causing the bug. Usually,
however, you'll probably have to think a bit more about the problem. Two
generally useful techniques are binary search and the scientific process.
To do a binary search, you repeatedly remove half of the code. The bug
will either appear or not, but either way you've reduced the amount of
code to look through by half. This allows you to quickly narrow down the
problem even if you have a lot of code.
If binary search doesn't work, adopt the scientific process. Generate
hypotheses, design experiments to test them and then record your results.
This does seem like more work, but a systematic approach will end up
saving you time in the long run because each step you take will move you
towards a solution. You can generate initial hypotheses by comparing the
inputs that cause the bug with those that don't.
4. __Fix it and test it.__
Once you've found the bug, you need to figure out how to fix it, and then
check that it actually worked. Again, it's very useful to have
automated tests so that you can ensure that you've actually fixed the bug,
and you haven't created any new bugs in the process.
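The caching idea from step 2 can be sketched as follows. `cache_rds()` is a hypothetical helper, not a base R function: it relies on R's lazy evaluation, so the expensive expression is only evaluated when no saved result exists at `path`. Delete the cache file whenever the code that produces it changes, or you may end up debugging stale results.

```{r}
# Hypothetical helper: evaluate `expr` once and reuse the saved result afterwards
cache_rds <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))   # `expr` is never evaluated on this branch
  }
  value <- expr             # forces the lazily evaluated argument
  saveRDS(value, path)
  value
}

path <- tempfile(fileext = ".rds")
first <- cache_rds(path, sum(1:10))                  # computed and saved
second <- cache_rds(path, stop("not re-evaluated"))  # read back from disk
```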
In my experience, it doesn't matter so much exactly what your process is, just that you have one. I often end up wasting too much time trying to rely on my intuition when I would have been better off taking a systematic approach.
## Debugging tools
As well as a broad strategy to follow when debugging code, you also need some concrete tools. In this section you'll learn tools provided both by R and the RStudio IDE. RStudio's integrated debugging support makes life easier, but it mostly exposes existing R tools in a user-friendly way. I'll show you both the RStudio way and the regular R way so that you can work with whatever environment you have. You may also want to refer to the official [RStudio debugging documentation](http://www.rstudio.com/ide/docs/debugging/overview), which will always reflect the functionality in the latest version of RStudio.
There are three key debugging tools:
* Determining the sequence of calls that led to the error with the RStudio
error inspector or `traceback()`.
* Entering an interactive session where an error occurred with RStudio's "Rerun
with Debug" or `recover()`.
* Entering an interactive session in arbitrary code with RStudio's breakpoints
or `browser()`.
I'll explain each tool in more detail below.
Note that you shouldn't need to use these tools when writing new functions. If you find yourself using them frequently with new code, you may want to reconsider your approach: it's much easier to start simple and test interactively as you go, rather than writing something big and complicated and then trying to figure out exactly where the problem is.
### Determining the sequence of calls
The most important tool to start with is the traceback, the sequence of calls that led up to an error. Here's a simple example: you can see that `f()` calls `g()` calls `h()` calls `i()`, which adds together a number and a string, creating an error:
```{r, eval = FALSE}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)
```
When we run this code in RStudio we see:
![Initial traceback display](traceback-hidden.png)
If you click "Show traceback" you see:
![Traceback display after clicking "show traceback"](traceback-shown.png)
If you're not using RStudio, you can use the `traceback()` function to get the equivalent information:
```{r, eval = FALSE}
traceback()
# 4: i(c) at error.R#3
# 3: h(b) at error.R#2
# 2: g(a) at error.R#1
# 1: f(10)
```
You read the call stack from bottom to top: the initial call is `f()`, which eventually calls `i()`, which triggers the error. If you're calling code that you `source()`d into R, the traceback will also display the location of the function, in the form `filename.r#linenumber`. These are clickable in RStudio, and will take you to the corresponding line of code in the editor.
Sometimes this is enough information to let you track down the error and fix it. However, it's usually not enough: it shows you where the error occurred, but not why. The next useful tool is the interactive debugger, which allows you to pause execution of a function and interactively explore its state.
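One reason the traceback is needed at all: the error condition itself records only the innermost call. The sketch below captures the error from the `f()`/`g()`/`h()`/`i()` example with `tryCatch()` (covered later in this chapter) and inspects it with `conditionCall()` and `conditionMessage()`; nothing in the condition says how execution reached `i()`.

```{r}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d

# Capture the error object instead of letting it stop execution
err <- tryCatch(f(10), error = function(e) e)
conditionCall(err)     # only the innermost call is recorded
conditionMessage(err)
```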
### Browsing on error
The easiest way to enter the interactive debugger is through RStudio's "Rerun with Debug" tool. This reruns the command that created the error, pausing execution where the error occurred. You're now in an interactive state just like the regular R console, but you're inside the function, and can interact with any object defined there. You'll see the objects in the current environment in the Environment pane, the call stack in a new Traceback pane, and you can run arbitrary R code in the console to figure out what went wrong.
As well as any regular R function, there are a few special commands you can use in debug mode. You can access them either with the RStudio toolbar (![](debug-toolbar.png)) or with the keyboard:
* Next, `n`: executes the next step in the function. Be careful if you have a
variable named `n`; to print it you'll need to do `print(n)`.
* Continue, `c`: leaves interactive debugging and continues regular execution
of the function. This is useful if you've fixed the bad state and want to
check that the function proceeds correctly.
* Stop, `Q`: stops debugging, terminates the function and returns to the global
workspace.
There are two other useful commands:
* Enter: repeats the previous command. I find this too easy to activate while
debugging, so I turn it off using `options(browserNLdisabled = TRUE)`.
* `where`: prints a stack trace of active calls (the interactive equivalent of
`traceback`)
To enter this style of debugging outside of RStudio, you can use the `error` option, which specifies a function to run when an error occurs. The function most similar to RStudio's debug is `browser()`: this will start an interactive console in the environment where the error occurred. Use `options(error = browser)` to turn it on, re-run the previous command, then use `options(error = NULL)` to return to the default error behaviour. You could automate this with the `browseOnce()` function defined below:
```{r, eval = FALSE}
browseOnce <- function() {
  old <- getOption("error")
  function() {
    options(error = old)
    browser()
  }
}
options(error = browseOnce())
f <- function() stop("!")
# Enters browser
f()
# Runs normally
f()
```
There are two other useful functions that you can use with the `error` option:
* `recover` is a step up from `browser`, as it allows you to enter the
environment of any of the calls in the call stack. This is useful because
often the cause of the error is a number of calls back.
* `dump.frames` is an equivalent to `recover` for non-interactive code. It
creates a `last.dump.rda` file in the current working directory that you
can load into a later interactive R session and explore with `debugger()`,
just as if you had called `recover` when the error occurred. This allows
interactive debugging of batch code.
```{r, eval = FALSE}
# In batch R process
dump_and_quit <- function() {
  # Save debugging info to file last.dump.rda
  dump.frames(to.file = TRUE)
  # Quit R with error status
  q(status = 1)
}
options(error = dump_and_quit)
# Then in an interactive R session:
load("last.dump.rda")
debugger()
```
Finally, to reset error behaviour to the default, use `options(error = NULL)`. Then errors will print a message and abort function execution.
### Browsing arbitrary code
As well as entering an interactive console on error, you can enter it at an arbitrary location in your code by using either an RStudio breakpoint or `browser()`. You can set a breakpoint in RStudio by clicking to the left of the line number or pressing `Shift + F9`; equivalently, add `browser()` where you want execution to pause. Breakpoints behave similarly to `browser()` but they are easier to set (one click instead of nine key presses), and you don't run the risk of accidentally including a `browser()` statement in your source code. There are a few situations where breakpoints are not equivalent to `browser()`: read [breakpoint troubleshooting](http://www.rstudio.com/ide/docs/debugging/breakpoint-troubleshooting) for more details. One downside of breakpoints is that you can't set them conditionally, whereas you can always put `browser()` inside an `if` statement.
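For example, this hypothetical function only pauses when a division produces a non-finite value, which a plain breakpoint can't express. The call that would actually trigger the browser is left as a comment so the chunk can run non-interactively.

```{r}
# Hypothetical function: enter the browser only on the problematic iteration
elementwise_ratio <- function(num, den) {
  out <- numeric(length(num))
  for (i in seq_along(num)) {
    out[i] <- num[i] / den[i]
    if (!is.finite(out[i])) browser()  # a conditional "breakpoint"
  }
  out
}

elementwise_ratio(c(2, 6), c(1, 3))
# elementwise_ratio(1:3, c(1, 2, 0)) would pause when i is 3
```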
As well as adding `browser()` yourself, there are two functions that will add it to code:
* `debug()` inserts a browser statement in the first line of the specified
function. `undebug()` will remove it, or you can use `debugonce()` to browse
only on the next run.
* `utils::setBreakpoint()` works similarly, but instead of taking a function
name, it takes a file name and line number and finds the appropriate function
for you.
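A small sketch of the flag that `debug()` sets: `isdebugged()` reports whether a function will start the browser the next time it is called. The chunk only toggles the flag, because actually calling a debugged function would open an interactive session; `add_one()` is just an illustrative function.

```{r}
add_one <- function(x) x + 1

debug(add_one)        # the next call to add_one() would open the browser
isdebugged(add_one)
undebug(add_one)      # back to normal
isdebugged(add_one)
add_one(1)
```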
These two functions are both special cases of `trace()`, which inserts arbitrary code at any position in an existing function. `trace()` is occasionally useful when you're debugging code that you don't have the source for. To remove tracing from a function, use `untrace()`. Also note that you can only perform one trace per function.
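As an illustration, the sketch below uses `trace()` to report an intermediate value from a hypothetical function, `scale01()`, without editing its source. The `at` argument picks the step of the body (numbered as in `as.list(body(scale01))`) before which the tracer expression runs, and `print = FALSE` suppresses the default "Tracing ..." line.

```{r}
# Hypothetical function we want to inspect without changing its source
scale01 <- function(x) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Step 1 is `{`, step 2 assigns rng, step 3 is the final line,
# so at = 3 runs the tracer after rng has been computed
trace(scale01, tracer = quote(message("range: ", rng[1], " to ", rng[2])),
      at = 3, print = FALSE)
scale01(c(1, 5, 9))
untrace(scale01)      # restore the original function
```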
### The call stack: `traceback()`, `where` and `recover()`
Unfortunately the call stacks printed by `traceback()`, `browser()` + `where` and `recover()` are not consistent. For a simple set of nested calls, where `f()` calls `g()` calls `h()`, which calls `stop()`, the call stacks look like the following table. Note that numbering is different between `traceback()` and `where`, and that `recover()` displays calls in the opposite order and omits the call to `stop()`.

`traceback()`      `where`                   `recover()`
----------------   -----------------------   ------------
4: stop("Error")   where 1: stop("Error")    1: f()
3: h(x)            where 2: h(x)             2: g(x)
2: g(x)            where 3: g(x)             3: h(x)
1: f()             where 4: f()

RStudio displays calls in the same order as `traceback()` but omits the numbers.
```{r, eval = FALSE, echo = FALSE}
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) stop("Error")
f(); traceback()
options(error = browser); f()
options(error = recover); f()
options(error = NULL)
```
### Other types of failure
There are other ways for a function to fail apart from throwing an error or returning an incorrect result.
* A function may generate an unexpected warning. The easiest way to track down
warnings is to convert them into errors with `options(warn = 2)`. Then you can
use the regular debugging tools. When you do this you'll see some extra calls
in the call stack, such as `doWithOneRestart()`, `withOneRestart()`,
`withRestarts()` and `.signalSimpleWarning()`. Ignore these: they are
internal functions used to turn warnings into errors.
* A function may generate an unexpected message. There's no built-in tool to
help with this, as there is for warnings, but it's easy to create one (you'll
learn how this function works in the next section):
```{r, error = TRUE}
message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}
f <- function() g()
g <- function() message("Hi!")
g()
message2error(g())
traceback()
```
As with warnings, you'll need to ignore some of the calls on the traceback
(i.e. the first two and the last seven).
* A function might never return. This is particularly hard to debug
automatically, but sometimes terminating the function and looking at the
call stack is informative. Otherwise, use the basic debugging strategies
described above.
* The worst scenario is that your code might crash R completely, leaving you
no way to interactively debug your code. This typically indicates a bug in
underlying C code, and the tools are much harder to use. Sometimes an
interactive debugger, like `gdb`, can be useful, but describing how to use
one is beyond the scope of this book. If it's in base R code, posting a
reproducible example to R-help is a good idea. If it's in a package, contact
the maintainer. If it's your own C or C++ code, you'll need to use
numerous `print()` statements to narrow down the location of the bug, and
then you'll need to use many more print statements to figure out which
data structure doesn't have the properties that you expect.
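A minimal sketch of the first tip: with `options(warn = 2)`, the coercion warning from `as.integer("x")` becomes an error, which `tryCatch()` can catch (and which the debugging tools above can stop on). Saving and restoring the previous options keeps the change from leaking into the rest of the session.

```{r}
old_opts <- options(warn = 2)   # options() returns the previous values
res <- tryCatch(as.integer("x"),
  error = function(e) paste("caught:", conditionMessage(e)))
options(old_opts)               # restore the previous warning level
res
```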
## Error handling
Unexpected errors require interactive debugging to figure out what went wrong. Some errors, however, are expected, and you want to handle them automatically. In R, expected errors crop up most frequently when you're fitting many models to different datasets or bootstrap replicates. Sometimes the model might fail to fit and throw an error, but you don't want to stop everything; instead you want to fit as many models as possible and then perform diagnostics after the fact. In R, there are two tools for handling exceptions programmatically: `try()` (simple) and `tryCatch()` (complex).
### Basic error handling with try()
`try()` allows execution to continue even after an error has occurred. For example, normally if you run a function that throws an error, it terminates immediately and doesn't return a value:
```{r, error = TRUE}
f1 <- function(x) {
  log(x)
  10
}
f1("x")
```
However, if you wrap the statement that creates the error in `try()`, the error message will be printed but execution will continue:
```{r}
f2 <- function(x) {
  try(log(x))
  10
}
f2("x")
```
You can suppress the message with `try(..., silent = TRUE)`. To pass larger blocks of code to `try()`, wrap them in `{}`:
```{r}
try({
  a <- 1
  b <- "x"
  a + b
})
a
b
```
You can also capture the output of the `try()` function. If successful, it will be the last result evaluated in the block (just like a function); if unsuccessful it will be an (invisible) object of class "try-error":
```{r}
success <- try(1 + 2)
failure <- try("a" + "b")
str(success)
str(failure)
```
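The "try-error" object also carries the original condition in its `"condition"` attribute, so you can recover the error message (or anything else stored in the condition) without parsing printed text:

```{r}
failure <- try(stop("Something went wrong"), silent = TRUE)
inherits(failure, "try-error")
cnd <- attr(failure, "condition")   # the original error condition object
conditionMessage(cnd)
```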
`try()` is particularly useful when you're applying a function to multiple elements in a list:
```{r, error = TRUE}
elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters)
results <- lapply(elements, log)
results <- lapply(elements, function(x) try(log(x)))
```
There isn't a built-in function for testing for this class, so we'll define one. Then you can easily find the locations of errors with `sapply()` (as discussed in the Functions chapter), and extract the successes or look at the inputs that led to failures.
```{r}
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
# look at successful results
str(results[succeeded])
# look at inputs that failed
str(elements[!succeeded])
```
Another useful `try()` idiom is setting a default value if an expression fails. Simply assign the default value outside the try block, and then run the risky code:
```{r, eval = FALSE}
default <- NULL
try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE)
```
The function operators chapter discusses the `failwith()` function operator which makes this pattern particularly useful.
### Advanced error handling with `tryCatch()`
`tryCatch()` is more powerful than `try()`, because as well as dealing with errors, it also allows you to take specific actions for messages, warnings and interrupts. You've seen messages (made by `message()`) and warnings (made by `warning()`) before, but interrupts are new. They can't be generated directly by the programmer, but are raised when the user attempts to terminate execution by pressing Ctrl + Break, Escape, or Ctrl + C (depending on the platform). `tryCatch()` also provides the `finally` hook to run code regardless of whether or not an error occurred.
`tryCatch()` has three arguments:
* `expr`: the code to run.
* `...`: a set of named functions. If a condition is raised, `tryCatch` will
call the first handler whose name matches one of the classes of the condition.
The only useful names for built-in conditions are `error`, `warning`,
`message`, `interrupt` and `condition`. Handler functions are passed a single
object, representing the condition that was raised.
* `finally`: code to run regardless of whether `expr` succeeds or fails. This
is useful for clean-up, such as closing connections. All handlers have been turned
off by the time the `finally` code is run, so errors will propagate as
usual. (Note that this is functionally equivalent to using `on.exit()`
but it can wrap smaller chunks of code than an entire function).
The following examples illustrate the basic properties of `tryCatch`:
```{r, error = TRUE}
# If multiple handlers match, the first is used
tryCatch(stop("error"),
  error = function(c) "a",
  error = function(c) "b"
)
# If tryCatch() calls are nested, the innermost handler is used first.
tryCatch(
  tryCatch(stop("error"), error = function(c) "a"),
  error = function(c) "b"
)
# Uncaught conditions propagate outwards.
tryCatch(
  tryCatch(stop("error")),
  error = function(c) "b"
)
# No matter what happens, finally is run:
tryCatch(stop("error"),
  finally = print("Done."))
tryCatch(a <- 1,
  finally = print("Done."))
# Any errors that occur in the finally block are handled normally
a <- 1
tryCatch(a <- 2, finally = stop("Error!"))
```
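As a sketch of using `finally` for clean-up, the chunk below reads from a connection and closes it whether or not the read succeeds; the temporary file exists only for the example.

```{r}
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

con <- file(path, open = "r")
first_line <- tryCatch(
  readLines(con, n = 1),
  finally = close(con)   # runs after success or failure
)
first_line
```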
Catching interrupts can be useful if you want to take special action when the user tries to abort running code.
```{r, eval = FALSE}
# Don't let the user interrupt the code
i <- 1
while(i < 3) {
  tryCatch({
    Sys.sleep(0.5)
    message("Try to escape")
  }, interrupt = function(x) {
    message("Try again!")
    i <<- i + 1
  })
}
```
A handler function can do anything, but typically it will either return a value, or pass the condition along. For example, we can write a simple version of `try` using `tryCatch`:
```{r}
try2 <- function(code, silent = FALSE) {
  tryCatch(code, error = function(c) {
    msg <- conditionMessage(c)
    if (!silent) message("Error: ", msg)
    invisible(structure(msg, class = "try-error"))
  })
}
try2(1)
try2(stop("Hi"))
try2(stop("Hi"), silent = TRUE)
```
The real version of `try` is considerably more complicated, so that the error message looks more like what you'd see if `tryCatch()` weren't used.
### Custom condition classes
One of the downsides of most functions in R is that they just call `stop()` with a string. That means if you want to figure out if a particular error occurred, you have to look at the text of the error message. This is error prone, not only because the text of the error might change over time, but also because many error messages are translated, so the message might be completely different to what you expect.
However, R has a little-known and little-used feature that alleviates this problem. Conditions are S3 classes, so you can define your own for specific types of errors. Each of `stop()`, `warning()` and `message()` can be given either one or more strings or a custom S3 condition object. Custom condition objects are not used very often, but are very useful because they make it possible for the user to respond to different errors in different ways. For example, "expected" errors (like a model failing to converge for some input datasets) can be silently ignored, while unexpected errors (like no disk space available) can be propagated to the user.
R doesn't come with a built-in constructor function for conditions, but we can easily add one. Conditions must contain `message` and `call` components, and can contain anything else that might be useful to a handler. When creating a new condition, it should always inherit from `condition` and one of `error`, `warning` and `message`.
```{r}
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    # Extra fields supplied through ... are stored alongside message and call
    list(message = message, call = call, ...)
  )
}
is.condition <- function(x) inherits(x, "condition")
```
You can signal an arbitrary condition with `signalCondition()`, but nothing will happen unless you've established a handler for it. Instead, use `stop()`, `warning()` or `message()` as appropriate to trigger the usual handling. (Note that R won't complain if the class of your condition doesn't match the function, but you should avoid this in real code.)
```{r, error = TRUE}
c <- condition(c("my_error", "error"), message = "This is an error")
signalCondition(c)
stop(c)
warning(c)
message(c)
```
Note that when using `tryCatch()` with multiple handlers and custom classes, the first handler to match any class in the hierarchy is called, not necessarily the best match. For this reason, you need to make sure to put the most specific handlers first:
```{r}
tryCatch(stop(c),
  error = function(c) "error",
  my_error = function(c) "my_error"
)
tryCatch(stop(c),
  my_error = function(c) "my_error",
  error = function(c) "error"
)
```
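Putting the pieces together, here is a sketch of the "expected" versus "unexpected" distinction. `fit_one()` is a hypothetical stand-in for a model-fitting function that signals a custom `convergence_error` (carrying the id of the offending dataset) for the expected failure. The constructor is repeated so the chunk is self-contained; in this version, extra fields passed through `...` become elements of the condition list, so a handler can read them with `$`.

```{r}
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call, ...)
  )
}

# Hypothetical model fit: even ids "fail to converge", which we expect
fit_one <- function(id) {
  if (id %% 2 == 0) {
    stop(condition(c("convergence_error", "error"),
      message = paste0("dataset ", id, " did not converge"),
      id = id
    ))
  }
  id * 10
}

# Expected failures are recorded as NULL; any other error is not caught here
fits <- lapply(1:4, function(id) {
  tryCatch(fit_one(id), convergence_error = function(e) {
    message("Skipping dataset ", e$id)
    NULL
  })
})

# A different error (here, a non-numeric id) falls through to the generic handler
unexpected <- tryCatch(fit_one("a"),
  convergence_error = function(e) "convergence problem",
  error = function(e) "some other error"
)
```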
There is one other way to capture conditions: `withCallingHandlers()`. There are two main differences between `tryCatch()` and `withCallingHandlers()`:
* The default behaviour of `tryCatch()` handlers is to handle the condition and
return a value, whereas the return value of `withCallingHandlers()` handlers is
ignored by default, and the condition carries on propagating:
```{r, error = TRUE}
f <- function() stop("!")
tryCatch(f(), error = function(e) 1)
withCallingHandlers(f(), error = function(e) 1)
```
* The handlers in `withCallingHandlers()` are called in the context of the
call that generated the condition; the handlers in `tryCatch()` are called
in the context of `tryCatch()`:
```{r, eval = FALSE}
f <- function() g()
g <- function() h()
h <- function() stop("!")
tryCatch(f(), error = function(e) print(sys.calls()))
# [[1]] tryCatch(f(), error = function(e) print(sys.calls()))
# [[2]] tryCatchList(expr, classes, parentenv, handlers)
# [[3]] tryCatchOne(expr, names, parentenv, handlers[[1L]])
# [[4]] value[[3L]](cond)
withCallingHandlers(f(), error = function(e) print(sys.calls()))
# [[1]] withCallingHandlers(f(), error = function(e) print(sys.calls()))
# [[2]] f()
# [[3]] g()
# [[4]] h()
# [[5]] stop("!")
# [[6]] .handleSimpleError(function (e) print(sys.calls()), "!", quote(h()))
# [[7]] h(simpleError(msg, call))
```
This also affects the order in which `on.exit()` is called.
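A small sketch of that difference: the hypothetical `record()` helper appends to a log, so we can see whether `f()`'s exit handler or the condition handler runs first.

```{r}
events <- character()
record <- function(x) events <<- c(events, x)

f <- function() {
  on.exit(record("on.exit"))
  stop("!")
}

# tryCatch(): the stack is unwound first, so on.exit() runs before the handler
events <- character()
tryCatch(f(), error = function(e) record("handler"))
trycatch_order <- events

# withCallingHandlers(): the handler runs while f() is still active; the error
# then keeps propagating, so try() is used to stop it here
events <- character()
try(withCallingHandlers(f(), error = function(e) record("handler")), silent = TRUE)
calling_order <- events

trycatch_order
calling_order
```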
These subtle differences are rarely useful, except when you're trying to capture exactly what went wrong and pass it on to another function. For most purposes, you should never need to use `withCallingHandlers()`.
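When you do need it, a common pattern is to observe a condition and then let the code carry on. In this sketch, `collect_messages()` (a hypothetical helper) records each message and then invokes the `"muffleMessage"` restart that `message()` provides, so the message isn't printed and execution resumes right after the `message()` call. A `tryCatch()` handler couldn't do this, because by the time it runs the code has already been exited.

```{r}
collect_messages <- function(code) {
  msgs <- character()
  value <- withCallingHandlers(code, message = function(m) {
    msgs <<- c(msgs, conditionMessage(m))
    invokeRestart("muffleMessage")  # don't print it, and keep going
  })
  list(value = value, messages = msgs)
}

out <- collect_messages({
  message("step one")
  message("step two")
  42
})
out$value
out$messages   # conditionMessage() keeps the trailing newline
```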
### Exercises
* Compare the following two implementations of `message2error()`. What is the
main advantage of `withCallingHandlers()` in this scenario? (Hint: look
carefully at the traceback.)
```{r}
message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}
message2error <- function(code) {
  tryCatch(code, message = function(e) stop(e))
}
```
## Defensive programming
Defensive programming is the art of making code fail in a well-defined manner even when something unexpected occurs. A general principle for errors is to "fail fast": as soon as you figure out that something is wrong, and your inputs are not as expected, you should raise an error. This is more work for you as the function author, but it will make it easier for the user to debug because they get errors early on, not after unexpected input has passed through several functions and caused a problem.
This principle has three main applications in R:
* Be strict about what you accept. If your function is not vectorised in its
inputs, but uses functions that are, make sure to check that the inputs are
scalars. You can use `stopifnot()`, the
[assertthat](https://github.com/hadley/assertthat) package or simple `if`
statements and `stop()`.
* Avoid functions that use non-standard evaluation (e.g. `subset`, `with`,
`transform`). These functions make assumptions to reduce typing, but when
those assumptions are not met, they will often fail with uninformative error
messages.
* Avoid functions that return different types of output depending on their
input. The two biggest offenders are `[` and `sapply`. Whenever you subset
a data frame in a function, you should always use `drop = FALSE`,
otherwise you will accidentally convert 1-column data frames into vectors.
Similarly, never use `sapply()` inside a function: always use the stricter
`vapply()`, which will throw an error if the results are of the wrong type
and returns the correct type of output even for zero-length inputs.
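Both offenders can be seen directly. A minimal sketch using a made-up data frame:

```{r}
people <- data.frame(name = c("Ann", "Bob"), age = c(31, 45))

# Selecting a single column with [ silently simplifies to a vector...
class(people[, "age"])
# ...unless you ask for drop = FALSE
class(people[, "age", drop = FALSE])

# sapply() returns an empty list for zero-length input...
sapply(list(), is.numeric)
# ...while vapply() always returns the type you declared
vapply(list(), is.numeric, logical(1))

# vapply() also errors if a result doesn't match the declared type
tryCatch(vapply(list(1, "a"), function(x) x, numeric(1)),
  error = function(e) "type mismatch caught")
```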
There is a tension between interactive analysis and programming. When you're doing an analysis, you want R to do what you mean, and if it guesses wrong, then you'll discover it right away and can fix it. When you're programming, you want robust functions with no magic that give you errors as quickly as possible. It's useful to keep this tension in mind when writing functions. If you're making a function to facilitate interactive data analysis, it's free to guess what the analyst wants or recover from minor misspecifications; but if you're making a function to program with, it should be quite strict with its inputs.
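For example, a function meant for programming might check up front that it has received exactly what it expects. `add_days()` is a hypothetical function; it uses `stopifnot()` for quick checks and an explicit `stop()` where a more informative message helps.

```{r}
add_days <- function(date, n) {
  stopifnot(inherits(date, "Date"), length(date) == 1)
  if (!is.numeric(n) || length(n) != 1 || is.na(n)) {
    stop("`n` must be a single, non-missing number", call. = FALSE)
  }
  date + n
}

add_days(as.Date("2020-01-01"), 10)
tryCatch(add_days(as.Date("2020-01-01"), 1:2), error = conditionMessage)
```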
### Exercises
* The goal of the `col_means()` function defined below is to compute the means
of all numeric columns in a data frame.
```{r}
col_means <- function(df) {
  numeric <- sapply(df, is.numeric)
  numeric_cols <- df[, numeric]
  data.frame(lapply(numeric_cols, mean))
}
```
However, as written, the function is not robust to unusual inputs. Look at
the following results, decide which ones are incorrect, and modify `col_means`
to be more robust. (Hint: there are two function calls in `col_means` that
are particularly prone to problems.)
```{r, eval = FALSE}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = FALSE])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```
* The following function "lags" a vector, returning a version of `x` that is `n`
values behind the original. Improve the function so that (1) it returns a
useful error message if `n` is not a vector, (2) it has reasonable behaviour
when `n` is 0 or longer than `x`.
```{r}
lag <- function(x, n = 1L) {
  xlen <- length(x)
  c(rep(NA, n), x[seq_len(xlen - n)])
}
```