v1.7.7; JStatSoft paper has been published
gagolews committed Jul 2, 2022
1 parent 1eb2194 commit 8580977
Showing 483 changed files with 4,981 additions and 9,114 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -38,3 +38,5 @@ kate-swp$
^Makefile
^stringi_.*\.tar\.gz$
^CODE_OF_CONDUCT
TODO
^CITATION\.cff$
2 changes: 1 addition & 1 deletion .github/workflows/r-icu-bundle.yml
@@ -29,4 +29,4 @@ jobs:
sudo make tinytest
- name: Check stringi
run: |
sudo make check
sudo make check-cran
6 changes: 0 additions & 6 deletions CODE_OF_CONDUCT.md
@@ -1,15 +1,9 @@
Code of Conduct
===============

Come in and make yourself at home!

This is a project conveyed in the authors' free time. It is their little
act of charity to make this world an (even) better place.
It will most likely pass unnoticed, but if you happen to find it useful,
informative, amusing, or stimulating, we're happy for you.

Please be civilised, well-mannered, and courteous. Primum non nocere.
Let us all strive to be better versions of ourselves, exercise forgiveness
and generosity, and assume good faith in others.

We are looking forward to your contributions and ideas.
9 changes: 6 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: stringi
Version: 1.7.6.9002
Date: 2022-06-28
Title: Character String Processing Facilities
Version: 1.7.7
Date: 2022-07-02
Title: Fast and Portable Character String Processing Facilities
Description: A collection of character string/text/natural language
processing tools for pattern searching (e.g., with 'Java'-like regular
expressions or the 'Unicode' collation algorithm), random string generation,
@@ -10,6 +10,9 @@ Description: A collection of character string/text/natural language
and many more. They are fast, consistent, convenient, and -
thanks to 'ICU' (International Components for Unicode) -
portable across all locales and platforms.
Documentation about 'stringi' is provided
via its website at <https://stringi.gagolewski.com/> and
the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
URL:
https://stringi.gagolewski.com/,
https://github.com/gagolews/stringi,
30 changes: 15 additions & 15 deletions INSTALL
@@ -14,10 +14,10 @@ install.packages("stringi")

However, due to the overwhelming complexity of the ICU4C library,
upon which *stringi* is based, and the colourful diversity of operating systems,
their flavors, and particular setups, some users may still experience
their flavours, and particular setups, some users may still experience
a few issues that hopefully can be resolved with the help of this short manual.

Also, some additional build tweaks are possible in case we require a more
Also, some additional build tweaks are possible if we require a more
customised installation.


@@ -32,25 +32,25 @@ If we install the package from sources and either:
the `libicu-devel` rpm on Fedora/CentOS/OpenSUSE,
`libicu-dev` on Ubuntu/Debian, etc.),

* `pkg-config` is fails to find appropriate build settings
* `pkg-config` fails to find appropriate build settings
for ICU-based projects, or

* `R CMD INSTALL` is called with the `--configure-args='--disable-pkg-config'`
argument or environment variable `STRINGI_DISABLE_PKG_CONFIG` is
argument, or environment variable `STRINGI_DISABLE_PKG_CONFIG` is
set to non-zero or
`install.packages("stringi", configure.args="--disable-pkg-config")`
is executed,

then ICU will be built together with stringi.
A custom subset of ICU4C 69.1 is shipped with the package.
We also include ICU4C 55.1 that can be used as a fallback version
We also include ICU4C 55.1 which can be used as a fallback version
(e.g., on older Solaris boxes).


> To get the most out of stringi, you are strongly encouraged to rely on our
> ICU4C package bundle. This ensures maximum portability across all platforms
> (Windows and macOS users by default fetch the pre-compiled binaries
> from CRAN built exactly this way).
> from CRAN built precisely this way).



@@ -59,7 +59,7 @@ We also include ICU4C 55.1 that can be used as a fallback version
Note that if you choose to use our ICU4C bundle, then -- by default -- the
ICU data library will be downloaded from one of our mirror servers.
However, if you have already downloaded a version of `icudt*.zip` suitable
for your platform (big/little endian), you may wish to install the
for your platform (big/little-endian), you may wish to install the
package by calling:

```r
@@ -115,8 +115,8 @@ amongst others, `<R_inst_dir>/etc/Makeconf` (e.g., are you using
<https://cran.r-project.org/doc/manuals/r-release/R-admin.html>
for more details.

There is an option of using the fallback version of ICU4C 55.1
which however requires the support of the `long long` type in a few functions,
There is an option of using the fallback version of ICU4C 55.1.
However, it requires the support of the `long long` type in a few functions,
(this is not part of the C++98 standard; works on Solaris, though). Try:

```r
@@ -155,23 +155,23 @@ Some influential environment variables:
path relative to `<package source dir>/src`; defaults to `icuXX/data`.

* `PKG_CONFIG_PATH`: An optional list of directories to search for
`pkg-config`s `.pc` files.
`pkg-config`'s `.pc` files.

* `R_HOME`: Override the R directory, e.g.,
`/usr/lib64/R`. Note that `$R_HOME/bin/R` point to the R executable.

* `CAT`: The `cat` command used to generate the list of source files to compile.

* `PKG_CONFIG`:The `pkg-config` command used to fetch the necessary compiler
flags to link to and existing `libicu` installation.
flags to link to the existing `libicu` installation.

* `STRINGI_DISABLE_CXX11`: Disable C++11,
* `STRINGI_DISABLE_CXX11`: Disable C++11;
see also `--disable-cxx11`.

* `STRINGI_DISABLE_PKG_CONFIG`: Compile ICU from sources,
* `STRINGI_DISABLE_PKG_CONFIG`: Compile ICU from sources;
see also `--disable-pkg-config`.

* `STRINGI_DISABLE_ICU_BUNDLE`: Enforce system ICU,
* `STRINGI_DISABLE_ICU_BUNDLE`: Enforce system ICU;
see also `--disable-icu-bundle`.

* `STRINGI_CFLAGS`: see `--with-extra-cflags`.
@@ -191,7 +191,7 @@ Some influential environment variables:

We expect that with a correctly configured C++11 compiler and properly
installed system ICU4C distribution, you should face no problems
with installing the package, especially if you use our ICU4C bundle and you
installing the package, especially if you use our ICU4C bundle and
have a working internet access.

If you do not manage to set up a successful stringi build, do not
34 changes: 4 additions & 30 deletions NEWS
@@ -1,47 +1,21 @@
# What Is New in *stringi*


## 1.7.6.9xxx (under development)
## 1.7.7 (2022-07-02)

* [DOCUMENTATION] ...Paper on *stringi* has been published in
the *Journal of Statistical Software*....
* [DOCUMENTATION] Paper on *stringi* has been published in
the *Journal of Statistical Software*, see <doi:10.18637/jss.v103.i02>.

* [BUGFIX] #473, #397: Fixed buffer overflow in `stri_dup`.
`stri_dup`, `stri_paste`, ... fail more graciously on attempts to
generate strings of length >= 2^31 each.

* [BUGFIX] #480: Using `Rf_isNull` instead of isNull`.
* [BUILD TIME] #480: Using `Rf_isNull` instead of `isNull`.

* [DOCUMENTATION] #462: That the `numeric=TRUE` collator
does not handle negative numbers correctly is now mentioned in the manual.


... checkRd: (-1) stri_trans_nf.Rd:74: Escaped LaTeX specials: \#
... checkRd: (-1) stri_trans_nf.Rd:92: Escaped LaTeX specials: \#
... icu69/common/cstring.h:43:70: warning: 'char* strncpy(char*, const char*, size_t)' output may be truncated copying 156 bytes from a string of length 156 [-Wstringop-truncation]



* [NEW FEATURE] TODO.... #469: `stri_datetime_parse` .. new argument -
`default_time`
a Calendar set on input to the date and time to be used for missing values in the date/time string being parsed

* [BUGFIX] TODO.... #469: `stri_datetime_parse` did not reset the `Calendar` object
when parsing multiple dates.

* [NEW FEATURE] TODO... #476 U_USING_DEFAULT_ERROR on unknown locales

* [NEW FEATURE] TODO... #81 number format

* [NEW FEATURE] TODO... #477 sprintf localised number format

* [NEW FEATURE] TODO... #471: split into overlapping or non-overlapping chunks,
possibly of different lengths





## 1.7.6 (2021-11-29)

* [BUILD TIME] #463: Added loongarch support in ICU's double conversion
14 changes: 9 additions & 5 deletions R/stringi-package.R
@@ -1,7 +1,7 @@
# kate: default-dictionary en_US

## This file is part of the 'stringi' package for R.
## Copyright (c) 2013-2021, Marek Gagolewski <https://www.gagolewski.com>
## Copyright (c) 2013-2022, Marek Gagolewski <https://www.gagolewski.com>
## All rights reserved.
##
## Redistribution and use in source and binary forms, with or without
@@ -31,7 +31,7 @@
## EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


#' @title THE String Processing Package
#' @title Fast and Portable Character String Processing in R
#'
#' @description
#' \pkg{stringi} is THE R package for fast, correct, consistent,
@@ -115,7 +115,7 @@
#' i.e., conversion to lower, UPPER, or Title Case,
#' \code{\link{stri_trans_nfc}} (among others) for Unicode normalization,
#' \code{\link{stri_trans_char}} for translating individual code points,
#' and \code{\link{stri_trans_general}} for other universal yet powerful
#' and \code{\link{stri_trans_general}} for other universal
#' text transforms, including transliteration.
#'
#' \item \code{\link{stri_cmp}}, \code{\link{\%s<\%}}, \code{\link{stri_order}},
@@ -150,9 +150,13 @@
#' ICU4C was developed by IBM, Unicode, Inc., and others.
#'
#' @references
#' \emph{\pkg{stringi} Package homepage},
#' \emph{\pkg{stringi} Package Homepage},
#' \url{https://stringi.gagolewski.com/}
#'
#' Gagolewski M., \pkg{stringi}: Fast and portable character string
#' processing in R, \emph{Journal of Statistical Software} 103(2), 2022, 1-59,
#' doi:\url{https://dx.doi.org/10.18637/jss.v103.i02}
#'
#' \emph{ICU -- International Components for Unicode},
#' \url{https://icu.unicode.org/}
#'
@@ -162,7 +166,7 @@
#' \emph{The Unicode Consortium},
#' \url{https://home.unicode.org/}
#'
#' \emph{UTF-8, a transformation format of ISO 10646} -- RFC 3629,
#' \emph{UTF-8, A Transformation Format of ISO 10646} -- RFC 3629,
#' \url{https://tools.ietf.org/html/rfc3629}
#'
#' @family stringi_general_topics
6 changes: 3 additions & 3 deletions R/trans_normalization.R
@@ -1,7 +1,7 @@
# kate: default-dictionary en_US

## This file is part of the 'stringi' package for R.
## Copyright (c) 2013-2021, Marek Gagolewski <https://www.gagolewski.com>
## Copyright (c) 2013-2022, Marek Gagolewski <https://www.gagolewski.com>
## All rights reserved.
##
## Redistribution and use in source and binary forms, with or without
@@ -63,7 +63,7 @@
#' character sequences in document formats on the Web.
#' Thus, you will rather not use these functions in typical
#' string processing activities. Most often you may assume
#' that a string is in NFC, see RFC\#5198.
#' that a string is in NFC, see RFC5198.
#'
#' As usual in \pkg{stringi},
#' if the input character vector is in the native encoding,
@@ -84,7 +84,7 @@
#' \url{https://unicode.org/reports/tr15/}
#'
#' \emph{Unicode Format for Network Interchange}
#' -- RFC\#5198, \url{https://tools.ietf.org/rfc/rfc5198.txt}
#' -- RFC5198, \url{https://tools.ietf.org/rfc/rfc5198.txt}
#'
#' \emph{Character Model for the World Wide Web 1.0: Normalization}
#' -- W3C Working Draft, \url{https://www.w3.org/TR/charmod-norm/}
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# [**stringi**](https://stringi.gagolewski.com/)

### THE String Processing Package for *R*
### Fast and Portable Character String Processing in R (with the Unicode ICU)

![Build Status](https://github.com/gagolews/stringi/workflows/stringi%20for%20R/badge.svg)
![RStudio CRAN mirror downloads](http://cranlogs.r-pkg.org/badges/grand-total/stringi)
26 changes: 13 additions & 13 deletions configure
@@ -1376,18 +1376,18 @@ Optional Packages:
--with-PACKAGE[=ARG] use PACKAGE [ARG=yes]
--without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no)
--with-extra-cflags=FLAGS
Additional C compiler flags, see also the
Additional C compiler flags; see also the
STRINGI_CFLAGS environment variable
--with-extra-cppflags=FLAGS
Additional C/C++ preprocessor flags, see also the
Additional C/C++ preprocessor flags; see also the
STRINGI_CPPFLAGS environment variable
--with-extra-cxxflags=FLAGS
Additional C++ compiler flags, see also the
Additional C++ compiler flags; see also the
STRINGI_CXXFLAGS environment variable
--with-extra-ldflags=FLAGS
Additional linker flags, see also the
Additional linker flags; see also the
STRINGI_LDFLAGS environment variable
--with-extra-libs=FLAGS Additional libraries to link against, see also the
--with-extra-libs=FLAGS Additional libraries to link against; see also the
STRINGI_LIBS environment variable
Some influential environment variables:
@@ -1413,22 +1413,22 @@ Some influential environment variables:
LDFLAGS Purposely ignored.
LIBS Purposely ignored.
STRINGI_DISABLE_CXX11
Disable C++11, see also --disable-cxx11.
Disable C++11; see also --disable-cxx11.
STRINGI_DISABLE_ICU_BUNDLE
Enforce system ICU, see also --disable-icu-bundle.
Enforce system ICU; see also --disable-icu-bundle.
STRINGI_DISABLE_PKG_CONFIG
Enforce our ICU source bundle, see also --disable-pkg-config.
Enforce our ICU source bundle; see also --disable-pkg-config.
STRINGI_CFLAGS
Additional C compiler flags, see also --with-extra-cflags.
Additional C compiler flags; see also --with-extra-cflags.
STRINGI_CPPFLAGS
Additional C/C++ preprocessor flags, see also
Additional C/C++ preprocessor flags; see also
--with-extra-cppflags.
STRINGI_CXXFLAGS
Additional C++ compiler flags, see also --with-extra-cxxflags.
Additional C++ compiler flags; see also --with-extra-cxxflags.
STRINGI_LDFLAGS
Additional linker flags, see also --with-extra-ldflags.
Additional linker flags; see also --with-extra-ldflags.
STRINGI_LIBS
Additional libraries to link against, see also
Additional libraries to link against; see also
--with-extra-libs.
Use these variables to override the choices made by `configure' or to help