Merge pull request #720 from sfu-db/docs/change_clean_doc_intro
Change the introduction part of clean documentation
jinglinpeng authored Oct 26, 2021
2 parents 7e2a5a0 + 862b447 commit 42bf188
Showing 1 changed file with 135 additions and 2 deletions.
137 changes: 135 additions & 2 deletions docs/source/user_guide/clean/introduction.ipynb
@@ -41,8 +41,141 @@
" * [Text](clean_text.ipynb)\n",
" * [URLs](clean_url.ipynb)\n",
" * [US Street Addresses](clean_address.ipynb)\n",
" * [Whole DataFrame](clean_df.ipynb)\n"
" * [Whole DataFrame](clean_df.ipynb)\n",
" * [Australian Business Numbers](clean_au_abn.ipynb)\n",
" * [Australian Company Numbers](clean_au_acn.ipynb)\n",
" * [Australian Tax File Numbers](clean_au_tfn.ipynb)\n",
" * [Belgian VAT Numbers](clean_be_vat.ipynb)\n",
" * [Bulgarian National Identification Numbers](clean_bg_egn.ipynb)\n",
" * [Bulgarian VAT Numbers](clean_bg_vat.ipynb)\n",
" * [Belarusian UNP Numbers](clean_by_unp.ipynb)\n",
" * [Canadian Business Numbers](clean_ca_bn.ipynb)\n",
" * [Swiss Einzahlungsschein MIT Referenznummers](clean_ch_esr.ipynb)\n",
" * [Swiss Social Security Numbers](clean_ch_ssn.ipynb)\n",
" * [Swiss Business Identifiers](clean_ch_uid.ipynb)\n",
" * [Swiss VAT Numbers](clean_ch_vat.ipynb)\n",
" * [Chile RUT/RUN Numbers](clean_cl_rut.ipynb)\n",
" * [Chinese Resident Identity Card Numbers](clean_cn_ric.ipynb)\n",
" * [Colombian Identity Codes](clean_co_nit.ipynb)\n",
" * [Costa Rica Physical Person ID Numbers](clean_cr_cpf.ipynb)\n",
" * [Costa Rica Tax Numbers](clean_cr_cpj.ipynb)\n",
" * [Costa Rica Foreigners ID Numbers](clean_cr_cr.ipynb)\n",
" * [Cuban Identity Card Numbers](clean_cu_ni.ipynb)\n",
" * [Cypriot VAT Numbers](clean_cy_vat.ipynb)\n",
" * [Czech VAT Numbers](clean_cz_dic.ipynb)\n",
" * [Czech Birth Numbers](clean_cz_rc.ipynb)\n",
" * [German Company Registry IDs](clean_de_handelsregisternummer.ipynb)\n",
" * [German Personal Tax Numbers](clean_de_idnr.ipynb)\n",
" * [German Tax Numbers](clean_de_stnr.ipynb)\n",
" * [German VAT Numbers](clean_de_vat.ipynb)\n",
" * [German Securities Identification Codes](clean_de_wkn.ipynb)\n",
" * [Danish Citizen Numbers](clean_dk_cpr.ipynb)\n",
" * [Danish CVR Numbers](clean_dk_cvr.ipynb)\n",
" * [Dominican Republic National Identifiers](clean_do_cedula.ipynb)\n",
" * [Dominican Republic Invoice Numbers](clean_do_ncf.ipynb)\n",
" * [Dominican Republic Tax Registrations](clean_do_rnc.ipynb)\n",
" * [Ecuadorian Personal Identity Codes](clean_ec_ci.ipynb)\n",
" * [Ecuadorian Company Tax Numbers](clean_ec_ruc.ipynb)\n",
" * [Estonian Personcal ID Numbers](clean_ee_ik.ipynb)\n",
" * [Estonian KMKR Numbers](clean_ee_kmkr.ipynb)\n",
" * [Spanish Bank Account Codes](clean_es_ccc.ipynb)\n",
" * [Spanish Fiscal Numbers](clean_es_cif.ipynb)\n",
" * [Spanish Meter Point Numbers](clean_es_cups.ipynb)\n",
" * [Spanish Personal Identity Codes](clean_es_dni.ipynb)\n",
" * [Spanish IBANs](clean_es_iban.ipynb)\n",
" * [Spanish Foreigner Identity Codes](clean_es_nie.ipynb)\n",
" * [Spanish NIF Numbers](clean_es_nif.ipynb)\n",
" * [Classification For Businesses In The European Union](clean_eu_nace.ipynb)\n",
" * [European VAT Numbers](clean_eu_vat.ipynb)\n",
" * [Finnish ALV Numbers](clean_fi_alv.ipynb)\n",
" * [Finnish Personal Identity Codes](clean_fi_hetu.ipynb)\n",
" * [Finnish Business Identifiers](clean_fi_ytunnus.ipynb)\n",
" * [French Tax Identification Numbers](clean_fr_nif.ipynb)\n",
" * [French Personal Identification Numbers](clean_fr_nir.ipynb)\n",
" * [French Company Identification Numbers](clean_fr_siren.ipynb)\n",
" * [French TVA Numbers](clean_fr_tva.ipynb)\n",
" * [Stock Exchange Daily Official List Numbers](clean_gb_sedol.ipynb)\n",
" * [English Unique Pupil Numbers](clean_gb_upn.ipynb)\n",
" * [United Kingdom Unique Taxpayer References](clean_gb_utr.ipynb)\n",
" * [United Kingdom VAT Numbers](clean_gb_vat.ipynb)\n",
" * [Greek Social Security Numbers](clean_gr_amka.ipynb)\n",
" * [Greek VAT Numbers](clean_gr_vat.ipynb)\n",
" * [Guatemala Tax Numbers](clean_gt_nit.ipynb)\n",
" * [Croatian Identification Numbers](clean_hr_oib.ipynb)\n",
" * [Hungarian ANUM Numbers](clean_hu_anum.ipynb)\n",
" * [Indonesian VAT Numbers](clean_id_npwp.ipynb)\n",
" * [Irish Personal Numbers](clean_ie_pps.ipynb)\n",
" * [Irish VAT Numbers](clean_ie_vat.ipynb)\n",
" * [Israeli Company Numbers](clean_il_hp.ipynb)\n",
" * [Israeli Personal Numbers](clean_il_idnr.ipynb)\n",
" * [Indian Digital Resident Personal Identity Numbers](clean_in_aadhaar.ipynb)\n",
" * [Indian Permanent Account Numbers](clean_in_pan.ipynb)\n",
" * [Icelandic Identity Codes](clean_is_kennitala.ipynb)\n",
" * [Icelandic VSK Numbers](clean_is_vsk.ipynb)\n",
" * [Italian Code For Identification Of Drugs](clean_it_aic.ipynb)\n",
" * [Italian Fiscal Codes](clean_it_codicefiscale.ipynb)\n",
" * [Italian IVA Numbers](clean_it_iva.ipynb)\n",
" * [Japanese Corporate Numbers](clean_jp_cn.ipynb)\n",
" * [South Korea Business Registration Numbers](clean_kr_brn.ipynb)\n",
" * [South Korean Resident Registration Numbers](clean_kr_rrn.ipynb)\n",
" * [Liechtenstein Tax Code For Individuals And Entities](clean_li_peid.ipynb)\n",
" * [Lithuanian Personal Numbers](clean_lt_asmens.ipynb)\n",
" * [Lithuanian PVM Numbers](clean_lt_pvm.ipynb)\n",
" * [Luxembourgian TVA Numbers](clean_lu_tva.ipynb)\n",
" * [Latvian PVN (VAT) Numbers](clean_lv_pvn.ipynb)\n",
" * [Monacan TVA Numbers](clean_mc_tva.ipynb)\n",
" * [Moldavian Company Identification Numbers](clean_md_idno.ipynb)\n",
" * [Montenegro IBANs](clean_me_iban.ipynb)\n",
" * [Maltese VAT Numbers](clean_mt_vat.ipynb)\n",
" * [Mauritian National ID Numbers](clean_mu_nid.ipynb)\n",
" * [Mexican Personal Identifiers](clean_mx_curp.ipynb)\n",
" * [Mexican Tax Numbers](clean_mx_rfc.ipynb)\n",
" * [Malaysian National Registration Identity Card Numbers](clean_my_nric.ipynb)\n",
" * [BRIN Numbers](clean_nl_brin.ipynb)\n",
" * [Dutch BTW Numbers](clean_nl_btw.ipynb)\n",
" * [Norwegian IBANs](clean_no_iban.ipynb)\n",
" * [Norwegian Bank Account Numbers](clean_no_kontonr.ipynb)\n",
" * [Norwegian VAT Numbers](clean_no_mva.ipynb)\n",
" * [Norwegian Organisation Numbers](clean_no_orgnr.ipynb)\n",
" * [New Zealand IRD Numbers](clean_nz_ird.ipynb)\n",
" * [Peruvian Personal Numbers](clean_pe_cui.ipynb)\n",
" * [Peruvian Fiscal Numbers](clean_pe_ruc.ipynb)\n",
" * [Polish VAT Numbers](clean_pl_nip.ipynb)\n",
" * [Polish National Identification Numbers](clean_pl_pesel.ipynb)\n",
" * [Polish Register Of Economic Units](clean_pl_regon.ipynb)\n",
" * [Portuguese NIF Numbers](clean_pt_nif.ipynb)\n",
" * [Paraguay RUC Numbers](clean_py_ruc.ipynb)\n",
" * [Romanian CF (VAT) Numbers](clean_ro_cf.ipynb)\n",
" * [Romanian Numerical Personal Codes](clean_ro_cnp.ipynb)\n",
" * [Romanian Company Identifiers](clean_ro_cui.ipynb)\n",
" * [Romanian Trade Register Identifiers](clean_ro_onrc.ipynb)\n",
" * [French Company Establishment Identification Numbers](clean_fr_siret.ipynb)\n",
" * [United Kingdom National Health Service Patient Identifiers](clean_gb_nhs.ipynb)\n",
" * [Dutch Citizen Identification Numbers](clean_nl_bsn.ipynb)\n",
" * [Dutch Student Identification Numbers](clean_nl_onderwijsnummer.ipynb)\n",
" * [Belgian IBANs](clean_be_iban.ipynb)\n",
" * [Bulgarian Personal Numbers](clean_bg_pnf.ipynb)\n",
" * [Brazilian Company Identifiers](clean_br_cnpj.ipynb)\n",
" * [Brazilian National Identifiers](clean_br_cpf.ipynb)\n",
" * [Canadian Social Insurance Numbers](clean_ca_sin.ipynb)\n",
" * [Chinese Unified Social Credit Codes](clean_cn_uscc.ipynb)\n",
" * [Estonian Organisation Registration Codes](clean_ee_registrikood.ipynb)\n",
" * [Spanish Real State IDs](clean_es_referenciacatastral.ipynb)\n",
" * [Euro Banknote Serial Numbers](clean_eu_banknote.ipynb)\n",
" * [European Energy Identification Codes](clean_eu_eic.ipynb)\n",
" * [Finnish Association Registry IDs](clean_fi_associationid.ipynb)\n",
" * [Finnish Individual Tax Numbers](clean_fi_veronumero.ipynb)\n",
" * [Dutch Postal Codes](clean_nl_postcode.ipynb)\n",
" * [Norwegian Birth Numbers](clean_no_fodselsnummer.ipynb)\n",
" * [New Zealand Bank Account Numbers](clean_nz_bankaccount.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -62,7 +195,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.6.12"
}
},
"nbformat": 4,

1 comment on commit 42bf188

@github-actions

DataPrep.EDA Benchmarks

| Benchmark suite | Current: 42bf188 | Previous: 7e2a5a0 | Ratio |
|---|---|---|---|
| dataprep/tests/benchmarks/eda.py::test_create_report | 0.16553030160412138 iter/sec (stddev: 0.07837264203602647) | 0.17402520339979452 iter/sec (stddev: 0.017114672484713184) | 1.05 |
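
The Ratio value in the row above can be reproduced from the two throughput figures. The sketch below assumes the ratio is computed as previous divided by current for iter/sec metrics, which is consistent with the numbers shown but is not stated in the comment itself. The variable names are illustrative only.

```python
# Throughput values copied from the benchmark row (iterations per second).
current_ips = 0.16553030160412138   # commit 42bf188
previous_ips = 0.17402520339979452  # commit 7e2a5a0

# Assumption: for throughput metrics the ratio is previous / current,
# so a value above 1 means the current commit completes fewer iterations per second.
ratio = previous_ips / current_ips
slowdown_pct = (ratio - 1) * 100

print(f"ratio = {ratio:.2f}")
print(f"current throughput is about {slowdown_pct:.1f}% lower than previous")
```

Note that the current run's standard deviation (about 0.078) is large relative to the gap between the two means (about 0.008), so a difference of this size may be within run-to-run noise.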

This comment was automatically generated by workflow using github-action-benchmark.