Add files via upload
georgerohan001 authored Feb 17, 2024
1 parent 4dffff0 commit 351c2bd
Showing 1 changed file with 282 additions and 1 deletion.
283 changes: 282 additions & 1 deletion Project-Report.ipynb
@@ -83,7 +83,288 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Function: "
"### Function: extract_sheets(parent_folder)\n",
"\n",
"The `extract_sheets` function does what its name suggests: it extracts the exercise sheets. It also handles a few smaller tasks that are part of a clean extraction process. To start, it needs to perform the same tasks individually for each folder within the parent folder, so we need a way to find out which folders lie inside it. The `os` module provides a function called `listdir` that returns a list of the entries in a specified folder. We loop over that list and only continue with the entries that are directories:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for folder_name in os.listdir(parent_folder):\n",
"    folder_path = os.path.join(parent_folder, folder_name)\n",
"\n",
"    # Only directories are processed; loose files are skipped\n",
"    if os.path.isdir(folder_path):\n",
"        ...  # steps 1 to 6 below run here"
]
},
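The folder loop above can be sketched as a small, self-contained helper. This is only an illustration under assumptions: `list_subfolders` is a hypothetical name that does not appear in the report's code, and the sorting is added here so that the result is predictable.

```python
import os
import tempfile


def list_subfolders(parent_folder):
    """Return the names of the direct subfolders of parent_folder, sorted."""
    return sorted(
        name for name in os.listdir(parent_folder)
        # os.listdir returns files and folders alike, so filter with isdir
        if os.path.isdir(os.path.join(parent_folder, name))
    )


if __name__ == "__main__":
    # Demo on a throwaway directory; the folder names are made up
    with tempfile.TemporaryDirectory() as parent:
        os.makedirs(os.path.join(parent, "student_a"))
        os.makedirs(os.path.join(parent, "student_b"))
        with open(os.path.join(parent, "notes.txt"), "w") as fh:
            fh.write("not a folder")
        print(list_subfolders(parent))
```

Filtering up front keeps the main loop free of the `isdir` check, at the cost of one extra pass over the directory listing.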
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For each folder, the function then performs the following tasks:\n",
"\n",
"### 1. Creating required folders:\n",
"\n",
"A simple `for` loop runs through the names of all the necessary folders and then creates them if they don't exist already:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for folder in [\n",
"    \"Already Extracted Sheets\", \"Manual Correction Needed\",\n",
"    \"Successful Sheets\", \"TXT Files\", \"Unrecognized Sheets\"\n",
"]:\n",
"    temp_folder = os.path.join(\n",
"        folder_path,\n",
"        folder)\n",
"    # exist_ok=True makes this a no-op if the folder is already there\n",
"    os.makedirs(temp_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Creation of \"Points Log\" and addition of first exercise points:\n",
"\n",
"A text file is created that stores the point balance and a log of all points awarded. The points for the first exercise are then entered automatically, because every member of the course receives them for free."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Creates points log\n",
"points_log_path = os.path.join(folder_path, \"Points_Log.txt\")\n",
"# Adds points for Sheet 01 Task 01\n",
"if not os.path.exists(points_log_path):\n",
"    with open(points_log_path, 'w') as points_log:\n",
"        point_balance = 5\n",
"        ex01_log = \"Sheet 01 Task 01 IDE Installation: +5 Points\\n\"\n",
"        points_log.write(f\"File name: {folder_name}\\n\")\n",
"        points_log.write(f\"Point balance: {point_balance}\\n\")\n",
"        points_log.write(\"\\nLogs:\\n\")\n",
"        points_log.write(ex01_log)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Extraction of sheets\n",
"\n",
"Now comes the part where the function lives up to its name: the extraction. Before extracting, it checks whether the exercise sheet in question has already been extracted; if so, it skips the extraction and moves on. Another problem we faced was that some students included \".zip\" in the names of their zip files, so the file would be called, for example, `sheet01.zip.zip`. Extracting it would produce a folder called `sheet01.zip` (not a zip file, simply a directory with `.zip` in its name). A zip file called `sheet01.zip` and a directory called `sheet01.zip` cannot exist side by side in the same directory, so the extraction fails with an error. For that reason, we put the extraction in a `try` block, and in the event of an error (`FileNotFoundError` on Windows, `NotADirectoryError` on macOS), we added a few extra steps that fix this naming problem:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Extracts all 5 exercise sheets\n",
"sheet_names = [\"sheet01.zip\", \"sheet02.zip\", \"sheet03.zip\",\n",
"               \"sheet04.zip\", \"sheet05.zip\"]\n",
"for sheet_name in sheet_names:\n",
"    sheet_zip_path = os.path.join(folder_path, sheet_name)\n",
"    # Warning message if it's already been extracted\n",
"    if os.path.exists(sheet_zip_path):\n",
"        if os.path.exists(\n",
"            os.path.join(\n",
"                folder_path,\n",
"                \"Already Extracted Sheets\",\n",
"                sheet_name)\n",
"        ):\n",
"            message = (\n",
"                f\"{sheet_name} has already been extracted for \"\n",
"                f\"{folder_name}. Extraction for this file \"\n",
"                \"will be skipped.\"\n",
"            )\n",
"            messagebox.showinfo(\"Sheet Already Extracted\", message)\n",
"            # Adds \"not extracted\" suffix\n",
"            zip_files = sheet_name.replace(\n",
"                '.zip', ' (Not extracted).zip')\n",
"            not_extracted_path = os.path.join(\n",
"                folder_path, f\"{zip_files}\")\n",
"            os.rename(sheet_zip_path, not_extracted_path)\n",
"            continue\n",
"        with zipfile.ZipFile(sheet_zip_path, 'r') as zip_ref:\n",
"            try:\n",
"                zip_ref.extractall(folder_path)\n",
"            except (FileNotFoundError, NotADirectoryError):\n",
"                # Name clash: extract into a temporary folder instead\n",
"                folder_to_delete = os.path.join(\n",
"                    folder_path,\n",
"                    \"%temp%\")\n",
"                os.makedirs(folder_to_delete, exist_ok=True)\n",
"                zip_ref.extractall(folder_to_delete)\n",
"                for item in os.listdir(folder_to_delete):\n",
"                    item_path = os.path.join(\n",
"                        folder_to_delete,\n",
"                        item)\n",
"                    if (\n",
"                        os.path.isdir(item_path)\n",
"                        and item.endswith(\".zip\")\n",
"                    ):\n",
"                        # Strips the trailing \".zip\" from the folder name\n",
"                        os.rename(item_path, os.path.join(\n",
"                            folder_to_delete,\n",
"                            item[:-4]))\n",
"                        shutil.move(os.path.join(\n",
"                            folder_to_delete,\n",
"                            item[:-4]), folder_path)\n",
"                os.rmdir(folder_to_delete)"
]
},
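One way to sidestep the name clash described above is to always extract into a staging folder first and rename before moving. The sketch below is a variation, not the report's method (which extracts directly and only falls back after an error); `extract_via_staging` is a hypothetical helper name, and the sketch assumes nothing with the target name already exists in the destination.

```python
import os
import shutil
import tempfile
import zipfile


def extract_via_staging(zip_path, dest_folder):
    """Extract zip_path into dest_folder through a temporary staging folder.

    Top-level directories whose names end in ".zip" lose that ending before
    they are moved, so they cannot collide with the archive file itself.
    """
    staging = tempfile.mkdtemp(dir=dest_folder)
    try:
        with zipfile.ZipFile(zip_path) as zip_ref:
            zip_ref.extractall(staging)
        for item in os.listdir(staging):
            target = item
            if os.path.isdir(os.path.join(staging, item)) and item.endswith(".zip"):
                target = item[:-len(".zip")]
            shutil.move(os.path.join(staging, item),
                        os.path.join(dest_folder, target))
    finally:
        # Remove the staging folder even if extraction failed halfway
        shutil.rmtree(staging, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        archive = os.path.join(folder, "sheet01.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            # The archive contains a top-level folder that is itself named "sheet01.zip"
            zf.writestr("sheet01.zip/task.py", "print('hello')\n")
        extract_via_staging(archive, folder)
        print(sorted(os.listdir(folder)))
```

Because nothing is extracted next to the archive directly, no exception handling specific to Windows or macOS is needed in this variant.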
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A further problem was that zip files often came with an outer folder that added an extra level above the exercise files. For example, a zip file by the name `exercise01.zip` had a folder called `exercise01` inside it, which in turn held the tasks of the exercise. For this reason, our code unpacks every folder that contains Python files and is not one of the folders we created initially, moving its contents up one level. We then use `rmdir` to delete the now empty folder:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for item in os.listdir(folder_path):\n",
"    item_path = os.path.join(folder_path, item)\n",
"    if os.path.isdir(item_path) and item not in [\n",
"            \"__MACOSX\", \"__pycache__\",\n",
"            \"Already Extracted Sheets\",\n",
"            \"Manual Correction Needed\",\n",
"            \"Successful Sheets\", \"TXT Files\",\n",
"            \"Unrecognized Sheets\"]:\n",
"        py_file_found = any(\n",
"            sub_item.endswith(\".py\") and os.path.isfile(\n",
"                os.path.join(item_path, sub_item))\n",
"            for sub_item in os.listdir(item_path)\n",
"        )\n",
"        if py_file_found:\n",
"            for sub_item in os.listdir(item_path):\n",
"                sub_item_path = os.path.join(\n",
"                    item_path, sub_item)\n",
"                new_item_path = os.path.join(\n",
"                    folder_path, sub_item)\n",
"                os.rename(sub_item_path, new_item_path)\n",
"            # Remove the now empty additional folder\n",
"            os.rmdir(item_path)"
]
},
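The unwrapping step above can be sketched as a standalone function. The names `flatten_wrapper_folders` and `RESERVED` are hypothetical and chosen for this illustration only, and the sketch does not handle the case where a moved item has the same name as something already in the destination.

```python
import os
import shutil
import tempfile

# Folders that must never be unwrapped (taken from the list in the report)
RESERVED = {
    "__MACOSX", "__pycache__", "Already Extracted Sheets",
    "Manual Correction Needed", "Successful Sheets", "TXT Files",
    "Unrecognized Sheets",
}


def flatten_wrapper_folders(folder_path, reserved=RESERVED):
    """Move the contents of wrapper folders holding .py files up one level."""
    for item in os.listdir(folder_path):
        item_path = os.path.join(folder_path, item)
        if not os.path.isdir(item_path) or item in reserved:
            continue
        children = os.listdir(item_path)
        has_py = any(
            child.endswith(".py")
            and os.path.isfile(os.path.join(item_path, child))
            for child in children
        )
        if not has_py:
            continue
        for child in children:
            shutil.move(os.path.join(item_path, child),
                        os.path.join(folder_path, child))
        os.rmdir(item_path)  # succeeds only because the folder is now empty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        os.makedirs(os.path.join(folder, "exercise01"))
        with open(os.path.join(folder, "exercise01", "hello.py"), "w") as fh:
            fh.write("print('hello')\n")
        flatten_wrapper_folders(folder)
        print(os.listdir(folder))
```

`os.rmdir` refuses to delete a non-empty directory, which acts as a safety net: if a move failed silently, the wrapper folder would not be removed.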
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. TXT Files\n",
"\n",
"We then moved all extracted TXT files to a folder called `TXT Files`, prefixing each file name with the name of the exercise sheet it came from:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Archives the zip file that has just been extracted\n",
"already_extracted_path = os.path.join(\n",
"    folder_path,\n",
"    \"Already Extracted Sheets\",\n",
"    sheet_name)\n",
"os.rename(sheet_zip_path, already_extracted_path)\n",
"for extracted_file in os.listdir(folder_path):\n",
"    extracted_file_path = os.path.join(\n",
"        folder_path, extracted_file)\n",
"    # Looks for TXT files\n",
"    if extracted_file.lower().endswith(\n",
"            '.txt') and extracted_file != \"Points_Log.txt\":\n",
"        new_txt_path = os.path.join(\n",
"            folder_path, \"TXT Files\",\n",
"            f\"{sheet_name.replace('.zip', '')}\"\n",
"            f\"_{extracted_file}\")  # Adds sheet name as prefix\n",
"        # Puts them into the TXT files folder\n",
"        os.rename(extracted_file_path, new_txt_path)"
]
},
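The renaming scheme for TXT files can be illustrated with a small standalone function. This is a sketch under assumptions: `collect_txt_files` and its `keep` parameter are hypothetical names, and `os.replace` is used here instead of the report's `os.rename` so that an existing file with the same target name is overwritten on every platform.

```python
import os
import tempfile


def collect_txt_files(folder_path, sheet_name, keep=("Points_Log.txt",)):
    """Move loose .txt files into "TXT Files", prefixed with the sheet name.

    Returns the new file names in sorted order of the original names.
    """
    prefix = os.path.splitext(sheet_name)[0]  # "sheet02.zip" -> "sheet02"
    txt_dir = os.path.join(folder_path, "TXT Files")
    os.makedirs(txt_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(folder_path)):
        path = os.path.join(folder_path, name)
        if (
            os.path.isfile(path)
            and name.lower().endswith(".txt")
            and name not in keep
        ):
            new_name = f"{prefix}_{name}"
            os.replace(path, os.path.join(txt_dir, new_name))
            moved.append(new_name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("answers.txt", "Points_Log.txt", "task.py"):
            with open(os.path.join(folder, name), "w") as fh:
                fh.write("demo\n")
        print(collect_txt_files(folder, "sheet02.zip"))
```

The `isfile` check keeps the `TXT Files` directory itself (and any other directory whose name happens to end in `.txt`) out of the move.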
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Exercise correction\n",
"\n",
"We then ran functions responsible for correcting each task in the exercise sheet:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Exercise sheet correction functions\n",
"helloworld(folder_path)\n",
"username(folder_path)\n",
"crosssum(folder_path)\n",
"lifeinweeks(folder_path)\n",
"leapyear(folder_path)\n",
"million(folder_path)\n",
"caesar_cipher(folder_path)\n",
"books(folder_path)\n",
"anagrams(folder_path)\n",
"data(folder_path)\n",
"graph(folder_path)\n",
"zen(folder_path)\n",
"shapes(folder_path)\n",
"zen_word_frequency(folder_path)\n",
"quotes(folder_path) # Remove if needed\n",
"names(folder_path)\n",
"tictactoe(folder_path)\n",
"README(folder_path)\n",
"project(folder_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Deletion and Final Cleanup\n",
"\n",
"The final step was to delete `__MACOSX` and `__pycache__`, and to put everything else into a folder for unrecognized sheets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Removes commonly found folders\n",
"folders_to_delete = [\"__MACOSX\", \"__pycache__\"]\n",
"# junk_name avoids shadowing folder_name from the outer loop\n",
"for junk_name in folders_to_delete:\n",
"    folder_to_delete_path = os.path.join(folder_path, junk_name)\n",
"    if (\n",
"        os.path.exists(folder_to_delete_path)\n",
"        and os.path.isdir(folder_to_delete_path)\n",
"    ):\n",
"        shutil.rmtree(folder_to_delete_path)\n",
"# Checks for files with wrong names\n",
"for remaining_item in os.listdir(folder_path):\n",
"    remaining_item_path = os.path.join(folder_path, remaining_item)\n",
"    if remaining_item not in [\"Already Extracted Sheets\",\n",
"                              \"Manual Correction Needed\",\n",
"                              \"Successful Sheets\", \"TXT Files\",\n",
"                              \"Unrecognized Sheets\", \"Points_Log.txt\"]:\n",
"        # Move the item to the \"Unrecognized Sheets\" folder\n",
"        new_path = os.path.join(\n",
"            folder_path, \"Unrecognized Sheets\", remaining_item)\n",
"        shutil.move(remaining_item_path, new_path)"
]
}
],