Biomedical Query Expansion (Data Science Lab, KICS-UET)

Prerequisites

Install JDK 7 or higher (note that Solr 6.x itself requires Java 8 or later)
Install the latest version of the JRE
Install Eclipse
Install the latest version of Maven
- Open a terminal and type: sudo apt-get install maven
Download and install Solr 6.6.0 or higher from its official site
Download the genomics data repository from the TREC 2007 Genomics Track data

How to configure Solr

Start the Solr service; by default it uses port 8983. To check that it is running, open localhost:8983 in your browser.
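
The same check can be scripted. The following is a minimal sketch, assuming Solr runs on the local machine with its default port; the helper names `solr_url` and `solr_is_up` are made up for this example and are not part of the project.

```python
import urllib.error
import urllib.request


def solr_url(host="localhost", port=8983):
    """Build the URL of the Solr admin UI (8983 is Solr's default port)."""
    return f"http://{host}:{port}/solr/"


def solr_is_up(url, timeout=3.0):
    """Return True if an HTTP GET on `url` answers with status 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, ...
        return False


if __name__ == "__main__":
    url = solr_url()
    print(url, "is reachable" if solr_is_up(url) else "is not reachable")
```

If the script reports that Solr is not reachable, start the service before continuing with the steps below.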

Create a new core or collection. The default core/collection directory is /var/solr/data.

Once you have downloaded the data repository, extract the archives and combine all the files under one directory. This requires about 9.8 GB of disk space.
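
Combining the extracted files by hand is tedious, so here is a small sketch of how it could be scripted. It assumes the archives were extracted into sub-directories of one root folder and that the target directory lies outside that root; the function name `flatten_into` is hypothetical.

```python
import shutil
from pathlib import Path


def flatten_into(source_root, target_dir):
    """Copy every file found under source_root into the single directory target_dir.

    Files that share a name get a numeric suffix so that nothing is
    silently overwritten. Returns the number of files copied.
    """
    source_root = Path(source_root)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = 0
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        destination = target_dir / path.name
        counter = 1
        while destination.exists():
            destination = target_dir / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.copy2(path, destination)
        copied += 1
    return copied
```

Copying keeps the extracted tree intact but needs the disk space twice; replacing `shutil.copy2` with `shutil.move` avoids that if the original layout is no longer needed.
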
Now index the data into the core/collection you created.
Solr indexes your data according to the default solrconfig.xml and schema, but you can define and specify your own fields.
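
One possible way to send a document to Solr is the /update/extract handler that is configured further down. The sketch below is an illustration only: the core name passed in, the use of the file name as document id, and the `text/html` content type are assumptions, and `extract_url` and `index_file` are hypothetical helpers rather than project code.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def extract_url(core, doc_id, base="http://localhost:8983/solr", commit=False):
    """Build the /update/extract URL for one document of the given core."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base}/{core}/update/extract?{urllib.parse.urlencode(params)}"


def index_file(core, path, content_type="text/html"):
    """POST one file to Solr and return the HTTP status code.

    The file name without its extension is used as the document id
    (an assumption made for this example).
    """
    path = Path(path)
    request = urllib.request.Request(
        extract_url(core, path.stem, commit=True),
        data=path.read_bytes(),  # passing data makes this a POST request
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Committing after every file is slow for a collection of this size; for bulk indexing it is usually better to commit once at the end, or to use the bin/post tool shipped with Solr.
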
Specify your own fields in Solr:
You can add new fields by updating solrconfig.xml and managed-schema, both located in the conf directory of your newly created core/collection.
How to add new fields
- Open /var/solr/data/<core/collection name>/conf/ --> managed-schema, solrconfig.xml
  - In the solrconfig.xml file, search for:
```
 <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler"/>
```
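  - Add a new entry inside the above tag. If the handler is written as a self-closing tag, as shown, split it into an opening and a closing tag first: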
```
 <str name="capture">body</str>
```
  - To use your own field, replace body with your own field name
  - Save and exit the file
  - In the managed-schema file, search for:
```
 <field name="_text_" type="text_general" multiValued="true" indexed="true" stored="false" />
```
  - Add a new tag under the "_text_" field:
```
 <field name="body" type="text_general" indexed="true" stored="true"/>
```
  - To use your own field, replace "body" with your own field name
- Restart Solr with the command
  sudo service solr restart
  - If Solr reports an error, check your configuration files carefully
    Note: The field name must be the same in the managed-schema and solrconfig.xml files
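
The name check from the note above can be automated. This is a rough sketch that reads the contents of both files and reports every captured name that has no matching field; it ignores dynamic fields and any fmap.* remapping, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET


def captured_fields(solrconfig_xml):
    """Names listed in <str name="capture"> inside the /update/extract handler."""
    root = ET.fromstring(solrconfig_xml)
    names = set()
    for handler in root.iter("requestHandler"):
        if handler.get("name") != "/update/extract":
            continue
        for entry in handler.iter("str"):
            if entry.get("name") == "capture" and entry.text:
                names.add(entry.text.strip())
    return names


def schema_fields(managed_schema_xml):
    """Names of all explicit <field> definitions in the schema."""
    root = ET.fromstring(managed_schema_xml)
    return {field.get("name") for field in root.iter("field") if field.get("name")}


def missing_fields(solrconfig_xml, managed_schema_xml):
    """Captured names that have no matching <field> in the schema."""
    return captured_fields(solrconfig_xml) - schema_fields(managed_schema_xml)
```

Pass in the text of the two files, for example `missing_fields(open(config_path).read(), open(schema_path).read())`; an empty set means the names agree.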

How to set up files and compile the code

Open a terminal in your project root directory and type
mvn compile

This resolves the dependencies declared in your pom.xml file and compiles the project.

To run the code properly, the following files must be downloaded and extracted to their proper places.
The following files must be included in your resources dir:

Download trecgen2007.gold.standard.tsv.txt
Download 2007topics.txt
Download WordNet-3.0. Create a new directory named data in the main project directory and extract the contents of WordNet-3.0 inside this data dir.
Add a folder named script under the resources dir
Download trecgen2007_score.py and save it under the script dir
Download the Semantic Types Mappings and Semantic Group files. Create a dir named Mappings under the resources dir and put the two files (Semantic Types and Semantic Group) in it.
Now create a dir named DocResult under the resources dir --> (this directory will be used for the output of the results comparison)
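
The directory layout described above can be prepared and checked with a short script. This is only a sketch: it assumes that the gold standard and topics files sit directly under the resources dir, and it does not check the Mappings files because their names are not given here. `prepare_layout` is a hypothetical helper.

```python
from pathlib import Path

# Directories named in the steps above, relative to the project root.
REQUIRED_DIRS = [
    "data",
    "resources/script",
    "resources/Mappings",
    "resources/DocResult",
]

# Files named in the steps above (their exact location is an assumption).
REQUIRED_FILES = [
    "resources/trecgen2007.gold.standard.tsv.txt",
    "resources/2007topics.txt",
    "resources/script/trecgen2007_score.py",
]


def prepare_layout(project_root):
    """Create the expected directories and return the required files still missing."""
    root = Path(project_root)
    for relative in REQUIRED_DIRS:
        (root / relative).mkdir(parents=True, exist_ok=True)
    return [relative for relative in REQUIRED_FILES if not (root / relative).is_file()]
```

Running `prepare_layout(".")` from the project root creates the directories and lists what still has to be downloaded.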

Repository: dr-m-wasim/DSL-BiomedicalQueryExpansion
