
GSoC 2017 assignment

Valentin Kuznetsov edited this page Mar 7, 2017 · 1 revision

Dear GSoC applicant,

This wiki page contains an initial assignment that will be used to select a candidate for the transfer2go project under the GSoC 2017 program. The assignment is based on the project description [1] and the initial PhEDEx system architecture [2].

We need to build a fully distributed system of loosely coupled agents. Their task is to perform file transfers via a given protocol or tool, e.g. WLCG gLite FTP [3]. You don't need to use these tools at this point; you can mimic their behavior with custom UNIX tools/scripts or use the default HTTP protocol for file transfers. The transfer2go project [1] already implements the full logic for setting up agents, scheduling tasks, and performing transfers among them via HTTP (other protocols will be implemented later by the successful candidate during their GSoC work). The project wiki [4] explains how to set up the code.

The assignment is to write a simple router that can intelligently choose which agent to use based on the underlying agent conditions, such as (but not limited to):

  • agent load
  • link bandwidth
  • agent storage type/latency

The router will be part of the agent, so that a user request can be placed with any agent. The router of that agent can then find out which other participating agents hold the data (the code for this already exists in [1]), look up the agent parameters/conditions (you may extend the existing AgentStatus info), and decide which agent to use for the data transfer.
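
The selection step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `AgentCondition` type, the `Weights` values, and the linear scoring formula are all hypothetical and are not taken from the transfer2go code base (in particular, `AgentCondition` is not the existing AgentStatus structure). It assumes the list of candidate agents that hold the data has already been obtained.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentCondition is a hypothetical record of the conditions a router might
// track per agent. It is NOT the AgentStatus type from transfer2go.
type AgentCondition struct {
	Name          string
	Load          float64 // fraction of capacity in use: 0.0 (idle) to 1.0 (saturated)
	BandwidthMBps float64 // estimated link bandwidth to this agent
	LatencyMs     float64 // storage access latency (e.g. disk vs. tape)
}

// Weights are illustrative tuning knobs, not values from the project.
type Weights struct {
	Load, Bandwidth, Latency float64
}

// Score returns a higher value for a more attractive agent: bandwidth
// counts in favor, load and storage latency count against.
func Score(a AgentCondition, w Weights) float64 {
	return w.Bandwidth*a.BandwidthMBps - w.Load*a.Load - w.Latency*a.LatencyMs
}

// PickAgent returns the best-scoring agent among those that hold the data.
func PickAgent(candidates []AgentCondition, w Weights) (AgentCondition, error) {
	if len(candidates) == 0 {
		return AgentCondition{}, errors.New("no agent holds the requested data")
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if Score(c, w) > Score(best, w) {
			best = c
		}
	}
	return best, nil
}

func main() {
	w := Weights{Load: 100, Bandwidth: 1, Latency: 0.1}
	candidates := []AgentCondition{
		{Name: "busy-disk", Load: 0.9, BandwidthMBps: 100, LatencyMs: 10},
		{Name: "idle-disk", Load: 0.2, BandwidthMBps: 80, LatencyMs: 10},
		{Name: "idle-tape", Load: 0.1, BandwidthMBps: 120, LatencyMs: 5000},
	}
	best, err := PickAgent(candidates, w)
	if err != nil {
		fmt.Println("routing failed:", err)
		return
	}
	fmt.Println("selected agent:", best.Name)
}
```

A weighted sum is only one possible policy; it is used here because it keeps the trade-off between load, bandwidth, and latency explicit and easy to tune.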

As a bonus, you may sketch or implement a way for the router to use Machine Learning (ML) feedback. The idea is that, over time, each agent learns about the others from its own transfers, and this information can be used to improve the router's decisions, e.g. via ML predictions.
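
As one possible starting point for such feedback, an agent could keep a running estimate of the throughput it has observed to each peer and feed that estimate back into the router. The sketch below uses an exponentially weighted moving average, which is a deliberately simple stand-in for a real ML model; the `ThroughputModel` type and its methods are hypothetical names, not part of transfer2go.

```go
package main

import "fmt"

// ThroughputModel is a hypothetical per-agent predictor. It keeps an
// exponentially weighted moving average (EWMA) of observed transfer rates,
// standing in for a real ML model.
type ThroughputModel struct {
	alpha    float64            // weight of the newest observation, in (0, 1]
	estimate map[string]float64 // agent name -> estimated bytes per second
}

// NewThroughputModel creates an empty model with the given smoothing factor.
func NewThroughputModel(alpha float64) *ThroughputModel {
	return &ThroughputModel{alpha: alpha, estimate: make(map[string]float64)}
}

// Observe records the outcome of one finished transfer to the given agent.
// Observations with a non-positive duration are ignored.
func (m *ThroughputModel) Observe(agent string, bytes int64, seconds float64) {
	if seconds <= 0 {
		return
	}
	rate := float64(bytes) / seconds
	prev, seen := m.estimate[agent]
	if !seen {
		m.estimate[agent] = rate // first observation seeds the estimate
		return
	}
	m.estimate[agent] = m.alpha*rate + (1-m.alpha)*prev
}

// Predict returns the estimated rate for an agent and whether any
// transfers to that agent have been observed so far.
func (m *ThroughputModel) Predict(agent string) (float64, bool) {
	rate, seen := m.estimate[agent]
	return rate, seen
}

func main() {
	model := NewThroughputModel(0.5)
	model.Observe("agentA", 100, 1) // 100 B/s
	model.Observe("agentA", 200, 1) // 200 B/s
	if rate, ok := model.Predict("agentA"); ok {
		fmt.Printf("predicted rate for agentA: %.1f B/s\n", rate)
	}
	if _, ok := model.Predict("agentB"); !ok {
		fmt.Println("no history for agentB yet")
	}
}
```

The router could use such a prediction in place of, or alongside, a statically reported bandwidth figure; a proper ML approach would replace the moving average with a trained model over richer features.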

Please prepare a proposal (which is required by GSoC rules) and an initial prototype. For testing purposes, consult [4] on how to set up the code and agents on your local machine. Since [1] is a web-based service, you may set up as many web servers on different ports as you need, each pointing to its own TFC catalog, and run your code on your local machine(s). You'll need to feed your catalog with your files (you may choose any local files); see the Client examples in the Configuration wiki on [4]. Please note that [1] currently requires GRID certificates to authenticate users, but there is already a PR [5] that allows disabling this authentication and running the code without GRID certificates (this is what you need for initial testing and prototyping).

  1. https://github.com/vkuznet/transfer2go
  2. https://www.researchgate.net/publication/228732867_Data_transfer_infrastructure_for_CMS_data_taking
  3. https://www.wikiwand.com/en/GLite
  4. https://github.com/vkuznet/transfer2go/wiki
  5. https://github.com/vkuznet/transfer2go/pull/2