Commit
Merge pull request #534 from baidu/master
Merge master
Showing 18 changed files with 358 additions and 147 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,78 +5,70 @@ | |
|
||
The Baidu File System (BFS) is a distributed file system designed to support real-time applications. Like many other distributed file systems, BFS is highly fault-tolerant. Unlike the others, BFS provides low read/write latency while maintaining high throughput. Together with [Galaxy](https://github.com/baidu/galaxy) and [Tera](http://github.com/baidu/tera), BFS supports many real-time products in Baidu, including the Baidu webpage database, the Baidu incremental indexing system, and the Baidu user behavior analysis system. | |
|
||
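## Background | |
Baidu's core database, [Tera](http://github.com/baidu/tera), persists its data on a distributed file system, so the performance, availability, and scalability of that file system are critical to the stability and quality of the search services built on top of it. Existing distributed file systems could not meet all of these requirements well, so we started from Tera's needs and developed Baidu's own distributed file system, the Baidu File System (BFS). | |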
|
||
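## Design goals | |
1. High reliability and high availability | |
Data replicas are stored redundantly across multiple data centers and regions, so that no data is lost even if a single data center or region suffers a serious disaster. | |
The metadata service is distributed: multiple replicas make it highly available, and a consensus protocol such as Raft synchronizes the metadata operation log to keep the replicas consistent. | |
2. High throughput and low latency | |
A high-performance single-node storage engine maximizes the I/O throughput of the storage media; global replica placement and traffic scheduling provide load balancing. | |
3. Horizontal scalability to the ten-thousand-machine scale | |
Designed to support three data centers across two regions and to manage more than 10,000 machines. | |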
|
||
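## System architecture | |
The system consists of several modules: NameServer, ChunkServer, the SDK, and bfs_client. | |
NameServer is the central control module and manages the directory tree; ChunkServer is the data node and serves reads and writes of file blocks; the SDK provides the user-facing API as a static library; bfs_client is a management tool shipped as a binary. | |
![Architecture diagram](resources/images/bfs-arch.png) | |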
|
||
## Build | |
Inside Baidu, you can simply run: | |
sh internal_build.sh | |
For external builds, follow the steps in .travis.yml. | |
|
||
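## Single-machine sandbox test | |
The sandbox directory contains the environment and scripts for running tests on a single machine. | |
deploy.sh: deploys a local cluster with 4 chunkservers and 1 nameserver | |
start.sh: starts the deployed cluster | |
clear.sh: cleans up the cluster | |
small_test.sh: a simple automated test script that calls the three scripts above and uses bfs_client to test the basic functions of the file system | |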
|
||
## Cluster setup | |
1. Set up the NameServer | |
A nameserver deployment needs 1 to 3 machines ($nshost1~3) | |
Flags that must be specified for the nameserver: | |
--nameserver_nodes=$nshost1:8828,$nshost2:8828,$nshost3:8828 | |
--node_index=$hostid | |
Start command: | |
./nameserver --flagfile=./bfs.flag | |
2. Set up the Chunkserver | |
To ensure availability, at least 4 chunkserver machines are required (the cluster remains writable if one of them goes down) | |
Flags that must be specified for the chunkserver: | |
--nameserver_nodes=$nshost1:8828,$nshost2:8828,$nshost3:8828 | |
--chunkserver_port=8825 | |
--block_store_path=/home/disk1/bfs,/home/disk2/bfs | |
Start command: | |
./chunkserver --flagfile=./bfs.flag | |
3. Inspect the cluster | |
There are two ways to inspect the cluster: | |
a) From the command line | |
./bfs_client stat -a | |
b) From the web | |
Open http://$nshost1:8828/dfs in a browser | |
|
||
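## Logging rules and conventions | |
To keep log statements simple and easy to grep: | |
all block ids are printed in the format "#%ld " (a leading # and a trailing space) | |
all chunkserver ids are printed in the format "C%d " | |
all entry ids are printed in the format "E%ld " | |
all block versions are printed in the format "V%ld " | |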
|
||
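## Origins | |
One day we suddenly felt like writing a distributed file system~ | |
1. Persistent data storage for the table system | |
2. Temporary data storage for the mixed-deployment system | |
3. Large-file storage for MapReduce | |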
|
||
|
||
If you would like to join, leave your name here: | |
|
||
yanshiguang~ | ||
yuanyi~ | ||
yuyangquan~ | ||
leiliyuan~ | ||
yangce~ | ||
## Features | ||
1. Continuous availability | ||
* Nameserver is implemented as a `raft group`, so there is no single point of failure. | |
2. High throughput | ||
* High-performance data engine to maximize I/O utilization. | |
3. Low latency | ||
* Global load balance and slow node detection. | ||
4. Linear scalability | ||
* Support multi data center deployment and up to 10,000 data nodes. | ||
|
||
## Architecture | ||
![Architecture diagram](resources/images/bfs-arch2-mini.png) | |
|
||
## Quick Start | ||
#### Build | ||
./build.sh | ||
#### Standalone BFS | ||
cd sandbox; ./deploy.sh; ./start.sh | ||
|
||
## How to Contribute | ||
1. Please read the [RoadMap](docs/roadmap.md) or source code. | ||
2. Find something you are interested in and start working on it. | ||
3. Test your code by simply running `make test` and `make check`. | ||
4. Make a pull request. | ||
5. Once your code has passed code review and been merged, it will run on thousands of servers :) | |
|
||
|
||
## Contact us | ||
[email protected] | ||
|
||
==== | ||
|
||
[Baidu File System](http://github.com/baidu/bfs) | |
==== | ||
|
||
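Baidu's core businesses and database systems all rely on a distributed file system as their underlying storage, so the availability and performance of the file system are critical to the stability and quality of the search services above it. Existing distributed file systems (such as HDFS) were designed for offline batch processing and cannot deliver low latency and continuous availability while sustaining high throughput, so we designed the Baidu File System around the characteristics of our search workloads. | |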
|
||
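## Core features | |
1. Continuous availability | |
Data is stored redundantly across multiple data centers and regions, and metadata consistency is maintained with Raft, so the outage of a single data center does not affect overall availability. | |
2. High throughput | |
A high-performance single-node storage engine maximizes the I/O throughput of the storage media. | |
3. Low latency | |
Global load balancing and automatic avoidance of slow nodes. | |
4. Horizontal scalability | |
Designed to support three data centers across two regions and to manage more than 10,000 machines. | |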
|
||
## Architecture | |
![Architecture diagram](resources/images/bfs-arch2-mini.png) | |
|
||
## Quick start | |
#### Build | |
./build.sh | |
#### Standalone BFS | |
cd sandbox; ./deploy.sh; ./start.sh | |
|
||
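## How to contribute | |
1. Read the [RoadMap](docs/roadmap.md) file or the source code to learn about our current development direction | |
2. Find a feature or module you are interested in working on | |
3. Develop it; when you are done, test that it works correctly yourself, and run make test and make check to confirm that the existing test cases pass | |
4. Open a pull request | |
5. Once it passes code review, your code has a chance to run on tens of thousands of Baidu servers~ | |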
|
||
|
||
## Contact us | |
[email protected] | ||
|
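The logging convention in the README text above (block ids as `#%ld `, chunkserver ids as `C%d `, each followed by a space) is aimed at grep. A minimal sketch of why the trailing space matters is shown here; the log lines, `sample_log`, and `count_id` are hypothetical and only illustrate the id formats, they are not real BFS output or tooling.

```shell
#!/bin/bash
# Hypothetical log lines that follow the README's id formats:
# block "#<id> ", chunkserver "C<id> ", entry "E<id> ", block version "V<id> ".
sample_log() {
cat <<'EOF'
write block #12 V3 on C1 done
write block #123 V1 on C2 done
entry E7 removed, block #12 freed
EOF
}

# Count log lines that mention exactly the given id.
# Appending the space to the fixed-string pattern keeps "#12 " from matching "#123 ".
count_id() {
    sample_log | grep -c -F -- "$1 "
}

count_id '#12'   # prints 2
```

Searching for `#12` without the trailing space would also hit the `#123` line, which is the ambiguity the convention avoids.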
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
#!/bin/bash | ||
|
||
set -e -u -E # this script will exit if any sub-command fails | ||
|
||
######################################## | ||
# download & build dependencies | |
######################################## | ||
|
||
WORK_DIR=`pwd` | ||
DEPS_SOURCE=`pwd`/thirdsrc | ||
DEPS_PREFIX=`pwd`/thirdparty | ||
DEPS_CONFIG="--prefix=${DEPS_PREFIX} --disable-shared --with-pic" | ||
FLAG_DIR=`pwd`/.build | ||
|
||
export PATH=${DEPS_PREFIX}/bin:$PATH | ||
mkdir -p ${DEPS_SOURCE} ${DEPS_PREFIX} ${FLAG_DIR} | ||
|
||
if [ ! -f "${FLAG_DIR}/dl_third" ] || [ ! -d "${DEPS_SOURCE}/.git" ]; then | ||
rm -rf ${DEPS_SOURCE} | ||
mkdir ${DEPS_SOURCE} | ||
git clone https://github.com/yvxiang/thirdparty.git thirdsrc | ||
touch "${FLAG_DIR}/dl_third" | ||
fi | ||
|
||
cd ${DEPS_SOURCE} | ||
|
||
# boost | ||
if [ ! -f "${FLAG_DIR}/boost_1_57_0" ] \ | ||
|| [ ! -d "${DEPS_PREFIX}/boost_1_57_0/boost" ]; then | ||
tar zxf boost_1_57_0.tar.gz | ||
rm -rf ${DEPS_PREFIX}/boost_1_57_0 | ||
mv boost_1_57_0 ${DEPS_PREFIX}/boost_1_57_0 | ||
touch "${FLAG_DIR}/boost_1_57_0" | ||
fi | ||
|
||
# protobuf | ||
if [ ! -f "${FLAG_DIR}/protobuf_2_6_1" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libprotobuf.a" ] \ | ||
|| [ ! -d "${DEPS_PREFIX}/include/google/protobuf" ]; then | ||
tar zxf protobuf-2.6.1.tar.gz | ||
cd protobuf-2.6.1 | ||
./configure ${DEPS_CONFIG} | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/protobuf_2_6_1" | ||
fi | ||
|
||
#leveldb | ||
if [ ! -f "${FLAG_DIR}/leveldb" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libleveldb.a" ] \ | ||
|| [ ! -d "${DEPS_PREFIX}/include/leveldb" ]; then | ||
rm -rf leveldb | ||
git clone https://github.com/lylei/leveldb.git leveldb | ||
cd leveldb | ||
echo "PREFIX=${DEPS_PREFIX}" > config.mk | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/leveldb" | ||
fi | ||
|
||
# snappy | ||
if [ ! -f "${FLAG_DIR}/snappy_1_1_1" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libsnappy.a" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/include/snappy.h" ]; then | ||
tar zxf snappy-1.1.1.tar.gz | ||
cd snappy-1.1.1 | ||
./configure ${DEPS_CONFIG} | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/snappy_1_1_1" | ||
fi | ||
|
||
# sofa-pbrpc | ||
if [ ! -f "${FLAG_DIR}/sofa-pbrpc" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libsofa-pbrpc.a" ] \ | ||
|| [ ! -d "${DEPS_PREFIX}/include/sofa/pbrpc" ]; then | ||
rm -rf sofa-pbrpc | ||
|
||
git clone --depth=1 https://github.com/baidu/sofa-pbrpc.git sofa-pbrpc | ||
cd sofa-pbrpc | ||
sed -i '/BOOST_HEADER_DIR=/ d' depends.mk | ||
sed -i '/PROTOBUF_DIR=/ d' depends.mk | ||
sed -i '/SNAPPY_DIR=/ d' depends.mk | ||
echo "BOOST_HEADER_DIR=${DEPS_PREFIX}/boost_1_57_0" >> depends.mk | ||
echo "PROTOBUF_DIR=${DEPS_PREFIX}" >> depends.mk | ||
echo "SNAPPY_DIR=${DEPS_PREFIX}" >> depends.mk | ||
echo "PREFIX=${DEPS_PREFIX}" >> depends.mk | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/sofa-pbrpc" | ||
fi | ||
|
||
# cmake for gflags | ||
if ! which cmake ; then | ||
cd CMake-3.2.1 | ||
./configure --prefix=${DEPS_PREFIX} | ||
make -j4 | ||
make install | ||
cd - | ||
fi | ||
|
||
# gflags | ||
if [ ! -f "${FLAG_DIR}/gflags_2_1_1" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libgflags.a" ] \ | ||
|| [ ! -d "${DEPS_PREFIX}/include/gflags" ]; then | ||
tar zxf gflags-2.1.1.tar.gz | ||
cd gflags-2.1.1 | ||
cmake -DCMAKE_INSTALL_PREFIX=${DEPS_PREFIX} -DGFLAGS_NAMESPACE=google -DCMAKE_CXX_FLAGS=-fPIC | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/gflags_2_1_1" | ||
fi | ||
|
||
# gtest | ||
if [ ! -f "${FLAG_DIR}/gtest_1_7_0" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libgtest.a" ] \ | ||
|| [ ! -d "${DEPS_PREFIX}/include/gtest" ]; then | ||
cd gtest-1.7.0 | ||
./configure ${DEPS_CONFIG} | ||
make | ||
cp -a lib/.libs/* ${DEPS_PREFIX}/lib | ||
cp -a include/gtest ${DEPS_PREFIX}/include | ||
cd - | ||
touch "${FLAG_DIR}/gtest_1_7_0" | ||
fi | ||
|
||
# libunwind for gperftools | ||
if [ ! -f "${FLAG_DIR}/libunwind_0_99" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libunwind.a" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/include/libunwind.h" ]; then | ||
tar zxf libunwind-0.99.tar.gz | ||
cd libunwind-0.99 | ||
./configure ${DEPS_CONFIG} | ||
make CFLAGS=-fPIC -j4 | ||
make CFLAGS=-fPIC install | ||
cd - | ||
touch "${FLAG_DIR}/libunwind_0_99" | ||
fi | ||
|
||
# gperftools (tcmalloc) | ||
if [ ! -f "${FLAG_DIR}/gperftools_2_2_1" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libtcmalloc_minimal.a" ]; then | ||
tar zxf gperftools-2.2.1.tar.gz | ||
cd gperftools-2.2.1 | ||
./configure ${DEPS_CONFIG} CPPFLAGS=-I${DEPS_PREFIX}/include LDFLAGS=-L${DEPS_PREFIX}/lib | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/gperftools_2_2_1" | ||
fi | ||
|
||
# common | ||
if [ ! -f "${FLAG_DIR}/common" ] \ | ||
|| [ ! -f "${DEPS_PREFIX}/lib/libcommon.a" ]; then | ||
rm -rf common | ||
git clone https://github.com/baidu/common | ||
cd common | ||
sed -i 's/^PREFIX=.*/PREFIX=..\/..\/thirdparty/' config.mk | ||
sed -i '/^INCLUDE_PATH=*/s/$/ -I..\/..\/thirdparty\/boost_1_57_0/g' Makefile | ||
make -j4 | ||
make install | ||
cd - | ||
touch "${FLAG_DIR}/common" | ||
fi | ||
|
||
|
||
cd ${WORK_DIR} | ||
|
||
######################################## | ||
# configure depends.mk | |
######################################## | ||
|
||
echo "PBRPC_PATH=./thirdparty" > depends.mk | ||
echo "PROTOBUF_PATH=./thirdparty" >> depends.mk | ||
echo "PROTOC_PATH=./thirdparty/bin/" >> depends.mk | ||
echo 'PROTOC=$(PROTOC_PATH)protoc' >> depends.mk | ||
echo "PBRPC_PATH=./thirdparty" >> depends.mk | ||
echo "BOOST_PATH=./thirdparty/boost_1_57_0" >> depends.mk | ||
echo "GFLAG_PATH=./thirdparty" >> depends.mk | ||
echo "GTEST_PATH=./thirdparty" >> depends.mk | ||
echo "COMMON_PATH=./thirdparty" >> depends.mk | ||
echo "TCMALLOC_PATH=./thirdparty" >> depends.mk | ||
|
||
######################################## | ||
# build bfs | |
######################################## | ||
|
||
make clean | ||
make -j4 | ||
|
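Every dependency block in the script above repeats one pattern: skip the step when both a stamp file under `${FLAG_DIR}` and the installed artifact exist, otherwise build and then `touch` the stamp. As a sketch only, that pattern could be factored into a helper; `build_once` and the demo artifact are hypothetical names and are not part of build.sh.

```shell
#!/bin/bash
set -e

# Stand-in for build.sh's FLAG_DIR (`pwd`/.build); a temp dir keeps the demo self-contained.
FLAG_DIR=$(mktemp -d)

# build_once NAME ARTIFACT CMD...   (hypothetical helper)
# Runs CMD only if the stamp for NAME or the ARTIFACT is missing.
build_once() {
    local name="$1"
    local artifact="$2"
    local stamp="${FLAG_DIR}/${name}"
    shift 2
    if [ -f "${stamp}" ] && [ -e "${artifact}" ]; then
        echo "skip ${name}"
        return 0
    fi
    "$@" || return 1      # a failed build step never gets a stamp
    touch "${stamp}"
}

ARTIFACT="${FLAG_DIR}/libdemo.a"
build_once demo "${ARTIFACT}" touch "${ARTIFACT}"   # first call runs the step
build_once demo "${ARTIFACT}" touch "${ARTIFACT}"   # second call prints "skip demo"
```

Checking the artifact as well as the stamp means a deleted `thirdparty` directory triggers a rebuild even if `.build` still holds the stamp, which matches the double checks in the script.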
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Roadmap | ||
|
||
## Basic functions | ||
- [x] Basic file and directory operations (Create/Delete/Read/Write/Rename) | |
- [x] Automatic recovery | |
- [x] Nameserver HA | |
- [ ] Split the Metaserver from the Nameserver | |
- [ ] Disk load balancing | |
- [ ] Dynamic load balancing of chunkservers | |
- [ ] File lock & directory lock | |
- [x] Simple multi-geographical replica placement | |
- [ ] SDK lease | |
- [ ] Skip slow nodes while reading a file | |
|
||
## Posix interface | ||
- [x] Mount support | |
- [ ] FUSE low-level API | |
- [x] Basic read and write operations (not including random writes) | |
- [x] Small-file random writes, enough to support vim, gcc, and other applications | |
- [ ] Large-file random writes | |
|
||
## Application support | ||
- [x] Tera | ||
- [ ] Shuttle | ||
- [ ] Galaxy |