Commit 8b60e218 authored by Martin K. Petersen

Merge patch series "Add Command Duration Limits support"

Niklas Cassel <nks@flawful.org> says:

This series adds support for Command Duration Limits.
The series is based on linux tag: v6.4-rc1
The series can also be found in git: https://github.com/floatious/linux/commits/cdl-v7

=================
CDL in ATA / SCSI
=================
Command Duration Limits is defined in:
T13 ATA Command Set - 5 (ACS-5) and
T10 SCSI Primary Commands - 6 (SPC-6) respectively
(a simpler version of CDL is defined in T10 SPC-5).

CDL defines Duration Limits Descriptors (DLD):
7 DLDs for read commands and 7 DLDs for write commands.
Simply put, a DLD contains a limit and a policy.

A command can specify that a certain limit should be applied by setting
the DLD index field (3 bits, so 0-7) in the command itself.

The DLD index points to one of the 7 DLDs.
DLD index 0 means no descriptor, so no limit.
DLD index 1-7 means DLD 1-7.

A DLD can have a few different policies, but the two major ones are:
-Policy 0xF (abort), command will be completed with command aborted error
(ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
the command timed out.
-Policy 0xD (complete-unavailable), command will be completed without
error (ATA) or status GOOD (SCSI), with sense data indicating that the
command timed out. Note that the command will not have transferred any
data to/from the device when the command timed out, even though the
command returned success.

Regardless of the CDL policy, in case of a CDL timeout, the I/O will
result in a -ETIME error to user-space.
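
The index range and the two policies described above can be modelled in a
few lines of C. This is only an illustrative sketch: every demo_/DEMO_ name
is hypothetical and does not exist in the kernel. Only the 3-bit index
range, the policy values 0xD and 0xF, and the -ETIME result are taken from
the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
#define DEMO_DLD_INDEX_BITS	3
#define DEMO_DLD_INDEX_MAX	((1u << DEMO_DLD_INDEX_BITS) - 1)

static_assert(DEMO_DLD_INDEX_MAX == 7, "a 3-bit field covers 0-7");

enum demo_cdl_policy {
	DEMO_CDL_POLICY_COMPLETE_UNAVAILABLE = 0xD,
	DEMO_CDL_POLICY_ABORT = 0xF,
};

/* Any value that fits in the 3-bit field is a valid index. */
static bool demo_dld_index_valid(unsigned int index)
{
	return index <= DEMO_DLD_INDEX_MAX;
}

/* Index 0 means "no descriptor"; only 1-7 select a limit. */
static bool demo_dld_index_has_limit(unsigned int index)
{
	return index >= 1 && index <= DEMO_DLD_INDEX_MAX;
}

/*
 * The device reports a timeout differently per policy (aborted command
 * for 0xF, successful completion with sense data for 0xD), but user
 * space sees the same error in both cases.
 */
static int demo_cdl_timeout_errno(enum demo_cdl_policy policy)
{
	switch (policy) {
	case DEMO_CDL_POLICY_ABORT:
	case DEMO_CDL_POLICY_COMPLETE_UNAVAILABLE:
		return -ETIME;
	}
	return -EINVAL;	/* other policies are not modelled in this sketch */
}
```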

The DLDs are defined in the CDL log page(s) and are readable and writable.
Reading and writing the CDL DLDs is outside the scope of the kernel.
If a user wants to read or write the descriptors, they can do so using a
user-space application that sends passthrough commands, such as cdl-tools:
https://github.com/westerndigitalcorporation/cdl-tools

================================
The introduction of ioprio hints
================================
What the kernel does provide is a method to let I/O use one of the CDL DLDs
defined in the device. Note that the kernel will simply forward the DLD index
to the device, so the kernel currently does not know, nor does it need to know,
how the DLDs are defined inside the device.

The way that the CDL DLD index is supplied to the kernel is by introducing a
new 10 bit "ioprio hint" field within the existing 16 bit ioprio definition.

Currently, only 6 out of the 16 ioprio bits are in use; the remaining 10 bits
are unused, and the kernel explicitly rejects any attempt to set them.

For now, we only add ioprio hints representing CDL DLD index 1-7. Additional
ioprio hints for other QoS features could be defined in the future.

Possible future work would be to make an I/O scheduler aware of these
hints. E.g. for CDL, an I/O scheduler could make use of the duration limit
in each descriptor, and take that information into account while scheduling
commands. Right now, the ioprio hints will be ignored by the I/O schedulers.
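
To make the bit budget concrete, the sketch below packs and unpacks a 16 bit
ioprio value. It assumes the layout is class in bits 15-13, hint in bits 12-3
and level in bits 2-0; the DEMO_/demo_ names are local stand-ins invented for
this example, and the authoritative definitions are the ones in
include/uapi/linux/ioprio.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: | class (3) | hint (10) | level (3) | */
#define DEMO_LEVEL_BITS		3
#define DEMO_HINT_BITS		10
#define DEMO_CLASS_BITS		3
#define DEMO_HINT_SHIFT		DEMO_LEVEL_BITS
#define DEMO_CLASS_SHIFT	(DEMO_LEVEL_BITS + DEMO_HINT_BITS)

static_assert(DEMO_LEVEL_BITS + DEMO_HINT_BITS + DEMO_CLASS_BITS == 16,
	      "the three fields must fill the 16 bit ioprio exactly");

/* Combine class, level and hint into one 16 bit ioprio value. */
static uint16_t demo_ioprio_pack(unsigned int class, unsigned int level,
				 unsigned int hint)
{
	assert(class < (1u << DEMO_CLASS_BITS));
	assert(level < (1u << DEMO_LEVEL_BITS));
	assert(hint < (1u << DEMO_HINT_BITS));

	return (uint16_t)((class << DEMO_CLASS_SHIFT) |
			  (hint << DEMO_HINT_SHIFT) | level);
}

static unsigned int demo_ioprio_class(uint16_t ioprio)
{
	return ioprio >> DEMO_CLASS_SHIFT;
}

static unsigned int demo_ioprio_hint(uint16_t ioprio)
{
	return (ioprio >> DEMO_HINT_SHIFT) & ((1u << DEMO_HINT_BITS) - 1);
}

static unsigned int demo_ioprio_level(uint16_t ioprio)
{
	return ioprio & ((1u << DEMO_LEVEL_BITS) - 1);
}
```

Because the hint sits between the two pre-existing fields, a value with hint 0
is bit-for-bit identical to what applications already pass today.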

==============================
How to use CDL from user-space
==============================
Since CDL is mutually exclusive with NCQ priority
(see ncq_prio_enable and sas_ncq_prio_enable in
Documentation/ABI/testing/sysfs-block-device),
CDL has to be explicitly enabled using:
echo 1 > /sys/block/$bdev/device/cdl_enable
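
The same step can be done from a program. The following is a minimal sketch,
not part of this series: the demo_ helpers are hypothetical, and it assumes
that the cdl_supported attribute reads back as 0 or 1.

```c
#include <assert.h>
#include <stdio.h>

/* Build "/sys/block/<bdev>/device/<attr>"; 0 on success, -1 if it does not fit. */
static int demo_cdl_attr_path(char *buf, size_t len, const char *bdev,
			      const char *attr)
{
	int n;

	assert(buf && bdev && attr);
	n = snprintf(buf, len, "/sys/block/%s/device/%s", bdev, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single integer flag from an attribute file; -1 on error. */
static int demo_read_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Write "1" or "0" to an attribute file; 0 on success, -1 on error. */
static int demo_write_flag(const char *path, int on)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (!f)
		return -1;
	ret = fputs(on ? "1" : "0", f) < 0 ? -1 : 0;
	if (fclose(f) != 0)	/* sysfs reports store errors at flush time */
		ret = -1;
	return ret;
}

/* Enable CDL on bdev, but only if the device reports support for it. */
static int demo_cdl_enable(const char *bdev)
{
	char path[256];

	if (demo_cdl_attr_path(path, sizeof(path), bdev, "cdl_supported") ||
	    demo_read_flag(path) != 1)
		return -1;
	if (demo_cdl_attr_path(path, sizeof(path), bdev, "cdl_enable"))
		return -1;
	return demo_write_flag(path, 1);
}
```

Calling demo_cdl_enable("sda") would then be the equivalent of the echo
command above, with the extra check of cdl_supported beforehand.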

Since the ioprio hints are supplied through the existing I/O priority API,
it should be simple for an application to make use of the ioprio hints.

It simply has to use one of the new macros defined in
include/uapi/linux/ioprio.h: IOPRIO_PRIO_HINT() or IOPRIO_PRIO_VALUE_HINT(),
and supply one of the new hints defined in include/uapi/linux/ioprio.h:
IOPRIO_HINT_DEV_DURATION_LIMIT_[1-7], which indicates that the I/O should
use the corresponding CDL DLD index 1-7.

By reusing the I/O priority API, the user can define the DLD to use either
per I/O (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per thread
(ioprio_set()).
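
As a rough illustration of the per-thread variant, the sketch below sets a
DLD index for the calling thread through the ioprio_set() system call. The
DEMO_ constants are local copies so that it builds against pre-6.5 headers;
they are assumed to mirror include/uapi/linux/ioprio.h (class at bit 13, hint
at bit 3, IOPRIO_HINT_DEV_DURATION_LIMIT_n equal to n). With new enough
headers, IOPRIO_PRIO_VALUE_HINT() should be used instead of the hand-rolled
shifts.

```c
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local stand-ins (hypothetical names), assumed to match the uapi header. */
#define DEMO_IOPRIO_CLASS_SHIFT	13
#define DEMO_IOPRIO_HINT_SHIFT	3
#define DEMO_IOPRIO_CLASS_BE	2
#define DEMO_IOPRIO_WHO_PROCESS	1

/* Compose a best-effort ioprio (level 0) carrying DLD index 1-7 as hint. */
static int demo_ioprio_with_dld(int dld)
{
	assert(dld >= 1 && dld <= 7);
	return (DEMO_IOPRIO_CLASS_BE << DEMO_IOPRIO_CLASS_SHIFT) |
	       (dld << DEMO_IOPRIO_HINT_SHIFT);
}

/*
 * Apply the DLD index to all subsequent I/O of the calling thread.
 * Returns 0 on success, -1 with errno set on failure. A kernel without
 * ioprio hint support is expected to reject the value with EINVAL.
 */
static int demo_set_thread_dld(int dld)
{
	if (dld < 1 || dld > 7) {
		errno = EINVAL;
		return -1;
	}
	/* who == 0 selects the calling thread for IOPRIO_WHO_PROCESS */
	return (int)syscall(SYS_ioprio_set, DEMO_IOPRIO_WHO_PROCESS, 0,
			    demo_ioprio_with_dld(dld));
}
```

For per-I/O use, the value returned by demo_ioprio_with_dld() is what would
be stored in sqe->ioprio or iocb->aio_reqprio instead.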

=======
Testing
=======
The following fio patches add support for ioprio hints:
https://github.com/floatious/fio/commits/cdl

With them, CDL can be tested using e.g.:
fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_hint=DLD_index

A simple way to test is to use a DLD with a very short duration limit,
and send large reads. Regardless of the CDL policy, in case of a CDL
timeout, the I/O will result in a -ETIME error to user-space.
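
A test program can tell CDL timeouts apart from other failures by looking at
the completion result. The sketch below assumes the io_uring cqe->res
convention (bytes transferred on success, negative errno on failure) and
treats every -ETIME as a CDL timeout, which is only reasonable for I/O that
was submitted with a DLD hint. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_io_outcome {
	DEMO_IO_OK,
	DEMO_IO_CDL_TIMEOUT,
	DEMO_IO_ERROR,
};

struct demo_io_stats {
	unsigned long ok;
	unsigned long cdl_timeout;
	unsigned long error;
};

/* Map a completion result to an outcome. */
static enum demo_io_outcome demo_classify_io(long res)
{
	if (res >= 0)
		return DEMO_IO_OK;
	if (res == -ETIME)
		return DEMO_IO_CDL_TIMEOUT;
	return DEMO_IO_ERROR;
}

/* Count one completion in the matching bucket. */
static void demo_account_io(struct demo_io_stats *stats, long res)
{
	assert(stats != NULL);

	switch (demo_classify_io(res)) {
	case DEMO_IO_OK:
		stats->ok++;
		break;
	case DEMO_IO_CDL_TIMEOUT:
		stats->cdl_timeout++;
		break;
	case DEMO_IO_ERROR:
		stats->error++;
		break;
	}
}
```

With a very short duration limit and large reads, the cdl_timeout counter is
the one expected to grow.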

We also provide a CDL test suite located in the cdl-tools repo, see:
https://github.com/westerndigitalcorporation/cdl-tools#testing-a-system-command-duration-limits-support

We have tested this patch series using:
-real hardware
-the following QEMU implementation:
https://github.com/floatious/qemu/tree/cdl
(NOTE: the QEMU implementation requires you to define the CDL policy at compile
time, so you currently need to recompile QEMU when switching between policies.)

===================
Further information
===================
For further information about CDL, see Damien's slides:

Presented at SDC 2021:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf

Presented at Lund Linux Con 2022:
https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing

================
Changes since V6
================
-Rebased series on v6.4-rc1.
-Picked up Reviewed-by tags from Hannes (Thank you Hannes!)
-Picked up Reviewed-by tag from Christoph (Thank you Christoph!)
-Changed KernelVersion from 6.4 to 6.5 for new sysfs attributes.

For older change logs, see previous patch series versions:
https://lore.kernel.org/linux-scsi/20230406113252.41211-1-nks@flawful.org/
https://lore.kernel.org/linux-scsi/20230404182428.715140-1-nks@flawful.org/
https://lore.kernel.org/linux-scsi/20230309215516.3800571-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20230124190308.127318-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20230112140412.667308-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20221208105947.2399894-1-niklas.cassel@wdc.com/

Link: https://lore.kernel.org/r/20230511011356.227789-1-nks@flawful.org


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
parents 7907ad74 18bd7718
+22 −0
@@ -95,3 +95,25 @@ Description:
		This file does not exist if the HBA driver does not implement
		support for the SATA NCQ priority feature, regardless of the
		device support for this feature.


What:		/sys/block/*/device/cdl_supported
Date:		May, 2023
KernelVersion:	v6.5
Contact:	linux-scsi@vger.kernel.org
Description:
		(RO) Indicates if the device supports the command duration
		limits feature found in some ATA and SCSI devices.


What:		/sys/block/*/device/cdl_enable
Date:		May, 2023
KernelVersion:	v6.5
Contact:	linux-scsi@vger.kernel.org
Description:
		(RW) For a device supporting the command duration limits
		feature, write to the file to turn on or off the feature.
		By default this feature is turned off.
		Writing "1" to this file enables the use of command duration
		limits for read and write commands in the kernel and turns on
		the feature on the device. Writing "0" disables the feature.
+4 −4
@@ -5524,16 +5524,16 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
		bfqq->new_ioprio_class = task_nice_ioclass(tsk);
		break;
	case IOPRIO_CLASS_RT:
		bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
		bfqq->new_ioprio = IOPRIO_PRIO_LEVEL(bic->ioprio);
		bfqq->new_ioprio_class = IOPRIO_CLASS_RT;
		break;
	case IOPRIO_CLASS_BE:
		bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
		bfqq->new_ioprio = IOPRIO_PRIO_LEVEL(bic->ioprio);
		bfqq->new_ioprio_class = IOPRIO_CLASS_BE;
		break;
	case IOPRIO_CLASS_IDLE:
		bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE;
		bfqq->new_ioprio = 7;
		bfqq->new_ioprio = IOPRIO_NR_LEVELS - 1;
		break;
	}

@@ -5830,7 +5830,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
				       struct bfq_io_cq *bic,
				       bool respawn)
{
	const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
	const int ioprio = IOPRIO_PRIO_LEVEL(bic->ioprio);
	const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
	struct bfq_queue **async_bfqq = NULL;
	struct bfq_queue *bfqq;
+3 −0
@@ -170,6 +170,9 @@ static const struct {
	[BLK_STS_ZONE_OPEN_RESOURCE]	= { -ETOOMANYREFS, "open zones exceeded" },
	[BLK_STS_ZONE_ACTIVE_RESOURCE]	= { -EOVERFLOW, "active zones exceeded" },

	/* Command duration limit device-side timeout */
	[BLK_STS_DURATION_LIMIT]	= { -ETIME, "duration limit exceeded" },

	/* everything else not covered above: */
	[BLK_STS_IOERR]		= { -EIO,	"I/O" },
};
+3 −3
@@ -33,7 +33,7 @@
int ioprio_check_cap(int ioprio)
{
	int class = IOPRIO_PRIO_CLASS(ioprio);
	int data = IOPRIO_PRIO_DATA(ioprio);
	int level = IOPRIO_PRIO_LEVEL(ioprio);

	switch (class) {
		case IOPRIO_CLASS_RT:
@@ -49,13 +49,13 @@ int ioprio_check_cap(int ioprio)
			fallthrough;
			/* rt has prio field too */
		case IOPRIO_CLASS_BE:
			if (data >= IOPRIO_NR_LEVELS || data < 0)
			if (level >= IOPRIO_NR_LEVELS)
				return -EINVAL;
			break;
		case IOPRIO_CLASS_IDLE:
			break;
		case IOPRIO_CLASS_NONE:
			if (data)
			if (level)
				return -EINVAL;
			break;
		default:
+200 −4
@@ -665,12 +665,33 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct ata_device *dev)
	return block;
}

/*
 * Set a taskfile command duration limit index.
 */
static inline void ata_set_tf_cdl(struct ata_queued_cmd *qc, int cdl)
{
	struct ata_taskfile *tf = &qc->tf;

	if (tf->protocol == ATA_PROT_NCQ)
		tf->auxiliary |= cdl;
	else
		tf->feature |= cdl;

	/*
	 * Mark this command as having a CDL and request the result
	 * task file so that we can inspect the sense data available
	 * bit on completion.
	 */
	qc->flags |= ATA_QCFLAG_HAS_CDL | ATA_QCFLAG_RESULT_TF;
}

/**
 *	ata_build_rw_tf - Build ATA taskfile for given read/write request
 *	@qc: Metadata associated with the taskfile to build
 *	@block: Block address
 *	@n_block: Number of blocks
 *	@tf_flags: RW/FUA etc...
 *	@cdl: Command duration limit index
 *	@class: IO priority class
 *
 *	LOCKING:
@@ -685,7 +706,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct ata_device *dev)
 *	-EINVAL if the request is invalid.
 */
int ata_build_rw_tf(struct ata_queued_cmd *qc, u64 block, u32 n_block,
		    unsigned int tf_flags, int class)
		    unsigned int tf_flags, int cdl, int class)
{
	struct ata_taskfile *tf = &qc->tf;
	struct ata_device *dev = qc->dev;
@@ -724,11 +745,20 @@ int ata_build_rw_tf(struct ata_queued_cmd *qc, u64 block, u32 n_block,
		if (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLED &&
		    class == IOPRIO_CLASS_RT)
			tf->hob_nsect |= ATA_PRIO_HIGH << ATA_SHIFT_PRIO;

		if ((dev->flags & ATA_DFLAG_CDL_ENABLED) && cdl)
			ata_set_tf_cdl(qc, cdl);

	} else if (dev->flags & ATA_DFLAG_LBA) {
		tf->flags |= ATA_TFLAG_LBA;

		/* We need LBA48 for FUA writes */
		if (!(tf->flags & ATA_TFLAG_FUA) && lba_28_ok(block, n_block)) {
		if ((dev->flags & ATA_DFLAG_CDL_ENABLED) && cdl)
			ata_set_tf_cdl(qc, cdl);

		/* Both FUA writes and a CDL index require 48-bit commands */
		if (!(tf->flags & ATA_TFLAG_FUA) &&
		    !(qc->flags & ATA_QCFLAG_HAS_CDL) &&
		    lba_28_ok(block, n_block)) {
			/* use LBA28 */
			tf->device |= (block >> 24) & 0xf;
		} else if (lba_48_ok(block, n_block)) {
@@ -2367,6 +2397,139 @@ static void ata_dev_config_trusted(struct ata_device *dev)
		dev->flags |= ATA_DFLAG_TRUSTED;
}

static void ata_dev_config_cdl(struct ata_device *dev)
{
	struct ata_port *ap = dev->link->ap;
	unsigned int err_mask;
	bool cdl_enabled;
	u64 val;

	if (ata_id_major_version(dev->id) < 12)
		goto not_supported;

	if (!ata_log_supported(dev, ATA_LOG_IDENTIFY_DEVICE) ||
	    !ata_identify_page_supported(dev, ATA_LOG_SUPPORTED_CAPABILITIES) ||
	    !ata_identify_page_supported(dev, ATA_LOG_CURRENT_SETTINGS))
		goto not_supported;

	err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE,
				     ATA_LOG_SUPPORTED_CAPABILITIES,
				     ap->sector_buf, 1);
	if (err_mask)
		goto not_supported;

	/* Check Command Duration Limit Supported bits */
	val = get_unaligned_le64(&ap->sector_buf[168]);
	if (!(val & BIT_ULL(63)) || !(val & BIT_ULL(0)))
		goto not_supported;

	/* Warn the user if command duration guideline is not supported */
	if (!(val & BIT_ULL(1)))
		ata_dev_warn(dev,
			"Command duration guideline is not supported\n");

	/*
	 * We must have support for the sense data for successful NCQ commands
	 * log indicated by the successful NCQ command sense data supported bit.
	 */
	val = get_unaligned_le64(&ap->sector_buf[8]);
	if (!(val & BIT_ULL(63)) || !(val & BIT_ULL(47))) {
		ata_dev_warn(dev,
			"CDL supported but Successful NCQ Command Sense Data is not supported\n");
		goto not_supported;
	}

	/* Without NCQ autosense, the successful NCQ commands log is useless. */
	if (!ata_id_has_ncq_autosense(dev->id)) {
		ata_dev_warn(dev,
			"CDL supported but NCQ autosense is not supported\n");
		goto not_supported;
	}

	/*
	 * If CDL is marked as enabled, make sure the feature is enabled too.
	 * Conversely, if CDL is disabled, make sure the feature is turned off.
	 */
	err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE,
				     ATA_LOG_CURRENT_SETTINGS,
				     ap->sector_buf, 1);
	if (err_mask)
		goto not_supported;

	val = get_unaligned_le64(&ap->sector_buf[8]);
	cdl_enabled = val & BIT_ULL(63) && val & BIT_ULL(21);
	if (dev->flags & ATA_DFLAG_CDL_ENABLED) {
		if (!cdl_enabled) {
			/* Enable CDL on the device */
			err_mask = ata_dev_set_feature(dev, SETFEATURES_CDL, 1);
			if (err_mask) {
				ata_dev_err(dev,
					    "Enable CDL feature failed\n");
				goto not_supported;
			}
		}
	} else {
		if (cdl_enabled) {
			/* Disable CDL on the device */
			err_mask = ata_dev_set_feature(dev, SETFEATURES_CDL, 0);
			if (err_mask) {
				ata_dev_err(dev,
					    "Disable CDL feature failed\n");
				goto not_supported;
			}
		}
	}

	/*
	 * While CDL itself has to be enabled using sysfs, CDL requires that
	 * sense data for successful NCQ commands is enabled to work properly.
	 * Just like ata_dev_config_sense_reporting(), enable it unconditionally
	 * if supported.
	 */
	if (!(val & BIT_ULL(63)) || !(val & BIT_ULL(18))) {
		err_mask = ata_dev_set_feature(dev,
					SETFEATURE_SENSE_DATA_SUCC_NCQ, 0x1);
		if (err_mask) {
			ata_dev_warn(dev,
				     "failed to enable Sense Data for successful NCQ commands, Emask 0x%x\n",
				     err_mask);
			goto not_supported;
		}
	}

	/*
	 * Allocate a buffer to handle reading the sense data for successful
	 * NCQ Commands log page for commands using a CDL with one of the limit
	 * policies set to 0xD (successful completion with sense data available
	 * bit set).
	 */
	if (!ap->ncq_sense_buf) {
		ap->ncq_sense_buf = kmalloc(ATA_LOG_SENSE_NCQ_SIZE, GFP_KERNEL);
		if (!ap->ncq_sense_buf)
			goto not_supported;
	}

	/*
	 * Command duration limits is supported: cache the CDL log page 18h
	 * (command duration descriptors).
	 */
	err_mask = ata_read_log_page(dev, ATA_LOG_CDL, 0, ap->sector_buf, 1);
	if (err_mask) {
		ata_dev_warn(dev, "Read Command Duration Limits log failed\n");
		goto not_supported;
	}

	memcpy(dev->cdl, ap->sector_buf, ATA_LOG_CDL_SIZE);
	dev->flags |= ATA_DFLAG_CDL;

	return;

not_supported:
	dev->flags &= ~(ATA_DFLAG_CDL | ATA_DFLAG_CDL_ENABLED);
	kfree(ap->ncq_sense_buf);
	ap->ncq_sense_buf = NULL;
}

static int ata_dev_config_lba(struct ata_device *dev)
{
	const u16 *id = dev->id;
@@ -2534,13 +2697,14 @@ static void ata_dev_print_features(struct ata_device *dev)
		return;

	ata_dev_info(dev,
		     "Features:%s%s%s%s%s%s%s\n",
		     "Features:%s%s%s%s%s%s%s%s\n",
		     dev->flags & ATA_DFLAG_FUA ? " FUA" : "",
		     dev->flags & ATA_DFLAG_TRUSTED ? " Trust" : "",
		     dev->flags & ATA_DFLAG_DA ? " Dev-Attention" : "",
		     dev->flags & ATA_DFLAG_DEVSLP ? " Dev-Sleep" : "",
		     dev->flags & ATA_DFLAG_NCQ_SEND_RECV ? " NCQ-sndrcv" : "",
		     dev->flags & ATA_DFLAG_NCQ_PRIO ? " NCQ-prio" : "",
		     dev->flags & ATA_DFLAG_CDL ? " CDL" : "",
		     dev->cpr_log ? " CPR" : "");
}

@@ -2702,6 +2866,7 @@ int ata_dev_configure(struct ata_device *dev)
		ata_dev_config_zac(dev);
		ata_dev_config_trusted(dev);
		ata_dev_config_cpr(dev);
		ata_dev_config_cdl(dev);
		dev->cdb_len = 32;

		if (print_info)
@@ -4766,6 +4931,36 @@ void ata_qc_complete(struct ata_queued_cmd *qc)
			fill_result_tf(qc);

		trace_ata_qc_complete_done(qc);

		/*
		 * For CDL commands that completed without an error, check if
		 * we have sense data (ATA_SENSE is set). If we do, then the
		 * command may have been aborted by the device due to a limit
		 * timeout using the policy 0xD. For these commands, invoke EH
		 * to get the command sense data.
		 */
		if (qc->result_tf.status & ATA_SENSE &&
		    ((ata_is_ncq(qc->tf.protocol) &&
		      dev->flags & ATA_DFLAG_CDL_ENABLED) ||
		     (!ata_is_ncq(qc->tf.protocol) &&
		      ata_id_sense_reporting_enabled(dev->id)))) {
			/*
			 * Tell SCSI EH to not overwrite scmd->result even if
			 * this command is finished with result SAM_STAT_GOOD.
			 */
			qc->scsicmd->flags |= SCMD_FORCE_EH_SUCCESS;
			qc->flags |= ATA_QCFLAG_EH_SUCCESS_CMD;
			ehi->dev_action[dev->devno] |= ATA_EH_GET_SUCCESS_SENSE;

			/*
			 * set pending so that ata_qc_schedule_eh() does not
			 * trigger fast drain, and freeze the port.
			 */
			ap->pflags |= ATA_PFLAG_EH_PENDING;
			ata_qc_schedule_eh(qc);
			return;
		}

		/* Some commands need post-processing after successful
		 * completion.
		 */
@@ -5398,6 +5593,7 @@ static void ata_host_release(struct kref *kref)

		kfree(ap->pmp_link);
		kfree(ap->slave_link);
		kfree(ap->ncq_sense_buf);
		kfree(ap);
		host->ports[i] = NULL;
	}